Working on a tip from an anonymous screenwriter that ChatGPT and other AI chatbots and large language models (LLMs) were being trained on the works of screenwriters, The Atlantic conducted a major investigation. Its reporting staff wanted to uncover a possible behind-the-scenes effort by the biggest figures and companies in AI to train their tech on the works of others, possibly without their consent. The tipster said they had "seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf" and had been able to replicate the results several times in a row, though they clarified that they could not prove the AI programs they tried were trained on the source material in question. Today, The Atlantic published the results of its research (Paywall Warning), and the results are genuinely horrifying:
I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts.
The Atlantic also stated that 346 scripts from Ryan Murphy were used to train AI systems and LLMs. So, in general, if a film or television show has won an award or been in contention at the Oscars, Golden Globes, or Emmys, there's a high chance that its script and written dialogue have already been at least partially used to train AI systems.
Now, you may recall that in 2023 the entire entertainment industry was striking, and one of the major negotiation points involved the possibility of studios using film, television, and multimedia scripts to train AI. Under the contract that writers and actors bargained for in good faith, they won major provisions strictly prohibiting studios from using AI to write scripts or to edit scripts already written by a writer. The contract also prevented studios from treating AI-generated content as "source material." Unfortunately, what appears to be happening here is that tech companies like OpenAI, Apple, Microsoft, Meta, and others are taking text from scripts and operating freely because they are not film studios themselves. The contract screenwriters and actors negotiated in 2023 will do nothing to stop these tech companies from operating as they have unless individuals bring their own lawsuits alleging copyright violations. That has been the approach of newspaper companies, which contend that ChatGPT and Google's AI systems summarizing their original reporting without paying for a license amounts to a copyright violation. It seems, especially if this report is verified, that we can expect lawsuits from the Screen Actors Guild and the Writers Guild against tech companies.