The Atlantic Did A Damning Report That AI Is ABSOLUTELY Already Being Trained On Thousands Of Movie Scripts

#1 ZombiePie  Staff

Working on a tip from an anonymous screenwriter about ChatGPT and other AI chatbots and large language models (LLMs) being trained on the works of screenwriters, The Atlantic conducted a major investigative effort. Its reporting staff wanted to uncover a possible behind-the-scenes effort by the biggest figures and companies in AI to train their tech on the works of others, possibly without their consent. The tipster said they had "seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf" and had been able to replicate the results several times in a row, though they clarified that they could not prove whether the AI programs they tried were trained on the source material in question. Today, The Atlantic published the results of its research (Paywall Warning), and the results are genuinely horrifying:

I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts.

The Atlantic also stated that 346 scripts from Ryan Murphy were used to train AI and LLMs. So, in general, if a film or television show has won an award or been in award contention at the Oscars, Golden Globes, or Emmys, there's a high chance that its script and written dialogue have already been at least partially used to train AI systems.

Now, you may recall that in 2023 the entire entertainment industry was striking, and one of the major negotiation points involved the possibility of studios using film, television, and multimedia scripts to train AI. In the contracts that writers and actors bargained for in good faith, they won major provisions strictly prohibiting studios from using AI to write scripts or to edit scripts that have already been written by a writer. The contract also prevented studios from treating AI-generated content as "source material." Unfortunately, what appears to be happening here is that tech companies like OpenAI, Apple, Microsoft, Meta, and others are taking text from scripts and operating freely because they are not film studios themselves. The contract screenwriters and actors negotiated in 2023 will do nothing to stop these tech companies from operating as they have unless individuals wish to bring their own lawsuits alleging copyright violations. That has been the approach of newspaper companies, which contend that ChatGPT and Google's AI systems summarizing their original reporting without paying for a license amounts to a copyright violation. It seems, especially if this report is verified, that we can expect lawsuits from the Screen Actors Guild and the Writers Guild against tech companies.

bigsocrates

Is there any doubt that companies are using basically EVERYTHING in their data sets?

These companies have a very long history of lying, stealing, and showing zero ethics. Remember when Facebook fabricated engagement data in the pivot to video and destroyed countless outlets and careers? These people absolutely do not care and they do not think they will be held accountable, and they probably won't be. At least not to the point where it will make their behavior not worth it.

Ben_H

bigsocrates said: "Is there any doubt that companies are using basically EVERYTHING in their data sets?"

Microsoft's AI head basically said as much in a public interview. He said that if it's available on the internet, they consider it fair game. He was forced to walk back his comments a bit, but you just know that behind the scenes that's the attitude. It seemed like some in the AI industry were cautious for a short time, but that caution has been thrown to the wind and now they don't seem to care. They seem to be banking on those whose data they are stealing not being dedicated or funded enough to be able to take them to court over this stuff.

The entire AI industry is disgusting. It's like if you took every one of the worst traits of the last decade of the tech industry and condensed it into one sector. Anyone with ethics has been chased out and the people running a lot of the biggest companies and projects are the worst kind of true believers or poisoned business brains. They're boiling the planet, spreading misinformation (or usually outright lies), stealing rampantly from those who actually do the work, and destroying the world's information repository purely because they think they can create the next big thing. Of course they stole all of the written work from TV shows and movies. The acting assumption should always be that they already stole it and will ask for forgiveness if caught. That's just how most of the industry operates.

We're fast approaching the two-year mark since ChatGPT went public and we still haven't seen a single actual killer app for AI beyond being a slightly useful resource for programmers and data analysts and kinda helpful for editing writing. Instead, when you look up certain topics on Bing image search, you're now probably mostly going to get shitty AI images. If you try to Google something, the top result has been replaced by a Gemini answer that's wrong more often than not (I've had so many blatantly wrong Gemini answers in the last few weeks. Holy shit. Do they not feel shame about how bad that thing is? It's bad enough that calling it a product beta feels inaccurate. It's like pre-alpha at best. This thing should not be unleashed on the public). Huge portions of the internet are useless now because they're being flooded with garbage, useless content that exists for no reason other than to boost a number.

I'm still just waiting for the Emperor's New Clothes moment to happen with AI. It'll happen eventually. Hopefully sooner rather than later, before it does even more damage. It's clear we can't shame the people who are doing this stuff into cleaning up their act, so we have to hope that the investor class flips out when they finally realize they threw away a trillion dollars on what has amounted to a bad parlour trick. Goldman Sachs already tried to warn investors about AI in the summer, but that doesn't seem to have slowed anyone down.

mach_go_go_go

Previously I was able to use Google Assistant to read the text from a webpage aloud to me (such as this one), say, when I'm driving or doing dishes. Now, that totally functional baked-in command has been replaced on my phone with Gemini, completely tossing that functionality away in favor of a factory of uselessness.

"Read this page".

"I can't do that right now but you should be able to do this using the + button in the browser. "

(The + opens a new tab)

mellotronrules

well, i for one am absolutely gobsmacked that corporate entities would rather ask forgiveness than permission. /s

it's going to feel really fucking stupid if we bring all these nuclear reactors online to power GPU farms just so they can charge us another subscription. but yes, bring on the paradigm shift.

-written from my driverless car