Mustafa Suleyman, the CEO of Microsoft’s new AI division, told CNBC’s Andrew Ross Sorkin that anything published on the internet becomes ‘freeware’ and can be copied to train AI models. Recently, there has been growing concern about, and investigation into, the practices of generative AI companies, particularly their use of YouTube videos and transcripts from independent creators to train AI models.
AI Using Vast Amounts of YouTube’s Content
The use of YouTube content to train generative AI models has become a matter of concern, raising complex questions about consent, compensation, and the rights of creators. In July, the online publication 404 Media reported that the generative AI video company Runway had trained its models on thousands of videos without obtaining consent.
Many reports suggest that AI companies have been using vast amounts of content from YouTube, including audio, visuals, and transcripts, to develop their proprietary models. Although the major tech companies have not openly admitted to this practice, it raises significant ethical, legal, and financial concerns. Many creators feel uneasy, and some even feel exploited. David Millette, a YouTuber, filed a lawsuit against chipmaker Nvidia, accusing the company of creating a video model by scraping YouTube content without obtaining his consent.
A report by Proof News revealed that subtitles from 173,536 YouTube videos from over 48,000 channels were used by tech companies including Nvidia, Apple, Anthropic, and Salesforce to train their models. Videos from popular creators such as Marques Brownlee, MrBeast, and PewDiePie were among those used.
Granting YouTube a License
When a creator uploads a video to YouTube, they agree to the terms of service, which give YouTube a broad license to use the content. Under those terms, YouTube can reproduce, distribute, and even create derivative works from the content. But the terms never mention that the content can also be used to train AI models.
“By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, transferable, sublicensable license to use that Content (including to reproduce, distribute, prepare derivative works, display and perform it). YouTube may only use that Content in connection with the Service and YouTube’s (and its successors’ and Affiliates) business, including for the purpose of promoting and redistributing part or all of the Service,” an excerpt from the Terms of Service as seen on YouTube at present.
Tech Giants’ Remarks
In May, YouTube CEO Neal Mohan told Bloomberg’s Emily Chang in an interview, “When a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service are going to be abided by. Our terms of service does allow for YouTube content, some YouTube content like the title of a video or the channel name or the creator’s name, to be scraped because that’s how you enable the open web…But it does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service.”
“I think that with respect to content that is already on the open web, the social contract of that content since the ’90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been free, as you like,” Suleyman told CNBC in an interview.