OpenAI’s Use of YouTube Videos for Training Sora Could Violate Platform Rules

YouTube’s CEO, Neal Mohan, has stated that if OpenAI used YouTube videos to train its video creation tool, Sora, it would be a clear violation of the platform’s terms of service. This statement was made in response to public debates over the materials OpenAI uses to train AI models for popular content creation products like ChatGPT and DALL-E.

Generative AI tools like Sora work by absorbing various content from the web and using this data as a foundation to generate new content, including videos, photos, narrative text, and more. As companies like OpenAI, Google, and others race to develop more powerful artificial intelligence, they are looking to source as much content as possible to train their AI models for better-quality results.

Mohan emphasized that when creators upload their work to YouTube, they have certain expectations, one of which is that the terms of service will be adhered to. The terms do not allow things like transcripts or video bits to be downloaded, so using YouTube videos to train Sora would be a clear violation.

OpenAI’s Chief Technology Officer, Mira Murati, stated in an interview that she wasn’t sure whether Sora was trained on user-generated videos from platforms like YouTube, Facebook, and Instagram. It was also reported that OpenAI has discussed training its next-generation large language model, GPT-5, on transcriptions of public YouTube videos.

Mohan further stated that Google abides by YouTube’s individual contracts with creators when deciding whether to use videos from the platform to train the company’s own AI model, Gemini. He mentioned that creators have different sorts of licensing contracts for their content on the platform, and that Google ensures any use of videos as training data is in line with the terms of service or the contract the creator has signed.

read more > www.bloomberg.com