The Hidden Race in Big Tech to Acquire AI Training Data

In the early 2000s, Photobucket was the world’s leading image-hosting site, boasting 70 million users and accounting for nearly half of the U.S. online photo market. Today, it has only 2 million users, but the rise of generative AI may breathe new life into it. Photobucket’s CEO, Ted Leonard, revealed that he is negotiating with several tech companies to license Photobucket’s 13 billion photos and videos for training generative AI models. The rates discussed range from 5 cents to $1 per photo and more than $1 per video, with prices varying widely by buyer and content type.

At those rates, Photobucket could be sitting on billions of dollars’ worth of content, and its negotiations offer a glimpse into a burgeoning data market driven by the rush to dominate generative AI. Tech giants like Google, Meta, and Microsoft-backed OpenAI initially trained generative AI models such as ChatGPT on vast amounts of data scraped from the internet for free. They maintain that this practice is both legal and ethical, even as they face lawsuits from numerous copyright holders.

At the same time, these tech companies are quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to long-forgotten personal photos from obsolete social media apps. The trend points to a rush to secure deals with copyright holders whose private collections of content cannot be scraped.

NIMBUS27

Read more: www.reuters.com