Google was hit with a wide-ranging lawsuit on Tuesday alleging the tech giant scraped data from millions of users without their consent and violated copyright laws in order to train and develop its artificial intelligence products.
The proposed class action suit against Google, its parent company Alphabet, and Google’s AI subsidiary DeepMind was filed in a federal court in California by Clarkson Law Firm. The firm filed a similar suit against ChatGPT-maker OpenAI last month. (OpenAI did not respond to an earlier request for comment on that suit.)
The complaint alleges that Google “has been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans” and using this data to train its AI products, such as its chatbot Bard. The complaint also claims Google has taken “virtually the entirety of our digital footprint,” including “creative and copywritten works” to build its AI products.
Halimah DeLaine Prado, Google’s general counsel, called the claims in the suit “baseless” in a statement to CNN. “We’ve been clear for years that we use data from public sources — like information published to the open web and public datasets — to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles,” DeLaine Prado said.
“American law supports using public information to create new beneficial uses, and we look forward to refuting these baseless claims,” the statement added.
Alphabet and DeepMind did not immediately respond to a request for comment.
In response to an earlier Verge report on a recent update to its policy, Google said the policy “has long been transparent” about this practice and “this latest update simply clarifies that newer services like Bard are also included.”
The lawsuit comes as a new crop of AI tools has gained tremendous attention in recent months for generating written work and images in response to user prompts. The large language models underpinning this technology can do so because they are trained on vast troves of online data.
In the process, however, companies are also drawing mounting legal scrutiny over copyright issues from works swept up in these data se