Databricks and Shutterstock are trying to remove the copyright risk from AI image generation (exclusive)

Databricks and Shutterstock have developed a new text-to-image diffusion model similar to DALL-E and Stable Diffusion, but without the risk of copyright violations.



Most image generators are trained on images scraped from the web, but ImageAI, as it’s called, is trained only on Shutterstock images. That means there’s no risk of copyright infringement lawsuits resulting from the AI producing images closely derived from those of unpaid creators. 



Databricks head of generative AI Naveen Rao believes ImageAI may do for AI-generated images what Apple did for MP3s in the early 2000s. Sites like Napster made MP3s essentially free—not a great deal for the record labels and artists who owned the music. 



“Apple solved this by figuring out a way to pay creators and providing a platform that enables people to buy music seamlessly,” Rao says. “I think there’s something similar to that going on here, where the difference is now that we’re not just taking the images directly and selling them, we’re actually building models out of them that can produce content based upon that data–it’s like a derivative of the data.” 



While iTunes was consumer-focused, ImageAI is geared more toward businesses that use image generators and have a far greater motivation to actually pay for content. A company might turn to AI-generated images for campaigns that span print, social media, display, and other media. Canceling such a campaign because of copyright fears could be very costly.



“I don’t want to get into legal waters by producing content that has Mickey Mouse in it by accident; I want to make sure everything that went into that is absolutely kosher in terms of permission,” Rao says. “All of that needs to be done upfront.”



The ImageAI model is hosted on Databricks servers, and enterprises can call on the image generator through their own apps using an API, the companies say. Databricks is known as a data cloud for large companies, but in recent years it has become active in developing and hosting AI models, which run and are maintained in the same secure environment as the data.



Databricks and Shutterstock will each sell access to ImageAI, Shutterstock chief enterprise officer Aimee Egan said, and both will use their own fee schedules. Shutterstock gets paid an undisclosed amount for every image generated. 



At launch, customers won’t be able to fine-tune the image model with their own brand imagery and style guides, but that’s very likely to be added in the near future, Egan said. 



Many companies are using AI image generators to quickly create custom marketing collateral, ad content, websites, and app content. Many of them use generative models that are trained on a combination of proprietary imagery and images scraped from the internet in large quantities. Creators whose images have been scraped from the web for training are just beginning to test their copyright cases in court.



Companies that train general AI models on the scraped images argue that since the model “transforms” the images into something new, they are legally protected by the fair-use clauses in U.S. copyright law. 



To mitigate the risk of legal exposure, AI model developers like OpenAI have been striking paid arrangements with certain content owners and publishers. 



For companies like Shutterstock, this type of content licensing has become a significant revenue generator. Shutterstock earned $104 million in 2023 from licensing its image and video catalogs to OpenAI, Meta, and others, Bloomberg reports. 
