A million bears walking on the streets of Hong Kong. A strawberry frog. A cat made out of spaghetti and meatballs.
These are just a few of the text descriptions that people have fed to cutting-edge artificial intelligence systems in recent weeks. These systems, notably OpenAI's DALL-E 2 and Google Research's Imagen, can use such prompts to produce incredibly detailed, realistic-looking images.
The resulting pictures can be silly, strange, or even reminiscent of classic art, and they're being shared widely (and sometimes breathlessly) on social media, including by influential figures in the tech community. DALL-E 2 (a newer version of a similar, less capable AI system OpenAI rolled out last year) can also edit existing images by adding or removing objects.
It's not hard to imagine such on-demand image generation eventually serving as a powerful tool for making all kinds of creative content, whether it be art or ads; DALL-E 2 and Midjourney, a similar system, have already been used to help create magazine covers. OpenAI and Google have pointed to a few ways the technology might be commercialized, such as for editing images or creating stock images.
Neither DALL-E 2 nor Imagen is currently available to the public. Yet they share an issue with many others that already are: they can also produce disturbing results that reflect the gender and cultural biases of the data on which they were trained — data that includes millions of images pulled from the internet.
The bias in these AI systems presents a serious issue, experts told CNN Business. The technology can perpetuate hurtful biases and stereotypes. They're concerned that the open-ended nature of these systems, which makes them adept at generating all kinds of images from words, and their ability to automate image-making mean they could automate bias on a massive scale. They also have the potential to be used for nefarious purposes, such as spreading disinformation.
“Until those harms can be prevented, we’re not really talking about systems that can be used out in the open, in the real world,” said Arthur Holland Michel, a senior fellow at Carnegie Council for Ethics in International Affairs who researches AI and surveillance technologies.
AI has become common in everyday life in the past few years, but it's only recently that the public has taken notice — both of how common it is, and how gender, racial, and other types of biases can creep into the technology. Facial-recognition systems in particular have been increasingly scrutinized for concerns about their accuracy and racial bias.
OpenAI and Google Research have acknowledged many of the issues and risks related to their AI systems in documentation and research, with both saying that the systems are prone to gender and racial bias and to depicting Western cultural and gender stereotypes.
OpenAI, whose mission is to build so-called artificial general intelligence that benefits all people, included in an online document titled "Risks and limitations" pictures illustrating how text prompts can bring up these issues: a prompt for "nurse," for instance, resulted in images that all appeared to show stethoscope-wearing women, while one for "CEO" produced images that all appeared to be men, nearly all of them white.
Lama Ahmad, policy research program manager at OpenAI, said researchers are still learning how to even measure bias in AI, and that OpenAI can use what it learns to tweak its AI over time. Ahmad led OpenAI’s efforts to work with a group of outside experts earlier this year to better understand issues within DALL-E 2 and offer feedback so it can be improved.
Google declined a request for an interview from CNN Business. In its research paper introducing Imagen, the Google Brain team members behind it wrote that Imagen appears to encode “several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes.”
The contrast between the images these systems create and the thorny ethical issues is stark for Julie Carpenter, a research scientist and fellow in the Ethics and Emerging Sciences Group at California Polytechnic State University, San Luis Obispo.
“One of the things we have to do is we have to understand AI is very cool and it can do some things very well. And we should work with it as a partner,” Carpenter said. “But it’s an imperfect thing. It has its limitations. We have to adjust our expectations. It’s not what we see in the movies.”
Holland Michel is also concerned that no amount of safeguards can prevent such systems from being used maliciously, noting that deepfakes — a cutting-edge application of AI to create videos that purport to show someone doing or saying something they didn’t actually do or say — were initially harnessed to create faux pornography.
“It kind of follows that a system that is orders of magnitude more powerful than those early systems could be orders of magnitude more dangerous,” he said.
Hint of bias
Because Imagen and DALL-E 2 take in words and spit out images, they had to be trained with both types of data: pairs of images and related text captions. Google Research and OpenAI filtered harmful images such as pornography from their datasets before training their AI models, but given the large size of their datasets such efforts are unlikely to catch all such content, nor do they guarantee the systems can't produce harmful results. In the Imagen paper, Google researchers pointed out that, despite filtering some data, they also used a massive dataset that is known to include porn, racist slurs, and "harmful social stereotypes."
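The filtering step described above can be pictured as a simple pass over image-caption pairs. This is only a toy sketch, not how OpenAI or Google actually implement it: the blocklist, filenames, and captions here are hypothetical, and real pipelines also scan the image content itself — and, as the researchers note, still miss plenty.

```python
# Toy sketch of caption-based dataset filtering (hypothetical
# blocklist and data; real systems also classify the images).

BLOCKLIST = {"explicit", "slur"}  # hypothetical flagged terms

def keep_pair(caption: str) -> bool:
    """Keep an image-caption pair only if no caption word is flagged."""
    words = set(caption.lower().split())
    return not (words & BLOCKLIST)

dataset = [
    ("cat1.jpg", "a fluffy cat on a sofa"),
    ("bad1.jpg", "explicit content"),
]

# Drop pairs whose captions match the blocklist before training.
cleaned = [pair for pair in dataset if keep_pair(pair[1])]
print(len(cleaned))  # the flagged pair is removed, leaving 1
```

Word-level matching like this is exactly why such filters are leaky at scale: misspellings, other languages, and harmful images with innocuous captions all slip through.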
Filtering can also lead to other issues: Women tend to be represented more than men in sexual content, for instance, so filtering out sexual content also reduces the number of women in the dataset, said Ahmad.
And truly filtering these datasets for bad content is impossible, Carpenter said, since people are involved in decisions about how to label and delete content — and different people have different cultural beliefs.
“AI doesn’t understand that,” she said.
Some researchers are thinking about how it might be possible to reduce bias in these types of AI systems, but still use them to create impressive images. One possibility is using less, rather than more, data.
Alex Dimakis, a professor at the University of Texas at Austin, said one method involves starting with a small amount of data — for example, a photo of a cat — and cropping it, rotating it, creating a mirror image of it, and so on, to effectively turn one picture into many different images. (A graduate student Dimakis advises was a contributor to the Imagen research, but Dimakis himself was not involved in the system’s development, he said.)
“This solves some of the problems, but it doesn’t solve other problems,” Dimakis said. The trick on its own won’t make a dataset more diverse, but the smaller scale could let people working with it be more intentional about the images they’re including.
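The augmentation idea Dimakis describes — turning one picture into many via simple geometric transforms — can be sketched in a few lines. This is a minimal illustration, not the method used by Imagen; a real pipeline would use a library such as torchvision, and here an "image" is just a 2-D list of pixel values.

```python
# Minimal sketch of data augmentation: one source image yields
# several training examples via mirroring and rotation.

def mirror(img):
    """Flip the image left-to-right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Turn one image into a small set of geometric variants."""
    variants = [img, mirror(img)]
    rotated = img
    for _ in range(3):  # 90, 180, and 270 degree rotations
        rotated = rotate90(rotated)
        variants.append(rotated)
    return variants

# Example: a tiny 2x2 stand-in for a photo of a cat.
cat = [[1, 2],
       [3, 4]]
print(len(augment(cat)))  # prints 5: one picture became five
```

As Dimakis notes, this multiplies the quantity of training data but not its diversity — every variant still depicts the same cat — which is why the smaller, hand-curated starting set matters.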
For now, OpenAI and Google Research are trying to keep the focus on cute pictures and away from images that may be disturbing or show humans.
There are no realistic-looking images of people in the vibrant sample images on either Imagen's or DALL-E 2's online project page, and OpenAI says on its page that it used "advanced techniques to prevent photorealistic generations of real individuals' faces, including those of public figures." This safeguard could prevent users from getting image results for, say, a prompt that attempts to show a specific politician performing some kind of illicit activity.
OpenAI has provided access to DALL-E 2 to thousands of people who have signed up for a waitlist since April. Participants must agree to an extensive content policy, which tells users not to try to make, upload, or share pictures "that are not G-rated or that could cause harm." DALL-E 2 also uses filters to prevent it from generating a picture if a prompt or image upload violates OpenAI's policies, and users can flag problematic results. In late June, OpenAI started allowing users to post photorealistic human faces created with DALL-E 2 to social media, but only after adding some safety features, such as preventing users from generating images containing public figures.
“Researchers, specifically, I think it’s really important to give them access,” Ahmad said. This is, in part, because OpenAI wants their help to study areas such as disinformation and bias.
Google Research, meanwhile, is not currently letting researchers outside the company access Imagen. It has taken requests on social media for prompts that people would like to see Imagen interpret, but as Mohammad Norouzi, a co-author on the Imagen paper, tweeted in May, it won’t show images “including people, graphic content, and sensitive material.”
Still, as Google Research noted in its Imagen paper, “Even when we focus generations away from people, our preliminary analysis indicates Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects.”
A hint of this bias is evident in one of the images Google posted to its Imagen webpage, created from a prompt that reads: “A wall in a royal castle. There are two paintings on the wall. The one on the left a detailed oil painting of the royal raccoon king. The one on the right a detailed oil painting of the royal raccoon queen.”
The image is just that, with paintings of two crowned raccoons — one wearing what looks like a yellow dress, the other in a blue-and-gold jacket — in ornate gold frames. But as Holland Michel noted, the raccoons are sporting Western-style royal outfits, even though the prompt didn’t specify anything about how they should appear beyond looking “royal.”
Even such “subtle” manifestations of bias are dangerous, Holland Michel said.
“In not being flagrant, they’re really hard to catch,” he said.