Generative visual AI in newsrooms: Considerations related to production, presentation, and audience interpretation and impact

By T. J. Thomson and Ryan J. Thomas

Abstract: AI services that provide responses to prompts, such as ChatGPT, have ignited passionate discussions over the future of learning, work, and creativity. AI-enabled text-to-image generators, such as Midjourney, pose profound questions about the purpose, meaning, and value of images yet have received considerably less research attention, despite the implications they raise for both the production and consumption of images. This essay explores key considerations that journalists and news organizations should be aware of when conceiving, sourcing, presenting, or seeking to fact-check AI-generated images. Specifically, it addresses transparency around how algorithms work, discusses provenance and algorithmic bias, touches on labor ethics and the displacement of traditional lens-based workers, explores copyright implications, identifies the potential impacts on the accuracy and representativeness of the images audiences see in their news, and reflects on the lack of regulation and policy development governing the use of AI-generated images in news. We explore these themes through the insights provided by eight photo editors (or those in equivalent roles) at leading news organizations in Australia and the United States. Overall, this study articulates some of the key issues facing journalists and their organizations in an age of AI and synthetic visual media.

An image of Pope Francis wearing a luxury fashion house’s puffer jacket (see Figure 1) went viral in March 2023. It was created using the text-to-image generator Midjourney and posted on Reddit before being extensively shared and seen elsewhere online (Di Placido 2023). That same month, AI-generated images depicting former U.S. President Donald Trump being arrested also spread widely online (Devlin/Cheetham 2023). The rapid circulation of these images and the extent to which they were treated as credible have raised concerns about how online audiences cannot always discern truth from falsehood (Stokel-Walker 2023; Vincent 2023). The images also provide a useful entry point into a discussion about what journalists and newsrooms need to be aware of as generative visual AI becomes increasingly widespread.

Figure 1: Screenshot of a tweet showing an AI-generated image (left) of Pope Francis wearing a luxury puffer jacket

In this essay, we examine relevant domains – production, presentation, and audience interpretation and impact – of generative visual AI and its implications for newsrooms, journalists, and their publics.

Our essay joins other recent work (see Becker 2023; Cools/Diakopoulos 2023) that examines newsroom policies (primarily in Europe and North America) in relation to AI. Those studies found that transparency, accountability, and responsibility are often mentioned in AI-focused editorial guidelines but that questions around legal compliance and algorithmic bias, for example, are less prominent. The present essay contributes to the literature by adding newsrooms in Australia and by evaluating editor perspectives at different North American newsrooms than those Becker (2023) and Cools/Diakopoulos (2023) studied. Our essay also differentiates itself through its central focus on the visual aspects of generative AI rather than treating these as peripheral or ignoring them entirely; by expanding beyond questions of production to also consider the domains of presentation and audience interpretation and impact; and by exploring internal thinking on policy and practice rather than only evaluating publicly available policies.

Considerations for the production domain

It is inexpensive and straightforward to harness the power of generative visual AI through online tools like Midjourney, DALL·E, and NightCafe. All a user needs to do is imagine the scene they want to visualize and describe it through words so the underlying algorithm can return one or more results that it thinks match the provided description. This is called »prompting« or »prompt engineering« and the prompts can be simple, one-word labels (e.g., »girl« or »restaurant«) or lengthy descriptions that specify particular attributes of the scene and the equipment used to visualize it (e.g., »a 12-year-old girl sitting on a stool in an empty restaurant in Berlin, cinematic, 85 mm lens, f/1.8, accent lighting, global illumination, --ar 2:3«). In this second example, the user has provided more clarity about what they want to see (namely, a person of a certain gender with a specific age in a certain location and shown using a specific focal length with a specific aperture value). They have also specified a visual style (»cinematic«), lighting conditions (»global illumination«), and an aspect ratio (»2:3«), which is the width-to-height relationship of the image’s frame.
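While editors typically type such prompts into a chat interface (Midjourney, for instance, runs through Discord and offers no public API), text-to-image models can also be driven programmatically. The following is a minimal sketch, assuming OpenAI’s Python SDK and an API key, of how a script might request a DALL·E image for the example prompt above; DALL·E stands in here only because Midjourney cannot be scripted this way.

```python
# A minimal sketch of programmatic prompting. Assumptions: the OpenAI Python
# SDK (pip install openai) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "a 12-year-old girl sitting on a stool in an empty restaurant in Berlin, "
    "cinematic, 85 mm lens, f/1.8, accent lighting, global illumination"
)

# DALL-E 3 expresses aspect ratio through fixed sizes rather than an --ar
# flag; 1024x1792 approximates the 2:3 portrait framing specified above.
response = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    size="1024x1792",
    n=1,
)

print(response.data[0].url)  # temporary URL of the generated image
```

The ease with which such a script can be looped over dozens of prompts underlines how quickly image generation scales compared to commissioning photography.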

Potential problems emerge, however, due to the ways algorithms are developed and the available source material the algorithm draws on to generate images (Sun/Wei/Sun/Suh/Shen/Yang 2023). In the above example, while we specified the person’s gender and age, we did not specify their ethnic background or ability status. The AI is left to fill in these gaps and, often, returns results that reinforce existing biases and stereotypes (Thomas/Thomson 2023), including those related to gender, age, ethnicity, ability, and location.

Because tools like Midjourney and DALL·E are cheap and easy to use, a journalist or editor (or their potentially more budget-conscious business colleagues) might ask whether their newsroom can turn to AI to generate images rather than paying staff or freelancers to go out and photograph a scene. Newsrooms so minded could buy an annual subscription to a text-to-image generator like Midjourney for the cost of a single freelancer’s day rate. Indeed, the use of AI to create content is increasingly a problematic feature of written journalism. For example, the tech news site CNET was found to have errors in more than half of the stories it had relied on AI to write (Sato/Roth 2023), while the newspaper publishing company Gannett was widely criticized for the turgid prose of its AI-written sports stories (Wu 2023).

Between March and July 2023, we interviewed photo editors (or those in equivalent roles) in newsrooms in Australia and the U.S.A. about how they regard and use generative visual AI. We promised our participants anonymity so cannot disclose the names of the outlets they worked for. However, we can say that the eight brands in our sample were primarily large organizations (with an average of around 3,000 employees) and primarily reached national or international audiences rather than regional or local ones. Our rationale for studying the largest outlets with the biggest audiences was that these organizations are likely the best resourced and most likely to have the opportunity to develop guidelines related to generative visual AI. We hypothesized that smaller and less-resourced outlets would either lack policies entirely or would adopt or adapt those published by larger organizations or professional journalistic associations.

Most of the editors we spoke with said they only use generative visual AI for creative brainstorming or to illustrate stories specifically about generative visual AI. Some editors differentiated between using generative visual AI for news and for other »feature« or opinion content where photo illustrations and concept art were more common. These editors felt more comfortable with the idea of using generative visual AI for these latter tasks than with using it in news stories. Most editors said they were concerned about the labor implications of generative visual AI and its potential to displace traditional, lens-based storytellers. They said they felt responsible to their industry to continue investing in the lens-based storytelling craft and to support lens-based workers, even when colleagues in other departments or with different backgrounds might not appreciate the difference between AI-generated and traditional lens-based production methods or results.

Another production-related consideration editors raised was copyright. Text-to-image generators work by training on vast and often copyrighted sets of imagery. The question arises of whether services like Midjourney are infringing the intellectual property rights of photographers, artists, and other visual communicators by learning from their images to make their own. This matter is made more complex by how opaque most text-to-image generators are about where their training data come from and how their underlying algorithms work. It is also a matter currently before the courts in various jurisdictions (Brittain 2023). A notable exception is Adobe’s answer to generative visual AI, Firefly. Adobe claims its Firefly model is trained on its own Adobe Stock repository, openly licensed content, or public-domain imagery, which reduces the legal risk of using the resulting generations for commercial purposes.

Considerations for the presentation domain

Journalists and editors enjoy considerable freedom to customize the presentation of elements on their own websites (though journalists we have interviewed bemoan that this is still often a time-intensive, expensive, and frustrating process). However, news publishers enjoy significantly less freedom when posting their content to social media platforms. They can control aspects like the number of images in a post and what the accompanying textual description says, but aspects like the absolute size of posts, the color of post frames, and other features of the user interface are determined by the platform, leading to a relatively homogenous viewing experience (Sutcliffe 2016). Content from a respected news brand can and does appear next to content from a stranger, and the two posts can »look« relatively similar in terms of the basic elements being used. Verification methods and statuses exist on some platforms but are absent on others, or denote only users who have paid for verification and meet certain criteria (Brandtzaeg/Lüders/Spangenberg/Rath-Wiggins/Følstad 2016).

The relative uniformity in the design of social media feeds can lead to issues with transparency when AI-generated images are used and outlets wish to inform their audiences of this fact. Editors we spoke to said news publishers are often left noting these details in the post description and hoping that the user will read that context. Yet, depending on the platform, text descriptions are often truncated, and the user must click or tap an »expand« or »more« button to read the full post, which complicates decisions about where to position relevant contextual information about an underlying image’s production circumstances. This was the case when American documentary photographer Michael Christopher Brown, known for his visual reportage for outlets like National Geographic and the New York Times, posted a series of Midjourney-created images to Instagram in April 2023 (Terranova 2023). He described the imagery as a »post-photography AI reporting illustration experiment« and later edited the caption to begin with »THIS IMAGERY IS NOT REAL«, but many commenters noted that they had not read the caption and were initially fooled about the images’ provenance.

Watermarking could be used to denote synthetic or partially synthetic content; however, no industry-standard annotation exists, and nefarious actors could weaponize such an annotation or symbol to try to discredit non-synthetic content. Some platforms add tags, labels, and notices to content with AI-generated elements if platform employees think the posts in question have the potential to mislead. However, these additions are neither automatic nor uniformly applied.
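One machine-readable labeling option does already exist at the metadata level: the IPTC photo metadata standard’s »digital source type« vocabulary includes a term for media created wholly by generative AI. The following is a minimal sketch of how a newsroom script might apply it, assuming the freely available ExifTool command-line utility is installed; the file name is hypothetical.

```python
# A minimal sketch, assuming the ExifTool CLI is installed. It writes the
# IPTC Extension "digital source type" property that IPTC recommends for
# labeling fully AI-generated imagery. The file name below is hypothetical.
import subprocess

TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)

def label_as_ai_generated(path: str) -> None:
    """Embed a machine-readable AI-generated marker in the image's XMP metadata."""
    subprocess.run(
        [
            "exiftool",
            f"-XMP-iptcExt:DigitalSourceType={TRAINED_ALGORITHMIC_MEDIA}",
            "-overwrite_original",
            path,
        ],
        check=True,
    )

label_as_ai_generated("ai_illustration.jpg")
```

Such embedded labels survive copies of the file but generally not the screenshots and re-uploads through which much social content circulates, which is partly why visible, platform-level labels remain under discussion.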

Considerations for the audience interpretation and impact domain

One of the chief considerations related to audience interpretation and impact is whether audiences will be misled by seeing AI-generated content. The potential for being misled should be discussed in concert with aspects like visual literacy, the viewing conditions of an audience and their typical behaviors, and how suitable traditional fact-checking practices are to AI-generated visual content.

Audiences’ visual and media literacies vary widely and are affected by attributes such as age, location, education, socioeconomic status, and ability (Notley/Chambers/Park/Dezuanni 2021). The editors we spoke to were, overall, pessimistic about audiences’ abilities to detect images produced by generative AI and thought that such detection was difficult even for visual experts. The difficulty of detecting unethical production or editing practices is not unique to generative visual AI, however; it extends to photographs and other types of traditional visual media (Thomson/Angus/Dootson/Hurcombe/Smith 2022).

Regarding audience viewing conditions, although exact figures vary depending on the country under study, audiences in countries like the U.S.A. and Germany tend to consume social media content on mobile devices rather than on desktops (Broadband Search 2023). This has implications for the size of the viewing window and of the content nested within it on social media platforms. Newsrooms and fact-checking organizations that publish guides on »how to spot an AI-generated image« encourage audiences to look for irregularities in places like eyes and hands and for other inconsistencies where AI has not preserved the internal logic of the image (Devlin/Cheetham 2023). Yet, considering viewing patterns that suggest a relatively low audience attention span, the small size of the content, and the number of posts consumed in a sitting (Medvedskaya 2022), detecting such details while casually scrolling through a social media feed becomes increasingly difficult. Rapid advances in AI technology also mean that these irregularities will become less frequent over time.

The potential for audiences to see just a headline or image and keep scrolling, rather than clicking or tapping through, also limits the amount of context they are able to consume in a standard viewing environment (Fletcher/Nielsen 2018). Platforms will sometimes add a contextual note to potentially misleading content, but this process is not automatic and is troubled by the scale of information online and the speed at which it is produced (Thomson/Angus/Dootson/Hurcombe/Smith 2022). These two factors are among the persistent key threats to the work of those concerned with stemming the tide of mis/disinformation.

Fact-checking organizations sometimes suggest, as an image verification strategy, considering whether multiple angles of a purported scene exist or whether the fact-checker can request them from the source (Weikmann/Lecheler 2023). However, such techniques are frustrated by recent advances in generative visual AI. In late June 2023, Midjourney announced a new and highly discussed feature, »Zoom out«, which allows the user to generate variations of the same object, scene, or person from different focal lengths. This can create a perception of authenticity, as previous visual manipulations were often one-offs rather than part of a series of manipulations.

Beyond concerns about algorithmic bias and whether one’s audience is represented in resulting AI-created outputs, it is worthwhile to consider the effect of generic representations on how an audience perceives content and its quality. Scholars such as Thurlow, Aiello, and Portmann (2020) have investigated how stock photography is deployed in news contexts and the impact this can have on audiences. These scholars have argued that such generic visuals present a narrow, sometimes pessimistic, and almost always reductionist view of people, places, and issues. It is worthwhile considering the degree to which AI-generated visuals function in ways similar to or distinct from stock photographs and whether audiences appreciate the differences between generic and more specific types of imagery.

Conclusion and next steps

Various moral panics have accompanied each successive wave of technologies, from photography and moving images in the 1800s to aerial drone imagery and virtual reality in the 1900s (Thomson 2019). The same is true for generative visual AI and related techniques that have become far more accessible in the 2020s. We do observe considerable risks and challenges related to this technology but also reflect on the creative possibilities and potential it offers for ushering in the next generation of imaging practices. To manage the risks and guide the technology’s creative potential in responsible and ethical ways, we see a pressing need for news organizations to have clear guidelines governing its use. The editors we spoke to at leading outlets in Australia and the U.S. echoed this desire and are hungry for guidelines and policies that can inform how to responsibly use generative visual AI technologies.

While some of the editors we spoke to could articulate principles that shaped whether and how they or their staff used generative visual AI, none of their newsrooms had formal policies governing if or how this technology should be used. By June 2023, we were aware of only a single outlet, the technology-oriented brand Wired, with public-facing guidance on how its staff should and should not use generative AI. Wired states its staff will not use AI to write stories (unless the article is about AI generators, in which case it will disclose this and flag any errors), nor will it use AI to edit stories. It allows AI for writing headlines, for generating short social media posts, for inspiration when generating story ideas, and for research or analysis. On the visual front, Wired states it does not »use AI-generated images instead of stock photography« but may use AI to spark ideas; it will publish AI-generated images or video only when the generation involves »significant creative input by the [commissioned] artist and does not blatantly imitate existing work or infringe copyright«, and only then with appropriate disclosure of how generative AI was used. The Guardian followed in July by publishing a policy on the use of generative AI (Ribbans 2023) and the Associated Press, as discussed below, followed with its own policy in August (Barrett 2023).

It bears noting that responsible use of AI is not an obligation of journalists and editors alone. Too often, scholars and critics assume newsroom personnel have more »allocative control« (Murdock 1982) over strategy and resource use than they in fact possess, raising the question of whether ethics codes are addressing the wrong audience or are even moot (see, e.g., Adam/Craft/Cohen 2004; Borden 2000; Craft 2010; McManus 1997). The above-mentioned examples of CNET and Gannett are troubling instances of AI use through management fiat. Therefore, the economic contexts of news production and the tension between journalism’s democratic ideals and the economic imperatives driving its owner-vulture class must always be at the forefront of discussions about technological adoption (Pickard 2019, 2020).

Overall, generative visual AI continues to evolve dramatically in the span of mere months, with industry lagging behind in providing guidance. The Associated Press, for example, only issued guidance on using generative AI in August 2023, roughly one year after text-to-image generators like Midjourney entered open beta (Barrett 2023). Many other outlets are either unaware of generative visual AI entirely or are searching for guidance in this space. Reflecting on the various issues that exist in the production, presentation, and audience interpretation and impact domains – and situating these discussions in the concrete economic contexts of contemporary news – can help start or advance the conversation in developing guidelines for appropriate and ethical use of generative visual AI within newsrooms.

About the authors

Dr. T. J. Thomson is a senior lecturer in visual communication and digital media at RMIT University and holder of a three-year research fellowship from the Australian Research Council. T. J.’s research is united by its focus on visual communication. A majority of his research centers on the visual aspects of news and journalism and on the concerns and processes relevant to those who make, edit, and present visual news. He has broader interests in digital media, journalism studies, and visual culture and often focuses on under-represented identities, attributes, and environments in his research. T. J. is committed to not only studying visual communication phenomena but also working to increase the visibility, innovation, and quality of how research findings are presented, accessed, and understood. Contact: contact@tjthomson.com.

Dr. Ryan J. Thomas is an Associate Professor of Journalism and Media Production and Director of Graduate Studies in the Edward R. Murrow College of Communication at Washington State University, U.S.A. His research program addresses the intersection of journalism ethics and the sociology of news, focusing on journalism amid processes of change. Contact: ryan_thomas@wsu.edu.

References

Adam, G. Stuart; Craft, Stephanie; Cohen, Elliott D. (2004): Three essays on journalism and virtue. In: Journal of Mass Media Ethics, 19(3–4), pp. 247-275. DOI: 10.1080/08900523.2004.9679691

Barrett, Amanda: Standards around generative AI. In: The Associated Press, 16 August 2023. https://blog.ap.org/standards-around-generative-ai (5 September 2023)

Becker, Kim Björn (2023): New game, new rules: An investigation into editorial guidelines for dealing with artificial intelligence in the newsroom. In: Journalistik/Journalism Research, 6(2), pp. 133-152. https://journalistik.online/en/paper-en/new-game-new-rules/

Borden, Sandra L. (2000): A model for evaluating journalist resistance to business constraints. In: Journal of Mass Media Ethics, 15(3), pp. 149-166. DOI: 10.1207/S15327728JMME1503_2

Brandtzaeg, Petter Bae; Lüders, Marika; Spangenberg, Jochen; Rath-Wiggins, Linda; Følstad, Asbjørn (2016): Emerging journalistic verification practices concerning social media. In: Journalism Practice, 10(3), pp. 323-342. DOI: 10.1080/17512786.2015.1020331

Brittain, Blake: AI companies ask U.S. court to dismiss artists’ copyright lawsuit. In: Reuters, 19 April 2023. https://www.reuters.com/legal/ai-companies-ask-us-court-dismiss-artists-copyright-lawsuit-2023-04-19/ (5 September 2023)

Broadband Search (2023): Mobile vs. desktop Internet usage (latest 2023 data). https://www.broadbandsearch.net/blog/mobile-desktop-internet-usage-statistics (5 September 2023)

Cools, Hannes; Diakopoulos, Nicholas (2023): Towards Guidelines for Guidelines on the Use of Generative AI in Newsrooms. In: Medium. https://generative-ai-newsroom.com/towards-guidelines-for-guidelines-on-the-use-of-generative-ai-in-newsrooms-55b0c2c1d960

Craft, Stephanie (2010): Press Freedom and Responsibility. In: Meyers, Christopher (ed.): Journalism Ethics: A Philosophical Approach. Oxford: Oxford University Press, pp. 39-51.

Devlin, Kayleen; Cheetham, Joshua: Fake Trump arrest photos: How to spot an AI-generated image. In: BBC, 24 March 2023. https://www.bbc.com/news/world-us-canada-65069316 (5 September 2023)

Di Placido, Dani: Why did »Balenciaga Pope« go viral? In: Forbes, 27 March 2023. https://www.forbes.com/sites/danidiplacido/2023/03/27/why-did-balenciaga-pope-go-viral/?sh=10ef5f124972 (5 September 2023)

Fletcher, Richard; Nielsen, Rasmus Kleis (2018): Are people incidentally exposed to news on social media? A comparative analysis. In: New Media & Society, 20(7), pp. 2450-2468. DOI: 10.1177/1461444817724170

McManus, John (1997): Who’s responsible for journalism? In: Journal of Mass Media Ethics, 12(1), pp. 5-17. DOI: 10.1207/s15327728jmme1201_1

Medvedskaya, Elena I. (2022): Features of the attention span in adult Internet users. In: RUDN Journal of Psychology & Pedagogics, 19(2), pp. 304-319. DOI: 10.22363/2313-1683-2022-19-2-304-319

Murdock, Graham (1982): Large Corporations and the Control of the Communications Industries. In: Gurevitch, Michael; Bennett, Tony; Curran, James; Woollacott, Janet (eds.): Culture, Society, and the Media. London: Methuen, pp. 118-150.

Notley, Tanya; Chambers, Simon; Park, Sora; Dezuanni, Michael (2021): Adult Media Literacy in Australia: Attitudes, Experiences, and Needs. Sydney: Western Sydney University; Brisbane: Queensland University of Technology; Canberra: University of Canberra.

Pickard, Victor (2019): Digital Journalism and Regulation: Ownership and Control. In: Eldridge, Scott A.; Franklin, Bob (eds.): The Routledge Handbook of Developments in Digital Journalism Studies. London: Routledge, pp. 211-222.

Pickard, Victor (2020): Democracy Without Journalism? Confronting the Misinformation Society. New York: Oxford University Press.

Ribbans, Elisabeth: The Guardian’s editorial code has been updated – here’s what to expect. In: The Guardian, 27 July 2023. https://www.theguardian.com/commentisfree/2023/jul/27/guardian-observer-editorial-code-update-journalism (6 September 2023)

Sato, Mia; Roth, Emma: CNET found errors in more than half of its AI-written stories. In: The Verge, 25 January 2023. https://www.theverge.com/2023/1/25/23571082/cnet-ai-written-stories-errors-corrections-red-ventures (6 September 2023)

Stokel-Walker, Chris (2023): The problem with an unusually fashionable Pope. In: New Scientist, 257(3432), p. 13. DOI: 10.1016/S0262-4079(23)00555-9

Sun, Luhang; Wei, Mian; Sun, Yibing; Suh, Yoo Ji; Shen, Liwei; Yang, Sijia (2023): Smiling women pitching down: Auditing representational and presentational gender biases in image generative AI. In: arXiv:2305.10566. DOI: 10.48550/arXiv.2305.10566

Sutcliffe, Alistair (2016): Designing for User Experience and Engagement. In: O’Brien, Heather; Cairns, Paul (eds.): Why Engagement Matters: Cross-Disciplinary Perspectives of User Engagement in Digital Media. Cham: Springer, pp. 105-126.

Terranova, Amber: How AI imagery is shaking photojournalism. In: Blind Magazine, 26 April 2023. https://www.blind-magazine.com/stories/how-ai-imagery-is-shaking-photojournalism/ (5 September 2023)

Thomas, Ryan J.; Thomson, T. J. (2023): What does a journalist look like? Visualizing journalistic roles through AI. In: Digital Journalism. Advance online publication. DOI: 10.1080/21670811.2023.2229883

Thomson, T. J. (2019): To See and Be Seen: The Environments, Interactions, and Identities Behind News Images. London: Rowman & Littlefield.

Thomson, T. J.; Angus, Daniel; Dootson, Paula; Hurcombe, Edward; Smith, Adam (2022): Visual mis/disinformation in journalism and public communications: Current verification practices, challenges, and future opportunities. In: Journalism Practice, 16(5), pp. 938-962. DOI: 10.1080/17512786.2020.1832139

Thurlow, Crispin; Aiello, Giorgia; Portmann, Lara (2020): Visualizing teens and technology: A social semiotic analysis of stock photography and news media imagery. In: New Media & Society, 22(3), pp. 528-549. DOI: 10.1177/1461444819867318

Vincent, James: The swagged-out Pope is an AI fake – and an early glimpse of a new reality. In: The Verge, 27 March 2023. https://www.theverge.com/2023/3/27/23657927/ai-pope-image-fake-midjourney-computer-generated-aesthetic (6 September 2023)

Weikmann, Teresa; Lecheler, Sophie (2023): Cutting through the hype: Understanding the implications of deepfakes for the fact-checking actor-network. In: Digital Journalism. Advance online publication. DOI: 10.1080/21670811.2023.2194665

Wu, Daniel: Gannett halts AI-written sports recaps after readers mocked the stories. In: Washington Post, 31 August 2023. https://www.washingtonpost.com/nation/2023/08/31/gannett-ai-written-stories-high-school-sports/ (6 September 2023)


About this article

Copyright

This article is distributed under the Creative Commons Attribution 4.0 International license (CC BY 4.0). You are free to share and redistribute the material in any medium or format. The licensor cannot revoke these freedoms as long as you follow the license terms. You must, however, give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. More information at https://creativecommons.org/licenses/by/4.0/deed.en.

Citation

T. J. Thomson, Ryan J. Thomas: Generative visual AI in newsrooms. Considerations related to production, presentation, and audience interpretation and impact. In: Journalism Research, Vol. 6 (3_4), 2023, pp. 318-328. DOI: 10.1453/2569-152X-3_42023-13639-en

ISSN

2569-152X

DOI

https://doi.org/10.1453/2569-152X-3_42023-13639-en

First published online

December 2023