Stuff blocks OpenAI

Photo: Getty Images
New Zealand's biggest publisher of news this week joined big global names in blocking OpenAI from using its content to power the generative artificial intelligence tool ChatGPT. Stuff says its content is being harvested without permission for AI products that are already turning out low-quality results. Mediawatch asks Stuff if keeping AI at arm's length is a good move.

“It’s not that we're stealing your content. We want to just be a neutral librarian helping you find the right book - but not having written the book,” Google’s chief technology advocate Michael Jones told Mediawatch in 2012.

Back then, Google itself was only eight years old, but already news publishers were worried about how many people were finding their news through it. 

Critics complained its online search had created a ‘walled garden’ around the online content it had indexed so well. 

"I cannot imagine us saying; 'Get the news from Google - and we'll tell you what the news was'. It feels very awkward," said Jones at Project [R]evolution, a conference at AUT all about the growing impact of online digital technology and social media.

Jones died two years ago, shortly before the launch of the first generative AI applications. Among those now creating a version of the news for users is Google's own AI service, Bard.

Last month, the New York Times reported Google was testing an AI tool called Genesis, which uses AI technology to write news articles. Google reportedly pitched this to US news outlets as an aid for journalists, rather than a replacement for them. 

Earlier this month at another Auckland University of Technology event - the AI + Communications Symposium - former journalist and PR strategist Catherine Arrow warned Google's walled garden could become something much more restricted.

"Search engines created a walled garden where we can pick and choose what they've decided are the best blooms. As we get into search generative experience (SGE), we find ourselves only shown the flowers that they decide that we can look at. There's a real danger there," she said. 

AI services like Google's Bard, Microsoft's Bing Chat and OpenAI's ChatGPT respond to simple prompts from users and then summarise information scraped from the internet - including news produced in the first place by publishers.

But at the AUT symposium this month, senior lecturer in journalism and media Dr Merja Myllylahti showed they were not that good at sourcing news from New Zealand. She found Google Bard most often linked to the wrong sources and even to random stories.

But some makers of news are quite happy for their news to be off-limits to AI.

CNN, Reuters, the Washington Post, Bloomberg, The New York Times and The Guardian have all recently moved to block OpenAI - and other AI apps which punch through their online paywalls - from harvesting their content.

This week, New Zealand's biggest publisher of news - Stuff - joined them. 
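In practice, this kind of blocking usually happens at the crawler level. OpenAI publishes the user agent its training crawler identifies itself with - GPTBot - and a publisher can refuse it in the robots.txt file at the root of its site. A minimal sketch of what such rules might look like is below; the exact directives any individual publisher uses are not public, so this is illustrative only.

```
# Illustrative robots.txt rules - not any publisher's actual file.
# GPTBot is the user agent OpenAI documents for its web crawler.
User-agent: GPTBot
Disallow: /

# Other crawlers whose archives have been used for AI training can be
# refused the same way - Common Crawl's CCBot is shown as an example.
User-agent: CCBot
Disallow: /
```

Compliance is voluntary: robots.txt is a convention that well-behaved crawlers respect, not a technical barrier, which is one reason publishers are also pressing for licensing deals.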

"Content produced by journalists here and around the world is the cornerstone of what makes these generative AI tools valuable to the user. Without it, the models would be left to train on a sea of dross, misinformation and unverified information on the internet," Stuff chief executive Laura Maxwell said in a statement. 

"That will become information that has itself been already generated by AI. There is a risk the whole thing will end up eating itself," she said. 

A pretty bleak prospect. But is keeping those generative AI apps at arm's length the right option? 

AI tools can also be handy for gathering and publishing news digitally. 

The BBC is using AI to adapt content quickly to take account of users' locations within the UK.

Local subscriber service BusinessDesk, for example, already creates articles in seconds with ChatGPT from basic data supplied by the NZX.

"Articles which used to take us a minimum of 30 minutes now takes less than 30 seconds," BusinessDesk publisher Matt Martell told the AUT’s AI+ Communication Symposium. 

Martell also said AI products had breached media copyright "so obviously" that their makers would have to come to an arrangement with news publishers in the end. 

But Microsoft, the maker of the AI-powered Bing Chat, recently announced it might effectively indemnify its clients using its AI services from the risk of copyright infringement. 

Last month, Stuff owner Sinead Boucher told an Asia Pacific summit of the International News Media Association (INMA) generative AI could become "degenerative" for the media and society. 

"We have to make sure that it generates value for journalism. Because if we don't get it right, in this current wave of disruption, I think that wave is going to wash right over us," she told the INMA. 

"The biggest issue is that these global tech companies have created models that can hoover up all of that content, all of their IP," Boucher told Mediawatch. 

"That's great for the tech companies because it's high quality fact-checked work that goes into training those models. But it's terrible for the creators of that content, because the essence of their work has been extracted. And none of that value flows back at all, to the original creators," she said. 

Google has crawled the web for years to index all the information it can for people to search. So what's the difference?

"There is no licensing of it, there's no permission given. We've heard the argument from some that it's already out there on the internet for people to search it and read it and do whatever they want. But (AI) technology allows this to happen in the blink of an eye on an industrial scale - and then commercial products being created out of that," Boucher said. 

One of the world's biggest news agencies - AP - has already done a deal with OpenAI for the right to use its world news, which costs a fortune to create and curate for its own media customers.

"We're creating something that has high value to these companies. Of course we want it to be licensed or paid for in some way," Boucher said. 

“I think the news industry has learned  - or needs to make sure it does learn - the lessons of the era of the rise of search and of social media. We didn't seek value for our content. And in a lot of cases we actually adapted our whole business model to try and serve what the platforms needed. We don't want to repeat those mistakes,” she said. 

“We did not see early enough that the platforms would suck up all the value for that - and they didn’t tell us they were collecting all the data from our users to create their products.

"There's also control of other aspects of the digital news ecosystem and advertising technology. They've become the gateway to the internet, but they’re still a private enterprise themselves. They're not a public service.

"If these models don't have access to this journalism - and quality academic content - then they're training increasingly on AI-generated dross .The whole model probably starts to eat itself and it just becomes a sea of slime." 

‘Pink-slime’ journalism is poor-quality or fake content purporting to be real news. Sometimes its intent is to misinform readers, but most often it is created to attract cheap but plentiful online 'programmatic' advertising placed by automated services. 

"Nobody wants that to happen. But it's not our job to fix it for the tech companies. We just need fair value for the content we've created that will help them fix the models," Boucher told Mediawatch. 

Would Stuff be taking this stance on generative AI if Boucher hadn’t just put up its first paywalls after 25 years?

"That's totally irrelevant. We're at the very early days of this technology. And I feel like we are having to make profound decisions about how to treat content and how to participate in it without possibly knowing how this is all going to develop," she said. 

The Fair Digital News Bargaining Bill, recently introduced to Parliament by Broadcasting and Media Minister Willie Jackson, is designed to get the likes of Google and Facebook to pay news media for the news that circulates on their platforms.

If it becomes law, it may also apply to AI services which scan local news published online. 

"That has been the intention of those behind the Bill, It refers to how news content is 'used or made available for use'. In Australia, the legislation was successful in addressing the bargaining power imbalance, which is what the legislation here aims to do," Boucher said. 

But a change of government might kill the Bill. 

The National Party’s broadcasting and media spokesperson Melissa Lee has said she does not favour forcing companies to make deals - and has even labelled it "a tax".

"We support the intent of the Bill to provide that level playing field (so) we can have a commercial discussion. I hope the incoming government, whoever they are, get to learn about how things are working and the intent behind this," Boucher told Mediawatch. 

"It is not as a tax on tech companies or a way of breaking the internet. It's simply a way of ... arranging deals that sort of give us some security over payment and use of our content now in different ways."

By Colin Peacock