Using artificial intelligence (AI) to write content and news reports is nothing new at this stage. The Associated Press began publishing AI-generated financial reports as far back as 2014, and since then, outlets, including the Washington Post and Reuters, have developed their own AI writing technology.
The technology was first used mainly to create templated copy, such as sports reports. The AI can simply grab data such as team and player names, times, dates, and scores from feeds, then augment that data with natural language generation, adding color and flavor to turn it into a readable article.
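To make the idea of templated copy concrete, here is a minimal sketch in Python. It is not how any particular newsroom system works; the feed record, its field names, the team names, and the helper functions `choose_verb` and `write_report` are all hypothetical, invented purely for illustration. It shows structured data being slotted into a sentence, with the wording varied according to the score.

```python
# Hypothetical feed record; the field names and values are assumptions.
game = {
    "home_team": "Rivertown Rovers",
    "away_team": "Lakeside United",
    "home_score": 3,
    "away_score": 1,
    "date": "2023-05-06",
}


def choose_verb(margin: int) -> str:
    """Pick a phrase that adds 'color' based on the score margin."""
    if margin == 0:
        return "drew with"
    if margin == 1:
        return "edged past"
    if margin <= 3:
        return "beat"
    return "cruised past"


def write_report(game: dict) -> str:
    """Fill a sentence template from one structured feed record."""
    home = (game["home_team"], game["home_score"])
    away = (game["away_team"], game["away_score"])
    # Put the higher-scoring side first; a draw keeps the home side first.
    first, second = (home, away) if home[1] >= away[1] else (away, home)
    verb = choose_verb(first[1] - second[1])
    return (
        f"{first[0]} {verb} {second[0]} {first[1]}-{second[1]} "
        f"on {game['date']}."
    )


print(write_report(game))
```

Real systems of this kind would presumably draw on many more templates and data fields, but the principle is the same: the facts come from the feed, and the software only chooses the words around them.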
Just a few short years ago, this technology was entirely proprietary and only available to the media corporations that could afford to buy and run it. Today, anyone can use AI to generate an article in seconds, and with just a little bit of technical knowledge, can set up a “content farm” designed to churn out and publish online content 24/7.
Just recently, an investigation by NewsGuard uncovered nearly 50 websites publishing content entirely created by generative AI. NewsGuard describes the articles as "low quality" and "clickbait." Some appear to simply be designed to generate money by showing advertising and affiliate links to readers. Others may have a more nefarious purpose, such as spreading disinformation, conspiracy theories, or propaganda.
So, let’s take a look at some of the threats posed by this new breed of automated content farm and explore some of the steps that can be taken to protect ourselves from them.
Disinformation and Propaganda
Even without robots churning out content day and night, there’s a lot of bad information on the internet. Given the speed at which AI can generate articles, it’s likely that this is only going to increase. The real danger comes when this information is used to maliciously deceive or advance a false narrative. Conspiracy theories exploded during the global Covid-19 pandemic, causing confusion and alarm among an already-scared general public. We’ve also seen a huge increase in the emergence of “deepfakes”: convincing AI-generated images or videos of people saying or doing things that they never did. In combination, these tools can be used by those who want to push a political or social agenda to deceive us in ways that could potentially be very damaging.
Many of the websites highlighted by NewsGuard obfuscate their ownership as well as the details of those who have editorial control. This can make it difficult to determine when agendas might be in play, as well as to establish accountability for defamation, disseminating dangerous information, or malicious falsehoods.
Some of the content farms that have been identified so far appear to exist solely to rewrite and re-publish articles produced by mainstream media outlets, such as CNN. It also has to be noted that the training data AI models use to learn how to create these copycat articles is often taken from copyrighted works created by writers and journalists.
This can make life difficult for those who rely on writing and content creation of all sorts, including artists and musicians, to make a living. It has already led to the creation of the Human Artistry Campaign, which aims to protect the work of human songwriters and musicians from being plagiarized by AI. As already noted, many of these content farms are effectively anonymous, making it very difficult to find and take action against the people who are using AI to infringe copyright. As things stand, this can be considered a legal "grey area": there is nothing to stop AI from creating works that are “inspired” by human works, and society has yet to establish how far this will be tolerated in the long run.
The Spread of Clickbait
Many of the AI-farmed articles that have been found are clearly just there to put adverts in front of audiences. By telling the AI to include keywords, it’s hoped that the articles will rank highly on search engines and attract an audience. AI can be instructed to give the articles intriguing, shocking, or frightening headlines that will encourage users to click on them.
The danger here is that it makes it more difficult for us to get genuine, valuable information. Distributing advertisements via the internet obviously isn’t a crime – it funds a huge amount of the media we consume and the services we use online. But the speed and consistency with which AI content can be churned out create a risk that search results will be muddied and our ability to find real content will be diluted. It’s already far cheaper to create AI content than it is to create human content, and the output of these farms can be scaled almost infinitely at very little cost. This leads to a homogenization of content and makes it more difficult for us to get unique perspectives and valuable, in-depth investigative journalism.
The Consequences of Biased Data
Bias is an ever-present danger when working with AI. But when it is present in the training data used to power algorithms creating farmed content at scale, it could have particularly insidious consequences. An AI system is only as good as the data it’s trained on, and the old computing adage of “garbage in, garbage out” is magnified when applied to smart, thinking computers producing content at scale. This means that any bias contained in the training data will infect the generated content, perpetuating the misinformation or prejudice it carries.
For example, if a badly constructed survey that forms part of an AI’s training data over-represents the views of one segment of society while minimizing or under-representing the views of another, the AI-generated content will reflect this same bias. This can be particularly harmful if those whose views are marginalized are vulnerable or a minority. We’ve already seen that operators of these content farms appear to exercise little oversight over their output, so it’s possible that dissemination of this type of biased, prejudiced, or harmful output could go unnoticed.
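A quick piece of arithmetic shows how much a skewed survey can distort the picture. The sketch below is purely illustrative: the two groups, their sizes, and their levels of support are made-up numbers, and `overall_support` is a hypothetical helper, not part of any real tool. It compares the level of support in a population with what a survey would report if it drew 90% of its responses from one group.

```python
def overall_support(group_share: dict, group_support: dict) -> float:
    """Weighted average of support, using each group's share as its weight."""
    return sum(group_share[g] * group_support[g] for g in group_share)


# Made-up figures: how strongly each group supports some proposal.
support = {"group_a": 0.80, "group_b": 0.20}

# The real population is split evenly between the two groups.
population = {"group_a": 0.50, "group_b": 0.50}

# The badly constructed survey draws 90% of its responses from group A.
survey = {"group_a": 0.90, "group_b": 0.10}

true_level = overall_support(population, support)
surveyed_level = overall_support(survey, support)

print(f"True support:     {true_level:.0%}")
print(f"Surveyed support: {surveyed_level:.0%}")
```

With these invented numbers, the real level of support is 50%, but the skewed survey reports 74%. A model trained on the survey would have no way of knowing that a quarter of that figure is an artifact of who was asked.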
In the end, biased AI output is bad for society as it perpetuates inequality and creates division. Amplifying this by publishing it across thousands of articles churned out day and night is unlikely to lead anywhere good.
What Can Be Done?
No one would claim that there's never an agenda behind human-authored journalism or that human-powered media outlets never make mistakes. But most countries and jurisdictions have safeguards in place, such as guidelines stipulating that news reporting and opinion must be kept apart, and laws regarding libel, slander, and editorial accountability.
Regulators and legislators need to ensure that these frameworks are still fit for purpose in an age where content can be created and distributed autonomously at a massive scale.
Responsibility for mitigating harm also lies with the tech companies that create AI tools. They must take steps to reduce the impact of bias as much as possible and build in systems that support accuracy, fact-checking, and recognition of copyright.
And as individuals, we need to take steps to protect ourselves, too. An important skill we all need in the age of AI is critical thinking. This simply means being able to evaluate the information we come across and make a judgment on its accuracy, truthfulness, and value, particularly if we aren’t sure whether it was created by a human or a machine. Education certainly has a part to play here, and an awareness that not everything we read has been written with our best interests at heart should be instilled at a young age.
Altogether, addressing the dangers posed by large-scale, autonomous, and often anonymous content distributors is likely to require smart regulators, responsible businesses, and a well-informed general public. This will ensure we are able to continue to enjoy the benefits of ethical, responsible AI while mitigating the harm that can be done by those looking to make a quick buck or mislead us.