This talk explores some of my favorite use cases for artificial intelligence/machine learning in journalism and touches on what’s coming next. This post contains the video and a full transcript of the talk.
Presented on October 28, 2019 at TrondheimDC.
Good morning Trondheim! Thank you all for joining me so early in the day. My name is Carolyn and I’m a frontend developer based in Germany and I’m also a Mozilla TechSpeaker. I’m currently working at a tiny machine learning startup called Meeshkan as an open source engineer.
Now you may be thinking - perfect! She works at a machine learning startup and this talk is about that too… well, I guess this is a good time for a disclaimer.
At Meeshkan, I’m building and maintaining an open-source library called Unmock that helps you fuzz test REST API calls. It’s cool in its own way and if you’re into testing then I’d love to talk more later… but I’m still a newbie when it comes to machine learning and actually building algorithms. Which means that this talk won’t go deep into technical implementations - to be honest, it won’t be very technical at all. I should also probably warn that this talk is 30 minutes and not 45 like the schedule says.
So if you want to leave, no hard feelings - but if you stick around I can guarantee you this: This talk will be part love letter, part cautionary tale and if you haven’t thought about this topic before, it’ll change the way you approach and consume news.
Anyway, let’s get back on track! Today I’m here to talk to you about the effect that artificial intelligence ("AI") and machine learning have had on modern journalism and reporting. And I’m particularly interested in this topic because before getting into tech I actually used to be a journalist.
Transitioning from print journalism to software development made me realize all of the possibilities that technology could unlock - especially in how we think about our news cycle. I gravitated towards AI and machine learning because what I found was just really, really fascinating.
It’s important to understand though that, even before AI and machine learning became a modern trend, data has always played a critical role in journalism.
Catherine Gicheru, an International Center for Journalists Knight Fellow, put it well. She said that, "Data can be used to provide deeper insights into what is happening around us and how it might affect us... Combined with traditional reporting techniques, data can help you tell stories in more compelling and innovative ways and give citizens actionable information."
She also mentioned that, "Data can help journalists speak truth to power and challenge misinformation... Data means there’s less guesswork about what the facts are."
This is a large part of why AI and machine learning practices are slowly being explored, assessed and introduced in newsrooms. This technology can absorb huge data sets, analyse that information and identify trends. It can then also make decisions on what to report and put those trends into context.
Many journalists, however, fear this technology. And understandably so, they don’t want their jobs to become obsolete. But instead of fearing it, I believe that journalists should be learning and understanding it. And it’s not just me.
According to Maria Ronderos, the Director of Open Society Foundation’s Program on Independent Journalism, these technologies can actually empower journalists. They enable journalists to become more thorough and data-driven, and allow them to better report on the increasingly globalised and information-rich world we live in.
She says, "Intelligent machines can turbo-power journalists' reporting, creativity and ability to engage audiences... Following predictable data patterns and programmed to 'learn' variations in these patterns over time, an algorithm can help reporters arrange, sort and produce content at a speed never thought possible."
Human journalists will continue to be necessary. They are the ones who, at least in theory, have a specific goal in mind and should be asking relevant questions about the data. Plus most journalists in the world do not have access to a team of programmers and data scientists, but that’s a different issue.
But the thing is, through software and natural language processing (a common subfield of AI), computers could take over tasks that - while important - seem pretty mundane to humans. Like analysing these massive data sets, fact-checking, organizing tips (story ideas solicited from the public), making rough cuts of videos and most of the other tedious tasks that I used to do when I was an unpaid intern at my local paper.
To put this all into context, I want to show some of my favorite use cases for how this technology is being implemented in bigger newsrooms. I’ve tried to sort them a bit by the type of technology, but it’s pretty much just a giant list.
Let’s start with bots. The Washington Post, for example, has a robot reporting program called Heliograf. In its first year, it produced around 850 articles and earned The Post an award for “Excellence in Use of Bots” for its work on the 2016 US election coverage. Heliograf has covered topics ranging from the Rio Olympics, where it created around 300 short reports and alerts, to local high school football games, and it continues to dabble in Election Day reporting for congressional and other races. Along with reporting, Heliograf can write its own tweets for topics it reports on and also alert reporters when it detects trends in finance and big data.
Forbes took a slightly different approach. Last year, Forbes launched a new site that was powered by a content management system they named Bertie - after their founder B.C. Forbes. As an AI publishing platform, Bertie is designed specifically for their in-house newsroom and partners. As you can see in this demo, Bertie recommends ways to make headlines more compelling, suggests relevant imagery to accompany stories, assesses the reading complexity, provides short tweetable content summaries and provides real-time trending hashtags and topics to cover.
The LA Times also does a lot of compelling work with AI. Take Quakebot, for instance. Back in 2014, a huge earthquake hit the city and the LA Times was the first to report on it. That’s basically because, when the earthquake happened, the reporter woke up, went to his computer and published the article in 3 minutes. The story was already there waiting.
The way it works is that whenever an alert comes in from the U.S. Geological Survey about an earthquake above a certain size threshold, Quakebot is programmed to extract the relevant data from the report and plug it into a pre-written template. The story goes into their content management system, where it’ll be reviewed and published by a human editor. While the code isn’t open source, the creator of Quakebot wrote this gist about how it works (including code samples).
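As a rough illustration of the flow described above (threshold check, then filling a pre-written template, then handing off to a human), a minimal Python sketch might look like the following. This is not Quakebot's actual code: the threshold value, the template wording, the field names and the sample alert are all made up for the example.

```python
from typing import Optional

# Hypothetical threshold and template, chosen only for illustration.
MIN_MAGNITUDE = 3.0
TEMPLATE = (
    "A magnitude {magnitude:.1f} earthquake was reported {distance_km} km "
    "from {place} at {time}, according to the U.S. Geological Survey."
)


def draft_story(alert: dict) -> Optional[str]:
    """Return a draft for a human editor, or None if the quake is too small."""
    if alert["magnitude"] < MIN_MAGNITUDE:
        return None
    # Plug the fields from the alert into the pre-written template.
    return TEMPLATE.format(**alert)


# Invented sample alert; a real one would come from the USGS notification.
sample_alert = {
    "magnitude": 4.4,
    "distance_km": 9,
    "place": "Exampleville",
    "time": "6:25 a.m.",
}

draft = draft_story(sample_alert)
if draft is not None:
    print("DRAFT FOR REVIEW:", draft)
```

The function only produces a draft. Publishing stays a separate, human step, which matches the review stage described above.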
The LA Times also has a robot that collects information on every homicide committed in Los Angeles. In its reports, it includes tons of data like the victim’s gender and race, cause of death, officer involvement, neighborhood and year of death.
The last of the bots comes from The Guardian Australia. They have an automated system called ReporterMate, which published its first article earlier this year. It works basically the same as the others but what makes it special is that it's open-source!
But bots aren’t the only thing that AI can do for journalism. It can also dissect data sets, predict trends and automate tasks. We’ve seen this with ProPublica and their analysis of what various members of the US Congress talk about. They did this by taking thousands of press releases over the course of two years. Then they trained a computer model to extract what phrases each Congress member uses most frequently.
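To get a feel for the idea of "most frequent phrases per member", here is a small Python sketch. It swaps in plain word-pair counting for the trained model ProPublica used, so it is a simplification and not their method. The member names and press release text are invented.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def top_phrases(releases: List[str], n: int = 2, k: int = 3) -> List[Tuple[str, int]]:
    """Count every n-word phrase across the releases and return the k most common."""
    counts: Counter = Counter()
    for text in releases:
        words = re.findall(r"[a-z']+", text.lower())
        # Slide a window of n words across the text.
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)


# Invented sample data, keyed by a hypothetical member name.
releases_by_member: Dict[str, List[str]] = {
    "Member A": ["We must protect clean water.", "Clean water is a right."],
    "Member B": ["Small business owners need relief.", "I met small business owners today."],
}

for member, releases in releases_by_member.items():
    print(member, top_phrases(releases, n=2, k=1))
```

Raw counts like these are dominated by common filler phrases on real data, which is one reason a real analysis would compare each member against everyone else instead of counting in isolation.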
BuzzFeed also trained a computer but for a very different purpose. They focused on finding and tracking secret spy planes. The computer used a machine-learning algorithm to sift for planes with flight patterns that resembled those operated by the FBI and the Department of Homeland Security. This technology allowed them to report on how the US Marshals hunted drug cartel kingpins in Mexico, how a military contractor that tracks terrorists in Africa is also flying over US cities and topics around aerial surveillance in general.
On the other hand, the New York Times uses these technologies for moderating comments on their articles. Back in 2017, only 10% of the Times’ articles were open to comments. Even with that small percentage, a desk of moderators had to examine around 11,000 comments each day. So they’re turning to AI to help automate that.
They’re using a tool called Perspective (from Google parent Alphabet’s tech incubator) to evaluate the comments at scale. The Perspective API uses machine learning models to score the perceived impact a comment might have on a conversation - with the first model identifying whether a comment could be perceived as “toxic” to a conversation (aka harmful or abusive). Previously they also implemented The Coral Project’s Talk tool to tackle these toxic comments. This utilizes the Perspective API to create a fully customizable moderation UI tool specifically built for a newsroom use case.
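For a sense of what scoring a comment looks like in practice, here is a hedged Python sketch. The request and response shapes follow my understanding of the public Perspective API documentation (a `comment.text` field, a `requestedAttributes` map, and a `summaryScore.value` in the response), so check them against the current docs. The routing thresholds, the `route` helper and the sample response are hypothetical and are not how the Times or Talk actually triage comments. No network call is made here.

```python
def build_request(comment_text: str) -> dict:
    """Build a request body asking for a TOXICITY score (shape assumed from the docs)."""
    return {
        "comment": {"text": comment_text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity(response: dict) -> float:
    """Pull the overall toxicity score (0.0 to 1.0) out of a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def route(score: float, approve_below: float = 0.2, reject_above: float = 0.9) -> str:
    """Hypothetical triage: only the uncertain middle goes to a human moderator."""
    if score < approve_below:
        return "approve"
    if score > reject_above:
        return "reject"
    return "human review"


# Invented response in the documented shape, standing in for a real API reply.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.93, "type": "PROBABILITY"}}
    }
}

print(route(toxicity(sample_response)))
```

The point of the middle band is that the model shrinks the moderators' queue without removing them from the decision.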
Both the New York Times and Wired Magazine use London-based Trint as their transcription service. Trint’s big selling point is that they use voice recognition to transcribe interviews in multiple languages.
And something in between and maybe the one that I’m the most excited about is a concept from Al Jazeera. Last year at Al Jazeera’s Future of Media Leaders’ Summit there were discussions around robot reporters deploying from drones in war zones. The idea is that the drone can fly into an area that is considered dangerous ground. The drone then deploys a robot that works for Al Jazeera.
In this example, they show the robot evaluating if it’s a bad situation and dodging an attack from a sniper. They note that human reporters aren’t typically trained for these types of environments, not to mention the ethics of sending a civilian into these hostile spaces.
But these are only a few examples. Many other larger news services like Bloomberg, Reuters, Associated Press, Yahoo and more are utilizing these technologies as well.
There are also organizations whose work is dedicated to merging these two fields, media and technology, and who are focusing almost entirely on artificial intelligence in journalism. To name a few...
Funded by a grant from Knight Foundation, Quartz AI Studio helps journalists to use machine learning in their reporting. What I like about their work is that they focus on making these practices more accessible to smaller media organizations - think local and regional papers with limited staff or individual freelance journalists.
There are a couple of areas where Google focused on this topic. I’ll highlight two. First is Facets. It’s a machine learning data visualisation tool from Google People + AI Research. It’s open-source, so you can play with data and create a visualisation of the information being presented.
And Google’s largest effort is focused in the Google News Initiative and more specifically Journalism AI, which they launched at the end of 2018. This initiative is in partnership with Polis (the international journalism think-tank at London School of Economics and Political Science). It aims to help the news industry use AI in more innovative ways through research and training.
Not always AI-centric, but Mozilla has also always pushed for a better relationship between tech and the news industry. They have a history of partnering with the Knight Foundation, New York Times, Washington Post and other global news organizations to drive open innovation in news.
They also have specific projects like OpenNews, which connects a network of developers, designers, journalists and editors to collaborate on open technologies and processes within journalism. This was incubated at Mozilla initially. There’s the Mozilla Information Trust Initiative (MITI), a collection of comprehensive efforts to keep the Internet credible and healthy - like developing products, research, communities and partnerships to fight misinformation online. And every year at Mozilla Festival, there are full days’ worth of programming dedicated to journalism tech.
So everything I just told you about is cool and great, sure… but I don’t know about you, and maybe it’s because I used to be a journalist, but it makes me question the level of responsibility we should expect from the human journalists who these technologies are assisting.
Every journalism student I know was required to read Bill Kovach and Tom Rosenstiel’s The Elements of Journalism. In this book, they claim that, "Journalism should serve as an independent monitor of power" and "offer a voice to the voiceless." They argue that a journalist's first obligation is to the truth because when citizens have reliable access to information they can trust, they make better decisions.
However, there’s a disconnect between these principles and reality when technology is introduced. Which is interesting because the internet and journalism actually overlap on many of these foundational elements. But for this collaboration to be effective, there must be a measure of accountability. And at the moment, particularly with regard to AI and machine learning, there isn’t.
Earlier in this talk, I mentioned how transitioning from print journalism to software development made me realize just how much technology can change how we think about our news cycle. The way our society consumes news is rapidly changing, but one thing remains the same: All journalism is biased.
However, you used to be able to pick up a newspaper and know what you were getting. The Guardian, left-leaning. The Daily Mail, right-leaning. You were aware of these biases and chose your preferred paper despite that. Today, people rarely read a physical newspaper. Instead, they rely on tech products like social media or news aggregators. But many aren’t tech-savvy enough to decipher how these products are choosing the news they see. On top of that, there’s a lack of communication between the tech and media industries.
As gatekeepers to the news, I believe that the journalists creating the content should be at the heart of this collaboration. And we should be able to rely on them to make ethical decisions around data.
Traditionally in journalism, there are many layers to ethical decision making. There are long-held industry values for assessing whether a choice is ethical. Making the decision is guided by strictly set editorial standards. Then later, these decisions are enforced by a code of ethics - usually internal or maybe one like the Society of Professional Journalists Code of Ethics.
With machine learning and AI creeping into the news cycle, these ethical foundations require a revisit. As far as I know, there is currently no publicly available code of conduct that contains principles on the ethical use of these technologies in journalism. There are also limited resources that enable journalists to become proficient at machine learning processes or even just how to ask questions about the information collected by data models.
Storyteller and technologist Latoya Peterson once wrote, "Journalists need to develop a fluency in AI before it disrupts both our newsrooms and our society." She says that, to develop fluency, the journalist needs to have a solid understanding of the infrastructure that makes artificial intelligence work - both the datasets that feed the systems and knowledge of how that data is being collected, used and potentially compromised, influencing the results. They should be asking questions like “What information is in the training data for this AI model?” So much of machine learning and artificial intelligence is about the framing. If you ask better questions and set better parameters, you receive a better result.
Latoya also mentions a need to understand the larger implications of biased systems. Because when journalists don’t understand the basics of how AI works, they are prone to missing the larger picture or over-sensationalising a story. She argues that one doesn’t need to become a programmer or gain proficiency in a programming language to report on AI or use AI tooling. She says, rather, that just looking at how developers approach solving problems will greatly aid the understanding of how these systems are built and designed.
As far as educational programs in this area go, there are university degrees that are incorporating both computer science and journalism - for example, at Columbia University in New York City.
Google News Initiative also has a training center with over 40 lessons designed specifically for journalists on Google products and tools. But even this is pretty limited and many journalists either don’t have access to this education or don’t know that it exists.
Then there’s the ethical question of whether or not it’s on the journalist to expose their data source and the algorithm they use. Which I’m a fan of because I’m into open source, but again - not always an available option.
Before we accidentally spiral down an ethics rabbit hole, trust me it’s easy to do, I want to touch on another question… What does artificial intelligence mean for the future of journalists?
In the true spirit of technology, the future is totally open. Many of the initiatives I mentioned earlier are also tackling this issue more head-on.
In November 2018, it was decided that 60% of the Mozilla Foundation’s internet health efforts will focus on what they call “better machine decision making” with some of those efforts overlapping into the media field.
Journalism AI from Google News Initiative plans to publish a global survey before the end of 2019 about how the media is currently using - and could further benefit from - these technologies. In July, they released some of the initial findings in a blog post on Polis that I’ve linked here.
But one convincing statistic I’ll leave you with is that the Associated Press estimates that AI helps to free up about 20% of reporters’ time - giving them more time to concentrate on story-telling rather than fact-checking and research. This percentage is only going to increase and I choose to be (cautiously) optimistic that this will enable us to craft higher-quality journalism.
And very quickly before I get off the stage, I just want to say: Think twice when reading the news. Because it’s good to be aware that you might be reading a story generated by a bot or algorithm.
Also if you’re in a position to use AI and ML to convey a message, make sure that you ask questions and that you understand where that data is coming from and how it’s being prioritized.
And finally, watch out for more cool things in this space coming soon.