Fake news, defined by the New York Times as “a made-up story with an intention to deceive” 1, often for a secondary gain, is arguably one of the most serious challenges facing the news industry today. In a December Pew Research poll, 64% of US adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events 2.
The goal of the Fake News Challenge is to explore how artificial intelligence technologies, particularly machine learning and natural language processing, might be leveraged to combat the fake news problem. We believe that these AI technologies hold promise for significantly automating parts of the procedure human fact checkers use today to determine if a story is real or a hoax.
Assessing the veracity of a news story is a complex and cumbersome task, even for trained experts 3. Fortunately, the process can be broken down into steps or stages. A helpful first step towards identifying fake news is to understand what other news organizations are saying about the topic. We believe automating this process, called Stance Detection, could serve as a useful building block in an AI-assisted fact-checking pipeline. So stage #1 of the Fake News Challenge (FNC-1) focuses on the task of Stance Detection.
Stance Detection involves estimating the relative perspective (or stance) of two pieces of text relative to a topic, claim or issue. The version of Stance Detection we have selected for FNC-1 extends the work of Ferreira & Vlachos 4. For FNC-1 we have chosen the task of estimating the stance of a body text from a news article relative to a headline. Specifically, the body text may agree, disagree, discuss or be unrelated to the headline.
“Robert Plant Ripped up $800M Led Zeppelin Reunion Contract”
Example snippets from body texts and correct classifications
“… Led Zeppelin’s Robert Plant turned down £500 MILLION to reform supergroup. …”
Correct classification: Agree
“… No, Robert Plant did not rip up an $800 million deal to get Led Zeppelin back together. …”
Correct classification: Disagree
“… Robert Plant reportedly tore up an $800 million Led Zeppelin reunion deal. …”
Correct classification: Discusses
“… Richard Branson’s Virgin Galactic is set to launch SpaceShipTwo today. …”
Correct classification: Unrelated
Data: The dataset and a brief description of the data are provided at the FNC-1 GitHub.
Source: The data is derived from the Emergent Dataset created by Craig Silverman.
Teams will be evaluated based on a weighted, two-level scoring system:
Level 1: Classify headline and body text as related or unrelated (25% score weighting)
Level 2: Classify related pairs as agrees, disagrees, or discusses (75% score weighting)
Rationale: The related/unrelated classification task is expected to be much easier and is less relevant for detecting fake news, so it is given less weight in the evaluation metric. The Stance Detection task (classify as agrees, disagrees or discusses) is both more difficult and more relevant to fake news detection, so it is given much more weight in the evaluation metric.
Concretely, if a [headline, body text] pair in the test set has the target label unrelated, a team’s evaluation score will be incremented by 0.25 if it labels the pair as unrelated.
If the [headline, body text] test pair is related, a team’s score will be incremented by 0.25 if it labels the pair as any of the three classes: agrees, disagrees, or discusses.
The team’s evaluation score will also be incremented by an additional 0.75 for each related pair if it gets the relationship right by labeling the pair with the single correct class: agrees, disagrees, or discusses.
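The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation script; the label strings (`agree`, `disagree`, `discuss`, `unrelated`) and the function names are assumptions made for the example.

```python
# Assumed label strings; the official scorer may spell them differently.
RELATED = {"agree", "disagree", "discuss"}


def pair_score(gold, predicted):
    """Score one [headline, body text] pair under the two-level scheme."""
    score = 0.0
    if gold == "unrelated":
        # Level 1 only: credit for recognising an unrelated pair.
        if predicted == "unrelated":
            score += 0.25
    else:
        # Level 1: credit for recognising the pair as related at all.
        if predicted in RELATED:
            score += 0.25
        # Level 2: extra credit for the single correct related class.
        if predicted == gold:
            score += 0.75
    return score


def total_score(gold_labels, predicted_labels):
    """Sum the per-pair scores over a whole test set."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists differ in length")
    return sum(pair_score(g, p) for g, p in zip(gold_labels, predicted_labels))
```

Under this sketch a correctly labeled related pair earns 1.0 in total, a related pair given the wrong related class earns 0.25, and a correctly labeled unrelated pair earns 0.25.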
A simple baseline using hand-coded features and a GradientBoosting classifier is available on GitHub.
The baseline implementation also includes code for pre-processing text, splitting data carefully to avoid bleeding of articles between training and test, k-fold cross validation, scorer, and most of the crud you will need to write to experiment with this data. The hand-crafted features include word/ngram overlap features, and indicator features for polarity and refutation.
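To make the feature families concrete, here is a rough sketch of word overlap, n-gram overlap, and refutation indicator features. It is not the baseline's code: the tokenizer, the function names, and the short `REFUTING_WORDS` list are all assumptions chosen for illustration.

```python
import re

# Illustrative word list only; the baseline's actual refutation
# vocabulary is not reproduced here.
REFUTING_WORDS = ("debunk", "denies", "deny", "fake", "false", "hoax", "no", "not")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(headline, body):
    """Jaccard overlap between the headline and body vocabularies."""
    h, b = set(tokenize(headline)), set(tokenize(body))
    union = h | b
    return len(h & b) / len(union) if union else 0.0


def ngram_overlap(headline, body, n=2):
    """Fraction of distinct headline n-grams that also occur in the body."""
    h, b = tokenize(headline), tokenize(body)
    h_grams = {tuple(h[i:i + n]) for i in range(len(h) - n + 1)}
    b_grams = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    return len(h_grams & b_grams) / len(h_grams) if h_grams else 0.0


def refutation_indicators(text):
    """Map each refuting word to 1 if it appears in the text, else 0."""
    tokens = set(tokenize(text))
    return {word: int(word in tokens) for word in REFUTING_WORDS}
```

Applied to the Robert Plant headline and the "Disagree" snippet shown earlier, the overlap features are nonzero and the indicators for "no" and "not" fire, while the Virgin Galactic snippet shares no words with the headline at all.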
With these features and a gradient boosting classifier, the baseline achieves a weighted accuracy score of 79.53% (as per the evaluation scheme described above) with 10-fold cross validation. The baseline is simply for your reference. You are welcome to use it any way you like (or not).
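One way to keep an article from bleeding between training and held-out data during cross validation is to assign folds per article rather than per pair. The sketch below assumes each [headline, body text] pair carries an article identifier (called `body_ids` here, a hypothetical name); it is not the baseline's actual splitting code.

```python
import random


def article_folds(body_ids, k=10, seed=0):
    """Split pair indices into k folds so that no article spans two folds.

    body_ids[i] is the (assumed) article identifier of pair i.
    Returns a list of k lists of pair indices.
    """
    unique_ids = sorted(set(body_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Deal the shuffled articles round-robin into the k folds.
    fold_of = {bid: i % k for i, bid in enumerate(unique_ids)}
    folds = [[] for _ in range(k)]
    for idx, bid in enumerate(body_ids):
        folds[fold_of[bid]].append(idx)
    return folds
```

Each fold can then serve once as the held-out set while the remaining folds are used for training. Because whole articles are dealt out, fold sizes in pairs may be uneven when some articles are paired with many headlines.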
The top 3 teams with an evaluation score greater than baseline performance on test data will receive cash prizes.
The exact amounts and the baseline performance criteria will be announced later.
December 1st, 2016
February 1st, 2017
March 1st, 2017
May 1st, 2017
June 1st, 2017 (12:01 AM GMT)
June 2nd, 2017 (11:59 PM GMT)
June 15th, 2017
There are several reasons Stance Detection makes for a good first task for the Fake News Challenge:
Our extensive discussions with journalists and fact checkers made it clear both how difficult “truth labeling” of claims really is, and that they would rather have a reliable semi-automated tool to help them do their job better than a fully automated system whose performance will inevitably fall far short of 100% accuracy.
Truth labeling also poses several large technical / logistical challenges for a contest like the FNC:
Together these make the truth labeling task virtually impossible with existing AI / NLP. In fact, even people have trouble distinguishing fake news from real news.
There are two important ways the Stance Detection task is relevant for fake news.
From our discussions with real-life fact checkers, we realized that gathering the relevant background information about a claim or news story, including all sides of the issue, is a critical initial step in a human fact checker’s job. One goal of the Fake News Challenge is to push the state-of-the-art in assisting human fact checkers, by helping them quickly gather the information they need to make their assessment.
In particular, a good Stance Detection solution would allow a human fact checker to enter a claim or headline and instantly retrieve the top articles that agree, disagree or discuss the claim/headline in question. They could then look at the arguments for and against the claim, and use their human judgment and reasoning skills to assess the validity of the claim in question. Such a tool would enable human fact checkers to be fast and effective.
It should be possible to build a prototype post-facto “truth labeling” system from a “stance detection” system. Such a system would tentatively label a claim or story as true/false based on the stances taken by various news organizations on the topic, weighted by their credibility.
For example, if several high-credibility news outlets run stories that Disagree with a claim (e.g. “Denmark Stops Issuing Travel Visas to US Citizens”) the claim would be provisionally labeled as False. Alternatively, if a highly newsworthy claim (e.g. “British Prime Minister Resigns in Disgrace”) only appears in one very low-credibility news outlet, without any mention by high-credibility sources despite its newsworthiness, the claim would be provisionally labeled as False by such a truth labeling system.
In this way, the various stances (or lack of a stance) news organizations take on a claim, as determined by an automatic stance detection system, could be combined to tentatively label the claim as True or False. While crude, this type of fully-automated approach to truth labeling could serve as a starting point for human fact checkers, e.g. to prioritize which claims are worth further investigation.
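A toy version of this idea is sketched below. It is only an illustration of the reasoning in the two examples above: the credibility scores, the `high_credibility` threshold, the `newsworthy` flag, and the extra "Unverified" outcome are all assumptions, and a real system would need far more careful weighting.

```python
RELATED = {"agree", "disagree", "discuss"}


def provisional_label(reports, newsworthy=False, high_credibility=0.7):
    """Tentatively label a claim from the stances of news outlets.

    reports is a list of (stance, credibility) tuples, where stance is one
    of agree/disagree/discuss/unrelated and credibility is an assumed
    score in [0, 1] for the outlet that published the article.
    """
    related = [(s, c) for s, c in reports if s in RELATED]
    support = sum(c for s, c in related if s == "agree")
    oppose = sum(c for s, c in related if s == "disagree")
    covered_by_credible = any(c >= high_credibility for _, c in related)

    # A highly newsworthy claim that no credible outlet mentions is suspect.
    if newsworthy and not covered_by_credible:
        return "False"
    # Otherwise compare credibility-weighted agreement and disagreement.
    if oppose > support:
        return "False"
    if support > oppose:
        return "True"
    return "Unverified"


# Several high-credibility outlets disagree with the claim.
print(provisional_label([("disagree", 0.9), ("disagree", 0.8), ("agree", 0.2)]))
# A newsworthy claim carried only by one low-credibility outlet.
print(provisional_label([("agree", 0.1)], newsworthy=True))
```

Both calls print `False`, mirroring the two examples in the text. The label is provisional by design and is meant to help prioritize claims for human review, not to replace it.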
Yes! Here are two recent papers on stance detection tasks related to the one we’re using for FNC-1:
There is also a very good whitepaper on the state-of-the-art in automated fact checking available from the UK fact-checking organization FullFact.org.
Participants are free to use any unlabeled data (as pretrained embeddings or as manifold regularization), but no direct or indirect supervision is allowed other than the labels the Fake News Challenge provides.
We will be providing an evaluation script, but other than that there will be no autoscoring system or leaderboard.
We are limiting the duration of the testing phase of FNC-1 to make it extremely difficult for teams to cheat by labeling the test set manually. Given the test set size, this would be very difficult to do in the two day window we are providing. We apologize to teams who cannot work with the timeline we’ve outlined.
No, not necessarily. The real world doesn’t make i.i.d. assumptions :)
Shortly after the uproar over fake news and its potential impact on the US elections, Dean Pomerleau proposed using artificial intelligence to address the problem as a casual bet / dare to his friends and colleagues in the machine learning community on Twitter. The initial idea was inspired by the fact that AI-based filtering techniques have been quite effective at conquering email spam - a problem that seems on the surface to be quite similar to fake news. Why can’t we address fake news the same way?
Dean was certainly not the first to have this idea. He quickly learned from others who joined the effort to organize the FNC that much fundamental research in AI, ML and NLP has been happening in recent years. The convergence of this groundbreaking research and the widespread recognition that fake news is an important real-world problem resulted in an explosion of interest in our efforts by volunteers, teams and the technology press. The FNC has grown dramatically since that initial bet between friends, to the point where it now includes over 100 volunteers and 72 teams from around the world. While the details of the challenge have evolved from that initial (rather naive) wager, the goal has always remained the same - foster the use of AI, machine learning and natural language processing to help solve the fake news problem.
The answer depends on what kind of facts or statements you want to fact check. Well-defined, narrowly scoped statements like:
“US Unemployment went up during the Obama years”
could be fact checked (debunked) automatically now with a reasonable amount of additional research.
But a statement like:
“The Russians under Putin interfered with the US Presidential Election”
won’t be possible to fact check automatically until we’ve achieved human-level artificial intelligence capable of understanding subtle and complex human interactions, and conducting investigative journalism.
That’s why we’re focusing in round 1 of the Fake News Challenge (FNC-1) on the stance detection task that is tractable now (we think) and could serve as a useful tool for human fact checkers today if we had it.
A great source on the state-of-the-art in automated fact checking and what the future holds is this 36-page white paper from FullFact.org.
In the eyes of some, ‘fake news’ means “whatever I don’t agree with.” This is not the definition adopted for the FNC. We’ve extensively investigated the various ways credible media experts have defined ‘fake news’ and have boiled it down to what they virtually all share in common. For the purposes of the FNC, we are defining fake news as follows:
Fake News: “A completely fabricated claim or story created with an intention to deceive, often for a secondary gain.”
The “secondary gain” is most often monetary (i.e. to capture clicks and/or ‘eyeballs’ to generate ad revenue), but sometimes the secondary gain may be political.
However, several important distinctions need to be made when it comes to the definition of fake news.
First, claims made by newsworthy individuals, even demonstrably false claims, are by definition newsworthy and therefore not considered fake news for the FNC. This is opposed to fabricated claims about newsworthy individuals made by obscure sources seeking to make money and/or a political statement, which are considered fake news by our definition.
Second, our operative definition of fake news explicitly excludes humorous or satirical stories designed to entertain rather than deceive. The same goes for opinion pieces or editorials - they too are excluded from the category of fake news. To qualify for these exemptions, these types of stories must be clearly labeled as such in the story itself, and not, for example, buried somewhere else on the website where the story appears.
From a practical perspective, we guarantee none of the headlines or stories in the FNC-1 task will consist of recent controversial claims made by well-known individuals. Nor will they be humor, satire or OpEd pieces.
Fake News Challenge was conceived to inspire AI researchers and practitioners to work on fact-checking related problems. We are in touch with our journalist and fact-checker colleagues to understand what other problems they encounter in their day-to-day work and how that can inform FNC-2. Stay tuned for the next challenge. If you have suggestions, please stop by our Slack and leave a comment. We would love to hear from you!
Fifty of the 80 participants made submissions for FNC-1 by the end of June 2nd using a wide array of techniques. The teams got access to the (unlabeled) test data and were scored automatically using the Codalab submission platform. The scoring system produces a raw score based on the differentially weighted scoring metric. The relative score is the raw score normalized by the maximum possible score on the test set.
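The normalization can be sketched as follows, assuming the weights from the evaluation scheme (0.25 for a correct unrelated pair, 1.0 in total for a correctly classified related pair) and assuming the relative score is reported as a percentage. The function names and label strings are illustrative, not taken from the Codalab scorer.

```python
def max_possible_score(gold_labels):
    """Score a perfect submission would earn on the given gold labels."""
    return sum(0.25 if g == "unrelated" else 1.0 for g in gold_labels)


def relative_score(raw_score, gold_labels):
    """Raw score as a percentage of the maximum possible score."""
    best = max_possible_score(gold_labels)
    if best == 0:
        raise ValueError("gold label list is empty")
    return 100.0 * raw_score / best


# Toy test set: four unrelated pairs and three related pairs.
toy_gold = ["unrelated"] * 4 + ["agree", "disagree", "discuss"]
print(max_possible_score(toy_gold))   # 4.0
print(relative_score(3.0, toy_gold))  # 75.0
```

Because unrelated pairs contribute only 0.25 each to the maximum, the relative score is dominated by how well a system handles the related pairs.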
|Rank|Team name|Affiliation|Score|Relative Score|
|1|SOLAT in the SWEN|Talos Intelligence|9556.50|82.02|
|2|Athene (UKP Lab)|TU Darmstadt|9550.75|81.97|
|3|UCL Machine Reading|UCL|9521.50|81.72|
Congratulations to our top-3 teams! The top-3 teams will also get a cash prize of USD 1000, USD 600, and USD 400 respectively. In addition to the top-3 teams, we would also like to give a special shoutout to ranks 4 and 5 held by teams at UIUC and U. Arizona. For a complete leaderboard, visit the competition’s Codalab page.
Fake News Challenge is a grassroots effort of over 100 volunteers and 71 teams from academia and industry around the world. Our goal is to address the problem of fake news by organizing a competition to foster development of tools to help human fact checkers identify hoaxes and deliberate misinformation in news stories using machine learning, natural language processing and artificial intelligence.
Digital Products Manager
Hal Daumé III
NLP Researcher/Associate Professor
U. of Sheffield
First Draft News
Adjunct Faculty, CMU
FNC Volunteer Community
Challenge participants: Please head over to the Slack team to get your questions answered in a timely manner.
Not on FNC Slack? Click here for an invite.
For media and other inquiries: email@example.com
1. New York Times. “As Fake News Spreads Lies, More Readers Shrug at the Truth”
2. Pew Research Center. “Many Americans Believe Fake News Is Sowing Confusion”
3. Dhruv Ghulati, Co-Founder, Factmata. “Introducing Factmata—Artificial intelligence for automated fact-checking”
4. William Ferreira and Andreas Vlachos. “Emergent: a novel data-set for stance classification”