Evaluating Web Content Using the W3C Credibility Signals

The credibility and trustworthiness of online content have become a major societal issue as human communication and information exchange continue to move online. The prevalence of misinformation, circulated by fraudsters, trolls, political activists and state-sponsored actors, has heightened interest in automated content evaluation and curation tools. We present an automated credibility evaluation system that aids users in assessing the credibility of web pages, focusing on the automated analysis of 23 mostly language- and content-related credibility signals of web content. We find that emotional characteristics, various morphological and syntactic properties of the language, and the use of exclamation marks and all caps are particularly indicative of credibility. Less credible web pages have more emotional, shorter and less complex texts, and put greater emphasis on the headline, which is longer, contains more all caps and is frequently clickbait. Our system achieves 63% accuracy in fake news classification and 28% accuracy in predicting the credibility rating of web pages on a five-point Likert scale.

Speakers: