Are CNN and Fox News Really Biased? A Machine Learning Study.

According to statistics, the answer is no, and here’s why.

About a year ago, I was scrolling through Twitter and reading some replies to a CNN tweet. As you would expect, it was full of people referring to the media conglomerate as “fake news” and pointing out details in the way CNN chose to frame that specific story. And to some extent, you really can’t dismiss their claims. Perhaps Fox News, CNN, and the “mainstream media” don’t purposefully publish outright fake information, but that isn’t a great standard to hold them to. When millions of Americans vote based on what they read online, the media is partially responsible for popular representation in our democracy. When exaggerated clickbait drives the headlines of even the most prestigious news organizations (e.g. the New York Times), we are right to tweet back at them.

So this raises the question: are CNN and Fox News lying to me when they say an article is not an editorial/op-ed? Who can I actually trust for reliable information, and through what medium?

A Look at the BLUFFNet Algorithm

Before I go into my results, I want to give you a sense of how they were created. At the core of any machine learning algorithm or statistical analysis is data, with the caveat that the data usually must be annotated. In Natural Language Processing terms, the machine learning task at hand is known as Subjectivity Analysis (similar to Sentiment Analysis): identifying sentences that represent the author’s opinion. Unfortunately, labelled data specific to news articles is quite scarce for this task; the best dataset I could find is the MPQA corpus, containing a total of ~11,000 sentences from 692 articles.

These sentences are converted into vectors of numbers which are then fed into a machine learning model, a neural network. Neural networks have the benefit of being able to learn complex patterns from complex data types such as natural language. The numbers in these vectors represent the grammatical structure of a sentence and flag relevant keywords such as “amazing”, “evil”, or “factual”. By learning patterns from the MPQA sentences with these vectors, we are well equipped to classify unseen sentences as subjective or objective.
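To make the sentence-to-vector step concrete, here is a toy sketch. It is not BLUFFNet: it assumes a plain bag-of-words vector (word counts only, no grammatical features) and a single logistic unit in place of a real neural network, and the four training sentences and their labels are invented for illustration rather than taken from MPQA.

```python
import math
import re

# Invented toy data (NOT from the MPQA corpus): 1 = subjective, 0 = objective.
TOY_SENTENCES = [
    "This amazing policy is a triumph",
    "The evil plan was a disgrace",
    "The senate passed the bill on Tuesday",
    "The report lists figures for each state",
]
TOY_LABELS = [1, 1, 0, 0]


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocab(sentences):
    """Map every distinct token to a fixed position in the vector."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(sentence, vocab):
    """Turn a sentence into a vector of word counts (unknown words are ignored)."""
    vector = [0.0] * len(vocab)
    for token in tokenize(sentence):
        if token in vocab:
            vector[vocab[token]] += 1.0
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=200, lr=0.5):
    """Fit a single logistic unit with stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_subjectivity(sentence, vocab, weights, bias):
    """Return a score in (0, 1); higher means more likely subjective."""
    x = vectorize(sentence, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

With only four sentences the model simply memorizes its training words, so this shows the mechanics of the pipeline and nothing about real accuracy.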

Now we can determine whether a sentence in a news article is biased, but we still need to find out how biased the entire article is. One way to do this is by taking the average of all the sentences’ biases, but even biased news articles tend to be mostly filled with facts and quotes. A better way is to scale the averages according to how similar they are to objective vs opinionated articles. Since we are asking whether non-editorial articles are biased, it wouldn’t make sense to use news articles for the objective sources. Instead, we take a corpus of Wikipedia articles and compute their average biases. We also do the same for a corpus of editorials. This creates two baselines: a random article will likely fall somewhere in between these two extremes. The average bias is scaled between the two using a logistic regression model, and the result is the final bias score. If you want to learn more about the algorithm, BLUFFNet, feel free to read the paper preprint here.
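The scaling step can be sketched roughly as follows. This is an illustration under assumptions, not the method from the preprint: it fits a one-feature logistic regression that separates Wikipedia-like averages (label 0) from editorial-like averages (label 1), then reports 100 times the predicted probability as the score. The 0 to 100 range, the function names, and every number in the two baseline lists are assumptions made up for this example.

```python
import math

# Made-up per-article average subjectivity values, for illustration only.
WIKI_AVGS = [0.05, 0.10, 0.12, 0.08]        # stand-in for the objective baseline
EDITORIAL_AVGS = [0.35, 0.40, 0.30, 0.45]   # stand-in for the opinionated baseline


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean(values):
    return sum(values) / len(values)


def fit_logistic_1d(xs, ys, epochs=5000, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def fit_baselines(objective_avgs, opinion_avgs):
    """Label objective articles 0 and opinion articles 1, then fit the scaler."""
    xs = list(objective_avgs) + list(opinion_avgs)
    ys = [0] * len(objective_avgs) + [1] * len(opinion_avgs)
    return fit_logistic_1d(xs, ys)


def bias_score(sentence_scores, w, b):
    """Average the sentence scores, then place the average between the baselines."""
    return 100.0 * sigmoid(w * mean(sentence_scores) + b)


w, b = fit_baselines(WIKI_AVGS, EDITORIAL_AVGS)
mostly_factual = bias_score([0.10, 0.05, 0.10, 0.07], w, b)
mostly_opinion = bias_score([0.50, 0.30, 0.40, 0.40], w, b)
```

The point of the design is that a raw average of, say, 0.2 means little on its own; the fitted curve turns it into a position relative to what encyclopedic and editorial writing look like under the same sentence classifier.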

Now, in order to judge a news source as a whole, we simply take the average of its articles’ bias scores. I recently developed a Chrome Extension which labels subjective articles on the Google Search page (you can find it here). In order to pre-evaluate articles, I created a web crawler that fetches RSS feeds and scrapes the Google News website. I was able to use my database of 16,000 classified articles to gather samples for this study.
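The per-source averaging is simple enough to show in a few lines. The sketch below assumes the article database can be read as (source, bias score) pairs; that record shape, the function names, and the placeholder source names and scores are hypothetical and do not come from the study.

```python
from collections import defaultdict


def average_by_source(records):
    """Group (source, bias_score) pairs and return the mean score per source."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source, score in records:
        totals[source] += score
        counts[source] += 1
    return {source: totals[source] / counts[source] for source in totals}


def rank_sources(averages):
    """Return source names ordered from least to most biased."""
    return sorted(averages, key=averages.get)


# Placeholder records, not real measurements.
example_records = [
    ("source_a", 20.0),
    ("source_a", 40.0),
    ("source_b", 70.0),
]
example_averages = average_by_source(example_records)
example_ranking = rank_sources(example_averages)
```

A plain mean weights every article equally, so a source's score depends on the mix of articles the crawler happened to collect for it.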

What do Americans think about media bias?

According to a Gallup/Knight Foundation poll of Republicans and Democrats, Fox News and Breitbart tie as the most biased, with MSNBC, HuffPost, and CNN trailing behind. On the other hand, PBS, APNews, NPR, and WSJ are considered the least biased.

So, for the moment of truth, where does everyone really rank?

There’s a lot to unpack here, so I’ll summarize some of my key findings. According to BLUFFNet, the two least biased sources are CBS News and NPR, and the most biased sources are Vox and HuffPost. When we discount op-eds and editorials, CNN and Fox News are the third and fourth least biased news sources, respectively. Interestingly, Americans think of the two organizations as being among the most biased. The truth is that when they say they are reporting facts objectively, they usually are.

While many claim that the New York Times has become increasingly biased over the past few years (just take a look at Bari Weiss’ resignation letter), it still remains in the safe zone of a bias score under 50, aptly falling into the fourth quadrant. On the other hand, the Washington Post tends to write more biased articles than Breitbart, which is very surprising. One (possibly post-hoc) explanation for this phenomenon is that Breitbart’s far-right articles are more extreme than the Washington Post’s left-leaning articles but represent a smaller proportion of its total website.

Most of the rating pairs (machine and human) agree with each other, except for a few anomalies. This trend indicates that Americans are very aware of the current state of news and have an accurate understanding of which sources to trust. The ones that don’t agree fall into the fourth quadrant: The New York Times, NBC News, CNN, Fox News, and to some extent, Breitbart. These sites are considered highly biased by the American public, but really aren’t (OK, I’m also a little confused about Breitbart, but it isn’t totally in the quadrant). Conversely, none of the websites surveyed were thought to be objective, yet were found to be subjective. Interesting. In machine learning terms, human classification of news sources has high recall, but low precision.
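For readers less familiar with the terms, here is a small sketch of what "high recall, low precision" means in this setting. It treats the machine's verdict as the reference label and public perception as the prediction, which is one reading of the paragraph above rather than a computation from the study. The source names are placeholders and the two sets are invented.

```python
def precision_recall(predicted_biased, actually_biased):
    """Compare two sets of source names.

    precision = share of sources flagged as biased that really are biased
    recall    = share of truly biased sources that were flagged
    """
    true_positives = len(predicted_biased & actually_biased)
    precision = true_positives / len(predicted_biased) if predicted_biased else 0.0
    recall = true_positives / len(actually_biased) if actually_biased else 0.0
    return precision, recall


# Invented example: the public flags five sources, the model flags two of them.
public_says_biased = {"source_a", "source_b", "source_c", "source_d", "source_e"}
model_says_biased = {"source_a", "source_b"}

precision, recall = precision_recall(public_says_biased, model_says_biased)
```

In this invented case every source the model flags is also flagged by the public (recall of 1.0), but only two of the public's five flags are confirmed (precision of 0.4), which is the same shape of disagreement the paragraph describes.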

Last Thoughts

Personally, I will rely more on sites like CBS, NPR, CNN, and Fox News for keeping up to date with current events. However, I still don’t recommend watching CNN and Fox News on TV, since this study was restricted to what the organizations post on their websites.

People simply don’t trust the information they receive from media, whether the medium be news websites, social media, or television. Media bias can be harmful if we are unaware of its effect on our votes as it ultimately skews election results. We have to spread awareness about which organizations are manipulating us simply to improve their bottom lines by creating drama and begging for attention. Americans need to be independent in their votes, yet receptive to new information.

Translated from: https://medium.com/swlh/are-cnn-and-fox-news-really-biased-3ab3ef34bd28
