Keyword Analysis of Biased Words Used by CNN and FoxNews

Term paper, 2017, 10 pages

English Studies - Linguistics


Table of Contents

1 Introduction

2 Theoretical Background

3 Corpus Compilation

4 Keyword Analysis
4.1 Relative Frequency of Biased Keywords
4.2 Relative Frequency of Potent Biased Keywords

5 Results

6 Conclusion



1 Introduction

Over the last century, the presentation of news has changed considerably. Media such as radio and television brought news into a new era of technological progress and made it more accessible to the population. The increasing importance of news and its ubiquitous presence gave rise to a field of linguistic research concerned with the critical analysis of language in the news. In recent years, the internet has added to the many forms of news presentation by catalyzing the digital revolution, and newspapers and networks now also publish their news on the World Wide Web. I want to analyze the linguistic features the networks use to present their news by scrutinizing the linguistic bias of two networks that cover different sides of the political spectrum: CNN and FOX News. I will perform a keyword analysis on a corpus of texts from the two networks’ websites on the topic of Donald Trump. By comparing the keyword lists to a bias lexicon (Recasens et al., 2013), the analysis will show how frequently each network uses biased words.

2 Theoretical Background

Linguistic features of bias can be approached in different ways. Recasens et al. developed “linguistic models for analyzing and detecting biased language”. To this end, they compiled the results of previous research on biased language and incorporated them into their models. The features they considered include implicatives, subjectives and bias lexica. This research draws only on the bias lexicon that Recasens et al. compiled during their study, which is based on NPOV (neutral point of view) edits of Wikipedia entries (Recasens et al., 2013).

In 2004, Paul Baker published an article on querying keywords, which examined different practices of keyword analysis. In this research, the method of comparing frequency lists of corpora and the question of statistical relevance of contrarily queried keywords are revisited.

This research combines keyword analysis with Recasens et al.’s bias lexicon to determine whether the networks’ online articles differ in how frequently they use biased words. Donald Trump was chosen as the topic because it polarizes opinion, and a polarizing topic is needed to elicit biased language.

3 Corpus Compilation

The compiled corpus, hereafter referred to as the Network corpus, consists of two subcorpora: the FOX News subcorpus and the CNN subcorpus. Each of these in turn contains 60 subcorpora, one for each collected text. The texts were gathered using the search function of the corresponding network’s website. Altogether, the corpus contains 1,033,341 words: 791,890 in the FOX News subcorpus and 241,451 in the CNN subcorpus. The FOX News subcorpus contains 19 transcripts of FOX News programs that were broadcast, which explains why it is considerably larger.

The FOX News texts were compiled using the search term “Donald Trump” and the website’s advanced search function. The general content filter was set to “story” only, which excludes videos etc. The section filter (politics, travel, opinion etc.) was left unset to obtain a broader spectrum of articles. Three time spans were then defined: the 20 most recent articles as of August 13, 2017, which reached back to August 11, 2017; the 20 most relevant articles in the year before the election of Donald Trump (November 7, 2015 to November 7, 2016); and the 20 most relevant articles after the election. To prevent the 20 most recent articles from overlapping with the 20 most relevant articles after the election, the date range for the latter search was set to November 8, 2016 to August 10, 2017. The search on the CNN website was similar. Only “stories” were included, excluding “videos” and “photos”, and all sections were searched (here: “All CNN”). The CNN website does not offer an advanced search function for selecting a time span, so the 20 most relevant articles before and after the election were selected manually from the general list of most relevant articles.

Next, the content was downloaded manually by copying and pasting the body of each article from the website into a text file. No other part of the page was included, such as links to other articles, navigation panels, or tweets embedded in the article.

4 Keyword Analysis

The analysis will compare the Network corpus, that is, the language FOX News and CNN use on the topic of Donald Trump, to a neutral reference corpus. The reference corpus will be a random 1.7-million-word sample of the Corpus of News on the Web (NOW) (Davies, 2013), which comprises 4.9 billion words in full and grows by about 4 to 5 million words each day, as it is updated every night with URLs from Google News. As a representation of internet news language, the NOW corpus is therefore a suitable reference corpus for scrutinizing the language CNN and FOX News use specifically in online articles about Donald Trump.

The tool used for the keyword analysis is AntConc, a freeware corpus analysis toolkit programmed by Laurence Anthony.

To create a keyword list, the NOW corpus sample was set as the reference corpus in the tool preferences, and its frequency list was compared with the Network corpus’ frequency list using the log-likelihood statistical test. AntConc generated three keyword lists: one for the Network corpus as a whole and one each for the CNN and the FOX News subcorpus. The program then compared the resulting keyword lists to the bias lexicon (Recasens et al., 2013), a word list that “contains 654 bias-inducing lemmas” generated from Wikipedia NPOV edits (Recasens et al., 2013).
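The log-likelihood comparison of two frequency lists can be illustrated with a minimal Python sketch. It uses the two-corpus log-likelihood formula that is commonly used for keyness; whether AntConc computes exactly this variant is an assumption here. The function name and the word frequencies are invented for illustration, and only the corpus sizes (1,033,341 words and roughly 1.7 million words) are taken from the text.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (target corpus vs. reference corpus).

    Assumed formula: 2 * sum(observed * ln(observed / expected)), where the
    expected frequencies distribute the word's total count across both
    corpora in proportion to corpus size.
    """
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size

    score = 0.0
    # A zero observed frequency contributes nothing (limit of x * ln x is 0).
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Corpus sizes from the text; the reference size is the approximate 1.7 million.
network_size = 1_033_341
reference_size = 1_700_000

# Hypothetical counts for a single word, not taken from the paper's data.
print(round(log_likelihood(450, network_size, 120, reference_size), 2))
```

A word that occurs with the same relative frequency in both corpora scores 0. Scores above 3.84 correspond to p < 0.05 at one degree of freedom, although keyword studies often apply a much stricter cut-off.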

4.1 Relative Frequency of Biased Keywords

By comparing the above-mentioned keyword list of the Network corpus to the bias lexicon, AntConc found 250 biased lemmas among the keywords. These are referred to below as biased keywords. The program then searched for the biased keywords in each of the 120 subcorpora, and the absolute and relative frequency of biased keywords in each subcorpus was determined. The average relative frequency was calculated in three ways. First, the number of occurrences of biased keywords in each individual subcorpus was divided by that subcorpus’ word count; the resulting relative frequencies were summed and averaged. Second, the number of occurrences of biased keywords in the entire FOX News subcorpus was divided by the word count of the entire FOX News subcorpus, and the same was done for the CNN subcorpus. Third, the average number of biased keyword occurrences per subcorpus was divided by the average word count per subcorpus, separately for the FOX News and the CNN subcorpus.

The first method generated an average relative frequency of 2.09% of biased keywords for the FOX News subcorpus and an average relative frequency of 2.00% of biased keywords for the CNN subcorpus. Methods 2 and 3 each generated an average relative frequency of 2.10% of biased keywords for FOX News and 1.95% of biased keywords for CNN (cf. Appendix).
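The three ways of averaging can be made concrete with a short Python sketch. The function name and the sample counts are hypothetical and serve only to show the arithmetic; they are not the paper's data.

```python
def relative_frequencies(texts):
    """Return the three averages for a list of (biased_hits, word_count) pairs.

    Each pair stands for one text (one of the 60 subcorpora of a network).
    """
    n = len(texts)
    total_hits = sum(hits for hits, _ in texts)
    total_words = sum(words for _, words in texts)

    # Method 1: relative frequency per text, then the mean of those values.
    method1 = sum(hits / words for hits, words in texts) / n

    # Method 2: all hits of the network divided by all words of the network.
    method2 = total_hits / total_words

    # Method 3: mean hits per text divided by mean words per text.
    method3 = (total_hits / n) / (total_words / n)

    return method1, method2, method3


# Hypothetical sample: two short articles and one long transcript.
sample = [(10, 500), (40, 1000), (90, 6000)]
m1, m2, m3 = relative_frequencies(sample)
print(f"{m1:.4f} {m2:.4f} {m3:.4f}")
```

Methods 2 and 3 are algebraically identical because the number of texts cancels out, which is consistent with both yielding the same percentages. Method 1 weights every text equally regardless of its length, so a few long texts such as transcripts influence it less than they influence methods 2 and 3.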

Calculated from the Network corpus keyword list, the average relative frequency of biased keywords is marginally higher in FOX News articles (2.09% and 2.10%, as opposed to 2.00% and 1.95% for CNN). This margin is too small to support any statement about the use of biased language in online network articles on Donald Trump. Irrelevant hits will have to be excluded.

4.2 Relative Frequency of Potent Biased Keywords

Unlike the method in Section 4.1, this method includes only relevant hits in order to exclude statistical outliers. Biased keywords with low keyness factors were not considered, as they originate from

a) biased keywords that are used infrequently throughout the entire (sub)corpus or
b) biased keywords that are used extremely frequently in one or a few texts of the (sub)corpus but extremely infrequently or not at all in the rest of the corpus.

Both produce values that are statistical outliers and not meaningful for the data. One way to exclude such data would be to query key keywords (Baker, 2004). In this research, however, only biased keywords with high keyness factors are selected to minimize the chance of irrelevant hits.

Another method to exclude statistical outliers is to discard biased keywords that are relevant to one subcorpus (e.g. the FOX News subcorpus) but not to the other (e.g. the CNN subcorpus). Such keywords score high keyness factors in a list for the entire Network corpus but are irrelevant to the subcorpus in which they would score low keyness factors if it were regarded individually.
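The two exclusion criteria described in this section can be sketched as a simple filter in Python. This is an illustration under stated assumptions, not the procedure actually carried out: the function name, the single shared threshold `min_keyness`, and all words and scores are hypothetical, and the three-word lexicon is a placeholder for the bias lexicon.

```python
def potent_biased_keywords(network_keyness, fox_keyness, cnn_keyness,
                           bias_lexicon, min_keyness):
    """Keep biased keywords with high keyness overall and in both subcorpora.

    Each *_keyness argument maps a word to its keyness score in the
    corresponding keyword list; words missing from a list count as 0.
    """
    potent = []
    for word, score in network_keyness.items():
        if word not in bias_lexicon:
            continue  # not a biased keyword at all
        if score < min_keyness:
            continue  # low keyness in the Network corpus as a whole
        if (fox_keyness.get(word, 0.0) < min_keyness
                or cnn_keyness.get(word, 0.0) < min_keyness):
            continue  # relevant to one subcorpus only
        potent.append(word)
    return sorted(potent)


# Hypothetical scores; the threshold of 50.0 is an assumption, not from the paper.
network = {"claim": 120.0, "radical": 95.0, "regime": 12.0, "trump": 9000.0}
fox = {"claim": 80.0, "radical": 110.0, "regime": 5.0}
cnn = {"claim": 60.0, "radical": 2.0}
lexicon = {"claim", "radical", "regime"}  # placeholder entries

print(potent_biased_keywords(network, fox, cnn, lexicon, min_keyness=50.0))
```

In this made-up example only "claim" survives: "radical" is key for one subcorpus but not the other, "regime" has low keyness overall, and "trump" is not in the lexicon.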



Institution / University
Freie Universität Berlin