Mining Twitter for Airline Consumer Sentiment

by Jeffrey Breen

Airlines, Consumers, and Twitter

Anyone who travels regularly recognizes that airlines struggle to deliver a consistent, positive customer experience. Through extensive interview and survey work, the American Customer Satisfaction Index (ACSI) quantifies this impression. As a group, airlines fall at the bottom of their industry rankings, below the Post Office and insurance companies:

Meanwhile, the immediacy and accessibility of Twitter provide a real-time glimpse into consumers' frustration:

This tutorial demonstrates how to use R to collect tweets and apply a (very) naive algorithm to estimate their emotional sentiment. Despite the simplicity of our algorithm and the very small size of our sample of tweets, we are nevertheless able to find an interesting result which compares well to ACSI's respected study of customer satisfaction.

This tutorial was originally presented as a first-time introduction to R for the savvy audience of the Boston Predictive Analytics Meetup Group. As such, its primary focus is to highlight R as a tool to get data easily and synthesize results quickly.

Presentation slides are available from my blog, and the complete code is available on GitHub.

This work is also featured in Elsevier's forthcoming book Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications by Gary Miner et al.

Loading Data into R

One of R’s strengths is reading data from other packages and sources. The base distribution includes functions to read a variety of file formats including comma-separated and fixed-width text files, along with files from other traditional statistical software such as SAS, SPSS, and Stata. Add-on packages are available to access many other file formats, relational databases, Hadoop, and specialized data sources on the web and elsewhere.

The twitteR package

One such package is the twitteR package by Jeff Gentry. Designed to access Twitter’s JSON API and supporting OAuth authentication through its companion ROAuth package, twitteR makes searching Twitter as simple as can be.

The library is available from the Comprehensive R Archive Network (CRAN). Most IDEs have a menu command to find and install new packages, but even from the command line, it’s a one-line affair:

> install.packages('twitteR', dependencies=T)

R will download and install the twitteR package and any package it depends on, such as the library to parse JSON. (The “>” above is R’s command prompt which is included throughout this tutorial to distinguish our input from R’s output.)

Once installed, just load twitteR with the library() command:

> library(twitteR)
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO

Again, notice that R will automatically resolve dependencies and load any required packages. Once loaded, one line is all you need to search Twitter and fetch up to 1,500 results:

Update: You must now first authenticate your twitteR session using the setup_twitter_oauth("API key", "API secret") command. Please see the Getting Started section of the package documentation for details.

> delta.tweets = searchTwitter('@delta', n=1500)

The searchTwitter() function returns an R list, in this case 1,500 elements long:

> length(delta.tweets)
[1] 1500

> class(delta.tweets)
[1] "list"

A list in R is a collection of objects. Its elements may be referenced by name or by number (position). Double square brackets are used to refer to individual elements.
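As a quick illustration (the list and its values here are made up purely for demonstration), single brackets return a sub-list while double brackets return the element itself:

```r
# A toy list to illustrate indexing (values are arbitrary):
x = list(a=1, b="two")

x[[1]]   # double brackets: the element itself, i.e. 1
x["a"]   # single brackets: a one-element sub-list
x$b      # elements can also be accessed by name: "two"
```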

Let’s take a closer look at the first element in the list:

> tweet = delta.tweets[[1]]

> class(tweet)
[1] "status"
attr(,"package")
[1] "twitteR"

So tweet is an object of type status which is provided by the twitteR package. The documentation for the status class (accessed by typing ?status) describes accessor methods like getScreenName() and getText() which do what you would expect:

> tweet$getScreenName()
[1] "Alaqawari"

> tweet$getText()
[1] "I am ready to head home. Inshallah will try to get on the earlier flight to Fresno. @Delta @DeltaAssist"

Extracting Text from Tweets

So now that we know how to extract the text from one tweet, how should we process all 1,500? In most traditional programming languages, we would write a loop. We can do that in R, too, but it’s not recommended. R is an interpreted language, so any code you write to perform a loop is likely to run much more slowly than equivalent code in a compiled language like C. (Most of R’s built-in functions are compiled and are therefore very fast.)

Fortunately, R provides a facility to iteratively apply functions to each object in a collection through the “apply” family of functions: apply(), lapply(), mapply(), sapply(), tapply(), vapply(), etc. Unfortunately this family has grown organically over time and lacks consistent naming and calling conventions.
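As a small base-R illustration (using made-up strings), sapply() applies a function to each element of a vector and simplifies the result, replacing the explicit loop we would write in other languages:

```r
# sapply() applies nchar() to each string and returns a simple vector:
words = c("short", "a bit longer")
lengths = sapply(words, nchar)

# the equivalent explicit loop, for comparison:
lengths.loop = integer(length(words))
for (i in seq_along(words)) lengths.loop[i] = nchar(words[i])
```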

The plyr package

Rice University’s Hadley Wickham has written the plyr package to overcome the limitations of the apply family of functions. While the package provides some nice bells and whistles which we will see as we work with it, its primary advantage is that it provides a simple and consistent naming convention to its apply()-like functions.

The first letter specifies the data type you’re starting with (“d” for data.frame, “l” for list, “a” for array, etc.), the second letter specifies the data type you want as output, and the rest of the name is always “ply”. So, if you have an array (“a”) and you want a list (“l”) as output, use “a” + “l” + “ply” = alply().

In our case, we have a list (“l”) but only need a simple array (“a”) as output, so we use laply(). But we don’t have a simple function like sum() or median() to run on each element. Instead, we need to call a function—the getText() accessor method—on each status object. Such a function is so easy to write we can even write it as an “anonymous” function, in place:

> delta.text = laply(delta.tweets, function(t) t$getText() )

> length(delta.text)
[1] 1500

> head(delta.text, 5)
[1] "I am ready to head home. Inshallah will try to get on the earlier flight to Fresno. @Delta @DeltaAssist"
[2] "@Delta Releases 2010 Corporate Responsibility Report - @PRNewswire (press release) :"
[3] "Another week, another upgrade! Thanks @Delta!"
[4] "I'm not able to check in or select a seat for flight DL223/KL6023 to Seattle tomorrow. Help? @KLM @delta"
[5] "In my boredom of waiting realized @deltaairlines is now @delta seriously..... Stil waiting and your not even unloading status yet"

By using laply(), we requested an array as output, but since our result had only one dimension (column), plyr automatically simplified it to a vector, which you can think of as a one-dimensional array:

> class(delta.text)
[1] "character"

(If we really need an array for some reason, this default behavior can be overridden with a .drop=F parameter.)

With the text of our tweets extracted into a simple vector, let’s turn our attention to estimating its emotional sentiment.

Estimating Sentiment

Sentiment analysis is an active area of research involving complicated algorithms and subtleties. For the purposes of this tutorial, we err on the side of simplicity and estimate a tweet’s sentiment by counting the number of occurrences of “positive” and “negative” words.

To assign a numeric score to each tweet, we’ll simply subtract the number of occurrences of negative words from the number of positive. Larger negative scores will correspond to more negative expressions of sentiment, neutral (or balanced) tweets should net to zero, and very positive tweets should score larger, positive numbers.
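The arithmetic can be sketched in a few lines (the sentence and word lists below are invented for illustration; the actual scoring function later uses match() against the full opinion lexicon):

```r
# A made-up sentence, already split into lower-case words:
words = c("love", "the", "upgrade", "but", "hate", "the", "delay")
pos = c("love", "upgrade")   # toy positive word list
neg = c("hate", "delay")     # toy negative word list

# %in% yields TRUE/FALSE per word; sum() treats these as 1/0:
score = sum(words %in% pos) - sum(words %in% neg)
# two positive hits minus two negative hits nets to zero
```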

Loading the opinion lexicon

First we need to find a source which categorizes words by sentiment. A Google search for “sentiment analysis” or “opinion mining” yields a number of sources of such word lists. Hu and Liu’s “opinion lexicon” categorizes nearly 6,800 words as positive or negative and can be downloaded from Bing Liu’s web site.

The lexicon consists of two text files, one containing a list of positive words and the other containing negative words. Each file begins with some documentation lines, denoted by initial semi-colon (“;”) characters, which we need to skip.

R’s built in scan() function makes short work of reading these files:

> hu.liu.pos = scan('data/opinion-lexicon-English/positive-words.txt',
                        what='character', comment.char=';')

> hu.liu.neg = scan('data/opinion-lexicon-English/negative-words.txt',
                        what='character', comment.char=';')

These objects are simple character vectors, just like our delta.text:

> class(hu.liu.neg)
[1] "character"

> class(hu.liu.pos)
[1] "character"

> length(hu.liu.neg)
[1] 4783

> length(hu.liu.pos)
[1] 2006

R’s c() function (for “combine”) allows us to add a few industry- and Twitter-specific terms to form our final pos.words and neg.words vectors:

> pos.words = c(hu.liu.pos, 'upgrade')
> neg.words = c(hu.liu.neg, 'wtf', 'wait', 'waiting',
                    'epicfail', 'mechanical')

Implementing our sentiment scoring algorithm

To score each tweet, our score.sentiment() function uses laply() to iterate through the input text. It strips punctuation and control characters from each line using R’s regular expression-powered substitution function, gsub(), and uses match() against each word list to find matches:

score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
    require(plyr)
    require(stringr)

    # we got a vector of sentences. plyr will handle a list
    # or a vector as an "l" for us
    # we want a simple array of scores back, so we use
    # "l" + "a" + "ply" = "laply":
    scores = laply(sentences, function(sentence, pos.words, neg.words) {
        # clean up sentences with R's regex-driven global substitute, gsub():
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)
        # and convert to lower case:
        sentence = tolower(sentence)

        # split into words. str_split is in the stringr package
        word.list = str_split(sentence, '\\s+')
        # sometimes a list() is one level of hierarchy too much
        words = unlist(word.list)

        # compare our words to the dictionaries of positive & negative terms
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)

        # match() returns the position of the matched term or NA
        # we just want a TRUE/FALSE:
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)

        # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
        score = sum(pos.matches) - sum(neg.matches)

        return(score)
    }, pos.words, neg.words, .progress=.progress )

    scores.df = data.frame(score=scores, text=sentences)
    return(scores.df)
}

Algorithm sanity check

Let’s quickly test our score.sentiment() function and word lists with some sample sentences:

> sample = c("You're awesome and I love you",
                "I hate and hate and hate. So angry. Die!",
                "Impressed and amazed: you are peerless in your
                    achievement of unparalleled mediocrity.")

> result = score.sentiment(sample, pos.words, neg.words)
> result
  score      text
1     2      You're awesome and I love you
2    -5      I hate and hate and hate. So angry. Die!
3     4      Impressed and amazed: you are peerless in your
                    achievement of unparalleled mediocrity.

Not surprisingly, our simple algorithm completely misses the sarcasm of the third sentence, but our code seems to be working as intended.

Let’s try it with a couple of real tweets:

  score      text
1    -4      @Delta I'm going to need you to get it together. Delay on
                    tarmac, delayed connection, crazy gate changes... #annoyed
2     5      Surprised and happy that @Delta helped me avoid the 3.5 hr
                    layover I was scheduled for.  Patient and helpful agents.

Our algorithm again misses the tinge of sarcasm in the second tweet, but at least this one is, on balance, mostly positive.

data.frames hold tabular data

Our score.sentiment() function returns tabular data with multiple columns and multiple rows. In R, the data.frame is the workhorse for such spreadsheet-like data:

> result
  score      text
1     2      You're awesome and I love you
2    -5      I hate and hate and hate. So angry. Die!
3     4      Impressed and amazed: you are peerless in your
                    achievement of unparalleled mediocrity.
> class(result)
[1] "data.frame"

A data.frame is technically composed of lists, so each column, row, and element can be accessed by position (a number) or a name (a string). While we normally only use column names, rows are named too, defaulting to their sequential position within the data.frame (as a string):

> colnames(result)
[1] "score" "text" 

> rownames(result)
[1] "1" "2" "3"

We can extract the score column by name using array notation and the row, column convention:

> result[,'score']
[1]  2 -5  4

The dollar sign ($) convention is more common:

> result$score
[1]  2 -5  4

Or we can reference its position (again using the row, column convention):

> result[,1]
[1]  2 -5  4

Note that a missing specification returns all elements, so result[,1] yields values from each row in the first column.

Similarly, individual elements can be accessed by name, position, or any combination:

> result[1,1]
[1] 2

> result[1,'score']
[1] 2

> result['1','score']
[1] 2

Finally, positions can also be specified as ranges or vectors of offsets:

> result[1:2, 'score']
[1]  2 -5

> result[c(1,3), 'score']
[1] 2 4

Scoring the tweets

To score the text of Delta’s tweets, just feed it into our score.sentiment() function:

> delta.scores = score.sentiment(delta.text, pos.words,
                                neg.words, .progress='text')
|================================================================| 100%

The .progress=‘text’ parameter is passed to laply() to provide a text progress bar as feedback—a handy feature for long-running processes and provided by all of plyr’s functions.

Now that we have our results in a data.frame, we should add columns to identify the airline, since we will later combine scores from other airlines. To create a new column in a data.frame, simply refer to it while assigning a value:

> delta.scores$airline = 'Delta'
> delta.scores$code = 'DL'

R’s built-in hist() function will create and plot a histogram of sentiment scores for our tweets about Delta:

> hist(delta.scores$score)

hist() of delta.scores

ggplot2 is an alternative graphics package by Hadley Wickham (the author of plyr) which generates much more refined output. It is based on Wilkinson’s “grammar of graphics”, building visualizations from individual layers. While ggplot2 normally requires data in a data.frame, its qplot() function provides a simplified interface, accepting vectors just like base R’s plotting functions:

> q = qplot(delta.scores$score)
> q = q + theme_bw()

histogram of delta.scores generated by ggplot2's qplot()

Thanks to R’s object orientation, the normal “+” operator is used to combine plot options and layers.

Repeat for each airline

We can similarly capture, extract, and score tweets for American Airlines, JetBlue, Southwest, United, and US Airways. R’s rbind() function can then combine all the rows into a single data.frame:

> all.scores = rbind( american.scores, delta.scores, jetblue.scores,
                        southwest.scores, united.scores, us.scores )

Compare the score distributions

Let’s use ggplot2 to take a look at the score distributions for all the airlines. Since our results are already in a single data.frame, we will eschew qplot() and build our visualization in the normal ggplot2 way: layer by layer.

First, create the graph with a call to the ggplot() function, at which time we can specify the data.frame to use and how to map its data columns to plot aesthetics (e.g., x, y, line color, fill color, symbol size, etc.). In this case, we want to plot the sentiment score along the x axis (x=score) and will differentiate each airline’s bars with a different fill color (fill=airline):

> g = ggplot(data=all.scores, mapping=aes(x=score, fill=airline) )

The geom_bar() function creates the bar graph layer itself. Since geom_bar() is often used to display histograms, by default it will automatically bin and compute frequencies for you. We haven’t computed anything, so we’re happy to let the graph layer do it for us, but we specify binwidth=1 so it doesn’t try to pick bins smaller or bigger than our simple integer scores warrant:

> g = g + geom_bar(binwidth=1)

For any given score, we will have several bars to display, one for each airline. By default, geom_bar() will stack the bars on top of one another, but that would make it very difficult to compare them. We can easily move them next to one another by specifying position="dodge", but ggplot2’s faceting feature is better yet. It will move each airline into its own separate subplot, and it couldn’t be easier to invoke:

> g = g + facet_grid(airline~.)

Finally, we ask for a cleaner display (with a plain white background) and a nice color palette from Cindy Brewer’s ColorBrewer:

> g = g + theme_bw() + scale_fill_brewer()

comparison of airline score distributions thanks to ggplot2's faceting

Looking at the score distributions, some asymmetry is evident. For example, the bars at +1 are much larger than the -1 bars for Southwest and JetBlue. But given the simplicity of our sentiment scoring algorithm, let’s focus on the extreme tails since bigger differences should be more likely to capture real differences in sentiment.

Ignore the middle

Let’s create two new boolean columns to focus on tweets with very negative (score <= -2) and very positive (score >= 2) sentiment scores:

> all.scores$very.pos.bool = all.scores$score >= 2
> all.scores$very.neg.bool = all.scores$score <= -2

> all.scores[c(1,6,47,99), c(1, 3:6)]
   score  airline code very.pos.bool very.neg.bool
1     -1 American   AA         FALSE         FALSE
6     -3 American   AA         FALSE          TRUE
47     2 American   AA          TRUE         FALSE
99    -5 American   AA         FALSE          TRUE

We want to count the occurrences of these strong sentiments for each airline. We can easily cast these TRUE/FALSE values to numeric 1/0, so we can then use sum() to count them:

> all.scores$very.pos = as.numeric( all.scores$very.pos.bool )
> all.scores$very.neg = as.numeric( all.scores$very.neg.bool )

> all.scores[c(1,6,47,99), c(1, 3:8)]
   score  airline code very.pos.bool very.neg.bool very.pos very.neg
1     -1 American   AA         FALSE         FALSE        0        0
6     -3 American   AA         FALSE          TRUE        0        1
47     2 American   AA          TRUE         FALSE        1        0
99    -5 American   AA         FALSE          TRUE        0        1

We can use plyr’s ddply() function to aggregate the rows for each airline, calling the summarise() function to create new columns containing the counts:

> twitter.df = ddply(all.scores, c('airline', 'code'), summarise,
                     very.pos.count=sum( very.pos ),
                     very.neg.count=sum( very.neg ) )

As a single, final score for each airline, let’s calculate the percentage of these “extreme” tweets which are positive:

> twitter.df$very.tot = twitter.df$very.pos.count +
                        twitter.df$very.neg.count

> twitter.df$score = round( 100 * twitter.df$very.pos.count /
                                  twitter.df$very.tot )

The orderBy() function from the doBy package makes it easy to sort the results. Note that it preserves the original row names:

> orderBy(~-score, twitter.df)
     airline code very.pos.count very.neg.count very.tot score
3    JetBlue   B6            146             28      174    84
4  Southwest   WN            207             72      279    74
1   American   AA             80             57      137    58
2      Delta   DL            152            185      337    45
6     United   UA             82            102      184    45
5 US Airways   US             38             62      100    38

Compare with ACSI’s customer satisfaction index

Now these results are from a very small sample of consumers who chose to tweet at a particular point in time. It would be neither fair nor valid to draw any firm conclusions from such a small sample. But it is tantalizing to think that such a simple measure might capture something real—especially with JetBlue and Southwest leading the legacy airlines by such a clear margin.

But rather than relying on our personal brand experience or other anecdotal evidence, let’s compare our results with the American Customer Satisfaction Index. Each year the ACSI conducts tens of thousands of interviews to measure consumer satisfaction with hundreds of companies and organizations. They kindly publish their top-level results on their web site.

Airline customer satisfaction scores published by the American Customer Satisfaction Index (ACSI)

Scrape the ACSI web site

Duncan Temple Lang’s XML package provides many useful functions for scraping and parsing data from the web. But one really shines: a single call to readHTMLTable() will download a web page from a URL, parse the HTML, extract any tables, and return a list of populated data.frames, complete with headers.

We specify which=1 to retrieve only the first table on the page, and header=T to indicate that the table headings should be used as column names:

> acsi.url = ';view=article&id=147&catid=&Itemid=212&i=Airlines'

> acsi.df = readHTMLTable(acsi.url, header=T, which=1, stringsAsFactors=F)

Contents of the acsi.df data.frame scraped from ACSI web site

Since we are only interested in the most recent results, we only need to keep the first column (containing the airline names) and the nineteenth (containing 2011’s scores):

> acsi.df = acsi.df[,c(1,19)]

Unfortunately, the headings in the original HTML table do not make very good column names, but they are easy to change:

> colnames(acsi.df)
[1] ""   "11"

> colnames(acsi.df) = c('airline', 'score')

> colnames(acsi.df)
[1] "airline" "score"

As some final clean up, add two-letter airline codes and ensure that the scores are treated as numbers:

> acsi.df$code = c('WN', NA, NA, 'CO', 'AA', 'UA', 'US', 'DL', 'NW')

> acsi.df$score = as.numeric(acsi.df$score)
Warning message:
NAs introduced by coercion

The “NAs introduced by coercion” warning message indicates that the now-defunct Northwest’s score of “#” couldn’t be translated into a number, so R changed it to NA (as in “not applicable”). R was built with real data in mind, so its support of NA values is robust and (nearly) universal.

> acsi.df
             airline score code
1          Southwest    81   WN
2         All Others    76
3           Airlines    65
4        Continental    64   CO
5           American    63   AA
6             United    61   UA
7         US Airways    61   US
8              Delta    56   DL
9 Northwest Airlines    NA   NW

Compare Twitter results with ACSI scores

In order to compare our Twitter results with the ACSI scores, let’s construct a new data.frame which contains both. The merge() function joins together two data.frames using common fields (as specified with the by parameter). Columns with different data but conflicting names (like our two “scores” columns) can be renamed according to the suffixes parameter:

> compare.df = merge(twitter.df, acsi.df, by=c('code', 'airline'),
                     suffixes=c('.twitter', '.acsi'))

compare.df data.frame compares Twitter sentiment score and its 2011 ACSI score

Unless you specify all=T, non-matching rows will be dropped by merge()—like SQL’s INNER JOIN—and that’s what happened to top-scoring JetBlue.
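To see the difference on a toy example (the codes and scores below are invented for illustration), compare merge() with and without all=T:

```r
# Two small data.frames sharing a 'code' column:
a = data.frame(code=c('DL', 'B6'), twitter=c(45, 84))
b = data.frame(code=c('DL', 'AA'), acsi=c(56, 63))

merge(a, b, by='code')          # inner join: only the DL row survives
merge(a, b, by='code', all=T)   # full outer join: B6 and AA kept, gaps become NA
```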

Graph the results

We will again use ggplot2 to display our results, this time on a simple scatter plot. We will plot our Twitter score along the x-axis (x=score.twitter), the ACSI customer satisfaction index along the y (y=score.acsi), and will use color to distinguish the airlines (color=airline):

> g = ggplot( compare.df, aes(x=score.twitter, y=score.acsi) ) +
       geom_point( aes(color=airline), size=5 ) +
       theme_bw() + opts( legend.position=c(0.2, 0.85) )

compare.df plotted with ggplot2

Like R itself, ggplot2 was built for performing analyses, so it can do a lot more than just display data. Adding a geom_smooth() layer will compute and overlay a running average of your data. But specify method="lm" and it will automatically run a linear regression and plot the best fitting model (lm() is R’s linear modeling function):

> g = g + geom_smooth(aes(group=1), se=F, method="lm")

geom_smooth() layer is added to perform a linear regression and overlay it upon the scatterplot of compare.df

Considering how little data we collected, over such a short time, and the crudeness of our sentiment scoring algorithm, this correspondence to the highly regarded ACSI survey is remarkable indeed!

Notes and Acknowledgements

R’s community is the source of much of its strength. This tutorial would not have been possible without the packages contributed by Hadley Wickham, Duncan Temple Lang, and especially Jeff Gentry.

Thanks to John Verostek for organizing the very interesting Boston Predictive Analytics Meetup Group and for providing valuable input to this presentation. Thanks also to Gary Miner for the invitation to include this tutorial in his forthcoming book.


M.A. Harrower and C.A. Brewer, 2003, “ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps,” The Cartographic Journal, 40(1): 27-37.

Minqing Hu and Bing Liu, 2004, "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.

Bing Liu, Minqing Hu and Junsheng Cheng, 2005, "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis (Use R). Dordrecht: Springer, 2009.

Hadley Wickham, 2011, “The Split-Apply-Combine Strategy for Data Analysis.” Journal of Statistical Software, 40(1), 1-29.

Leland Wilkinson. The Grammar of Graphics. Dordrecht: Springer, 1999.

Slocum et al. Thematic Cartography and Visualization. Upper Saddle River: Prentice Hall, 2008.

Posted on Oct 25, 2011.