Text Analysis

Initializing text

To start doing any type of text analysis, you need to initialize the text so R knows how to interpret it. To do this, use the InitializeText() function:

TwitterText = InitializeText(twitterwithdate$message)

As with everything else, you need to give your new data a name so it doesn’t just print out in the Console. After running this command, you should be able to see your new variable in the Workspace tab.

[Figure: Text initialized]

However, if you click on it, you’ll likely get a warning like this:

[Figure: Text preview warning]

I recommend saying “No”.

Looking at Text Data

If you want to look at text data and not get a warning like the one above, use the inspect() command together with head(), for example inspect(head(TwitterText)), which prints only the first six documents:

## A corpus with 6 text documents
## The metadata consists of 2 tag-value pairs and a data frame
## Available tags are:
##   create_date creator
## Available variables in the data frame are:
##   MetaID
## [[1]]
## Fat Jack s Erratic Rants: WARM WEATHER AND GAS SAVINGS: BUY A MOTORCYCLE: Spring is here. or at least
## [[2]]
## "Spring is here. And I am a flower. with nothing interesting to say." Who said it? :-)
## [[3]]
## Ants are starting to invade my bathroom. Guess that means spring is here. Grrrrr
## [[4]]
## Im so bored spring is here.and so far it s going bad! :/
## [[5]]
## Guess Spring is here and must figure out what type of veg. to plant and more flowers.
## [[6]]
## @Royboi pretty good. finally feeling like spring is here. feeling all randy and ready for patio drink
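
The head() half of that command is ordinary base R and works on any vector, not just initialized text. As a minimal sketch, assuming a plain character vector (the messages below are made-up stand-ins for twitterwithdate$message), head() returns the first six items unless you ask for a different number:

```r
# Hypothetical stand-in data; in the tutorial this would be twitterwithdate$message.
messages <- c("Spring is here.",
              "Ants are invading my bathroom.",
              "Im so bored",
              "What veg. to plant?",
              "Feeling ready for patio drinks",
              "Buy a motorcycle",
              "A seventh message",
              "An eighth message")

first_six   <- head(messages)     # default: the first 6 items
first_three <- head(messages, 3)  # ask for only the first 3

print(first_six)
print(first_three)
```

Wrapping head() inside inspect() applies the same idea to the initialized text, which is why the output above reports six documents.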

Processing Text

After you’ve initialized the text, you will probably want to process it, for example to convert all the words to lowercase, remove numbers, or stem words. To do that, use the function ProcessText():

GoodText = ProcessText(TwitterText)

By default, this function converts all the text to lowercase, removes punctuation, and removes numbers. It does not automatically remove whitespace, remove stopwords, or “stem” words (e.g. convert both “walked” and “walking” to “walk”). If you want to do any of those things (or if you don’t want to use the defaults), you can change the options:

GoodText = ProcessText(TwitterText, removenumbers = FALSE, removestopwords = TRUE)
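
To make these options concrete, here is a rough base-R sketch of the kind of cleaning described above. It is not the actual ProcessText() code, which this page does not show: process_text_sketch() and its stopwords argument are hypothetical names, and the tiny stopword list is only an illustration. The removenumbers and removestopwords options mirror the ones used above, with the same defaults.

```r
# Hypothetical helper, NOT the real ProcessText(): a base-R approximation
# of the default cleaning steps, working on a plain character vector.
process_text_sketch <- function(x,
                                removenumbers = TRUE,
                                removestopwords = FALSE,
                                stopwords = c("is", "and", "the", "a")) {
  x <- tolower(x)                      # convert to lowercase
  x <- gsub("[[:punct:]]", "", x)      # remove punctuation
  if (removenumbers) {
    x <- gsub("[[:digit:]]", "", x)    # remove numbers
  }
  if (removestopwords) {
    # Build a pattern like \b(is|and|the|a)\b and delete whole-word matches
    pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x                                    # whitespace is left untouched
}

default_result <- process_text_sketch("Spring is HERE, 2024!")
custom_result  <- process_text_sketch("Spring is HERE, 2024!",
                                      removenumbers = FALSE,
                                      removestopwords = TRUE)
print(default_result)  # "spring is here "
print(custom_result)   # "spring  here 2024"
```

Notice the leftover spaces in both results: like the defaults described above, this sketch does not clean up whitespace.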

Bar Plot of Words

Much like the normal barplot, the command MakeWordBar() creates a bar plot of frequently occurring words in the dataset.

MakeWordBar(GoodText)

[Figure: Word Bar]

By default, it only shows words that appeared at least 2 times, but you can change that, too.

MakeWordBar(GoodText, min.freq = 10)

[Figure: Word Bar 2]

If only a few of the word labels appear under the bars, consider rotating the labels with las=2, as described in plot options.
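
For a sense of what a word bar plot involves, here is a minimal base-R sketch, assuming already-processed text in a plain character vector. It is not the MakeWordBar() implementation; the sample texts and the variable names (texts, min_freq, frequent) are made up for illustration. It counts words, keeps those that occur at least min_freq times, and draws the bars with rotated labels.

```r
# Made-up, already-processed example texts (lowercase, no punctuation).
texts <- c("spring is here",
           "spring is here and i am a flower",
           "guess that means spring is here",
           "so bored spring is here")

# Split every text into words and count how often each word occurs.
words <- unlist(strsplit(texts, " +"))
freq  <- sort(table(words), decreasing = TRUE)

# Keep only words that occur at least min_freq times.
min_freq <- 2
frequent <- freq[freq >= min_freq]
print(frequent)

# las = 2 draws the axis labels perpendicular to the axis,
# so long or crowded word labels do not get dropped.
barplot(frequent, las = 2)
```

Raising min_freq trims the plot to the most common words, which is the same effect as the min.freq option above.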

Word Cloud

To make a word cloud, use the function MakeWordCloud().

MakeWordCloud(GoodText)

[Figure: Word Cloud]

This function has two parameters you can modify. The first is col, which should name a color range. Unlike most plots, a word cloud uses a range of colors instead of just one, so the colors are specified differently: you give the name of a palette rather than a single color. The default value is “BuGn”, but you can see the list of available sequential palettes by using display.brewer.all() from the RColorBrewer package:

display.brewer.all(type = "seq")

[Figure: Display Colors]

The other parameter, min.freq, is the same as for MakeWordBar(); it again defaults to showing only words that occurred two or more times.

MakeWordCloud(GoodText, col = "OrRd", min.freq = 10)

[Figure: Word Cloud 2]
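
To illustrate what "a range of colors" means here, the following base-R sketch maps word frequencies onto a light-to-dark range. It makes no claim about how MakeWordCloud() assigns colors internally. The frequencies are made up, and the two hex colors are illustrative orange-red endpoints chosen by hand, not values read from the “OrRd” palette.

```r
# Made-up word frequencies for illustration.
freq <- c(spring = 40, here = 35, guess = 12, flower = 10)

# colorRampPalette() returns a function that interpolates between the
# given colors; calling it with n produces n evenly spaced shades.
make_range <- colorRampPalette(c("#FEE8C8", "#E34A33"))
shades <- make_range(length(freq))

# Order the words from least to most frequent, so the most frequent
# word is paired with the last (darkest) shade.
word_colours <- setNames(shades, names(sort(freq)))
print(word_colours)
```

With a range like this, the most frequent words stand out in the darkest shade, while rarer words fade toward the light end.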