Differences

This shows you the differences between two versions of the page.

rstudio:text [2014/02/12 16:38]
james
rstudio:text [2016/05/13 20:45] (current)
Line 55: Line 55:
 </code>

-By default, this function converts all the text to lowercase, removes punctionation, and removes numbers. It does not automatically remove whitespace, remove stopwords, or “stem” words (e.g. convert both “walked” and “walking” to “walk”). If you want to do any of those things (or if you don’t want to use the defaults), you can change the options,
+By default, this function converts all the text to lowercase, removes punctuation, and removes numbers. It does not automatically remove whitespace, remove stopwords (e.g. words like 'the', 'and', 'there'), or “stem” words (e.g. convert both “walked” and “walking” to “walk”). If you want to do any of those things (or if you don’t want to use the defaults), you can change the options,

 <code r>
-GoodText = ProcessText(TwitterText, removenumbers = FALSE, removestopwords = TRUE)
+GoodText = ProcessText(TwitterText, stopwords.list = stopwords("SMART"), removestopwords = TRUE, removenumbers = FALSE)
 </code>
+
+Note that there are standard stopword lists. The ProcessText() function uses the "EN" list by default; the "SMART" list is a more extensive alternative. To choose between them, set the stopwords.list argument to stopwords("SMART") or stopwords("EN").
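
To make these cleaning steps concrete, here is a rough base-R sketch of what they do to a single string. It is not the implementation of ProcessText(): the helper name ''clean_text'', its arguments, and the two-word stopword vector are all made up for illustration.

```r
# Hypothetical helper (not part of the tutorial's functions): mimics the
# cleaning steps described above using base R only.
clean_text <- function(x, remove.numbers = TRUE, stop.words = character(0)) {
  x <- tolower(x)                                # convert to lowercase
  x <- gsub("[[:punct:]]", "", x)                # remove punctuation
  if (remove.numbers) {
    x <- gsub("[[:digit:]]", "", x)              # remove numbers
  }
  words <- unlist(strsplit(x, "[[:space:]]+"))   # split on runs of whitespace
  words <- words[words != ""]                    # drop empty strings
  words[!(words %in% stop.words)]                # drop stopwords, if any were given
}

# Made-up example sentence and stopword list.
example.words <- clean_text("The 3 dogs walked, and the cat ran!",
                            stop.words = c("the", "and"))
print(example.words)
```

Passing ''remove.numbers = FALSE'' keeps the digits, which roughly mirrors the effect of ''removenumbers = FALSE'' above.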
  
 ==== Bar Plot of Words ====
Line 67: Line 69:
 MakeWordBar(GoodText)
 </code>
+{{ :rstudio:wordbarchart.jpeg?direct&300 |}}
+

-{{ :rstudio:wordbar.jpg?direct&300 |Word Bar}}
 By default, it only shows words that appeared at least 2 times, but you can change that, too.

 <code r>
-MakeWordBar(GoodText, min.freq = 10)
+MakeWordBar(GoodText, min.freq = 25)
 </code>
+{{ :rstudio:wordbarchart25.jpeg?direct&300 |}}
  
-{{ :rstudio:wordbar2.jpg?direct&300 |Word Bar 2}}
+Finally, if you want to look at the top 5 words (by total count), you can use
+<code r>
+MakeWordBar(GoodText, top=5, format='count')
+</code>
+{{ :rstudio:wordbarchart5c.jpeg?direct&300 |}}

-Since only a few words are appearing, you might want to consider rotating them with ''las=2'', as described in [[rstudio:plots#Changing_things_about_plots| plot options]].
+or you can look at the top 5% of words by using
+<code r>
+MakeWordBar(GoodText, top=5, format='count')
+</code>
+{{ :rstudio:wordbarchart5p.jpeg?direct&300 |}}
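
If it helps to see what these filters amount to, the sketch below does the same kind of selection on a plain word vector with base R. It is only an illustration: the ''words'' vector is made up, and reading "top 5%" as a percentage of the distinct words (rounded up) is an assumption, not a description of how MakeWordBar() works internally. The sketch uses 50% so that the tiny example still returns something.

```r
# Made-up word vector standing in for processed text.
words <- c("data", "data", "data", "plot", "plot",
           "word", "cloud", "cloud", "cloud", "cloud")

# Count each word and sort from most to least frequent.
counts <- sort(table(words), decreasing = TRUE)

# Similar in spirit to min.freq: keep words that appear at least min.freq times.
min.freq <- 2
frequent <- counts[counts >= min.freq]

# Similar in spirit to a top-n filter by count: keep the 2 most frequent words.
top.n <- counts[seq_len(2)]

# Assumed meaning of a percentage filter: keep the top 50% of distinct words,
# rounding the number of words up.
top.pct <- counts[seq_len(ceiling(length(counts) * 50 / 100))]

print(names(frequent))
print(names(top.n))
print(names(top.pct))
```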
  
 ==== Word Cloud ====
Line 85: Line 97:
 MakeWordCloud(GoodText)
 </code>
+{{ :rstudio:wordcloud.jpeg?direct&300 |}}

-{{ :rstudio:wordcloud1.jpg?direct&300 |Word Cloud}}
  
-This function has two parameters you can modify. The first is color, which should be a specification for a color range. Unlike most plots, we want to have a range of colors instead of just one, so we need to specify the colors different. The default value is "BuGn", but you can see the list by using ''display.brewer.all()'',
+This function has a few parameters you can modify. The first is color, which should be a specification for a color range. Unlike most plots, we want to have a range of colors instead of just one, so we need to specify the colors differently. The default value is "BuGn", but you can see the list by using ''display.brewer.all()'',
 <code r>
 display.brewer.all(type = "seq")
Line 95: Line 107:
 {{ :rstudio:displaycolors.jpg?direct&300 |Display Colors}}
  
-The other parameter is the same as for the ''MakeWordBar()'', ''min.freq'', which again defaults to only showing words that occurred two or more times.
+Another parameter, ''min.freq'', works the same way as in ''MakeWordBar()'' and again defaults to showing only words that occurred two or more times.
 <code r>
-MakeWordCloud(GoodText, col = "OrRd", min.freq = 10)
+MakeWordCloud(GoodText, color = "OrRd", min.freq = 10)
 </code>

-{{ :rstudio:wordcloud2.jpg?direct&300 |Word Cloud 2}}
+{{ :rstudio:wordcloudorrd.jpeg?direct&300 |}}
+
+And just like with the MakeWordBar() function, we can display only the top 5 words (by count)
+<code r>
+MakeWordCloud(GoodText, top = 5, format = "count")
+</code>
+{{ :rstudio:wordcloud5c.jpeg?direct&300 |}}
+
+or you can display the top 5% of words
+<code r>
+MakeWordCloud(GoodText, top = 5, format = "count")
+</code>
+{{ :rstudio:wordcloud5p.jpeg?direct&300 |}}