Subsetting

subset() command

To subset data, we use the subset() command. You give it the name of the dataset you want to subset first, then the rule for subsetting. So if we want to see the part of the labike dataset where bike_count_pm is more than 300, we do this:

subset(labike, bike_count_pm > 300)

As always, this only prints the result to the Console, so if we want to save the subset for later use we need to give it a name:

LotsOfBikes = subset(labike, bike_count_pm > 300)

Now it will appear in your Workspace tab, and you can click on LotsOfBikes to view it.
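
If you want to try this without the labike data loaded, here is a minimal sketch using a small made-up data frame. The name toy_bikes and all of its values are invented for illustration; only subset() and the column name bike_count_pm come from the text above.

```r
# Hypothetical stand-in for labike; the values are made up
toy_bikes = data.frame(
  type = c("bike lane", "none", "bike path", "sharrow"),
  bike_count_pm = c(450, 120, 310, 90)
)

# Keep only the rows where bike_count_pm is above 300, and save them
LotsOfBikes = subset(toy_bikes, bike_count_pm > 300)
LotsOfBikes

# Rules can be combined with & (and) or | (or)
MidCounts = subset(toy_bikes, bike_count_pm > 100 & bike_count_pm < 400)
MidCounts
```

Inside subset() you can refer to columns by name, so there is no need to write toy_bikes$bike_count_pm in the rule.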

Subsetting with text

If you want to subset using a word in a text variable, you have to use the grepl() command along with the subset() command. grepl() is part of a family of commands that do similar things, so check out the documentation for the function if you want to learn more.

grepl("bike", labike$type)

## [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
## [12] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE
## [23] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
## [34] TRUE TRUE FALSE TRUE FALSE

This tells us whether or not each row in the type column has the word “bike” in it. But it’s just a logical vector of TRUE/FALSE values. To actually pull out the data, we need to use it with the subset() command.

subset(labike, grepl("bike", labike$type))
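
The same idea can be tried on a small made-up data frame. This is only a sketch: toy_bikes and its values are hypothetical, not taken from labike. It also shows one thing to watch for, which is that grepl() is case sensitive unless you set its ignore.case argument.

```r
# Hypothetical stand-in for labike; the values are made up
toy_bikes = data.frame(
  type = c("bike lane", "none", "Bike path", "sharrow"),
  bike_count_pm = c(450, 120, 310, 90)
)

# grepl() gives one TRUE/FALSE value per row
has_bike = grepl("bike", toy_bikes$type)
has_bike

# Matching is case sensitive by default, so "Bike path" is not kept here
subset(toy_bikes, has_bike)

# ignore.case = TRUE keeps rows with either "bike" or "Bike"
BikeRows = subset(toy_bikes, grepl("bike", type, ignore.case = TRUE))
BikeRows
```

As with the numeric example, saving the result under a name (here BikeRows) keeps it around for later use.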