Renaming columns with dplyr in R



With dplyr, it's super easy to rename columns within your dataframe. This can be handy if you want to join two dataframes on a key and it's easier to rename the column than to spell out the differing key names in the join.

Alternatively, from a data munging perspective, you sometimes end up with unhelpful column names like x1, x2, x3, and cleaning these up makes your dataframes and your work more legible. This is particularly handy if you're sharing your work with others, or if you're in an environment where multiple people are working on the same data and clarity is key.

While there are numerous ways to rename columns in R, I find dplyr's approach to be one of the most intuitive. We'll take a look at it now with the UFOs dataset from Kaggle.

Using colnames() we can take a look at the existing column names:

colnames(ufos)
 [1] "datetime"             "city"                 "state"
 [4] "country"              "shape"                "duration..seconds."
 [7] "duration..hours.min." "comments"             "date.posted"
[10] "latitude"             "longitude"

We might want to add more clarity around the "comments" column, perhaps specifying that these aren't metadata comments from the analyst but an actual part of the dataset. In this instance, let's change the "comments" column to "spotter.comments".

To change the column name with dplyr, we can specify the following:

library(dplyr)  # provides rename() and the %>% pipe

ufos <- ufos %>% rename(spotter.comments = comments)

From this example, we can see that the syntax of rename() is as follows, with the new name on the left and the existing name on the right:

rename(new_name = existing_name)

And that's all there is to it! We can confirm that our change has been made by re-running colnames():

colnames(ufos)
 [1] "datetime"             "city"                 "state"
 [4] "country"              "shape"                "duration..seconds."
 [7] "duration..hours.min." "spotter.comments"     "date.posted"
[10] "latitude"             "longitude"
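If you don't have the UFOs data to hand, the same pattern can be tried on a tiny made-up dataframe. This is only a sketch: the dataframe `readings` and the name `sensor_id` are invented for illustration and are not part of the Kaggle dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for a dataframe with unhelpful names.
readings <- data.frame(
  x1 = c(1, 2),
  x2 = c("a", "b"),
  x3 = c(TRUE, FALSE)
)

# rename(new_name = existing_name): the new name goes on the left.
readings <- readings %>% rename(sensor_id = x1)

# Inspect the result: only x1 has been renamed.
colnames(readings)
```

Columns that aren't mentioned (x2 and x3 here) are left untouched, and the renamed column stays in its original position.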

What if we wanted to rename more than one column in a single statement? This is easily done too. We simply pass multiple new = existing pairs, separated by commas. In this example, we'll rename latitude and longitude to lat and long respectively:

ufos <- ufos %>% rename(lat = latitude, long = longitude)

And calling colnames() to confirm:

colnames(ufos)
 [1] "datetime"             "city"                 "state"
 [4] "country"              "shape"                "duration..seconds."
 [7] "duration..hours.min." "spotter.comments"     "date.posted"
[10] "lat"                  "long"
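When the list of renames gets long, one option is to keep the name pairs in a named character vector and hand it to rename() via all_of(). The sketch below assumes dplyr 1.0.0 or later and uses a made-up dataframe, `sightings`, rather than the real UFOs data.

```r
library(dplyr)

# Hypothetical toy data reusing two column names from the post.
sightings <- data.frame(
  latitude  = c(29.9, 53.2),
  longitude = c(-97.9, -2.9),
  shape     = c("cylinder", "circle")
)

# Named character vector: names are the new column names,
# values are the existing column names.
lookup <- c(lat = "latitude", long = "longitude")

# all_of() tells rename() to treat the vector as a set of column names.
sightings <- sightings %>% rename(all_of(lookup))

colnames(sightings)
```

This keeps the mapping in one place, which can be easier to review than a long rename() call, though for two or three columns the inline form shown above is probably clearer.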

And there we have it! As you can see, it's super easy to rename columns with dplyr.
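Coming back to the join use case from the introduction, here is a rough sketch of the two options side by side. The dataframes `customers` and `orders` and their columns are hypothetical and exist only to show the idea.

```r
library(dplyr)

# Hypothetical toy data: the key is called `id` in one table
# and `customer_id` in the other.
customers <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
orders    <- data.frame(customer_id = c(1, 1, 3), total = c(10, 25, 7))

# Option 1: rename the key so both tables share a column name, then join on it.
joined_renamed <- orders %>%
  rename(id = customer_id) %>%
  inner_join(customers, by = "id")

# Option 2: leave the names alone and spell out the key mapping in the join.
joined_by <- orders %>%
  inner_join(customers, by = c("customer_id" = "id"))
```

Both approaches match the same rows; the difference is which key name ends up in the result (`id` in the first case, `customer_id` in the second).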

Useful Resources

Selecting/renaming variables: https://www.rdocumentation.org/packages/dplyr/versions/0.7.6/topics/select

http://www.sthda.com/english/wiki/renaming-data-frame-columns-in-r#renaming-columns-with-dplyrrename
