Merging multiple dataframes in R

Merging multiple dataframes in R

Image for postPhoto by Laurenz Kleinheider on Unsplash

R has lots of handy functionality for merging and appending multiple dataframes. In particular, I?d like to cover the use case of when you have multiple dataframes with the same columns that you wish to merge together.

As an example, let?s break the UFOs dataset from Kaggle into multiple pieces and see how we can easily fit it back together.

To split the data, we?ll use the following function devised by kcha:

ufos <- read.csv(“scrubbed.csv”)splitDf <- split_k(ufos,4)

We split the data into four dataframes, ufo1, ufo2, ufo3 and ufo4.

ufo1 <- splitDf$`1`ufo2 <- splitDf$`2`ufo3 <- splitDf$`3`ufo4 <- splitDf$`4`

Now, if we wanted to merge just the first two dataframes back together, we could use rbind, as follows:

ufo1And2 <- rbind(ufo1,ufo2)

As we can confirm by using NROW that the data has been successfully merged:

> NROW(ufo1)[1] 20083> NROW(ufo2)[1] 20083> NROW(ufo1And2)[1] 40166

However, as we have split our data into four parts, it?s not inconceivable to think that we?ll want to merge four dataframes. Rbind as is only accepts two dataframes, so we have to adjust our code slightly to accommodate for more dataframes.

Merging more than two dataframes

To merge more than two dataframes within R, we can combine our rbind call with do.call.

do.call(“rbind”, list(dataframes to merge))

How this works:

The do.call function allows us to use any existing R functions, such as the aforementioned rbind, but instead of passing just two dataframes in this instance, we are able to pass multiple.

Therefore, when used alongside rbind, we can use do.call to merge more than two dataframes at once.

Putting this all together, we can merge our UFO dataframes back together as follows:

ufoMerged <- do.call(“rbind”, list(ufo1, ufo2, ufo3, ufo4))

We can confirm our data has been successfully merged by checking the number of rows:

> NROW(ufoMerged)[1] 80332> NROW(ufos)[1] 80332

And by performing an anti_join using dplyr, which returns null:

library(dplyr)ufoChecker <- anti_join(ufoMerged, ufos)ufoChecker [1] datetime city state country [5] shape duration..seconds. duration..hours.min. comments [9] date.posted latitude longitude <0 rows> (or 0-length row.names)

This is just one method for merging multiple dataframes, and over at R-Bloggers you can find speed comparisons for alternative methods. However, I find do.call in combination with rbind one of the most intuitive methods for joining multiple dataframes.

Useful Resources

The {do.call} function

allframes = sapply(1:20,function(x)read.csv(paste(x,’csv’,sep=’.’))) > head(allframes) [,1] [,2] [,3] [,4] [,5] [,6]?

www.stat.berkeley.edu

Get Best Software Deals Directly In Your Inbox

Image for post

16