# Match as an alternative to merge in R

One of the most common operations in data wrangling is joining two sets of data by a common variable. Probably the most popular method for this is the obscure `vlookup`

function in Excel (simply because it’s the most widely used software for data manipulaton). The closest alternative in base R is `merge`

and the dplyr package contains the join function family which is even more convenient. But there is a more simple and direct solution when only one variable needs to be added to a dataset.

Suppose we have two data frames, `df.a`

and `df.b`

, and we wish to get the values of `other.var`

from `df.b`

into `df.a`

so that each `id`

gets their “own” value. There are various methods for joining, each yielding a different result. But in my experience *left join* on a single variable is the most frequent and this is what we will explore here.

# Left join with ‘merge’

When using `merge`

, we specify the arguments of the function, run it and then through some *magic* a new dataset with requested columns is created. Note that we don’t need to specify by which variable we wish to merge if variable names are the same.

# Left join with ‘match’

A more hands-on approach involves first figuring out which rows in `df.a`

correspond to which rows in `df.b`

according to `id`

. The `match`

function allows us to do just that.

Now that we have the row numbers, we can simply return `other.var`

in `df.b`

where the matches occur. A useful side effect is that we can define the name for the new variable while matching.

Now let’s compare the results.

We can see that the result is essentially the same. What `merge`

has done is rearranged the rows which is something we might not want to happen. So I encourage the use of `match`

when possible since it allows the addition of a single column without running a function over entire data sets.

This post was originally written in *rmarkdown* and the code can be found here.