In `R`, lists can be the most powerful yet most confusing objects, especially when it comes to referencing. Essentially, a list is a general-purpose container that can hold almost any type of `R` object. For example, let's say we have a list of the foods I ate for the day.

Note: whenever I talk about "referencing", I mean extracting things (usually called elements or objects) from another thing (an `R` object).

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))
```
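Before pulling anything out, it can help to look at what the list holds. This is a small sketch using base `R` functions; note that `length()` counts the containers (elements), not the individual foods.

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))

names(mylist)    # the labels of the elements: "breakfast" "lunch" "dinner"
length(mylist)   # 3: three containers, even though there are four foods
str(mylist)      # prints a compact overview of each element and its contents
```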

Conceptually, I will think of a list as a piece of tupperware. If we use this figure (source http://3.bp.blogspot.com/-5vJn-RLtceE/UB_iAOhdw3I/AAAAAAAACPU/f9D1c4tXOE4/s1600/20120805_234940_rounded_corners.jpg), then the middle piece of tupperware would be `mylist`.

Now, let's say the two smaller tupperwares are `breakfast` and `dinner`, and the two-compartment container is `lunch`. In this case, I threw the salad and dressing together. Now we have:

So now, to reference the object, I can say:

```r
mylist["breakfast"]
## $breakfast
## [1] "eggs"
mylist[1]
## $breakfast
## [1] "eggs"
class(mylist["breakfast"])
## [1] "list"
```

The first line tells `R` that I want to grab the element named `breakfast`; the second asks for the first element of the list, which returns the same thing because `breakfast` is the first element. Both return an object of class `list`. This is like opening up the `mylist` tupperware and taking out the `breakfast` tupperware: the food for breakfast is still `eggs`, but it's still inside the `breakfast` tupperware. I'm hungry, so I just want to take out the `eggs`. I would do this by:

```r
mylist$breakfast
## [1] "eggs"
mylist[["breakfast"]]
## [1] "eggs"
mylist[[1]]
## [1] "eggs"
class(mylist[["breakfast"]])
## [1] "character"
```

This tells `R` to return (aka give me) the contents of the element `breakfast`. This is like saying: I want the food in the breakfast tupperware; give me those eggs. This is the "double bracket" (`[[`) notation for referencing, and if your list has names it behaves like dollar sign (`$`) referencing, just as with a `data.frame`. As always, you can use positional referencing: `[[1]]` says I want the contents of the first list element.
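`[[` and `$` are close, but not identical. The sketch below shows two differences documented in `R`'s `?Extract` help page: `[[` evaluates its argument, so the name can be stored in a variable, and `$` partially matches names while `[[` requires an exact match by default. The variable `meal` is just an illustration.

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))

meal = "lunch"          # hypothetical variable holding an element name
mylist[[meal]]          # [[ uses the value of meal: "salad" "dressing"
mylist$meal             # $ looks for an element literally named "meal": NULL

mylist$din              # $ partially matches "dinner": "chicken"
mylist[["din"]]         # [[ needs an exact name by default: NULL
```

Because partial matching can silently grab the wrong element, `[[` is the safer choice inside functions.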

The same applies to the other meals of the day, except that `lunch` returns a vector of length 2, whereas `breakfast` and `dinner` have length 1 (only one piece of food).
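A quick way to see those lengths side by side is base `R`'s `lengths()`, and because `mylist$lunch` is an ordinary character vector, single brackets then pick individual foods out of it. A minimal sketch:

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))

lengths(mylist)     # named integer vector: breakfast = 1, lunch = 2, dinner = 1
mylist$lunch[2]     # second food in the lunch vector: "dressing"
```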

### Lists of Lists

Now, let's say I don't want my salad and dressing all together in one container, as the salad gets soggy by lunchtime. So I put my salad in its own tupperware container and the dressing in its own:

In `R`, this would be:

```r
mylist2 = list(breakfast = c("eggs"), lunch = list("salad", "dressing"), dinner = c("chicken"))
```
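The only change from `mylist` is that `lunch` is built with `list()` instead of `c()`. A small sketch comparing the two definitions makes the difference concrete: both `lunch` elements have length 2, but only one of them is a list.

```r
mylist  = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))
mylist2 = list(breakfast = c("eggs"), lunch = list("salad", "dressing"), dinner = c("chicken"))

is.list(mylist$lunch)     # FALSE: one character vector (one container, two foods)
is.list(mylist2$lunch)    # TRUE: a list of two separate containers
length(mylist2$lunch)     # 2, as before, but each food now sits in its own element
```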

Now, `breakfast` and `dinner` are contained in the same way, but `lunch` is different. Let's take out my `lunch`:

```r
mylist2["lunch"]
## $lunch
## $lunch[[1]]
## [1] "salad"
## $lunch[[2]]
## [1] "dressing"
mylist2[2]
## $lunch
## $lunch[[1]]
## [1] "salad"
## $lunch[[2]]
## [1] "dressing"
class(mylist2["lunch"])
## [1] "list"
```

OK, this returns a `list`, as before. Let's use double bracket or dollar sign (`$`) referencing:

```r
mylist2$lunch
## [[1]]
## [1] "salad"
## [[2]]
## [1] "dressing"
mylist2[["lunch"]]
## [[1]]
## [1] "salad"
## [[2]]
## [1] "dressing"
mylist2[[2]]
## [[1]]
## [1] "salad"
## [[2]]
## [1] "dressing"
class(mylist2[["lunch"]])
## [1] "list"
```

What gives? I used the `$`! Yes, this takes out the element `lunch`, but `lunch` is itself a `list`! It's kind of like those Matryoshka dolls:

(source http://en.wikipedia.org/wiki/File:Russian_Dolls.jpg)

So `mylist2` is a list of 2 vectors (`breakfast` and `dinner`) and a 2-element `list` (`lunch`). If we want the first element of `lunch`, we can run:

```r
mylist2$lunch[1]
## [[1]]
## [1] "salad"
class(mylist2$lunch[1])
## [1] "list"
mylist2$lunch[[1]]
## [1] "salad"
class(mylist2$lunch[[1]])
## [1] "character"
```

Because `mylist2$lunch` returns a `list`, we can reference its elements the same way we did with `mylist` at the beginning of the article.
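Opening the dolls one at a time just means chaining the extractions. `[[` can also take a numeric vector and apply it recursively, so `mylist2[[c(2, 2)]]` is documented in `?Extract` as equivalent to `mylist2[[2]][[2]]`. A short sketch of three ways to get the dressing:

```r
mylist2 = list(breakfast = c("eggs"), lunch = list("salad", "dressing"), dinner = c("chicken"))

mylist2[["lunch"]][[2]]   # open lunch, then take out its second container
mylist2$lunch[[2]]        # same thing, using $ for the outer level
mylist2[[c(2, 2)]]        # recursive indexing: same as mylist2[[2]][[2]]
```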

### Why lists? WHYYY?

Now, a lot of new users approach this as "lists are complicated/dumb/useless/too confusing/whatever", so I like to use this example:

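```r
# Simulate a dataset; rnorm() draws random numbers, so results vary between runs
dataset <- data.frame(outcome = rnorm(100, mean = 2), x = rep(c(0, 1), each = 50))
mod = lm(outcome ~ x, data = dataset)   # fit a linear model
smod = summary(mod)                      # summarize the fit
# In-sample mean squared error of the model
MSE = mean((dataset$outcome - predict(mod, newdata = dataset))^2)
# Bundle everything into one named list
mod.results = list(model = mod, smod = smod, data = dataset, MSE = MSE)
```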

The first element of the list is the model, the second is the summary of the model, the third is the dataset used to fit the model, and the fourth is the mean squared error (MSE) of the model. Linear models in `R` (fit using the `lm` function) have the class `lm`, but underneath they are lists of elements:

```r
names(mod)
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"
```
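Since an `lm` fit is a list, the same referencing rules apply to it. The sketch below recreates the simulated `dataset` with a hypothetical `set.seed(1)` added for reproducibility, so its numbers will not match the output shown in this post. `coef()` is the standard accessor and returns the same vector as `$coefficients`.

```r
# Hypothetical seed so the sketch is reproducible; values differ from the post
set.seed(1)
dataset = data.frame(outcome = rnorm(100, mean = 2), x = rep(c(0, 1), each = 50))
mod = lm(outcome ~ x, data = dataset)

class(mod)              # "lm"
is.list(mod)            # TRUE: the lm object is stored as a list
mod$coefficients        # named vector with "(Intercept)" and "x"
mod[["coefficients"]]   # same element via double brackets
coef(mod)               # the accessor function returns the same values
```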

Now let's say I want to get the adjusted R² and the MSE from my results:

```r
c(mod.results$smod$adj.r.squared, mod.results$MSE)
## [1] -0.005527 0.939894
```

So I can reference my large list of results, getting the summary of my model, then its adjusted R² element, and then the MSE that I created above.
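The payoff grows once you have several models. As a sketch, a list of result lists lets `sapply()` pull the same element out of every model at once. The helper `fit_and_store`, the model names, and the seed are all hypothetical, and the helper assumes the response column is called `outcome`.

```r
# Hypothetical seed so the sketch is reproducible; values differ from the post
set.seed(1)
dataset = data.frame(outcome = rnorm(100, mean = 2), x = rep(c(0, 1), each = 50))

# Hypothetical helper: fit one model and bundle its pieces into a named list
# (assumes the response column is named "outcome")
fit_and_store = function(formula, data) {
  mod = lm(formula, data = data)
  list(model = mod,
       smod = summary(mod),
       MSE = mean((data$outcome - predict(mod, newdata = data))^2))
}

# A list of result lists, one per model
all.results = list(intercept.only = fit_and_store(outcome ~ 1, dataset),
                   with.x = fit_and_store(outcome ~ x, dataset))

# Pull the same element out of every result
mses = sapply(all.results, function(res) res$MSE)
adj.r2 = sapply(all.results, function(res) res$smod$adj.r.squared)
mses
adj.r2
```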

### Label me: I want a name!

The general recommendation is to use labels (names) whenever possible. Names make it clear what you're extracting when you read the code (assuming you named things informatively; "element1" is NOT informative), they are safer, especially if you change the order of the elements, and they let you use `$` referencing.
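To see why names are safer, here is a small sketch in which the elements get reordered: the positional reference now returns a different meal, while the named one still finds breakfast. It also shows that an unnamed list can be labeled after the fact with `names()`. The objects `reordered` and `unnamed` are made up for illustration.

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))

# Hypothetical reordering, e.g. someone lists dinner first
reordered = mylist[c("dinner", "breakfast", "lunch")]
reordered[[1]]             # position 1 is now "chicken", not "eggs"
reordered[["breakfast"]]   # the name still finds "eggs"

# Labeling an unnamed list after the fact
unnamed = list("eggs", c("salad", "dressing"), "chicken")
names(unnamed) = c("breakfast", "lunch", "dinner")
unnamed$lunch              # "salad" "dressing"
```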

### Conclusion

Overall, lists are powerful, but they can be confusing when you start referencing. Single brackets (`[`) return a list, which is what you want if, say, you want `mylist` without `breakfast`:

```r
mylist[c("lunch", "dinner")]
## $lunch
## [1] "salad"    "dressing"
## $dinner
## [1] "chicken"
```

(don't skip breakfast, it's the most important meal of the day). You can use `$` or double bracket (`[[`) referencing when you want to get the *contents* of an element of a list, which may itself be a list. Complicated lists may not seem useful initially, but they can be very convenient for storing results or things of many different types that don't easily "fit together".
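Selecting by name is only one way to leave out breakfast. As a sketch in base `R`, a negative index or a logical test on the names gives the same sub-list, and assigning `NULL` through `$` (or `[[`) removes an element outright. The copy `no.breakfast` is hypothetical, so `mylist` itself is left untouched.

```r
mylist = list(breakfast = c("eggs"), lunch = c("salad", "dressing"), dinner = c("chicken"))

mylist[-1]                            # everything except the first element
mylist[names(mylist) != "breakfast"]  # same result, chosen by name

# Removing an element by assigning NULL (done on a copy)
no.breakfast = mylist
no.breakfast$breakfast = NULL
names(no.breakfast)                   # "lunch" "dinner"
```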

P.S. This is also how I think of σ-fields, but that is a whole other topic.