Section 5.5 Accessing and Modifying Data Frame Variables

Just one more topic to pack in before ending this chapter: how to access the stored variables in our new dataframe. R stores the dataframe as a list of vectors, and we can refer to each one by joining the name of the dataframe and the name of the vector with a "$", like this:
> myFam$myFamAge
[1] 43 42 12  8  5
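To see the list-of-vectors structure for yourself, here is a minimal sketch. It assumes the ages shown above and, for illustration only, a single companion column called myFamName filled with placeholder labels; the dataframe built earlier in the chapter has more columns than this.

```r
# Assumed setup: the ages match the output above; the names are placeholder labels.
myFamName <- c("Dad", "Mom", "Sis", "Bro", "Dog")
myFamAge <- c(43, 42, 12, 8, 5)
myFam <- data.frame(myFamName, myFamAge)

is.list(myFam)        # TRUE: a dataframe is a list underneath
names(myFam)          # the names of the component vectors
myFam$myFamAge        # one column, accessed with $
myFam[["myFamAge"]]   # the same column, using list-style access
```

Both of the last two lines return the same vector; the "$" form is simply the more convenient one to type.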
If you’re alert you might wonder why we went to the trouble of typing out that big long thing with the $ in the middle, when we could have just referred to "myFamAge" as we did earlier when we were setting up the data. Well, this is a very important point. When we created the myFam dataframe, we copied all of the information from the individual vectors that we had before into a brand new storage space. So now that we have created the myFam dataframe, myFam$myFamAge actually refers to a completely separate (but so far identical) vector of values. You can prove this to yourself very easily, and you should, by adding some data to the original vector, myFamAge:
Figure 5.5.1. A new value is added to the standalone vector myFamAge, and then both myFamAge and the copy stored in the data frame, myFam$myFamAge, are printed. The two vectors no longer hold the same values.
Look very closely at the five lines in Figure 5.5.1. In the first line, we use the c() command to add the value 11 to the original list of ages that we had stored in myFamAge (perhaps we have adopted an older cat into the family). In the second line we ask R to report what the vector myFamAge now contains. Dutifully, on the third line, R reports that myFamAge now contains the original five values and the new value of 11 on the end of the list. When we ask R to report myFam$myFamAge, however, we still have the original list of five values only. This shows that the dataframe and its component columns/vectors are now a completely independent piece of data. Once we have established a dataframe that we want to use for subsequent analysis, we must be very careful not to make the mistake of continuing to use the original data from which we assembled the dataframe.
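The experiment described in Figure 5.5.1 can be reproduced with a short sketch like the one below. The setup lines are an assumption: they rebuild a cut-down version of myFam with just the ages and a hypothetical myFamName column of placeholder labels, so that the example runs on its own.

```r
# Assumed setup: a two-column stand-in for the chapter's dataframe.
myFamName <- c("Dad", "Mom", "Sis", "Bro", "Dog")
myFamAge <- c(43, 42, 12, 8, 5)
myFam <- data.frame(myFamName, myFamAge)

# Append 11 to the standalone vector only.
myFamAge <- c(myFamAge, 11)

myFamAge          # now six values, ending in 11
myFam$myFamAge    # still the original five values

length(myFamAge)         # 6
length(myFam$myFamAge)   # 5
```

The two length() calls are just a quick way to confirm that the copy inside the dataframe did not change when the standalone vector did.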
Here’s a puzzle that follows on from this question. We have a nice dataframe with five observations and four variables. This is a rectangular shaped data set, as we discussed at the beginning of the chapter. What if we tried to add on a new piece of data on the end of one of the variables? In other words, what if we tried something like this command:
myFam$myFamAge <- c(myFam$myFamAge, 11)
If this worked, we would have a pretty weird situation: The variable in the dataframe that contained the family members’ ages would all of a sudden have one more observation than the other variables: no more perfect rectangle! Try it out and see what happens. The result helps to illuminate how R approaches situations like this.
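If you would rather run the experiment from a script than type it at the console, one possible sketch is shown below. It again assumes a cut-down, two-column version of myFam with a hypothetical myFamName column, and it wraps the assignment in tryCatch() so that the script keeps going and prints whatever R has to say, whichever way the attempt turns out.

```r
# Assumed setup: a two-column stand-in for the chapter's dataframe.
myFamName <- c("Dad", "Mom", "Sis", "Bro", "Dog")
myFamAge <- c(43, 42, 12, 8, 5)
myFam <- data.frame(myFamName, myFamAge)

# Try the assignment from the puzzle. If R raises an error, capture its
# message as text instead of stopping the script.
outcome <- tryCatch({
  myFam$myFamAge <- c(myFam$myFamAge, 11)
  "assignment succeeded"
}, error = function(e) conditionMessage(e))

print(outcome)   # what R reported about the attempt
dim(myFam)       # rows and columns of myFam after the attempt
```

Compare the output of dim(myFam) with the five observations you started with to see how R dealt with the request.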