3 Week 3
4 Data wrangling
R and RStudio make data cleaning fast and reproducible. This week's module is all about the dplyr package, which has lots of functions that make data wrangling fun! There are 8 videos to watch, each 10 to 15 minutes long, so you can dip in and out across the week. The idea is for you to code along with the videos, stopping and starting and troubleshooting as you go. Try the exercises Danielle recommends AND, to see whether you are really getting it, have a go at the SELF TEST challenge.
For this reason, allow about 3 hours for your coding this week.
4.1 Resources
4.2 Learning outcomes
By the end of this module you will be able to…
- Quickly clean and rename() variables
- Use filter() to look at subsets of your dataframe, and arrange() to order it by a given variable
- Use select() to make your dataframe smaller
- Use mutate() to create new variables
- Join dataframes by row and by common columns
- Convert data from wide to long and back again using pivot functions
4.3 dplyr, dance with data
4.3.1 dealing with variable names
In this video, Danielle introduces the Small World of Words dataset and shows you how to deal with variable names.
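As a rough sketch of renaming (not the code from the video), the example below uses a small invented tibble whose column names and values are hypothetical, not taken from the Small World of Words data. rename() takes new_name = old_name pairs, and awkward names containing spaces have to be wrapped in backticks.

```r
library(dplyr)

# Hypothetical toy data: names and values are invented for illustration only
raw <- tibble(
  `Participant ID` = c(1, 2, 3),
  `Cue Word` = c("cat", "sun", "tree"),
  `Response 1` = c("dog", "moon", "leaf")
)

# rename() uses new_name = old_name; backticks handle names with spaces
clean <- raw %>%
  rename(id = `Participant ID`, cue = `Cue Word`, response = `Response 1`)

names(clean)
```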
4.3.2 subset using filter()
In this video, Danielle shows how to subset your dataset using the filter() function.
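A minimal sketch of filter(), assuming a small hypothetical tibble of cue/response pairs (invented data, not the course dataset). filter() keeps the rows for which a condition is TRUE, and conditions separated by commas must all hold.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
words <- tibble(
  id = c(1, 1, 2, 2),
  cue = c("cat", "sun", "cat", "tree"),
  response = c("dog", "moon", "mouse", "leaf")
)

# Keep only the rows where the cue is "cat"
cat_rows <- words %>% filter(cue == "cat")

# Comma-separated conditions are combined with logical AND
cat_p2 <- words %>% filter(cue == "cat", id == 2)
```

Note the double equals sign: == tests for equality, while a single = inside filter() is an error.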
4.3.3 sort using arrange()
In this video, Danielle shows you how to sort your data using the arrange() function.
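A short sketch of arrange(), using a hypothetical table of reaction times (the rt column and its values are invented). Sorting is ascending by default; wrapping a column in desc() reverses the order.

```r
library(dplyr)

# Hypothetical reaction-time data, invented for illustration
scores <- tibble(
  id = c(1, 2, 3, 4),
  rt = c(520, 430, 610, 480)
)

# arrange() sorts in ascending order by default
fastest_first <- scores %>% arrange(rt)

# desc() sorts in descending order instead
slowest_first <- scores %>% arrange(desc(rt))
```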
4.3.4 make your data smaller using select()
In this video, Danielle explains the select() function for selecting columns in a dataset.
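A minimal sketch of select() on a hypothetical four-column tibble (invented column names and values). Naming columns keeps them in the order given; a minus sign drops a column instead.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
words <- tibble(
  id = c(1, 2),
  cue = c("cat", "sun"),
  response = c("dog", "moon"),
  rt = c(520, 430)
)

# Keep only the named columns, in the order listed
small <- words %>% select(cue, response)

# A minus sign drops a column and keeps everything else
no_rt <- words %>% select(-rt)
```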
4.3.5 make new variables with mutate()
In this video, Danielle talks about how to make new variables using the mutate() function, and takes a weird digression into measurement issues in psychology.
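A rough sketch of mutate(), assuming hypothetical reaction times in milliseconds (rt_ms, rt_s and slow are invented names). Each argument creates a new column, and a column created earlier in the same call can be used by later ones.

```r
library(dplyr)

# Hypothetical reaction times in milliseconds, invented for illustration
scores <- tibble(
  id = c(1, 2, 3),
  rt_ms = c(520, 430, 610)
)

# Convert to seconds, then flag responses slower than half a second;
# rt_s is created first, so slow can refer to it
scores <- scores %>%
  mutate(
    rt_s = rt_ms / 1000,
    slow = rt_s > 0.5
  )
```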
4.3.6 join data using bind_rows()
In this video, Danielle provides an introduction to the bind_rows() function in dplyr.
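A minimal sketch of bind_rows(), assuming two hypothetical batches of data that share the same columns (batch names and values are invented). Rows are stacked and columns are matched by name; the optional .id argument adds a column recording which named input each row came from.

```r
library(dplyr)

# Two hypothetical batches with the same columns, invented for illustration
batch1 <- tibble(id = c(1, 2), cue = c("cat", "sun"))
batch2 <- tibble(id = c(3), cue = c("tree"))

# Stack the rows; .id = "source" labels each row with its input's name
combined <- bind_rows(batch1 = batch1, batch2 = batch2, .id = "source")
```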
4.3.7 make wide data long with pivot functions
In this video, Danielle gives an unnecessarily long ramble about pivot_longer() and pivot_wider(), in which it is obvious that the lecturer is very tired, but she also manages to talk a little more about the pipe, order of operations, and other neat things.
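A sketch of a round trip between wide and long formats, using a hypothetical table with one row per participant and one column per condition (the easy/hard columns and values are invented). Note that pivot_longer() and pivot_wider() come from the tidyr package rather than dplyr, so tidyr needs to be loaded too.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per participant, one column per condition
wide <- tibble(
  id = c(1, 2),
  easy = c(450, 500),
  hard = c(620, 700)
)

# Wide to long: condition names go into one column, values into another
long <- wide %>%
  pivot_longer(cols = c(easy, hard), names_to = "condition", values_to = "rt")

# Long to wide: spread the conditions back out into separate columns
wide_again <- long %>%
  pivot_wider(names_from = condition, values_from = rt)
```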
4.3.8 joining data with left_join()
In this video, Danielle gives a brief wrap-up of the dplyr series, in which left_join() is briefly discussed. She also gets grumpy about sexism in language, and provides pointers to other resources.
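A minimal sketch of left_join(), assuming two hypothetical tables that share an id column (names and values are invented). Every row of the left table is kept; matching columns from the right table are added, and rows with no match get NA.

```r
library(dplyr)

# Hypothetical responses and participant details, invented for illustration
responses <- tibble(
  id = c(1, 2, 3),
  response = c("dog", "moon", "leaf")
)
participants <- tibble(
  id = c(1, 2),
  age = c(21, 34)
)

# Keep all rows of responses; id 3 has no match, so its age is NA
joined <- responses %>% left_join(participants, by = "id")
```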