Part 2
Data Wrangling Basics
Introduction
Unfortunately, most of the time spent doing ‘data science’ goes to the mundane, boring, and occasionally frustrating tasks known as data wrangling.
Part 2 Goals
- import data and identify data inconsistencies
- break coding goals into small tasks
- document your work using a .do file (script)
- understand and execute routine data manipulation operations such as:
  - transformations (gen & egen)
  - reshaping (melting & casting)
- conduct basic summaries, tabulations and visualizations on the FAD data set
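As a rough illustration of the first three goals, the sketch below lays out a small .do file that splits the work into commented tasks. It is only a sketch under stated assumptions: the file name is hypothetical, and the built-in auto dataset (shipped with Stata) stands in for the FAD data, whose file name and variables are not given here.

```stata
* wrangle_sketch.do  (hypothetical file name, for illustration only)
* Purpose: load a dataset and check it for inconsistencies
version 14
clear all
set more off

* Task 1: load the data
* (auto.dta ships with Stata; it is a stand-in for the course data)
sysuse auto, clear

* Task 2: inspect structure and variable types
describe
codebook make rep78, compact

* Task 3: look for missing values and duplicate records
misstable summarize
duplicates report make

* Task 4: state assumptions explicitly; the do-file stops if one fails
isid make
assert price > 0 & !missing(price)
```

Writing each check as its own commented task keeps the script readable and makes it easy to rerun from the top when the raw data change.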
Key Stata functions for tidying data
reshape
: converting data between wide and long datasets
generate
: creating new variables
egen
: extensions for generate (flexible and powerful)
replace
: replace contents of an existing variable
rename
: rename a variable (see also rename group for more flexibility)
drop
: drop variables or observations from a data frame
keep
: keep variables or observations from a data frame
label
: manipulate variable labels or value labels
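The commands listed above can be sketched together on a tiny made-up dataset. Everything in this example is an assumption for illustration: the variable names (id, pop2000, pop2010), the values, and the use of -99 as a missing-value code are invented and do not come from the FAD data.

```stata
* Toy data in wide form: one row per region, one column per year
clear
input str1 id pop2000 pop2010
"A" 10 12
"B" 20 -99
"C" 30 33
end

* rename: give the identifier a more descriptive name
rename id region

* reshape: wide to long, giving one row per region-year
reshape long pop, i(region) j(year)

* replace: recode the (assumed) sentinel value -99 to missing
replace pop = . if pop == -99

* generate: create a new variable from an existing one
generate log_pop = ln(pop)

* egen: group-wise summary, here the mean of pop within each region
egen mean_pop = mean(pop), by(region)

* label: attach a descriptive variable label
label variable pop "Population (hypothetical units)"

* drop: remove a variable; keep: retain only selected observations
drop log_pop
keep if year == 2010

list
```

Note that drop and keep work on both variables (drop log_pop) and observations (keep if year == 2010), which is why the same two commands cover column and row selection.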
Up next…
Now that we have a tidy dataset, we will learn how to combine data and produce basic summary statistics and tabulations.