Data Wrangling Basics


Unfortunately, most of the time spent doing ‘data science’ goes to the mundane, boring, and occasionally frustrating tasks known as data wrangling.

Part 2 Goals

  1. import data and identify data inconsistencies
  2. break coding goals into small tasks
  3. document your work using a .do file (script)
  4. understand and execute routine data manipulation operations such as:
    • transformations (gen & egen)
    • reshaping (converting between wide and long; ‘melting’ & ‘casting’ in R terms)
  5. conduct basic summaries, tabulations, and visualizations on the FAD dataset
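
The goals above can be sketched as one short .do file. This is only an illustrative outline, not the workshop's actual script: it uses auto.dta (which ships with Stata) as a stand-in for the FAD data, and the variable names price_k and mean_price_origin are made up for the example.

```stata
* wrangling_sketch.do (hypothetical file name)
* Assumption: auto.dta stands in for the FAD data, which is not loaded here.
version 14
clear all
set more off

* Task 1: import the data
sysuse auto, clear

* Task 2: look for inconsistencies before changing anything
describe                     // variable names, types, and labels
codebook make, compact       // quick overview of one variable
misstable summarize          // which variables have missing values
duplicates report make       // does make uniquely identify each row?

* Task 3: transformations
generate price_k = price / 1000                      // row-wise arithmetic
egen mean_price_origin = mean(price), by(foreign)    // group-level mean

* Task 4: basic summaries, tabulations, and a plot
summarize price mpg
tabulate foreign rep78, missing
histogram mpg
```

Each commented task maps to one goal, which is the point of breaking a coding goal into small steps: every block can be run and checked on its own before moving on.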

Key Stata functions for tidying data

  • reshape: convert data between wide and long formats
  • generate: creating new variables
  • egen: extensions for generate (flexible and powerful)
  • replace: replace contents of an existing variable
  • rename: rename a variable (see also rename group for more flexibility)
  • drop: drop variables or observations from a dataset
  • keep: keep variables or observations in a dataset
  • label: manipulate variable labels or value labels
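
A minimal sketch of the commands listed above, run on a tiny made-up dataset so it works in any Stata session. The country codes, GDP values, and variable names (gdp2000, gdp_growth, gdp_mean, gdp_usd) are all invented for illustration and have nothing to do with the FAD data.

```stata
* Assumption: toy data typed in by hand; all values are made up.
clear
input str3 country gdp2000 gdp2001
"AAA" 10 12
"BBB" 20 21
end

* reshape: wide -> long. The stub gdp becomes one variable,
* and the numeric suffix (2000, 2001) is stored in year.
reshape long gdp, i(country) j(year)

* generate + replace: create a variable, then fill it in conditionally
generate gdp_growth = .
bysort country (year): replace gdp_growth = gdp - gdp[_n-1] if _n > 1

* egen: group-level statistic repeated on every row of the group
egen gdp_mean = mean(gdp), by(country)

* rename + label: make the variable self-describing
rename gdp gdp_usd
label variable gdp_usd "GDP (made-up units)"

* drop variables, keep observations
drop gdp_growth gdp_mean
keep if inlist(year, 2000, 2001)

* reshape: long -> wide, giving gdp_usd2000 and gdp_usd2001
reshape wide gdp_usd, i(country) j(year)
```

Running list after each step is an easy way to see what a command changed, which is especially useful for reshape.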

Up next…

Now that we have a tidy dataset, we will learn how to combine datasets and produce basic summary statistics and tabulations.