Tools
Stats and data quality resources
- Statistical Test Selector: UCLA’s guide to choosing the appropriate statistical test, with example code in Stata, R, SPSS, and SAS
- Bad data and how to fix them: Encyclopedia of all the things that can and do go wrong with data, with suggestions on how to fix them.
- DataBasic: Suite of web tools for beginners to work with data
Stata resources
- If you’re new to Stata, check out UCLA’s Stata Page. It has a wealth of resources to get you started.
R resources
- If you’re new to R, one of the first things you should do is install RStudio. It’ll make your life a whole lot better.
- Introduction to R functions
- swirl: A package to learn R within R.
- RStudio cheatsheets
- R for Matlab users
- Analysis and Stats in R, Python (numpy), Matlab, and Julia
- Evaluating regressions in R
Markdown and Github resources
- Markdown encyclopedia
- Git cheatsheet
- Using R with Github
- Building an automatic Github.io page
- Using Jekyll in Github Pages
- Selector Gadget to identify CSS selectors
Useful Stata packages
Stata has a number of user-written commands that are distributed through RePEc and housed at the Boston College Statistical Software Components (SSC) archive. As long as you are connected to the internet, you can download and install a package by typing ssc install estout in the Stata command window. Once the package has installed, type help estout to view the help file associated with the package. To view trending packages on SSC, type ssc whatshot, n(25) in Stata. This lists the 25 most popular packages at SSC.
Data Wrangling/Munging
- egenmore: Stata’s egen command can execute tons of useful data munging operations. If egen is not enough for you, check out Nicholas Cox’s egenmore. The package includes various egen extensions.
- reclink: probabilistically match records
- jarowinkler: calculate the Jaro-Winkler distance between strings
- carryforward: carry forward/backward previous observations
- valuesof: display the values of a variable joined together in a single string, and return that string in r(values).
- use13: load datasets created with Stata 13 in Stata 10-12.
- usespss: load SPSS files
- usesas: load SAS files
- insheetjson: import tabular data from JSON sources on the web
- shp2dta: converts shape boundary files (shapefiles) to Stata datasets
- winsor2: winsorize a varlist
- trimmean: trimmed means as descriptive or inferential statistics
- nearmrg: provide nearest-match merging of datasets
- tscollap:
- mdesc: tabulate prevalence of missing values
- mkcorr: generate correlation table formatted for easy inclusion in articles
- sxpose: transpose a dataset of string variables
- fs: show names of files in a compact form
- confirmdir: confirm if a directory exists
- extremes: list extreme values of a variable
- nsplit: split numeric variables into components
- kountry: standardize country names across datasets
Useful R packages
R has a long list of libraries that extend the functionality of base R and make it easier to use. Here’s a running list of packages that we find particularly helpful, broken down by category. Core libraries are indicated with an asterisk and are particularly recommended for all users.
To install any of the packages, use install.packages("<package name>"), as in install.packages("ggplot2"). Most of the packages listed here can be found on CRAN, R’s central package repository.
A lazy way to install and load all these packages
- Laura is in the process of creating an R library called llamar to make it easier to load the most useful libraries at once (and also to create some custom plotting themes and functions). It’s under development now, so apologies for any lack of documentation and/or anything that breaks in the future.
- To load the packages listed here, copy this code into R:
install.packages("devtools")                 # one-time install; provides install_github()
devtools::install_github("flaneuse/llamar")  # install llamar from Github
library(llamar)
loadPkgs()                                   # load the packages listed on this page
- If you have any comments, feel free to email us.
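If you would rather not depend on a package that is still under development, the base-R sketch below does the same basic job: it installs only the packages that are missing and then loads all of them. The helper names missing_pkgs() and install_and_load() are hypothetical (they are not part of llamar or any package on this page), and the package vector in the example call is just a sample from the lists below.

```r
# Hypothetical helpers (not part of llamar or any listed package):
# install whatever is missing from CRAN, then load everything.

# Return the packages in `pkgs` that are not installed in any library on .libPaths()
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install missing packages from CRAN, then attach all of them
install_and_load <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Example call with a few of the CRAN packages listed below:
# install_and_load(c("dplyr", "tidyr", "ggplot2", "broom"))
```

Checking what is missing first avoids re-downloading packages you already have every time the script runs.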
Data Wrangling
- *dplyr: filter, create new variables, summarise, and more. Basically, anything you can think of doing to a dataset
- *tidyr: reshape datasets (e.g., wide to long and back) and split or combine columns
- data.table: similar to dplyr, but faster on large datasets, with some extra functionality
- stringr: string manipulation
- lubridate: better way to work with dates
- zoo: running averages, amongst other things
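As a rough illustration of how the two core wrangling packages fit together, the sketch below reshapes a small made-up data frame from wide to long with tidyr, then filters, groups, and summarises it with dplyr. The data and column names are invented for the example, and the code assumes tidyr 1.0 or later (for pivot_longer()) and dplyr 1.0 or later (for the .groups argument).

```r
library(dplyr)
library(tidyr)

# Made-up example data: one row per district, one column per year
wide <- data.frame(
  district = c("A", "B", "C"),
  y2014 = c(10, 20, 30),
  y2015 = c(12, 22, 33)
)

# tidyr: reshape wide -> long (one row per district-year)
long <- wide %>%
  pivot_longer(cols = c(y2014, y2015), names_to = "year", values_to = "value") %>%
  mutate(year = as.integer(sub("^y", "", year)))  # dplyr: turn "y2014" into 2014

# dplyr: drop district C, then average the remaining districts by year
yearly <- long %>%
  filter(district != "C") %>%
  group_by(year) %>%
  summarise(mean_value = mean(value), .groups = "drop")

print(yearly)
```

Each step returns a new data frame, so the pipeline can be read top to bottom as a sequence of operations on the data.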
Visualization and Interactive plots
- **ggplot2: Hadley Wickham’s incredibly powerful plotting library built off of the Grammar of Graphics. So useful and well-designed it gets two asterisks.
- ggplot2 extension packages: Running list of extensions to ggplot2
- ggrepel: extends ggplot2 to avoid overlapping text
- ggvis: data visualization package that enables interactive graphics
- d3heatmap: creates D3-based heatmaps in R
- htmlwidgets: framework behind a suite of packages that bring JavaScript visualization libraries into R
- metricsgraphics: creates interactive plots based on the MetricsGraphics.js / D3 chart library
- rCharts: creates interactive plots based on several javascript charting libraries
- DiagrammeR: creates graph diagrams using a Markdown-like syntax
- packcircles: creates non-overlapping packed circles
- waffle: creates isotype graphs (a single object repeated N times)
Geospatial analysis and mapping
- ggmap: geocoding and geospatial library
- leaflet: R wrapper to embed dynamic maps using leaflet.js
- choroplethr, choroplethrAdmin1: easy way to create choropleths (heatmaps for a map) at the Admin 0- (country) and Admin 1-level (states/provinces)
- RgoogleMaps: overlays plots on a Google map
Interactivity
- shiny: easy way to create custom, interactive web applications in R
- shinydashboard: uses Shiny to create customized dashboards
- shinythemes: customize appearance of Shiny apps
Reporting, publication, and custom appearance
- *knitr: engine that runs the R code in RMarkdown documents and weaves the results into the output
- formattable: better tables for RMarkdown documents
- animation: make GIFs in R
- RColorBrewer: imports Cynthia Brewer’s excellent color palettes as R objects
- extrafont: allows you to use a font other than Helvetica in plots
Importing files
- haven: imports files from Stata, SAS, and SPSS
- foreign: an alternative to haven for importing from Stata, SAS, and SPSS. Can’t read files saved by Stata 13 or later.
- readr: a faster alternative to the base read.csv function, with some added functionality.
- readxl: imports data from one or more sheets of an Excel workbook
- googlesheets: connects to Google Drive spreadsheets.
- rvest: scrapes websites
- pdftools: scrapes .pdf files
- jsonlite: converts between JSON objects and R ones
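To show what "converting between JSON objects and R ones" looks like in practice, here is a minimal jsonlite round trip on a small made-up data frame: toJSON() encodes it (row-wise by default) and fromJSON() simplifies the resulting array of records back into a data frame.

```r
library(jsonlite)

# A small made-up data frame
df <- data.frame(site = c("a", "b"), value = c(1.5, 2.5))

# R -> JSON: by default a data frame becomes an array of row objects
json <- toJSON(df)
print(json)

# JSON -> R: an array of records is simplified back into a data frame
back <- fromJSON(json)
str(back)
```

fromJSON() also accepts a URL or file path in place of a JSON string, which is handy for pulling data from web APIs.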
Developer libraries
- *devtools: makes writing and releasing R packages a breeze. For casual users, it allows you to install packages directly from Github using install_github.
- roxygen2: generates documentation for functions and packages from specially formatted comments
- testthat: unit-testing framework for package development
- microbenchmark: timing function to profile how long functions take to execute
- profvis: allows visual profiling of function timing to optimize performance
Fitting libraries
- *broom: converts results from many common fitted models into neat, organized data frames
- MASS
- sandwich
- lmtest
- plm
- ggalt
- coefplot
- cluster
- GWmodel
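A short sketch of how a few of these libraries combine: fit an ordinary linear model on R’s built-in mtcars data, tidy it with broom, and compute heteroskedasticity-robust standard errors with sandwich and lmtest. The model formula is arbitrary and only for illustration. The HC1 estimator is the small-sample correction that Stata’s ", robust" option applies to OLS.

```r
library(broom)
library(sandwich)
library(lmtest)

# Fit a linear model on R's built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- tidy(fit)          # one row per term: estimate, std.error, statistic, p.value
model_stats <- glance(fit)  # one row of model-level statistics (r.squared, AIC, ...)
obs <- augment(fit)         # the model data plus .fitted, .resid, and other diagnostics

# Heteroskedasticity-robust (HC1) standard errors, then tidied the same way
robust <- coeftest(fit, vcov = vcovHC(fit, type = "HC1"))
robust_coefs <- tidy(robust)

print(coefs)
print(robust_coefs)
```

Because every result is a plain data frame, the tidied output can go straight into dplyr, ggplot2 (e.g., coefficient plots), or a formatted table.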
Misc.
- swirl: A package to learn R within R.