binscatter

A Stata program to generate binned scatterplots.

by Michael Stepner

binscatter is a Stata program which generates binned scatterplots. These are a convenient way of observing the relationship between two variables, or visualizing OLS regressions. They are especially useful when working with large datasets.

Installation

Open Stata and install binscatter from the SSC repository by running the command:

ssc install binscatter

After installing binscatter, you can read the documentation by running help binscatter. The Examples section of the help file contains a clickable walk-through of binscatter's various features.

Introductory Slide Deck

This slide deck provides a thorough introduction to binscatter.

You can also download the source files, which include the Stata code to generate every figure shown in the slide deck.

Technical Description

Binned scatterplots are a non-parametric method of plotting the conditional expectation function (which describes the average y-value for each x-value).

To generate a binned scatterplot, binscatter groups the x-axis variable into equal-sized bins, computes the mean of the x-axis and y-axis variables within each bin, then creates a scatterplot of these data points. By default, binscatter also plots a linear fit line using OLS, which represents the best linear approximation to the conditional expectation function.
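The steps above can be reproduced by hand, which may help clarify what binscatter computes. The following is a minimal sketch rather than binscatter's actual implementation. It assumes Stata's bundled nlsw88 example dataset, and the variables (wage, tenure), the new variable name tenure_bin, and the choice of 20 bins are picked purely for illustration. The fit line here is estimated on the unbinned data before collapsing.

```stata
* Illustrative sketch only: rebuilds a binned scatterplot step by step.
* Dataset, variables, and bin count are assumptions made for this example.
sysuse nlsw88, clear
keep if !missing(wage, tenure)

* Estimate the OLS fit line on the underlying (unbinned) data
regress wage tenure
local slope = _b[tenure]
local cons  = _b[_cons]

* Group the x-axis variable into 20 quantile bins
xtile tenure_bin = tenure, nquantiles(20)

* Compute the mean of the x-axis and y-axis variables within each bin
collapse (mean) wage tenure, by(tenure_bin)

* Scatter the bin means and overlay the fit line
twoway (scatter wage tenure) ///
       (function y = `cons' + `slope'*x, range(tenure))
```

Running `binscatter wage tenure` on the same data should produce a comparable plot in a single command.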

binscatter provides built-in options to control for covariates before plotting the relationship, and can automatically plot regression discontinuities. All procedures in binscatter are optimized for speed in large datasets.
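As a rough illustration of these options, the sketch below uses binscatter's controls() and rd() options on Stata's bundled nlsw88 example dataset. The choice of variables, the covariates age and grade, and the cutoff value of 10 are all assumptions made for this example and have no substantive meaning.

```stata
* Illustrative sketch only: dataset, covariates, and cutoff are arbitrary.
sysuse nlsw88, clear

* Control for covariates before plotting wage against tenure
binscatter wage tenure, controls(age grade)

* Plot a regression discontinuity at an arbitrary cutoff of tenure = 10
binscatter wage tenure, rd(10)
```

See `help binscatter` for the full list of options and their exact behavior.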

Example

The following graph shows the relationship between quality of teaching in elementary or middle school and a student's earnings at age 28.

[Figure: Binscatter example]

This graph is a visual representation of a multivariate regression with 650,965 observations. The regression finds that, after controlling for a number of characteristics that affect student achievement (such as class size and parental income), a 1-unit increase in Normalized Teacher Value Added is associated with a $350 increase in Earnings at Age 28.

The graph was created using binscatter:

Source of Example: Raj Chetty, John N. Friedman, and Jonah E. Rockoff. 2013. "Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood." NBER Working Paper 19424. http://obs.rc.fas.harvard.edu/chetty/w19424.pdf

The graph above is Figure 2a in the paper, and the associated regression is reported in Table 3, Column 1.

Acknowledgements

binscatter is based on a program first written by Jessica Laird.

This program was developed under the guidance and direction of Raj Chetty and John Friedman. Laszlo Sandor provided suggestions that improved the program considerably and offered abundant help in testing it.

Thank you also to the users of early versions of the program who devoted time to reporting the bugs that they encountered.