Monday, November 6, 2017

Detailed summary statistics in Stata

The command for detailed summary statistics in Stata is summarize var1, detail, with var1 replaced by your variable name (or, as you'll see below, by more than one variable name).

Let's see this code in action. Copy and paste the following code into your Stata command window:

webuse census13
summarize brate, detail




Here, we've loaded census13, one of Stata's example datasets (webuse downloads it from Stata's website), and asked for detailed summary statistics on the birth rate variable, brate.
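Besides printing a table, summarize, detail saves its statistics as stored results in r(), which is handy if you want to reuse a value such as the median in a later command. A minimal sketch; the clear option on webuse is our addition and simply discards any data already in memory:

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* Detailed statistics for birth rate; the results are also saved in r()
summarize brate, detail

* List every stored result (r(N), r(mean), r(p50), r(skewness), ...)
return list

* Use individual stored results, e.g. the median and the skewness
display "Median birth rate: " r(p50)
display "Skewness of birth rate: " r(skewness)
```

Note that r() only holds the results of the most recent r-class command, so display the values you need before running another summarize.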




If you need detailed summary statistics for more than one variable, simply list all of the variables after the summarize command and before the comma:

summarize brate pop, detail



With this command, you get detailed descriptive statistics on two variables at the same time.

Let's say you wanted detailed summary statistics on birth rate and population by region. You would use the following code:

by region, sort: summarize brate pop, detail




As you can see, this approach gives you detailed summary statistics computed separately for each region. (The sort option sorts the data by region first, which by requires.)


Convinced of our expertise? Let 272Analytics assist with data analysis and/or methodology for your quantitative thesis or dissertation.

Basic summary statistics in Stata

The command for basic summary statistics in Stata is summarize. Let's see this code in action. Copy and paste the following code into your Stata command window:

webuse census13
summarize brate




Here, we've loaded census13, one of Stata's example datasets, and asked for basic summary statistics on the birth rate variable, brate.



If you need basic summary statistics for more than one variable, simply list all of the variables after the summarize command:

summarize brate pop





With this command, you get basic summary statistics on two variables at the same time.
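If you want a quick overview of the whole dataset rather than a few chosen variables, you can leave out the variable list entirely. A minimal sketch:

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* With no variable list, summarize reports every variable in memory
summarize
```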

Let's say you wanted basic summary statistics on birthrate and population by region. You would use the following code:

by region, sort: summarize brate pop




As you can see, this approach gives you basic summary statistics computed separately for each region. (The sort option sorts the data by region first, which by requires.)
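If you would rather see the by-region results in one compact table than in a separate block per region, Stata's tabstat command is an alternative. A sketch; the particular list of statistics below is our choice, and you can request others (for example, p25 or p75):

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* One table: count, mean, SD, median, min and max of brate and pop, by region
tabstat brate pop, by(region) statistics(n mean sd p50 min max)
```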

When you're ready for more complex summary statistics, check out our post on detailed summary statistics in Stata.


Sunday, November 5, 2017

Histograms in Stata

A histogram is a graph of the distribution of a continuous variable. Let's create one using census13, Stata's example census dataset. Copy and paste the following code into your Stata command window (hist is Stata's abbreviation for the histogram command):

webuse census13
hist pop



The resulting histogram shows that population is not normally distributed: the distribution is skewed to the right, with most states having relatively small populations and a few having very large ones.
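To make the departure from normality easier to see, you can overlay a normal curve and check the skewness statistic. A minimal sketch using histogram's normal option and the stored results of summarize, detail:

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* Overlay a normal density with the same mean and standard deviation
* as pop; a right-skewed variable will not follow the curve
histogram pop, normal

* Quantify the asymmetry: a positive skewness indicates a right skew
summarize pop, detail
display "Skewness of population: " r(skewness)
```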



Now let's say you wanted to see the distribution of population by region. Try this command:

hist pop, by(region)




The resulting graph, which contains a separate histogram for each region, shows that Western states tend to have smaller populations.

There are other options within the histogram command that can be useful. For example, you may want to increase the number of bins. By default, Stata chooses the number of bins based on the number of observations and reports its choice in the Results window, but in some cases you can make your histogram more informative by increasing the number of bins. Try this code:

histogram pop, bin(20)



See the differences between this histogram and the one you produced earlier?

Some other ways to customize histograms in Stata are to (a) add labels and (b) change the measure on the y axis. By default, the y axis shows density, but you can change it to percent (fraction and frequency are also available):

histogram pop, percent

Try adding labels to each bar, keeping the percent option, and expanding to 20 bins:

histogram pop, bin(20) percent addlabel
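Because histogram accepts Stata's standard graph options, you can also add a title and axis titles to the same command. A sketch; the title wording is our own placeholder text:

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* 20 bins, percent on the y axis, a label on each bar, plus titles
histogram pop, bin(20) percent addlabel title("Distribution of state population") xtitle("Population") ytitle("Percent of states")
```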



Like all Stata graphics, histograms can be customized in a wide variety of ways with a fairly simple and intuitive set of options. That's one of Stata's benefits in comparison to other software packages.







Saturday, November 4, 2017

Scatter plots in Stata

A scatter plot is a simple two-way plot that graphs the values of one variable against the values of another variable. Let's see a scatter plot in action, using the existing Stata sample dataset known as census13. Copy and paste the following code into your Stata command window:

webuse census13
scatter brate pop 




Notice that the variable listed last in the scatter command (here, pop) is plotted on the x axis, and the variable listed first (brate) on the y axis. To put population on the y axis instead, reverse the order of the variables:

scatter pop brate




The scatter plot suggests that, as a state's population increases, its birth rate tends to decline. In our regression tutorials, you'll learn more about how to plot the fit between two variables.
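As a preview, twoway lets you overlay the scatter plot with a fitted straight line in a single command. A minimal sketch using the lfit plot type, which draws the least-squares line of the first variable (brate) on the second (pop):

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* Two overlaid plots: the raw points, then a linear fit of brate on pop
twoway (scatter brate pop) (lfit brate pop)
```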


Friday, November 3, 2017

Pearson correlation in Stata

In Stata, Pearson correlation is carried out with the following command:

pwcorr var1 var2

Here, var1 and var2 are the two variables you're correlating. Unless you've actually named your variables var1 and var2, you'll have to insert your own variable names here.

Bear in mind that you can correlate as many variables as you like. If you were correlating four variables, your code would be:

pwcorr var1 var2 var3 var4

What this code returns is a Pearson correlation coefficient (r) for each pair of variables. It can vary from -1 (perfectly negative correlation) to 1 (perfectly positive correlation), with 0 representing the complete absence of a linear correlation. However, the r value alone doesn't tell you whether the correlation is statistically significant. To find out, add the sig option, which prints the p-value beneath each coefficient:

pwcorr var1 var2, sig

Let's try this code on a real-world example. Copy and paste the following lines of code directly into the Command window in Stata and press the Return (Enter) key.

webuse census13
pwcorr brate pop, sig
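pwcorr has a few other options that are useful alongside sig: obs reports the number of observations used for each pair, and star(.05) marks every coefficient that is significant at the 5% level. A sketch combining them:

```stata
* Load the example dataset, discarding any data already in memory
webuse census13, clear

* Correlation with p-values, observation counts, and stars at p < .05
pwcorr brate pop, sig obs star(.05)
```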


In this census dataset, brate is birth rate and pop is population, so we are examining the correlation between a state's population and its birth rate.

Once you enter the commands, a correlation matrix is produced. The correlation of birth rate with itself is, of course, 1. The correlation of birth rate with population yields an r value of -0.283, and it is statistically significant because its p-value, shown beneath the coefficient, is < .05. It therefore seems that the larger the state, the lower its birth rate, although a correlation of this size indicates a modest relationship.
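To see where that significance comes from, the test behind the sig option compares t = r * sqrt(n - 2) / sqrt(1 - r^2) with a t distribution on n - 2 degrees of freedom. The sketch below redoes that arithmetic by hand. It assumes r = -0.283 as reported above and n = 50 observations (one per state, none missing); if your data differ, replace 48 with your own n - 2. Under those assumptions the p-value should come out a little below .05.

```stata
* Assumptions: r = -0.283 (from the output above) and n = 50
* observations, so the test has n - 2 = 48 degrees of freedom

* t statistic: r * sqrt(n - 2) / sqrt(1 - r^2)
display -0.283*sqrt(48)/sqrt(1 - 0.283^2)

* Two-sided p-value: twice the upper-tail probability of |t|
display 2*ttail(48, 0.283*sqrt(48)/sqrt(1 - 0.283^2))
```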





Obviously, correlation is just the beginning of numerous possible analyses of the relationships between these two variables. Elsewhere, we've provided tutorials on performing ordinary least squares (OLS) regression, also known as linear regression, which is a common procedure conducted after correlation. We've also provided guidance on how to construct scatter plots of the relationship between variables in order to complement your statistical analysis.
