Stata Workshop
- Spring 2003
- Alexandra Guisinger
1. Using STATA at the Statlab
- Click on the Stata 7 icon (found under the Start Menu)
- The Different Windows in STATA
- Automatically displayed windows
- Command Window: executes STATA commands; type in commands here and execute with the ENTER key.
- Results Window: displays commands entered and corresponding output/results; for screen display only; contents cannot be edited.
- Review Window: displays commands already entered. To re-run a command without typing it again, click the command line; the highlighted command will appear in the STATA Command Window, and ENTER executes it. Contents may also be saved to a file to edit and to use as a do-file (click on the Review menu, the box to the left of the word Review, and select "Save Review Contents").
- Variable Window: displays the variable names and labels and allows for "point and click" additions to the command line
- Other Windows, accessible through the "Windows" menu or by clicking on the respective menu button:
- Data Editor: allows for manual entry or manual correction of data (use cautiously, although you will be asked to confirm changes upon closing the window)
- Data Browser: allows viewing but not editing of data (accessible by button only; safer than the Data Editor)
- Do-File Editor: allows for the creation of saved lists of commands ("do-files")
- Viewer Window: allows you to open log files, which are text-file versions of the Results Window suitable for editing (also provides access to help files)
2. Entering Data
- Opening an existing Stata Dataset
- Select Open under the File Menu and browse for the Stata files (.dta)
- Type use "directorypath\filename" [for example: use "i:\workshop\stata\auto.dta"] in the command line. To access files stored on your personal account, the command and address would be something like: use "y:\mydata\newdataset.dta"
- Opening another program's dataset or an ASCII dataset
- Use StatTransfer (found under the Start Menu) to transform the file into a Stata dataset (.dta). A help file is available on the StatLab webpage (http://statlab.stat.yale.edu).
- Alternatively, for an ASCII file use the command infile and follow the directions in the Stata Manuals.
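A minimal sketch of the infile route, assuming a hypothetical space-delimited ASCII file c:\temp\cars.raw with one car per line (make in quotes, then price and mpg). The file name and layout are illustrative only; the infile entry in the manual covers other layouts.

```stata
* Hypothetical input file c:\temp\cars.raw, one observation per line, e.g.
*   "AMC Concord" 4099 22
clear
* str20 declares make as a string variable of up to 20 characters;
* price and mpg are read as numeric variables
infile str20 make price mpg using "c:\temp\cars.raw"
describe
* Save in Stata format so the data can later be opened with use
save "c:\temp\cars.dta", replace
```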
- Entering data manually (for example see below)
- Open the Data Editor as described above and either type in data or copy and paste data from an Excel spreadsheet (caution: do not enter "." for empty cells; Stata will generate these missing-value markers by itself).
- STATA 7 variable names can be up to 32 characters long. However, be wary of long names: Stata 6 recognizes only up to 8 characters, so long names make files more difficult to transfer. Also, Stata will try to match an abbreviated form of a name to a variable, so lengthy (and therefore often abbreviated) names can lead to confusion or to the accidental use of a similarly named variable.
- Please note that Stata is not backward compatible; in other words, Stata 6 cannot open Stata 7 datafiles. Statlab computers use Stata 7. If you use an older version elsewhere, remember to use StatTransfer to convert the datafile.
3. Analyzing Data
- Before you do any analysis, OPEN A LOG FILE
- You must direct output to a log file to save and/or print the results of your analysis. To do so either:
- Select Log>Begin from the File Menu and choose to save as either a log (text) or a formatted log (smcl)
- Type in the command line: log using "c:\temp\mylogname", [append or replace] [text or smcl]
- To close your log either:
- Select Log>Close
- Type in the command line: log close
- Logs capture all the text printed in the Results Window. To save time editing, consider suspending the log (log off) when it is not necessary (log on to resume). Logs can be opened in WordPad or another text editor, or in the Viewer.
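The logging steps above might look like this in a short session. It assumes the workshop copy of auto.dta, and c:\temp\mylogname is only an example name.

```stata
* Open a plain-text log, overwriting any earlier file with the same name
log using "c:\temp\mylogname", replace text
use "i:\workshop\stata\auto.dta", clear
summarize price mpg
* Suspend logging while looking at output you do not need to keep
log off
list price mpg
* Resume logging
log on
tabulate foreign
* Close the log; the file is c:\temp\mylogname.log
log close
```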
- The Basic Syntax of a STATA command is as follows:
- COMMAND Variable-list Restrictions, options
- For Example: regress y x1 x2 x3 if x4==2, noconstant
- In the manual, the generic command lines will look like the following:
- regress depvar [varlist] [weight] [if exp] [in range] [, level(#) beta robust cluster(varname) hc2 hc3 hascons noconstant tsscons noheader eform(string) depname(varname) mse1 plus ]
- Text within brackets [ ] is optional (restrictions or options).
- In the manual, the underlined part of each command or option name shows the acceptable abbreviation (e.g., reg instead of regress; noc instead of noconstant)
- Note: Stata is case sensitive
- To add variables to the command line, you may either type the variable or click on the variable name in the variable window
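To illustrate the COMMAND Variable-list Restrictions, options pattern, here is a sketch using the workshop auto.dta (assumed to be at i:\workshop\stata\auto.dta). The particular model is arbitrary.

```stata
use "i:\workshop\stata\auto.dta", clear
* Full form: command, variable list, restriction (if), then options after the comma
regress mpg weight if foreign==0, robust
* The same model using the abbreviation reg for regress
reg mpg weight if foreign==0, robust
* A restriction by observation number, using in
summarize price mpg in 1/10
```

Because both regress lines describe the same model on the same domestic cars, they should report identical estimates.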
- Obtaining information about the data file (this example uses the data in i:\workshop\stata\auto.dta)
- describe to display variable names, types and labels (including whether each variable is string or numeric)
- list to display all the observations and variables in the data file
Note: Interrupt the --more-- display with q (for quit), the Break button on the Toolbar (1st on the right) or Ctrl+Break.
- Modify the list command to display only a subset of the dataset:
list price mpg to display values of price and mpg for all observations
list price mpg in 10/20 to display values of price and mpg for observations 10 through 20
list price mpg if mpg<20 to display values of price and mpg for observations where mpg<20
- Descriptive statistics:
- summarize price mpg returns number of observations, mean, standard deviation, minimum and maximum values of the two variables
- summarize price mpg if mpg==30 returns the summary above only for observations with mpg=30 (note the double equal sign)
- tabulate foreign returns frequency table of foreign, a categorical variable
- tabulate rep78 foreign returns a cross-tabulation of 2 categorical variables, rep78 and foreign
- sort foreign
by foreign: summarize price mpg
Returns descriptive statistics of price and mpg for each of the categories of foreign.
Note: The by prefix works with most STATA commands. You must first sort the data by the by variable.
- Generate/Modify Variables
- generate wgt_sq = weight^2
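- generate lowrep = rep78<=3 creates a new variable lowrep, an indicator variable (0,1) for a low repair record (rep78 of 3 or less). Caution: observations with a missing rep78 are coded 0, because Stata treats a missing value as larger than any number.
- Note the difference with generate lowrep = 1 if rep78<=3: the latter generates missing values (".") for all observations with rep78>3.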
- tabulate rep78 lowrep to confirm that the new variable has been defined correctly.
- replace weight = weight/2.2 rescales the variable weight from pounds into (approximately) kilograms
Note: The generate command only works with variable names that do not yet exist; the replace command modifies existing variables only. When creating a variable based on a function (min, max, diff, etc.), you will need to use the command egen. See help egen for details.
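The sketch below combines the generate, replace and egen steps. It assumes Stata 7, where "." is the only missing-value code, and the workshop auto.dta. Because Stata treats a missing value as larger than any number, rep78<=3 on its own codes the cars with a missing rep78 as 0; the if condition keeps them missing instead. The names meanprice and maxmpg are illustrative. The , by() form of egen is the one older manuals document, so check help egen for your version.

```stata
use "i:\workshop\stata\auto.dta", clear
* Indicator for a low repair record that stays missing when rep78 is missing
generate lowrep = rep78<=3 if rep78!=.
* The missing option also shows the missing category in the table
tabulate rep78 lowrep, missing
* replace modifies an existing variable: pounds to (approximately) kilograms
replace weight = weight/2.2
* egen creates variables from functions of groups of observations
egen meanprice = mean(price), by(foreign)
egen maxmpg = max(mpg)
```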
- Graphs
- Creating Graphs
- graph rep78 for a histogram
- graph mpg weight, title("Miles per Gallon / Weight") for a scatter plot
- graph mpg weight, yscale(0,.) xscale(0,.) to start both axes at zero.
- Saving Graphs
- graph mpg weight, saving(c:\temp\graph1) saves the graph as c:\temp\graph1.gph
OR: type graph mpg weight, then choose File/Save Graph from the pull-down menu, navigate to c:\temp and save as graph1.gph
- graph using graph1.gph displays the graph saved in file graph1.gph
- Printing Graph
- Click on File/Print Graph from the pull-down menu
4. Using Do-Files
- Lists of commands can be stored in the form of do-files. These files are useful both as records of data manipulation and estimation and for replicating the analysis on other datasets.
- Creating Do-Files
- The Review Window stores, in order, all commands entered in the command line. Its contents can be saved and then edited in the Do-File Editor.
- Alternatively, you can enter text directly into the Do-File Editor. For example:
set more off
log using "c:\temp\Cars_oct26", replace text
use "c:\temp\cardata.dta", clear
describe mpg foreign rep78
summarize mpg foreign rep78
regress mpg foreign rep78
log close
- Some people prefer to create an end-of-line marker. To make a semicolon the end-of-line marker, type #delimit ;
- To run the do-file, select either:
- the Do button, to run with output showing in the Results Window
- the Run button, to run "quietly", i.e., without output showing
- To run only part of the do-file, use the cursor to highlight the section to be run (if you have used a #delimit statement, the highlighted section must always include that line)
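Here is a sketch of a do-file that uses #delimit, assuming the workshop auto.dta; the regression is only an example. Inside the semicolon block, /* */ comments are used, because a * comment would run on until the next semicolon.

```stata
use "i:\workshop\stata\auto.dta", clear
#delimit ;
/* From here on a command ends only at a semicolon, so it may span lines */
regress mpg foreign rep78
    weight, robust;
#delimit cr
* Back to the default: the end of each line ends the command
summarize mpg
```

If the file is saved as, say, c:\temp\cars.do (a hypothetical name), it can also be started from the Command Window with do "c:\temp\cars.do" (output shown) or run "c:\temp\cars.do" (output suppressed).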
5. Other Helpful commands
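- set memory: increases the memory available for data. Example: set memory 100m
- set matsize: increases the maximum number of variables allowed in estimation commands (the matrix size) from the default of 40 up to 800. Example: set matsize 200
- clear: clears existing data in Stata
- quietly: suppresses the output of the command it precedes. Example: quietly regress mpg weight
- replace: option that overwrites existing files. Example: save "c:\temp\myfile", replace
- append: option that adds output to the end of an existing log file. Example: log using "c:\temp\mylogname", append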
- reshape: changes "long" datasets to "wide" datasets and vice versa
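Here is a small reshape sketch. The dataset is entirely hypothetical: an id variable plus incomes inc1990 and inc1991 stored in wide form, typed in with input.

```stata
clear
* Hypothetical wide data: one row per person, one income variable per year
input id inc1990 inc1991
1 20000 21000
2 35000 36500
end
* Wide to long: one row per person-year; j(year) creates the year variable
* from the suffixes 1990 and 1991
reshape long inc, i(id) j(year)
list
* And back to wide: inc1990 and inc1991 again, one row per id
reshape wide inc, i(id) j(year)
```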
6. Exiting STATA
- exit OR, if you haven't saved your work, but want to exit anyway: exit,clear
7. Getting Help
- STATA Manuals provide detailed descriptions of commands and options, along with examples and discussion of methodological issues, ordered by command. Search the index to find commands.
- On-line help (see the Help menu) allows for searching both by topic and by command, but the on-line explanations lack examples.
- For FAQs, Help, and this workshop, see the StatLab software help page http://statlab.stat.yale.edu/statlab/software.html
EXAMPLE: Creating STATA datasets
- Restart STATA. If STATA is already loaded, clear data in memory using clear.
Manual input
You can input data manually using the Editor. Click on the Editor button on the Toolbar (4th from the right). Enter the following 3 columns (make, price and mpg) and 10 lines of data:
make             price   mpg
AMC Concord      4099    22
AMC Pacer        4749    17
AMC Spirit               22
Buick Century    4816    20
Buick Electra    7827    15
Note: A missing value (empty data cell) is coded with a dot: "."; you do not need to type in the dot; if you leave the cell blank, it is automatically inserted.
- After you have entered the data, double-click anywhere in the column to invoke the STATA variable definition dialogue box
Note: you will not be able to invoke this box until you have entered some data
- Enter variable names for your columns: make (instead of var1), price and mpg
Note: variable names can be 32 characters long; also, remember that STATA is case sensitive!
Saving your data in STATA format
- save c:\temp\myfile saves data in workspace into a new STATA dataset myfile.dta.
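As an alternative to the Data Editor, the car rows in this example can also be typed with the input command. This sketch assumes that c:\temp is writable. Unlike in the Data Editor, a missing value must be typed as a dot when using input.

```stata
clear
* str20 makes make a string variable; names containing spaces must be quoted
input str20 make price mpg
"AMC Concord" 4099 22
"AMC Pacer" 4749 17
"AMC Spirit" . 22
"Buick Century" 4816 20
"Buick Electra" 7827 15
end
* Save the data in memory as c:\temp\myfile.dta
save "c:\temp\myfile", replace
```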
Yale University Social Science Statistical Laboratory
Comments: stathelp@yale.edu
URL: http://statlab.stat.yale.edu
Certifying Authority: Ann Green, Director Social Science Statlab
lm: Feb 21, 2003
Copyright © Yale University, 2003
