
Stata Workshop

Spring 2003
Alexandra Guisinger

Using STATA at the Statlab

1. The Different Windows in STATA

2. Entering Data

3. Analyzing Data

  • Before you do any analysis OPEN A LOG FILE
    • You must direct output to a log file to save and/or print the results of your analysis. To do so either:
      • Select Log>Begin from the File menu and choose to save as either a log (text) or a formatted log (smcl)
      • Type in the command line: log using "c:\temp\mylogname", [append or replace] [text or smcl]
    • To close your log either:
      • Select Log>Close
      • Type in the command line: log close
    • Logs capture all the text printed in the results window. To save editing time, consider suspending the log (log off) when it is not needed and resuming it later (log on). Logs can be edited in WordPad or a similar editor, or opened in the Viewer.
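The logging steps above can be sketched as a short sequence of commands. This is only an illustration: it assumes a Stata release that provides the sysuse command and the bundled auto dataset, and the name mylog_example is a placeholder written to the current working directory rather than a path from the workshop.

```stata
* Close any log left open by an earlier session (no error if none is open).
capture log close

* Open a plain-text log in the working directory (placeholder name).
log using "mylog_example", replace text

* Load an example dataset (assumes sysuse is available in your release).
sysuse auto, clear
summarize price mpg

* Suspend logging: output from here on is not written to the log.
log off
list make price in 1/5

* Resume logging, then close the log when finished.
log on
describe price mpg
log close
```

Because of the text option, the log is written as mylog_example.log; without it, Stata writes a formatted (smcl) log instead.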
  • The Basic Syntax of a STATA command is as follows
    • COMMAND Variable-list Restrictions, options
    • For Example: regress y x1 x2 x3 if x4==2, noconstant
    • In the manual, the generic command lines look like the following:
      • regress depvar [varlist] [weight] [if exp] [in range] [, level(#) beta robust cluster(varname) hc2 hc3 hascons noconstant tsscons noheader eform(string) depname(varname) mse1 plus ]
      • Text within brackets [] denotes optional restrictions or options.
      • In the manual, the underlined part of each word shows an acceptable abbreviation (e.g. reg instead of regress; noc instead of noconstant)
    • Note: Stata is case sensitive
    • To add variables to the command line, you may either type the variable name or click on it in the variable window
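As a rough illustration of the command / variable-list / restriction / options pattern, the sketch below runs a regression on part of a dataset. It assumes the auto dataset loaded with sysuse (available in later Stata releases); the choice of variables is arbitrary and not taken from the workshop.

```stata
sysuse auto, clear

* command  variable-list      restriction       , options
regress    price mpg weight   if foreign == 0   , noconstant

* The same model with the command name abbreviated.
reg price mpg weight if foreign == 0, noconstant

* Variable names are case sensitive: Price is not the same as price.
* capture traps the error so that the remaining commands still run.
capture summarize Price
display "return code: " _rc
```

A nonzero return code from the last command indicates that Stata could not find a variable named Price.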

  • Obtaining information about the data file (for this example using data from i:\workshop\stata\auto.dta)
    • describe  to display variable names, types and labels (also whether string or number)
    • list  to display all the observations and variables in the data file
      Note: Interrupt the --more-- display with q (for quit), the Break button on the Toolbar (1st on right), or <CTRL>+<BREAK>
    • Modify the list command to display only a subset of the dataset:
      • list price mpg  to display values of price and mpg for all observations
      • list price mpg in 10/20  to display values of price and mpg for observations 10 through 20
      • list price mpg if mpg<20  to display values of price and mpg for observations where mpg<20
  • Descriptive statistics:
    • summarize price mpg returns number of observations, mean, standard deviation, minimum and maximum values of the two variables
    • summarize price mpg if mpg==30 returns the summary above only for observations with mpg=30 (note the double equal sign)
    • tabulate foreign returns frequency table of  foreign, a categorical variable
    • tabulate rep78 foreign returns a cross-tabulation of 2 categorical variables, rep78 and foreign
    • sort foreign
      by foreign: summarize price mpg

      Returns descriptive statistics of price and mpg for each of the categories of foreign.
      Note: The by statement works with most STATA commands. You must first sort the data on the by variable.
    TIP: to recall previous commands, use the <PageUp> and <PageDown> keys, or highlight them in the Review Window and press ENTER to execute.
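The inspection and summary commands above could be collected into one short run such as the following. It is a sketch that assumes the auto dataset is loaded with sysuse rather than from the workshop's network drive; bysort is shown as a one-step alternative to sorting and then using by.

```stata
sysuse auto, clear

* Variable names, storage types, and labels.
describe price mpg foreign

* Observations 10 through 20 (11 rows), then only cars with mpg below 20.
list price mpg in 10/20
list price mpg if mpg < 20

* Summary statistics overall and for a subset (note the double equal sign).
summarize price mpg
summarize price mpg if mpg == 30

* One-way and two-way frequency tables.
tabulate foreign
tabulate rep78 foreign

* bysort sorts on foreign and then runs the command within each group,
* combining the separate sort and by steps.
bysort foreign: summarize price mpg
```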
  • Generate/Modify Variables
    • generate wgt_sq = weight^2
    • generate lowrep = rep78<=3 creates a new variable lowrep, an indicator variable (0,1) for low repairs.
    • Note the difference from generate lowrep = 1 if rep78<=3: the latter generates missing values (".") for all observations with rep78>3
    • tabulate rep78 lowrep to confirm that the new variable has been defined correctly.
    • replace weight = weight/2.2 rescales the variable weight into kgs

    Note: The generate command works only with variable names that do not yet exist.
    The replace command modifies existing variables only.

    When creating a variable based on a function (min, max, diff, etc.), you will need to use the command egen. See help for details.
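The sketch below works through generate, replace, and egen on the auto dataset (loaded with sysuse, an assumption about your Stata release). It also shows one point the commands above leave implicit: Stata treats a missing value as larger than any number, so an indicator built from rep78<=3 is 0, not missing, where rep78 itself is missing. The names lowrep2 and mean_price are made up for this illustration.

```stata
sysuse auto, clear

* Indicator from a logical expression: 1 if true, 0 otherwise.
* Cars with missing rep78 get lowrep = 0, because missing counts as large.
generate lowrep = rep78 <= 3

* Variant that leaves lowrep2 missing wherever rep78 is missing.
generate lowrep2 = rep78 <= 3 if !missing(rep78)
tabulate rep78 lowrep2, missing

* generate refuses to overwrite an existing variable; replace is required.
capture generate weight = weight/2.2
display "generate on an existing variable, return code: " _rc
replace weight = weight/2.2

* egen computes functions across observations, here the mean within groups.
egen mean_price = mean(price), by(foreign)
```

Whether the missing cases should be coded 0 or left missing depends on the analysis; the point is only that the two forms differ.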

  • Graphs
    • Creating Graphs
      • graph rep78  for a histogram
      • graph mpg weight, title("Miles per Gallon / Weight") for a scatter plot
      • graph mpg weight, yscale(0,.) xscale(0,.) to set the axis scales to start at zero.
    • Saving Graphs
      • graph mpg weight, saving(c:\temp\graph1) saves the graph as c:\temp\graph1.gph
        OR
      • graph mpg weight
        choose File/Save Graph from the pull-down menu, navigate to c:\temp and save as graph1.gph
      • graph using graph1.gph
        displays the graph saved in file graph1.gph
    • Printing Graph
      • Click on File/Print Graph from the pull-down menu
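The graph commands above use the syntax that was current when this workshop was written. In later Stata releases (version 8 onward) the graph command was redesigned, and the forms above are, as far as we know, no longer accepted as written. A rough equivalent in the newer syntax might look like this; the file name graph1_example is a placeholder saved to the working directory.

```stata
sysuse auto, clear

* Histogram of a discrete variable.
histogram rep78, discrete

* Scatter plot with a title and both axes extended to include zero.
scatter mpg weight, title("Miles per Gallon / Weight") ///
    yscale(range(0)) xscale(range(0))

* Save the graph in the working directory, then redisplay it from the file.
graph save "graph1_example", replace
graph use "graph1_example"
```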

4. Using Do-Files

  • Lists of commands can be stored in the form of do-files. These files are useful both as records of data manipulation and estimation and for replicating the analysis on other datasets.
  • Creating Do-Files
    • The Review window stores, in order, all commands entered in the command line. Its contents can be saved and then edited in the Do-File window.
    • Alternatively, you can enter text directly into the Do-File window. For example:
    • set more off
      log using "c:\temp\Cars_oct26", replace t
      use "c:\temp\cardata.dta",clear

      describe mpg foreign rep78

      summarize mpg foreign rep78

      regress mpg foreign rep78

      log close

    • Some people prefer to mark the end of each command explicitly. To do so, type #delimit ; to make the semicolon the end-of-command marker
    • To run the do-file, select either:
      • Do, to run with output showing in the results screen
      • Run, to run "quietly", i.e. without output showing
    • To run only part of the do-file, use the cursor to highlight the section to be run (if you have used a #delimit statement, the selection must always include that line)
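A small do-file using the semicolon delimiter might look like the sketch below. It assumes the auto dataset loaded with sysuse; the model itself is only for illustration.

```stata
* Make the semicolon the end-of-command marker, so that one command
* can span several lines.
#delimit ;
sysuse auto, clear ;
regress mpg
    foreign
    rep78 ;
#delimit cr
* Back to one command per line (carriage return as the delimiter).
summarize mpg foreign rep78
```

Saved under a name such as example.do (a hypothetical file name), the file can also be run from the command line with do example, which shows output, or run example, which runs quietly.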

5. Other Helpful commands

    • set memory: increases the memory available to Stata, e.g. set memory 100m
    • set matsize: increases the number of variables allowed from 40 up to 800, e.g. set matsize 200
    • clear: clears existing data in Stata
    • quietly: suppresses output. qby
    • replace: overwrites existing files, e.g. save "c:\temp\myfile", replace
    • append: appends to log file
    • reshape: changes "long" datasets to "wide" datasets and vice versa
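Of the commands above, reshape is probably the least self-explanatory, so here is a minimal sketch on a made-up two-person dataset. The variable names id, inc2001, and inc2002 are hypothetical and exist only for this illustration.

```stata
clear
* Wide layout: one row per person, one income column per year.
input id inc2001 inc2002
1 30 32
2 45 47
end

* Wide to long: one row per person-year; year holds the suffix of inc.
reshape long inc, i(id) j(year)
list

* And back to the wide layout.
reshape wide inc, i(id) j(year)
list
```

In the long layout the stub inc becomes a single variable and the year suffixes move into the new variable year; reshape wide reverses this.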

     

6. Exiting STATA

      • exit OR, if you haven't saved your work, but want to exit anyway: exit,clear

7. Getting Help

    • STATA Manuals provide detailed descriptions of commands and options, along with examples and methodological notes, ordered by command. Search the index to find commands.
    • On-line help (see the Help menu) allows searching both by topic and by command, but the on-line explanations lack examples
    • For FAQs, Help, and this workshop, see the StatLab software help page http://statlab.stat.yale.edu/statlab/software.html

 

EXAMPLE: Creating STATA datasets

      • Restart STATA. If STATA is already loaded, clear data in memory using clear

    Manual input
    You can input data manually using the Editor. Click on the Editor button on the Toolbar (4th from the right).

    Enter the following 3 columns (make, price and mpg) and 10 lines of data:

    AMC Concord      4099    22
    AMC Pacer        4749    17
    AMC Spirit               22
    Buick Century    4816    20
    Buick Electra    7827    15

    Note: A missing value (empty data cell) is coded with a dot: "."; you do not need to type in the dot; if you leave the cell blank, it is automatically inserted.

    • After you have entered the data, double-click anywhere in the column to invoke the STATA variable definition dialogue box
      Note: you will not be able to invoke this box until you have entered some data
       
    • Enter variable names for your columns:  make (instead of var1), price and mpg
      Note: variable names can be 32 characters long; also, remember that STATA is case sensitive!

    Saving your data in STATA format

    • save c:\temp\myfile saves data in workspace into a new STATA dataset myfile.dta.
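The same small dataset can also be typed in from a do-file or the command line with the input command, which may be convenient when the entry needs to be reproducible. This is a sketch rather than part of the original exercise: the str15 width and the file name myfile_example are choices made for this illustration, and the file is saved to the working directory.

```stata
clear
* str15 stores make as a string of up to 15 characters.
* A dot marks the missing price for the AMC Spirit.
input str15 make price mpg
"AMC Concord"   4099 22
"AMC Pacer"     4749 17
"AMC Spirit"    .    22
"Buick Century" 4816 20
"Buick Electra" 7827 15
end

describe
list

* Save to the working directory and overwrite the file on rerun.
save "myfile_example", replace
```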

Yale University Social Science Statistical Laboratory
Comments: stathelp@yale.edu
URL: http://statlab.stat.yale.edu

Certifying Authority: Ann Green, Director Social Science Statlab
lm: Feb 21, 2003
Copyright © Yale University, 2003