




Stata FAQ


What is the best way to transfer my data (from Excel, from SPSS, from a text file, etc.) into Stata?

The best way to transfer any non-Stata data file (and indeed other versions of Stata data files) into Stata 7 is to use the StatTransfer program, available on all Statlab computers under the Start Menu.

  1. In the “Input File Type” box, select from the pull-down menu the type of file you wish to transfer. Then type in the file name or use the Browse function to find the file.
  2. In the “Output File Type” box, select “Stata (Standard Version) 7” from the pull-down menu. StatTransfer will automatically generate a file name based on your original file name; to change it, enter your own file specification.
  3. Click on “Transfer”. StatTransfer will alert you when the transfer is complete. As a check, confirm that the number of “observations transferred” is what you expect.

To see other available StatTransfer options such as the deletion or recoding of observations/variables, see the StatTransfer help file.

Why can't I open my Stata .dta file?

Files saved in a newer version of Stata cannot be opened by older versions: Stata 6 cannot open files saved in Stata 7 (the version currently available at the Statlab) or in Stata/SE. However, the updated version of StatTransfer (also available on Statlab computers) can convert files among all three formats: 6, 7, and SE. If you use Stata 6 at home, remember to convert your files with StatTransfer before leaving the lab.

How do I open my log files in MS Word?

Stata 7 saves log files in its SMCL markup format by default, so they look garbled when opened in Word; this did not happen with older versions. There's an easy way to translate a log into plain text that Word can read:
After closing your log file (log close), type: translate y:\filename.smcl y:\filename.txt
In Word, open the file as plain text and change the font to Courier 8 point.

How do I tell the difference between two similar data sets?

Stata offers an easy way to compare the data in two “identical” datasets.

  1. Make sure that all variable names are identical (to change use the command “rename”)
  2. Sort both datasets by the same unique identifier or by a set of variables that create a unique identifier (for example, sort country year)
  3. Merge together the two datasets using the “update” option
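  4. Tabulate the automatically created variable _merge. If the datasets are identical, _merge equals 3 for every observation. If _merge equals 5 for some observations, the two files disagree on those observations. Values of 1 or 2 mark observations found in only one file, and 4 marks observations where missing values in the first file were filled in from the second. (See Stata Reference Manual H-P, page 317, for more information.)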

Commands may look as follows for the two data files data_v1.dta and data_v2.dta.

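        * Sort the second file by the identifier and save it
        use "c:\temp\data_v2.dta", clear
        sort id
        save "c:\temp\data_v2.dta", replace
        * Load the first file, sort it, and merge with the update option
        use "c:\temp\data_v1.dta", clear
        sort id
        merge id using "c:\temp\data_v2.dta", update
        * _merge==3 everywhere means the files agree; 5 flags conflicts
        tabulate _merge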
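The logic behind these codes can be sketched outside Stata. The Python function below is an illustration only. It is not Stata code and not how Stata implements merge. It classifies each identifier with the codes the update merge uses:

* 1 = master file only
* 2 = using file only
* 3 = in both files and in agreement
* 4 = missing values in the master filled in from the using file
* 5 = conflicting nonmissing values

Missing values are represented by None. The sketch assumes that a conflict takes precedence over a fill when an observation has both.

```python
def compare_update(master, using):
    """Classify ids with the codes of Stata's old `merge ..., update`.

    master, using: dicts mapping id -> {variable: value}; None marks missing.
    Returns a dict mapping each id to a code from 1 to 5.
    """
    codes = {}
    for key in sorted(set(master) | set(using)):
        if key not in using:
            codes[key] = 1  # observation only in the master file
        elif key not in master:
            codes[key] = 2  # observation only in the using file
        else:
            conflict = filled = False
            row_m, row_u = master[key], using[key]
            # compare only the variables present in both files
            for var in set(row_m) & set(row_u):
                vm, vu = row_m[var], row_u[var]
                if vu is None:
                    continue  # a missing using value never changes the master
                if vm is None:
                    filled = True  # master missing, using supplies a value
                elif vm != vu:
                    conflict = True  # both nonmissing and different
            # assumption of this sketch: a conflict outranks a fill
            codes[key] = 5 if conflict else 4 if filled else 3
    return codes


def tabulate(codes):
    """Count how many ids received each code, like `tabulate _merge`."""
    counts = {}
    for code in codes.values():
        counts[code] = counts.get(code, 0) + 1
    return dict(sorted(counts.items()))
```

Calling tabulate(compare_update(v1, v2)) on two dictionaries of records gives the same kind of summary as tabulate _merge. If every identifier maps to 3, the two versions agree.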
    

I want to merge two datasets together. How do I do that in Stata?

Stata provides three different commands for merging datasets: append, joinby, and merge. Decide which one fits your task, then look up the command's help page either through Stata itself (under the Help menu) or in the Stata manuals, which are available from the Statlab consultant.

  1. Append: Use append to add more records to a current data file. For example, if you had collected identical information on students from two different high schools and had originally placed the information in two different Stata data files, you could use “append” to aggregate the data into one large data set with records from both schools.
  2. Joinby: Use joinby to link group attributes to individual attributes. For example, if you had collected information on high schools (average GPA, funding, etc.) in one Stata file and information on individual students in another, you could use “joinby” to merge schools’ attributes onto each student observation.
  3. Merge: Use merge to add further information (variables) about the observations in a current data file. For example, suppose you had two different sources of information on high school students, such as a data file of school records and a data file of survey answers. You could use “merge” to generate a file that combines, for each record, a student’s academic performance and his or her answers to the survey. This merge would require a unique identifier for each case (for example, student ID), and the identifier must be the same in both files. Identifiers may be created from a combination of variables (such as first and last name); however, check that these combinations really are unique.
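To make the difference between the three commands concrete, here is a Python sketch that applies each operation to records stored as lists of dictionaries. It illustrates the ideas and is not Stata syntax. The variable names used with it (sid, school, funding, gpa, answer) are hypothetical.

Two simplifications are assumptions of the sketch:

* When both files contain a variable, the master's value is kept.
* The one-to-one merge drops records found only in the using file, whereas Stata keeps them and flags them in _merge.

The merge also refuses duplicate identifiers, reflecting the advice above that the identifier must be unique.

```python
def append(first, second):
    """Stack the records of two files with the same variables (like append)."""
    return [dict(r) for r in first] + [dict(r) for r in second]


def joinby(master, using, key):
    """Attach group attributes to every master record sharing `key` (like joinby).

    All matching pairs are formed; records without a match are dropped.
    """
    out = []
    for rec in master:
        for grp in using:
            if rec[key] == grp[key]:
                combined = dict(grp)
                combined.update(rec)  # master values take precedence
                out.append(combined)
    return out


def merge_one_to_one(master, using, key):
    """Add the using file's variables to matching master records (like merge).

    Raises ValueError if `key` is not unique within either file.
    Records found only in `using` are dropped in this simplified sketch.
    """
    def index(records, name):
        table = {}
        for r in records:
            if r[key] in table:
                raise ValueError(f"{key}={r[key]!r} is not unique in {name}")
            table[r[key]] = r
        return table

    m, u = index(master, "master"), index(using, "using")
    out = []
    for k, rec in m.items():
        combined = dict(u.get(k, {}))
        combined.update(rec)  # master values take precedence
        out.append(combined)
    return out
```

For instance, joinby(students, schools, "school") gives every student record the funding value of its school. By contrast, merge_one_to_one(records, survey, "sid") lines up each student's school record with his or her survey answers.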

How can I create journal-ready regression output?

The command "outreg" can be used after almost any estimation command in Stata to generate tables suitable for presentation (in other words, output much nicer than that displayed in the Stata output window or saved in log files). Help for this command can be found by typing "help outreg" in the command line or by searching for "outreg" (choose Search in the Help menu). Currently, the outreg command is not documented in the manuals.

Outreg writes its table to an ASCII text file, which can be opened in Excel, among other programs. One benefit of outreg is that it can append the results of successive estimations, even those with different variables. Outreg also uses variable labels, automatically replacing short variable names with clear descriptions. Finally, titles can be added to the table and to its columns with the title and ctitle options, respectively.

By default, outreg tables include:

* coefficient estimates
* t statistics, with asterisks marking the standard significance levels (1% and 5%)
* the number of observations
* true R-squareds (not pseudo R-squareds)
* the number of groups in panel estimation

However, any additional statistic saved during estimation can also be added to the output with the "addstat" option. See the "Saved Results" section of each estimator's entry in the Stata manual for the statistics available. For example, to add the pseudo R-squared to the output of a maximum-likelihood logit estimation, add this to the outreg command: , addstat("Pseudo R-squared", e(r2_p))
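As a rough illustration of this default layout, the Python sketch below builds the two cells an outreg-style table shows for one coefficient: the estimate, and under it the t statistic in parentheses with asterisks for the 5% (*) and 1% (**) levels. It is hypothetical and is not outreg's code.

Several details are assumptions of the sketch:

* The p-value is supplied rather than computed.
* The t statistic is shown as an absolute value.
* The asterisks are placed after the t statistic.
* The number of decimal places is fixed.

```python
def outreg_cells(coef, se, p, coef_dec=3, t_dec=2):
    """Return (coefficient cell, t-statistic cell) for one regressor.

    Asterisks: ** if p < 0.01, * if p < 0.05 (the 1% and 5% levels).
    """
    t = abs(coef / se)  # t statistic shown as an absolute value (assumption)
    if p < 0.01:
        stars = "**"
    elif p < 0.05:
        stars = "*"
    else:
        stars = ""
    return f"{coef:.{coef_dec}f}", f"({t:.{t_dec}f}){stars}"


def outreg_table(rows):
    """Lay out (label, coef, se, p) tuples, two text lines per variable."""
    lines = []
    for label, coef, se, p in rows:
        coef_cell, t_cell = outreg_cells(coef, se, p)
        lines.append(f"{label:<12}{coef_cell:>10}")
        lines.append(f"{'':<12}{t_cell:>10}")
    return "\n".join(lines)
```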

Outreg provides many options for formatting the tables: text size, number of decimal places, etc. Choosing options may seem tedious at first, but if you put your outreg commands in a do-file, your preferences are saved for future use.

The outreg help file includes a number of example outreg commands.

Once you have run the outreg command and created your new file, it is simple to view the table in Excel:

  1. Open Excel and select "Open" from the File menu.
  2. Click on the arrow to the right of the "Files of type" box and select "All Files".
  3. In the "Look in" box, select the folder containing your file. Your file (extension .out) should now appear in the window. Select it.
  4. At this point you have two options:
     * Select "Finish". The table will be imported, but any non-significant t statistics or standard errors (shown in parentheses) will appear as negative numbers. To fix this, hold down the Control key and select all rows with these statistics. Then, under the Format menu, select Cells > Number and, under Negative numbers, choose the format shown in parentheses.
     * Alternatively, in the Text Import Wizard, select "Next" until you reach Step 3 of 3. Then make sure the column data format of every column is "Text" (select each column and click the Text option).

One warning on using outreg: if your regression uses lags, leads, or differences written in the form l.var, f.var, d.var, or s.var, you MUST label your variables before running the regression and outreg. Failing to do so will result in misleading and unclear output. To label variables, use the command structure:

    label var varname "label"

For example, to give the variable ct the label "country", the command would be:

    label var ct "country"

The label will now be used instead of the variable name in all outreg output.

My numbers are strings. How can I convert a string variable into a numeric variable?

As a precaution, when using StatTransfer, make sure that your variable types are correctly defined. StatTransfer generally does this automatically with a high degree of accuracy, but you can also define variable types manually. After selecting the file to be transferred and before transferring the data, click on the “Variables” page, where you can select each variable and specify its type.

If you still need to convert a string variable to a numeric one, use the command “destring”. Help can be found through the Stata Help menu. Note that destring automatically reads a single period (as commonly used in Excel files) as a missing value. The command “tostring” performs the reverse conversion.
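The rules described above can be sketched in Python. This is an illustration, not Stata's implementation. Each string becomes a number, and a lone period or an empty string becomes missing (None here).

As an assumption of the sketch, any other non-numeric value stops the whole conversion with an error, so the data can be inspected first. Stata's destring has its own options for handling such values (see help destring).

```python
def destring(values):
    """Convert strings to floats; "." and "" become None (missing).

    Raises ValueError on the first non-numeric value, so nothing is
    converted unless every value is numeric or missing.
    """
    converted = []
    for raw in values:
        text = raw.strip()
        if text in ("", "."):
            converted.append(None)  # Stata-style missing value
            continue
        try:
            converted.append(float(text))
        except ValueError:
            raise ValueError(f"non-numeric value {raw!r}; not converted") from None
    return converted


def tostring(values):
    """Reverse conversion: numbers to strings, missing (None) to "."."""
    return ["." if v is None else format(v, "g") for v in values]
```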

How do I format my data for cross-section time series analysis?

For Stata XT commands (those used in cross-section time series analysis), data must be formatted so that each record holds a single individual-year observation. As shown below, two patients (identified by a patient id) followed for 3 years would create a dataset with 6 records.

    pid    year   fev    age   sex   height   smokes
    1071   1991   1.21   25    1     69       0
    1071   1992   1.52   26    1     69       0
    1071   1993   1.32   28    1     68       0
    1072   1991   1.33   18    1     71       1
    1072   1992   1.18   20    1     71       1
    1072   1993   1.19   21    1     71       0

The individual identifier (i), in this case pid, must be numeric. If your identifier is a string (such as a patient’s name), you must create a unique numeric identifier for each individual or group before proceeding. See the Stata website http://www.stata.com/support/faqs/data/group.html for a demonstration of how to do this.
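In Stata this is commonly done with the egen command and its group() function, for example egen newid = group(name), where name and newid are hypothetical variable names. The Python sketch below illustrates the same idea. Distinct strings are numbered 1, 2, 3, ... in sorted order, so every record for the same individual receives the same numeric identifier. It assumes each individual's name is spelled identically in every record.

```python
def group_ids(names):
    """Number distinct strings 1, 2, 3, ... in sorted order.

    Every occurrence of the same string gets the same integer, so the
    result can serve as a numeric panel identifier.
    """
    codes = {name: i for i, name in enumerate(sorted(set(names)), start=1)}
    return [codes[name] for name in names]
```

Because the match is exact, "smith" and "Smith" would receive different numbers, so clean the names first.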

If your data are arranged so that each individual-year observation does not have its own record, you may find the command “reshape” helpful. Reshape converts data from “wide” to “long” form and vice versa. For example, consider the “wide” form, in which each individual has a single record covering all years:

    id   sex   inc80   inc81   inc82
    1    0     5000    5500    6000
    2    1     2000    2200    3300
    3    0     3000    2000    1000

can be changed into the “long” form in which each individual/year has its own record:

    id   year   sex   inc
    1    80     0     5000
    1    81     0     5500
    1    82     0     6000
    2    80     1     2000
    2    81     1     2200
    2    82     1     3300
    3    80     0     3000
    3    81     0     2000
    3    82     0     1000

See the Stata help menu or the manual for a helpful, detailed description of how to use the “reshape” command.
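For the income example above, the Stata command would be reshape long inc, i(id) j(year). To show what this transformation does, here is an illustrative Python sketch. Variables named with the stub inc followed by digits (inc80, inc81, inc82) are split into a year value and an inc value. Every other variable (here sex) is copied onto each new record. The function name and its arguments are hypothetical.

```python
import re


def reshape_long(rows, stub, i, j):
    """Convert wide records to long: one record per (i, j) pair.

    Columns named <stub><digits> (e.g. inc80) become a j value (80) and a
    <stub> value; all other columns are repeated on every new record.
    """
    pattern = re.compile(rf"^{re.escape(stub)}(\d+)$")
    long_rows = []
    for row in rows:
        # variables that do not vary over time, e.g. sex
        constant = {k: v for k, v in row.items() if not pattern.match(k)}
        for key, value in row.items():
            match = pattern.match(key)
            if match:
                record = dict(constant)
                record[j] = int(match.group(1))
                record[stub] = value
                long_rows.append(record)
    # order by individual, then by the new time variable
    long_rows.sort(key=lambda r: (r[i], r[j]))
    return long_rows
```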

Finally, before running certain cross-section time-series commands, you must first use the command “tsset” to declare the id variable (i) and the time variable (t). For the patient example above, this would be tsset pid year. Information on this command can be found through the Stata help menu and in the Stata manuals.
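tsset expects each individual to appear at most once in each time period. A quick way to check this before declaring the panel is sketched below in Python; it is an illustration, not part of Stata. It lists every (i, t) pair that occurs in more than one record, using the pid and year variables of the patient example.

```python
from collections import Counter


def duplicate_panel_keys(rows, i, t):
    """Return the sorted (i, t) pairs that appear in more than one record."""
    counts = Counter((r[i], r[t]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty result means every individual-year observation has exactly one record, which is the layout described at the start of this answer.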

It may also be useful to read the Stata manual section “XT” before starting.

ADVANCED: I have an unusual estimator that doesn’t seem to be available in Stata. Do I have to program it myself?

Stata programmers and other parties frequently update old programs and create new ones. Between official updates, you can find these on the Stata website: http://www.stata.com/support/faqs/. If you need one of these updates, ask a Statlab consultant to arrange for it to be downloaded for you.

Another option is to sign up for the Stata listserv, Statalist (http://www.stata.com/support/statalist/), and post a request for a program. The list is read by both Stata programmers and avid Stata users, so it can be an excellent source of information. However, please use this resource respectfully and ask only questions not covered in the manuals or on the website.

A third option is to search the web and/or university sites. Boston College’s Department of Economics offers a useful archive of Stata modules: http://ideas.uqam.ca/ideas/data/bocbocode.html

Alexandra Guisinger Summer 2002

©2007 Yale University
Social Science Statistical Laboratory
Certifying Authority: Themba Flowers
lm: Wed Apr 09 15:45:20 EDT 2003