Using SSDA Data Files: SPSS

  • Introduction
  • Reading ASCII data files with SPSS syntax files
  • Reading ASCII data without a syntax file
  • Using the Text Import Wizard
  • Writing your own syntax file
  • Reading delimited ASCII datasets
  • Using SPSS Export files
  • Using SPSS System files


  • Introduction

    Different Data Types for SPSS

    There are three different types of data usable by SPSS: raw data files, SPSS save files, and SPSS portable files. It is important to know which type of data you are using so you can load and save it properly.

    SPSS Save files (.sav) are special format files that SPSS creates. They can normally be read only on the same type of operating system on which they were created. So if your SPSS save file was created on a Unix machine, it can be read only by a Unix machine (you can transform it into a portable file using StatTransfer).

    SPSS Portable files (.por) are also special format files SPSS creates. However, they can be transported to other computers and operating systems.

    Raw Data files or text files (.dat or .txt) consist of strings of text without any formatting. Because they contain no program-specific formatting, raw data files can be read by virtually any statistical software and are fairly compact for storage.

    SPSS Save and Portable files are useful because they can store information about the variables, such as names and value labels. If you are loading a text file, you will have to do the work of naming the variables yourself (unless there is a syntax file to go with it).

    If your data is not in one of the above formats, you should use StatTransfer to transform your dataset into an SPSS portable file.
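
    Once a dataset is open in SPSS, either special format can also be written from a syntax window. The lines below are a minimal sketch, assuming data is already loaded; the folder and file names are hypothetical and should be replaced with your own.

```spss
* Save the open data as an SPSS save file, readable on the same kind of system.
SAVE OUTFILE='C:\mydata\study.sav'.

* Write the same data as a portable file that can be moved to other platforms.
EXPORT OUTFILE='C:\mydata\study.por'.
```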

     


    Using ASCII Data Files with SPSS

    I. Reading non-delimited ASCII datasets using SPSS Syntax Files

    Syntax files tell SPSS where to find and how to interpret "raw" data files. They do not contain any data themselves, but are very useful because they save you the time and trouble of entering in all the locations and definitions of variables. Most of the syntax files in the Social Science Data Archive are for older versions of SPSS and will not work as written. Fortunately, they are easily edited. The file extensions should be .sps.

    1) Start SPSS.

    2) In the FILE menu, go to Open and then Syntax. Specify the SPSS syntax file to be used (either by typing it in or browsing).

    3) Remove any informative matter on the top (such information should be blocked out, but the formatting may be old and no longer valid -- it is safer to remove this from the syntax file). You may want to cut and paste the information into notepad for reference later.

    4) There should be a few lines of code before the list of variable locations (often beginning with "File Handle" or "Data File Path"). Now paste in the following:

    FILE HANDLE data1 NAME='datafilepath' LRECL=nnn .

    DATA LIST FILE=data1 RECORDS=1 /

    5) File Handle is a temporary name SPSS uses when accessing the data. You will need to change this name each time you attempt to load the data (unfortunately, the first attempt does not always succeed). Simply replace data1 with any name you like.

    6) Replace datafilepath with the file path for the data you wish to load (usually beginning H:\ssda\). Be sure to keep the single quotation marks.

    7) Replace nnn with the logical record length of your datafile (how many columns per case are there?). This information should be available in the codebook and on the list of statlab holdings. Do not delete the period after the LRECL; it is crucial to the syntax.

    8) Replace the data1 that follows DATA LIST FILE= with whatever name you typed into File Handle.

    9) If there is more than one record per case, you will need to change the number of records to reflect the number of records per case in your dataset.

    10) While scrolling down check to make sure there is a period after the list of variables. You should also check to make sure there is a period at the end of any Variable Label section or Value Label section. You may also check for extraneous comments.

    11) Type EXECUTE. after the final period. Make sure to put a period after the execute.

    12) Highlight the whole syntax file (use select all).

    13) Run the file by either pressing the dark triangular button on the tool bar or by going to All in the Run menu.
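
    As an illustration of steps 4 through 11, the edited top of a syntax file might look like the sketch below. The handle name survey1, the path, the record length of 80, and the four variables are all hypothetical; take the real values from the codebook and from the syntax file itself. FILE HANDLE is written here with slash-separated subcommands, the form shown in the SPSS command syntax documentation. The example also assumes two records per case, where /1 and /2 mark which record each group of variables is read from.

```spss
* Hypothetical handle name, path, and record length.
FILE HANDLE survey1 /NAME='H:\ssda\directory\filename.dat' /LRECL=80.

* Two records per case, with variables read from record 1 and then record 2.
DATA LIST FILE=survey1 RECORDS=2
  /1 V1 1-4 V2 5-6
  /2 V3 1-2 V4 3-10.
EXECUTE.
```

    Note that the handle name after FILE= matches the one defined on FILE HANDLE, and that each command ends with a single period.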

    Troubleshooting

    A) Did you remember to type "Execute." at the bottom?

    B) Did you specify the correct path to the data?

    C) Is there a period after the LRECL statement, the variable locations, the variable labels, and the execute?

    D) Is there a "/" after Records?

    E) Is there any hidden formatting before the first variable (delete until it is up against the margin)?

    If none of these is the problem, you could try eliminating optional sections such as comment fields, missing values, value labels, and the like. While these are certainly nice features, there are often small errors or incompatibilities in these sections.

     


    II. Reading raw ASCII datasets without a syntax file

    Syntax files are not available for every dataset found in the Social Science Data Archive. Since most ASCII datasets consist of very long strings of text, you will need to use the codebook to load and interpret the data. Most codebooks are separate text or PDF files in the SSDA folder that you can open in any text editor (e.g., Notepad or Microsoft Word) or in Adobe Acrobat, but others are stored in the Social Science Library. You can discover the location of the codebook from the SSDA catalog on the main screen for the study.

    Most datasets are set up with each variable being assigned a specific column location. This is called fixed width. In other datasets, variables are separated by a delimiter (like a comma). The codebook should tell you which is the case. If you are using a delimited dataset, separate instructions are provided below.

    There are two primary options for loading ASCII data in SPSS. The easiest is to use the Text Import Wizard -- SPSS will provide helpful instructions each step of the way. However, more advanced users may want to input more information about variables (like value labels) and will need to write their own syntax file.


    IIA. Using the Text Import Wizard

    1) Find the codebook for the dataset you wish to use and locate the column location for each variable you want to use.

    2) Start SPSS.

    3) Click on the FILE menu and go to "Read Text Data."

    4) Provide the desired file name (type it in or browse to it).

    5) You do not have a predefined format, so click "Next."

    6) The variables are arranged in fixed width, so click the appropriate radio button. Chances are the variable names are not at the top of the dataset, but checking should be an easy matter in the window provided by the Text Import Wizard. Click "Next."

    7) You will now need to tell SPSS where to begin reading the data, how many lines of text there are per case (when there are many variables they may not all fit on a single line, so multiple lines may be used -- the data codebook should contain such information), and what portion of the data you will want to input (usually you will want to load all the data). Click "Next."

    8) You now need to tell SPSS where the columns are to be found in the data. The codebook will give you the precise location of each variable. By clicking in the appropriate location in the window, you can drop lines to denote column breaks. You can move columns by clicking and dragging the lines to the appropriate location. You can also delete lines by clicking and dragging the line off the window. The ruler at the top of the window makes it easy to place lines in the correct location. You want to place lines at the end of each variable. For example, if the codebook said: "ID 1-4; Name 5-20; Sex 21; Age 22-23," you would place lines on the ruler at the 4, 20, 21, and 23 locations. Note, if there are large sections of the data that you do not want to import, then you do not need to delimit columns within these parts and can save yourself time. Click "Next" when you are finished.

    9) You can now enter the name of each variable by clicking on the appropriate column and typing the name in the slot called "Variable Name". You can also specify the format of the variable at this time. Any column that you do not wish to import can be eliminated by highlighting it and choosing "Do Not Import" under the "Data Format" window. Click "Next" when you are finished naming the columns.

    10) Click "Finish."

    The data should now be loaded into the data editor. You now can save the data as an SPSS data file. If the data did not load correctly, you probably did not place the column markers in the correct location. You will need to go back and repeat the loading.
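
    One way to check whether the column markers were placed correctly is to inspect the result from a syntax window. The following is a small sketch using two standard SPSS commands; the choice of ten cases is arbitrary.

```spss
* Show the names, types, and formats of the variables that were created.
DISPLAY DICTIONARY.

* Print the first ten cases for comparison with the raw file and the codebook.
LIST VARIABLES=ALL /CASES=FROM 1 TO 10.
```

    Values that look truncated, shifted, or missing usually point to a column break in the wrong place.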

     


    IIB. Writing your own syntax file to load data

    Writing your own syntax file is not difficult, but requires a fair amount of careful typing (a single stray comma or period will cause the program to fail).

    1) Start SPSS.

    2) Click on the File Menu, go to New, and then Syntax.

    3) In the syntax window that just appeared, copy and paste the following.

    FILE HANDLE data1 NAME='datafilepath' LRECL=nnn .
    DATA LIST FILE=data1 RECORDS=1 /
    V1 1-4
    .
    VARIABLE LABELS
    V1 "name"
    .
    VALUE LABELS
    V3 00 "Male"
    01 "Female"
    .
    EXECUTE.

    4) File Handle is a temporary name SPSS uses when accessing the data. You will need to change this name each time you attempt to load the data (unfortunately, the first attempt does not always succeed). Simply replace data1 with any name you like.

    5) Replace datafilepath with the file path for the data you wish to load (usually beginning H:\ssda\). Be sure to keep the single quotation marks.

    6) Replace nnn with the logical record length of your datafile (how many columns per case are there?). This information should be available in the codebook and on the list of statlab holdings. Do not delete the period after the LRECL; it is crucial to the syntax.

    7) Replace the data1 that follows DATA LIST FILE= with whatever name you typed into File Handle.

    8) If there is more than one record per case, you will need to change the number of records to reflect the number of records per case.

    9) Now you will need to tell SPSS where to find the variables. For the first variable, simply copy the location found in the codebook (e.g., 1-4). To input more variables, give subsequent variables names like V2 and V3, and tell the computer where to find them (e.g., V2 5-20 V3 21 V4 22-23). Do not put commas or periods between the variable and/or variable locations. The only punctuation should be the period at the end of the variable location statement. Note that you only need to input the variables you wish to call up.

    10) You can now name the variables in the "Variable labels" section. For each variable you wish to name, provide the indicator you gave above followed by the name. For instance, V1 "ID" V2 "Name" V3 "Sex" V4 "Age" would tell SPSS to call the first variable "ID" and the second variable "Name" and so on. The names must be in double quotation marks. Do not separate names by commas or periods. The only period should be at the end of the "Variable labels" section. Note that this is not necessary to load the data, but column headings are often helpful in keeping track of information. If you do not desire column headings, delete this whole section.

    11) The "Value Labels" section allows you to identify and interpret numeric values that refer to categories. For instance, V3 00 "Male" 01 "Female" would tell SPSS that when the third variable has a value of 00 it refers to males and 01 to females. There should be no extraneous punctuation in this section (excepting the period at the end). Again, this entire section is nonessential and can be deleted.

    12) Make sure that the only periods in your syntax file come after the LRECL, all the variables, all the variable labels, all the value labels, and the execute at the end.

    13) Highlight the whole syntax file (use select all).

    14) Run the file by either pressing the dark triangular button on the tool bar or by going to All in the Run menu.
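
    Putting the steps together, a finished syntax file for the codebook example used earlier (ID 1-4; Name 5-20; Sex 21; Age 22-23) might look like the sketch below. Several details are assumptions for illustration: the path, the record length of 23, the (A) format marking Name as a text field, and the coding of Sex as 0 and 1 (a one-column field cannot hold a two-digit code such as 01). FILE HANDLE is written with slash-separated subcommands, the form shown in the SPSS command syntax documentation.

```spss
* Hypothetical path and record length; columns follow the codebook example.
FILE HANDLE data1 /NAME='H:\ssda\directory\filename.dat' /LRECL=23.

DATA LIST FILE=data1 RECORDS=1 /
  V1 1-4
  V2 5-20 (A)
  V3 21
  V4 22-23.

VARIABLE LABELS
  V1 "ID"
  V2 "Name"
  V3 "Sex"
  V4 "Age".

VALUE LABELS
  V3 0 "Male"
     1 "Female".

EXECUTE.
```

    Each command ends with exactly one period, and no blank lines fall inside a command, since SPSS can treat a blank line as the end of a command.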

    Troubleshooting

    A) Did you remember to type "Execute." at the bottom?

    B) Did you specify the correct path to the data?

    C) Is there a period after the LRECL statement, the variable locations, the variable labels, and the execute?

    D) Is there a "/" after Records?

    If none of these is the problem, you could try eliminating optional sections such as comment fields, missing values, value labels, and the like. While these are certainly nice features, there are often small errors or incompatibilities in these sections.


    III. Reading delimited ASCII datasets.

    Delimited datasets have markers signifying the breakpoints for columns. Comma, tab, space, and semicolon are the most common delimiters. The text import wizard can quickly read in delimited data. The text import wizard is very user friendly and has instructions at every step. If at any point you make a mistake, simply click "Back" to go back and repair your mistake.

    1) Start SPSS.

    2) Click on the FILE menu and go to "Read Text Data."

    3) Provide the desired file name (type it in or browse to it).

    4) You do not have a predefined format, so click "Next."

    5) Click "Delimited." Enter whether or not variable names are included in the top of the file. Then click "Next."

    6) You will now need to tell SPSS where to begin reading the data, how many lines of text there are per case (when there are many variables they may not all fit on a single line, so multiple lines may be used -- the data codebook should contain such information), and what portion of the data you will want to input (usually you will want to load all the data). Click "Next."

    7) Now tell SPSS which delimiters your dataset uses. It is possible to have more than one delimiter present, so be sure that only the delimiter(s) you want have check marks. Click "Next."

    8) You can now name your variables and specify the format for each column. This is often not necessary since the text import wizard will automatically sense what type of data is in the column. You can also opt not to import particular variables. Click "Next."

    9) Click Finish.

    The data should now be loaded into the data editor. You now can save the data as an SPSS data file. If your data did not load correctly, you should check the following:

    A) Is the data really delimited? Did you specify the correct delimiter?

    B) Are there variable names at the top of the file?

    C) Are there multiple lines per case?
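
    The same import can also be written as syntax with the GET DATA command, which may not be available in older versions of SPSS. The sketch below assumes a hypothetical comma-delimited file whose first line holds the variable names, so reading starts at line 2; the path, variable names, and formats are illustrative only.

```spss
* Hypothetical comma-delimited file with variable names on the first line.
GET DATA
  /TYPE=TXT
  /FILE='H:\ssda\directory\filename.csv'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=ID F4.0 Name A16 Sex F1.0 Age F2.0.
EXECUTE.
```

    The DELIMITERS, FIRSTCASE, and DELCASE subcommands correspond to questions A, B, and C above, so they are the first things to adjust if the data loads incorrectly.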


    Using SPSS Export Files (.spx and .por files)

    These files are SPSS portable files for use on multiple computer platforms. The two file types refer to SPSS portable files created in SPSS/PC+ for DOS (.spx) and SPSS for Windows (.por). They require different procedures for access.

    To access a portable file from SPSS for Windows, identified by the extension .por, use the following procedure:

    To access a portable file from SPSS/PC+ for DOS, identified by the extension .spx, type the following commands in a syntax window and run them:
            import file='h:\ssda\directory\filename.spx'.  
            execute.
    Note that both commands (import and execute) need to begin in the first column of the line.  You also need the final periods at the end of each of the command lines.  Consult the help sheet on SPSS Syntax Basics for additional information on submitting program lines. You are now ready to analyze the data.  Consult the SPSS help page for additional information on running procedures.
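
    IMPORT is also the general SPSS command for reading portable files written by EXPORT, so a .por file can usually be read from syntax in the same way. The sketch below assumes hypothetical file names and also keeps a copy as an SPSS save file, which opens faster in later sessions.

```spss
* Read a portable file from a hypothetical path.
IMPORT FILE='h:\ssda\directory\filename.por'.
EXECUTE.

* Keep a local copy as an SPSS save file under a hypothetical name.
SAVE OUTFILE='C:\mydata\filename.sav'.
```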
     

    Using SPSS System Files (.sys or .sav files)

    These files are ready for analysis in SPSS for Windows.
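
    From a syntax window, a .sav file can be opened with the GET command. This is a minimal sketch with a hypothetical path; DISPLAY LABELS then lists the variable labels stored in the file so you can see what it contains.

```spss
* Open an SPSS save file from a hypothetical path.
GET FILE='h:\ssda\directory\filename.sav'.

* List the variable labels stored in the file.
DISPLAY LABELS.
```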
     

    ©2007 Yale University
    Social Science Statistical Laboratory
    Certifying Authority: Themba Flowers
    lm: Fri Apr 11 11:07:04 EDT 2003