Past Workshops

This list of past workshops includes those offered collaboratively with the support of the StatLab and other groups.

Statistical Techniques and Software

Introduction to SPSS

SPSS is a flexible and user-friendly statistical software package known for its graphics, quick assessment tools and easy programming language. SPSS also works directly with Excel files. Widely used in all of the social sciences, SPSS offers add-ons which enable qualitative analysis, missing values analysis, and survey design. This class offers a very basic introduction to the application GUI and coding mechanisms. Basic statistical understanding is helpful but not required.

Intermediate SPSS

This workshop will build upon the Introduction to SPSS workshop. The focus will be on testing for moderation using analysis of variance and multiple linear regression. Sample topics include decomposing interactions with post-hoc tests and planned contrasts, and simple slopes analysis. Basic statistical understanding and beginner knowledge of SPSS are expected.

Introduction to Stata

Stata is a popular integrated statistical program used by academic researchers across campus, especially in economics, political science, and EPH. If you are new to Stata or have only limited experience (i.e., have never used it or have only ever used "regress" for class) and want to learn more about importing, merging, and cleaning your data, this class is for you. We will cover the basics: getting around the program, do files, graphics and table generation.

Introduction to R

R is a free, open-source programming language for statistical computing and graphics. Because it is free and has a large development community, R is quickly becoming the statistical application of choice at Yale. R has add-ons for GIS, graphing, advanced statistics, econometrics, image analysis and more. This class offers an extremely basic introduction to the programming language and resources available. Basic statistical understanding is expected.

Intermediate & Advanced R

This class assumes basic knowledge of R and statistics. Topics include the various data types in R, reading in data, graphing, matrix manipulations and using and writing your own statistical functions.

Introduction to Machine Learning with R

This workshop will serve as a theoretical and practical introduction to using machine learning methods to solve problems using R. It takes a high-level approach, with almost no equations, and uses examples drawn from various areas of biology. The workshop will include an introduction to the field, and then a broad tour through the machine learning “pipeline”: feature selection, algorithm selection, measuring performance, and model validation. This pipeline is demonstrated in R.

Parallel R using foreach

This workshop will provide an overview of parallel programming in R. Learn how to use the foreach package, a popular parallel programming package for R that allows you to execute your R script faster using multiple cores on your laptop and multiple nodes on an HPC cluster.

GIS Techniques and Software

Introduction to GIS: Mapping with QGIS

This workshop will introduce basic concepts of geographic information systems through the use of QGIS, a free and open-source GIS that can be used on Windows, Mac OS, and Linux platforms. We will cover installation and adding plugins, projection and coordinate systems, types of spatial data, transforming tabular into spatial data, creating color-coded maps, and conducting basic spatial analysis to help you start with your research.

Introduction to GIS: Mapping with ArcGIS

This is an introduction to the basic concepts of creating, managing and analyzing explicitly spatial data within a Geographic Information Systems (GIS) framework. Included is a step-by-step, “hands on” introduction to using spatial data within ESRI’s ArcGIS software. Topics will include: Spatial Data Models, Spatial Relationships, The ArcMap User Interface, Thematic Mapping Using Symbology, and Simple Analysis Using Complex Selection Methods.

Intermediate GIS: ArcGIS

This workshop will build upon the Introduction to GIS workshop by introducing students to a variety of tools for spatial analysis. Students will use the ArcGIS software suite to load, manipulate, analyze, and visualize data. We will consider strategies for working with both vector and raster data. As time allows, topics to be covered include: coordinate systems and projections, geospatial data management, creating spatially explicit datasets (geocoding and georeferencing), measures of central tendency (spatial means, standard distances, proximity), estimating geographic distributions (interpolation, kernel density estimation), measuring geographic distributions (spatial autocorrelation and clustering/‘hot spot analysis’), basic raster analysis (local, focal, and zonal functions), and non-Euclidean raster operations (impedance layers, cost distance, least cost paths, cost allocation).
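As a rough illustration of two of the measures named above, the spatial mean (mean center) and the standard distance can be computed by hand for a small set of points. This is a minimal Python sketch, not the workshop's ArcGIS workflow: the function names and sample coordinates are hypothetical, and the points are assumed to be in a projected (planar) coordinate system so that plain Euclidean distance is meaningful.

```python
import math

def mean_center(points):
    """Return the spatial mean (average x, average y) of planar points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y

def standard_distance(points):
    """Return the root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(points))

if __name__ == "__main__":
    # Hypothetical sample: the four corners of a 2 x 2 square.
    sample = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(mean_center(sample))
    print(standard_distance(sample))
```

A larger standard distance indicates points that are more dispersed around their mean center.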


Geocoding is a geoprocessing technique that allows you to derive latitude and longitude coordinates from an address database. As a result, the original table of addresses can be mapped, enabling the power of geospatial analysis. This workshop will provide an overview of geocoding and the most important considerations in conducting your geocoding project. We will review several desktop and web-based geocoding options, such as ArcGIS, QGIS and Python. After the workshop, researchers will have a good understanding of how to conduct geocoding with sensitive data and how to assess the accuracy of geocoded results.

Introduction to ModelBuilder: ArcGIS

ArcGIS ModelBuilder is a visual programming language for automating repetitive geoprocessing. It has an easy-to-use flowchart-like interface that allows users to drag, drop, link, and loop ArcToolbox tools and input data files for quick, repetitive execution. Models are easy to troubleshoot and change, and at the end, users can export a rudimentary Python script for further development. This hands-on workshop assumes some prior familiarity with ArcGIS and Toolbox tools.

Introduction to Spatial Analysis

This course will focus on developing spatial questions, visualizing spatial data, and creating statistically sound analysis plans for spatially relevant data. Specifically, the course will include techniques within the ArcGIS environment to visualize and explore spatial data, assess geographic clustering, and perform basic spatial prediction. The course is designed to provide a foundation in spatial statistics and highlight key caveats and questions to consider when working with spatial data. Prospective attendees should have working/intermediate knowledge of ArcGIS. All tools and data will be provided on site, using software available to all Yale faculty and students.

Programming Topics

Introduction to Python

Python is an easy-to-learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. In this workshop session, we’ll introduce you to basic Python programming with some examples of simple data analysis and GIS. No programming experience or statistical training is required.

Introduction to the Command Line: UNIX/Linux

Many software programs do not come with a graphical user interface (GUI), and a Unix command-line terminal environment is required to run them. In this 2-hour session, you will learn the basics of a Unix command-line terminal, such as how to navigate the file system, the permission and security structure, and how to run programs from the command line. No previous Unix or command-line experience is required to attend this session.

Web Scraping with Python

Websites can be full of useful data that are not always downloadable or easily accessible. Rather than doing a manual copy/paste of a site, Python allows you to access the raw HTML behind every webpage and automate the process of retrieving, structuring, and outputting data from pages across a domain. This workshop will cover identifying good candidates for scraping, discovering what data can be scraped, and how Python helps automate the process. Attendees are encouraged to bring in examples of sites they want to scrape as there may be some time to discuss individual projects. This class assumes a working knowledge of Python (running code, installing libraries, etc.) and familiarity with HTML structure.
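To give a flavor of the "structuring" step described above, here is a minimal sketch that pulls link targets and link text out of raw HTML using only Python's standard library. It is an illustration under stated assumptions, not the workshop's own material: the class name, helper function, and sample HTML are all hypothetical, and a real project would first retrieve the page over the network and would likely use a dedicated scraping library.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical page fragment standing in for downloaded HTML.
SAMPLE_HTML = """
<ul>
  <li><a href="/workshops/r">Introduction to R</a></li>
  <li><a href="/workshops/python">Introduction to Python</a></li>
</ul>
"""

def extract_links(html):
    """Parse an HTML string and return the collected (href, text) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The same pattern of reacting to opening tags, text, and closing tags extends to tables and other repeated page elements.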

Other Topics

Introduction to Qualtrics

Qualtrics is an easy to use but very sophisticated online survey tool that is now available to students, staff and faculty at Yale. This workshop will introduce you to some of the more advanced design considerations and features of the software, including conditional branching, scoring, embedded data, implementation of longitudinal designs, and integration of Qualtrics with crowdsourcing tools like Amazon Mechanical Turk.

Research Data Management

This workshop will introduce researchers (from postdocs to undergrads) to the fundamentals of research data management. You’ll learn about the data life cycle: creating, processing, analyzing, preserving, giving access to, and re-using data. We’ll discuss how to identify the current best practices in your field and any funder or publisher mandates that you’ll need to be aware of. Topics will include metadata standards, data documentation, data preservation, and how to access Yale’s many resources for data management help. In addition, we’ll discuss data management guidelines for NIH, NSF, and NEH grants.

Introduction to LaTeX/BibTeX

LaTeX is a document preparation system for typesetting technical or scientific documents, but it can be used for almost any form of publishing. In this workshop, students will learn how to install LaTeX, work with TeX editors, generate basic documents (e.g., papers and Beamer presentations), manage bibliographies, and collaborate with others using ShareLaTeX.

Data Visualization and Tableau

This workshop will familiarize you with key issues in data visualization. You’ll learn about the principles of creating effective visualizations and some common pitfalls that result in confusing or misleading ones. We’ll introduce popular tools, discuss their differences, and point you towards resources (at Yale and beyond) for learning to use them. We’ll also explore a portfolio of science, social science, and digital humanities data visualizations to help you imagine how you might communicate your data and findings through visualizations.