Categorical Analysis Training Course



Categorical Data Analysis and Logistic Regression

Course Description

This course covers analytical and graphical methods for modeling both binary and multinomial categorical data. It builds on ANOVA and regression concepts to introduce approaches for the common situation where the response is non-numeric, such as pass/fail (or 0/1, Go/No Go, Loan Default/No Loan Default, …), or falls into multilevel categories (e.g., low, medium, or high risk). Topics include graphical analysis methods such as mosaic plots; table computations; chi-square (and related) tests; binary, nominal, and ordinal logistic regression; odds ratios; multiple predictor variables; variable selection; significance testing; and maximum likelihood estimation. Classification trees and related partitioning methods are also covered. Modeling will be conducted using common statistical software, and graphical plots and output will be used to illustrate important concepts. This hands-on class features numerous instructor demonstrations, representative datasets, and student exercises to reinforce the fundamental principles.

Course Goals/Objectives

A student who successfully completes this course will:

  1. Understand the basic principles of association, correspondence analysis, and categorical analysis.
  2. Be able to use EG to identify relationships between categorical variables, both graphically and with tables.
  3. Understand the uses of, and differences among, chi-square and other tests of association.
  4. Know how to interpret EG output to determine whether a relationship exists and, if so, its relative strength.
  5. Understand the basic principles behind logistic regression modeling and recognize terms such as maximum likelihood estimators, logit, and odds ratio.
  6. Be able to use the EG Logistic Regression task to construct simple and complex models.
  7. Know how to interpret EG output to evaluate the quality of a logistic regression model and the importance of the independent predictor variables.

Course Outline
1. Categorical Analysis and Association
1.1 Probability and Chi-Square Tests
1.2 Fisher’s Exact Test and Cochran-Mantel-Haenszel (CMH) Test
1.3 Correspondence Analysis
1.4 Graphical Methods for Association

2. Logistic Regression
2.1 Model and Maximum Likelihood Estimation
2.2 Diagnostics
2.3 Odds and Odds Ratios
2.4 Classification Rates, Cutoff Values, Lift, Receiver Operating Characteristic Curves

3. Classification Trees
3.1 Algorithm
3.2 Model Building
3.3 Examples
3.4 Boosted Trees and Random Forests