Data Mining – Intermediate

Course Description
Intermediate Data Mining extends the methods covered in Basic Data Mining to build models that contain multiple inputs simultaneously. The course covers multi-factor ANOVA and multiple regression, and introduces logistic regression.

Models with multiple inputs require special attention to build, but they offer a potentially more sensitive way to find key process input variables or to model output performance. The course includes logistic regression methods for modeling discrete outputs such as conforming/non-conforming, which are particularly useful when investigating quality problems. These methods can be used to troubleshoot processes, find potential root causes, characterize complex relationships between inputs and outputs, and even suggest optimum settings from observational data.

Who Should Attend
This course is designed for engineers, quality professionals, researchers, and managers who need to understand observational data and extract information from it, such as key process input variables or process drivers.

Learning Objectives
Through training, participants will:

  • Be able to identify key process input variables (KPIVs) or process drivers
  • Know how to model a response with multiple discrete factors
  • Know how to build multiple regression models for identification of KPIVs, characterization of complex relationships, or optimization using observational data
  • Know how to find drivers for binary response variables, such as pass/fail or conforming/non-conforming

Course Outline
Review of Basic Data Mining and Preparing Data for Analysis

  • The Purpose of Data Mining and When It Should Be Used
  • Pitfalls of Data Mining
  • Data Mining Strategy
    • Data Cleaning
    • Data Entry Issues
      • Data Type
      • Sorting
      • Row and Column Lengths
      • Handling Dates and Times
    • Data Validity
      • Data Entry Errors
      • Outliers
      • Missing Values
    • Data Preparation
      • Conditional Formatting
      • Creating Variables
      • Merging Data

Multi-Factor ANOVA

  • General Linear Models
    • Review of the ANOVA Concept
    • Analysis Roadmap
    • Blocking
    • Fixed vs. Random Effects
    • Variance Inflation Factor
    • Comparisons
    • Transformations
  • Nested Models
  • Comparing Regression Slopes

Multiple Regression

  • Review of Simple Linear Regression and Correlation
  • Fitting a Multiple Regression Model
    • Multicollinearity and the Variance Inflation Factor
    • Lack of Fit
    • Stepwise Fitting
    • Durbin-Watson Test for First-Order Autocorrelation
    • Factorial Plots
    • Response Optimizer
    • Advantages of Multiple Regression Models
    • Examples
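
As a rough illustration of the multicollinearity topic above, the variance inflation factor for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With only two predictors, R² reduces to their squared correlation. The sketch below uses plain Python as a stand-in (the course itself uses Minitab or JMP); the function names and the temperature/pressure values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model.

    With exactly two predictors, the R^2 from regressing one on the
    other equals the squared correlation, so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical process data (illustrative values only, not course data).
temperature = [150, 155, 160, 165, 170, 175]
pressure = [30, 32, 31, 35, 36, 38]

vif = vif_two_predictors(temperature, pressure)
print(f"VIF = {vif:.2f}")
```

A VIF of 1 means the predictors are uncorrelated; larger values indicate that the coefficient estimates are less stable because the inputs move together.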

Logistic Regression

  • Response Types
    • Binary, Nominal, and Ordinal
  • Logistic Model
    • Motivation and Link Functions
    • Probability of Success vs. Odds of Success
    • Log Odds Ratio
    • Meaning of Model Coefficients
    • Fitting the Model
    • Factorial Plots
    • Concept of Deviance
    • Examples
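
The relationship between probability, odds, and log odds listed above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the course's Minitab or JMP workflow; the coefficients `b0` and `b1` are invented values for a hypothetical single-input model, not results from any real data.

```python
import math


def odds(p):
    """Odds of success for a probability p, with 0 < p < 1."""
    return p / (1.0 - p)


def log_odds(p):
    """Logit link: natural log of the odds."""
    return math.log(odds(p))


def probability_from_log_odds(z):
    """Inverse of the logit link: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model: log odds(non-conforming) = b0 + b1 * x
b0, b1 = -4.0, 0.5  # made-up coefficients, for illustration only


def predicted_probability(x):
    """Predicted probability of a non-conforming unit at input level x."""
    return probability_from_log_odds(b0 + b1 * x)


# A one-unit increase in x multiplies the odds by exp(b1).
odds_ratio = math.exp(b1)

for x in (4, 8, 12):
    print(f"x = {x:2d}  P(non-conforming) = {predicted_probability(x):.3f}")
print(f"Odds ratio per unit of x = {odds_ratio:.3f}")
```

In this form, a coefficient is read on the log odds scale: exponentiating it gives the factor by which the odds change for each one-unit increase in the input.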

Prerequisites
Basic Data Mining or the equivalent

Course Format
16 hours
Instructor-led class training, with opportunities to practice learned skills using prepared data
Minitab or JMP Statistical Software
