Data Mining – Advanced

Course Description
Advanced Data Mining covers techniques that can handle very large numbers of input variables, correlated input variables, and complex relationships. These tools equip the user to perform exploratory data analysis, simplify problems, characterize relationships, set up advanced multivariate process control, and deal with binary response data. The multivariate analysis methods covered are well suited to today’s big data environment.

Specifically, the course discusses methods for multivariate testing with MANOVA, reducing complexity, detecting outliers or clusters, visualizing correlation with PCA, performing multivariate control, finding drivers for categorical responses with discriminant analysis, and handling short, fat tables (more variables than rows, with potentially correlated inputs) using PLS models.

Who Should Attend
This course is designed for engineers, quality professionals, researchers, and managers who need to understand and extract information from big datasets or simplify very complex data problems.

Learning Objectives
Through training, participants will:

  • Be able to visualize important relationships with hundreds of input variables
  • Reduce complexity ahead of other model-building efforts, such as principal component regression
  • Find new interpretations for complex datasets
  • Know how to set up multivariate process control and why this is a more sensitive type of control method
  • Be able to classify observations according to a categorical response
  • Know how to handle the case where there are more variables than rows in a dataset, even when the inputs in the data are correlated

Course Outline
Introduction and Basic Concepts

  • Applications for Multivariate Analysis
  • Visualization
  • Scaling
  • Covariance
  • Bivariate and Multivariate Distributions
  • Data Mining Strategy

MANOVA

  • Applications and Advantages
  • Null and Alternative Hypotheses
  • Univariate Follow-Up Analysis

Principal Component Analysis

  • Uses of PCA including Outlier and Cluster Detection
  • Basic Concept of Principal Components
  • Understanding the Calculation of PCs
  • Choice of Correlation or Covariance Matrix
  • Loading Plot, Scree Plot and Score Plots
  • Judging the Percent of Variation Modeled
  • Interpretation of PCs
  • Summary Output
  • Examples

Multivariate Process Control

  • Dangers of Univariate Control
  • Three Methods of Multivariate Control
  • Bivariate Control Charts for Two Variables
  • Bivariate Control Charts for Multiple Variables
  • Interpreting a Control Ellipse
  • Hotelling’s T² Chart

Discriminant Analysis

  • What Is Discriminant Analysis?
  • Applications for Discriminant Analysis
  • How to Perform the Analysis
  • How to Interpret the Output

Partial Least Squares or Projection to Latent Structures

  • History of PLS
  • When to Use PLS
  • Fitting a PLS Model
  • Calculating the VIS: Variable Importance Score
  • Examining the Score Plots
  • Examples including a PLS Discriminant Analysis

Prerequisites
Intermediate Data Mining or the equivalent

Course Format
16 hours
Instructor-led class training, with opportunities to practice learned skills using prepared data
Minitab or JMP Statistical Software
