Course Description
Advanced Data Mining covers techniques that can handle vast numbers of input variables, correlated input variables, and complex relationships. These additional tools equip the user to perform exploratory data analysis, simplify problems, characterize relationships, set up advanced multivariate process control, and deal with binary response data. This course covers multivariate analysis methods that are well suited to today’s big data environment.
Specifically, methods are discussed for multivariate testing with MANOVA, reducing complexity, detecting outliers or clusters, visualizing correlation with PCA, performing multivariate control, finding drivers for categorical responses with discriminant analysis, and handling short, fat tables (more variables than rows, with potentially correlated inputs) using PLS models.
Who Should Attend
This course is designed for engineers, quality professionals, researchers, and managers who need to understand and extract information from big datasets or simplify very complex data problems.
Learning Objectives
Through training, participants will:
- Be able to visualize important relationships with hundreds of input variables
- Reduce complexity for other model-building efforts such as principal component regression
- Find new interpretations for complex datasets
- Know how to set up multivariate process control and why this is a more sensitive type of control method
- Be able to classify observations according to a categorical response
- Know how to handle the case where there are more variables than rows in a dataset, even when the inputs in the data are correlated
Course Outline
Introduction and Basic Concepts
- Applications for Multivariate Analysis
- Visualization
- Scaling
- Covariance
- Bivariate and Multivariate Distributions
- Data Mining Strategy
MANOVA
- Applications and Advantages
- Null and Alternative Hypotheses
- Univariate Follow-Up Analysis
Principal Component Analysis
- Uses of PCA including Outlier and Cluster Detection
- Basic Concept of Principal Components
- Understanding the Calculation of PCs
- Choice of Correlation or Covariance Matrix
- Loading Plot, Scree Plot and Score Plots
- Judging the Percent of Variation Modeled
- Interpretation of PCs
- Summary Output
- Examples
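As a rough illustration of the PCA topics listed above, the following is a minimal sketch in Python with NumPy. It is not taken from the course materials, which use Minitab or JMP. The function name `pca_correlation` and the simulated data are assumptions made for this example. The sketch standardizes the inputs, takes the eigendecomposition of the correlation matrix, and reports the share of variation each component models, which is the quantity a scree plot displays.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so every variable has unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Scores: the standardized data projected onto the principal axes.
    scores = Z @ eigvecs
    # Fraction of total variation modeled by each component.
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained

# Simulated data (an assumption): two strongly correlated inputs plus one
# independent input.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigvals, loadings, scores, explained = pca_correlation(X)
print("Eigenvalues:", np.round(eigvals, 3))
print("Fraction of variation modeled:", np.round(explained, 3))
```

Using the correlation matrix rather than the covariance matrix puts all variables on the same scale, which is usually the safer choice when the inputs are measured in different units.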
Multivariate Process Control
- Dangers of Univariate Control
- Three Methods of Multivariate Control
- Bivariate Control Charts for Two Variables
- Bivariate Control Charts for Multiple Variables
- Interpreting a Control Ellipse
- Hotelling’s T² Chart
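To make the danger of univariate control concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the course's own material, and the function name `hotelling_t2` and the simulated data are assumptions. It computes Hotelling's T² statistic for each observation from the sample mean and sample covariance. The appended point lies within three standard deviations on each variable separately, yet it breaks the correlation pattern and receives the largest T² value. The sketch does not compute a control limit.

```python
import numpy as np

def hotelling_t2(X):
    """T^2 for each row, using the sample mean and covariance (illustrative)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)      # sample covariance (ddof=1)
    S_inv = np.linalg.inv(S)
    D = X - mean
    # Quadratic form d_i' S^{-1} d_i evaluated for every row i.
    return np.einsum("ij,jk,ik->i", D, S_inv, D)

# Simulated data (an assumption): two strongly correlated variables.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.2 * rng.normal(size=100)
X = np.column_stack([x1, x2])

# Append one point that is unremarkable on each axis alone but runs
# against the positive correlation between the two variables.
X = np.vstack([X, [1.5, -1.5]])

t2 = hotelling_t2(X)
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))

print("Univariate |z| of the appended point:", np.round(z[-1], 2))
print("Its T^2 value:", round(float(t2[-1]), 2))
print("Index of the largest T^2:", int(np.argmax(t2)))
```

Separate charts on each variable would not flag the appended point, while the multivariate statistic does, because T² accounts for the covariance between the variables.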
Discriminant Analysis
- What is Discriminant Analysis
- Applications for Discriminant Analysis
- How to Perform the Analysis
- How to Interpret the Output
Partial Least Squares or Projection to Latent Structures
- History of PLS
- When to Use PLS
- Fitting a PLS Model
- Calculating the VIS: Variable Importance Score
- Examining the Score Plots
- Examples including a PLS Discriminant Analysis
Prerequisites
Intermediate Data Mining or the equivalent
Course Format
16 hours
Instructor-led class training, with opportunities to practice learned skills using prepared data
Minitab or JMP Statistical Software