Minitab data mining training

Data Mining – Basic

Course Description
In today’s data-rich environment, vast amounts of data are routinely collected. Data gathered this way, rather than through a designed experiment, are termed ‘happenstance’, ‘non-experimental’, or ‘observational’ data. The role of statistics with such data is to extract all available information, an activity often called data mining, and in particular to identify the Key Process Input Variables (KPIVs) for use in process improvement and process control. With a suitable sampling plan and knowledge of how to prepare data for analysis, the engineer or researcher can use statistical methods, much like a detective looking for clues, to uncover otherwise hidden information in the data and provide a basis for sound decisions.

Observational data require special techniques and care to extract meaningful information and reach valid conclusions. Such data are common in most process industries, and normal process data can yield valuable information without resorting to designed experiments, which may be more costly to run. This course presents basic methods for relating a single input to a single output. It covers discrete or continuous inputs with continuous outputs, and discrete inputs with discrete outputs. The methods introduced here are building blocks for more advanced data mining techniques and form the basis for analyzing single-factor experiments.

Who Should Attend
This course is designed for engineers, quality professionals, researchers, and managers who need to understand observational data and extract information from them, such as key process input variables or process drivers.

Learning Objectives
Through training, participants will:

  • Understand statistical reasoning
  • Be able to plan a multi-vari study and clean datasets
  • Learn which statistical tests suit each data type (t-tests, ANOVA, non-parametric tests, simple linear regression, and the chi-square test)
  • Learn how to avoid the pitfalls and perils of analyzing observational data
  • Make better use of available data to extract relevant information

Course Outline
Introduction to Data Mining

  • The Purpose of Data Mining and When It Should Be Used
    • Six Sigma Roadmap Application
    • The Data Difference
    • Step-by-Step Guide
    • Analysis Tools
    • What Can Be Learned
  • Pitfalls of Data Mining
    • Observational Data Conclusions
    • Specification Influence
    • Confounding
    • Interactions
    • Large Amounts of Data
  • Planning the Study
    • Questions vs. Data Sources
    • Sampling
    • Data Organization
  • Data Mining Strategy
  • Data Cleaning

Statistical Reasoning

  • The Logic of Statistical Reasoning
    • Scientific Method
    • Fundamental Question Statistics Can Answer
  • Statistical Testing
    • Four Steps of Statistical Testing
    • Two Decision Errors
    • Two Ways to Control the More Serious Error
    • p-Values
    • Five Conditions to Accept a Conclusion from Data
    • Confidence Intervals

One and Two Sample Comparison of Means

  • One Sample Comparison
    • Analysis Roadmap
    • t-Distribution
    • Non-Parametric Sign and Wilcoxon Tests
    • Examples
  • Two Sample Comparison
    • Analysis Roadmap
    • Comparison of Standard Deviations
    • F-Distribution
    • Non-Parametric Mann-Whitney Test
    • Examples
    • Paired t-Test

Three or More Sample Comparison of Means

  • Analysis of Variance
    • Null and Alternative
    • Partitioning Variation
    • Signal to Noise Ratios
    • Assumptions
    • Analysis Roadmap
    • Examples
    • Residuals
    • Multiple Comparisons
    • Non-Parametric Kruskal-Wallis Test and Multiple Comparisons

Simple Linear Regression

  • Correlation
  • Analysis Roadmap
  • Coefficient of Determination
  • Assumptions and Transformations
  • Polynomial Regression
  • Examples
  • Exercise: Hands-On Helicopter Demonstration

Chi-Square Analysis

  • Contingency Table
  • Chi-Square Distribution
  • Assumption
  • Examples
  • Cross Tabulation and Layers
  • Examples

Prerequisites
Basic SPC or the equivalent

Course Format
16 hours
Instructor-led classroom training, with opportunities to practice learned skills using prepared data, live demonstrations, and data collected in real time during class
Minitab or JMP Statistical Software

Course Instructors
Saleha Yusof-Mullenix – Consultant, Lean Six Sigma
