 # VIRTUAL – Basic Data Mining (a.k.a. Multi-Vari Studies) with Minitab

Course Description
In today’s data-rich age, vast amounts of data are routinely collected. Such data are termed ‘happenstance’, ‘non-experimental’ or ‘observational’ data. The role of statistics with observational data is to extract all available information, a process often called Data Mining. Equally important is designing a sampling plan that yields the right data, so that critical sources of variation can be identified and, to the extent possible, significant relationships between inputs and outputs can be highlighted. Much like a detective looking for clues, the researcher can then use statistical methods to uncover otherwise hidden information in the data, providing the knowledge needed for correct decisions.

Who Should Attend
This course is appropriate for anyone with a basic knowledge of descriptive statistics: measures of location such as the mean and median, measures of spread such as the standard deviation and variance, and the properties of the Normal distribution.

Learning Objectives
During this course, participants will learn how to use Minitab to perform hypothesis tests that study the relationships between process variables, which is the basis of Data Mining. Participants will also learn to check the assumptions underlying each analysis before accepting its results.

This course does not simply teach participants how to point and click. The basic concepts underlying each tool will be discussed before the use of the software is demonstrated.

Course Outline
Introduction to Multi-Vari Studies (2 hours)

• Overview of Multi-Vari Studies
• Noise Variables Analysis
• Pitfalls of Observational Data Analysis
• Planning Multi-Vari Studies: Data Collection and Sampling Methods

Statistical Reasoning (2 hours)

• Hypothesis Testing
• Type I and Type II errors
• p-value and its interpretation
• Confidence Intervals

Two Sample Comparison (4 hours)

• Box Plots
• Two Sample Comparison for Means
• Assumptions underlying the t-test
• Equality of Variance
• Concept of Blocking
• Sample Size Determination

Comparing Three or More Samples (4 hours)

• One-way Analysis of Variance (ANOVA)
• Statistical Assumptions underlying ANOVA
• Residual Analysis for checking Statistical Assumptions
• ANOVA for Blocking

Linear Regression (3 hours)

• Scatterplots
• Correlation and Correlation Coefficient
• Basic Concepts of Simple Linear Regression Analysis
• Diagnostics using Residual Analysis
• Introduction to Multiple Regression (multicollinearity)

Chi-Square Test (1 hour)

• Concept of Statistical Independence
• Chi-Square Test

Course Format
16 hours

Course Instructors
Saleha Yusof-Mullenix – Consultant, Lean Six Sigma
