data science day 25
T.Test is one of the most commonly used statistical tools for comparing the mean of a continuous outcome variable between the two levels of a binary explanatory variable.
Today we will go over three basic pieces of knowledge for T.Test:
The 2 basic assumptions for T.Test (i.i.d., sample is normal)
One Sample vs Two Sample T.Test (pop vs sample, group1 vs group2)
T.Test vs Z.Test (NO sdev info vs known sdev)
The 2 Basic Assumptions for T.Test:
- Independence: Observations are not related to each other
  No paired observations / no repeated sampling
- Normal distribution:
If the sample size is >= 50 for a group, as a rule of thumb the Central Limit Theorem lets us treat the sample mean as approximately normally distributed, even when the data themselves are not.
If the sample size is < 50 for a group, we use a histogram, a Q-Q plot, or a Shapiro-Wilk / skewness-kurtosis test to assess normality for the group.
Alternative solutions for non-normal data:
- Apply a log transform to a right-skewed distribution (values must be positive)
- Nonparametric test: Wilcoxon rank-sum test
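A quick way to see what the log transform does is to compare the sample skewness before and after. The sketch below uses only the Python standard library. The `skewness` helper and the numbers in `raw` are made up for illustration, and the moment-based formula is just one common definition of skewness.

```python
import math
from statistics import fmean

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (all positive, so log is defined).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

A skewness closer to 0 after the transform suggests the logged data is closer to symmetric, but it is still worth looking at a histogram or Q-Q plot.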
* Additional Assumption for Two-Sample T.Test:
We usually assume the variances of the two groups are NOT equal (Welch's T.Test) unless there is a good reason to assume equal variances (pooled T.Test).
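For reference, the two versions of the two-sample statistic can be written out as follows. The notation is introduced here and is not from the notes above: \bar{x}_i, s_i^2 and n_i are the sample mean, sample variance and size of group i, and \nu is the degrees of freedom.

```latex
% Unequal variances (Welch)
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
                 {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

% Equal variances (pooled)
\[
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\nu = n_1 + n_2 - 2
\]
```

The Welch degrees of freedom are never larger than n_1 + n_2 - 2, so the unequal-variance version is the more cautious default.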
One-Sample vs Two-Sample T.Test:
One-Sample: Compare the mean of one group to a known population value.
Example: Average Height in Grade 10 (known population value) vs Average Height in Class 1 (sample)
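H0: The mean of the population the sample was drawn from = the known population value
If p > 0.05, we fail to reject H0, and the 95% CI around the sample mean should contain the known population value.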
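A minimal sketch of the one-sample case, using only the Python standard library. The heights, the grade-level mean of 165 and the helper name `one_sample_t` are invented for illustration. Because the standard library has no t-distribution, the decision is made with a critical value copied from a standard t-table instead of a p-value, which only works for this sample size (n = 10, so 9 degrees of freedom).

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of the t-distribution with 9 degrees of
# freedom, taken from a standard t-table (valid only when n = 10).
T_CRIT_DF9 = 2.262

def one_sample_t(sample, mu0, t_crit):
    """Return (t statistic, 95% CI for the mean, reject H0?)."""
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / math.sqrt(n)      # standard error of the mean
    t_stat = (xbar - mu0) / se
    ci = (xbar - t_crit * se, xbar + t_crit * se)
    reject = abs(t_stat) > t_crit          # same decision as p < 0.05
    return t_stat, ci, reject

# Hypothetical heights (cm) for Class 1 and a hypothetical Grade 10 mean.
class_heights = [162, 168, 171, 159, 166, 170, 164, 167, 173, 160]
grade_mean = 165

t_stat, ci, reject = one_sample_t(class_heights, grade_mean, T_CRIT_DF9)
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

Here |t| is below the critical value, so H0 is not rejected, and the interval contains 165. The two statements always agree because they are built from the same standard error and critical value.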
Two-Sample: Compare the means of two groups.
Example: Placebo vs. Treatment
H0: Group1 Mean - Group2 Mean = 0
If p > 0.05, we fail to reject H0, and the 95% CI for the difference in means should contain 0.
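The two-sample case can be sketched the same way. This example assumes unequal variances (Welch's version) and uses made-up placebo and treatment scores. The standard library has no t-distribution, so the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule. The helpers `t_pdf`, `two_sided_p` and `welch_t_test` are illustrative names, not functions from any library.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                      # steps must be even
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * area)

def welch_t_test(group1, group2):
    """Return (t statistic, Welch degrees of freedom, two-sided p-value)."""
    v1 = variance(group1) / len(group1)
    v2 = variance(group2) / len(group2)
    t_stat = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group1) - 1)
                           + v2 ** 2 / (len(group2) - 1))
    return t_stat, df, two_sided_p(t_stat, df)

# Hypothetical outcome scores for the two groups.
placebo = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.3, 5.0]
treatment = [6.2, 6.8, 5.9, 7.1, 6.5, 7.4, 6.0, 6.9]

t_stat, df, p_value = welch_t_test(placebo, treatment)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_value:.5f}")
```

With these invented numbers the p-value is far below 0.05, so H0 is rejected. In practice a statistics library would normally be used instead of hand-rolled integration.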
T.Test vs Z.Test
Z.Test: The population standard deviation is known.
T.Test: The population standard deviation is unknown and is estimated from the sample.
As the degrees of freedom get higher and higher, the T-distribution becomes very similar to the Normal distribution; by around n = 1000 the two are nearly identical.
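To make the comparison concrete, here is a small sketch of a Z.Test using `statistics.NormalDist` from the Python standard library, followed by a look at how T critical values shrink toward the Normal one. The sample, the population mean of 165 and the "known" standard deviation of 5 are assumptions for illustration. The T critical values are copied from a standard t-table rather than computed.

```python
import math
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """Two-sided one-sample Z.Test with a known population sdev `sigma`."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical heights (cm); sigma = 5 is assumed to be known in advance.
heights = [162, 168, 171, 159, 166, 170, 164, 167, 173, 160]
z, p_value = z_test(heights, mu0=165, sigma=5)
print(f"z = {z:.3f}, p = {p_value:.3f}")

# Two-sided 5% critical values from a standard t-table, keyed by
# degrees of freedom, next to the Normal critical value.
T_CRIT = {5: 2.571, 10: 2.228, 30: 2.042, 100: 1.984, 1000: 1.962}
z_crit = NormalDist().inv_cdf(0.975)

for dof, t_crit in T_CRIT.items():
    print(f"df = {dof:>4}: t = {t_crit:.3f}, z = {z_crit:.3f}, "
          f"gap = {t_crit - z_crit:.3f}")
```

The gap between the T and Z critical values is large for small samples and almost gone at 1000 degrees of freedom, which is why the two tests give nearly the same answer on big samples.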