Data Science Day 25
The t-test is one of the most commonly used statistical tools for comparing the mean of a continuous outcome variable across the levels of a binary explanatory variable.
Today we will go over three basic pieces of knowledge about the t-test:

The two basic assumptions of the t-test (i.i.d. observations, normally distributed sample)

One-sample vs. two-sample t-test (sample vs. population, group 1 vs. group 2)

T-test vs. z-test (unknown standard deviation vs. known standard deviation)
T-Test Assumptions:

Independence: observations are not related to each other
– No paired observations / no repeated sampling of the same subject
Normal distribution:
If the sample size is >= 50 for a group, we can assume the sample mean is approximately normally distributed by the Central Limit Theorem.
If the sample size is < 50 for a group, we use a histogram, Q-Q plot, or a Shapiro-Wilk / skewness-kurtosis test to assess normality for that group.
Alternative solutions for non-normal data:
 Apply a log transform to reduce a skewed distribution
 Use a nonparametric test: the Wilcoxon rank-sum test
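A minimal sketch of this workflow in Python using scipy, with invented right-skewed data: check normality with Shapiro-Wilk, then try the two alternatives above.

```python
# Sketch: assessing normality for a small sample (n < 50) and falling back
# to the alternatives above when it fails. Data here is invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=30)  # right-skewed sample, n < 50

# Shapiro-Wilk test: a low p-value suggests the data is not normal
stat, p = stats.shapiro(skewed)
print(f"Shapiro-Wilk p-value: {p:.4f}")

# Option 1: log-transform the skewed data, then re-check normality
logged = np.log(skewed)
stat_log, p_log = stats.shapiro(logged)
print(f"After log transform, Shapiro-Wilk p-value: {p_log:.4f}")

# Option 2: a nonparametric Wilcoxon rank-sum test instead of a t-test
other = rng.exponential(scale=2.5, size=30)
w_stat, w_p = stats.ranksums(skewed, other)
print(f"Wilcoxon rank-sum p-value: {w_p:.4f}")
```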
* Additional assumption for the two-sample t-test:
We usually assume the variances of the two groups are NOT equal (Welch's t-test) unless you have a good reason to choose the equal-variance version.
One-Sample vs. Two-Sample T-Test:
One-sample t-test:
Compare the mean of one group to a known population value.
Example: the average height of one class vs. the known average height for all of Grade 10
H0: Sample mean = population mean
If p > 0.05, we fail to reject H0; the confidence interval should contain the hypothesized population mean.
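A minimal sketch of a one-sample t-test with scipy, assuming a known grade-level mean of 165 cm; the class heights are invented for illustration.

```python
# One-sample t-test: compare one class's heights (invented data)
# to a hypothesized population mean of 165 cm.
import numpy as np
from scipy import stats

class_heights = np.array([162, 168, 170, 159, 165, 171, 166, 163, 167, 164])
known_mean = 165.0  # hypothesized population mean (assumed for this sketch)

t_stat, p_value = stats.ttest_1samp(class_heights, popmean=known_mean)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# p > 0.05 here, so we fail to reject H0: the class mean is
# consistent with the grade-level average of 165 cm.
```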
Two-sample t-test:
Compare the means of two groups.
Example: Placebo vs. Treatment
H0: Group 1 mean – Group 2 mean = 0
If p > 0.05, we fail to reject H0; the confidence interval for the difference should contain 0.
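A minimal sketch of the two-sample case with scipy, using invented placebo/treatment measurements and the unequal-variance (Welch) default discussed above.

```python
# Two-sample t-test (Welch's, variances not assumed equal),
# with invented placebo vs. treatment measurements.
import numpy as np
from scipy import stats

placebo   = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7])
treatment = np.array([5.9, 6.1, 5.8, 6.3, 6.0, 5.7, 6.2, 6.1])

# equal_var=False requests Welch's t-test, the safer default choice
t_stat, p_value = stats.ttest_ind(treatment, placebo, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value leads us to reject H0: the group means differ.
```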
T-Test vs. Z-Test
Z-test: the population standard deviation is known.
T-test: the population standard deviation is unknown and must be estimated from the sample.
As the degrees of freedom increase, the t-distribution becomes very similar to the normal distribution (by around n = 1000 they are practically identical).
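This convergence can be sketched by comparing two-sided 5% critical values from the t-distribution to the normal's as the degrees of freedom grow:

```python
# Critical values for a two-sided 5% test: as df grows, the t critical
# value shrinks toward the normal's (~1.960).
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # normal critical value
for df in (5, 30, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(f"df = {df:>4}: t critical = {t_crit:.3f}  (z = {z_crit:.3f})")
```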
Happy Studying!