# Chi-Square 1

#### Data Science Day 3: Chi-square Test

Learning Objectives

1. Define the Chi-Square distribution
2. Explain the three Chi-square test application scenarios

The Chi-Square distribution is the distribution of a sum of squared independent standard normal deviates. The following equation represents a Chi-Square distribution with m degrees of freedom:

$$\chi^2_m = X_1^2 + X_2^2 + \dots + X_m^2$$

where $X_1, X_2, \dots, X_m$ are independent random variables having the standard normal distribution. The more degrees of freedom there are, the more closely the distribution approaches a normal distribution. The Chi-Square distribution has 3 basic properties:

1. Not symmetric: skewed to the right
2. No negative values
3. Total area under the curve = 1
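
The definition and the first two properties can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the helper name `chi_square_draw`, the seed, the choice of m = 4, and the sample size are all arbitrary assumptions made for illustration.

```python
import random
import statistics


def chi_square_draw(m, rng):
    """One Chi-Square draw with m degrees of freedom:
    the sum of m squared standard normal deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
m = 4                   # degrees of freedom (arbitrary choice)
draws = [chi_square_draw(m, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_median = statistics.median(draws)

print("no negative values:", min(draws) >= 0)
print("mean above median (right skew):", sample_mean > sample_median)
print("sample mean:", round(sample_mean, 2), "(should be close to m)")
```

A sum of squares can never be negative, and the long right tail pulls the mean above the median. The sample mean should land near m, because the mean of a Chi-Square distribution equals its degrees of freedom.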

Three primary Chi-square test applications:

1. Test of independence of two categorical variables:
Whether two categorical variables are associated, or whether they are distributed independently of each other within one sample space.
Null hypothesis: The two categorical variables are independent.
Note: There are two categorical variables from one sample space.
Mini E.g. Sex (number of boys and girls) and nationality of the students in one class.
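
As a rough illustration of the independence test, the sketch below computes the Pearson statistic, the degrees of freedom, and a p-value for a made-up class. The counts are invented, and `chi2_sf` and `independence_test` are hypothetical helper names written for this example; in practice a statistics library would normally supply them.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a Chi-Square variable with df
    degrees of freedom. Hand-rolled for illustration: start from the closed
    form for df = 1 or df = 2, then step up two degrees of freedom at a time."""
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    while nu < df:
        q += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def independence_test(table):
    """Pearson Chi-square test of independence on a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical class: rows are boys and girls, columns are three nationalities
observed = [
    [12, 8, 10],
    [9, 11, 10],
]
stat, df, p_value = independence_test(observed)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With these made-up counts the p-value is far above 0.05, so there is no evidence against the null hypothesis that sex and nationality are independent in this class.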

2*. Test of goodness of fit (Pearson):
Whether the sample categorical data are consistent with a hypothesized distribution.
Null hypothesis: The sample data are consistent with the specified distribution.
Note: There is one categorical variable from one sample space.
Mini E.g. Whether the sex ratio (girls : boys) in one class is 50% : 50%.
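
The sex-ratio example can be sketched in a few lines. The class of 18 girls and 12 boys is invented, and `goodness_of_fit` is a hypothetical helper name. With only two categories the test has 1 degree of freedom, and in that special case the p-value has the closed form erfc(sqrt(x / 2)), which is what the sketch relies on.

```python
import math


def goodness_of_fit(observed, expected_props):
    """Pearson goodness-of-fit statistic against hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


observed = [18, 12]  # hypothetical counts: girls, boys
stat, df = goodness_of_fit(observed, [0.5, 0.5])

# Closed form that is valid only for df = 1:
# P(chi-square_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

Here the expected counts are 15 girls and 15 boys, so the statistic is (3² + 3²) / 15 = 1.2. The resulting p-value is well above 0.05, so a 50% : 50% ratio is not rejected for this made-up class.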

3. Test of homogeneity:
Whether the frequency counts of one categorical variable follow the same distribution in different sample spaces.
Null hypothesis: The proportions of the categorical variable are the same in all sample spaces.
Note: There is one categorical variable from two or more different sample spaces.
Mini E.g. Whether the sex ratio is the same in all classes.
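
A sketch of the homogeneity test for three made-up classes follows. The counts and the helper name `homogeneity_statistic` are assumptions for illustration. Instead of a p-value, this version compares the statistic with a critical value; the closed form used for that critical value holds only for 2 degrees of freedom, which is the case for three classes and two categories.

```python
import math


def homogeneity_statistic(samples):
    """Pearson statistic for homogeneity.
    samples: one list of category counts per sample (here, per class)."""
    sample_totals = [sum(s) for s in samples]
    category_totals = [sum(c) for c in zip(*samples)]
    n = sum(sample_totals)
    # Pooled proportions: the common distribution assumed under the null
    pooled = [c / n for c in category_totals]
    stat = 0.0
    for counts, size in zip(samples, sample_totals):
        for observed, prop in zip(counts, pooled):
            expected = size * prop
            stat += (observed - expected) ** 2 / expected
    df = (len(samples) - 1) * (len(category_totals) - 1)
    return stat, df


# Hypothetical counts of [girls, boys] in three classes
classes = [[14, 16], [18, 12], [13, 17]]
stat, df = homogeneity_statistic(classes)

alpha = 0.05
# For df = 2 only, P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha)
critical_value = -2 * math.log(alpha)
reject_null = stat > critical_value

print(f"chi-square = {stat:.3f}, df = {df}, critical value = {critical_value:.3f}")
print("reject null hypothesis:", reject_null)
```

The arithmetic is the same as in the test of independence. What differs is the sampling design: each class is a separate sample, and the question is whether all of them share one sex ratio.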

* In clinical trials, we use the Chi-square log-rank test in survival analysis.

We will show the application examples next time!
Thanks very much to Renee Wu, Ali Motamedi~
Happy learning!