Data Science Day 11: CMH Test
The Chi-square test checks whether two categorical variables are independent in a single sample. What if we need to check that relationship while accounting for a third categorical variable?
Cochran-Mantel-Haenszel (CMH):
With 3 categorical variables, CMH tests whether two of them are independent of each other within every level of the third (conditional independence). Usually, the third variable is a nominal variable that identifies the repeats (such as different times, different locations, or different studies).
In its basic form, CMH checks conditional independence in a 2 x 2 x K table, that is, K partial 2 x 2 tables, one per repeat.
Common Odds Ratio: If the association is similar across the partial tables, it can be summarized by a single common odds ratio.
Null Hypothesis, H0: The relative proportions of one variable are independent of the other variable within the repeats; in other words, there is no consistent difference in proportions in the 2×2 tables.
H0: OR_AB(1) = OR_AB(2) = ... = OR_AB(K) = 1, where OR_AB(k) is the odds ratio between variables A and B in the k-th partial table.
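For reference, the test statistic behind this hypothesis is commonly written as below. The notation is an assumption added here, not taken from the post: n_{ijk} is the count in row i, column j of the k-th partial table, and a + in a subscript means the count is summed over that index. Under H0 the statistic is approximately Chi-square distributed with 1 degree of freedom; the second formula is the Mantel-Haenszel estimate of the common odds ratio.

```latex
\[
\mathrm{CMH} = \frac{\left[\sum_{k=1}^{K}\left(n_{11k} - \mu_{11k}\right)\right]^{2}}{\sum_{k=1}^{K}\operatorname{Var}(n_{11k})},
\qquad
\mu_{11k} = \frac{n_{1+k}\,n_{+1k}}{n_{++k}},
\qquad
\operatorname{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^{2}\,(n_{++k}-1)}
\]
\[
\widehat{\mathrm{OR}}_{MH} = \frac{\sum_{k=1}^{K} n_{11k}\,n_{22k}/n_{++k}}{\sum_{k=1}^{K} n_{12k}\,n_{21k}/n_{++k}}
\]
```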
Example: Berkeley Admission CMH Analysis
We want to know whether Admission is associated with Gender, and whether that association still holds within each Department.
1. We will use Chi-Square to test if Admission is independent of Gender.
Null Hypothesis: Admission is independent of Gender.
2. We will use CMH to test if Admission is independent of Gender within each Department.
Null Hypothesis: Admission and Gender are conditionally independent, given Department.
proc freq data=berkeley order=data;
  weight count;
  tables Sex*Accept / chisq relrisk;
  tables Department*Sex*Accept / cmh;
run;
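Since the Chi-square P-value <0.0001, we reject independence: pooled over all departments, Admission is associated with Gender. With order=data the table lists Male before Female and Reject before Accept, so the odds ratio of 0.54 compares the odds of rejection for males with those for females. Equivalently, the odds of acceptance for males are about 1/0.54 ≈ 1.85 times those for females. Note this is a ratio of odds, not of acceptance probabilities.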
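The pooled odds ratio and Chi-square can be cross-checked by hand. The following is a minimal Python sketch (not part of the original SAS workflow) that sums the post's department counts into one 2 x 2 table; the dictionary layout and variable names are illustrative assumptions.

```python
import math

# Counts copied from the post's data listing:
# department -> (male_reject, male_accept, female_reject, female_accept)
counts = {
    "DeptA": (313, 512, 19, 89),
    "DeptB": (207, 353, 8, 17),
    "DeptC": (205, 120, 391, 202),
    "DeptD": (278, 139, 244, 131),
    "DeptE": (138, 53, 299, 94),
    "DeptF": (351, 22, 317, 24),
}

# Pool over departments to get the marginal Sex x Accept table:
# a = male reject, b = male accept, c = female reject, d = female accept.
a, b, c, d = (sum(col) for col in zip(*counts.values()))
n = a + b + c + d

# Odds ratio in the table's own orientation (rows Male/Female, columns Reject/Accept).
odds_ratio = (a * d) / (b * c)

# Pearson chi-square for a 2x2 table (no continuity correction), 1 degree of freedom.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

print(f"odds ratio (rejection, male vs female)  = {odds_ratio:.2f}")
print(f"odds ratio (acceptance, male vs female) = {1 / odds_ratio:.2f}")
print(f"chi-square = {chi2:.1f}, p-value = {p_value:.2g}")
```

Inverting the odds ratio simply swaps which outcome (Reject or Accept) is treated as the event, which is why 0.54 and roughly 1.84 describe the same association.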
The CMH P-value=0.23, so we fail to reject the null hypothesis: given Department, there is no consistent difference in acceptance proportions between genders. The common odds ratio of 1.102, which is close to 1, supports this conclusion.
The Breslow-Day Test for Homogeneity of the Odds Ratios has P-value=0.0021, which implies the odds ratios differ significantly across Departments, so the common odds ratio should be read with caution. The Departments also differ widely in how selective they are: Department A has an overall acceptance rate of 64.42%, whereas Department F has an acceptance rate of only 6.44%.
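To see where these numbers come from, here is a rough Python sketch that computes the per-department odds ratios, the Mantel-Haenszel common odds ratio, and the CMH statistic (without continuity correction) directly from the post's counts. It is a hand calculation under the standard textbook formulas, not a reproduction of SAS internals, and it does not attempt the Breslow-Day test; the variable names are illustrative.

```python
import math

# Counts copied from the post's data listing:
# department -> (male_reject, male_accept, female_reject, female_accept)
counts = {
    "DeptA": (313, 512, 19, 89),
    "DeptB": (207, 353, 8, 17),
    "DeptC": (205, 120, 391, 202),
    "DeptD": (278, 139, 244, 131),
    "DeptE": (138, 53, 299, 94),
    "DeptF": (351, 22, 317, 24),
}

sum_diff = 0.0   # sum over strata of (observed - expected) for the top-left cell
sum_var = 0.0    # sum over strata of the hypergeometric variance of that cell
mh_num = 0.0     # numerator of the Mantel-Haenszel common odds ratio
mh_den = 0.0     # denominator of the Mantel-Haenszel common odds ratio
per_dept_or = {}

for dept, (a, b, c, d) in counts.items():
    n = a + b + c + d
    row1, row2 = a + b, c + d      # Male, Female totals
    col1, col2 = a + c, b + d      # Reject, Accept totals
    per_dept_or[dept] = (a * d) / (b * c)
    sum_diff += a - row1 * col1 / n
    sum_var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    mh_num += a * d / n
    mh_den += b * c / n

cmh_stat = sum_diff ** 2 / sum_var
p_value = math.erfc(math.sqrt(cmh_stat / 2))  # upper tail of chi-square with 1 df
common_or = mh_num / mh_den

for dept, value in per_dept_or.items():
    print(f"{dept}: odds ratio = {value:.2f}")
print(f"CMH statistic = {cmh_stat:.2f}, p-value = {p_value:.2f}")
print(f"Mantel-Haenszel common odds ratio = {common_or:.3f}")
```

The per-department odds ratios make the heterogeneity visible: Department A stands well apart from the other five, which sit fairly close to 1. That is the pattern the homogeneity test is reacting to.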
Happy Studying 😉!
data berkeley;
  input Department $ Sex $ Accept $ count;
  datalines;
DeptA Male Reject 313
DeptA Male Accept 512
DeptA Female Reject 19
DeptA Female Accept 89
DeptB Male Reject 207
DeptB Male Accept 353
DeptB Female Reject 8
DeptB Female Accept 17
DeptC Male Reject 205
DeptC Male Accept 120
DeptC Female Reject 391
DeptC Female Accept 202
DeptD Male Reject 278
DeptD Male Accept 139
DeptD Female Reject 244
DeptD Female Accept 131
DeptE Male Reject 138
DeptE Male Accept 53
DeptE Female Reject 299
DeptE Female Accept 94
DeptF Male Reject 351
DeptF Male Accept 22
DeptF Female Reject 317
DeptF Female Accept 24
;
run;
Thanks to https://onlinecourses.science.psu.edu/stat504/node/114/ , which helped me understand the CMH test!