Chi-Square 3

Data Science Day 5:

Chi-Square Application 2:

Testing the independence of two categorical variables, also known as a contingency table test.

We use the Chi-Square test of independence to check whether two categorical variables are independent or associated. The test tells us whether an association is statistically significant, not how strong it is.

Example 1: Ice-Cream Flavor vs. Buyer’s Gender

We want to see whether the preferred ice-cream flavor depends on the gender of the buyer.

Gender | Strawberry | Chocolate | Vanilla | Green Tea | Total

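H0 (Null Hypothesis): Ice-cream flavor preference and buyer’s gender are independent (there is no association between flavor and gender).

H1 (Alternative Hypothesis): Ice-cream flavor preference and buyer’s gender are not independent (there is an association between flavor and gender).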
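To make the null hypothesis concrete: under independence, the expected count in each cell is (row total x column total) / grand total, and the test statistic sums (observed - expected)^2 / expected over all cells. The sketch below computes this by hand in plain Python. The counts and the function names `expected_counts` and `chi_square_statistic` are made up for illustration; they are not the data from this example.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (number of rows - 1) * (number of columns - 1)
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical counts, for illustration only.
# Rows: Male, Female. Columns: Strawberry, Chocolate, Vanilla, Green Tea.
observed = [
    [20, 30, 25, 10],
    [25, 20, 15, 40],
]
statistic, dof = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

The larger the statistic relative to its degrees of freedom, the stronger the evidence against independence.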

We will use the chi2_contingency function from the SciPy package (scipy.stats) in Python.

Python Code:
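A minimal sketch of the call, assuming a 2 x 4 table of counts. The numbers below are hypothetical placeholders rather than the data behind this example, so the output will not reproduce the p-value quoted in this post.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, NOT the original data.
# Columns: Strawberry, Chocolate, Vanilla, Green Tea.
observed = [
    [20, 30, 25, 10],  # Male
    [25, 20, 15, 40],  # Female
]

# Returns the test statistic, the p-value, the degrees of freedom,
# and the table of expected counts under independence.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3g}")

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject H0: flavor preference and gender appear to be associated.")
else:
    print("Fail to reject H0: no evidence of an association.")
```

The function takes the raw counts directly, without the row and column totals.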

The p-value is 4.3e-08, which is far below 0.05, so we reject the null hypothesis and conclude that ice-cream flavor preference is associated with the buyer’s gender. Note: if any expected cell count is below 5, the chi-square approximation may be unreliable. In that case, for a 2 x 2 table (two categorical variables with two categories each), we would use Fisher’s Exact Test instead.
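The small-count check in the note can be sketched as follows. The helper `has_small_expected_counts`, the threshold argument, and the 2 x 2 counts are assumptions made for illustration; `chi2_contingency` and `fisher_exact` are the SciPy functions.

```python
from scipy.stats import chi2_contingency, fisher_exact


def has_small_expected_counts(observed, threshold=5):
    """Return True if any expected cell count under independence is below threshold."""
    _, _, _, expected = chi2_contingency(observed)
    return bool((expected < threshold).any())


# Hypothetical 2 x 2 table with few observations, for illustration only.
small_table = [
    [3, 7],
    [8, 2],
]

if has_small_expected_counts(small_table):
    # Fisher's exact test does not rely on the large-sample approximation.
    _, p_value = fisher_exact(small_table)
    test_used = "Fisher's exact test"
else:
    _, p_value, _, _ = chi2_contingency(small_table)
    test_used = "chi-square test of independence"

print(f"{test_used}: p-value = {p_value:.3g}")
```

SciPy's `fisher_exact` is used here on a 2 x 2 table only, which matches the case described in the note.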

Data Visualization:

From the visualization, we can see that more girls do indeed prefer green tea flavored ice cream. 🙂


Source Code:

To be continued…


