SAS Day 33: Box Plot
A box plot (or box-and-whisker plot) displays the distribution of a dataset through its 5-number summary: minimum, Q1, median, Q3, and maximum.
The 5-number summary divides the data into 4 sections, each containing approximately 25% of the data.
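To see the five numbers themselves before plotting anything, here is a minimal sketch using PROC MEANS on sashelp.class. The choice of PROC MEANS and the maxdec=1 option are assumptions made for illustration; other procedures would work just as well.

```sas
/* Sketch: print the 5-number summary of weight for each sex */
proc means data=sashelp.class min q1 median q3 max maxdec=1;
  class sex;   /* one row of statistics per sex */
  var weight;
run;
```

Each row of the output lists the minimum, Q1, median, Q3, and maximum for one group, which are the values a box plot draws.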
Explore a little more
If we want to look at outliers, we define points below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) as outliers, where Q3 - Q1 is the interquartile range (IQR).
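The outlier rule can be sketched in a few steps: compute Q1 and Q3 per group, derive the two fences, and list the observations that fall outside them. The dataset name fences and the variable names iqr, lower, and upper are hypothetical names chosen for this sketch.

```sas
/* Sketch: compute the 1.5*IQR fences for weight by sex */
proc means data=sashelp.class noprint nway;
  class sex;
  var weight;
  output out=fences q1=q1 q3=q3;   /* one row per sex with Q1 and Q3 */
run;

data fences;
  set fences;
  iqr   = q3 - q1;          /* interquartile range */
  lower = q1 - 1.5*iqr;     /* points below this are outliers */
  upper = q3 + 1.5*iqr;     /* points above this are outliers */
run;

/* List every student whose weight falls outside the fences for their sex */
proc sql;
  select c.name, c.sex, c.weight, f.lower, f.upper
  from sashelp.class as c
       inner join fences as f
       on c.sex = f.sex
  where c.weight < f.lower or c.weight > f.upper;
quit;
```

Note that the quartiles here follow the default percentile definition of PROC MEANS, so the fences may differ slightly from those of software that uses another definition.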
Note: for normally distributed data, Q1 and Q3 fall at about ± 0.6745σ from the mean, so the box (the Q1-Q3 range) covers the central 50% of the normal curve around its peak.
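The 0.6745 figure can be checked with a short DATA step on the standard normal distribution (mean 0, σ = 1). This is only a sketch; the variable names z, iqr, fence, and outside are made up for illustration.

```sas
/* Sketch: where the box and the outlier fences sit on a standard normal curve */
data _null_;
  z       = quantile('NORMAL', 0.75);        /* Q3 in units of sigma; Q1 is -z */
  iqr     = 2*z;                             /* width of the box in sigmas */
  fence   = z + 1.5*iqr;                     /* upper fence; the lower fence is -fence */
  outside = 2*(1 - cdf('NORMAL', fence));    /* share of data beyond both fences */
  put z= iqr= fence= outside=;
run;
```

The log should show z near 0.6745, a box about 1.349σ wide, fences near ± 2.698σ, and roughly 0.7% of normal data flagged as outliers.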
We will use sashelp.class as an example and draw the box plot two ways, with SGPLOT and with TEMPLATE; they both produce the same result!
Reading the plot: the median weight of female students is a little lower than 90; roughly 25% of female students' weights are within 75-82, 25% are within 105-115, and the middle 50% are between 85 and 102.
/* SGPLOT version */
proc sgplot data=sashelp.class;
title "Distribution of Weight by Sex";
vbox weight / category=sex;
run;
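proc template; /* TEMPLATE version: define the graph, then render it */
define statgraph ClassBox;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay; boxplot y=weight x=sex; endlayout;
endgraph;
end;
run;
proc sort data=sashelp.class out=class; by sex; run;
proc sgrender data=class template=ClassBox; run;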
Advanced Box Plot:
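proc univariate data=sashelp.class;
class sex; /* one set of quantiles per sex, needed for x=sex below */
var weight;
ods output quantiles=q;
run;
data q2(rename=(estimate=weight) where=(quantile ne " "));
set q;
quantile = scan(quantile, 2, " "); /* "75% Q3" becomes "Q3"; rows like "99%" become blank and are dropped */
run;
proc template;
define statgraph bpp;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay; boxplotparm y=weight x=sex stat=quantile; endlayout;
endgraph;
end;
run;
proc sgrender data=q2 template=bpp; run;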
With the extra UNIVARIATE step, we have a summary dataset that we can use to cross-validate the graph.
We can see that the minimum weight among female students is indeed about 50.
Creating Statistical Graphics in SAS,