# SAS Boxplot

SAS Day 33: Box Plot

#### Definition:

A box plot (also called a box-and-whisker plot) displays the distribution of a dataset through its 5-number summary: minimum, Q1 (first quartile), median, Q3 (third quartile), and maximum.

Interpreting quartiles:

The 5-number summary divides the data into 4 sections, each containing approximately 25% of the data.
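As a rough sketch, the five values can be requested directly from PROC MEANS. The choice of sashelp.class with its weight and sex variables simply mirrors the example used later in this post; any numeric variable would do.

```sas
/* Five-number summary of weight, computed separately for each sex */
proc means data=sashelp.class min q1 median q3 max;
class sex;
var weight;
run;
```

Each output row lists the minimum, lower quartile, median, upper quartile, and maximum for one group, which are the same five landmarks a box plot draws.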

#### Explore a little more

To identify outliers, we treat any point below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) as an outlier.
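Written out, with IQR denoting the interquartile range, the two cut-offs (often called fences) are:

$$
\mathrm{IQR} = Q_3 - Q_1, \qquad \text{lower fence} = Q_1 - 1.5\,\mathrm{IQR}, \qquad \text{upper fence} = Q_3 + 1.5\,\mathrm{IQR}
$$

For instance, with made-up values Q1 = 80 and Q3 = 100 (chosen for illustration, not taken from any dataset), IQR = 20, so the fences sit at 80 - 30 = 50 and 100 + 30 = 130. Any point below 50 or above 130 would be flagged.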

Note: if the data follow a normal distribution, the Q1-Q3 range of the box plot corresponds to the central 50% of the curve around its peak, i.e. the mean ± 0.6745σ.
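A short worked version of that mapping, assuming the data are exactly normal with mean μ and standard deviation σ (the numeric constants are rounded), also shows where the 1.5 × IQR outlier cut-offs land:

$$
\begin{aligned}
Q_1 &\approx \mu - 0.6745\,\sigma, \qquad Q_3 \approx \mu + 0.6745\,\sigma \\
\mathrm{IQR} &= Q_3 - Q_1 \approx 1.349\,\sigma \\
Q_1 - 1.5\,\mathrm{IQR} &\approx \mu - 2.698\,\sigma, \qquad Q_3 + 1.5\,\mathrm{IQR} \approx \mu + 2.698\,\sigma
\end{aligned}
$$

Under that assumption only about 0.7% of observations fall outside the two cut-offs, which is why points beyond them are treated as unusual.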

#### Example:

We will use sashelp.class as an example and draw the box plot two ways, with SGPLOT and with TEMPLATE. Both produce the same result!

Basic Box-Plot Interpretation:
Reading the plot, the median weight of female students is a little below 90. About 25% of female students' weights fall within 75-82, 25% fall within 105-115, and the middle 50% fall between 85 and 102.

#### Code:

SGPLOT

proc sgplot data=sashelp.class;
title "Distribution of Weight by Sex";
vbox weight / category= sex;
run;

TEMPLATE

proc template;
define statgraph ClassBox;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay;
boxplot y=weight x=sex ;
endlayout;
endgraph;
end;
run;

proc sort data=sashelp.class out=class;
by sex;
run;
proc sgrender data=class template=ClassBox;
run;

#### Code:

proc univariate data=sashelp.class;

var weight ;
class sex;
ods output quantiles =q;
run;

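/* Keep the second word of each quantile label; labels with no second word become blank and are dropped */
data q2(rename=(estimate=weight) where=(Quantile ne " "));
set q;
quantile = scan(quantile, 2, " ");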
run;

proc template;
define statgraph bpp;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay;
boxplotparm y=weight x=sex stat=quantile;
endlayout;
endgraph;
end;
run;

proc sgrender data=q2 template=bpp;
run;
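To look at the numbers behind the plot, one option is simply to print the summary dataset. This is a minimal sketch that assumes q2 was created by the steps above and still carries the Sex and Quantile columns from the ODS output, with Estimate renamed to weight.

```sas
/* List the summary rows for female students only */
proc print data=q2 noobs;
where sex = "F";
var sex quantile weight;
run;
```

The printed values can then be compared against the positions of the box and whiskers in the graph.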

With the extra univariate step, we have a summary dataset that we can use to cross-validate the graph.
Indeed, we can see that the minimum weight among female students is about 50.

proc sgplot data=test noautolegend;
highlow x=visit1 high=q3 low=median / type=bar;
highlow x=visit1 high=median low=q1 / type=bar;
scatter x=visit1 y=medianlabel / markerchar=median markercharattrs=(size=8);
xaxis discreteorder=data label="n=&n1 n=&n2 n=&n3";
yaxis label="% x" values=(-100 to 100 by 20) grid max=120;
refline 0 / axis=y lineattrs=(color=red);
run;