R day 2:

I was working on the New York City Airbnb dataset from Kaggle. When I ran R's summary function on the price variable, I noticed a large gap between the mean and the median of the variable.

    summary(ab$price)

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        0.0    69.0   106.0   152.7   175.0 10000.0

#### In this case, which statistic is more persuasive? *Mean or Median.*


To answer this question, we will **plot the density distribution** of the price variable first.

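As the graph shows, the **price density distribution is strongly skewed to the right**: most listings sit at the low end, and a long tail stretches toward the maximum of 10000. This is also why the mean (152.7) is well above the median (106.0).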

Can you guess which one would make more sense?

Yes, it is the **median value that tells a better story about Airbnb prices in NYC**!

    library(ggplot2)

    # density plot of the price variable
    d1 <- ggplot(ab, aes(price)) + geom_density(alpha = 0.2)
    d1
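
To see why the mean drifts so far from the median here, a minimal sketch with made-up prices (hypothetical values, not taken from the Kaggle data) shows how a single extreme listing shifts the mean while the median barely moves:

```r
# Hypothetical nightly prices, invented for illustration only
prices <- c(60, 75, 90, 100, 110, 130, 150)
mean(prices)    # about 102.1
median(prices)  # 100

# Add one extreme listing, similar to the 10000 maximum in the summary above
prices_with_outlier <- c(prices, 10000)
mean(prices_with_outlier)    # 1339.375
median(prices_with_outlier)  # 105
```

One outlier multiplies the mean by more than ten, but the median only moves from 100 to 105.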

#### What if the data is not skewed or just slightly skewed?

In this case, the mean is a reliable description of the central tendency of the data.

    carrots <- data.frame(length = rnorm(100000, 6, 2))
    cukes <- data.frame(length = rnorm(50000, 7, 2.5))

    # Now, combine your two dataframes into one. First make a new column in each.
    carrots$veg <- 'carrot'
    cukes$veg <- 'cuke'

    # and combine into your new data frame vegLengths
    vegLengths <- rbind(carrots, cukes)

    # now make your lovely plot
    p <- ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
    p
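
As a quick numeric check of this claim, the sketch below draws a symmetric sample with the same parameters as the carrots data and compares the two statistics. The seed and the variable name `carrot_lengths` are assumptions added only to make the run repeatable; the mean and median should land very close to each other.

```r
set.seed(42)  # arbitrary seed, assumed here only for repeatability

# Symmetric (normal) sample: mean 6, standard deviation 2
carrot_lengths <- rnorm(100000, mean = 6, sd = 2)

mean(carrot_lengths)
median(carrot_lengths)

# The gap between the two is tiny compared with the standard deviation of 2
abs(mean(carrot_lengths) - median(carrot_lengths))
```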

By examining the density distributions of the data, we can now draw a conclusion.

### Conclusion:

If a data distribution is normal or only slightly skewed, the **mean** shows the central tendency of the dataset. If the data is heavily skewed, the **median** is a more intuitive measurement.
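
One way to turn this rule into code is to measure skewness directly. The sketch below is only an illustration: `sample_skewness` and `suggest_center` are hypothetical helper names, and the cutoff of 1 is an assumed rule of thumb rather than a fixed standard.

```r
# Moment-based sample skewness: near 0 for symmetric data,
# positive for a long right tail, negative for a long left tail
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Hypothetical helper: suggest "median" when skewness is large in absolute value.
# The default cutoff of 1 is an assumption, adjust it to your own data.
suggest_center <- function(x, cutoff = 1) {
  if (abs(sample_skewness(x)) > cutoff) "median" else "mean"
}

set.seed(1)  # arbitrary seed for repeatability
suggest_center(rnorm(10000, 6, 2))   # symmetric sample
suggest_center(rlnorm(10000, 0, 1))  # sample with a long right tail
```

On the Airbnb data this could be called as `suggest_center(ab$price)`, though looking at the density plot first is still the better habit.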

*Thanks to Jun.z, who is always willing to share stats tricks with me.*

**REF:**