# Category: Data Science

- Becoming an influencer in the data science world

To fulfill my dream, I will organize my lecture notes, homework, and projects from Harrisburg University. Meanwhile, I will publish posts on topics related to data analysis.

## Type I error vs Type II error

Background story: Last time, when we used R to calculate the sample size, we specified the Type I error α and the Type II error β, but what is the meaning behind α and β? Review: We defined the “best” sample size as the one that gives less variation in the sample mean from sample to sample.

## Sample Size Calculation with R

Background Story: One day, my boss asked me to check whether the data had a certain number of events to perform an efficacy analysis. I was curious how he came up with the number; later I realized he must have done a sample size calculation. Today we will go over the basics and R applications for sample size calculation.
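To make the roles of α and β in a sample size calculation concrete, here is a minimal sketch in Python (not the R code from the post). It assumes a two-sided, two-sample comparison of means with a known common standard deviation and uses the normal approximation; the function name `sample_size_per_group` and the input values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Assumes equal group sizes, a common known sigma, and a normal
    approximation (no t-distribution correction).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff set by the Type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II error
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical inputs: detect a difference of 5 units when sigma is 10.
n = sample_size_per_group(delta=5.0, sigma=10.0)
print(n)
```

A smaller α or a higher power (smaller β) both increase the required n. Calculations based on the t-distribution typically give a slightly larger number than this approximation.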

## Randomization Method

Background Story: Last time we emphasized the importance of randomization: it balances the treated and placebo groups, so that the groups are exchangeable. Today we will introduce 3 common randomization methods for different clinical trial purposes and the R code for implementing them: Simple Randomization, Block Randomization, and Stratified Randomization.
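The three methods can be sketched roughly as follows. This is an illustrative Python version, not the R code from the post; the function names, arm labels, and strata are assumptions made up for the example, and stratified randomization is shown here as block randomization applied within each stratum.

```python
import random

ARMS = ("treatment", "placebo")  # hypothetical arm labels


def simple_randomization(n, rng):
    """Assign each subject independently, like a coin flip."""
    return [rng.choice(ARMS) for _ in range(n)]


def block_randomization(n_blocks, block_size, rng):
    """Shuffle within blocks so each block is balanced between the two arms."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    assignments = []
    for _ in range(n_blocks):
        block = [ARMS[0]] * (block_size // 2) + [ARMS[1]] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments


def stratified_randomization(blocks_per_stratum, block_size, rng):
    """Run block randomization separately inside each stratum."""
    return {
        stratum: block_randomization(n_blocks, block_size, rng)
        for stratum, n_blocks in blocks_per_stratum.items()
    }


rng = random.Random(2024)  # fixed seed so the schedule is reproducible
simple = simple_randomization(12, rng)
blocked = block_randomization(n_blocks=3, block_size=4, rng=rng)
stratified = stratified_randomization({"male": 2, "female": 2}, 4, rng)
print(blocked)
```

Simple randomization can leave the arms unbalanced in small trials, while the blocked schedule is balanced after every complete block.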

## Machine Learning for Predictive Data Analytics1

Data Science Day 26: When I was cleaning my home, I found a brand new copy of Fundamentals of Machine Learning for Predictive Data Analytics. So I decided to read the book and share some of its exercise problems.

## Unix Command

Today I want to share some basic Unix commands I have used recently in PuTTY.

- Directory: change the path to a certain location
  - `cd /…`
  - `cd ~` {home directory}
  - `pwd` {show current working directory}
- Files: list current files
  - `ls {path}`
  - `ls -l` {date, size, permission}
- Check/Change Access: Read is 4. Write is 2. Execute is 1.
  - `ls -lah xyz` (u, group, everyone else)…
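The 4/2/1 values add up to one octal digit per class (user, group, everyone else). As a small sketch, the Python helper below converts the nine permission characters shown by `ls -l` into that three-digit form; the function names are hypothetical and the example string is made up.

```python
PERMISSION_VALUES = {"r": 4, "w": 2, "x": 1}


def octal_digit(flags):
    """Sum the values of the permissions that are set, e.g. 'r-x' -> 5."""
    return sum(PERMISSION_VALUES[flag] for flag in flags if flag != "-")


def symbolic_to_octal(symbolic):
    """Convert nine permission characters (user, group, others) to octal digits."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters such as 'rwxr-xr--'")
    return "".join(str(octal_digit(symbolic[i:i + 3])) for i in range(0, 9, 3))


# Made-up example: user can do everything, group can read and execute,
# everyone else can only read.
mode = symbolic_to_octal("rwxr-xr--")
print(mode)
```

The resulting digits are what `chmod` accepts in its numeric form.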

## T.Test: One vs Two Sample

Data Science Day 25: The T.Test is one of the most commonly used statistical tools for comparing the mean of a continuous outcome variable across a binary explanatory variable. Today we will go over three basics of the T.Test: the 2 basic assumptions for T.Test (i.i.d., the sample is normal), One Sample vs Two Sample T.Test…
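As a rough illustration of the one-sample versus two-sample distinction, the sketch below computes the two test statistics by hand using only the Python standard library. The data are made up, the function names are hypothetical, and the two-sample version assumes Welch's unequal-variance form; turning the statistic into a p-value additionally needs the t-distribution, which is not shown here.

```python
import math
from statistics import mean, variance


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    standard_error = math.sqrt(variance(sample) / n)
    return (mean(sample) - mu0) / standard_error, n - 1


def welch_two_sample_t(a, b):
    """t statistic and Welch-Satterthwaite df for H0: the two means are equal."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up data for illustration only.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

t_one, df_one = one_sample_t(group_a, mu0=3.0)
t_two, df_two = welch_two_sample_t(group_a, group_b)
print(t_one, df_one, t_two, df_two)
```

The one-sample test compares a single group against a fixed reference value, while the two-sample test compares two groups against each other.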

Python Day 34: Plain text files are widely used in data science nowadays. For example, in NLP (Natural Language Processing), we usually import plain text files for sentiment analysis, such as movie reviews labeled “good” or “bad”. So today we will learn how to read text files in Python.
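A minimal sketch of reading a plain text file is shown below. The file name `reviews.txt` and its contents are made up for the example, and the file is created in a temporary directory so the snippet can run on its own; the helper name `read_lines` is hypothetical.

```python
import tempfile
from pathlib import Path


def read_lines(path, encoding="utf-8"):
    """Return the non-empty lines of a text file with whitespace stripped."""
    with open(path, "r", encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small made-up review file so the example is self-contained.
    review_path = Path(tmp_dir) / "reviews.txt"
    review_path.write_text("good\nbad\n\ngood\n", encoding="utf-8")
    reviews = read_lines(review_path)

print(reviews)
```

Opening the file in a `with` block closes it automatically, and passing the encoding explicitly avoids surprises when the platform default differs.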