[Statistical Analysis with R] Descriptive Analysis

by JANIMUN 2021. 2. 23.

<Review!>

Does anyone have an age smaller than 0?: table(age < 0)

How many missing values does the variable age have?: table(is.na(age))

Are the people with BMI 0 the same people as those with insulin 0?: table((BMI == 0) & (insulin == 0))
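
A minimal sketch of these checks on a small made-up data frame (the values and the name dat_check are invented for illustration). For the last question it uses a two-way table, which shows directly how many rows have both BMI == 0 and insulin == 0 and how many have only one of the two:

```r
# Toy data, invented for illustration only
dat_check <- data.frame(
  age     = c(25, -1, NA, 40, 33),
  BMI     = c(0, 22.5, 0, 30.1, 0),
  insulin = c(0, 85, 0, 0, 120)
)

# Any age below 0? table() drops NA values by default
neg_age <- table(dat_check$age < 0)

# Number of missing ages
n_missing <- sum(is.na(dat_check$age))

# Cross-tabulate the two conditions; the TRUE/TRUE cell counts
# the rows with both BMI == 0 and insulin == 0
both_zero <- table(BMI0 = dat_check$BMI == 0,
                   insulin0 = dat_check$insulin == 0)

print(neg_age)
print(n_missing)
print(both_zero)
```

If the off-diagonal cells of both_zero are not 0, the two groups are not the same people.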

 

Change variable type: dat$lunchtime <- as.Date(dat$lunchtime)
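
A small sketch of the type change, assuming the column is stored as text in YYYY-MM-DD format (the data frame dat_dates and its values are hypothetical). Once the column is a Date, date arithmetic works:

```r
# Hypothetical column stored as character strings
dat_dates <- data.frame(lunchtime = c("2021-02-23", "2021-03-01"),
                        stringsAsFactors = FALSE)
class_before <- class(dat_dates$lunchtime)

# Convert to Date; format states how the text is laid out
dat_dates$lunchtime <- as.Date(dat_dates$lunchtime, format = "%Y-%m-%d")
class_after <- class(dat_dates$lunchtime)

# Differences between dates are now meaningful (in days)
day_diff <- as.numeric(diff(dat_dates$lunchtime))
```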

 

Indexing a data frame: dat[row, column]

To refer to a variable, use dat$___ or attach(dat)

To remove the attached variables of a dataset again: detach(dat)

 

dat[dat$Age != 0, ] => extracts the rows where Age is not equal to 0

dat_female <- dat[dat$Gender == "F", ] => creates a new object (dataset) with only the female rows

dat_final <- data.frame(ID = dat_female$PatientId, Age = dat_female$Age, NoShow = dat_female$`No-show`) => backticks are needed because the column name No-show contains a hyphen
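
The subsetting steps can be tried on a toy data frame (dat_toy and its values are invented here; only the column names follow the notes). Without backticks, R would read dat_female$No-show as dat_female$No minus an object called show:

```r
# Toy data; check.names = FALSE keeps the non-syntactic name "No-show"
dat_toy <- data.frame(
  PatientId = 1:4,
  Gender    = c("F", "M", "F", "F"),
  Age       = c(34, 0, 51, 0),
  `No-show` = c("No", "Yes", "No", "Yes"),
  check.names = FALSE
)

dat_nonzero <- dat_toy[dat_toy$Age != 0, ]       # rows where Age is not 0
dat_female  <- dat_toy[dat_toy$Gender == "F", ]  # female rows only

# Backticks let us refer to a column whose name contains a hyphen
dat_final <- data.frame(
  ID     = dat_female$PatientId,
  Age    = dat_female$Age,
  NoShow = dat_female$`No-show`
)
```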

 

<Descriptive analysis>

Goal:

Describe your study sample regarding its main characteristics,

i.e. regarding the main personal/sociodemographic variables, covariates, and outcome/exposure variables

 

Goal 1: for yourself

  • Compute plots (and tables) to get an understanding of the characteristics of the variables, their distribution, and association between variables

Goal 2: for presenting to others

  • Describe the main characteristics of your study sample
  • Present these descriptive statistics in an easily accessible minimal table (or graphic)

Table 1: main characteristics of the sample

  • Informative? Relevant? Easy to understand?
  • e.g. a study of change in wellbeing following work exit in 8037 persons
    • absolute numbers, frequencies
    • all important and relevant variables that will be analyzed later should be included
    • (!) %: are these row-wise or column-wise?
    • how is each variable measured? nominal, ordinal, or metric ==> different kinds of summary statistics

For nominal variables >> frequencies

  • Absolute frequencies : table(var1)
  • Relative frequencies: table(var1)/length(var1)
  • Further functions to create frequency tables: prop.table(), janitor::tabyl(), summarytools::freq()
  • Frequencies of 2 variables: table(var1, var2)
  • Alternative: expss::cro(var1, var2)
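
A short base-R sketch of these frequency tables, using two invented vectors (gender and sms are made up for illustration). The margin argument of prop.table() is one way to be explicit about whether percentages are row-wise or column-wise:

```r
# Invented example vectors
gender <- c("F", "M", "F", "F", "M", "F")
sms    <- c("yes", "no", "no", "yes", "yes", "no")

abs_freq <- table(gender)        # absolute frequencies
rel_freq <- prop.table(abs_freq) # relative frequencies (sum to 1)

tab2    <- table(gender, sms)           # two-way frequency table
row_pct <- prop.table(tab2, margin = 1) # each row sums to 1
col_pct <- prop.table(tab2, margin = 2) # each column sums to 1

print(abs_freq)
print(round(100 * row_pct, 1))
print(round(100 * col_pct, 1))
```

Note that table() drops NA values by default while length() counts them, so table(var1)/length(var1) and prop.table(table(var1)) can differ when var1 has missing values.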

Ways to save the statistics/tables: 

  • Draft table in e.g. Word, copy/paste values from R manually.
  • Create your table directly in R and export it to e.g. a CSV or Excel file (or knit directly to a Word/PDF file)
    • e.g. use the openxlsx::write.xlsx(), writexl::write_xlsx(), write.table(), or write.csv() functions
  • Create your report through R Markdown, and generate the tables / figure there directly.
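
As a minimal sketch of the second option, a frequency table can be turned into a data frame and written out with base R's write.csv(). The vector, the file name, and the use of tempdir() are assumptions made for this example:

```r
# Invented example vector
gender <- c("F", "M", "F", "F", "M", "F")

# Turn the frequency table into a data frame with a count column "n"
freq_df <- as.data.frame(table(gender), responseName = "n")
freq_df$percent <- round(100 * freq_df$n / sum(freq_df$n), 1)

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "table1_gender.csv")
write.csv(freq_df, out_file, row.names = FALSE)

# Read it back to confirm what was written
check <- read.csv(out_file)
print(check)
```

For an Excel file, the same data frame could be passed to writexl::write_xlsx(freq_df, path), provided the writexl package is installed.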

 

Exercise with>>

Main question: Does sending a reminder SMS have an effect on whether people come to their doctor's appointment?

 

Learn how to save as Excel! (rewatch the video)

 

Frequency plots:

  • Bar plot: barplot(table(var1))
  • Pie chart: pie(table(var1))
  • Stratified barplots with barplot(table(var1, var2)) or mosaicplot(table(var1,var2))
  • Histogram: hist() for metric/continuous variables

To save a plot, open a connection to a PDF or JPEG file with pdf() or jpeg(), draw the plot, and then close the connection with dev.off().
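
A minimal sketch of saving a bar plot to a PDF file (the vector and the file name are invented, and tempdir() is used only so the example does not write into the working directory). The file is only complete once the device has been closed with dev.off():

```r
# Invented example vector
gender <- c("F", "M", "F", "F", "M", "F")

# Hypothetical output file
plot_file <- file.path(tempdir(), "barplot_gender.pdf")

pdf(plot_file, width = 6, height = 4)  # open the PDF device
barplot(table(gender), main = "Gender", ylab = "Count")
invisible(dev.off())                   # close the device to finish the file
```

jpeg() works the same way, with width and height given in pixels by default.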

 

Descriptive statistics of ordinal variables >> bar plot, mosaic plot, histogram, boxplot, scatterplots

  • Frequencies (if not many categories)
  • Minimum, maximum, median
  • Range, quantiles, IQR, median absolute deviation (MAD)
  • Used but not fully appropriate: mean, standard deviation (SD)

For continuous/metric variables >> Histogram, boxplot, scatterplots, Quantile-quantile plots

  • Mean, median, minimum, maximum
  • Range, quantiles, IQR, MAD, SD, variance
  • (Skewness, kurtosis)
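
These summary statistics are all available in base R. A minimal sketch on an invented age vector with one missing value (skewness and kurtosis are left out because they need an add-on package):

```r
# Invented example data with one missing value
age <- c(23, 35, 41, 29, NA, 52, 38)

stats <- c(
  mean   = mean(age, na.rm = TRUE),
  median = median(age, na.rm = TRUE),
  min    = min(age, na.rm = TRUE),
  max    = max(age, na.rm = TRUE),
  IQR    = IQR(age, na.rm = TRUE),
  MAD    = mad(age, na.rm = TRUE),  # scaled by 1.4826 by default
  SD     = sd(age, na.rm = TRUE),
  var    = var(age, na.rm = TRUE)
)
print(round(stats, 2))

# Quartiles and the range
print(quantile(age, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
print(range(age, na.rm = TRUE))
```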

Things to remember!

Use the na.rm = TRUE option to remove missing values, e.g. mean(dat$age, na.rm = TRUE)

sd() and var() use the denominator n - 1 (sample SD and variance)

The quantile() function offers 9 different ways to compute quantiles, selected with the type argument (default: type = 7)
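
A small check of the last two points on an invented vector x: the variance computed by hand with denominator n - 1 matches var(), and two of the quantile types give different first quartiles for the same data:

```r
# Invented example data
x <- c(2, 4, 4, 5, 7, 9)
n <- length(x)

# Variance by hand with denominator n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)
same_var <- isTRUE(all.equal(var(x), var_manual))
same_sd  <- isTRUE(all.equal(sd(x), sqrt(var_manual)))

# The same quantile computed with two different types
q7 <- quantile(x, probs = 0.25, type = 7)  # R's default
q6 <- quantile(x, probs = 0.25, type = 6)

print(c(same_var = same_var, same_sd = same_sd))
print(c(type7 = unname(q7), type6 = unname(q6)))
```

When comparing results with other software, it is worth checking which quantile definition that software uses.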

 
