Important steps of a data analysis
1. Import dataset from an external file (e.g. xls, txt, SPSS file)
2. Import check: check if dataset has been read correctly
3. Save dataset as R dataset (.Rdata), e.g. dat_raw.Rdata
4. Data check: check whether data are correct or missing, and e.g. remove participants/variables or decide on imputation. Save the corrected dataset as a new dataset, e.g. dat_corrected.Rdata
5. Transform variables, compute new variables, and/or select a subset for the final analysis. Save this again as a new dataset, e.g. dat_final.Rdata, and use it in all further steps
6. Descriptive analysis to describe the main characteristics of the study sample
7. Main analyses
8. Secondary analyses
9. Sensitivity analyses
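A minimal sketch of steps 1 to 5, assuming a small made-up dataset that is first written to a temporary CSV file so the example is self-contained. The file names dat_raw.Rdata, dat_corrected.Rdata and dat_final.Rdata follow the list above; the column names and values are hypothetical.

```r
# Hypothetical toy data written to a temporary CSV, standing in for an external file
tmp_dir  <- tempdir()
csv_file <- file.path(tmp_dir, "example.csv")
write.csv(data.frame(id     = 1:4,
                     age    = c(34, -1, 51, NA),
                     weight = c(70, 82, 65, 90),
                     height = c(1.75, 1.80, 1.62, 1.85)),
          csv_file, row.names = FALSE)

# Step 1: import the dataset from the external file
dat <- read.csv(csv_file)

# Step 2: import check (structure, first rows, dimensions)
str(dat)
head(dat)
dim(dat)

# Step 3: save the raw data as an R dataset
save(dat, file = file.path(tmp_dir, "dat_raw.Rdata"))

# Step 4: data check and correction (here: impossible ages are set to missing)
dat$age[which(dat$age < 0)] <- NA
save(dat, file = file.path(tmp_dir, "dat_corrected.Rdata"))

# Step 5: compute a new variable and save the final dataset
dat$BMI <- dat$weight / dat$height^2
save(dat, file = file.path(tmp_dir, "dat_final.Rdata"))

# All further steps start from the final dataset
load(file.path(tmp_dir, "dat_final.Rdata"))
```

Saving a separate file after each stage means every later analysis can be rerun from dat_final.Rdata without repeating the import and cleaning.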
Data check & Manipulation (steps 4 & 5)
Ways to check:
- Important logical operators: == & | !
- In combination with operators and functions that compare or evaluate values (> < >= <= ==, is.na(), and further specific functions, e.g. to evaluate strings), many questions can be answered
- The number of times such an evaluation is TRUE can then be displayed using the table() function
- e.g. Does anyone have an age smaller than 0?: table(age < 0)
- e.g. How many missing values does the variable age have?: table(is.na(age))
- e.g. How many people have a BMI of 0?: table(BMI == 0)
- e.g. How many people have an insulin level of 0?: table(insulin == 0)
- e.g. Are the people with a BMI of 0 the same people as those with an insulin level of 0?: table((BMI == 0) & (insulin == 0))
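The checks above can be tried on a small example. This is only a sketch with a made-up data frame; the variable names follow the examples above, the values are hypothetical.

```r
# Hypothetical toy data
dat <- data.frame(age     = c(25, -3, 40, NA, 61),
                  BMI     = c(22.5, 0, 31.2, 0, 27.8),
                  insulin = c(85, 0, 0, 120, 95))

# Does anyone have an age smaller than 0?
# By default, table() silently drops the NA produced by comparing with a missing age
table(dat$age < 0)
table(dat$age < 0, useNA = "ifany")   # shows the NA as its own category

# How many missing values does age have?
table(is.na(dat$age))

# How many people have a BMI of 0, and how many also have insulin 0?
table(dat$BMI == 0)
table(dat$BMI == 0 & dat$insulin == 0)

# A cross table shows the overlap of both conditions at once
table(BMI_zero = dat$BMI == 0, insulin_zero = dat$insulin == 0)
```

The cross table in the last line is often easier to read than the combined condition, because it also shows who fulfils only one of the two conditions.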
Transform:
- Change the variable type using functions such as as.numeric(), as.character(), as.factor(), as.numeric(as.character()), as.Date()
- Create a new variable through a mathematical operation, e.g.
  - compute BMI from height and weight: dat$BMI <- dat$weight/(dat$height^2)
  - standardize variables with the scale() function: dat$BMI_z <- scale(dat$BMI)
- Remove/add/replace values of a variable with the [ ] operator:
  - dat$BMI[1] <- 20
  - dat$BMI[dat$BMI < 0] <- NA
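A short sketch of these transformations on made-up data. The columns group and visit are hypothetical and are only there to illustrate type conversion; wrapping scale() in as.numeric() is a choice made here, not part of the notes above.

```r
# Hypothetical toy data
dat <- data.frame(weight = c(70, 82, 65),
                  height = c(1.75, 1.80, 1.62),
                  group  = factor(c("10", "20", "10")),
                  visit  = c("2021-02-09", "2021-02-16", "2021-03-02"),
                  stringsAsFactors = FALSE)

# Factor to numeric: as.numeric() alone returns the internal level codes,
# so go through as.character() to get the values that are displayed
as.numeric(dat$group)                                   # level codes: 1 2 1
dat$group_num <- as.numeric(as.character(dat$group))    # values: 10 20 10

# Character to Date (the default format is year-month-day)
dat$visit <- as.Date(dat$visit)

# New variable through a mathematical operation
dat$BMI <- dat$weight / (dat$height^2)

# scale() returns a one-column matrix; as.numeric() turns it into a plain vector
dat$BMI_z <- as.numeric(scale(dat$BMI))

# Replace single values, or all values that fulfil a condition
dat$BMI[1] <- 20
dat$BMI[which(dat$BMI < 0)] <- NA
```

Using which() in the last line avoids problems when the condition itself contains NA.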
- Same ideas as for transforming variables (columns of data frame = variables = vectors!)
- Select a subset of a data frame to filter variables/observations, or add columns/rows. This can be done using the [ , ] operator, the data.frame() function, and others, e.g.:
  - dat[dat$Age != 0, ]
  - dat_female <- dat[dat$Gender == "F", ]
  - dat_final <- data.frame(ID = dat_female$PatientId, Age = dat_female$Age, NoShow = dat_female[["No-show"]])
  - subset() function
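A sketch of the subsetting variants on a made-up data frame. The column names follow the examples above, the values are hypothetical. Note the comma inside [ , ]: without it, the logical vector would select columns instead of rows. A column name with a hyphen such as No-show cannot be used directly after $, so [[ ]] is used here.

```r
# Hypothetical toy data; check.names = FALSE keeps the hyphen in "No-show"
dat <- data.frame(PatientId = 1:5,
                  Gender    = c("F", "M", "F", "F", "M"),
                  Age       = c(34, 0, 51, 0, 29),
                  "No-show" = c("No", "Yes", "No", "Yes", "No"),
                  check.names = FALSE)

# Keep only rows with Age different from 0 (rows before the comma, columns after)
dat_nonzero <- dat[dat$Age != 0, ]

# Keep only female patients
dat_female <- dat[dat$Gender == "F", ]

# Build a new data frame from selected, renamed columns
dat_final <- data.frame(ID     = dat_female$PatientId,
                        Age    = dat_female$Age,
                        NoShow = dat_female[["No-show"]])

# subset() combines row filter and column selection in one call
dat_sub <- subset(dat, Gender == "F" & Age > 0, select = c(PatientId, Age))
```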
Missing values?
- is.na() checks for missing values; a plain comparison doesn't always work well when values are missing, because comparing with NA gives NA!
- a[a$mpg == 21 & !is.na(a$mpg), ] ;; try like this!
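A sketch of why the extra !is.na() matters. It assumes that a is a data frame with an mpg column, as in the line above; here the first rows of the built-in mtcars data are used, with one value set to NA for illustration.

```r
# First six rows of the built-in mtcars data, with one artificial missing value
a <- mtcars[1:6, c("mpg", "cyl")]
a$mpg[3] <- NA

# Plain comparison: the NA in the condition produces an extra row full of NAs
res_plain <- a[a$mpg == 21, ]
nrow(res_plain)

# Excluding missing values explicitly returns only the true matches
res_clean <- a[a$mpg == 21 & !is.na(a$mpg), ]
nrow(res_clean)

# which() is an alternative: it returns only the positions where the condition is TRUE
res_which <- a[which(a$mpg == 21), ]
```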
Tidyverse
- In addition to the "classical" R programming that we have mostly used so far, there are many newer packages and functions that introduce new objects, structures and ways of programming
- Many are subsumed in the tidyverse; example:
    Pima_diabetes %>%
      dplyr::select(Pregnancies, BMI) %>%
      dplyr::filter(Pregnancies > 10) %>%
      dplyr::summarize(av_BMI_highP = mean(BMI), n = n())
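To try the pipeline without the course data, here is a sketch with a small made-up stand-in for Pima_diabetes (the values are hypothetical), followed by one possible base R equivalent for comparison. It assumes the dplyr package is installed.

```r
library(dplyr)   # provides %>%, select(), filter(), summarize(), n()

# Hypothetical stand-in for the Pima_diabetes data
Pima_diabetes <- data.frame(Pregnancies = c(2, 11, 13, 0, 12),
                            BMI         = c(26.5, 33.1, 30.2, 24.0, 35.6))

# Tidyverse style: each step passes its result on to the next one
res <- Pima_diabetes %>%
  dplyr::select(Pregnancies, BMI) %>%
  dplyr::filter(Pregnancies > 10) %>%
  dplyr::summarize(av_BMI_highP = mean(BMI), n = n())
res

# One possible "classical" R version of the same computation
high_p   <- Pima_diabetes[Pima_diabetes$Pregnancies > 10, c("Pregnancies", "BMI")]
res_base <- data.frame(av_BMI_highP = mean(high_p$BMI), n = nrow(high_p))
```

The dplyr:: prefix is not strictly needed once the package is loaded, but it avoids clashes with functions of the same name from other packages.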
Dates and times are tricky in R, read more!
- See R_2c_dates_and_times_in_R.pdf
- Do exercise 4 in R 2 exercises.Rmd.
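One small illustration of why dates are tricky, as a sketch with made-up date strings: the same text can mean different days depending on the format given to as.Date().

```r
# ISO format (year-month-day) is understood without a format argument
d1 <- as.Date("2021-02-09")

# Other layouts need an explicit format
d2 <- as.Date("02.03.2021", format = "%d.%m.%Y")   # 2 March 2021

# The same string read with two different formats gives two different dates
us <- as.Date("03/02/2021", format = "%m/%d/%Y")   # 2 March 2021
eu <- as.Date("03/02/2021", format = "%d/%m/%Y")   # 3 February 2021

# Dates can be subtracted; the result is a time difference in days
days_between <- as.numeric(d2 - d1)

# Convert a Date back to text in a chosen layout
format(d2, "%Y-%m-%d")
```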