By the end of the lecture, you will be able to …
use tbl_summary() to create a descriptive summary table

Download and open code-along-07.qmd
Load the standard packages & 1 new package: gtsummary
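A minimal sketch of the setup chunk. The slide does not name the "standard packages", so tidyverse and haven are assumptions here, inferred from the functions used later in the lecture.

```r
# Assumed "standard" packages: the slide does not list them by name
library(tidyverse)  # select(), mutate(), left_join(), drop_na()
library(haven)      # zap_missing(), as_factor() for labelled survey data

# The new package for this lecture
library(gtsummary)  # tbl_summary(), add_overall(), modify_header()
```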
| Variable | Description |
|---|---|
| lifenow | R’s rating of life overall now from 0-10 |
| age | Age of respondent |
| educ | Respondents highest edu credit |
| hrs1 | how many hours did you work last week |
| fefam | Better for man to work, woman tend home |
| race | What race do you consider yourself |
| sex | Respondents sex |

Make a df with only the (pretty) categorical and continuous variables we’ll analyze.
# Categorical Variables
my_cat <- gss22 |>
  select(id, fefam, race, sex) |>
  zap_missing() |>
  as_factor() |>
  droplevels()

# Continuous Variables
my_con <- gss22 |>
  select(id, lifenow, age, educ, hrs1) |>
  mutate(
    lifenow = as.numeric(lifenow),
    age     = as.numeric(age),
    educ    = as.numeric(educ),
    hrs1    = as.numeric(hrs1))

# Combine the two dataframes
my_data <- left_join(my_cat, my_con, by = "id")

tbl_summary()
Calculates descriptive statistics for continuous, categorical, and dichotomous variables
| Characteristic | N = 4,149¹ |
|---|---|
| respondent id number | 2,075 (1,038, 3,112) |
| fefam | |
| strongly agree | 171 (6.3%) |
| agree | 516 (19%) |
| disagree | 1,182 (43%) |
| strongly disagree | 866 (32%) |
| Unknown | 1,414 |
| race | |
| white | 2,651 (65%) |
| black | 775 (19%) |
| other | 659 (16%) |
| Unknown | 64 |
| sex | |
| male | 1,910 (46%) |
| female | 2,216 (54%) |
| Unknown | 23 |
| lifenow | 8.00 (7.00, 9.00) |
| Unknown | 2,001 |
| age | 47 (33, 63) |
| Unknown | 256 |
| educ | 14.00 (12.00, 16.00) |
| Unknown | 29 |
| hrs1 | 40 (36, 45) |
| Unknown | 1,829 |
| ¹ Median (Q1, Q3); n (%) | |
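The table above shows the default layout: median (Q1, Q3) for continuous variables, n (%) for categorical ones, and an "Unknown" row counting missing values. A minimal sketch of a bare call, using a small made-up data frame (`toy` is hypothetical, not the GSS data):

```r
library(gtsummary)

# Made-up data for illustration only (not the GSS)
toy <- data.frame(
  sex = factor(c("male", "female", "female", "male", "female", "male",
                 "female", "female", "male", "female", "male", NA)),
  # 10 or more distinct values, so age is summarized as continuous
  age = c(25, 40, 31, NA, 58, 47, 36, 62, 29, 53, 44, 71)
)

# With no arguments, tbl_summary() picks the default statistics per variable type
tbl <- tbl_summary(toy)
tbl

# The values behind the printed table are stored in table_body
tbl$table_body
```

Numeric variables with only a few distinct values are treated as categorical by default, which is why the toy `age` column has more than ten different values.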
my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    label = list(
      race    ~ "Race",
      sex     ~ "Gender",
      fefam   ~ "Better for man to work, woman tend home",
      lifenow ~ "Life satisfaction (0-10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours worked last week",
      educ    ~ "Years of education"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}"
    ))

my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    label = list(
      race    ~ "Race",
      sex     ~ "Women",
      fefam   ~ "Better for man to work, woman tend home",
      lifenow ~ "Life satisfaction (0-10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours worked last week",
      educ    ~ "Years of education"),
    # Report sex as a single row: the percentage who are female
    type  = list(sex ~ "dichotomous"),
    value = list(sex = "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")
  )

my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    by = race,
    label = list(
      sex     ~ "Women",
      fefam   ~ "Better for man to work, woman tend home",
      lifenow ~ "Life satisfaction (0-10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours worked last week",
      educ    ~ "Years of education"),
    type  = list(sex ~ "dichotomous"),
    value = list(sex = "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")
  )

my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    by = race,
    label = list(
      sex     ~ "Women",
      fefam   ~ "Better for man to work, woman tend home",
      lifenow ~ "Life satisfaction (0–10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours worked last week",
      educ    ~ "Years of education"),
    type  = list(sex ~ "dichotomous"),
    value = list(sex = "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")) |>
  add_overall() # This adds the total column (labelled "Overall" by default)

my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    by = race,
    label = list(
      sex     ~ "Women",
      fefam   ~ "Better for man to work, woman tend home",
      lifenow ~ "Life satisfaction (0–10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours worked last week",
      educ    ~ "Years of education"),
    type  = list(sex ~ "dichotomous"),
    value = list(sex = "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")) |>
  add_overall() |>
  modify_header(
    label = '**Variable**')

Change YAML
output: pdf_document
my_data |>
  drop_na() |>
  select(lifenow, race, sex, fefam, age, hrs1, educ) |>
  tbl_summary(
    by = race,
    label = list(
      sex     ~ "Women",
      fefam   ~ "Better for man to work, \nwoman tend home",
      lifenow ~ "Life satisfaction (0–10)",
      age     ~ "Age",
      hrs1    ~ "Number of hours \nworked last week",
      educ    ~ "Years of education"),
    type  = list(sex ~ "dichotomous"),
    value = list(sex = "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")) |>
  add_overall() |>
  modify_header(
    label = '**Variable**') |>
  as_flex_table()

Add a title and footnotes to your Quarto document
Title:
> Table 01. Unweighted Descriptive Statistics by Race
Footnotes:
> Notes: Unweighted data from the 2022 U.S. General Social Survey
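The title and note can simply be typed into the document around the table. As one possible alternative, they can be attached to the table object itself after it has been converted with `as_flex_table()`. The sketch below assumes the flextable package is installed and uses a small made-up data frame (`toy` is hypothetical); the caption and note text are copied from above only to show where they go.

```r
library(gtsummary)

# Made-up data for illustration only (not the GSS)
toy <- data.frame(
  race = factor(rep(c("white", "black", "other"), each = 4)),
  sex  = factor(rep(c("male", "female"), times = 6)),
  age  = c(25, 40, 31, 66, 58, 47, 36, 62, 29, 53, 44, 71)
)

ft <- toy |>
  tbl_summary(
    by = race,
    label = list(sex ~ "Women", age ~ "Age"),
    type  = list(sex ~ "dichotomous"),
    value = list(sex ~ "female"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{p}")) |>
  add_overall() |>
  modify_header(label = "**Variable**") |>
  as_flex_table() |>
  # Title shown with the table
  flextable::set_caption(
    caption = "Table 01. Unweighted Descriptive Statistics by Race") |>
  # Note placed in the table footer
  flextable::add_footer_lines(
    values = "Notes: Unweighted data from the 2022 U.S. General Social Survey")

ft
```

Keeping the title and note on the table object means they travel with the table if it is moved or reused in another document.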
Problem
Detection
Solutions
Listwise deletion
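Listwise deletion is what the `drop_na()` call in the table code does: any row with a missing value on any variable is dropped before the table is built. A minimal sketch with made-up data (`toy` is hypothetical, not the GSS):

```r
library(tidyr)

# Made-up data for illustration only (not the GSS)
toy <- data.frame(
  id      = 1:5,
  lifenow = c(8, NA, 7, 9, NA),
  hrs1    = c(40, 35, NA, 45, 20)
)

# Listwise deletion: keep only rows with no missing value on any column
complete <- drop_na(toy)

nrow(toy)       # 5 rows before
nrow(complete)  # 2 rows after: only ids 1 and 4 are complete
```

Comparing the row counts before and after is a quick check on how much of the sample the deletion removes.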
Problem
Solutions
Solutions
1. Create Table 01 for your research project
2. Create a scatterplot with your DV & 1 IV
3. Create a bar graph with your DV & 1 IV