Appendix A — Data description

A.1 Palmer Penguins

Information about these data is available from the palmerpenguins package.

data(penguins, package = "palmerpenguins")
skimr::skim(penguins)
Data summary
Name penguins
Number of rows 344
Number of columns 8
_______________________
Column type frequency:
factor 3
numeric 5
________________________
Group variables None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable      n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_length_mm             2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_depth_mm              2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_length_mm          2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass_g                2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                       0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇
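
The complete_rate column in the summary can be checked by hand. The following is a minimal sketch, assuming complete_rate is one minus the share of missing values in each column; the counts are copied from the summary, and the names n_rows and n_missing are illustrative only.

```r
# Counts copied from the skim() summary of penguins
n_rows <- 344
n_missing <- c(sex = 11, bill_length_mm = 2, species = 0)

# Assumed definition: complete_rate is the share of non-missing values
complete_rate <- 1 - n_missing / n_rows

# Rounded to two decimals, as in the printed summary
round(complete_rate, 2)
```

For sex this gives 1 - 11/344, which rounds to the 0.97 shown in the factor table.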

A.2 Cancer Registry

cancer_reg <- readr::read_csv(here::here("data/cancer_reg.csv")) 
Rows: 527 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): County, state, group
dbl (17): mortality_rate, number_death, cancer_incidence, number_cases, popu...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(cancer_reg)
Data summary
Name cancer_reg
Number of rows 527
Number of columns 20
_______________________
Column type frequency:
character 3
numeric 17
________________________
Group variables None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
County                 0              1   16   38      0       527           0
state                  0              1    4   14      0        48           0
group                  0              1    7    7      0         3           0

Variable type: numeric

skim_variable     n_missing  complete_rate      mean        sd        p0       p25       p50       p75       p100  hist
mortality_rate            0              1    181.19     26.75     66.30    163.35    181.80    197.75     292.50  ▁▂▇▂▁
number_death              0              1    119.08    155.31      3.00     28.00     58.00    141.50     914.00  ▇▁▁▁▁
cancer_incidence          0              1    450.87     50.24    211.10    421.55    453.55    487.15     630.40  ▁▁▇▆▁
number_cases              0              1    409.08    568.85      6.00     76.00    158.00    453.50    2841.00  ▇▁▁▁▁
population_2015           0              1  59408.13  90106.98   1130.00  11343.50  25594.00  64463.00  734871.00  ▇▁▁▁▁
age                       0              1     41.00      5.02     22.30     38.10     40.90     43.80      56.60  ▁▂▇▅▁
income                    0              1  45918.68  11779.18  23047.00  38017.50  44065.00  51414.50  108477.00  ▅▇▂▁▁
poverty                   0              1     17.53      6.61      4.20     12.60     16.70     21.30      45.10  ▅▇▃▁▁
household                 0              1      2.47      0.46      0.02      2.37      2.50      2.63       3.97  ▁▁▃▇▁
married                   0              1     51.56      6.91     23.10     48.10     52.30     56.30      69.20  ▁▁▅▇▁
unemployed                0              1      8.09      3.25      0.70      5.90      8.00      9.70      22.60  ▂▇▃▁▁
medicare                  0              1     19.63      5.91      2.60     15.50     19.40     23.45      41.40  ▁▇▇▂▁
white                     0              1     83.10     17.16     11.01     75.81     90.24     95.41     100.00  ▁▁▁▂▇
black                     0              1     10.36     16.07      0.00      0.68      2.28     12.45      84.87  ▇▁▁▁▁
asian                     0              1      0.99      1.67      0.00      0.24      0.52      1.13      21.28  ▇▁▁▁▁
BirthRate                 0              1      5.58      1.98      0.00      4.54      5.38      6.42      17.88  ▁▇▁▁▁
random_n                  0              1      0.19      0.10      0.00      0.10      0.19      0.28       0.35  ▆▇▆▇▇
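
The column specification message printed when reading cancer_reg.csv suggests two follow-ups: quieting the message with `show_col_types = FALSE` and retrieving the full specification with `spec()`. The following is a minimal sketch of both; the temporary file and its single row are hypothetical stand-ins so that the code runs without data/cancer_reg.csv.

```r
# Hypothetical stand-in file with made-up values (not the real registry data)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("County,state,mortality_rate",
    "Example County,Example State,181.8"),
  tmp
)

# show_col_types = FALSE suppresses the column specification message
demo <- readr::read_csv(tmp, show_col_types = FALSE)

# spec() still returns the full column specification on request
readr::spec(demo)
```

Applied to the real file, the same argument would be added to the `readr::read_csv(here::here("data/cancer_reg.csv"))` call.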