This datalab notebook describes the process of cleaning the Dix Intake Ledger, from Raleigh, North Carolina.
The code was written in Stata MP v16.
By: Nabarun Dasgupta (nab@unc.edu)
display "Notebook generated on $S_DATE at $S_TIME ET"
cd "/Users/nabarun/Dropbox/Projects/Dix Park Intake/"
use DixLedgerDeidentified_clean, clear
qui: describe, f
// Space for exploratory variable creation
* gen VAR = regexm(lower(TEXT),"token|token")
* table year if war==1, c(sum war) col
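As a hedged illustration of how the template above might be filled in: the tokens `war` and `soldier` and the sample string are hypothetical, and `TEXT` remains a placeholder for whichever ledger text field is being searched. The sketch shows that a plain `regexm()` pattern matches substrings, so a word-boundary pattern via `ustrregexm()` may be the safer choice.

```stata
* Hypothetical tokens and sample text, for illustration only
* regexm() matches anywhere in the string, so "war" is found inside "warden"
display regexm(lower("Former warden"), "war|soldier")

* ustrregexm() accepts \b word boundaries, so "warden" is not matched
display ustrregexm(lower("Former warden"), "\b(war|soldier)\b")

* The same pattern applied to a variable (TEXT is a placeholder, as above)
* gen byte war = ustrregexm(lower(TEXT), "\b(war|soldier)\b")
```

Either function can be dropped into the `gen VAR = ...` template; the choice depends on whether partial-word matches are wanted.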
graph dot (sum) counter, over(decade) vertical title("Number of Patients Admitted") ytitle("Number of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc decade
tab decade
The number of patients began to climb in the 1880s and increased substantially over the following decades.
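One way to put numbers on this trend is to compute the change in admissions from one decade to the next. The sketch below assumes, as the graph command does, that summing `counter` gives the number of admissions. The frame name `bydecade` and the variables `admissions`, `change`, and `pct_change` are introduced here for illustration. It works in a separate frame (available in Stata 16) so the ledger in memory is left untouched.

```stata
* Copy only the needed variables into a scratch frame (hypothetical name)
capture frame drop bydecade
frame put decade counter, into(bydecade)

frame bydecade {
    * Drop rows with no decade so they do not form their own group
    drop if missing(decade)
    * One row per decade with the total number of admissions
    collapse (sum) admissions=counter, by(decade)
    sort decade
    * Absolute and percentage change relative to the previous decade
    gen change = admissions - admissions[_n-1]
    gen pct_change = 100 * change / admissions[_n-1]
    list, noobs clean
}
```

The first decade has no predecessor, so `change` and `pct_change` are missing in that row.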
graph dot (sum) counter, over(dayofweek) vertical title("Day of Week of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc dayofweek
tab admitmonth pellagra
Admissions peaked on Tuesdays and were lowest on Sundays.
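The tabulation shown before this statement crosses `admitmonth` with `pellagra`, so it does not list admissions by weekday. A minimal sketch of a table that would show the weekday ranking directly, assuming each row of the ledger is one admission:

```stata
* One-way frequency table of admission weekday, most frequent first.
* Rows with a missing weekday are left out unless the missing option is added.
tabulate dayofweek, sort
```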
graph dot (sum) counter, over(admitmonth) vertical title("Month of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc admitmonth
April was the month with the most admissions.
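The monthly ranking can be checked with a frequency listing sorted from highest to lowest. This is a sketch that assumes each row of the ledger is one admission, so counting rows matches summing `counter`. The frame name `bymonth` and the variable `admissions` are introduced here for illustration.

```stata
* Work in a scratch frame so the ledger in memory is not altered
capture frame drop bymonth
frame put admitmonth, into(bymonth)

frame bymonth {
    * One row per month with its number of records; skip missing months
    contract admitmonth, freq(admissions) nomiss
    * Largest count first
    gsort -admissions
    list, noobs
}
```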
Histogram of age distribution at time of admission
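The histogram note below refers to the local macros `miss` and `pct`, which are not defined anywhere in the code shown here. A minimal sketch of how they might be computed, assuming they stand for the count and percentage of patients with missing age, and that the lines run in the same session or do-file as the `hist` command:

```stata
* Count patients with no recorded age
quietly count if missing(age)
local miss = r(N)

* Percentage of all records in memory (_N), formatted to one decimal place
local pct : display %3.1f 100 * `miss' / _N

display "Missing age: `miss' of " _N " patients (`pct'%)"
```

If the macros are left undefined, Stata expands them to empty strings and the note prints without its numbers.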
* Age at intake histogram
hist age, width(5) freq graphregion(color(white)) bgcolor(white) note("Caution: missing age in `miss' (`pct'%) of patients")
bysort decade: summ age
graph dot (mean) age, over(decade) vertical title("Mean Age at Admission") ytitle("Age in Years") graphregion(color(white)) bgcolor(white) scale(1.4)