This datalab notebook describes the process of cleaning the Dix Intake Ledger, from Raleigh, North Carolina.
The code was written in Stata MP v16.
By: Nabarun Dasgupta (nab@unc.edu)
display "Notebook generated on $S_DATE at $S_TIME ET"
cd "/Users/nabarun/Dropbox/Projects/Dix Park Intake/"
use DixLedgerDeidentified_clean, clear
qui: describe, f
// Space for exploratory variable creation
* gen VAR = regexm(lower(TEXT),"token|token")
* table year if war==1, c(sum war) col
graph dot (sum) counter, over(decade) vertical title("Number of Patients Admitted") ytitle("Number of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc decade
tab decade
The number of patients began to climb in the 1880s and increased substantially over the following decades.
graph dot (sum) counter, over(dayofweek) vertical title("Day of Week of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc dayofweek
tab admitmonth pellagra
Admissions peaked on Tuesdays and were lowest on Sundays.
graph dot (sum) counter, over(admitmonth) vertical title("Month of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc admitmonth
April was the month with the most admissions.
Histogram of age distribution at time of admission
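* Age at intake histogram; count patients missing age for the caption note
qui count if missing(age)
local miss = r(N)
local pct = round(100*`miss'/_N, .1)
hist age, width(5) freq graphregion(color(white)) bgcolor(white) note("Caution: missing age in `miss' (`pct'%) of patients")
bysort decade: summ age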
graph dot (mean) age, over(decade) vertical title("Mean Age at Admission") ytitle("Age in Years") graphregion(color(white)) bgcolor(white) scale(1.4)
The median followed a nearly identical pattern and is not shown. Mean age at admission rose slightly from the 1880s onward, from 35 to 39 years, as the overall population of the hospital increased.
We noted above that the absolute number of patients admitted increased substantially in the early 1900s. Therefore, to visualize trends over time, we created weights for each individual to adjust for the secular trend of increasing patient volume.
Imagine a scenario in which 100 patients were admitted in Decade A and ten times as many in Decade B (n=1000). The weight for each patient in Decade A would be 1/100, or 0.01; the weight for each patient in Decade B would be 1/1000, or 0.001. Visually, each patient in Decade A would therefore count as much as 10 patients in Decade B. After generating the weights, we multiplied them by 100, so that the weights within each decade sum to 100 and the vertical axis can be read as the percent of admissions by decade.
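A minimal Python sketch of the weighting arithmetic described above (the notebook's own weight variable was built in Stata during cleaning; this is only an illustration). It assumes each patient's weight is 100 divided by the number of admissions in that patient's decade; the function names and the counts for a "single" subgroup are hypothetical.

```python
from collections import Counter

def decade_weights(decades):
    """One weight per patient: 100 / (number of admissions in that patient's decade).

    Within each decade the weights sum to 100, so summing them over any
    subgroup gives that subgroup's percent of the decade's admissions.
    """
    n_by_decade = Counter(decades)
    return [100 / n_by_decade[d] for d in decades]

def weighted_percent(decades, in_group):
    """Sum the weights of patients in a subgroup, separately for each decade."""
    totals = {}
    for d, w, flag in zip(decades, decade_weights(decades), in_group):
        if flag:
            totals[d] = totals.get(d, 0.0) + w
    return totals

# Hypothetical data: 100 admissions in Decade A, 1,000 in Decade B
decades = ["A"] * 100 + ["B"] * 1000
# Hypothetical subgroup: 40 single patients in A, 250 in B
single = [True] * 40 + [False] * 60 + [True] * 250 + [False] * 750

w = decade_weights(decades)
print(w[0], w[-1])  # 1.0 0.1
print({d: round(p, 1) for d, p in weighted_percent(decades, single).items()})
# {'A': 40.0, 'B': 25.0}
```

With these made-up numbers, Decade B has more single patients in absolute terms (250 vs. 40) but a smaller share of its admissions (25% vs. 40%), which is the kind of pattern the weighted graphs are meant to reveal.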
Compare the weighted and unweighted data in the example below, where we evaluate whether marital status at admission changed over time. For single patients, the number (unweighted) and percent (weighted) follow.
tab marital, m
graph dot (sum) counter if marital==1, over(decade) vertical title("Single Patients Admitted") ytitle("NUMBER of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (sum) weight if marital==1, over(decade) vertical title("Single Patients Admitted") ytitle("% of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
From the top graph alone, we would conclude that the number of single patients increased as the overall population of the hospital increased. Although this is true, the weighted data (second graph above) show that the proportion of single patients decreased over time; by the 1900s it was nearly half of what it had been in the 1850s.
tab gender, m
graph dot (sum) weight if gender=="F", over(decade) vertical ytitle("% of Admissions Women") graphregion(color(white)) bgcolor(white) scale(1.4)
The percent of women increased steadily through the late 1800s.
Analysis to be done after variable cleanup
December 1884 appears to be when the occupation field starts being filled in regularly, which may help pinpoint any changes in staff or policy that influenced the recording of this field. Was this around the time Broughton opened?
frame create temp
frame temp: use DixLedgerDeidentified_clean
frame change temp
gen flag=1 if occupationcleaned=="no entry" | occupationcleaned==""
collapse (sum) flag (count) patientid, by(year)
gen percent = (flag/patientid) *100
la var percent "Percent of Intakes with no Occupation Noted"
list year percent
line percent year, xtick(1855(5)1920)
frame change default
frame drop temp
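The collapse step works because (sum) skips missing values, so summing a flag that is either 1 or missing counts the flagged intakes, while (count) patientid counts intakes with a non-missing ID. A minimal Python sketch of the same per-year percentage, using hypothetical (year, occupation) pairs rather than ledger data; the function name is illustrative only.

```python
from collections import defaultdict

def percent_no_occupation(records):
    """Percent of intakes per year whose occupation is blank or 'no entry'.

    Same arithmetic as the Stata steps: flag intakes with no occupation,
    then per year compute flagged / total * 100.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for year, occupation in records:
        total[year] += 1
        if occupation in ("", "no entry"):
            flagged[year] += 1
    return {year: 100 * flagged[year] / total[year] for year in sorted(total)}

# Hypothetical intakes: (year, cleaned occupation)
records = [
    (1883, "no entry"), (1883, ""), (1883, "farmer"), (1883, "no entry"),
    (1885, "farmer"), (1885, "laborer"), (1885, "no entry"), (1885, "housewife"),
]
print(percent_no_occupation(records))  # {1883: 75.0, 1885: 25.0}
```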
graph dot (sum) weight if farmer==1, over(decade) vertical ytitle("% Farmer among patients") graphregion(color(white)) bgcolor(white) scale(1.4)
Repeat as a proportion. From 1859 to 1884 almost no farmers were recorded; was occupation missing during this period? Between 1908 (n=38) and 1909 (n=82) the number of farmers more than doubled; was there a facility expansion at the time? Check trends in the general patient population.
Thomas: The handwriting during the late 1860s and 1870s was very poor, and a different person was interpreting the occupation. Occupation is missing from 1861 through the 1890s.
table year if healthpro==1, c(sum healthpro) col
graph dot (count) patientid if healthpro==1, over(decade) vertical ytitle("Number of Health Professionals") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (sum) weight if dead==1, over(decade) vertical title("Death as Final Disposition") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (sum) weight if transfer==1, over(disdecade) vertical title("Transfers by Decade of Admission") ytitle("% of Discharges as Transfers") graphregion(color(white)) bgcolor(white) scale(1.4)
Suicide could also be noted in the ledger as a baseline (intake) condition.
tab intakecondition, m sort
gen suiciderecode = regexm(lower(intakecondition),"suic")
tab decade suiciderecode if suiciderecode==1
Suicides reported AT the hospital, as noted in the ledger.
table year if suicide==1, c(sum suicide) col
graph dot (sum) weight if cocaine==1, over(decade) vertical title("Cocaine as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
table year if cocaine==1, c(sum cocaine) col
There were relatively few cocaine-as-cause admissions.
graph dot (sum) weight if opiate==1, over(decade) vertical title("Opiate as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
table year if opiate==1, c(sum opiate) col
Hypothesis: Patients with opiate disorders were more likely to be health professionals (with access to morphine and opium).
tab healthpro opiate, m
cc healthpro opiate
local or=round(r(or),.1)
local ub=round(r(ub_or),.1)
local lb=round(r(lb_or),.1)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
Hypothesis supported: patients with opiates listed as the cause of attack had 18.4 times the odds (95% CI: 9.1, 34.8) of being a health professional compared with patients without opiates listed. Over time, opiate diagnoses/admissions spiked in the 1890s... why? Was this a period of increased availability of morphine and opium via patent medicines?
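For readers checking the arithmetic behind cc, a minimal Python sketch of a 2x2 odds ratio with a Woolf (log-based) 95% confidence interval. The cell labels a to d and the counts are hypothetical. Stata's cc reports exact confidence intervals by default (Woolf intervals require its woolf option), so an interval computed this way will not match the notebook's output exactly.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    All four cells must be non-zero for the log standard error to exist.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval needs all cells > 0")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts, not from the ledger
or_, lo, hi = odds_ratio_woolf(a=10, b=20, c=5, d=40)
print(f"Odds ratio: {or_:.1f} (95% CI: {lo:.1f}, {hi:.1f})")
# Odds ratio: 4.0 (95% CI: 1.2, 13.3)
```

Because ad/bc is symmetric in rows and columns, the same number is the odds of opiate-as-cause among health professionals relative to others, or the odds of being a health professional among opiate-as-cause patients relative to others. Empty cells, common with rare causes, leave the Woolf interval undefined, which is one reason to prefer exact intervals for sparse tables.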
graph dot (sum) weight if alcohol==1, over(decade) vertical title("Alcohol as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
Alcohol was not diagnosed as a cause prior to the 1870s.
// Space for exploratory variable creation
gen compfarm = regexm(lower(occupationcleaned),"farm|labor|agri")
* table year if war==1, c(sum war) col
tab compfarm decade, m
table year if war==1, c(sum war) col
graph dot (sum) weight if overwork==1, over(decade) vertical title("Overwork as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
tab overwork farmer
graph dot (sum) weight if pregnancy==1, over(decade) vertical title("Pregnancy as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (sum) weight if menopause==1, over(decade) vertical title("Menopause as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (count) patientid if syphilis==1, over(decade) vertical title("Syphilis as Cause of Attack") ytitle("NUMBER of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
graph dot (sum) weight if hereditary==1, over(decade) vertical title("Heredity as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
// See if differences by gender and age
bysort gender: summarize age
qui distinct patientid if age==. | gender==""
local miss= r(ndistinct)
vioplot age, over(gender) ytitle("Age at Admission (Years)") obs(alt) graphregion(color(white)) bgcolor(white) note("Caution: Age or gender missing in `miss' patients not represented above")
Interpretation: At admission, female patients clustered around their mid-30s, while male patients tended to be about a decade younger, with a distribution that also extended into older ages.
graph dot (sum) weight if marital==1 & gender=="F", over(decade) vertical title("Single Women Admitted") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
The percent of single women admitted peaked in the 1880s and decreased through the following decades as the asylum increasingly became a place for elderly indigent patients.
Hypothesis: Diagnoses of masturbation would have been attached more often to men than to women.
graph dot (sum) weight if masturbation==1, over(decade) vertical title("Masturbation as Cause-of-Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
Hypothesis: The masturbation diagnosis was applied more often to men.
tab sex masturbation, m
cc sex masturbation
local or=round(r(or),.1)
local ub=round(r(ub_or),.1)
local lb=round(r(lb_or),.1)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
As expected, men had about 20 times higher odds of having masturbation listed as a cause. Only 11 women had this diagnosis. Let's look into who they were.
table year if sex==0 & masturbation==1, c(sum masturbation) col
summ age if sex==0 & masturbation==1
tab patientid if sex==0 & masturbation==1 & age==12
Hypothesis: Masturbation diagnoses would be more common in unmarried patients.
tab marital masturbation, m
cc marital masturbation
Recode marital to dichotomous for single:
gen single=0
replace single=1 if marital==1
replace single=. if marital==.
tab single masturbation, m
cc single masturbation
As expected, the masturbation diagnosis was associated with being single.
graph dot (sum) weight if pellagra==1, over(decade) vertical title("Pellagra as Cause of Death") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
table year if pellagra==1, c(sum pellagra) col
Pellagra cases were noted with greater frequency after 1910. Was this due to increased case finding (e.g., diagnostic suspicion bias) or a medical phenomenon as corn became more prevalent in agriculture?
From Bobby: I spent a bit of time this morning looking for more research/sources on pellagra. Here’s one I found very useful:
https://www-nber-org.libproxy.lib.unc.edu/papers/w23730
It is pretty current (2017). It makes the case that the specific mechanism by which one contracted pellagra wasn’t part of standard medical knowledge until the 1930s, and that laws requiring the “fortification” of foods associated with pellagra weren’t passed in the South until the 1940s. The article also helps to explain why farm families contracted pellagra, not just “urban” cotton mill workers: the prevalence of monoculture (cotton or tobacco) crowded out land on which fresh produce could have been grown. They use the boll weevil infestation to make their case. They also tell us more about the way the disease develops: symptoms take about 6 months to appear, for example, so it was regarded as a seasonal disease (spring!), and they say a bit about mortality. Fascinating!
So, we have two populations of patients with whom pellagra is associated. The first consists of patients who were admitted presumably because they presented with the tell-tale symptoms of active pellagra: dermatitis and “dementia.” The admissions ledgers would tell us what happened to these patients: as I understand it, if caught early enough, pellagra can be “cured” simply by re-supplying the nutrient missing from the diet. So, if the hospital diet did contain sufficient amounts of niacin, at least some of these patients would have been “cured” regardless of any other therapies that might have been tried. It would be interesting to follow some of the “cured” population to see how they fared back at home.
The other, more puzzling group consists of patients who contracted pellagra after long periods of confinement in the hospital itself. Their contracting pellagra, it would seem, must have been connected to what they were being fed. So we have to wonder why some patients in the hospital contracted pellagra while others did not. Were some more vulnerable because their bodies were less able to metabolize niacin? Is pellagra associated with age? We know that “ill-health” was far more often associated with women (at the time of admission) than with men. This is also puzzling because we know that the hospital during this period produced much of its own food. Was this a matter of trying to economize by making fresh food go further, or of relying on cheaper alternatives to fresh food? The superintendents’ reports would be important here: they contain summaries of the kinds and amounts of food produced.
From Hannah: I met with Sarah this morning and updated her on the research I’ve been conducting on the 3 sisters and their illnesses. I had decided to start with Fannie, who succumbed to pellagra in 1918. Over the weekend, I looked at the Goldberger papers and other materials searching for connections between the federal government’s studies of pellagra and what might have been happening in North Carolina at the time.
What I have found so far is a number of gaps with regard to NC in the dominant US pellagra narrative. There is evidence that Goldberger did visit Goldsboro, of which he gives a brief but clear description, but no clear evidence that he was ever sent to Raleigh. Likewise, I have so far found no evidence of Raleigh doctors’ participation in the national debates (in the form of publications and at least 3 conferences), though doctors from Durham, Charlotte, and Asheville did publish and present at these conferences. Meanwhile, the head of the NC Board of Health conducted at least one study of pellagra in Yancey County (perhaps because some thought the disease was tropical) and published a book in 1912 on the global history of pellagra and its connections to NC. This book mentions patients of his who were also treated in Raleigh. So far, this publication stands out as the only one that directly addresses pellagra cases in NC.
I returned to the staff meeting minutes and admissions ledger and have found at least 4 interviews with patients diagnosed with pellagra. So far I have been able to OCR half of the minutes with ABBYY FineReader and will share those versions of the text as soon as I finish the process.
Without wanting to write my paper into this email, I did want to share with you this research because I’d like to pivot a bit away from the Pitt County Williams family specifically and look at these cases (including one of the Williams sisters) and the larger context of pellagra in the hospital, including researching the patients’ diets as they may have been both inside and outside of the asylum and any records of care that might be found. (For example, did they build any separate housing for patients with pellagra as was practiced at other institutions?)
Hypothesis: Women would be more affected by pellagra among patient admissions because of greater nutrient insecurity.
In the tables below, men are the "Cases" and women are the "Controls".
tab sex pellagra, m
cc sex pellagra
local or=round(r(or),.01)
local ub=round(r(ub_or),.01)
local lb=round(r(lb_or),.01)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
di "Inverse odds:"
di 1/`or'
About 3% of women had pellagra diagnoses, compared with 1.4% of men. The odds ratio was 0.46 (95% CI: 0.32, 0.64), meaning that men had lower odds of a pellagra diagnosis than women (inversely, women had about 2.2 times the odds of men).
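Flipping the reference group (women relative to men instead of men relative to women) inverts the point estimate and swaps the inverted confidence limits. A small Python sketch of that arithmetic, applied to the rounded values reported above; because the inputs are rounded to two decimals, the output is approximate. The function name is illustrative only.

```python
def invert_or(or_, lower, upper):
    """Re-express an odds ratio with the reference group flipped.

    The point estimate becomes 1/OR, and the CI limits swap places:
    the new lower limit is 1/upper and the new upper limit is 1/lower.
    """
    return 1 / or_, 1 / upper, 1 / lower

# Rounded estimate for men vs. women, as reported in the text
or_w, lo_w, hi_w = invert_or(0.46, 0.32, 0.64)
print(f"Women vs. men: {or_w:.1f} (95% CI: {lo_w:.1f}, {hi_w:.1f})")
# Women vs. men: 2.2 (95% CI: 1.6, 3.1)
```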
Hypothesis: Alcohol and pellagra may be associated with each other, in a similar way to certain nutritional disorders noted in people with chronic excessive alcohol intake. (Caveat: we only have alcohol as a cause of attack.)
tab alcohol pellagra, m
cc alcohol pellagra
local or=round(r(or),.01)
local ub=round(r(ub_or),.01)
local lb=round(r(lb_or),.01)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
di "Inverse odds:"
di 1/`or'
Interpretation
There were n=161 pellagra cases and n=281 alcohol-involved cases identified. The overlap was only n=3 patients (patientid: 6620, 4894, 5911), all from 1909-1914. These three were middle-aged men; two also had cocaine or morphine involvement. All three died at the hospital: the 2 patients with other drug involvement within 3 days of admission, and the alcohol-only patient after about 3 weeks. Causes of death in verbatim free text were: Gastro-Enteritis (Pellagra); Exhaustion of Pellagra; Pellagra. Their "form" as transcribed (akin to modern diagnosis codes) was, respectively: Toxic Mania; Drug Psychosis; Epileptic. The reported duration of illness prior to admission was 4 years, 1 week, and 2 weeks. So, at least for one patient (4894, a salesman from Carteret County), there appears to have been long-term morphine and/or alcohol use, followed by presentation in an agitated/altered state at Dix and a short stay, after which he died of pellagra-related complications. I can't tell from these data whether the etiology of pellagra involved alcohol in the way it does in patients who present in the modern day.
(There is a series of data caveats and assumptions that we are working to articulate, so please treat these as quantitative anecdotes for now! In general, we are operating under a "high specificity, unknown sensitivity, definitional plasticity" model.)
Interestingly, this article from Tryon, NC mentions a combination of moonshine and herbs as an (effective) treatment for pellagra, but it seems the moonshine would have been used as a solvent/carrier here more than as a libation. There is also the long-standing tradition of putting fruit in moonshine in NC, but I have no idea whether the niacin concentration or bioavailability was meaningful.
Methods
Exposure: alcohol=1 indicates that the "supposed cause of attack" (i.e., the reason the patient was admitted to Dix) was related to alcohol consumption; Sarah's team standardized the different verbatim permutations, and I ran regular-expression text matching on the standardized text.
Outcome: pellagra=1 indicates that pellagra was recorded as the "supposed cause of attack", mentioned in the free-text notes, or given explicitly as the "form" of mental disease (the last applied to only n=2 cases, both from 1914; this may have been an emerging diagnostic practice).
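A minimal Python sketch of the flag logic described under Methods, analogous to the notebook's Stata regexm(lower(...), "token|token") pattern: lowercase the text, then search for any of several alternative tokens. The regular expressions, function names, and field names below are hypothetical placeholders, not the actual token lists used to build alcohol and pellagra.

```python
import re

# Hypothetical token lists; the real patterns used for the ledger may differ.
ALCOHOL = re.compile(r"alcohol|intemperance|drink|whisk")
PELLAGRA = re.compile(r"pellagra")

def alcohol_flag(cause):
    """1 if the supposed cause of attack matches an alcohol token (case-insensitive)."""
    return int(bool(ALCOHOL.search(cause.lower())))

def pellagra_flag(cause, notes, form):
    """1 if pellagra appears in the cause of attack, the free-text notes, or the form."""
    return int(any(PELLAGRA.search(text.lower()) for text in (cause, notes, form)))

print(alcohol_flag("Intemperance"))                              # 1
print(pellagra_flag("Ill health", "Exhaustion of Pellagra", ""))  # 1
print(pellagra_flag("Heredity", "", "Epileptic"))                # 0
```

Matching on substrings in this way favors recall over precision for each token, so in practice each pattern would need to be checked against the standardized verbatim permutations.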
Hypothesis: Pellagra patients would be older than the rest of the patients.
// See if differences by pellagra status and age
bysort pellagra: summarize age
qui distinct patientid if age==.
local miss= r(ndistinct)
vioplot age, over(pellagra) ytitle("Age at Admission (Years)") obs(alt) graphregion(color(white)) bgcolor(white) note("Caution: Age missing in `miss' patients not represented above")
Patients with a pellagra diagnosis (right) had the same average age as the rest of the patient population (left), but their ages were more tightly clustered around middle age.