Dix Hospital Ledger data cleaning and basic descriptives

This datalab notebook describes the process of cleaning the Dix Intake Ledger, from Raleigh, North Carolina.
The code was written in Stata MP v16.

By: Nabarun Dasgupta (nab@unc.edu)



Import

In [1]:
display "Notebook generated on $S_DATE at $S_TIME ET"
Notebook generated on 24 Feb 2020 at 15:05:33 ET
In [2]:
cd "/Users/nabarun/Dropbox/Projects/Dix Park Intake/"
use DixLedgerDeidentified_clean, clear
qui: describe, f
/Users/nabarun/Dropbox/Projects/Dix Park Intake



Variable Construction

In [3]:
// Space for exploratory variable creation
* gen VAR = regexm(lower(TEXT),"token|token")
* table year if war==1, c(sum war) col

Univariate Exploration

Dates of Admission and Discharge

In [4]:
graph dot (sum) counter, over(decade) vertical title("Number of Patients Admitted") ytitle("Number of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc decade
tab decade




    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
         decade |          31          7,479           0.41
----------------+-----------------------------------------------


  Decade of |
  admission |      Freq.     Percent        Cum.
------------+-----------------------------------
      1850s |        340        4.56        4.56
      1860s |        541        7.26       11.83
      1870s |        430        5.77       17.60
      1880s |        762       10.23       27.83
      1890s |      1,274       17.11       44.94
      1900s |      1,708       22.93       67.87
      1910s |      2,393       32.13      100.00
------------+-----------------------------------
      Total |      7,448      100.00

Admissions began to climb in the 1880s (762, up from 430 in the 1870s) and kept rising in each subsequent decade, reaching 2,393 in the 1910s.


In [5]:
graph dot (sum) counter, over(dayofweek) vertical title("Day of Week of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc dayofweek




    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
      dayofweek |          31          7,479           0.41
----------------+-----------------------------------------------
In [6]:
tab admitmonth pellagra
  Month of |       pellagra
 admission |         0          1 |     Total
-----------+----------------------+----------
         J |       575         13 |       588 
         F |       517          7 |       524 
         M |       580          7 |       587 
         A |       830         17 |       847 
         M |       660         17 |       677 
         J |       598         11 |       609 
         J |       571         20 |       591 
         A |       608         14 |       622 
         S |       608         14 |       622 
         O |       524         10 |       534 
         N |       608         21 |       629 
         D |       609          9 |       618 
-----------+----------------------+----------
     Total |     7,288        160 |     7,448 

By day of week, admissions peaked on Tuesdays and were lowest on Sundays.



In [7]:
graph dot (sum) counter, over(admitmonth) vertical title("Month of Admission") ytitle("Number of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
mdesc admitmonth




    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
     admitmonth |          31          7,479           0.41
----------------+-----------------------------------------------

April was the month with the most admissions (847).



Age

Histogram of age distribution at time of admission

In [8]:
* Age at intake histogram
qui count if missing(age)
local pct : display %3.1f 100*r(N)/_N
hist age, width(5) freq graphregion(color(white)) bgcolor(white) note("Caution: missing age in `r(N)' (`pct'%) of patients")
(bin=18, start=0, width=5)

In [9]:
bysort decade: summ age
-----------------------------------------------------------------------
-> decade = 1850s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |        301    35.15282    11.13178         17         67

-----------------------------------------------------------------------
-> decade = 1860s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |        476    35.83403    12.96375         13         85

-----------------------------------------------------------------------
-> decade = 1870s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |        427    34.55738    12.25773         12         78

-----------------------------------------------------------------------
-> decade = 1880s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |        751    38.11917    13.27175          8         81

-----------------------------------------------------------------------
-> decade = 1890s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      1,252    38.49281    13.85457          7         83

-----------------------------------------------------------------------
-> decade = 1900s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      1,662     39.4284    14.26474          8         84

-----------------------------------------------------------------------
-> decade = 1910s

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      2,359    39.25011    15.50522          0         90

-----------------------------------------------------------------------
-> decade = .

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |          3    51.66667    25.14624         23         70

In [10]:
graph dot (mean) age, over(decade) vertical title("Mean Age at Admission") ytitle("Age in Years") graphregion(color(white)) bgcolor(white) scale(1.4)

The median age followed a nearly identical pattern and is not shown. Mean age at admission rose slightly from the 1880s onward, from about 35 years in the earlier decades to about 39 years, as the overall population of the hospital increased.


Weighting of Patient Population by Decade

We noticed above that the absolute number of patients admitted increased substantially in the early 1900s. Therefore, to visualize trends over time, we created a weight for each individual to adjust for the secular trend of increasing patient volume.

Imagine a scenario where 100 patients were admitted in Decade A and ten times as many (n=1000) in Decade B. The weight for each patient in Decade A would be 1/100, or 0.01; the weight for each patient in Decade B would be 1/1000, or 0.001. Visually, each patient in Decade A would count as much as 10 patients in Decade B. After generating the weights, we multiplied them by 100 so that the vertical axis can be read as the percent of admissions in each decade.
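The arithmetic in this scenario can be sketched on toy data. This is a minimal illustration, not the notebook's actual weight construction (the `weight` variable used below was created during cleaning and its code is not shown here). The names `decade_toy`, `w_toy`, and `single_toy`, and the assumption that 40 patients per decade were single, are hypothetical.

```stata
* Toy data: 100 patients in Decade A, 1,000 in Decade B (assumed example values)
clear
set obs 1100
gen str1 decade_toy = cond(_n <= 100, "A", "B")

* Weight = 100 / (admissions in that decade), so weights within a decade sum to 100
bysort decade_toy: gen double w_toy = 100 / _N

* Suppose 40 patients in each decade were single (illustrative only)
bysort decade_toy: gen byte single_toy = _n <= 40

* Summing the weights gives the percent of each decade's admissions who were single
tabstat w_toy if single_toy == 1, by(decade_toy) stat(sum)
```

The same 40 single patients amount to 40% of admissions in Decade A but only 4% in Decade B, which is the distinction the weighted graphs are meant to show.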

Compare the weighted and unweighted data in the example below, where we evaluate whether marital status among patients at admission changed over time. For single patients, the number (unweighted) and percent (weighted) follow.


Marital status

In [11]:
tab marital, m
    Marital status at |
  admission (recoded) |      Freq.     Percent        Cum.
----------------------+-----------------------------------
               Single |      3,224       43.11       43.11
              Married |      3,380       45.19       88.30
              Widowed |        685        9.16       97.46
Separated or Divorced |          5        0.07       97.53
                    . |        185        2.47      100.00
----------------------+-----------------------------------
                Total |      7,479      100.00
In [12]:
graph dot (sum) counter if marital==1, over(decade) vertical title("Single Patients Admitted") ytitle("NUMBER of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)
In [13]:
graph dot (sum) weight if marital==1, over(decade) vertical title("Single Patients Admitted") ytitle("% of Admissions by Decade") graphregion(color(white)) bgcolor(white) scale(1.4)

From the top graph alone we would conclude that the number of single patients increased as the overall population of the hospital increased. Although this is true, the weighted data (second graph above) show that the proportion of single patients decreased over time, falling by the 1900s to nearly half of what it was in the 1850s.



Gender

In [14]:
tab gender, m
     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
            |        123        1.64        1.64
          F |      3,525       47.13       48.78
          M |      3,831       51.22      100.00
------------+-----------------------------------
      Total |      7,479      100.00
In [15]:
graph dot (sum) weight if gender=="F", over(decade) vertical ytitle("% of Admissions Women") graphregion(color(white)) bgcolor(white) scale(1.4)

The percent of women increased steadily through the late 1800s.



Length of Stay

Analysis to be done after variable cleanup

Occupation

Occupation appears to have been filled out regularly starting in December 1884, which may help pinpoint any changes to staff or policy that influenced the recording of this field. Was this right around the time that Broughton opened?

In [16]:
frame create temp
frame temp: use DixLedgerDeidentified_clean
frame change temp
gen flag=1 if occupationcleaned=="no entry" | occupationcleaned==""
collapse (sum) flag (count) patientid, by(year)
gen percent = (flag/patientid) *100
la var percent "Percent of Intakes with no Occupation Noted"
list year percent
line percent year, xtick(1855(5)1920)
frame change default
frame drop temp



(5,905 missing values generated)





     +-----------------+
     | year    percent |
     |-----------------|
  1. | 1856   33.96227 |
  2. | 1857   60.91954 |
  3. | 1858   81.81818 |
  4. | 1859   92.59259 |
  5. | 1860   88.88889 |
     |-----------------|
  6. | 1861   98.33334 |
  7. | 1862        100 |
  8. | 1863        100 |
  9. | 1864        100 |
 10. | 1865        100 |
     |-----------------|
 11. | 1866        100 |
 12. | 1867        100 |
 13. | 1868        100 |
 14. | 1869        100 |
 15. | 1870        100 |
     |-----------------|
 16. | 1871        100 |
 17. | 1872        100 |
 18. | 1873        100 |
 19. | 1874        100 |
 20. | 1875        100 |
     |-----------------|
 21. | 1876        100 |
 22. | 1877        100 |
 23. | 1878        100 |
 24. | 1879        100 |
 25. | 1880        100 |
     |-----------------|
 26. | 1881        100 |
 27. | 1882         98 |
 28. | 1883   65.21739 |
 29. | 1884   92.78351 |
 30. | 1885          0 |
     |-----------------|
 31. | 1886          0 |
 32. | 1887          0 |
 33. | 1888          0 |
 34. | 1889    17.3913 |
 35. | 1890          0 |
     |-----------------|
 36. | 1891          0 |
 37. | 1892          0 |
 38. | 1893          0 |
 39. | 1894   1.052632 |
 40. | 1895          0 |
     |-----------------|
 41. | 1896   1.257862 |
 42. | 1897   .5376344 |
 43. | 1898          0 |
 44. | 1899   1.818182 |
 45. | 1900   5.641026 |
     |-----------------|
 46. | 1901          0 |
 47. | 1902   .5952381 |
 48. | 1903   1.449275 |
 49. | 1904   1.612903 |
 50. | 1905   11.33333 |
     |-----------------|
 51. | 1906   1.282051 |
 52. | 1907   1.010101 |
 53. | 1908          0 |
 54. | 1909   .4219409 |
 55. | 1910   .3508772 |
     |-----------------|
 56. | 1911          0 |
 57. | 1912          0 |
 58. | 1913   1.824818 |
 59. | 1914          0 |
 60. | 1915   1.567398 |
     |-----------------|
 61. | 1916   .3597122 |
 62. | 1917   .9852217 |
 63. |    .   93.54839 |
     +-----------------+




Farming

In [17]:
graph dot (sum) weight if farmer==1, over(decade) vertical ytitle("% Farmer among patients") graphregion(color(white)) bgcolor(white) scale(1.4)

Repeat as proportion. From 1859 to 1884 almost no farmers were recorded; why? Between 1908 (n=38) and 1909 (n=82) the number of farmers more than doubled. Was there a facility expansion at the time? Check trends in the general patient population. Is occupation missing in this period?

Thomas: Handwriting during the late 1860s and 1870s was really bad. A different person was interpreting the occupation. Occupation is almost entirely missing from 1861 through 1884, per the yearly table above.
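One possible way to "repeat as proportion" while accounting for missing occupation is to compute the share of farmers only among patients with a recorded occupation. The sketch below uses made-up toy rows, not ledger data. The names `decade_toy`, `occ_toy`, `recorded`, and `farmer_toy` are hypothetical, and the "farm" pattern is an assumption about how farmers might be flagged.

```stata
* Toy data standing in for the ledger (values are made up for illustration)
clear
input str5 decade_toy str12 occ_toy
"1870s" ""
"1870s" "no entry"
"1870s" "farmer"
"1900s" "farmer"
"1900s" "teacher"
"1900s" "farm laborer"
"1900s" "no entry"
end

* Flag farmers, leaving the flag missing when no occupation was recorded
gen byte recorded = !inlist(occ_toy, "", "no entry")
gen byte farmer_toy = regexm(lower(occ_toy), "farm") if recorded

* Mean of a 0/1 flag = proportion; missing flags drop out of the denominator
tabstat farmer_toy, by(decade_toy) stat(mean n)
```

Reporting the count alongside the mean makes it visible when a decade's proportion rests on only a handful of recorded occupations.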



Health Professionals

In [18]:
table year if healthpro==1, c(sum healthpro) col
-------------------------
Year of   |
admission | sum(health~o)
----------+--------------
     1856 |             1
     1857 |             2
     1859 |             1
     1884 |             1
     1886 |             3
     1888 |             1
     1889 |             1
     1890 |             5
     1891 |             2
     1892 |             2
     1893 |             2
     1894 |             1
     1895 |             2
     1896 |             1
     1897 |             4
     1898 |             2
     1900 |             3
     1901 |             1
     1902 |             4
     1904 |             1
     1905 |             2
     1906 |             1
     1907 |             3
     1908 |             2
     1909 |             3
     1910 |             1
     1911 |             1
     1912 |             1
     1913 |             2
     1914 |             5
     1915 |             3
     1916 |             2
     1917 |             1
-------------------------
In [19]:
graph dot (count) patientid if healthpro==1, over(decade) vertical ytitle("Number of Health Professionals") graphregion(color(white)) bgcolor(white) scale(1.4)

Final Disposition

In [20]:
graph dot (sum) weight if dead==1, over(decade) vertical title("Death as Final Disposition") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
In [21]:
graph dot (sum) weight if transfer==1, over(disdecade) vertical title("Transfers by Decade of Admission") ytitle("% of Discharges as Transfers") graphregion(color(white)) bgcolor(white) scale(1.4)

Suicide

Suicide could be a baseline condition noted in the ledger.

In [22]:
tab intakecondition, m sort

gen suiciderecode = regexm(lower(intakecondition),"suic")
tab decade suiciderecode if suiciderecode==1

               Intake Condition |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                                |      5,144       68.78       68.78
                       Suicidal |      1,177       15.74       84.52
                     Hereditary |        739        9.88       94.40
           Hereditary; Suicidal |        296        3.96       98.36
                      Puerperal |         70        0.94       99.29
            Puerperal; Suicidal |         33        0.44       99.73
          Hereditary; Puerperal |         13        0.17       99.91
Hereditary; Puerperal; Suicidal |          7        0.09      100.00
--------------------------------+-----------------------------------
                          Total |      7,479      100.00



           | suiciderec
 Decade of |    ode
 admission |         1 |     Total
-----------+-----------+----------
     1860s |         5 |         5 
     1870s |        31 |        31 
     1880s |        82 |        82 
     1890s |       320 |       320 
     1900s |       401 |       401 
     1910s |       665 |       665 
-----------+-----------+----------
     Total |     1,504 |     1,504 

Suicides reported AT hospital, as noted in Ledger.

In [23]:
table year if suicide==1, c(sum suicide) col
------------------------
Year of   |
admission | sum(suicide)
----------+-------------
     1908 |            1
     1909 |            2
     1913 |            1
     1915 |            1
------------------------

Substance Use Disorder time trends

In [24]:
graph dot (sum) weight if cocaine==1, over(decade) vertical title("Cocaine as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
table year if cocaine==1, c(sum cocaine) col




------------------------
Year of   |
admission | sum(cocaine)
----------+-------------
     1898 |            1
     1900 |            1
     1902 |            1
     1903 |            1
     1904 |            1
     1910 |            1
     1913 |            2
     1914 |            2
------------------------

There were relatively few cocaine-as-cause admissions: 10 in total between 1898 and 1914.


In [25]:
graph dot (sum) weight if opiate==1, over(decade) vertical title("Opiate as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
table year if opiate==1, c(sum opiate) col




------------------------
Year of   |
admission | sum(opiates)
----------+-------------
     1858 |            2
     1861 |            2
     1864 |            1
     1871 |            1
     1877 |            1
     1884 |            1
     1885 |            1
     1887 |            1
     1889 |            2
     1890 |            1
     1891 |            1
     1892 |            3
     1893 |            2
     1894 |            1
     1895 |            6
     1896 |            2
     1897 |            8
     1898 |           12
     1899 |            3
     1900 |            4
     1901 |            4
     1902 |            5
     1903 |            6
     1904 |            6
     1905 |            2
     1906 |            1
     1907 |            3
     1908 |            1
     1909 |            3
     1910 |            5
     1912 |            1
     1913 |            4
     1914 |            7
     1915 |           11
     1916 |            5
     1917 |            3
------------------------

Hypothesis: Patients with opiate disorders were more likely to be health professionals (with access to morphine and opium).

In [26]:
tab healthpro opiate, m
cc healthpro opiate

local or=round(r(or),.1)
local ub=round(r(ub_or),.1)
local lb=round(r(lb_or),.1)
di "Odds ratio: `or' (95% CI: `lb', `ub')"

  inferred |
    health |
profession |
        al |        opiates
occupation |        No        Yes |     Total
-----------+----------------------+----------
         0 |     7,304        108 |     7,412 
         1 |        53         14 |        67 
-----------+----------------------+----------
     Total |     7,357        122 |     7,479 

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |        14          53  |         67       0.2090
        Controls |       108        7304  |       7412       0.0146
-----------------+------------------------+------------------------
           Total |       122        7357  |       7479       0.0163
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         17.86443       |    8.862705     33.8274 (exact)
 Attr. frac. ex. |         .9440228       |    .8871676    .9704382 (exact)
 Attr. frac. pop |         .1972585       |
                 +-------------------------------------------------
                               chi2(1) =   156.36  Pr>chi2 = 0.0000




Odds ratio: 17.9 (95% CI: 8.9, 33.8)

Hypothesis confirmed: Patients with opiates listed as cause of attack had 17.9 times the odds (95% CI: 8.9, 33.8) of being a health professional, matching the cc output above. As a time trend, opiate diagnoses/admissions spiked in the 1890s. Why? Was this a period of increased availability of morphine and opium via patent medicines?
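The odds ratio reported by cc can be reproduced by hand from the four cells of the 2x2 table above. This is a minimal sketch using only the counts printed in the output; the scalar name `or_hand` is introduced here for illustration.

```stata
* Cells from the tab healthpro opiate output:
*   health professional, opiate    = 14     health professional, no opiate = 53
*   other patients, opiate         = 108    other patients, no opiate      = 7,304
scalar or_hand = (14 * 7304) / (53 * 108)
display "Odds ratio by hand: " %5.1f or_hand

* The immediate form of cc takes the same four counts directly
* (order: exposed cases, unexposed cases, exposed controls, unexposed controls)
cci 14 53 108 7304
display "Odds ratio from cci: " %5.1f r(or)
```

Both lines should agree with the point estimate of 17.86443 in the cc table.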


In [27]:
graph dot (sum) weight if alcohol==1, over(decade) vertical title("Alcohol as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Alcohol was not diagnosed as a cause prior to the 1870s.

War as Cause of Attack

In [28]:
// Space for exploratory variable creation
gen compfarm = regexm(lower(occupationcleaned),"farm|labor|agri")
* table year if war==1, c(sum war) col
In [29]:
tab compfarm decade, m
           |                                   Decade of admission
  compfarm |     1850s      1860s      1870s      1880s      1890s      1900s      1910s          . |     Total
-----------+----------------------------------------------------------------------------------------+----------
         0 |       244        532        430        541        852      1,214      1,761         30 |     5,604 
         1 |        96          9          0        221        422        494        632          1 |     1,875 
-----------+----------------------------------------------------------------------------------------+----------
     Total |       340        541        430        762      1,274      1,708      2,393         31 |     7,479 
In [30]:
table year if war==1, c(sum war) col
----------------------
Year of   |
admission |   sum(war)
----------+-----------
     1862 |          8
     1863 |          5
     1864 |          6
     1865 |          6
     1866 |          5
     1867 |          3
     1868 |          2
     1891 |          1
     1895 |          1
     1898 |          1
     1899 |          1
     1915 |          1
----------------------

Overwork

In [31]:
graph dot (sum) weight if overwork==1, over(decade) vertical title("Overwork as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)
In [32]:
tab overwork farmer
           |     infer farmer
           |      occupation
  overwork |         0          1 |     Total
-----------+----------------------+----------
        No |     5,809      1,539 |     7,348 
       Yes |       106         25 |       131 
-----------+----------------------+----------
     Total |     5,915      1,564 |     7,479 

Pregnancy

In [33]:
graph dot (sum) weight if pregnancy==1, over(decade) vertical title("Pregnancy as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Menopause

In [34]:
graph dot (sum) weight if menopause==1, over(decade) vertical title("Menopause as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Syphilis

In [35]:
graph dot (count) patientid if syphilis==1, over(decade) vertical title("Syphilis as Cause of Attack") ytitle("NUMBER of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Hereditary

In [36]:
graph dot (sum) weight if hereditary==1, over(decade) vertical title("Heredity as Cause of Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Age and Gender

In [37]:
// See if differences by gender and age
bysort gender: summarize age
qui distinct patientid if age==. | gender==""
local miss= r(ndistinct)

-----------------------------------------------------------------------
-> gender = 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |         77    42.06494    14.54425         16         75

-----------------------------------------------------------------------
-> gender = F

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      3,438    37.83406    13.43668          5         90

-----------------------------------------------------------------------
-> gender = M

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      3,716    38.79925     15.0076          0         87



In [38]:
vioplot age, over(gender) ytitle("Age at Admission (Years)") obs(alt) graphregion(color(white)) bgcolor(white) note("Caution: Age or gender missing in `miss' patients not represented above")

Interpretation: Female patients at entry were clustered around their mid-30s, while males tended to be a decade younger but also extended into older ages.

In [39]:
graph dot (sum) weight if marital==1 & gender=="F", over(decade) vertical title("Single Women Admitted") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

The percent of single women admitted peaked in the 1880s and decreased through the following decades as the asylum increasingly became a place for elderly indigent patients.


Masturbation

Hypothesis: Diagnoses of masturbation would have been attached to men more often than to women.

In [40]:
graph dot (sum) weight if masturbation==1, over(decade) vertical title("Masturbation as Cause-of-Attack") ytitle("% of Admissions") graphregion(color(white)) bgcolor(white) scale(1.4)

Hypothesis: The masturbation diagnosis was applied more often to men.

In [41]:
tab sex masturbation, m
cc sex masturbation

local or=round(r(or),.1)
local ub=round(r(ub_or),.1)
local lb=round(r(lb_or),.1)
di "Odds ratio: `or' (95% CI: `lb', `ub')"

           |     masturbation
  0=female |        No        Yes |     Total
-----------+----------------------+----------
    female |     3,514         11 |     3,525 
      male |     3,604        227 |     3,831 
         . |       121          2 |       123 
-----------+----------------------+----------
     Total |     7,239        240 |     7,479 

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       227        3604  |       3831       0.0593
        Controls |        11        3514  |       3525       0.0031
-----------------+------------------------+------------------------
           Total |       238        7118  |       7356       0.0324
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         20.12103       |    11.00606     40.9402 (exact)
 Attr. frac. ex. |         .9503007       |    .9091409    .9755741 (exact)
 Attr. frac. pop |         .0563086       |
                 +-------------------------------------------------
                               chi2(1) =   184.76  Pr>chi2 = 0.0000




Odds ratio: 20.1 (95% CI: 11, 40.90000000000001)

As expected, men had about 20 times the odds of having masturbation listed as a cause (odds ratio 20.1; 95% CI: 11.0, 40.9). Only 11 women had this diagnosis. Let's look into who they were.
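The trailing digits in the upper confidence limit printed above (40.90000000000001) are a floating-point artifact: round(x, .1) returns the nearest double to a multiple of 0.1, and 0.1 has no exact binary representation. One way to avoid this is to format the numbers at display time instead of rounding them. The sketch below types the cc results in as literals so it runs on its own; the scalar names `or_val`, `lb_val`, and `ub_val` are hypothetical stand-ins for r(or), r(lb_or), and r(ub_or).

```stata
* Values copied from the cc output above (typed in for a self-contained example)
scalar or_val = 20.12103
scalar lb_val = 11.00606
scalar ub_val = 40.9402

* string(x, "%4.1f") formats to one decimal place without storing a rounded double
local or = string(or_val, "%4.1f")
local lb = string(lb_val, "%4.1f")
local ub = string(ub_val, "%4.1f")
di "Odds ratio: `or' (95% CI: `lb', `ub')"
```

In the notebook cell itself, the three scalars would be replaced by the stored results from cc.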

In [42]:
table year if sex==0 & masturbation==1, c(sum masturbation) col
summ age if sex==0 & masturbation==1
tab patientid if sex==0 & masturbation==1 & age==12

-------------------------
Year of   |
admission | sum(mastur~n)
----------+--------------
     1867 |             1
     1869 |             1
     1878 |             1
     1881 |             1
     1891 |             1
     1897 |             1
     1898 |             1
     1900 |             1
     1905 |             1
     1909 |             1
     1910 |             1
-------------------------


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |         11    28.18182    10.32297         12         47


 Patient ID |      Freq.     Percent        Cum.
------------+-----------------------------------
       5125 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00



Hypothesis: Masturbation diagnoses would be more common in unmarried patients.

In [43]:
tab marital masturbation, m
cc marital masturbation

    Marital status at |     masturbation
  admission (recoded) |        No        Yes |     Total
----------------------+----------------------+----------
               Single |     3,003        221 |     3,224 
              Married |     3,366         14 |     3,380 
              Widowed |       684          1 |       685 
Separated or Divorced |         5          0 |         5 
                    . |       181          4 |       185 
----------------------+----------------------+----------
                Total |     7,239        240 |     7,479 


                 | masturbation           |             Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       236        7058  |       7294       0.0324
        Controls |         0           0  |          0            .
-----------------+------------------------+------------------------
           Total |       236        7058  |       7294       0.0324
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |                .       |           0           . (Cornfield)
 Attr. frac. ex. |                .       |           .           . (Cornfield)
 Attr. frac. pop |                .       |
                 +-------------------------------------------------
                               chi2(1) =        .  Pr>chi2 =      .

 Note: Exact confidence levels not possible with zero count cells.


Recode marital to dichotomous for single:

In [44]:
gen single=0
    replace single=1 if marital==1
        replace single=. if marital==.
        
tab single masturbation, m
cc single masturbation

(3,224 real changes made)

(185 real changes made, 185 to missing)


           |     masturbation
    single |        No        Yes |     Total
-----------+----------------------+----------
         0 |     4,055         15 |     4,070 
         1 |     3,003        221 |     3,224 
         . |       181          4 |       185 
-----------+----------------------+----------
     Total |     7,239        240 |     7,479 

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       221        3003  |       3224       0.0685
        Controls |        15        4055  |       4070       0.0037
-----------------+------------------------+------------------------
           Total |       236        7058  |       7294       0.0324
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         19.89466       |    11.76779    36.21715 (exact)
 Attr. frac. ex. |         .9497353       |    .9150223    .9723888 (exact)
 Attr. frac. pop |         .0651028       |
                 +-------------------------------------------------
                               chi2(1) =   241.74  Pr>chi2 = 0.0000

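As expected, masturbation was overwhelmingly a diagnosis of single patients: 221 of 3,224 single patients (6.9%) carried the diagnosis, compared to 15 of 4,070 patients with any other recorded marital status (0.4%). The odds ratio was 19.89 (95% CI: 11.77, 36.22). In the cc table above, "Cases" are single patients and "Exposed" are patients with a masturbation diagnosis.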
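
The odds ratio can be checked by hand from the four cells of the table. A minimal sketch, assuming only the counts printed above; cci is the immediate form of cc and takes the cells in the order exposed cases, unexposed cases, exposed controls, unexposed controls:

```stata
* Cross-product ratio from the single-by-masturbation table
display (221*4055)/(15*3003)

* Immediate form of cc, fed the same four cells
cci 221 3003 15 4055
```

The first line should return roughly 19.89, in agreement with the cc output.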


Pellagra analyses

In [45]:
graph dot (sum) weight if pellagra==1, over(decade) vertical title("Pellagra as Cause of Death") ytitle("% of Deaths") graphregion(color(white)) bgcolor(white) scale(1.4)
In [46]:
table year if pellagra==1, c(sum pellagra) col
-------------------------
Year of   |
admission | sum(pellagra)
----------+--------------
     1885 |             1
     1892 |             1
     1897 |             2
     1898 |             3
     1902 |             3
     1904 |             4
     1905 |             1
     1906 |             3
     1907 |             1
     1908 |             5
     1909 |             7
     1910 |            10
     1911 |            15
     1912 |            11
     1913 |            11
     1914 |            26
     1915 |            34
     1916 |            15
     1917 |             7
-------------------------

Pellagra cases were noted with greater frequency after 1910. Was this due to increased case finding (e.g., diagnostic suspicion bias) or a medical phenomenon as corn became more prevalent in agriculture?
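
One way to start separating rising admissions from rising diagnoses is to express pellagra as a share of each year's admissions rather than as a raw count. A minimal sketch, assuming pellagra is coded 0/1 with no missing values (as the tabulations of pellagra in this notebook suggest); n_pellagra, n_admit, and pct_pellagra are names invented here for illustration:

```stata
preserve
    * Collapse to one row per admission year
    collapse (sum) n_pellagra=pellagra (count) n_admit=pellagra, by(year)
    * Pellagra diagnoses per 100 admissions in that year
    gen pct_pellagra = 100 * n_pellagra / n_admit
    list year n_pellagra n_admit pct_pellagra if n_pellagra > 0, noobs clean
restore
```

A rising percentage would not by itself rule out diagnostic suspicion bias, but a flat one would point toward growth in admissions as the main driver of the raw counts.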

From Bobby: I spent a bit of time this morning looking for more research/sources on pellagra. Here’s one I found very useful:

https://www-nber-org.libproxy.lib.unc.edu/papers/w23730

It is pretty current (2017). It makes the case that the specific mechanism by which one contracted pellagra wasn’t a part of standard medical knowledge until the 1930s, and laws requiring the “fortification” of foods associated with pellagra weren’t passed until the 1940s in the South. The article also helps to explain why farm families contracted pellagra, not just “urban” cotton mill workers: the prevalence of mono-culture (cotton or tobacco), which crowded out land upon which fresh produce could be produced. They use the boll weevil infestation to make their case. They also tell us more about the way the disease develops—it takes about 6 months to develop symptomatically, for example, so was regarded as a seasonal disease (spring!) and say a bit about mortality. Fascinating!

So, we have two populations of patients with whom pellagra is associated: one of patients who are admitted presumably because they presented with the tell-tale symptomatic expressions of active pellagra: dermatitis, and “dementia.” The admissions ledgers would tell us what happened to these patients: as I understand it, if caught early enough pellagra can be “cured” simply by re-supplying the nutrient missing from their diet. So, if the hospital diet did contain sufficient amounts of niacin, some, at least, of these patients would have been “cured” regardless of any other therapies that might have been tried. It would be interesting to follow some of the “cured” population to see how they fared back at home.

The other, more puzzling group is those who contracted pellagra after long periods of confinement in the hospital itself. Their contracting of pellagra, it would seem, must have been connected to what they were being fed. So we have to wonder why some patients in the hospital contracted pellagra while others did not. Were there some who were more vulnerable because their bodies were less able to metabolize niacin? Is pellagra associated with age? We know that “ill-health” was by far more associated with women (at the time of admission) than men. This is also puzzling because we know that the hospital during this period produced much of its own food. Was this a matter of trying to economize by making fresh food go further or relying on cheaper alternatives to fresh food? The superintendents’ reports would be important here: they contain summaries of the kinds and amounts of food produced.

From Hannah: I met with Sarah this morning and updated her on the research I’ve been conducting on the 3 sisters and their illnesses. I had decided to start with Fannie, who succumbed to pellagra in 1918. Over the weekend, I looked at the Goldberger papers and other materials searching for connections between the federal government’s studies of pellagra and what might have been happening in North Carolina at the time.

What I have found so far is a number of gaps with regard to NC in the dominant US pellagra narrative: evidence that Goldberger did visit Goldsboro, of which he gives a brief but clear description, but no clear evidence that he was ever sent to Raleigh. Likewise a lack of evidence so far of the Raleigh doctors’ participation in the national debates (in the form of publications and at least 3 conferences), though doctors from Durham, Charlotte, and Asheville were published and did present at these conferences. Meanwhile, the head of the NC Board of Health conducted at least one study of pellagra in Yancey County (perhaps because some thought the disease was tropical) and published a book in 1912 on the global history of pellagra and its connections to NC. This book mentions patients of his who were also treated in Raleigh. So far, this publication stands out as the only one that directly addresses pellagra cases in NC.

I returned to the staff meeting minutes and admissions ledger and have found at least 4 interviews with patients diagnosed with pellagra. I have so far been able to OCR half of the minutes with ABBYY Fine Reader and will share those versions of the text as soon as I am able to finish the process.

Without wanting to write my paper into this email, I did want to share with you this research because I’d like to pivot a bit away from the Pitt County Williams family specifically and look at these cases (including one of the Williams sisters) and the larger context of pellagra in the hospital, including researching the patients’ diets as they may have been both inside and outside of the asylum and any records of care that might be found. (For example, did they build any separate housing for patients with pellagra as was practiced at other institutions?)

Pellagra and Gender

Hypothesis: Women would be more affected by pellagra among patient admissions because of greater nutrient insecurity.

In the tables below: male="Cases"; female="Controls"

In [47]:
tab sex pellagra, m
cc sex pellagra

local or=round(r(or),.01)
local ub=round(r(ub_or),.01)
local lb=round(r(lb_or),.01)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
di "Inverse odds:"
di 1/`or'

           |       pellagra
  0=female |         0          1 |     Total
-----------+----------------------+----------
    female |     3,420        105 |     3,525 
      male |     3,778         53 |     3,831 
         . |       120          3 |       123 
-----------+----------------------+----------
     Total |     7,318        161 |     7,479 

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |        53        3778  |       3831       0.0138
        Controls |       105        3420  |       3525       0.0298
-----------------+------------------------+------------------------
           Total |       158        7198  |       7356       0.0215
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         .4569311       |    .3209839    .6441054 (exact)
 Prev. frac. ex. |         .5430689       |    .3558946    .6790161 (exact)
 Prev. frac. pop |         .0161765       |
                 +-------------------------------------------------
                               chi2(1) =    22.23  Pr>chi2 = 0.0000




Odds ratio: .46 (95% CI: .32, .64)

Inverse odds:

2.173913

About 3% of women had pellagra diagnoses, compared to 1.4% of men. The odds ratio was 0.46 (95% CI: 0.32, 0.64), meaning that men had roughly half the odds of a pellagra diagnosis that women had, which is consistent with the hypothesis.
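
Because the inverse displayed above is computed from the rounded local `or', it differs slightly from the inverse of the stored estimate. A sketch that inverts the unrounded results left behind by cc; the confidence limits swap places when inverted:

```stata
quietly cc sex pellagra
* Invert the stored odds ratio; the upper limit becomes the lower limit and vice versa
display "Inverse odds ratio: " %4.2f 1/r(or) " (95% CI: " %4.2f 1/r(ub_or) ", " %4.2f 1/r(lb_or) ")"
```

With the estimates printed above this should give approximately 2.19 (95% CI: 1.55, 3.12), the odds of a pellagra diagnosis for women relative to men.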


Hypothesis: Alcohol and pellagra may be associated with each other, in a similar way that certain nutritional disorders are noted in people with chronic excessive alcohol intake. (Caveat: we only have alcohol recorded as a cause of attack.)

In [48]:
tab alcohol pellagra, m
cc alcohol pellagra

local or=round(r(or),.01)
local ub=round(r(ub_or),.01)
local lb=round(r(lb_or),.01)
di "Odds ratio: `or' (95% CI: `lb', `ub')"
di "Inverse odds:"
di 1/`or'

           |       pellagra
   alcohol |         0          1 |     Total
-----------+----------------------+----------
        No |     7,018        157 |     7,175 
       Yes |       300          4 |       304 
-----------+----------------------+----------
     Total |     7,318        161 |     7,479 

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |         4         300  |        304       0.0132
        Controls |       157        7018  |       7175       0.0219
-----------------+------------------------+------------------------
           Total |       161        7318  |       7479       0.0215
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         .5960085       |    .1593402    1.572417 (exact)
 Prev. frac. ex. |         .4039915       |   -.5724172    .8406598 (exact)
 Prev. frac. pop |           .00884       |
                 +-------------------------------------------------
                               chi2(1) =     1.05  Pr>chi2 = 0.3047




Odds ratio: .6 (95% CI: .16, 1.57)

Inverse odds:

1.6666667

Interpretation
There were n=161 pellagra cases identified. Alcohol-involved was n=281. Overlap was only n=3 patients (patientid: 6620, 4894, 5911), all 3 from 1909-1914. These three were middle-aged men, two of whom also had cocaine or morphine involvement. All three died at the hospital: the 2 patients with other drug involvement within 3 days of admission, and the alcohol-only patient after about 3 weeks. Cause of death in verbatim free text: Gastro-Enteritis (Pellagra); Exhaustion of Pellagra; Pellagra. Their "forms" as transcribed (akin to modern diagnosis codes) were Toxic Mania, Drug Psychosis, and Epileptic (respectively). The duration of illness prior to admission was reported to be 4 years, 1 week, and 2 weeks.

So, for at least one patient (4894, a salesman from Carteret County) there appears to have been long-term morphine and/or alcohol use, presenting in an agitated/altered state at Dix, followed by a short stay, whereupon he died of pellagra-related complications. I can't tell from these data whether alcohol was involved in the etiology of pellagra in the way it is for patients who present in the modern day.
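
The overlapping records can be pulled directly for review. A minimal sketch using variables that already appear in this notebook; the cross-tabulation above shows 4 records with both flags set, so the count is worth comparing against the n=3 described here:

```stata
* Number of records flagged for both alcohol and pellagra
count if alcohol==1 & pellagra==1

* Inspect those records
list patientid year sex age if alcohol==1 & pellagra==1, noobs
```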

(There is a series of data caveats and assumptions that we are working to articulate, so please treat these as quantitative anecdotes right now! In general we are operating under a "high specificity, unknown sensitivity, definitional plasticity" model.)

Interestingly, this article from Tryon, NC mentions a combination of moonshine and herbs as an (effective) treatment for pellagra, but it seems like the moonshine would have been used as a solvent/carrier here more than as a libation. There is also the long-standing tradition of putting fruit in moonshine in NC, but I have no idea if the niacin concentration or bioavailability was meaningful.

Methods
Exposure: alcohol=1 represents whether the "supposed cause of attack" (i.e., the reason they were admitted to Dix) was related to alcohol consumption; Sarah's team standardized the different verbatim permutations and I ran regular expression text matching on that.

Outcome: pellagra=1 indicates that pellagra was recorded as the "supposed cause of attack", was mentioned in the free-text notes, or was explicitly given as the "form" of mental disease (only n=2 cases, both from 1914; this may have been an emergent diagnostic practice).
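
The text matching described above might look like the following. This is a sketch only: cause_text, alcohol_flag, and the search tokens are hypothetical placeholders, not the notebook's actual variable names or pattern; only the use of regexm() and lower() is taken from the variable-construction cell earlier in the notebook.

```stata
* cause_text is a hypothetical name for the standardized "supposed cause of attack" field
* The tokens below are illustrative, not the list actually used
gen byte alcohol_flag = regexm(lower(cause_text), "alcohol|whiskey|intemperance")

* Review the matched strings by eye to check for false positives
tab cause_text if alcohol_flag==1
```

Matching on the lowercased text keeps the pattern short, and tabulating the hits is a quick way to confirm the "high specificity" assumption for a given pattern.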


Pellagra and Age

Hypothesis: Pellagra patients would be older than the rest of the patients.

In [49]:
// See if differences by pellagra status and age
bysort pellagra: summarize age
qui distinct patientid if age==.
local miss= r(ndistinct)

vioplot age, over(pellagra) ytitle("Age (years)") obs(alt) graphregion(color(white)) bgcolor(white) note("Caution: Age missing in `miss' patients not represented above")

-----------------------------------------------------------------------
-> pellagra = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |      7,075     38.3817    14.33303          0         90

-----------------------------------------------------------------------
-> pellagra = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |        156    38.07692    12.12491         16         78





Patients with a pellagra diagnosis (right) had nearly the same mean age (38.1 years) as the rest of the patient population (left; 38.4 years). However, their ages were clustered more tightly around middle age (SD 12.1 vs. 14.3; range 16 to 78 vs. 0 to 90).
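
The comparison of means and spread can also be checked formally. A minimal sketch, assuming the same age and pellagra variables; ttest compares the group means (the unequal option allows different variances) and sdtest compares the standard deviations:

```stata
* Difference in mean age between pellagra and non-pellagra patients
ttest age, by(pellagra) unequal

* Variance-ratio test for the difference in spread
sdtest age, by(pellagra)
```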