The discovery of antibiotics had revolutionized the medicine and it had an immense impact on the standard of living of all people in the world. It allowed to effectively treat many infectious diseases caused by bacteria and to reduce their spread. Unfortunately, susceptibility to an antibiotic degrades with time, as the bacteria often mutate, and the evolution promotes resistant organisms. This effect is substantially augumented by mass-scale usage of the antibiotics, especially due to their frequent misuse. Recently, this problem has intensified  leading to emergence of antibiotic-resistant bacteria, which are not susceptible to some or most of the currently known antibiotics . Furthermore the development of new antibiotic is limited and recently has slowed down due to restrictive policies  and being not economically-sound for the pharmaceutical companies .
An important part of this problem, which is subject to this study, corresponds to hospital-acquired infections . It has been observed that multidrug-resistant Enterobacteriaceae infection rates grown considerably in Europe within recent years. While there are some preventive counter-measures [2, 3], they are focused mostly on the facility level, instead of network of facilities or whole healthcare system. Clearly, hospitals exchange patients directly due to transfers, but also indirectly due to for example geographical proximity. These patients provide means of pathogen transfer between the healthcare facilities . It is therefore beneficial, both in the sense of public health and economy, to focus on the multi-institutional or system-level preventive actions to control the pathogen spread [4, 14], as the individual hospital goals may be contradictory .
The simulations based on the network approach, where the healthcare facilities are considered as a base entities (network nodes), and the network edges corresponds to patient transfers, has already been successfully performed [6, 9, 5] showing that the cooperative approach may lead to better results than individual approach, at least in case of methicllinin-resistant Staphylococcus aureus .
The first step to perform such simulations is to build a realistic healthcare facility network. Unfortunately even in well-developed countries with extensive health care the (anonymized) patient transfer databases are not freely available. However, it may be possible, to use a database of patient admission/discharge to reconstruct the network connection between the healthcare facilities.
In this paper, we discuss the usage of such anonymized database, provided by AOK Lower Saxony, in context of derivation of the computer model of a regional healthcare system. The paper is organized as follows. In Sections 2 and 3 we briefly describe the database under consideration and the utilities used for its analysis. Then the analysis is performed as presented in Section 4. Finally, in Section 5 we recapitulate the main problems and we comment on building the network model from the data provided.
2 Description of data set
In the presented study, we consider anonymized patients data set provided by AOK Lower Saxony – a healthcare provider in a large federal state in Germany. Data set consist of 5 254 492 hospitalisation records of 1 673 247 patients (registered in AOK Lower Saxony) during the years 2008-2015. Data base consists in particular of the following information: patient anonymized ID, gender, anonymized healthcare facility ID, region code of healthcare facility, day of the admission, day of discharge, diagnosis (international ICD 10 code and numeric code). It should be mentioned that, due to the personal data protection, the information on geographical location of healthcare facilities has been removed from the provided database.
Within provided data set we have found 4 573 584 hospital/healthcare facility stay records for the facilities located in Lower Saxony with the numeric diagnosis code and 680 908 for other German federal states. Within these records, data related to healthcare facilities of any size were taken into account. Records without the diagnosis code were omitted in our analysis. Among them, there are 257 668 hospitalisation records for Lower Saxony facilities and 79 102 records for the remaining parts of Germany.
3 Data analysis tools
Provided data set consists of 5 254 492 records which makes it large. Thus, to perform the analysis described below we needed to process the data set provided by AOK Lower Saxony. From the admission and discharge data it was necessary to determine the duration of stays for each hospitalisation record. It was also necessary to group the records with the same patient identification number to detect existing overlaps (definition of the overlap will be provided in Subsection 4.4) in data set.
To do so we have developed code using Python programming language. In our code we used many widely available Python libraries, in particular matplotlib library for visualization. Since the code was supposed to analyse the provided data, it was developed with possibly flexible approach, such that it is possible to extend the analysis depending on need. Optimization of code was the secondary goal, as this is not supposed to be a high-performance simulation code at this stage, but a statistic/analytical tool.
The basic idea standing behind our application is as follows. The AOK hospitalisation database, provided as a CSV (comma-separated value) file are parsed and transformed into Python classes, providing high-level subroutines. Then they are stored in a data structured (lists, dictionaries), which were used to determine overlapping records and their statistics. Application of dictionaries allowed us to perform efficient operations of searching among millions of records.
At this moment, the computation time for this application is about 1 hour for current database size and it uses up to 10 GB of RAM memory, as all the structures are stored in RAM during processing, and only the results are stored permanently. In future versions, however, we tend to use on-disk databases, like for exampleSQLite, to overcome the problem of high memory consumption and to provide more flexibility in operating on the results.
4 Data analysis
Within all the records we have found data for 757 141 men with 1 up to 322 hospitalisations and 916 106 women with 1 up to 195 within years 2008–2015. The median and average number of entries per person for each sex are 2 and days for men and 2 and days for women.
Let focus on the characterisation of the healthcare facilities reported in the database. From Figure 1 it is clear that the analysis of the whole data set indicates that the majority healthcare facilities within particular years had between 1 and 9 admissions within the considered time periods. For the collectively calculated years 2008–2015 the majority of healthcare facilities had between 10 and 99 reported admissions. However, if we restrict our analysis only to the healthcare facilities located in the Lower Saxony we observe different trend – the majority healthcare facilities has between 10 000 and 99 999 admissions, for details see Figure 2. For separately treated years most Lower Saxony healthcare facilities had between 1 000 and 9 999 reported admissions.
4.2 Numbers of patients
On the other hand, if we look on the number of patients admitted to the hospitals, compare Figure 3 with Figure 4, we see that for healthcare facilities in all federal states the majority of hospitals had from one up to 10 patients in considered years and if we consider all years at once (Figure 3 (a)) most of hospitals registered within year 2008-2015 had from 10 to 99 patients. However, if we consider Lower Saxony healthcare facilities collectively the fraction of the hospitals having between 1 000 and 9 999 patients dominates.
To investigate the number of patients staying in the particular healthcare facilities more deeply we investigative the changes of the number of patients in time (from 2008 up to 2015) for all considered in Lower Saxony healthcare facilities. In Figure 5 we presents results for six biggest hospitals. Clearly, we observe some fluctuations and larger deviations in time of the number of patients, but that is rather natural phenomenon where larger deviations can be explained by Christmas time.
4.3 Durations of stays
We also investigated duration of reported stays of patients in particular healthcare facilities. In Figure 6 we present a histogram of the duration of the hospitalisations for all healthcare facilities (a) and those for Lower Saxony only (b).
The interesting issue is also duration of the patients stays outside healthcare facilities, understand as a time between hospitalisations spend in society, see Figure 7, and take that fact into account in the model of the hospital network.
What is interesting on can not see a big differences between the data for all considered healthcare facilities and those located in Lower Saxony only.
Analysing provided data for Lower Saxony healthcare facilities we found a number of overlapping records – 294 741 cases in total. By overlapping records we understand distinct sets of two or more records for a given patient, with non-empty intersection of stay periods, either within the same facility or in other facilities.
A vast majority of detected overlaps is related to simultaneous stay of some patients in at most two institutions, but in general there are cases when a patient is attributed to five stay records in the same day.
Performing more detailed analysis of the data we come up with the following classification:
standard transfer — one day overlap of two stay periods, where both periods are longer than one day and each record corresponds to different facility,
first day transfer/last day transfer — similar to above, but duration of the stay in one facility is exactly one day long and it coincides with admission to/discharge from the latter facility,
simultaneous two admissions in a single institution — two reported stays in the same place for the same period,
temporary transfer — two records, period of one of them is contained in the other, and admission and discharge dates are not the same,
simultaneous two entries in two different institutions — periods are exactly the same, but the facilities are different,
unknown two admissions in two different institutions — any two records for hospitalisations in different institutions, which is not covered by the cases already introduced,
two admissions in a single institution — two reported stays in the same institution but for different (overlapping) periods,
unknown multiple admissions () — more than two records of overlapping hospitalisation periods, with maximal number of records in a given day is .
In Figure 8 we present an exemplary visualization of the overlapping appearing in the data base.
For number of records and the percentage in all the overlaps see Table1.
In general, these overlapping hospitalisation records allow us to deduce direct transfers between healthcare facilities. Therefore we focus our analysis on the cases, where these transfers can be deduced with high probability of success using the provided hospitalisation periods. We are not interested in detailed analysis of minor problematic cases (unknown multiple entries ()). These are rare and they are often impossible to analyse reliably without study of individual cases.
|type||Overlap description||number of records|
|(% of all detected overlaps)|
|1||standard transfer||112 368 (38.1%)|
|2||two admissions in a single institution||69 788 (23.7%)|
|3||simultaneous two admissions in a single institution||69 577 (23.6%)|
|4||first day transfer||23 139 ( 7.9%)|
|5||temporary transfer||11 992 ( 4.1%)|
|6||unknown two admissions in two institutions||3 645 ( 1.2%)|
|7||unknown multiple admissions (3)||2 402 ( 0.8%)|
|8||last day transfer||1 208 ( 0.4%)|
|9||simultaneous two admissions in two institutions||606 ( 0.2%)|
|10||unknown multiple admissions (4+)||16 ( 0.0%)|
In general over 54% (160 764) of overlapping periods within years 2008–2015 intersect by one day, about 14% of them are 4-day overlaps (42 763), while both 3 (22 913) and 5-day overlapping records (25 409) are both around 8% of all overlaps, compare with Figure 9 (a). Interestingly 2-day overlaps are below 4% (11 542), which are less common that 6-day overlaps (over 5%, 15 027).
In order to deeply characterize the overlapping types we use a four-digit calcification. The truth is indicated by 1 while 0 means false. First digit indicate if two considered overlaps have place in the same healthcare facility, second digit: if overlaps have the same diagnoses, third: if two overlaps have the same admission dates and fourth if two overlaps have the same discharge dates. For example code 1100 simply means that two considered overlaps have been reported by the same healthcare facility, in both cases the diagnosis was the same, but there were different dates of admissions and discharges.
The results of our classification is presented in Table 2 and in Figure 10. The most frequently appearing overlap groups for the Lower Saxony healthcare facilities characterized by different declared hospitals are 0000 (different: facility, diagnose, admission and discharge date) 98 902 detected cases and 0100 (similar as 0000, but with the same diagnose) – 26 373 cases. Overlaps with the same admission dates but with different discharge dates are also quite frequent 0010 and 0110 – 19 242 and 6 415 cases, respectively.
In case of overlaps taking place in the same facility (first digit in 0-1 classification equal to 1), the most numerous cases are 1011 (different diagnosis the same admission and discharge date), 1001 (different diagnosis and admission dates, the same discharge date) and 1000 (different: diagnosis, admission and discharge dates) resulting in 69 536, 47 053 and 13 711 detected cases, respectively.
Each of four-digit set has been analysed further by assigning each diagnose into groups indexed by numbers according to the rules presented in Table 3. In Table 4 we summarise the most frequently appearing diagnosis within the particular types of overlaps. From the presented results we see that most frequent diagnosis for two overlapping records related to different healthcare facilities is disease of the circulatory system [9, 9]. Mental disorders [5, 5] or injuries [19, 19] are also the frequent problem in this case.
For the overlaps characterized by the same healthcare facility records the vast majority of cases are [15, ], related to birth and pregnancy. Indeed, the majority of overlapping records coming form the same facility is related to the fact that infants do not posses its own patient identification number and they are registered in the system under the identification number of one of the parents. Second, but much less abundant group of the same healthcare facility overlaps are [5, ] — mental disorders.
|1||A,B||Infectious and parasitic diseases|
|3||D||Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism|
|4||E||Endocrine, nutritional and metabolic diseases|
|5||F||Mental, Behavioral and Neurodevelopmental disorders|
|6||G||Diseases of the nervous system|
|7||H||Diseases of the eye and adnexa|
|8||H||Diseases of the ear and mastoid process|
|9||I||Diseases of the circulatory system|
|10||J||Diseases of the respiratory system|
|11||K||Diseases of the digestive system|
|12||L||Diseases of the skin and subcutaneous tissue|
|13||M||Diseases of the musculoskeletal system and connective tissue|
|14||N||Diseases of the genitourinary system|
|15||O||Pregnancy, childbirth and the puerperium|
|16||P||Certain conditions originating in the perinatal period|
|17||Q||Congenital malformations, deformations and chromosomal abnormalities|
Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
|19||S,T||Injury, poisoning and certain other consequences of external causes|
|21||Z||Factors influencing health status and contact with health services|
|[9, 9]||19509||[5, 5]||194||[9, 9]||4412||[5, 5]||80||[9, 9]||12107||[9, 9]||97||[9, 9]||3223||[9, 9]||62|
|[5, 5]||6568||[9, 9]||155||[5, 5]||2319||[9, 9]||79||[19, 19]||3355||[5, 5]||50||[19, 19]||907||[5, 5]||48|
|[5, 19]||5210||[5, 19]||100||[19, 19]||1090||[5, 19]||39||[2, 2]||2642||[2, 2]||19||[5, 5]||453||[19, 19]||15|
|[2, 2]||4170||[5, 18]||53||[14, 14]||939||[6, 18]||19||[5, 5]||1860||[19, 19]||14||[14, 14]||387||[18, 18]||12|
|[19, 19]||2946||[9, 18]||43||[5, 19]||782||[5, 18]||18||[11, 11]||1073||[10, 10]||10||[11, 11]||321||[6, 6]||11|
|[5, 9]||2608||[5, 9]||40||[15, 15]||725||[9, 18]||17||[14, 14]||958||[14, 14]||8||[15, 15]||217||[15, 15]||5|
|[10, 10]||2525||[5, 11]||29||[11, 11]||695||[18, 18]||15||[13, 13]||953||[13, 13]||7||[18, 18]||178||[14, 14]||5|
|[14, 14]||2282||[2, 2]||29||[6, 9]||539||[14, 14]||14||[10, 10]||857||[11, 11]||6||[10, 10]||144||[11, 11]||4|
|[5, 18]||2189||[5, 6]||26||[6, 6]||434||[6, 6]||13||[6, 6]||601||[18, 18]||5||[6, 6]||132||[17, 17]||2|
|[11, 11]||2179||[14, 14]||24||[10, 10]||434||[5, 6]||13||[1, 1]||363||[6, 6]||5||[2, 2]||103||[10, 10]||2|
|[5, 5]||3877||[15, 21]||41482||[15, 21]||288||[15, 21]||63052||[5, 5]||5539||[5, 5]||81||[5, 5]||71||[16, 16]||23|
|[5, 19]||1200||[15, 16]||4814||[15, 16]||271||[15, 16]||5760||[2, 2]||1034||[10, 10]||41||[2, 2]||22||[21, 21]||5|
|[5, 9]||653||[15, 17]||344||[5, 5]||192||[15, 17]||516||[6, 6]||132||[2, 2]||35||[15, 15]||17||[15, 15]||4|
|[5, 11]||525||[5, 5]||52||[5, 19]||161||[15, 18]||44||[9, 9]||118||[13, 13]||8||[9, 9]||6||[5, 5]||3|
|[2, 2]||494||[5, 19]||41||[15, 15]||80||[21, 21]||40||[15, 15]||89||[9, 9]||6||[12, 12]||5||[11, 11]||1|
|[5, 18]||468||[15, 18]||33||[14, 21]||55||[8, 15]||24||[13, 13]||80||[8, 8]||5||[16, 16]||2||[12, 12]||1|
|[5, 6]||396||[10, 21]||28||[5, 18]||46||[2, 15]||13||[19, 19]||50||[15, 15]||5||[6, 6]||2||[14, 14]||1|
|[5, 10]||359||[5, 18]||18||[5, 6]||37||[16, 21]||8||[11, 11]||45||[4, 4]||3||[14, 14]||2||[2, 2]||1|
|[4, 5]||322||[5, 9]||17||[5, 9]||37||[11, 15]||7||[12, 12]||41||[18, 18]||3||[19, 19]||2||[18, 18]||1|
|[9, 9]||232||[12, 15]||13||[5, 11]||31||[12, 15]||7||[14, 14]||41||[11, 11]||2||[3, 3]||1||[6, 6]||1|
5 Towards hospital transfer network
Aim of analysis of the overlapping records is to provide model, allowing simulation of patient movement between healthcare facilities and the disease transmission in German healthcare system. We distinguish between two kinds of patient transfers: direct transfer between healthcare facilities (without stay at home) and indirect transfer (otherwise). The question is, whether it is possible to reliably deduce both types of transfers with the data we already have, or is more information necessary.
The indirect transfers are generally easy to determine in this setting, as we can simply trace stay-at-home periods between hospitalisations. The only problematic cases here are when a patient is dismissed from or admitted to many facilities at the same day, but these situations do not occur frequently. Among the problematic overlaps for Lower Saxony healthcare facilities presented in Table 1, the problematic cases correspond to first/last day transfers and admissions to many institutions (overlaps type: 4 and 6-10). These cases constitute about 10% of total number of overlaps, and taking into account the fact that there are more than 4 million records in total, they are too scarce to considerably impact the results.
In case of direct transfer, the situation is as follows. Admissions to single institutions (more than 46% of overlaps) do not contribute to direct transfers. Standard transfers and temporary transfers are the obvious cases. Here also first/last day transfers can be resolved easily, as we can simply assume that there is a single transfer from/to the one-day hospitalisation facility at beginning/end of the hospitalisation. Therefore we are left with less than 2% of problematic cases, where some of them may still be resolved (cf. Figure 8).
Thus, we conclude that the data provided by AOK Lower Saxon can be used to deduce the direct/indirect transfer rate between hospitals in Lower Saxony. The next step will be to build the transfer network with the resulting transfers.
This work was supported by grant no. 2016/22/Z/ST1/00690 of National Science Centre, Poland within the transnational research programme JPI-EC-AMR (Joint Programming Initiative on Antimicrobial Resistance) entitled "Effectiveness of infection control strategies against intra- and inter-hospital transmission of MultidruG-resistant Enterobacteriaceae – insights from a multi-level mathematical NeTwork model" (EMerGe-Net).
We thank the AOK Lower Saxony for providing anonymized record data.
-  A. J. Alanis, Resistance to antibiotics: Are we in the post-antibiotic era?, Archives of Medical Research, 36 (2005), pp. 697–705.
-  C. Camus, E. Bellissant, A. Legras, A. Renault, A. Gacouin, S. LavouĂ©, B. Branger, P. Y. Donnio, P. le Corre, Y. Le Tulzo, D. Perrotin, and R. Thomas, Randomized comparison of 2 protocols to prevent acquisition of methicillin-resistant Staphylococcus aureus: Results of a 2-Center study involving 500 patients, Infection Control and Hospital Epidemiology, 32 (2011), pp. 1064–1072.
-  I. F. Chaberny, F. Schwab, S. Ziesing, S. Suerbaum, and P. Gastmeier, Impact of routine surgical ward and intensive care unit admission surveillance cultures on hospital-wide nosocomial methicillin-resistant Staphylococcus aureus infections in a university hospital: An interrupted time-series analysis, Journal of Antimicrobial Chemotherapy, 62 (2008), pp. 1422–1429.
-  M. Ciccolini, T. Donker, R. KĂ¶ck, M. Mielke, R. Hendrix, A. Jurke, J. Rahamat-Langendoen, K. Becker, H. G. Niesters, H. Grundmann, and A. W. Friedrich, Infection prevention in a connected world: The case for a regional approach, International Journal of Medical Microbiology, 303 (2013), pp. 380–387.
-  T. Donker, T. Smieszek, K. L. Henderson, A. P. Johnson, A. S. Walker, and J. V. Robotham, Measuring distance through dense weighted networks: The case of hospital-associated pathogens, PLOS Computational Biology, 13 (2017), p. e1005622.
-  T. Donker, J. Wallinga, R. Slack, and H. Grundmann, Hospital networks and the dispersal of hospital-acquired pathogens by patient transfer, PLoS ONE, 7 (2012), p. e35002.
-  R. J. Fair and Y. Tor, Antibiotics and bacterial resistance in the 21st century, Perspectives in Medicinal Chemistry, (2014), pp. 25–64.
-  B. Y. Lee, S. M. Bartsch, K. F. Wong, S. L. Yilmaz, T. R. Avery, A. Singh, Y. Song, D. S. Kim, S. T. Brown, M. A. Potter, R. Platt, and S. S. Huang, Simulation shows hospitals that cooperate on infection control obtain better results than hospitals acting alone, Health Affairs, 31 (2012), pp. 2295–2303.
-  B. Y. Lee, S. M. McGlone, Y. Song, T. R. Avery, S. Eubank, C. C. Chang, R. R. Bailey, D. K. Wagener, D. S. Burke, R. Platt, and S. S. Huang, Social network analysis of patient sharing among hospitals in Orange County, California., American Journal of Public Health, 101 (2011), pp. 707–713.
-  A. Y. Peleg and D. C. Hooper, Hospital-acquired infections due to gram-negative bacteria, New England Journal of Medicine, 362 (2010), pp. 1804–1813.
-  E. Power, Impact of antibiotic restrictions: The pharmaceutical perspective, Clinical Microbiology and Infection, 12 (2006), pp. 25–34.
-  S. J. Projan, Why is big pharma getting out of antibacterial drug discovery?, Current Opinion in Microbiology, 6 (2003), pp. 427–430.
-  L. A. Real, The community-wide dilemma of hospital-acquired drug resistance, Proceedings of the National Academy of Sciences, 102 (2005), pp. 2683–2684.
-  D. L. Smith, S. A. Levin, and R. Laxminarayan, Strategic interactions in multi-institutional epidemics of antibiotic resistance, Proceedings of the National Academy of Sciences, 102 (2005), pp. 3153–3158.