SG HEALTHCARE AI
DATATHON & EXPO
Data for the Datathon
De-identified Real-world Healthcare Datasets
Datasets
Reminder: Teams need to apply and obtain access to the datasets they intend to use before the datathon.
1. Electronic Medical Records Datasets
During the datathon, teams will have access to three de-identified EMR datasets. Teams may use one or all of these datasets to answer their clinical questions. The three datasets are:
1) the Medical Information Mart for Intensive Care (MIMIC)-IV Database from PhysioNet
2) the Philips eICU Collaborative Research Database (https://eicu-crd.mit.edu/)
3) the VitalDB dataset (https://vitaldb.net/dataset/)
These three databases share similar data schemas. They contain hourly physiologic readings from bedside monitors, validated by ICU nurses. They also contain records of demographics, labs, nursing progress notes, discharge summaries, IV medications, fluid balance, and other clinical variables.
MIMIC-IV Dataset
Introduction & Access Application: https://mimic-iv.mit.edu/
Github repository: https://github.com/MIT-LCP/mimic-iv
Documentation: https://mimic-iv.mit.edu/docs/
When using this resource, please cite:
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2020). MIMIC-IV (version 0.4). PhysioNet. https://doi.org/10.13026/a3wn-hq05.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
eICU-CRD Dataset
Introduction & Documentation: https://eicu-crd.mit.edu/about/eicu/
Github repository: https://github.com/mit-eicu/eicu-code
Example code: https://github.com/mit-eicu/eicu-code/blob/master/concepts/icustay_detail.sql
When using this resource, please cite:
Pollard, T., Johnson, A., Raffa, J., Celi, L. A., Badawi, O., & Mark, R. (2019). eICU Collaborative Research Database (version 2.0). PhysioNet. https://doi.org/10.13026/C2WM1R.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VitalDB Dataset
Introduction & Documentation: https://vitaldb.net/dataset/
Data Summary: https://vitaldb.net/dataset/?query=overview&documentId=13qqajnNZzkN7NZ9aXnaQ-47NWy7kx-a6gbrcEsi-gak&sectionId=h.1fo5zknztqnw
API Documentation: https://vitaldb.net/docs/
Example code: https://github.com/vitaldb/examples/
When using this resource, please cite:
Lee, Hyung-Chul, and Chul-Woo Jung. "Vital Recorder—a free research tool for automatic recording of high-resolution time-synchronised physiological data from multiple anaesthesia devices." Scientific reports 8.1 (2018): 1-8.
National Sleep Research Resource Datasets
Introduction & Documentation: https://sleepdata.org/about
Data Summary: https://sleepdata.org/datasets
API Documentation: https://sleepdata.org/tools
Forum: https://sleepdata.org/forum
!!! Note: These datasets require individual registration and approval from NSRR, so we will not be able to host them on our cloud for every team. Teams that would like to use these datasets should remember to apply for approval and will need to host the datasets locally themselves. Thank you in advance for your understanding.
2. Medical Imaging Datasets
A large collection of medical imaging datasets will be provided to all teams.
MIMIC CXR Dataset
Introduction & Documentation: https://physionet.org/content/mimic-cxr/2.0.0/
When using this resource, please cite:
3D Medical Image Dataset from Medical Segmentation Decathlon
NIH Chest X-ray dataset
Introduction & Documentation: https://www.kaggle.com/nih-chest-xrays/data
When using this resource, please cite:
AutoImplant2020
Introduction & Documentation: https://autoimplant.grand-challenge.org/
When using this resource, please cite:
VerSe2019
VerSe2020
MICCAI 2020 RibFrac Challenge: Rib Fracture Detection and Classification
Introduction & Documentation: https://ribfrac.grand-challenge.org/
When using this resource, please cite:
Deep-Learning-Assisted Detection and Segmentation of Rib Fractures from CT Scans: Development and Validation of FracNet (in press)
EMIDEC automatic Evaluation of Myocardial Infarction from Delayed-Enhancement Cardiac MRI
Multimodal Brain Tumor Segmentation Challenge 2020: Data
Introduction & Documentation: https://www.med.upenn.edu/cbica/brats2020/data.html
When using this resource, please cite:
https://pubmed.ncbi.nlm.nih.gov/25494501/
https://pubmed.ncbi.nlm.nih.gov/28872634/
https://arxiv.org/abs/1811.02629
HECKTOR challenge
Introduction & Documentation: https://www.aicrowd.com/challenges/miccai-2020-hecktor
When using this resource, please cite: https://github.com/voreille/hecktor
Chest X-Ray Images (Pneumonia)
Introduction & Documentation: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
When using this resource, please cite: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
Diabetic Retinopathy Detection
Introduction & Documentation: https://www.kaggle.com/c/diabetic-retinopathy-detection/data
When using this resource, please cite: http://www.eyepacs.com/
Messidor
Introduction & Documentation: http://www.adcis.net/en/third-party/messidor/
When using this resource, please cite:
http://www.ias-iss.org/ojs/IAS/article/view/1155
http://dx.doi.org/10.5566/ias.1155
SARAS
Introduction & Documentation: https://saras-mesad.grand-challenge.org/
When using this resource, please cite:
https://arxiv.org/abs/2104.03178
https://arxiv.org/abs/2006.07164
Stanford CheXpert dataset
Introduction & Documentation: https://stanfordmlgroup.github.io/competitions/chexpert/
When using this resource, please cite:
https://arxiv.org/abs/1901.07031
Chula RBC-12-Dataset
Introduction & Documentation: https://github.com/Chula-PIC-Lab/Chula-RBC-12-Dataset
When using this resource, please cite:
https://arxiv.org/abs/2012.01321
SINGAPORE HEALTHCARE AI DATATHON AND EXPO 2021