Our Research

Our research focuses on developing artificial intelligence (AI) methods to analyze heterogeneous biomedical big data for translational applications. This ongoing work brings together two branches of AI, knowledge representation and reasoning and machine learning, to characterize brain network dynamics and electronic health record (EHR) data.

Knowledge representation and reasoning involves development of knowledge models or ontologies. We have led the development of new methods to use ontology engineering principles across multiple stages of machine learning workflows, including feature engineering and model validation. This involves the development of deep neural network (DNN) models and the use of classical machine learning algorithms such as support vector machines (SVM) for integrative analysis of multi-modal brain connectivity data in neurological disorders such as epilepsy and Parkinson's Disease. To address the challenges of data quality and scientific reproducibility, we have led the development of a provenance metadata framework called ProvCaRe using ontology engineering and natural language processing techniques.

Research Interests

Epilepsy seizure networks; Structural connectivity networks derived from MRI; Functional connectivity networks derived from EEG; Provenance metadata; Ontology engineering; Data integration; High performance computing


22nd International Conference on Artificial Intelligence in Medicine (AIME 2024), Salt Lake City, Utah, USA, July 9-12

This premier conference will bring together global experts to explore the latest trends, research, and practical applications of AI in medicine. At AIME 2024, our lab presents research in medical diagnostics, led by Dipak Upadhyaya, demonstrating the efficacy of large language models in healthcare applications.


CWRU Students

We welcome current CWRU undergraduate and graduate students who are interested in working at the intersection of biomedical research and computer science (primarily artificial intelligence research). Please contact Dr. Sahoo at sss124@case.edu.

Projects and Resources

Brain Connectivity in Neurological Disorders

We study the underlying mechanisms that influence the generation and progression of abnormal electrophysiological signals in epilepsy, a serious neurological disorder that affects more than 50 million individuals worldwide with debilitating seizures. Our research uses high-resolution signal data recorded using intracranial EEG with multiple contacts. However, this approach involves querying and analyzing large volumes of multi-modal data. To address this challenge, we incorporate techniques of Big Data analytics, including the development of new data models that are compatible with techniques of large-scale data analysis, such as parallel and distributed computing. We have developed flexible analysis workflows with multiple measures of statistical correlation that can quantitatively assess the strength of the connections among the brain regions active during a seizure event. More information is available on the project page: NIC Workflow Website
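
As a rough illustration of how a correlation measure can quantify connection strength between recording contacts, the sketch below computes absolute Pearson correlations on synthetic signals. It is not the NIC workflow itself: the function names, the contact labels, and the synthetic data are assumptions made for this example, and Pearson correlation stands in for whichever measures the workflow actually offers.

```python
import numpy as np


def correlation_connectivity(signals: np.ndarray) -> np.ndarray:
    """Return a contact-by-contact matrix of absolute Pearson correlations.

    signals has shape (n_contacts, n_samples), one row per contact.
    """
    corr = np.corrcoef(signals)        # pairwise Pearson correlation between rows
    strength = np.abs(corr)            # treat strong negative coupling as strong
    np.fill_diagonal(strength, 0.0)    # ignore self-connections
    return strength


def strongest_pairs(strength: np.ndarray, labels: list, top_k: int = 3) -> list:
    """List the top_k contact pairs ranked by connection strength."""
    n = len(labels)
    pairs = [
        (labels[i], labels[j], float(strength[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top_k]


# Synthetic data (hypothetical): two contacts share a 10 Hz rhythm, one is noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
rhythm = np.sin(2 * np.pi * 10 * t)
signals = np.vstack([
    rhythm + 0.1 * rng.standard_normal(t.size),
    rhythm + 0.1 * rng.standard_normal(t.size),
    rng.standard_normal(t.size),
])
labels = ["C1", "C2", "C3"]  # hypothetical contact names

strength = correlation_connectivity(signals)
top = strongest_pairs(strength, labels, top_k=1)
print(top)
```

In this toy setting the pair sharing the rhythm ranks first; on real recordings the same matrix would typically be computed per time window around a seizure event.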


As an extension of these data-processing workflows, we have also developed MaTiLDA, an integrated web platform for analyzing abnormal electrophysiological signals using topological data analysis (TDA) and machine learning algorithms. MaTiLDA features a graphical user interface that lets users apply TDA and machine learning algorithms to datasets from neurophysiological recordings without requiring substantial domain knowledge in mathematics or computing. More information on MaTiLDA can be found here: MaTiLDA
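
To give a sense of what a topological summary of a brain network looks like, the sketch below computes zero-dimensional persistent homology (the scales at which connected components merge) from a distance matrix. This is an assumption-laden toy, not MaTiLDA's implementation: the function name and the distance values are made up, and a dedicated TDA library would normally be used for this and for higher-dimensional features.

```python
import numpy as np


def h0_persistence(dist: np.ndarray) -> list:
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the scale of the
    first edge that joins it to another component. Returns the n - 1 finite
    death values in ascending order (the last component never dies).
    """
    n = dist.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (float(dist[i, j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:       # this edge merges two components
            parent[root_i] = root_j
            deaths.append(weight)
    return deaths


# Hypothetical distances between four channels, e.g. 1 - |correlation|.
dist = np.array([
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
])
deaths = h0_persistence(dist)
print(deaths)
```

The large gap between the second and third death values reflects two tightly coupled channel pairs that only join at a much coarser scale; features of this kind can then be passed to a machine learning model.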

Provenance Metadata for Scientific Reproducibility

Scientific reproducibility is key to scientific progress: it allows the research community to build on validated results, protects patients from potentially harmful trial drugs derived from incorrect results, and reduces the waste of valuable resources. To support reproducibility in the biomedical research domain, we are developing the Provenance for Clinical and Healthcare Research (ProvCaRe) framework using World Wide Web Consortium (W3C) PROV specifications, including the PROV Ontology (PROV-O). In the ProvCaRe project, we are extending PROV-O to create a formal model of the provenance information that is necessary for scientific reproducibility in biomedical research. The ProvCaRe framework aims to model, extract, and analyze provenance information. It includes the S3 Model, which extends the PROV specifications to model provenance metadata describing the Study Method, Study Tools, and Study Data of a research study. We have developed a provenance-specific text processing pipeline that uses the ProvCaRe ontology to identify and extract provenance metadata from published literature describing biomedical research studies. The ProvCaRe knowledge repository contains provenance "triples" extracted from published research studies, which users can query and explore using "hypothesis-based search". Coming Soon
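
For readers unfamiliar with provenance triples, the sketch below shows how a few subject-predicate-object statements using PROV-O terms (prov:Activity, prov:Entity, prov:used, prov:wasGeneratedBy) could be stored and queried by pattern. The `ex:` identifiers, the `Triple` type, and the `match` helper are hypothetical; this does not reproduce the ProvCaRe ontology, the S3 Model terms, or the repository's search interface.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical provenance statements about one study, using PROV-O terms.
triples = [
    Triple("ex:study1", "rdf:type", "prov:Activity"),
    Triple("ex:dataset1", "rdf:type", "prov:Entity"),
    Triple("ex:tool1", "rdf:type", "prov:Entity"),
    Triple("ex:study1", "prov:used", "ex:dataset1"),
    Triple("ex:study1", "prov:used", "ex:tool1"),
    Triple("ex:result1", "prov:wasGeneratedBy", "ex:study1"),
]


def match(
    store: list,
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# What did the study use, and what did it produce?
used = [t.obj for t in match(triples, subject="ex:study1", predicate="prov:used")]
produced = [
    t.subject
    for t in match(triples, predicate="prov:wasGeneratedBy", obj="ex:study1")
]
print(used, produced)
```

A production system would more likely hold such triples in an RDF store and query them with SPARQL; the in-memory list here only illustrates the data shape.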

Ontology-based clinical decision-support system

Our primary objective is to create a clinical decision support system (CDSS) that mirrors the clinical workflow used in movement disorders practice to diagnose Parkinson’s disease (PD) with the International Parkinson and Movement Disorders Society criteria (MDS-PD). The MDS-PD criteria allow highly sensitive and specific diagnosis of PD, but they are inherently complex to apply manually with pen and paper and are not currently supported in electronic health record systems. This highlights the need for a CDSS that implements these criteria and supports clinicians and researchers alike in clinical care and research. Our modular approach to creating ORMIS-PD consists of three steps: first, building the data entry module to capture the patient data needed for the MDS-PD criteria; second, building the knowledge base module to model the algorithm of the MDS-PD criteria; and finally, building the data analytics module to apply the algorithm to the captured data and classify the patient into one of the three levels of diagnostic classification of the MDS-PD criteria. ORMIS-PD Website



Upadhyaya, D.P., Shaikh, A., Cakir, G.B., Prantzalos, K., Golnari, P., Ghasia, F.F. and Sahoo, S.S., 2024. A 360 Degree View for Large Language Models: Early Detection of Amblyopia in Children using Multi-View Eye Movement Recordings. 22nd International Conference on Artificial Intelligence in Medicine (AIME24), Salt Lake City, UT July 9-12, 2024 (Accepted).

Sivagnanam, S., Yeu, S., Lin, K., Sakai, S., Garzon, F., Yoshimoto, K., ... & Lytton, W. W. (2024). Towards building a trustworthy pipeline integrating Neuroscience Gateway and Open Science Chain. Database, 2024, baae023.

Sahoo, S. S., Plasek, J. M., Xu, H., Uzuner, Ö., Cohen, T., Yetisgen, M., Liu, H., Meystre, S., & Wang, Y. (2024). Large language models for biomedicine: foundations, opportunities, challenges, and best practices. Journal of the American Medical Informatics Association : JAMIA, ocae074.


Prantzalos, K., Upadhyaya, D.P., Shafiabadi, N., Gurski, N., Fernandez-BacaVaca, G., Yoshimoto, K., Sivagnanam, S., Majumdar, A. and Sahoo, S.S., 2023. MaTiLDA: An Integrated Machine Learning and Topological Data Analysis Platform for Brain Network Dynamics. Proceedings of the 29th Pacific Symposium on Biocomputing (PSB), 29:65-80(2024).

Wang, L., Ambite, J.L., Appaji, A., Bijsterbosch, J., Dockes, J., Herrick, R., Kogan, A., Lander, H., Marcus, D., Moore, S.M. and Poline, J.B., 2023. NeuroBridge: a prototype platform for discovery of the long-tail neuroimaging data. Frontiers in Neuroinformatics, 17.

Upadhyaya, D.P., Tarabichi, Y., Prantzalos, K., Ayub, S., Kaelber, D.C. and Sahoo, S.S., 2023. Machine Learning Interpretability Methods to Characterize the Importance of Hematologic Biomarkers in Prognosticating Patients with Suspected Infection. medRxiv, pp.2023-05.

Sahoo, S.S., Turner, M.D., Wang, L., Ambite, J.L., Appaji, A., Rajasekar, A., Lander, H.M., Wang, Y. and Turner, J.A., 2023. NeuroBridge ontology: computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use. Frontiers in Neuroinformatics, 17.

Upadhyaya, D.P., Prantzalos, K., Thyagaraj, S., Shafiabadi, N., Fernandez-BacaVaca, G., Sivagnanam, S., Majumdar, A. and Sahoo, S.S., 2023. Machine Learning Interpretability Methods to Characterize Brain Network Dynamics in Epilepsy. medRxiv, pp.2023-06.


Turner, J.A., Turner, M.D., Appaji, A., Rajasekar, A.K., Wang, L. & Sahoo, S.S. (2022). NeuroBridge ontology development for shared neuroimaging datasets. International Neuroinformatics Coordinating Facility (INCF) Assembly, 2022.

Lander, H., Rajasekar, A., Wang, Y., Watson, M., Sahoo, S., Turner, J., Poline, J-B. & Wang, L. (2022). Linking NeuroBridge and NeuroQuery with deep semantic matching. Neuroinformatics Assembly.

Sahoo, S.S., Kobow, K., Zhang, J., Buchhalter, J., Dayyani, M., Upadhyaya, D.P., Prantzalos, K., Bhattacharjee, M., Blumcke, I., Wiebe, S. and Lhatoo, S.D., 2022. Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records. Scientific Reports, 12(1), p.19430.

Spilsbury, J.C., Hernandez, E., Kiley, K., Gillerlane Hinkes, E., Prasanna, S., Shafiabadi, N., Rao, P. and Sahoo, S.S., 2022. Social Service Workers’ Use of Social Media to Obtain Client Information: Current Practices and Perspectives on a Potential Informatics Platform. Journal of Social Service Research, 48(6), pp.739-752.

Wang, X., Wang, Y., Ambite, J.L., Appaji, A., Lander, H., Moore, S.M., Rajasekar, A.K., Turner, J.A., Turner, M.D., Wang, L. and Sahoo, S.S., 2022. Enabling scientific reproducibility through FAIR data management: An ontology-driven deep learning approach in the NeuroBridge Project. In AMIA Annual Symposium Proceedings (Vol. 2022, p. 1135). American Medical Informatics Association.


Prantzalos, K., Zhang, J., Shafiabadi, N., Fernandez-BacaVaca, G. and Sahoo, S.S., 2021. Epilepsy-Connect: An Integrated Knowledgebase for Characterizing Alterations in Consciousness State of Pharmacoresistant Epilepsy Patients. In AMIA Annual Symposium Proceedings (Vol. 2021, p. 1019). American Medical Informatics Association.

Gupta, D.K., Marano, M., Aurora, R., Boyd, J. and Sahoo, S.S., 2020. Movement disorders ontology for clinically oriented and clinicians-driven data mining of multi-center cohorts in Parkinson’s disease. medRxiv, pp.2020-11. (Poster)

Zhang, J., Bauman, R., Shafiabadi, N., Gurski, N., Fernandez-BacaVaca, G. and Sahoo, S.S., 2021. Characterizing Brain Network Dynamics using Persistent Homology in Patients with Refractory Epilepsy. In AMIA Annual Symposium Proceedings (Vol. 2021, p. 1244). American Medical Informatics Association.


Sahoo, S.S., Gershon, A., Nassim, S., Kaushik, G., Curtis, T., Lhatoo, S.D. and Fernandez-BacaVaca, G., 2020. NeuroIntegrative Connectivity (NIC) informatics tool for brain functional connectivity network analysis in cohort studies. In AMIA Annual Symposium Proceedings (Vol. 2020, p. 1090). American Medical Informatics Association.

Carr, S.J., Gershon, A., Shafiabadi, N., Lhatoo, S.D., Tatsuoka, C. and Sahoo, S.S., 2021. An integrative approach to study structural and functional network connectivity in epilepsy using imaging and signal data. Frontiers in integrative neuroscience, 14, p.491403

Liu, C., Kim, M., Rueschman, M. and Sahoo, S.S., 2020. ProvCaRe: A Large-Scale Semantic Provenance Resource for Scientific Reproducibility. In Provenance in Data Science: From Data Models to Context-Aware Knowledge Graphs (pp. 59-73). Cham: Springer International Publishing.

Lhatoo, S.D., Bernasconi, N., Blumcke, I., Braun, K., Buchhalter, J., Denaxas, S., Galanopoulou, A., Josephson, C., Kobow, K., Lowenstein, D. and Ryvlin, P., 2020. Big data in epilepsy: clinical and research considerations. Report from the Epilepsy Big Data Task Force of the International League Against Epilepsy. Epilepsia, 61(9), pp.1869-1883.


Hong X, Liu C, Momotaz H, Cassidy K, Sajatovic M, Sahoo SS. Enhancing Multi-Center Patient Cohort Studies in the Managing Epilepsy Well (MEW) Network: Integrated Data Integration and Statistical Analysis. In AMIA Annual Symposium Proceedings Vol. 2019, pp

Sahoo, S.S., Valdez, J., Kim, M., Rueschman, M. and Redline, S., 2019. ProvCaRe: characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata. International journal of medical informatics, 121, pp.10-18.

Gershon, A., Devulapalli, P., Zonjy, B., Ghosh, K., Tatsuoka, C. and Sahoo, S.S., 2019. Computing functional brain connectivity in neurological disorders: efficient processing and retrieval of electrophysiological signal data. AMIA Summits on Translational Science Proceedings, 2019, p.107.

Yang, S., Ghosh, K., Sakaie, K., Sahoo, S.S., Carr, S.J.A. and Tatsuoka, C., 2019. A simplified crossing fiber model in diffusion weighted imaging. Frontiers in Neuroscience, 13, p.492.

Socrates, V., Gershon, A.L. and Sahoo, S.S., 2019, August. Computation of Brain Functional Connectivity Network Measures in Epilepsy: A Web-Based Platform for EEG Signal Data Processing and Analysis. In MedInfo (pp. 1590-1591).

Sahoo, S.S., Valdez, J., Rueschman, M. and Kim, M., 2019. Semantic Provenance Graph for Reproducibility of Biomedical Research Studies: Generating and Analyzing Graph Structures from Published Literature. Studies in health technology and informatics, 264, p.328.


Valdez, J., Kim, M., Rueschman, M., Redline, S. and Sahoo, S.S., 2018. Classification of provenance triples for scientific reproducibility: A comparative evaluation of deep learning models in the ProvCaRe project. In Provenance and Annotation of Data and Processes: 7th International Provenance and Annotation Workshop, IPAW 2018, London, UK, July 9-10, 2018, Proceedings (pp. 30-41). Springer International Publishing.

Gershon, A., Lhatoo, S.D., Tatsuoka, C., Ghosh, K., Loparo, K. and Sahoo, S.S., 2018. Scalable Signal Data Processing for Measuring Functional Connectivity in Epilepsy Neurological Disorder. Signal Processing and Machine Learning for Biomedical Big Data (Book).


Sajatovic, M., Tatsuoka, C., Welter, E., Friedman, D., Spruill, T.M., Stoll, S., Sahoo, S.S., Bukach, A., Bamps, Y.A., Valdez, J. and Jobst, B.C., 2017. Correlates of quality of life among individuals with epilepsy enrolled in self-management research: from the US Centers for Disease Control and Prevention Managing Epilepsy Well Network. Epilepsy & Behavior, 69, pp.177-180.

Valdez, J., Kim, M., Rueschman, M., Socrates, V., Redline, S. and Sahoo, S.S., 2017. ProvCaRe semantic provenance knowledgebase: evaluating scientific reproducibility of research studies. In AMIA Annual Symposium Proceedings (Vol. 2017, p. 1705). American Medical Informatics Association.

Valdez, J., Rueschman, M., Kim, M., Arabyarmohammadi, S., Redline, S. and Sahoo, S.S., 2017. An extensible ontology modeling approach using post coordinated expressions for semantic provenance in biomedical research. In On the Move to Meaningful Internet Systems. OTM 2017 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II (pp. 337-352). Springer International Publishing.

Gershon, A.L., Zonjy, B., Tatsuoka, C., Ghosh, K. and Sahoo, S.S., 2017. A Flexible Computational Neuroinformatics Workflow for Computing Functional Networks in Epilepsy Neurological Disorder. In AMIA.


Sahoo, S.S., Valdez, J. and Rueschman, M., 2016. Scientific reproducibility in biomedical research: provenance metadata ontology for semantic annotation of study description. In AMIA Annual Symposium Proceedings (Vol. 2016, p. 1070). American Medical Informatics Association.

Dean, D.A., Goldberger, A.L., Mueller, R., Kim, M., Rueschman, M., Mobley, D., Sahoo, S.S., Jayapandian, C.P., Cui, L., Morrical, M.G. and Surovec, S., 2016. Scaling up scientific discovery in sleep medicine: the national sleep research resource. Sleep, 39(5), pp.1151-1164.

Sahoo, S.S., Wei, A., Tatsuoka, C., Ghosh, K. and Lhatoo, S.D., 2016. Processing neurology clinical data for knowledge discovery: scalable data flows using distributed computing. Machine Learning for Health Informatics: State-of-the-Art and Future Challenges, pp.303-318.

Sahoo, S.S., Wei, A., Valdez, J., Wang, L., Zonjy, B., Tatsuoka, C., Loparo, K.A. and Lhatoo, S.D., 2016. NeuroPigPen: a scalable toolkit for processing electrophysiological signal data in neuroscience applications using apache pig. Frontiers in neuroinformatics, 10, p.18.

Sahoo, S.S., Ramesh, P., Welter, E., Bukach, A., Valdez, J., Tatsuoka, C., Bamps, Y., Stoll, S., Jobst, B.C. and Sajatovic, M., 2016. Insight: An ontology-based integrated database and analysis platform for epilepsy self-management research. International journal of medical informatics, 94, pp.21-30.

Yang, S., Tatsuoka, C., Ghosh, K., Lacuey-Lecumberri, N., Lhatoo, S.D. and Sahoo, S.S., 2016. Comparative evaluation for brain structural connectivity approaches: towards integrative neuroinformatics tool for epilepsy clinical research. AMIA Summits on Translational Science Proceedings, 2016, p.446.

Valdez, J., Rueschman, M., Kim, M., Redline, S. and Sahoo, S.S., 2016. An ontology-enabled natural language processing pipeline for provenance metadata extraction from biomedical text (short paper). In On the Move to Meaningful Internet Systems: OTM 2016 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24-28, 2016, Proceedings (pp. 699-708). Springer International Publishing.

Sahoo, S.S., Zhang, G.Q., Bamps, Y., Fraser, R., Stoll, S., Lhatoo, S.D., Tatsuoka, C., Sams, J., Welter, E. and Sajatovic, M., 2016. Managing information well: Toward an ontology-driven informatics platform for data sharing and secondary use in epilepsy self-management research centers. Health informatics journal, 22(3), pp.548-561.


LaFrance Jr, W.C., Ranieri, R., Bamps, Y., Stoll, S., Sahoo, S.S., Welter, E., Sams, J., Tatsuoka, C. and Sajatovic, M., 2015. Comparison of common data elements from the Managing Epilepsy Well (MEW) Network integrated database and a well-characterized sample with nonepileptic seizures. Epilepsy & Behavior, 45, pp.136-141.

Ramesh, P., Wei, A., Welter, E., Bamps, Y., Stoll, S., Bukach, A., Sajatovic, M. and Sahoo, S.S., 2015, November. Insight: Semantic provenance and analysis platform for multi-center neurology healthcare research. In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 731-736). IEEE.

Sahoo SS, Rueschman M, Valdez J, Hsu W, Lhatoo SD, Redline S. Provenance Analysis over Biomedical Big Data Using PROV: Towards Effective Secondary Data Analysis Across Multiple Studies. NIH Big Data to Knowledge (BD2K) Meeting, Bethesda MD. Nov 12-13, 201 (Poster)

Sahoo SS, Rao P. Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and Trust. In the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, 2015. (Tutorial, to appear)

Jayapandian, C., Wei, A., Ramesh, P., Zonjy, B., Lhatoo, S.D., Loparo, K., Zhang, G.Q. and Sahoo, S.S., 2015. A scalable neuroinformatics data flow for electrophysiological signals using MapReduce. Frontiers in neuroinformatics, 9, p.4.


Cui, L., Sahoo, S.S., Lhatoo, S.D., Garg, G., Rai, P., Bozorgi, A. and Zhang, G.Q., 2014. Complex epilepsy phenotype extraction from narrative clinical discharge summaries. Journal of biomedical informatics, 51, pp.272-279.

Zhang, G.Q., Cui, L., Lhatoo, S., Schuele, S.U. and Sahoo, S.S., 2014. MEDCIS: multi-modality epilepsy data capture and integration system. In AMIA Annual Symposium Proceedings (Vol. 2014, p. 1248). American Medical Informatics Association.

Jayapandian, C., Chen, C.H., Dabir, A., Lhatoo, S., Zhang, G.Q. and Sahoo, S.S., 2014. Domain ontology as conceptual model for big data management: application in biomedical informatics. In Conceptual Modeling: 33rd International Conference, ER 2014, Atlanta, GA, USA, October 27-29, 2014. Proceedings 33 (pp. 144-157). Springer International Publishing.

Sahoo SS, McIntyre C, Lhatoo SD. A Match Made in Cloud? Meeting the Requirements of the Next Generation Neuroscience Research Using Configurable Cloud Infrastructure. National Science Foundation (NSF) Cloud Workshop, Dec 11-12, 2014

Sahoo, S.S., Tao, S., Parchman, A., Luo, Z., Cui, L., Mergler, P., Lanese, R., Barnholtz-Sloan, J.S., Meropol, N.J. and Zhang, G.Q., 2014. Trial prospector: matching patients with cancer research studies using an automated and scalable approach. Cancer informatics, 13, pp.CIN-S19454.


Sahoo, S.S., Zhang, G.Q. and Lhatoo, S.D., 2013. Epilepsy informatics and an ontology‐driven infrastructure for large database research and patient care in epilepsy. Epilepsia, 54(8), pp.1335-1341.

Parchman AJ, Zhang GQ, Mergler P, Barnholtz-Sloan J, Lanese R, Miller DW, Opper C, Sahoo SS, Tao S, Teagno J, Warfe J, Meropol NJ. Trial prospector: An automated clinical trials eligibility matching program. Proceedings of the American Society of Clinical

Sahoo, S.S., Lhatoo, S.D., Gupta, D.K., Cui, L., Zhao, M., Jayapandian, C., Bozorgi, A. and Zhang, G.Q., 2014. Epilepsy and seizure ontology: towards an epilepsy informatics infrastructure for clinical research and patient care. Journal of the American Medical Informatics Association, 21(1), pp.82-89.

Sahoo, S.S., Jayapandian, C., Garg, G., Kaffashi, F., Chung, S., Bozorgi, A., Chen, C.H., Loparo, K., Lhatoo, S.D. and Zhang, G.Q., 2014. Heart beats in the cloud: distributed analysis of electrophysiological ‘Big Data’ using cloud computing for epilepsy clinical research. Journal of the American Medical Informatics Association, 21(2), pp.263-271.

Jayapandian, C.P., Chen, C.H., Bozorgi, A., Lhatoo, S.D., Zhang, G.Q. and Sahoo, S.S., 2013. Cloudwave: distributed processing of “Big Data” from electrophysiological recordings for epilepsy clinical research using Hadoop. In AMIA Annual Symposium Proceedings (Vol. 2013, p. 691). American Medical Informatics Association.

Bozorgi, A., Chung, S., Kaffashi, F., Loparo, K.A., Sahoo, S., Zhang, G.Q., Kaiboriboon, K. and Lhatoo, S.D., 2013. Significant postictal hypotension: Expanding the spectrum of seizure‐induced autonomic dysregulation. Epilepsia, 54(9), pp.e127-e130.

Cui, L., Mueller, R., Sahoo, S. and Zhang, G.Q., 2013, September. Querying complex federated clinical data using ontological mapping and subsumption reasoning. In 2013 IEEE International Conference on Healthcare Informatics (pp. 351-360). IEEE.

Jayapandian, C.P., Chen, C.H., Bozorgi, A., Lhatoo, S.D., Zhang, G.Q. and Sahoo, S.S., 2013. Electrophysiological signal analysis and visualization using cloudwave for epilepsy clinical research. Studies in health technology and informatics, 192, p.817.

Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S. and Zhao, J., 2013. Prov-o: The prov ontology. W3C recommendation, 30.

Asiaee, A.H., Doshi, P., Minning, T., Sahoo, S., Parikh, P., Sheth, A. and Tarleton, R.L., 2013. From questions to effective answers: On the utility of knowledge-driven querying systems for life sciences data. In Data Integration in the Life Sciences: 9th International Conference, DILS 2013, Montreal, QC, Canada, July 11-12, 2013. Proceedings 9 (pp. 38-45). Springer Berlin Heidelberg.


Jayapandian, C.P., Zhao, M., Ewing, R.M., Zhang, G.Q. and Sahoo, S.S., 2012. A semantic proteomics dashboard (SemPoD) for data management in translational research. BMC systems biology, 6, pp.1-13.

Zhang, G.Q., Sahoo, S.S. and Lhatoo, S.D., 2012. From classification to epilepsy ontology and informatics. Epilepsia, 53, pp.28-32.

Teagno, J., Kiefer, R.C., Pathak, J., Zhang, G.Q. and Sahoo, S.S., 2012. A Distributed Semantic Web Approach for Cohort Identification. In AMIA.

Zhang, G.Q., Luo, L., Ogbuji, C., Joslyn, C., Mejino, J. and Sahoo, S.S., 2012. An analysis of multi-type relational interactions in FMA using graph motifs with disjointness constraints. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 1060). American Medical Informatics Association.

Parikh, P.P., Zheng, J., Logan-Klumpler, F., Stoeckert, C.J., Louis, C., Topalis, P., Protasio, A.V., Sheth, A.P., Carrington, M., Berriman, M. and Sahoo, S.S., 2012. The Ontology for Parasite Lifecycle (OPL): towards a consistent vocabulary of lifecycle stages in parasitic organisms. Journal of biomedical semantics, 3(1), pp.1-13.

Cui, L., Bozorgi, A., Lhatoo, S.D., Zhang, G.Q. and Sahoo, S.S., 2012. EpiDEA: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 1191). American Medical Informatics Association.

Parikh, P.P., Minning, T.A., Nguyen, V., Lalithsena, S., Asiaee, A.H., Sahoo, S.S., Doshi, P., Tarleton, R. and Sheth, A.P., 2012. A semantic problem solving environment for integrative parasite research: Identification of intervention targets for Trypanosoma cruzi. PLoS neglected tropical diseases, 6(1), p.e1458.

Sahoo, S.S., Zhao, M., Luo, L., Bozorgi, A., Gupta, D., Lhatoo, S.D. and Zhang, G.Q., 2012. OPIC: ontology-driven patient information capturing system for epilepsy. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 799). American Medical Informatics Association.


Sahoo, S.S., Nguyen, V., Bodenreider, O., Parikh, P., Minning, T. and Sheth, A.P., 2011. A unified framework for managing provenance information in translational research. BMC bioinformatics, 12(1), pp.1-18.

Zhao, J., Sahoo, S.S., Missier, P., Sheth, A. and Goble, C., 2010. Extending semantic provenance into the web of data. IEEE Internet Computing, 15(1), pp.40-48.

Sahoo, S.S., 2011. Towards Desiderata for Provenance Ontologies in Biomedicine. In ICBO.

Mueller R, Sahoo SS, Dong X, Redline S, Arabandi S, Luo L, Zhang GQ. Mapping multi-institution data sources to domain ontology for data federation: the PhysioMIMI approach. AMIA Clinical Research Informatics Summit (CRI), 2011.

Zhang GQ, Mueller R, Jonhson N, Arabandi S, Sahoo SS, Redline S. Online Exploration of Case-control Study Designs in VISAGE. AMIA Clinical Research Informatics Summit (CRI), 2011.

Sahoo, S.S., Ogbuji, C., Luo, L., Dong, X., Cui, L., Redline, S.S. and Zhang, G.Q., 2011. Midas: automatic extraction of a common domain of discourse in sleep medicine for multi-center data integration. In AMIA Annual Symposium Proceedings (Vol. 2011, p. 1196). American Medical Informatics Association.


Sahoo SS, Groth P, Hartig O, Miles S, Coppens S, Myers J, Gil Y, Moreau L, Zhao J, Panzer M, Garijo D. Provenance Vocabulary Mappings. W3C Provenance Incubator Group Report, 2010.

Barga R, Simmhan Y, Chinthaka-Withana E, Sahoo SS, Jackson J, Araujo N. Provenance for Scientific Workflows Towards Reproducible Research. IEEE Data Engineering Bulletin, 2010. Vol. 33(3). pp. 50-58.

Sahoo SS, Bodenreider O, Hitzler P, Sheth AP, Thirunarayan K. Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data. The 22nd International Conference on Scientific and Statistical Database Management (SSDBM), 2010. pp. 46

Missier P, Sahoo SS, Zhao J, Goble C, Sheth A. Janus: from workflows to semantic provenance and linked open data. The 3rd International Provenance and Annotation Workshop (IPAW), Lecture Notes in Computer Science, Vol. 6378/2010, 2010. pp. 129-141.

Deus H, Zhao J, Sahoo SS, Samwald M, Prud’hommeaux E, Miller M, Marshall MS, Cheung K. Provenance of Microarray Experiments for a Better Understanding of Experiment Results. The 2nd International Workshop on Role of Semantic Web in Provenance Management (

Patni H, Sahoo SS, Henson C, Sheth A. Provenance Aware Linked Sensor Data. The 2nd International Workshop on Trust and Privacy on the Social and Semantic Web, co-located with ESWC, 2010.


Sahoo SS, Weatherly DB, Mutharaju R, Anantharam P, Sheth AP, Tarleton RL. Ontology-driven Provenance Management in eScience: an Application in Parasite Research. The 8th International Conference on Ontologies, DataBases, and Applications of Semantics, (OD

Sahoo SS, Sheth A. Provenir ontology: Towards a Framework for eScience Provenance Management. Microsoft eScience Workshop, 2009. http://cci.case.edu/cci/images/7/7a/Framework_for_eScience_Provenance_Management_CR.pdf

Sahoo SS, Halb W, Hellmann S, Idehen K, Thibodeau Jr. T, Auer S, Sequeda J, Ezzat A. A Survey of Current Approaches for Mapping of Relational Databases to RDF. W3C RDB2RDF Incubator Group Report, 2009. http://cci.case.edu/cci/images/0/04/RDB2RDF_SurveyRep


Valerio MD, Sahoo SS, Barga RS, Jackson JJ. Capturing Workflow Event Data for Monitoring, Performance Analysis, and Management of Scientific Workflows. SWBES08, co-located with the 4th IEEE International Conference on eScience, 2008. pp. 626-33. http://cc

Sahoo SS, Bodenreider O, Rutter JL, Skinner KJ, Sheth AP. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence. Journal of Biomedical Informatics (Special Issue: Semantic Mashup o

Sahoo SS, Sheth AP, Henson C. Semantic Provenance for eScience: ‘Meaningful’ Metadata to Manage the Deluge of Scientific Data. IEEE Internet Computing, Web-Scale Workflow Track, M.B. Blake and M. Huhns (Eds.), 2008. Vol. 12(4). pp.46-54. (Featured in Asso

Sheth A, Henson C, Sahoo SS. Semantic Sensor Web. IEEE Internet Computing, 2008. Vol. 12(4). pp. 78-83. http://cci.case.edu/cci/images/2/2a/SHS08-IC-Column-SSW.pdf


Sahoo SS, Sheth A, Hunter B, York WS. SemBOWSER–Adding Semantics to biological Web services registry. Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences. Baker CJO, Cheung KO (Eds.), Springer, 2007. pp. 317–40. http://cci.case.edu/cci/

Sahoo SS, Zeng K, Bodenreider O, Sheth AP. From ‘glycosyltransferase’ to ‘congenital muscular dystrophy’: Integrating knowledge from NCBI Entrez Gene and the Gene Ontology. The 12th World Congress on Health (Medical) Informatics (Medinfo), 2007. pp. 1260–

Sahoo SS, Bodenreider O, Zeng K, Sheth AP. An experiment in integrating large biomedical knowledge resources with RDF: Application to associating genotype and phenotype information. International Workshop on Health Care and Life Sciences Data Integration


Sahoo SS, Thomas C, Sheth AP, York WS, Tartir S. Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies. The 15th International World Wide Web (WWW) Conference, 2006. pp. 317-26 http://cci.case.edu/cci/images/f/f2/P1088-sahoo.pd

Sahoo SS, Sheth A. Bioinformatics applications of Web Services, Web Processes and role of Semantics. Semantic Web Processes and Their Applications. Cardoso J, Sheth A (Eds.), Springer, 2006. pp. 305–22.


Sahoo SS, Thomas C, Sheth AP, Henson C, York WS. GLYDE-An expressive XML standard for the representation of glycan structure . Carbohydrate Research, 2005. Vol. 340(18). pp.2802-7. PMID: 16242678 https://www.ncbi.nlm.nih.gov/pubmed/16242678

Atwood III J,Sahoo SS, Alvarez-Manilla G, Weatherly DB, Kolli K, Orlando R, York WS. Rapid Communications Mass Spectrometry, 2005. Vol. 19(21). pp.3002-6. PMID: 16196021 https://www.ncbi.nlm.nih.gov/pubmed/16196021">Simple modification of a protein databa Simple modification of a protein database for mass spectral identification of N-linked glycopeptides

Alvarez-Manilla G, Atwood III J, Sahoo SS, Guo Y, Warren NL, York WS, Orlando R, Pierce M. Tools for glycoproteomic analysis: size-exclusion chromatography facilitates identification of tryptic glycopeptides with N-linked glycosylation site. Glycobiology

Sahoo SS, Sheth AP, York WS, Miller JA. Semantic Web Services for N-glycosylation Process. International Symposium on Web Services for Computational Biology and Bioinformatics, 2005. http://cci.case.edu/cci/images/c/c6/UGA-VBI-Symposium-Abstract-Submissio

Aleman-Meza A, Halaschek-Wiener C, Sahoo SS, Sheth A, Arpinar B. Template Based Semantic Similarity for Security Applications. The IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), 2005. pp. 621-622. http://cci.case.ed


Sheth AP, York WS, Thomas C, Nagarajan M, Miller JA, Kochut K, Sahoo SS, Yi X. Semantic Web technology in support of Bioinformatics for Glycan Expression. 2004.


Biomedical & Health Informatics Doctoral Program

The Biomedical & Health Informatics (BHI) doctoral program trains researchers in biomedicine, population health, and clinical care. Program trainees acquire a core set of skills spanning computing, biostatistics, and biomedical research through a combination of coursework and research participation in the Population and Quantitative Health Sciences (PQHS) department. The program is designed for students to acquire skills in three areas of concentration: Data Analytics, with a focus on statistics and data wrangling; Biomedical Health, with a focus on systems biology and clinical and health issues; and Computational and System Design, with a focus on knowledge representation, information retrieval, and Big Data.

PQHS 416: Introduction to Computing in Biomedical Health Informatics

PQHS 416 introduces students to computational techniques and concepts that underpin biomedical and health informatics data management and analysis. The course focuses on three topics: (1) biomedical terminologies and formal logic used in building knowledge models such as ontologies; (2) natural language processing (NLP); and (3) Big Data technologies, including components of the Hadoop stack and Apache Spark. This is a lecture-based course that relies on both material covered in class and out-of-class readings of published literature. Students are assigned readings and homework exercises, and they are expected to complete a homework assignment for each class. Students also take part in a team project and prepare a project report at the end of the semester.

Our Team

Satya Sahoo, PhD

Associate Professor of Medical Informatics

Katrina Prantzalos, MS

PhD Candidate

Pedram Golnari, MD

PhD Student

Nasim Shafiabadi, MD

Research Fellow

Dipak Upadhyaya, MPH

PhD Candidate

Pranav Nampoothiripad

Undergraduate Researcher

Keerthi Sevugan

Undergraduate Researcher

Leonora Lipson

Undergraduate Researcher

Our Alumni

1. Jianzhe Zhang

MS (First employer: ByteDance)

2. Arthur Gershon

PhD (Status: Post-Doctoral Scholar)

3. Catherine Jayapandian

PhD (Status: Post-Doctoral Scholar)

4. Priya Ramesh

MS (First employer: CoverMyMeds)

5. Xinting Hong


6. Pramith Devulapalli

BS (Status: PhD at Purdue University)

7. Vimig Socrates

BS, MS (Status: PhD at Yale University)

8. Meng Zhao

MS (First employer: IBM Explorys)

9. Li Wang


10. Chien-Hung Chen


11. Chang Liu

MS (First employer: Microsoft Corporation)

12. Annan Wei

MS (First employer: Google Inc)

Funding Agencies

National Institute of Biomedical Imaging and Bioengineering

National Institute on Drug Abuse

Department of Defense, Congressionally Directed Medical Research Programs

Dravet Syndrome Foundation

U.S. Department of Veterans Affairs

© 2023 Case Western Reserve University
10900 Euclid Ave. Cleveland, Ohio 44106 216.368.2000
Department of Population and Quantitative Health Sciences
Phone Number: 216-368-3286
Mailing Address: 2103 Cornell Road, Iris S. & Bert L. Wolstein Research Building, Cleveland, OH 44106-7291