Our Research

Our research focuses on developing artificial intelligence (AI) methods to analyze heterogeneous biomedical big data for translational applications. This ongoing work brings together two branches of AI, knowledge representation and reasoning and machine learning, to characterize brain network dynamics and electronic health records (EHR) data.

Knowledge representation and reasoning involves the development of knowledge models, or ontologies. We have led the development of new methods that apply ontology engineering principles across multiple stages of machine learning workflows, including feature engineering and model validation. This work includes the development of deep neural network (DNN) models and the use of classical machine learning algorithms such as support vector machines (SVM) for integrative analysis of multi-modal brain connectivity data in neurological disorders such as epilepsy and Parkinson's disease. To address the challenges of data quality and scientific reproducibility, we have led the development of a provenance metadata framework called ProvCaRe using ontology engineering and natural language processing techniques.

Research Interests

Epilepsy seizure networks; Structural connectivity networks derived from MRI; Functional connectivity networks derived from EEG; Provenance metadata; Ontology engineering; Data integration; High performance computing


2023 Big Data Neuroscience Workshop, Sep 14 and 15, Ohio State University, Columbus, Ohio

NLP Working Group Pre-Symposium, November 11, 2023, New Orleans, LA

The 15th International Epilepsy Colloquium: Ictal Semiology and its Value in Epilepsy Surgery, 2023, Case Western Reserve University, Cleveland, Ohio


No Openings at this time.

We do not currently have any job openings, but as soon as we do, we will post the open position here on our lab website.

Projects and Resources

Brain Connectivity in Neurological Disorders

We study the underlying mechanisms that influence the generation and progression of abnormal electrophysiological signals in epilepsy, a serious neurological disorder that affects more than 50 million individuals worldwide with debilitating seizures. Our research uses high-resolution signal data recorded using intracranial EEG with multiple contacts. However, this approach involves querying and analyzing large volumes of multi-modal data. To address this challenge, we incorporate techniques of Big Data analytics, including the development of new data models that are compatible with techniques of large-scale data analysis, such as parallel and distributed computing. We have developed flexible analysis workflows with multiple measures of statistical correlation that can quantitatively assess the strength of the connections among the brain regions active during a seizure event. More information is available on the project page: NIC Workflow Website


As an extension of these data-processing workflows, we have also developed MaTiLDA, an integrated web platform for analyzing abnormal electrophysiological signals using topological data analysis (TDA) and machine learning algorithms. MaTiLDA features a graphical user interface that enables users to apply TDA and machine learning algorithms to datasets from neurophysiological recordings without requiring substantial domain knowledge in mathematics or computing. More information is available on the MaTiLDA page: MaTiLDA
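To make the idea of a correlation-based connectivity measure concrete, the following is a minimal sketch, not the lab's actual workflow. It assumes that each intracranial EEG contact is available as an equal-length list of samples and uses the Pearson correlation coefficient as one possible measure of statistical correlation; the contact names and sample values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("signals must have the same length (>= 2 samples)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def connectivity_edges(channels):
    """Return {(name_a, name_b): r} for every unordered pair of channels."""
    names = sorted(channels)
    return {
        (a, b): pearson(channels[a], channels[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy recording: three contacts, five samples each.
toy_channels = {
    "contact_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "contact_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # scaled copy of contact_A
    "contact_C": [5.0, 4.0, 3.0, 2.0, 1.0],   # time-reversed contact_A
}

edges = connectivity_edges(toy_channels)
for pair, r in edges.items():
    print(pair, round(r, 3))
```

Each value can be read as the weight of an edge between two contacts in a functional connectivity network. A real analysis would typically compute such measures over sliding time windows around a seizure event and may use other measures in place of Pearson correlation.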

Provenance Metadata for Scientific Reproducibility

Scientific reproducibility is key to scientific progress because it allows the research community to build on validated results, protect patients from potentially harmful trial drugs derived from incorrect results, and reduce the waste of valuable resources. To support reproducibility in the biomedical research domain, we are developing the Provenance for Clinical and Healthcare Research (ProvCaRe) framework using World Wide Web Consortium (W3C) PROV specifications, including the PROV Ontology (PROV-O). In the ProvCaRe project, we are extending PROV-O to create a formal model of the provenance information that is necessary for scientific reproducibility in biomedical research. The ProvCaRe framework aims to model, extract, and analyze provenance information. It includes the S3 Model, which extends the PROV specifications to model provenance metadata describing the Study Method, Study Tools, and Study Data of a research study. We have developed a provenance-specific text processing pipeline that uses the ProvCaRe ontology to identify and extract provenance metadata from published literature describing biomedical research studies. The ProvCaRe knowledge repository contains provenance "triples" extracted from published research studies, which users can query and explore using "hypothesis-based search". ProvCaRe Website
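As a rough illustration of what provenance "triples" organized around the S3 Model might look like, here is a minimal sketch in plain Python. The `prov:` terms (`prov:Activity`, `prov:used`) come from W3C PROV-O; every `provcare:` and `ex:` identifier, including `provcare:hasMethod`, is a hypothetical placeholder rather than a term from the actual ProvCaRe ontology. The pattern matcher is a generic triple lookup, not the repository's hypothesis-based search.

```python
# Each triple is (subject, predicate, object).
# prov: terms are from W3C PROV-O; provcare: and ex: names are hypothetical.
TRIPLES = [
    ("ex:study1", "rdf:type", "prov:Activity"),
    ("ex:study1", "prov:used", "ex:sleepDataset"),
    ("ex:sleepDataset", "rdf:type", "provcare:StudyData"),
    ("ex:study1", "prov:used", "ex:actigraph"),
    ("ex:actigraph", "rdf:type", "provcare:StudyTool"),
    ("ex:study1", "provcare:hasMethod", "ex:cohortDesign"),
    ("ex:cohortDesign", "rdf:type", "provcare:StudyMethod"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def instances_of(triples, s3_class):
    """Return the sorted subjects typed with the given (hypothetical) S3 class."""
    return sorted(s for s, _, _ in match(triples, predicate="rdf:type", obj=s3_class))


# Everything the study used, then the resources typed as study data.
used = match(TRIPLES, subject="ex:study1", predicate="prov:used")
study_data = instances_of(TRIPLES, "provcare:StudyData")
print(used)
print(study_data)
```

In practice, triples like these would usually be stored as RDF and queried with SPARQL; the tuple list here only shows the shape of the data.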



Upadhyaya DP, Prantzalos K, Thyagaraj S, Shafiabadi N, Fernandez-BacaVaca G, Sivagnanam S, Majumdar A, Sahoo SS. Machine Learning Interpretability Methods to Characterize Brain Network Dynamics in Epilepsy. medRxiv 2023.06.25.23291874; doi: https://doi.org/10.1101/2023.06.25.23291874, 2023.

Upadhyaya DP, Tarabichi Y, Prantzalos K, Ayub S, Kaelber DC, Sahoo SS. Characterizing the Importance of Hematologic Biomarkers in Screening for Severe Sepsis using Machine Learning Interpretability Methods. medRxiv 2023.05.30.23290757; doi: https://doi.org/10.1101/2023.05.30.23290757, 2023.

Wang L, Ambite JL, Appaji AM, Bijsterbosch J, Dockès J, Herrick R, Kogan A, Lander HM, Lenzini P, Marcus D, Moore SM, Poline J-B, Rajasekar A, Sahoo SS, Turner MD, Wang X, Wang Y, Turner JA. NeuroBridge: A Prototype Platform for Discovery of The Long-Tail Neuroimaging Data. Frontiers in Neuroinformatics (accepted), 2023.

Prantzalos K, Upadhyaya DP, Shafiabadi N, Gurski N, Fernandez-BacaVaca G, Yoshimoto K, Sivagnanam S, Majumdar A, Sahoo SS. MaTiLDA: An Integrated Machine Learning and Topological Data Analysis Platform for Brain Network Dynamics. Pacific Symposium on Biocomputing (accepted), 2023.

Sahoo SS, Turner MD, Wang L, Ambite JL, Appaji AM, Rajasekar A, Lander HM, Wang Y, Turner JA. NeuroBridge ontology: Computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use. Frontiers in Neuroinformatics, 2023.


Turner JA, Turner MD, Appaji A, Rajasekar AK, Wang L, Sahoo SS. NeuroBridge ontology development for shared neuroimaging datasets. International Neuroinformatics Coordinating Facility (INCF) Assembly, 2022 (abstract), 2022.

Lander H, Rajasekar AK, Wang Y, Watson M, Sahoo SS, Turner J, Poline J-B, Wang L. Linking NeuroBridge and NeuroQuery with Deep Semantic Matching. International Neuroinformatics Coordinating Facility (INCF) Assembly, 2022 (poster), 2022.

Sahoo SS, Kobow K, Zhang J, Buchhalter J, Dayyani M, Upadhyaya DP, Prantzalos K, Bhattacharjee M, Blumcke I, Wiebe S, Lhatoo SD. Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records. Scientific Reports, 2022.

Wang X, Wang Y, Ambite J-L, Appaji A, Lander H, Moore S, Rajasekar AK, Turner JA, Turner MD, Wang L, Sahoo SS. Enabling Scientific Reproducibility through FAIR Data Management: An ontology-driven deep learning approach in the NeuroBridge Project. AMIA Annual Symposium Proceedings, 2022.

Spilsbury JC, Hernandez E, Kiley K, Gillerlane EH, Prasanna S, Shafiabadi N, Rao P, Sahoo SS. Social service workers’ use of social media to obtain client information: Current practices and perspectives on a potential informatics platform. Journal of Social Service Research (accepted), 2022.

Gupta DK, Prantzalos K, Hiller AL, Lobb BM, Chan K, Boyd J, Sahoo SS. Ontology-based, Real-time, Machine learning Informatics System for Parkinson Disease (ORMIS-PD). International Congress of Parkinson’s Disease and Movement Disorders 2022 (poster), 2022.


Gupta DK, Marano M, Aurora R, Boyd J, Sahoo SS. Movement disorders ontology for clinically-oriented and clinicians-driven data mining of multi-center cohorts in Parkinson's disease (Poster). AMIA Annual Symposium Proceedings, 2021.

Prantzalos K, Zhang J, Shafiabadi N, Fernandez-BacaVaca G, Sahoo SS. Epilepsy-Connect: An Integrated Knowledgebase for Characterizing Alterations in Consciousness State of Pharmacoresistant Epilepsy Patients. AMIA Annual Symposium Proceedings, 2021.

Zhang J, Bauman R, Shafiabadi N, Gurski N, Fernandez-BacaVaca G, Sahoo SS. Characterizing Brain Network Dynamics using Persistent Homology in Patients with Refractory Epilepsy. AMIA Annual Symposium Proceedings, 2021.


Lhatoo SD, Bernasconi N, Blumcke I, Braun K, Buchhalter J, Denaxas S, Galanopoulou A, Josephson C, Kobow K, Lowenstein D, Ryvlin P, Schulze-Bonhage A, Sahoo SS, Thom M, Thurman D, Worrell G, Zhang GQ, Wiebe S. Big Data in Epilepsy: Clinical and Research Considerations. Report from the Epilepsy Big Data Task Force of the International League Against Epilepsy, Epilepsia, 2020.

Carr SJ, Gershon A, Shafiabadi N, Lhatoo SD, Tatsuoka C, Sahoo SS. An Integrative Approach to Study Structural and Functional Network Connectivity in Epilepsy using Imaging and Signal Data. Frontiers in Integrative Neuroscience, 2020.

Liu C, Kim M, Rueschman M, Sahoo SS. ProvCaRe: A Large-Scale Semantic Provenance Resource for Scientific Reproducibility, in Knowledge Graphs and RDF Data Provenance: AI Actions with Machine-Interpretable Data. Springer book series on Advanced Information & Knowledge Processing, 2020.

Sahoo SS, Gershon A, Shafiabadi N, Ghosh K, Tatsuoka C, Lhatoo SD, Fernandez-BacaVaca G. NeuroIntegrative Connectivity (NIC) Informatics Tool for Brain Functional Connectivity Network Analysis in Cohort Studies. AMIA Annual Symposium Proceedings 2020, 2020.


Hong X, Liu C, Momotaz H, Cassidy K, Sajatovic M, Sahoo SS. Enhancing Multi-Center Patient Cohort Studies in the Managing Epilepsy Well (MEW) Network: Integrated Data Integration and Statistical Analysis. AMIA Annual Symposium Proceedings, 2019.

Sahoo SS, Valdez J, Rueschman M, Kim M. Semantic Provenance Graph for Reproducibility of Biomedical Research Studies: Generating and Analyzing Graph Structures from Published Literature. International Medical Informatics Association (IMIA), MedInfo, 2019.

Sahoo SS, Valdez J, Kim M, Rueschman M, Redline S. ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata. International Journal of Medical Informatics, 2019.

Gershon A, Devulapalli P, Zonjy B, Ghosh K, Tatsuoka C, Sahoo SS. Computing Functional Brain Connectivity in Neurological Disorders: Efficient Processing and Retrieval of Electrophysiological Signal Data. AMIA Joint Summits 2019, 2019.

Yang S, Ghosh K, Sakaie K, Sahoo SS, Carr S, Tatsuoka C. A Simplified Crossing Fiber Model in Diffusion Weighted Imaging. Frontiers in Neuroscience, 2019.

Socrates V, Gershon A, Sahoo SS. Computation of Brain Functional Connectivity Network Measures in Epilepsy: A Web-based Platform for EEG Signal Data Processing and Analysis (Poster). International Medical Informatics Association (IMIA), MedInfo, 2019.


Gershon AL, Lhatoo SD, Tatsuoka C, Ghosh K, Loparo K, Sahoo SS. Scalable Signal Data Processing for Measuring Functional Connectivity in Epilepsy Neurological Disorder. Biomedical Signal Processing in Big Data, Ervin Sejdic, Tiago Falk (Eds), 2018.

Valdez J, Kim M, Rueschman M, Redline S, Sahoo SS. Classification of Provenance Triples for Scientific Reproducibility: A Comparative Evaluation of Deep Learning Models in the ProvCaRe Project. International Provenance Annotation Workshop (IPAW) 2018 Proceedings, Springer, 2018.


Valdez J, Kim M, Rueschman M, Socrates V, Redline S, Sahoo SS. ProvCaRe Semantic Provenance Knowledgebase: Evaluating Scientific Reproducibility of Research Studies (Finalist for Distinguished Paper Award). American Medical Informatics Association (AMIA) Annual Symposium, 2017.

Sajatovic M, Tatsuoka C, Welter E, Friedman D, Spruill TM, Stoll S, Sahoo SS, Bukach A, Bamps YA, Valdez J, Jobst BC. Correlates of quality of life among individuals with epilepsy enrolled in self-management research: From the US Centers for Disease Control and Prevention Managing Epilepsy Well Network. Epilepsy & Behavior, 2017.

Gershon AL, Zonjy B, Tatsuoka C, Ghosh K, Lhatoo SD, Sahoo SS. A Flexible Computational Neuroinformatics Workflow for Computing Functional Networks in Epilepsy Neurological Disorder (Abstract). American Medical Informatics Association (AMIA) Annual Symposium, Washington DC, 2017.

Valdez J, Rueschman M, Kim M, Arabyarmohammadi S, Redline S, Sahoo SS. An Extensible Ontology Modeling Approach Using Post Coordinated Expressions for Semantic Provenance in Biomedical Research. The 16th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), Rhodes, Greece, 2017.

Gershon AL, Lhatoo SD, Tatsuoka C, Ghosh K, Loparo K, Sahoo SS. Scalable Signal Data Processing for Measuring Functional Connectivity in Epilepsy Neurological Disorder. Biomedical Signal Processing in Big Data, Ervin Sejdic, Tiago Falk (Eds), 2017.


Sahoo SS, Valdez J, Rueschman M. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description. American Medical Informatics Association (AMIA) Annual Symposium, 2016.

Valdez J, Rueschman M, Kim M, Redline S, Sahoo SS. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text. 15th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), 2016.

Sahoo SS, Ramesh P, Welter E, Bukach A, Valdez J, Tatsuoka C, Bamps Y, Stoll S, Jobst BC, Sajatovic M. Insight: An Ontology-based Integrated Database and Analysis Platform for Epilepsy Self-Management Research. International Journal of Medical Informatics, 2016.

Sahoo SS, Wei A, Valdez J, Wang L, Zonjy B, Tatsuoka C, Loparo KA, Lhatoo SD. NeuroPigPen: a Scalable Toolkit for Processing Electrophysiological Signal Data in Neuroscience Applications using Apache Pig. Frontiers in Neuroinformatics, 2016.

Sahoo SS, Wei A, Tatsuoka C, Ghosh K, Lhatoo SD. Processing Neurology Clinical Data for Knowledge Discovery: Scalable Data Flows Using Distributed Computing (Book Chapter), 2016.

Dean DA, Goldberger AL, Mueller R, Kim M, Rueschman M, Mobley D, Sahoo SS, Jayapandian C, Cui L, Morrical MG, Surovec S, Zhang GQ, Redline S. Scaling up Scientific Discovery in Sleep Medicine. The National Sleep Research Resource, 2016.


LaFrance Jr. WC, Ranieri R, Bamps Y, Stoll S, Sahoo SS, Welter E, Sams J, Tatsuoka C, Sajatovic M. Comparison of common data elements from the Managing Epilepsy Well (MEW) Network integrated database and a well-characterized sample with nonepileptic seizures. Epilepsy & Behavior, 2015.

Yang S, Tatsuoka C, Ghosh K, Lacuey-Lecumberri N, Lhatoo SD, Sahoo SS. Comparative Evaluation for Brain Structural Connectivity Approaches: Towards Integrative Neuroinformatics Tool for Epilepsy Clinical Research (Nominated for the Best Student Paper Award). AMIA 2016 Joint Summits on Translational Science, 2015.

Ramesh P, Wei A, Sams J, Welter E, Lhatoo S, Sajatovic M, Sahoo SS. Insight: Semantic Provenance and Analysis Platform for Multi-center Neurology Healthcare Research. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015.

Sahoo SS, Rao P. Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and Trust. In the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, 2015.

Jayapandian C, Wei A, Ramesh P, Zonjy B, Lhatoo SD, Loparo K, Zhang GQ, Sahoo SS. A Scalable Neuroinformatics Data Flow for Electrophysiological Signals using MapReduce. Frontiers in Neuroinformatics, 2015.

Sahoo SS, Zhang GQ, Bamps Y, Fraser R, Stoll S, Lhatoo SD, Tatsuoka C, Welter E, Sajatovic M. Managing Information Well: Toward an Ontology-driven Informatics Platform for Data Sharing and Secondary Use in Epilepsy Self-Management Research Centers. Health Informatics Journal, 2015.

Sahoo SS, Rueschman M, Valdez J, Hsu W, Lhatoo SD, Redline S. Provenance Analysis over Biomedical Big Data Using PROV: Towards Effective Secondary Data Analysis Across Multiple Studies (Poster). NIH Big Data to Knowledge (BD2K) Meeting, Bethesda MD, 2015.


Cui L, Sahoo SS, Lhatoo SD, Garg G, Rai P, Bozorgi A, Zhang GQ. Complex Epilepsy Phenotype Extraction from Narrative Clinical Discharge Summaries. Journal of Biomedical Informatics, 2014.

Jayapandian CP, Chen CH, Dabir A, Zhang GQ, Lhatoo SD, Sahoo SS. Domain Ontology As Conceptual Model for Big Data Management: Application in Biomedical Informatics. Proceedings of the 33rd International Conference on Conceptual Modeling (ER 2014), 2014.

Zhang GQ, Cui L, Lhatoo SD, Schuele SU, Sahoo SS. MEDCIS: Multi-Modality Epilepsy Data Capture and Integration System. American Medical Informatics Association (AMIA) Annual Symposium, 2014.

Sahoo SS, Tao S, Parchman A, Luo Z, Cui L, Mergler P, Lanese R, Barnholtz-Sloan JS, Meropol NJ, Zhang GQ. Trial Prospector: Matching Patients with Cancer Research Studies using an Automated and Scalable Approach. Journal of Cancer Informatics, 2014.

Sahoo SS, McIntyre C, Lhatoo SD. A Match Made in Cloud? Meeting the Requirements of the Next Generation Neuroscience Research Using Configurable Cloud Infrastructure. National Science Foundation (NSF) Cloud Workshop, 2014.


Sahoo SS, Jayapandian C, Garg G, Kaffashi F, Chung S, Bozorgi A, Chen CH, Loparo K, Lhatoo SD, Zhang GQ. Heartbeats in the Cloud: Distributed Analysis of Electrophysiological “Big Data” using Cloud Computing for Epilepsy Clinical Research. Journal of American Medical Informatics Association JAMIA (special issue on Big Data in Healthcare and Biomedical Research), 2013.

Parchman AJ, Zhang GQ, Mergler P, Barnholtz-Sloan J, Lanese R, Miller DW, Opper C, Sahoo SS, Tao S, Teagno J, Warfe J, Meropol NJ. Trial prospector: An automated clinical trials eligibility matching program. Proceedings of the American Society of Clinical Oncology (ASCO) Annual Meeting, 2013.

Asiaee AH, Doshi P, Minning T, Sahoo SS, Parikh P, Sheth A, Tarleton RL. From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data. The 9th International Conference on Data Integration in the Life Sciences (DILS), 2013.

Jayapandian CP, Chen CH, Bozorgi A, Lhatoo SD, Zhang GQ, Sahoo SS. Electrophysiological Signal Analysis and Visualization using Cloudwave for Epilepsy Clinical Research. The 14th World Congress on Medical and Health Informatics (MedInfo), Stud Health Technol Inform, 2013.

Sahoo SS, Zhang GQ, Lhatoo SD. Epilepsy Informatics and an Ontology-driven Infrastructure for Large Database Research and Patient Care in Epilepsy. Review Paper, Epilepsia, 2013.

Lebo T, Sahoo SS, McGuinness D. (eds.). PROV-O: The PROV Ontology. W3C Recommendation, 2013.

Cui L, Mueller R, Sahoo SS, Zhang GQ. Querying Complex Federated Clinical Data Using Ontological Mapping and Subsumption Reasoning. IEEE International Conference on Healthcare Informatics 2013 (ICHI 2013), 2013.

Bozorgi A, Chung S, Kaffashi F, Loparo KA, Sahoo SS, Zhang GQ, Kaiboriboon K, Lhatoo SD. Significant postictal hypotension: expanding the spectrum of seizure-induced autonomic dysregulation. Epilepsia, 2013.

Sahoo SS, Lhatoo SD, Gupta DK, Cui L, Zhao M, Jayapandian C, Bozorgi A, Zhang GQ. Epilepsy and Seizure Ontology: Towards an Epilepsy Informatics Infrastructure for Clinical Research and Patient Care. Journal of American Medical Informatics Association (JAMIA), 2013.

Jayapandian CP, Chen CH, Bozorgi A, Lhatoo SD, Zhang GQ, Sahoo SS. Cloudwave: Distributed Processing of “Big Data” from Electrophysiological Recordings for Epilepsy Clinical Research Using Hadoop. American Medical Informatics Association (AMIA) Annual Symposium, 2013.


Jayapandian C, Ewing R, Zhang GQ, Sahoo SS. A Semantic Proteomics Dashboard (SemPoD) for Proteomics Data Management in Translational Research. AMIA Clinical Research Informatics Summit (CRI), 2012.

Jayapandian C, Zhao M, Ewing R, Zhang GQ, Sahoo SS. A semantic proteomics dashboard (SemPoD) for data management in translational research. BMC Systems Biology, 2012.

Parikh PP, Zheng J, Logan-Klumper F, Stoeckert Jr. CJ, Louis C, Topalis P, Protasio AV, Sheth AP, Carrington M, Berriman M, Sahoo SS. The Ontology for Parasite Lifecycle (OPL): Towards a Consistent Vocabulary of Lifecycle Stages in Parasitic Organisms. Journal of Biomedical Semantics (JBMS), 2012.

Zhang GQ, Sahoo SS, Lhatoo SD. From Classification to Epilepsy Ontology and Informatics. Epilepsia, 2012.

Parikh PP, Minning TA, Nguyen V, Lalithsena S, Asiaee AH, Sahoo SS, Doshi P, Tarleton R, Sheth AP. A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi. PLoS Neglected Tropical Diseases, 2012.

Sahoo SS, Zhao M, Luo L, Bozorgi A, Gupta D, Lhatoo SD, Zhang GQ. OPIC: Ontology-driven Patient Information Capturing System for Epilepsy. Proceedings of the American Medical Informatics Association (AMIA) Annual Symposium, 2012.

Cui L, Bozorgi A, Lhatoo SD, Zhang GQ, Sahoo SS. EpiDEA: Extracting Structured Epilepsy and Seizure Information from Patient Discharge Summaries for Cohort Identification. American Medical Informatics Association (AMIA) Annual Symposium, 2012.

Zhang GQ, Luo L, Ogbuji C, Joslyn C, Mejino J, Sahoo SS. An Analysis of Multi-type Relational Interactions in FMA Using Graph Motifs. American Medical Informatics Association (AMIA) Annual Symposium, 2012.

Teagno J, Kiefer RC, Pathak J, Zhang GQ, Sahoo SS. A Distributed Semantic Web Approach for Cohort Identification. Proceedings of the American Medical Informatics Association (AMIA) Annual Symposium, 2012.


Mueller R, Sahoo SS, Dong X, Redline S, Arabandi S, Luo L, Zhang GQ. Mapping multi-institution data sources to domain ontology for data federation: the PhysioMIMI approach. AMIA Clinical Research Informatics Summit (CRI), 2011.

Sahoo SS, Nguyen V, Bodenreider O, Parikh PP, Minning T, Sheth AP. A unified framework for managing provenance information in translational research. BMC Bioinformatics, 2011.

Zhao J, Sahoo SS, Missier P, Sheth AP, Goble C. Extending Semantic Provenance into the Web of Data. IEEE Internet Computing, 2011.

Sahoo SS, Ogbuji C, Luo L, Dong X, Cui L, Redline SS, Zhang GQ. MiDas: Automatic Extraction of a Common Domain of Discourse in Sleep Medicine for Multi-center Data Integration. American Medical Informatics Association (AMIA) Annual Symposium, 2011.

Sahoo SS. Towards Desiderata for Provenance Ontologies in Biomedicine. International Conference on Biomedical Ontologies (ICBO), 2011.

Zhang GQ, Mueller R, Johnson N, Arabandi S, Sahoo SS, Redline S. Online Exploration of Case-control Study Designs in VISAGE. AMIA Clinical Research Informatics Summit (CRI), 2011.


Patni H, Sahoo SS, Henson C, Sheth A. Provenance Aware Linked Sensor Data. The 2nd International Workshop on Trust and Privacy on the Social and Semantic Web, co-located with ESWC, 2010.

Barga R, Simmhan Y, Chinthaka-Withana E, Sahoo SS, Jackson J, Araujo N. Provenance for Scientific Workflows Towards Reproducible Research. IEEE Data Engineering Bulletin, 2010.

Sahoo SS, Bodenreider O, Hitzler P, Sheth AP, Thirunarayan K. Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data. The 22nd International Conference on Scientific and Statistical Database Management (SSDBM), 2010.

Missier P, Sahoo SS, Zhao J, Goble C, Sheth A. Janus: from workflows to semantic provenance and linked open data. The 3rd International Provenance and Annotation Workshop (IPAW), Lecture Notes in Computer Science, 2010.

Deus H, Zhao J, Sahoo SS, Samwald M, Prud’hommeaux E, Miller M, Marshall MS, Cheung K. Provenance of Microarray Experiments for a Better Understanding of Experiment Results. The 2nd International Workshop on Role of Semantic Web in Provenance Management (SWPM 2010), co-located with ISWC, 2010.

Sahoo SS, Groth P, Hartig O, Miles S, Coppens S, Myers J, Gil Y, Moreau L, Zhao J, Panzer M, Garijo D. Provenance Vocabulary Mappings. W3C Provenance Incubator Group Report, 2010.


Sahoo SS, Sheth A. Provenir ontology: Towards a Framework for eScience Provenance Management. Microsoft eScience Workshop, 2009.

Sahoo SS, Weatherly DB, Mutharaju R, Anantharam P, Sheth AP, Tarleton RL. Ontology-driven Provenance Management in eScience: an Application in Parasite Research. The 8th International Conference on Ontologies, DataBases, and Applications of Semantics, (ODBASE), 2009.

Sahoo SS, Halb W, Hellmann S, Idehen K, Thibodeau Jr. T, Auer S, Sequeda J, Ezzat A. A Survey of Current Approaches for Mapping of Relational Databases to RDF. W3C RDB2RDF Incubator Group Report, 2009.


Sheth A, Henson C, Sahoo SS. Semantic Sensor Web. IEEE Internet Computing, 2008.

Sahoo SS, Sheth AP, Henson C. Semantic Provenance for eScience: ‘Meaningful’ Metadata to Manage the Deluge of Scientific Data. IEEE Internet Computing, Web-Scale Workflow Track, M.B. Blake and M. Huhns (Eds.). (Featured in Association of Computing Machinery (ACM) TechNews 2008), 2008.

Sahoo SS, Bodenreider O, Rutter JL, Skinner KJ, Sheth AP. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence. Journal of Biomedical Informatics (Special Issue: Semantic Mashup of Biomedical Data), 2008.

Valerio MD, Sahoo SS, Barga RS, Jackson JJ. Capturing Workflow Event Data for Monitoring, Performance Analysis, and Management of Scientific Workflows. SWBES08, co-located with the 4th IEEE International Conference on eScience, 2008.


Sahoo SS, Bodenreider O, Zeng K, Sheth AP. An experiment in integrating large biomedical knowledge resources with RDF: Application to associating genotype and phenotype information. International Workshop on Health Care and Life Sciences Data Integration for the Semantic Web, co-located with WWW2007, 2007.

Sahoo SS, Zeng K, Bodenreider O, Sheth AP. From ‘glycosyltransferase’ to ‘congenital muscular dystrophy’: Integrating knowledge from NCBI Entrez Gene and the Gene Ontology. The 12th World Congress on Health (Medical) Informatics (Medinfo), 2007.

Sahoo SS, Sheth A, Hunter B, York WS. SemBOWSER–Adding Semantics to biological Web services registry. Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences. Baker CJO, Cheung KO (Eds.). Springer, 2007.


Sahoo SS, Thomas C, Sheth AP, York WS, Tartir S. Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies. The 15th International World Wide Web (WWW) Conference, 2006.

Sahoo SS, Sheth A. Bioinformatics applications of Web Services, Web Processes and role of Semantics. Semantic Web Processes and Their Applications. Cardoso J, Sheth A (Eds.). Springer, 2006.


Sahoo SS, Sheth AP, York WS, Miller JA. Semantic Web Services for N-glycosylation Process. International Symposium on Web Services for Computational Biology and Bioinformatics, 2005.

Sahoo SS, Thomas C, Sheth AP, Henson C, York WS. GLYDE-An expressive XML standard for the representation of glycan structure. Carbohydrate Research, 2005.

Atwood III J, Sahoo SS, Alvarez-Manilla G, Weatherly DB, Kolli K, Orlando R, York WS. Simple modification of a protein database for mass spectral identification of N-linked glycopeptides. Rapid Communications in Mass Spectrometry, 2005.

Alvarez-Manilla G, Atwood III J, Sahoo SS, Guo Y, Warren NL, York WS, Orlando R, Pierce M. Tools for glycoproteomic analysis: size-exclusion chromatography facilitates identification of tryptic glycopeptides with N-linked glycosylation site. Glycobiology, 2005.

Aleman-Meza A, Halaschek-Wiener C, Sahoo SS, Sheth A, Arpinar B. Template Based Semantic Similarity for Security Applications. The IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), 2005.


Sheth A, York WS, Thomas C, Nagarajan M, Miller JA, Kochut K, Sahoo SS, Yi X. Semantic Web technology in support of Bioinformatics for Glycan Expression. W3C Workshop on Semantic Web for Life Sciences, 2004.


Biomedical & Health Informatics Doctoral Program

The Biomedical & Health Informatics (BHI) doctoral program trains researchers in biomedicine, population health, and clinical care. Program trainees will acquire a core set of skills spanning computing, biostatistics, and biomedical research through a combination of course work and participation in the study in the Population and Quantitative Health Sciences (PQHS) department. The doctoral program is designed for students to acquire skills in the three areas of concentration: Data Analytics with a focus on statistics and data wrangling, Biomedical Health with a focus on systems biology, clinical, and health issues, and Computational and System Design with a focus on knowledge representation, information retrieval, and Big Data.

PQHS 416: Introduction to Computing in Biomedical Health Informatics

“PQHS 416 introduces students to computational techniques and concepts that underpin biomedical and health informatics data management and analysis. In particular, the course will focus on the three topics of: (1) Biomedical terminologies and formal logic used in building knowledge models such as ontologies; (2) Natural language processing (NLP), and (3) Big Data technologies, including components of Hadoop stack and Apache Spark. This is a lecture-based course that relies on both materials covered in class and out-of-class readings of published literature. Students will be assigned reading assignments, homework exercise assignments and they are expected to complete homework assignment for each class. The students will be involved in a team project and they will be expected to prepare a project report at the end of the semester.”

Our Team

Satya Sahoo, PhD

Assoc. Prof. of Medical Informatics

Nasim Shafiabadi, MD

Research Fellow

Katrina Prantzalos, MS

PhD Candidate

Dipak Prd. Upadhyaya, MPH

PhD Candidate

Pedram Golnari, MD

PhD Student

Pranav Nampoothiripad

Undergraduate Researcher

Keerthi Sevugan

Undergraduate Researcher

Our Alumni

1. Jianzhe Zhang

MS (First employer: ByteDance)

2. Arthur Gershon

PhD (Status: Post-Doctoral Scholar)

3. Catherine Jayapandian

PhD (Status: Post-Doctoral Scholar)

4. Priya Ramesh

MS (First employer: CoverMyMeds)

5. Xinting Hong


6. Pramith Devulapalli

BS (Status: PhD at Purdue University)

7. Vimig Socrates

BS, MS (Status: PhD at Yale University)

8. Meng Zhao

MS (First employer: IBM Explorys)

9. Li Wang


10. Chien-Hung Chen


11. Chang Liu

MS (First employer: Microsoft Corporation)

12. Annan Wei

MS (First employer: Google Inc)

Funding Agencies

National Institute of Biomedical Imaging and Bioengineering

National Institute on Drug Abuse

Department of Defense, Congressionally Directed Medical Research Programs

Dravet Syndrome Foundation

U.S. Department of Veterans Affairs

© 2021 Case Western Reserve University
10900 Euclid Ave. Cleveland, Ohio 44106 216.368.2000
Department of Population and Quantitative Health Sciences
Phone Number: 216-368-3286
Mailing Address: 2103 Cornell Road, Iris S. & Bert L. Wolstein Research Building, Cleveland, OH 44106-7291