The Biomedical Network Science (BIONETS) lab investigates molecular disease mechanisms using techniques from network science, combinatorial optimization, and artificial intelligence. We develop algorithms and tools to mine multi-omics data for such mechanisms and to individuate novel strategies for mechanistically grounded drug repurposing and causally effective treatments of complex diseases. We also develop privacy-preserving decentralized biomedical AI solutions, which enable cross-institutional studies on sensitive data.
Research projects
Federated network medicine for laboratory data in paediatric oncology
(Third Party Funds Group – Overall project)
Funding source: BMBF / Collaborative project
In FLabNet, we will harness the potential of algorithmic network biology and distributed machine learning to address two exemplary unmet needs in paediatric oncology: prediction of chemotherapy side effects like neutropenic fever and early-stage detection of rare malignant diseases such as myeloproliferative neoplasms. Based on >54 million laboratory test results from >500,000 patients from the Core Dataset of the German Medical Informatics Initiative (MII), we will create personalised networks, where nodes represent individual laboratory measurements and edges encode patient-specific relationships. We hypothesise the emerging personal graph representations to capture the unique spectra and dependencies of the individual patients’ health and disease characteristics. The networks will be used as signatures for label-efficient graph-based predictors such as graph kernels; and we will provide privacy-preserving federated implementations of our predictors that are fully interoperable with MII standards. To achieve its objectives, our consortium combines expertise in algorithmic systems biology (FAU), paediatric oncology (UKER), quantitative analysis of laboratory data (UKER), federated learning for biomedicine (Bitspark GmbH & FAU), and professional software development (Bitspark GmbH). These synergistic skill sets will enable us to combine laboratory diagnostics, computational systems medicine, and privacy-preserving machine learning, advancing the state of the art in quantitative analysis of laboratory data for precision medicine in paediatric oncology and beyond.
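The idea of personalised networks compared via a graph kernel can be illustrated with a minimal sketch. The description above does not specify how edges are defined or which kernel is used, so everything here is an assumption made for illustration: edges connect two laboratory tests whose values correlate strongly across one patient's visits, the kernel simply counts the edges two patients share, and all names and toy values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long value lists (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def personal_network(lab_series, threshold=0.8):
    """Build one patient's network: nodes are lab tests, and an edge joins two
    tests whose values are strongly (anti-)correlated across visits.
    The correlation rule and threshold are illustrative assumptions."""
    edges = set()
    for test_a, test_b in combinations(sorted(lab_series), 2):
        if abs(pearson(lab_series[test_a], lab_series[test_b])) >= threshold:
            edges.add((test_a, test_b))
    return edges


def shared_edge_kernel(net_a, net_b):
    """A very simple graph kernel: the number of edges both networks contain
    (a dot product of binary edge-indicator vectors)."""
    return len(net_a & net_b)


# Hypothetical toy data: four visits per patient, three lab tests each.
patient_1 = {"hemoglobin": [12, 11, 10, 9],
             "platelets": [250, 230, 210, 190],
             "crp": [5, 3, 8, 4]}
patient_2 = {"hemoglobin": [13, 12.5, 12, 11.5],
             "platelets": [300, 290, 280, 270],
             "crp": [2, 9, 1, 7]}

net_1 = personal_network(patient_1)
net_2 = personal_network(patient_2)
similarity = shared_edge_kernel(net_1, net_2)
```

A kernel matrix built from such pairwise similarities could then be passed to any kernel-based classifier; the choice of edge definition is the part most likely to differ in practice.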
High-resolution protein-protein interaction networks for biomedical research
(Third Party Funds Group – Overall project)
Funding source: other funding organisation
URL: https://www.cobinet.ai/
A Platform for Dynamic Exploration of the Cooperative Health Research in South Tyrol Study Data via Multi-Level Network Medicine
(Third Party Funds Single)
Funding source: Deutsche Forschungsgemeinschaft (DFG)
URL: https://www.dyhealthnet.ai/
The Cooperative Health Research in South Tyrol (CHRIS) study offers a comprehensive overview of the health status of >13,000 adults in the middle and upper Val Venosta. It is the largest population-based molecular study in Italy with a longitudinal outlook, designed to investigate the genetic and molecular basis of age-related common chronic conditions and their interaction with lifestyle and environment in the general population. In CHRIS, the combination of molecular profiling data, such as genomics and metabolomics, with important baseline clinical and lifestyle data offers vast opportunities for understanding physiological changes that could lead to clinical complications or indicate the prevalence or early onset of diseases together with their molecular underpinnings.
Whereas disease-focused studies often have a clear hypothesis that dictates the necessary statistical analyses, population-based cohorts such as CHRIS are more versatile and allow both testing existing hypotheses and generating new hypotheses that arise from statistically significant associations in the available data. Ideally, this type of explorative analysis is open to biomedical researchers who do not necessarily have experience with data analysis or machine learning. Network-based approaches are well suited for studying heterogeneous biomedical data, which has given rise to the field of network medicine. However, network medicine techniques have so far mainly been used in the context of studies focusing on individual diseases. Network-based platforms for the explorative analysis of population-based cohort data do not exist.
In DyHealthNet, we will close this gap and develop a network-based data analysis platform, which will integrate heterogeneous data and support explorative data analytics over dynamically generated subsets of the CHRIS study data. To fully leverage the potential of the available multi-level data, the DyHealthNet platform combines (1) data integration using standardized medical information models (HL7 FHIR), (2) innovative index structures for scalable dynamic analysis, (3) machine learning, and (4) visual analytics. DyHealthNet will render the CHRIS population cohort data accessible for state-of-the-art privacy-preserving, network-based data analysis. DyHealthNet will hence enable mining of context-specific pathomechanisms for precision medicine, and will serve as a blueprint for dynamic explorative analysis of multi-level cohort data worldwide. To achieve these objectives, the DyHealthNet consortium combines expertise in population-based cohort studies (Fuchsberger), the development of complex algorithms for the analysis of molecular networks (Blumenthal), applied biomedical AI and software systems (List), and customized index structures for scalable data management (Gamper).
AI4MDD: AI-Powered Prognosis of Treatment Response in Major Depression Disorder
(Third Party Funds Single)
Funding source: Industry
Dimensionality reduction for molecular data based on explanatory power of differential regulatory networks
(Third Party Funds Group – Overall project)
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
URL: https://www.netmap.ai/
Rapid advances in single-cell RNA sequencing (scRNA-seq) technology are leading to ever-increasing dimensions of the generated molecular data, which complicates data analyses. In NetMap, new scalable and robust dimensionality reduction approaches for scRNA-seq data will be developed. To this end, dimensionality reduction will be integrated into a central task of the systems medicine analysis of scRNA-seq data: inference of gene regulatory networks (GRNs) and driver transcription factors based on cell expression profiles. Each resulting dimension will correspond to a driver GRN, and the coordinate of a cell in this low-dimensional representation will quantify the extent to which the particular driver GRN explains the cell's gene expression profile. These new methods will be implemented as a user-friendly software platform for exploratory expert-in-the-loop analysis and in silico prediction of drug repurposing candidates.
As a case study, we will investigate CD4 helper T cell exhaustion, a potential limiting factor in immunotherapy. NetMap's strategy consists of (1) analyzing the phenotypic heterogeneity of exhausted CD4 T cells, (2) identifying transcriptional mechanisms that control this heterogeneity, and (3) amplifying or eliminating specific subsets and testing their functional impact. This will allow the development of an atlas of the gene regulatory landscape of exhausted CD4 T cells, while the in vivo testing of key regulatory transcription factors will help demonstrate the power of the developed methods and allow evaluation and improvement of predictions.
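To make the notion of GRN-based coordinates concrete, the following minimal sketch maps one cell's expression profile to one value per driver GRN. It is not NetMap's method, which the description does not specify: here each driver GRN is assumed to be a set of weighted target genes, and "the extent to which a GRN explains the profile" is approximated by the cosine similarity between the cell's expression vector and the GRN's weight vector. All gene names, GRN names, and values are hypothetical.

```python
import math


def grn_coordinates(cell_expr, driver_grns):
    """Return one coordinate per driver GRN for a single cell.

    cell_expr:   dict gene -> expression value of the cell
    driver_grns: dict GRN name -> dict target gene -> regulatory weight
    The cosine-similarity score is an illustrative stand-in for explanatory power.
    """
    cell_norm = math.sqrt(sum(v * v for v in cell_expr.values()))
    coords = {}
    for name, weights in driver_grns.items():
        # Genes missing from the cell's profile count as unexpressed.
        dot = sum(w * cell_expr.get(gene, 0.0) for gene, w in weights.items())
        grn_norm = math.sqrt(sum(w * w for w in weights.values()))
        coords[name] = dot / (cell_norm * grn_norm) if cell_norm and grn_norm else 0.0
    return coords


# Hypothetical toy example: two driver GRNs, one cell.
driver_grns = {
    "GRN_A": {"g1": 1.0, "g2": 1.0},
    "GRN_B": {"g3": 1.0, "g4": -1.0},
}
cell = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0}

coords = grn_coordinates(cell, driver_grns)
```

In this toy case the cell expresses only the targets of GRN_A, so its coordinate along GRN_A is close to 1 and its coordinate along GRN_B is 0; the low-dimensional representation has as many dimensions as there are driver GRNs.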
Unsupervised Network Medicine for Longitudinal Omics Data
(FAU Funds)
In recent years, large amounts of molecular profiling data (also called “omics data”) have become available. This has raised hopes of identifying so-called disease modules, i.e., sets of functionally related molecules constituting candidate disease mechanisms. However, omics data tend to be overdetermined and noisy, so modules identified via purely statistical means are often unstable and functionally uninformative. Network-based disease module mining methods (DMMMs) therefore project omics data onto biological networks such as protein-protein interaction (PPI) networks, gene regulatory networks (GRNs), or microbial interaction networks (MINs). Subsequently, network algorithms are used to identify disease modules consisting of small subnetworks. This dramatically decreases the size of the search space and prioritizes disease modules consisting of functionally related molecules, positively affecting both the stability and the functional relevance of the discovered modules.
However, to the best of our knowledge, all existing DMMMs are subject to at least one of the following two limitations. Firstly, existing DMMMs are typically supervised, in the sense that they try to find subnetworks explaining differences in the omics data between predefined case and control patients or predefined disease subtypes. This is potentially problematic, because it implies that existing DMMMs are biased by our current disease ontologies, which are mostly symptom- or organ-based and therefore often too coarse-grained. For instance, around 95 % of all patients with hypertension are diagnosed with so-called “essential hypertension” (code BA00.Z in the ICD-11 disease ontology), meaning that the cause of the hypertension is unknown. In fact, there are probably several disjoint molecular mechanisms causing “essential hypertension”, and the same holds true for many other complex diseases such as Alzheimer’s disease, multiple sclerosis, and Crohn’s disease. Supervised DMMMs that take existing disease definitions for granted hence risk overlooking the molecular mechanisms causing mechanistically distinct subtypes.
Secondly, most existing DMMMs are designed for static omics data and do not support longitudinal data where the patients’ molecular profiles are observed over time. Existing analysis frameworks for longitudinal omics data largely use purely statistical means. Consequently, network medicine approaches for time series data are needed.
To the best of our knowledge, only three DMMMs partially overcome these limitations: BiCoN and GrandForest allow unsupervised disease module mining but do not support longitudinal omics data, whereas TiCoNE supports longitudinal data but requires predefined case vs. control or subtype annotations as input. There is hence an unmet need for unsupervised DMMMs for longitudinal omics data. Developing such methods is the main objective of the proposed project.
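The general principle of projecting omics data onto a network, as described above, can be sketched in a few lines. This is a deliberately simplistic illustration and not the algorithm of BiCoN, GrandForest, TiCoNE, or the proposed project: it assumes that each gene already has a label-free score (for example, its variance across patients), keeps only genes above a threshold, and reports connected subnetworks of the PPI network induced by those genes. The function name, threshold, and toy data are hypothetical.

```python
from collections import deque


def candidate_modules(edges, scores, threshold, min_size=2):
    """Return connected subnetworks formed by high-scoring genes.

    edges:     iterable of (gene_u, gene_v) pairs from a PPI network
    scores:    dict gene -> label-free score derived from omics data
    threshold: minimum score for a gene to be kept
    min_size:  minimum number of genes in a reported module
    """
    active = {gene for gene, score in scores.items() if score >= threshold}
    # Induced subgraph: keep only edges whose endpoints are both active.
    adjacency = {gene: set() for gene in active}
    for u, v in edges:
        if u in active and v in active:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, modules = set(), []
    for start in sorted(active):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules


# Hypothetical toy network and scores.
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
gene_scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7, "E": 0.95, "F": 0.85}

modules = candidate_modules(ppi_edges, gene_scores, threshold=0.5)
```

Restricting the search to connected subnetworks is what shrinks the search space: gene D scores highly but is dropped because its only link to other high-scoring genes runs through the low-scoring gene C. A longitudinal variant would need scores that summarize each gene's behaviour over time, which this sketch does not attempt.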