Compiling and analyzing large numbers of datasets is essential to biomedical research. To develop new therapies or disease prevention strategies, scientists increasingly need better-quality data. However, data quality is highly variable, and integrating different datasets is very complicated. The Computational Health Center at Helmholtz Munich, one of Europe’s largest research centers for artificial intelligence in the medical sciences, collaborated with the Technical University of Munich (TUM) to find solutions to these problems and enable medical innovations for a healthier society. They have presented three of these solutions in the journal Nature Methods.
The Computational Health Center at Helmholtz Munich is developing new AI-powered computational tools to accelerate discovery and translation. It does this by developing predictive algorithms as well as mechanistic models to analyze molecular, imaging, and clinical data in human health and disease. In this way, it contributes to innovative diagnostics and new treatments for environmentally triggered diseases. The Technical University of Munich, one of the first universities in Germany to be named a University of Excellence, is committed to human-centered research and innovation.
Research based on single-cell genomics
Fabian Theis, scientific director of Helmholtz AI and professor of mathematical modeling of biological systems at TUM, said:
“It’s been a crazy 4 weeks, with many of our scientific stories and methods coming together in the same time window. Our research groups are focused on using single-cell genomics to understand the origin of disease in a mechanistic way – to do this, we are leveraging and developing machine learning approaches to better represent these complex data. In the three new papers, we worked on single-cell data integration, trajectory learning, and spatial resolution, respectively. In addition to the applications presented in the papers, we plan to support the next generation of single-cell research towards understanding disease.”
The following are the latest solutions developed by Helmholtz Munich and TUM researchers:
Benchmarking atlas-level data integration in single-cell genomics
To find out whether an observation made in a single dataset can be generalized, one has to check whether the same thing can be observed in other datasets of the same system. In single-cell data, so-called batch effects make it difficult to combine datasets in this way. Batch effects are differences between the molecular profiles of samples that arise because the samples were generated at a different time, in a different location, or from a different person. Overcoming these effects is a central challenge in single-cell genomics, with over 50 proposed solutions.
A group of researchers led by Malte Lücken carefully curated 86 datasets and compared 16 of the most popular data integration methods on 13 tasks. After more than 55,000 hours of computation time and a detailed evaluation of 590 results, they built a guide for optimized data integration. This allows for improved observations of disease processes in population-scale datasets.
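To make the notion of a batch effect concrete, the following is a minimal sketch on invented toy data, not one of the 16 benchmarked methods. It assumes two hypothetical labs ("lab1", "lab2") measuring the same two cell types on two genes, with the second lab's values shifted by a constant. Per-batch mean centering, used here purely for illustration, removes the shift only because both batches contain the same mix of cell types; real integration methods must cope without that assumption.

```python
from math import dist

def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))

def center_per_batch(cells):
    """Subtract each batch's mean expression from the cells of that batch."""
    corrected = []
    for batch, cell_type, expr in cells:
        mean = centroid([e for b, _, e in cells if b == batch])
        corrected.append((batch, cell_type, tuple(x - m for x, m in zip(expr, mean))))
    return corrected

def group_gap(cells, field):
    """Distance between the centroids of the two groups in `field`
    (0 = batch label, 1 = cell-type label)."""
    groups = sorted({c[field] for c in cells})
    a, b = (centroid([c[2] for c in cells if c[field] == g]) for g in groups)
    return dist(a, b)

# Hypothetical toy data: two genes, two cell types, two labs.
PROFILES = {"T cell": (1.0, 5.0), "B cell": (5.0, 1.0)}    # invented expression values
BATCH_SHIFT = {"lab1": (0.0, 0.0), "lab2": (3.0, 3.0)}     # invented technical offset

cells = [
    (batch, cell_type, tuple(p + s + jitter for p, s in zip(profile, shift)))
    for batch, shift in BATCH_SHIFT.items()
    for cell_type, profile in PROFILES.items()
    for jitter in (-0.1, 0.1)
]
corrected = center_per_batch(cells)

print("batch gap before/after:", group_gap(cells, 0), group_gap(corrected, 0))
print("cell-type gap before/after:", group_gap(cells, 1), group_gap(corrected, 1))
```

In this toy case the gap between batches disappears after centering while the gap between cell types is preserved, which is the trade-off an integration method has to manage: remove technical variation, keep biological variation.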
CellRank for directed single-cell fate mapping
Many questions in biology revolve around ongoing processes like development or regeneration. While single-cell RNA sequencing can measure gene expression, it destroys the cells, so scientists get only static snapshots. Furthermore, although many algorithms have been developed to reconstruct continuous processes from snapshots of gene expression, they tell researchers nothing about the direction of the process.
To address these problems, Marius Lange and colleagues developed the CellRank algorithm, which estimates directed trajectories of cell state by combining previous reconstruction approaches with RNA velocity, a concept that can estimate whether genes are being up- or down-regulated. In both in vitro and in vivo applications, CellRank correctly inferred fate outcomes and recovered previously known genes.
In an example of lung regeneration, CellRank predicted new intermediate cell states on a dedifferentiation pathway, and these states were validated experimentally. CellRank is an open-source software package used by biologists and bioinformaticians around the world to analyze complex cell dynamics in situations such as cancer, reprogramming or regeneration.
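Why direction matters can be shown with a toy Markov chain. The sketch below is not CellRank's code or API. It assumes five hypothetical cells on a line, two hand-picked terminal states (cells 0 and 4), a similarity-based random walk with no preferred direction, a stand-in "velocity" walk that always points toward cell 4, and an illustrative mixing weight of 0.8. It then computes the probability that each cell ends up in cell 4.

```python
N_CELLS = 5            # hypothetical cells ordered along one trajectory
TERMINAL = {0, 4}      # terminal states chosen by hand for this toy example

# Similarity walk: step to either neighbor with equal probability (undirected).
# Velocity walk: always step toward cell 4 (stand-in for RNA velocity).
similarity = [[0.0] * N_CELLS for _ in range(N_CELLS)]
velocity = [[0.0] * N_CELLS for _ in range(N_CELLS)]
for i in range(N_CELLS):
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < N_CELLS]
    for j in neighbors:
        similarity[i][j] = 1.0 / len(neighbors)
    velocity[i][min(i + 1, N_CELLS - 1)] = 1.0

def combine_kernels(directed, undirected, weight):
    """Weighted mix of two row-stochastic matrices (result is row-stochastic)."""
    n = len(directed)
    return [
        [weight * directed[i][j] + (1 - weight) * undirected[i][j] for j in range(n)]
        for i in range(n)
    ]

def fate_probability(transition, target, terminal_states, n_iter=2000):
    """Probability of being absorbed in `target`, found by fixed-point iteration."""
    n = len(transition)
    prob = [1.0 if i == target else 0.0 for i in range(n)]
    for _ in range(n_iter):
        prob = [
            prob[i] if i in terminal_states
            else sum(transition[i][j] * prob[j] for j in range(n))
            for i in range(n)
        ]
    return prob

undirected = fate_probability(similarity, target=4, terminal_states=TERMINAL)
mixed = combine_kernels(velocity, similarity, weight=0.8)   # 0.8 is illustrative
directed = fate_probability(mixed, target=4, terminal_states=TERMINAL)

print("middle cell, similarity only:", undirected[2])
print("middle cell, with velocity:  ", directed[2])
```

With similarity alone, the middle cell is equally likely to reach either end, so the reconstruction carries no direction. Once the velocity term is mixed in, the same cell is assigned almost entirely to cell 4.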
Squidpy: a scalable framework for spatial omics analysis
New technologies that measure how gene expression varies across tissues are emerging. They allow scientists to view cells in their context and thereby study the principles of tissue organization and cell interaction. Giovanni Palla, Hannah Spitzer and colleagues have developed a new computational framework, called Squidpy, that allows analysts and developers to manage spatial gene expression data. Squidpy provides infrastructure and numerous analysis methods for efficient storage, manipulation and visualization of spatial omics data. In addition, it is extensible and can be interfaced with a variety of machine learning tools in the Python ecosystem. Scientists around the world are already using it to analyze spatial molecular data.
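As a rough illustration of what "viewing cells in context" means computationally, the sketch below builds a spatial neighbor graph and counts which cell types sit next to each other. It is written in plain Python and does not use or reproduce Squidpy's API. The 4 x 4 grid of cells, the two labels "A" and "B", and the neighbor radius of 1.0 are all invented for the example.

```python
from collections import Counter
from itertools import combinations
from math import dist

def spatial_neighbors(coords, radius):
    """Return index pairs of cells whose positions lie within `radius`."""
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if dist(coords[i], coords[j]) <= radius
    ]

def neighbor_type_counts(edges, labels):
    """Count neighboring pairs by (sorted) cell-type pair."""
    counts = Counter()
    for i, j in edges:
        counts[tuple(sorted((labels[i], labels[j])))] += 1
    return counts

# Hypothetical tissue: a 4 x 4 grid, left half type "A", right half type "B".
coords = [(float(x), float(y)) for x in range(4) for y in range(4)]
labels = ["A" if x < 2 else "B" for x, _ in coords]

edges = spatial_neighbors(coords, radius=1.0)
counts = neighbor_type_counts(edges, labels)
print(dict(counts))
```

Here most neighboring pairs share a cell type and only a few pairs straddle the A/B border. Spatial analysis frameworks typically turn counts of this kind into statistics on real tissue data.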
Article sources:
Lücken et al. 2021: Benchmarking atlas-level data integration in single-cell genomics. Nature Methods. DOI: 10.1038/s41592-021-01336-8.
Lange et al. 2022: CellRank for directed single-cell fate mapping. Nature Methods. DOI: 10.1038/s41592-021-01346-6.
Palla, Spitzer et al. 2022: Squidpy: a scalable framework for spatial omics analysis. Nature Methods. DOI: 10.1038/s41592-021-01358-2.
Translated from Des chercheurs munichois développent des méthodes d’IA pour la recherche biomédicale de nouvelle génération