Studies + Teaching

Internships, Bachelor's and Master's Thesis Topics

Potential topics for internships and theses at the Predictive Deep Learning for Medicine and Healthcare Lab

The research group Predictive Deep Learning for Medicine and Healthcare (PredLMed), led by Prof. Dr. Anne-Christin Hauschild at the Department of Medicine at Justus Liebig University Giessen, University Hospital Giessen and Marburg (UKGM), offers the following topics for internships, Bachelor's theses, and Master's theses:

Graph-Structured Federated Learning for CT Imaging Using Foundation Model Representations

Offered as:

Masterthesis/Bachelorthesis (work in progress)

Intro

Computed tomography (CT) imaging is widely used in clinical practice, yet developing robust deep learning models is challenged by data silos, privacy constraints, and substantial inter-institutional heterogeneity. Federated learning enables collaborative training without sharing raw data, but it is particularly vulnerable to local overfitting under non-IID distributions and to performance degradation when certain classes are absent at individual sites. While foundation models provide strong and transferable representations, their effectiveness in federated settings remains limited when label imbalance and class-missing scenarios are present. This thesis investigates the role of graph neural networks as a structural mechanism to model relationships across institutions and feature spaces, with the goal of mitigating local overfitting and improving knowledge propagation under class-missing conditions. By integrating foundation model embeddings with graph-based federated aggregation, the proposed framework aims to enhance generalization, robustness, and fairness in multi-institutional CT image analysis.

Goal

This thesis aims to develop a federated learning algorithm for CT images based on foundation models and graph neural networks.
The objectives include:

Providing benchmark performance of foundation models for CT image classification in federated learning.
Clarifying the role of foundation models in federated learning for medical images.
Developing a fairness-aware federated learning algorithm for CT image classification based on graph neural networks.
Investigating the ability of graph neural networks to address the local overfitting issue in federated learning.
Investigating the ability of graph neural networks to address the absence of samples of a specific class during training.

References

[1] Li, Wenxuan, et al. “AbdomenAtlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking.” Medical Image Analysis 97 (2024): 103285.
[2] Shui, Zhongyi, et al. “Large-scale and fine-grained vision-language pre-training for enhanced ct image understanding.” arXiv preprint arXiv:2501.14548 (2025).
[3] https://github.com/MrGiovanni/SuPreM?tab=readme-ov-file
[4] Pai, Suraj, et al. “Vision foundation models for computed tomography.” arXiv preprint arXiv:2501.09001 (2025).
[5] Blankemeier, Louis, et al. “Merlin: A vision language foundation model for 3d computed tomography.” Research Square (2024): rs-3.
[6] https://huggingface.co/project-lighter
[7] Wu, Chaoyi, et al. “Towards generalist foundation model for radiology by leveraging web-scale 2d&3d medical data.” Nature Communications 16.1 (2025): 7866.
[8] https://github.com/chaoyi-wu/RadFM
[9] Kim, Sungwon, et al. “Subgraph federated learning for local generalization.” arXiv preprint arXiv:2503.03995 (2025).

Contact

Tien Nguyen: anh.t.nguyen@uni-giessen.de
Anne-Christin Hauschild: anne-christin.hauschild@uni-giessen.de

Clinically-Informed vs. Uniform Patching in Graph-Based ECG Classification

Offered as:

Masterthesis/Bachelorthesis (work in progress)

Intro

Cardiovascular disease (CVD) poses a significant global health risk, with the incidence continuing to rise due to lifestyle changes and an aging population. The electrocardiogram (ECG) is a non-invasive tool for real-time monitoring of cardiac electrical activity and plays a key role in CVD diagnosis. Traditional machine learning models often process ECG data as unstructured sequences, potentially overlooking the intricate relationships between different waveform components. Recent advancements have introduced graph-based models that segment ECG signals into uniform patches, treating each patch as a node in a graph. While this method captures structural information, it may not align with clinically significant features such as P-waves, QRS complexes, and T-waves. This misalignment could limit both the diagnostic performance and the interpretability of the models. Incorporating clinically-informed patching strategies might enhance the model’s ability to focus on medically relevant signal segments, potentially improving classification accuracy and providing more meaningful insights.

Goal

This thesis aims to develop and evaluate a graph-based ECG classification framework that compares clinically-informed patching with uniform patching strategies. The objectives include:

Designing a clinically-informed patching method
Training and evaluating the GNN with an already existing pipeline
Assessing model performance and interpretability
Comparing clinically-informed patching with uniform patching
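
To make the contrast between the two patching strategies concrete, here is a minimal sketch. It is not the lab's pipeline: the synthetic signal, the naive threshold-based R-peak detector, and the window lengths around each peak are all assumptions chosen for illustration. A real study would use a validated ECG delineation method.

```python
import numpy as np

def uniform_patches(signal, patch_len):
    """Split a 1D signal into consecutive, equally sized patches."""
    n = len(signal) // patch_len
    return signal[: n * patch_len].reshape(n, patch_len)

def detect_r_peaks(signal, fs, min_rr_ms=300, rel_height=0.6):
    """Naive R-peak detector (illustrative only): local maxima above a
    relative threshold that are at least min_rr_ms apart."""
    threshold = rel_height * signal.max()
    min_dist = min_rr_ms * fs // 1000
    peaks = []
    for i in range(1, len(signal) - 1):
        is_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_max and signal[i] >= threshold:
            if not peaks or i - peaks[-1] >= min_dist:
                peaks.append(i)
    return np.array(peaks)

def beat_patches(signal, r_peaks, fs, pre_ms=250, post_ms=400):
    """Cut one patch per heartbeat around each R-peak, so that P-wave,
    QRS complex, and T-wave of a beat end up in the same patch."""
    pre, post = pre_ms * fs // 1000, post_ms * fs // 1000
    keep = [r for r in r_peaks if r - pre >= 0 and r + post <= len(signal)]
    return np.stack([signal[r - pre : r + post] for r in keep])

# Synthetic stand-in for an ECG lead: one narrow spike per second at 250 Hz.
fs = 250
t = np.arange(5 * fs) / fs
beat_times = [0.5, 1.5, 2.5, 3.5, 4.5]
ecg = sum(np.exp(-((t - b) ** 2) / (2 * 0.01 ** 2)) for b in beat_times)

uniform = uniform_patches(ecg, patch_len=125)   # fixed grid, ignores beats
r_peaks = detect_r_peaks(ecg, fs)
beats = beat_patches(ecg, r_peaks, fs)          # one patch per heartbeat
```

In the uniform case a spike can land anywhere within a patch or be split between two patches. In the beat-aligned case every patch has its R-peak at the same offset, which is the property a clinically-informed graph construction could exploit.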

References

[1] https://doi.org/10.1109/JBHI.2023.3327025
[2] https://doi.org/10.21203/rs.3.rs-7721630/v1

Contact

miriamcindy.maurer@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Style-Aware Unsupervised Clustering of Pancreatic Cancer Whole Slide Images

Offered as:

Masterthesis/Bachelorthesis

Intro

Well-annotated and plentiful data sources are a crucial element in medical image analysis using deep learning.
The challenge lies in getting this data and having it annotated by experts in the field.
This difficulty is exacerbated when working with histopathological whole slide images.
These images can have resolutions of 100,000 pixels and contain multiple cancer types, and the staining process during a scan has a pronounced influence on the resulting image [1].
Previous works, such as CTransPath [1], have addressed these issues by using self-supervised learning, eliminating the need for a fully annotated dataset and requiring only a few annotated samples for validation and testing.
One can also draw inspiration from image processing algorithms such as “Image Style Transfer Using Convolutional Neural Networks” by L. Gatys et al. [2].

Example of image style transfer (Gatys et al. [2], Figure 3)

While the primary objective of neural style transfer is visual synthesis, the underlying style representation is highly relevant to histopathological image analysis.
By explicitly modeling histological style, it becomes possible to capture complementary information that is not directly tied to tissue labels but may still be critical for understanding latent phenotypic structure within whole slide images.

High-level model architecture: extension of an existing model by (1) extracting intermediate encodings, (2) computing the Gram matrix of the encoding, and (3) concatenating the original output and the Gram matrix (Wang et al. [1], Figure 2; Hien et al. [3], Figure 4)
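
The three steps in the caption can be sketched with plain NumPy. This is only an illustration under assumptions: the feature-map shape, the normalization by the number of spatial positions, and the choice to keep only the upper triangle of the symmetric Gram matrix are not taken from the cited works, and random arrays stand in for real encoder activations.

```python
import numpy as np

def gram_matrix(feature_map):
    """Gram matrix of a (channels, height, width) feature map.

    Entry (i, j) is the inner product of channels i and j over all
    spatial positions, normalized here by the number of positions.
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_aware_embedding(intermediate, output):
    """Concatenate the encoder output with the Gram matrix entries.

    The Gram matrix is symmetric, so only its upper triangle is kept
    (an assumption made to avoid redundant features).
    """
    gram = gram_matrix(intermediate)
    upper = gram[np.triu_indices(gram.shape[0])]
    return np.concatenate([output, upper])

rng = np.random.default_rng(0)
intermediate = rng.normal(size=(8, 14, 14))  # hypothetical intermediate encoding
output = rng.normal(size=32)                 # hypothetical final embedding
embedding = style_aware_embedding(intermediate, output)
```

The resulting vector could then be passed to any clustering method. Note that the Gram part grows quadratically with the number of channels, so some form of dimensionality reduction is likely needed for real encoders.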

Goal

The goal of this thesis is to incorporate style-transfer feature representations into unsupervised clustering frameworks and evaluate their usefulness for performance and model interpretability.
Specifically, your tasks in this thesis will be: preparing the dataset(s), extracting features from the whole slide images (WSI), clustering the WSI samples based on these features, and evaluating the clustering.

References

[1] Wang, Xiyue, et al. “Transformer-based unsupervised contrastive learning for histopathological image classification.” Medical image analysis 81 (2022): 102559.
[2] Gatys et al. “Image style transfer using convolutional neural networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[3] Hien, N. L. H., L. Van Huy, and N. Van Hieu. “Artwork style transfer model using deep learning approach.” Cybern. (2021): 127-137.

Contact

Jonas Harriehausen: jonas.harriehausen@uni-giessen.de
Anne-Christin Hauschild: anne-christin.hauschild@uni-giessen.de

Federated Learning for Distributed Omics Data Under Batch Effects and Cohort Shift

Offered as:

Masterthesis/Bachelorthesis

Intro

Biomedical omics data are frequently distributed across institutions, studies, or consortia, and data sharing is often constrained by governance rules, contractual agreements, and privacy considerations. Federated Learning (FL) enables collaborative model training without exchanging raw data, by aggregating locally trained model updates into a global model. While FL is attractive for sensitive biomedical applications, omics data pose specific challenges: strong cohort and batch effects, heterogeneous patient populations, variable measurement protocols, and frequent class imbalance. These factors result in non-IID client data and can lead to global models that perform unevenly across sites, limiting their translational value. This thesis explores federated learning for omics prediction tasks, focusing on robustness across cohorts and the mitigation of cohort-specific biases.

Goal

This thesis aims to benchmark federated learning for multi-cohort gene expression prediction under batch effects and cohort shift by implementing a federated baseline (e.g., FedAvg) and comparing it to centralized training on a multi-cohort task, evaluating robustness under cohort-based client splits with emphasis on both average and worst-client performance, and testing one heterogeneity-aware improvement (e.g., FedDAW) to document when and why it improves robustness.
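
As a rough orientation, the FedAvg aggregation step and the average versus worst-client reporting mentioned above might look like the following sketch. Models are reduced to flat parameter lists, and the cohort names and scores are made up for illustration.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each parameter across clients,
    weighted by the number of local training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

def summarize_clients(per_client_scores):
    """Report mean performance together with the worst-performing client."""
    scores = list(per_client_scores.values())
    worst_client = min(per_client_scores, key=per_client_scores.get)
    return {
        "mean": sum(scores) / len(scores),
        "worst_client": worst_client,
        "worst": per_client_scores[worst_client],
    }

# Toy example: three cohorts acting as clients, two parameters each.
local_models = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
sizes = [100, 100, 200]
global_model = fedavg(local_models, sizes)

# Hypothetical per-cohort scores of the global model.
summary = summarize_clients({"cohort_a": 0.90, "cohort_b": 0.70, "cohort_c": 0.80})
```

Reporting the worst client next to the mean makes it visible when a global model looks good on average but fails on one cohort, which is the situation batch effects and cohort shift tend to produce.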

References

[1] https://arxiv.org/abs/1602.05629

[2] https://arxiv.org/abs/1812.06127

[3] https://arxiv.org/abs/1910.06378

Contact

maryam.moradpour@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Species-Agnostic Transfer Learning for Species of Varying Evolutionary Distance

Offered as:

Masterthesis/Bachelorthesis

Intro

Major advances in next-generation sequencing technologies have opened new avenues for studying molecular mechanisms in a large number of organisms. In frequently examined organisms such as human and mouse, machine learning (ML) algorithms have been developed and employed to strengthen these analyses (Ren et al., 2022; Spänig et al., 2021). However, ML methods often require large quantities of data, and the scarcity of such data in less studied organisms restricts their utility. These methods also still rely on a defined set of genes, which restricts analysis to a single species. Cross-species analyses mostly rely on gene orthologies, reducing the number of genes to a commonly merged set, which is frequently incomplete (Zhao et al., 2021). We have recently developed a novel machine learning algorithm called Species-Agnostic-Transfer-Learning (SATL), which transfers the knowledge encoded in predictive ML models built on data from one species to predict corresponding classes, such as cell types, in another species without relying on orthology predictions (Park et al., 2021). Moreover, the linear combination of species-specific genes spanning the latent space during the alignment allows for functional annotations based on genes that would be missing from orthology-based approaches.

Goal

This project aims to develop Species-Agnostic-Transfer-Learning (SATL) for Cross-Insecta and Cross-Protostoma analysis, enabling knowledge transfer across evolutionarily more distant species. By leveraging existing public datasets, we will further develop and validate cross-species SATL for prediction and functional annotation, for instance in fly, beetle, and other species.

The proposed work includes:

Extracting suitable datasets from public databases
Training and evaluating SATL with an existing pipeline
Assessing model performance and interpretability
Comparing results according to the evolutionary distance between species

References

[1] https://academic.oup.com/bib/article/25/2/bbae004/7596256

Contact

anne-christin.hauschild@uni-giessen.de

LIME for xGNN4MI - Adapting an explainability method for Graph Neural Networks

Offered as:

Masterthesis

Intro

Deep Neural Networks (DNNs) are increasingly used in the medical domain to model biosignals such as electrocardiograms (ECGs) or electroencephalograms (EEGs). While these models can achieve high classification performance, their interpretability remains limited—particularly when inputs are derived from time-series data.

Local Interpretable Model-agnostic Explanations (LIME) is a widely-used technique that provides intuitive, local explanations by perturbing input features and fitting an interpretable surrogate model. However, its standard formulation is not directly applicable to graph-structured time-series inputs. Adapting LIME for time-aware graph explanations could bridge the gap between model accuracy and clinical interpretability, helping clinicians trust and understand predictions based on time-evolving signals.

Goal

The goal of this thesis is to adapt and evaluate the LIME framework for use with time-series inputs in Graph Neural Networks. The proposed work includes:

Reviewing existing literature on LIME, DNN explainability and time-series interpretation techniques
Defining a time-aware perturbation strategy for graph-based inputs derived from time-series data
Developing a modified LIME implementation that fits local surrogate models using temporal segments and graph-aware features
Evaluating the method on a real-world classification task using biosignal data and comparing it to integrated gradients (or another XAI method)
Validating clinical relevance

References

[1] https://doi.org/10.1038/s41746-026-02367-1
[2] https://github.com/emanuel-metzenthin/Lime-For-Time
[3] https://github.com/mdhabibi/LIME-for-Time-Series/
[4] https://github.com/HauschildLab/xGNN4MI

Contact

miriamcindy.maurer@med.uni-goettingen.de
philip.zaschke@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

LIME for Multi-Channel Timeseries - Adapting an explainability method for Biosignals

Offered as:

Internship, Bachelorthesis

Intro

Local Interpretable Model-agnostic Explanations (LIME) is a widely-used technique that provides intuitive, local explanations by perturbing input features and fitting an interpretable surrogate model. However, its standard formulation is not directly applicable to multi-channel time-series inputs. Adapting LIME for time-aware ECG and EEG explanations could bridge the gap between model accuracy and clinical interpretability, helping clinicians trust and understand predictions based on time-evolving signals.

Goal

The goal of this thesis is to adapt and evaluate the LIME framework for use with time-series inputs in Graph Neural Networks. The proposed work includes:

Reviewing existing literature on LIME, DNN explainability and time-series interpretation techniques
Defining a time-aware perturbation strategy for multi-channel inputs derived from ECG or EEG data
Developing a modified LIME implementation that fits local surrogate models using temporal segments and multi-channel features
Evaluating the method on a real-world classification task using biosignal data and comparing it to integrated gradients (or another XAI method)
Validating clinical relevance
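
The perturbation and surrogate-fitting steps listed above could be prototyped along the following lines. This is a sketch under assumptions, not the API of the `lime` package or of the linked repositories: interpretable features are (channel, segment) cells, perturbed cells are replaced by the channel mean, proximity is an exponential kernel on the fraction of perturbed cells, and the surrogate is a weighted ridge regression. The function name and all defaults are hypothetical.

```python
import numpy as np

def lime_multichannel(x, predict, n_segments=8, n_samples=500,
                      kernel_width=0.25, ridge=1e-3, seed=0):
    """Local surrogate explanation for one (channels, time) input.

    Returns an array (channels, n_segments) of surrogate coefficients.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_steps = x.shape
    bounds = np.linspace(0, n_steps, n_segments + 1).astype(int)
    n_features = n_channels * n_segments

    # Binary masks: 1 keeps a (channel, segment) cell, 0 perturbs it.
    masks = rng.integers(0, 2, size=(n_samples, n_features))
    masks[0] = 1  # always include the unperturbed instance

    baseline = x.mean(axis=1, keepdims=True)  # replacement value per channel
    preds = np.empty(n_samples)
    for k, mask in enumerate(masks):
        z = x.copy()
        off = np.where(mask.reshape(n_channels, n_segments) == 0)
        for c, s in zip(*off):
            z[c, bounds[s]:bounds[s + 1]] = baseline[c, 0]
        preds[k] = predict(z)

    # Proximity kernel: samples closer to the original get more weight.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression (with intercept) as the surrogate model.
    design = np.hstack([masks, np.ones((n_samples, 1))])
    weighted = design * weights[:, None]
    penalty = ridge * np.eye(n_features + 1)
    coef = np.linalg.solve(design.T @ weighted + penalty, weighted.T @ preds)
    return coef[:-1].reshape(n_channels, n_segments)

# Toy black box that only looks at channel 1, samples 40..49.
x = np.zeros((3, 80))
x[1, 40:50] = 1.0

def toy_model(z):
    return float(z[1, 40:50].mean())

importance = lime_multichannel(x, toy_model)
```

On this toy input the largest coefficient falls on channel 1, segment 4, which is the only region the black box uses. Choices such as the replacement value and the segment boundaries strongly shape the explanation, which is why the perturbation strategy is a task in its own right.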

References

[1] https://doi.org/10.48550/arXiv.1602.04938
[2] https://github.com/emanuel-metzenthin/Lime-For-Time
[3] https://github.com/mdhabibi/LIME-for-Time-Series/

Contact

miriamcindy.maurer@med.uni-goettingen.de
philip.zaschke@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Federated learning for medical data

Offered as:

Masterthesis

Intro

Sensitive patient information such as clinical data and medical registry data is often stored in critical healthcare infrastructure distributed across institutes. The analysis of such data harbours privacy risks and thus falls under a variety of legal regulations such as the General Data Protection Regulation (GDPR), which often makes the application of traditional machine learning algorithms impossible. In effect, the need to exchange data among institutions over the internet is a roadblock that hampers big-data-based medical innovations. Federated learning techniques aim to overcome the barrier of exchanging raw patient data and move towards large-scale medical data mining. The idea is to build a generalized global model without access to a shared dataset by merging locally trained models that capture the essence of the data.

Goal

In your thesis, you will implement federated learning (FL) algorithms that follow a privacy-by-design architecture. We will apply the algorithms to different clinical datasets such as eICU or MIMIC-III. Finally, you will evaluate whether the federated models can compete with a centralized machine learning model. In particular, we will examine whether a federated machine learning model can cope with challenges such as biases between different sites.

Contact

Anne-Christin Hauschild: anne-christin.hauschild@uni-giessen.de

Multiple Response Optimization Applied in Machine Learning for Medical and Biomedical data

Offered as:

Masterthesis

Intro

In many fields, especially in medical and biomedical research, the optimization of different quality features is of great importance. These characteristics often depend on multiple control factors in a complex and non-linear manner. The challenge arises when adjusting these control factors to optimize one characteristic potentially compromises the optimization of other characteristics. This complexity requires a holistic approach to ensure that all quality attributes are optimized simultaneously. To address this challenge, our proposed framework integrates machine learning techniques with multi-objective optimization algorithms. This integration is designed to meet the complex requirements of medical and biomedical data analysis, where the relationships between quality attributes and their control factors are multifaceted.

Goal

Our goal is to tackle multi-response optimization problems, which are common in scenarios that require the simultaneous optimization of multiple response variables through the adjustment of control factors. It involves using a multi-objective optimization approach capable of navigating complex, high-dimensional spaces. This strategy focuses on the use of different machine learning models for the modeling phase, along with the use of multi-objective meta-heuristic algorithms, such as NSGA-II, during the optimization phase. This approach allows experimenting with different machine learning methods, using multi-objective and single-objective optimization algorithms, and simultaneously maximizing important metrics such as accuracy, specificity, and sensitivity. This effort also includes a critical evaluation of the choice of multi-objective optimization over single-objective optimization, with the aim of identifying the best meta-parameters to achieve optimal results among multiple objectives. This comprehensive approach emphasizes the importance of a multifaceted optimization strategy in increasing the quality and efficiency of medical and biomedical data analysis.
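
The core idea behind multi-objective methods such as NSGA-II is ranking candidates by non-domination. The sketch below computes only the first non-dominated front for a handful of candidates. It is not NSGA-II itself (no crowding distance, no genetic operators), and the configuration names and metric values are invented.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Return the names of all candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical (sensitivity, specificity) pairs of four model configurations.
candidates = {
    "config_a": (0.90, 0.70),
    "config_b": (0.80, 0.85),
    "config_c": (0.75, 0.80),  # worse than config_b in both objectives
    "config_d": (0.60, 0.95),
}
front = pareto_front(candidates)
```

A single-objective approach would have to collapse sensitivity and specificity into one number beforehand. The Pareto front instead keeps every trade-off that is not strictly worse than another, which is what the comparison between the two optimization styles would examine.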

References

https://doi.org/10.1080/23311916.2018.1502242
https://doi.org/10.1155/2017/5907264
https://doi.org/10.1515/eqc-2018-0024

Contact

Maryam Moradpour: maryam.moradpour@med.uni-goettingen.de

Benchmarking Explainable AI Methods for Graph-Based ECG Classification using xGNN4MI

Offered as:

Masterthesis

Intro

Cardiovascular disease (CVD) poses a significant global health risk, with the incidence continuing to rise due to lifestyle changes and an aging population. Electrocardiography (ECG) is a cornerstone of non-invasive cardiac diagnostics, providing real-time insights into cardiac electrical activity. In recent years, graph neural networks (GNNs) have emerged as a powerful paradigm for ECG analysis by representing ECG signals as graphs, enabling the modeling of complex inter-lead and temporal relationships.

The xGNN4MI framework ([2]) introduces an explainable graph-based approach for myocardial infarction (MI) classification from multi-lead ECG signals, combining high-performance graph learning with post-hoc explainability techniques. While multiple explainable AI (XAI) methods have been proposed for GNNs, there is currently a lack of systematic benchmarking in the context of clinically relevant ECG graph models. As a result, it remains unclear which explainability techniques provide the most reliable, clinically meaningful, and stable explanations for ECG-based MI classification.

Goal

The main objective of this thesis is to design and conduct a structured benchmark of multiple XAI methods within the xGNN4MI framework for graph-based ECG classification. Specifically, the thesis will:

Integrate and apply multiple state-of-the-art XAI methods for GNNs within the existing xGNN4MI pipeline
Benchmark XAI methods with respect to explanation quality, stability, and faithfulness
Evaluate the clinical plausibility of explanations
Analyze trade-offs between computational cost, interpretability, and explanation consistency
Provide practical recommendations on suitable XAI techniques for explainable ECG-based GNN models
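
Two of the benchmark criteria can be illustrated with simple stand-in metrics. The sketch assumes that each XAI method returns one importance score per graph node. Stability is measured as the mean pairwise Jaccard overlap of the top-k nodes across repeated runs, and faithfulness as the drop in model output after zeroing the top-k nodes. These are common choices, not the definitions used by xGNN4MI, and the toy model and scores are invented.

```python
from itertools import combinations

def top_k(importances, k):
    """Indices of the k highest-scoring nodes."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return set(order[:k])

def stability(explanations, k):
    """Mean pairwise Jaccard overlap of the top-k node sets."""
    sets = [top_k(e, k) for e in explanations]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def fidelity_drop(predict, node_features, importances, k):
    """Drop in the model output after zeroing the k most important nodes."""
    masked = list(node_features)
    for i in top_k(importances, k):
        masked[i] = 0.0
    return predict(node_features) - predict(masked)

# Toy model that depends only on nodes 1 and 2 of a six-node graph.
def toy_predict(features):
    return 0.5 * features[1] + 0.5 * features[2]

node_features = [1.0] * 6
runs = [
    [0.1, 0.9, 0.8, 0.0, 0.2, 0.1],  # top-2 nodes: {1, 2}
    [0.0, 0.7, 0.9, 0.1, 0.1, 0.0],  # top-2 nodes: {1, 2}
    [0.1, 0.8, 0.2, 0.0, 0.9, 0.1],  # top-2 nodes: {1, 4}
]
stability_score = stability(runs, k=2)
faithful_drop = fidelity_drop(toy_predict, node_features, runs[0], k=2)
unfaithful_drop = fidelity_drop(toy_predict, node_features, runs[2], k=2)
```

The third run ranks an irrelevant node highly, so it lowers the stability score and produces a smaller output drop than the first run. A benchmark would compute such scores per XAI method and complement them with a clinical plausibility assessment.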

References

[1] https://doi.org/10.1109/JBHI.2023.3327025
[2] https://doi.org/10.21203/rs.3.rs-7721630/v1

Contact

miriamcindy.maurer@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Extending xGNN4MI for Multi-Label Graph-based ECG Classification

Offered as:

Masterthesis

Intro

Cardiovascular diseases (CVDs) often present with overlapping pathological patterns, making multi-label diagnosis a common and clinically relevant scenario in electrocardiography (ECG) analysis. In contrast to single-label classification, where each ECG recording is assigned exactly one diagnostic class, real-world ECGs may exhibit multiple concurrent conditions such as different myocardial infarction (MI) localizations or coexisting abnormalities.

Graph neural networks (GNNs) have shown strong performance in ECG analysis by modeling ECG signals as graphs that capture both temporal and inter-lead dependencies. The xGNN4MI framework introduces an explainable graph-based approach for MI classification, combining high-performing GNN architectures with post-hoc explainability methods to support clinical interpretation.

However, the current xGNN4MI framework is designed for single-label classification and does not explicitly address multi-label diagnostic settings. Extending xGNN4MI to support multi-label classification would significantly increase its clinical relevance and allow for more realistic modeling of complex cardiac conditions. Furthermore, multi-label learning introduces additional challenges for explainability, as explanations must reflect multiple simultaneous predictions.

Goal

The main objective of this thesis is to adapt and extend the xGNN4MI framework for multi-label graph-based ECG classification. Specifically, the thesis will:

Extend the xGNN4MI pipeline to support multi-label classification tasks
Adapt loss functions, output representations, and evaluation metrics for multi-label learning
Train and evaluate multi-label GNN models on ECG datasets with overlapping diagnostic labels
Analyze how multi-label predictions affect graph-based explanations
Investigate whether explanations remain stable and clinically meaningful across multiple predicted labels
Compare single-label and multi-label setups in terms of performance, interpretability, and clinical plausibility
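
What "adapting loss functions, output representations, and evaluation metrics" typically means in practice is sketched below in plain Python: one independent sigmoid output per label trained with binary cross-entropy, and F1 reported per label. The three label columns and all values are hypothetical, and a real implementation would use the deep learning framework of the existing pipeline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-label outputs."""
    eps = 1e-12
    losses = []
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

def per_label_f1(pred_rows, true_rows):
    """F1 score per label column from binary prediction and target rows."""
    scores = []
    for j in range(len(true_rows[0])):
        pairs = [(p[j], t[j]) for p, t in zip(pred_rows, true_rows)]
        tp = sum(1 for p, t in pairs if p == 1 and t == 1)
        fp = sum(1 for p, t in pairs if p == 1 and t == 0)
        fn = sum(1 for p, t in pairs if p == 0 and t == 1)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# One recording, two labels, uninformative logits.
loss = multilabel_bce([0.0, 0.0], [1, 0])

# Three recordings, three hypothetical labels; a row may contain several 1s.
predicted = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
actual = [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
f1 = per_label_f1(predicted, actual)
macro_f1 = sum(f1) / len(f1)
```

Unlike a softmax output, the per-label sigmoids allow several diagnoses to be predicted for the same recording, and per-label metrics reveal which conditions benefit or suffer from the multi-label setup.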

References

[1] https://doi.org/10.1109/JBHI.2023.3327025
[2] https://doi.org/10.21203/rs.3.rs-7721630/v1

Contact

miriamcindy.maurer@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Web-Based Clinical Demo Platform for Federated X-ray Analysis

Offered as:

Internship/Bachelorthesis

Intro

Artificial intelligence methods are increasingly used in medical imaging, but their outputs are often difficult to explore in a transparent and interactive way. In many real-world healthcare settings, imaging data and models are distributed across institutions, which makes federated learning (FL) a relevant framework for privacy-preserving collaborative AI. At the same time, clinicians and researchers need user-friendly tools that allow them to upload medical images, inspect predictions, and understand how different data sources or client models contribute to the final result.

This project focuses on building a web-based demonstration platform for federated chest X-ray analysis. The platform should allow users (e.g., doctors or researchers) to upload an X-ray image, run preprocessing and model inference in the backend, and display prediction results in an interpretable format. In addition to the global prediction, the interface should show client-wise predictions (or contributions) and support interactive adjustment of client weights, enabling users to explore how the aggregated FL prediction changes when certain clients are considered more relevant.

Goal

This internship aims to develop a functional prototype of a clinical-style web platform for federated X-ray prediction by implementing an image upload and preprocessing workflow, connecting a backend inference service that returns global and client-wise prediction results, and creating an interactive frontend interface where users can inspect a result table and manually adjust client aggregation weights to observe how the final prediction changes. As an optional creative extension, the system may include an API-based text summary module (e.g., OpenAI API) that generates a short interpretation of the prediction table for the user.
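
The interactive weight adjustment boils down to recomputing a weighted average of the client-wise predictions whenever a user changes a weight. A minimal backend-side sketch follows. The function name, client identifiers, and class labels are hypothetical, and the web layer (for example a FastAPI endpoint wrapping this function) is left out.

```python
def aggregate_predictions(client_probs, client_weights):
    """Weighted average of client-wise class probabilities.

    Weights are renormalized, so values coming from UI sliders do not
    need to sum to one.
    """
    total = sum(client_weights.values())
    if total <= 0:
        raise ValueError("at least one client weight must be positive")
    labels = next(iter(client_probs.values())).keys()
    return {
        label: sum(client_weights[c] * probs[label]
                   for c, probs in client_probs.items()) / total
        for label in labels
    }

# Hypothetical client-wise outputs for one uploaded image.
client_probs = {
    "client_1": {"finding": 0.8, "no_finding": 0.2},
    "client_2": {"finding": 0.4, "no_finding": 0.6},
}
equal = aggregate_predictions(client_probs, {"client_1": 1, "client_2": 1})
shifted = aggregate_predictions(client_probs, {"client_1": 3, "client_2": 1})
```

With equal weights the aggregated probability of `finding` is 0.6. Tripling the weight of `client_1` moves it to 0.7, which is the kind of change the result table in the frontend would display.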

References

[1] https://arxiv.org/abs/1602.05629
[2] https://react.dev/
[3] https://fastapi.tiangolo.com/

Contact

maryam.moradpour@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Federated Patch-Based Learning for Histopathology Image Classification

Offered as:

Masterthesis

Intro

Whole Slide Images (WSI) in histopathology are extremely large (gigapixel scale) and are commonly processed by splitting the slide into smaller image patches. In clinical practice and multi-center research, WSI data are distributed across institutions and cannot be shared easily due to legal and governance constraints. Federated Learning (FL) enables collaborative model training without exchanging raw data by aggregating locally trained model updates. However, patch-based pipelines introduce additional challenges: different staining protocols, scanner characteristics, tissue preparation, and site-specific sampling lead to domain shifts and heterogeneous patch distributions across sites. These factors can degrade performance and limit generalization. This thesis focuses on implementing a practical federated patch-based histopathology pipeline using the Camelyon dataset (lymph node metastasis detection) or a similar public WSI dataset, and evaluating how data heterogeneity influences model performance.

Goal

This thesis aims to implement and benchmark a federated patch-based pipeline for histopathology classification by setting up a patch extraction and baseline training workflow on Camelyon, implementing a federated learning baseline (e.g., FedAvg) and comparing it to centralized training, and evaluating robustness under heterogeneous “site” splits (non-IID clients) by reporting both overall performance and performance per client.
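
The patch extraction step can be pictured with the following sketch. It works on a small in-memory grayscale array and uses a brightness threshold to discard background tiles. The threshold values are assumptions, and a real whole slide image would be read region by region with a dedicated WSI reader rather than loaded as one array.

```python
import numpy as np

def extract_patches(slide, patch_size, min_tissue=0.5, background=220):
    """Tile a grayscale slide into non-overlapping patches and keep those
    whose fraction of darker-than-background pixels is at least min_tissue."""
    patches, coords = [], []
    height, width = slide.shape
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            patch = slide[y:y + patch_size, x:x + patch_size]
            tissue_fraction = (patch < background).mean()
            if tissue_fraction >= min_tissue:
                patches.append(patch)
                coords.append((y, x))
    return np.array(patches), coords

# Toy "slide": white background with a darker tissue block in one corner.
slide = np.full((512, 512), 255, dtype=np.uint8)
slide[0:256, 0:256] = 120
patches, coords = extract_patches(slide, patch_size=128)
```

Only the four tiles covering the tissue block are kept. In a federated setting each site would run this step locally, so differences in staining or scanner brightness can already change which patches pass the filter.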

References

[1] https://camelyon17.grand-challenge.org/

[2] https://arxiv.org/abs/1602.05629

[3] https://arxiv.org/abs/1812.06127

[4] https://arxiv.org/abs/1910.06378

Contact

maryam.moradpour@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Class-Wise Federated Dual Annealing Weighting for Multi-Class Medical Data

Offered as:

Masterthesis

Intro

Federated Learning (FL) enables collaborative model training across multiple institutions without sharing raw patient data, making it highly relevant for medical applications. However, in heterogeneous clinical settings, client data distributions often differ substantially, especially in multi-class or multi-label tasks where class prevalence and label quality vary across sites. Standard aggregation methods such as FedAvg use a single client-level weight and therefore treat each client contribution uniformly across all classes. This can be suboptimal when a client provides strong information for some classes but weak or biased information for others.

This thesis builds on the existing Federated Dual Annealing Weighting (FedDAW) approach, which adaptively computes client aggregation weights using a dual annealing strategy with fairness and robustness considerations. The project extends FedDAW toward class-wise aggregation, where client contributions are weighted differently for different classes. The goal is to improve performance and fairness in heterogeneous medical datasets, especially for multi-class or multi-label problems such as CheXpert.

Goal

This thesis aims to extend Federated Dual Annealing Weighting (FedDAW) with a class-wise aggregation mechanism for heterogeneous medical classification tasks. The student will implement a class-aware variant of FedDAW in which client contributions are weighted differently per class (initially focusing on the classifier head), evaluate the method on a multi-class or multi-label medical dataset such as CheXpert, and compare it against standard FedAvg and the original FedDAW in terms of overall performance, per-class performance, and robustness/fairness across clients.
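
To show what weighting client contributions "differently per class" can mean for the classifier head, here is a small sketch. It is not the FedDAW algorithm: the per-class weights are simply given, whereas FedDAW would search for them, and the array shapes are assumptions about a final linear layer with one weight row per class.

```python
import numpy as np

def classwise_aggregate(client_heads, class_weights):
    """Aggregate classifier heads with one weight per (client, class).

    client_heads:  array (n_clients, n_classes, n_features), the weight
                   rows of each client's final linear layer.
    class_weights: array (n_clients, n_classes) of non-negative weights.
    """
    # Normalize over clients separately for every class.
    normalized = class_weights / class_weights.sum(axis=0, keepdims=True)
    return (normalized[:, :, None] * client_heads).sum(axis=0)

# Two clients, two classes, three features per class row.
client_heads = np.stack([np.full((2, 3), 1.0), np.full((2, 3), 3.0)])

# Hypothetical weights: both clients count equally for class 0, while
# client 0 is trusted three times as much as client 1 for class 1.
class_weights = np.array([[1.0, 3.0],
                          [1.0, 1.0]])
global_head = classwise_aggregate(client_heads, class_weights)
```

With a single weight per client, as in FedAvg, both class rows would be mixed in the same proportion. Here the row for class 0 becomes 2.0 and the row for class 1 becomes 1.5, reflecting the different trust per class.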

References

[1] https://federated-learning.org/fl-aaai-2022/Papers/FL-AAAI-22_paper_43.pdf

[2] https://dl.acm.org/doi/abs/10.1145/3589335.3651514

[3] https://stanfordmlgroup.github.io/competitions/chexpert/

Contact

maryam.moradpour@med.uni-goettingen.de
anne-christin.hauschild@uni-giessen.de

Transformer architecture based encodings for medical data and diagnosis with deep learning

Offered as:

Masterthesis

Intro

Population-based health data is often characterized by a limited number of samples and a vast number of unknown (zero-valued) entries, a sparsity that poses a significant hurdle for the application of AI models (Lee et al. 2017). Moreover, hierarchical relations, such as those inherent in disease and symptom classification (e.g. ICD-10), are often ignored, hampering the capacity of current ML models to capture nuanced relationships between medical entities.

Goal

To address the challenges of data sparsity, heterogeneity, and implicit hierarchies, this project aims to use word embedding algorithms like BERT (Bidirectional Encoder Representations from Transformers) to encode distinctive medical identifiers such as ICD-10 codes. BERT employs a transformer architecture, a type of neural network that excels at capturing contextual relationships in sequential data (Devlin et al., 2019). As suggested by recent literature (Cho et al., 2023), we will tailor word embedding algorithms such as BERT to create a feature-space-agnostic representation of electronic health records (EHR) on large public clinical datasets such as eICU (Pollard et al., 2018) or MIMIC-III (Johnson et al., 2016). Finally, you will evaluate whether the transformer-based algorithms can compete with traditional machine learning models in terms of predictive performance for clinical outcomes, such as treatment success or survival time.
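
One simple way to expose the code hierarchy to a sequence model is to expand each ICD-10 code into tokens from coarse to fine before embedding. The sketch below illustrates that idea only. The three-level split and the on-the-fly vocabulary are assumptions, and no BERT model is involved at this stage.

```python
def icd10_tokens(code):
    """Expand an ICD-10 code into tokens from coarse to fine:
    leading letter, three-character category, and the full code."""
    code = code.strip().upper()
    tokens = [code[0], code[:3]]
    if len(code) > 3:
        tokens.append(code)
    return tokens

def encode_codes(codes, vocab):
    """Map a list of codes to integer ids, growing the vocabulary as needed."""
    ids = []
    for code in codes:
        for token in icd10_tokens(code):
            ids.append(vocab.setdefault(token, len(vocab)))
    return ids

vocab = {}
ids = encode_codes(["I21.4", "I25.1"], vocab)
```

Both codes share the token `I`, so related diagnoses overlap in the input sequence even when the full codes differ. A transformer trained on such sequences can then learn embeddings for every level of the hierarchy.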

Contact

Anne-Christin Hauschild: anne-christin.hauschild@uni-giessen.de
Jonas Harriehausen: jonas.harriehausen@uni-giessen.de

Quantifying Measurement Accuracy in T1 Magnetic Resonance Imaging

Offered as:

Bachelorthesis

Intro

Quantitative reconstruction methods in MRI can map tissue properties such as T1, T2, and T2*. These parameters serve as indicators of various diseases in radiological diagnostics.
This project focuses specifically on so-called T1 maps. T1 mapping is a quantitative imaging technique that displays the different tissue-specific T1 times within an examined body as a color-coded map.

Medical examinations exploit the fact that tissue-specific T1 times vary when pathological changes occur within an organ. In the heart, for example, T1 maps can be used to look for changes that arise after a myocardial infarction. In the prostate, T1 maps can be used, for example, to search for inflammatory alterations or to determine the disease stage of an existing prostate carcinoma.

To quantify the variability of the quantitative reconstruction methods in use, MRI reference measurements of a standardized phantom are performed at regular intervals. These reference measurements are carried out on different MRI scanners, and all reconstructed T1, T2, and T2* maps are subsequently evaluated automatically.
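
One common way to express such variability is the coefficient of variation of repeated phantom measurements, both within a scanner over time and between scanners. The following sketch uses invented T1 values and is only meant to show the arithmetic; which statistic the project uses is not specified here.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical T1 times (ms) of one phantom compartment, measured
# repeatedly on two scanners.
measurements = {
    "scanner_a": [1000.0, 1010.0, 990.0],
    "scanner_b": [1050.0, 1040.0, 1060.0],
}

# Variability over time on each scanner.
within = {name: coefficient_of_variation(v) for name, v in measurements.items()}

# Variability between scanners, based on their mean values.
between = coefficient_of_variation([mean(v) for v in measurements.values()])
```

In this toy data the repeated measurements on each scanner scatter by about 1 %, while the scanners differ from each other by roughly 3.4 %, so the between-scanner offset dominates.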

Goal

In this Bachelor's thesis, pattern recognition algorithms from computer vision (e.g. [2]) will be further developed in Python to enable the automated evaluation of the time series. Because diagnostic imaging data are provided in DICOM format via PACS databases within the clinical IT infrastructure, creating PACS API queries is also part of the project (e.g. [3]). Since image quality differs considerably between the quantitative mappings (T1, T2, T2*, ...), the performance of the methods will finally be evaluated for practical use.

References

[1] Al-Bourini et al. European Journal of Radiology (2023)
[2] docs.opencv.org/4.x/d4/dc6/tutorial_py_template_matching.html (accessed 28 May 2024)
[3] dicom.offis.de/en/dcmtk/ (accessed 28 May 2024)

Contact

Anne-Christin Hauschild: anne-christin.hauschild@uni-giessen.de
martin.heide@med.uni-goettingen.de

Lectures

Winter semester 2026/27

ML4Omics

Photo editing: JLU/Anna Sposato; unedited original photo "Studium": JLU/Katrina Friese; unedited original photo "Lehre": colourbox.de