Open Topics

Here, you can find open topics for internships, Master’s and Bachelor’s theses, as well as research projects within our group. If you’re interested in any of these topics, please feel free to contact us to discuss your preferences in more detail. If you have your own research topic idea, please don’t hesitate to contact us to discuss and develop your concept further. You can start a research project in our group and develop it for your thesis in a direction aligned with your interests.

Improvements for MCC (Internship)

• Keywords: Metabolic modeling, Software Development, Curation

Overview
Long-term usability is a key attribute of good software. Project and code maintenance, however, can be quite challenging, especially in academia. Recently, Mostolizadeh et al. published the Mass-Charge Curation Tool. To the best of our knowledge, it is the first tool of its kind to automate curation of mass and charge balances in genome-scale metabolic models (GEMs). In the context of GEMs, this is a major advancement because annotating and curating such models is still a manual process. To enhance the user experience and create a codebase that is easy to maintain in the long term, we aim to improve MCC.
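To illustrate what a mass and charge balance check involves, the following minimal sketch tests a single reaction. It is not MCC's implementation; the function name, the input layout, and the formulas and charges in the usage note are assumptions made for this example.

```python
from collections import Counter


def reaction_imbalance(stoichiometry, formulas, charges):
    """Return (unbalanced_elements, net_charge) for one reaction.

    stoichiometry: metabolite id -> integer coefficient (negative = consumed)
    formulas:      metabolite id -> {element symbol: atom count}
    charges:       metabolite id -> integer charge
    All three layouts are assumptions of this sketch.
    """
    elements = Counter()
    net_charge = 0
    for met, coeff in stoichiometry.items():
        # Add up every element, weighted by the stoichiometric coefficient.
        for element, count in formulas[met].items():
            elements[element] += coeff * count
        net_charge += coeff * charges[met]
    # Keep only the elements whose atoms do not cancel out.
    unbalanced = {el: n for el, n in elements.items() if n != 0}
    return unbalanced, net_charge


# Illustrative values for ATP hydrolysis (assumed for this example).
FORMULAS = {
    "atp": {"C": 10, "H": 12, "N": 5, "O": 13, "P": 3},
    "h2o": {"H": 2, "O": 1},
    "adp": {"C": 10, "H": 12, "N": 5, "O": 10, "P": 2},
    "pi": {"H": 1, "O": 4, "P": 1},
    "h": {"H": 1},
}
CHARGES = {"atp": -4, "h2o": 0, "adp": -3, "pi": -2, "h": 1}

balanced = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1, "h": 1}
missing_proton = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1}
```

With these values, the complete reaction yields no unbalanced elements and a net charge of zero, while the variant without the proton reports one missing hydrogen and a charge offset of -1.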

Objective
We identified several ways to improve the MCC code base. In addition to improving the code’s general performance, we plan to enhance its structure and replace some dependencies. We hope these changes will improve the user and developer experience.

• Use uv as a modern project management system
• Replace pandas with polars under the hood
• Implement a command-line interface
• Download databases instead of issuing API requests
• General code improvements

Requirements
• Good knowledge of Python is essential
• Basic knowledge about metabolic models
• Initial experience with libraries like polars or numpy is a plus

References:
• MCC
• Polars on GitHub
• UV on GitHub

Contact: lukas.beierle
________________________________________________________________________________

Assessment of the Functionality in the Reconstructed Genome-scale Metabolic Models (Internship / Thesis)

• Keywords: Python, metabolic modeling, MEMOTE, Pathways

Overview
The MEMOTE score serves as the standard for evaluating genome-scale metabolic models and is widely accepted within systems biology. However, this score exhibits certain limitations. Its primary components are stoichiometric consistency and model annotation. While the MEMOTE score reflects the degree of curation of a metabolic model, it does not assess model functionality or the underlying metabolic processes. This project aims to introduce a set of complementary metrics designed to evaluate and compare metabolic models based on the metabolic functions they represent.

Objective
A set of complementary metrics has been defined to assess the underlying genome and metabolism represented in a metabolic model. The objective is to quantify the extent to which a model captures an organism’s metabolic potential. These metrics combine annotations from the reference genome or proteome with pathway databases. Several metrics can be computed directly from the model using established methods, including flux balance analysis and flux variability analysis. A detailed list of the metrics is available upon request. These metrics are intended to quantify the metabolic potential represented and to facilitate comparing the performance of multiple metabolic models.

Requirements
• A good knowledge of Python is mandatory.
• Initial experience with metabolic models in SBML format is a plus.
• Knowledge / experience with databases like KEGG/BRENDA/MetaCyc is also a plus.

References:
• MEMOTE on GitHub
• Cobrapy on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Curation of gene annotation in metabolic models (Internship)

• Keywords: Metabolic models, Reference Genome

Overview
The annotation and curation of metabolic models require significant time and effort, particularly when annotating the model's genes with high accuracy. Recreating or reannotating most existing metabolic models is challenging because reference genomes, gene identifiers, and annotations change over time. This project aims to systematically review the current state of gene annotations in genome-scale metabolic models (GEMs) and assess the need for updates.

Objective
The project will calculate gene and annotation coverage metrics using a set of reference GEMs and their corresponding source genomes. These metrics will inform decisions on whether to update primary gene identifiers for specific GEMs. In addition to evaluating annotation quality, the project aims to incorporate missing annotations into the models using existing data. The resulting application will enable users to either populate missing annotation fields or update existing identifiers to the latest versions.
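As a rough illustration of such coverage metrics, the sketch below compares the genes of a model with a reference genome. The function name, the metric definitions, and the gene identifiers are hypothetical and only serve as an example; the project's actual metrics may be defined differently.

```python
def annotation_coverage(model_genes, reference_genes, annotations):
    """Compute simple gene and annotation coverage metrics.

    model_genes:     gene ids used in the GEM
    reference_genes: gene ids present in the current reference genome
    annotations:     gene id -> {database name: identifier}
    The input layout is an assumption of this sketch.
    """
    model_genes = set(model_genes)
    reference_genes = set(reference_genes)
    if not model_genes:
        raise ValueError("the model contains no genes")
    # Genes whose primary identifier still exists in the reference genome.
    found = model_genes & reference_genes
    # Genes that carry at least one database annotation.
    annotated = {g for g in model_genes if annotations.get(g)}
    return {
        "gene_coverage": len(found) / len(model_genes),
        "annotation_coverage": len(annotated) / len(model_genes),
        "outdated_ids": sorted(model_genes - reference_genes),
    }


# Hypothetical identifiers, used only for illustration.
report = annotation_coverage(
    model_genes=["g1", "g2", "g3", "old_1"],
    reference_genes=["g1", "g2", "g3", "g4"],
    annotations={"g1": {"uniprot": "hypothetical_id"}, "g2": {}},
)
```

A low gene coverage or a long list of outdated identifiers would suggest that the primary gene identifiers of a model need an update.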

Requirements
• Good knowledge of Python is essential.
• Initial experience with metabolic models in SBML/cobrapy is a plus.
• Basic knowledge about genome file formats is also a plus.

Contact: lukas.beierle

________________________________________________________________________________

Antiviral peptide simulation against Epstein-Barr virus (Thesis)

• Keywords: Molecular dynamics simulation, Epstein-Barr virus, antiviral peptides

Overview
This project is planned as a collaboration between the research group and the group of Prof. Franz Cemič at the University of Applied Sciences (THM) in Giessen. The primary goal is to identify an antiviral peptide from literature or databases that demonstrates activity against the Epstein-Barr virus. Subsequently, the project will simulate the peptide’s attachment or binding to the viral protein surface.

Objective
The initial stage involves identifying an antiviral peptide with experimentally validated activity against Epstein-Barr virus (EBV) by searching relevant literature and specialized databases, such as AVPdb. The subsequent step is to select an appropriate simulation target, such as blocking or binding to viral surface proteins or inducing membrane lysis. The final stage consists of setting up and conducting the simulations.

Requirements
• A strong background in maths or physics is recommended.
• Knowledge of GROMACS or other MD-simulation software is a plus.
• Knowledge of the Linux command-line and Python is required.

Contact: lukas.beierle

________________________________________________________________________________

The impact of annotation quality on genome-scale metabolic reconstructions (Internship / Thesis)

• Keywords: Metabolic modeling, Metabolic reconstruction, genome annotation

Overview
Annotated genomes are essential for constructing draft genome-scale metabolic models. This project aims to investigate how annotation quality and completeness affect the resulting draft reconstructions. To this end, we will create test cases based on well-annotated genomes used to create GEMs. These test cases will assess the impact of missing, incomplete, or incorrect annotations on reconstruction tools. The project will utilize the carveme and gapseq tools for draft reconstruction.

Objective
The initial step involves precisely defining the test cases and determining the methods for modifying genome annotations, such as manual or random alterations. The analysis may be extended to include the core and pan genomes of the selected organisms. The resulting draft reconstructions will be evaluated for functionality and completeness.
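One way to create a random alteration is to remove a fixed fraction of annotations in a reproducible manner. The sketch below assumes annotations are available as a simple mapping from gene id to annotation; the function name and data layout are hypothetical, and real genome annotation files would need to be parsed first.

```python
import random


def drop_annotations(annotations, fraction, seed=0):
    """Remove a random fraction of gene annotations.

    annotations: gene id -> annotation (layout assumed for this sketch)
    fraction:    share of genes to remove, between 0 and 1
    seed:        fixed seed so that a test case can be recreated
    Returns the reduced mapping and the sorted list of removed gene ids.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    genes = sorted(annotations)  # sort first so the result depends only on the seed
    n_drop = round(fraction * len(genes))
    dropped = set(rng.sample(genes, n_drop))
    kept = {g: a for g, a in annotations.items() if g not in dropped}
    return kept, sorted(dropped)


# Hypothetical annotations for ten genes.
example = {f"gene_{i}": f"function_{i}" for i in range(10)}
kept, dropped = drop_annotations(example, fraction=0.3, seed=1)
```

Recording the seed together with the list of removed genes makes each test case traceable when the resulting draft reconstructions are compared.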

Requirements
• A good knowledge of Python is recommended.
• Any knowledge about genome annotation or metabolic models is a plus.

References:
• Carveme on GitHub
• Gapseq on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Large-scale Literature and Text-mining pipeline (Internship (master) / Thesis)

• Keywords: Python, PubMed, Text-mining, Literature crawling

Overview
Text and literature mining is a flexible tool for knowledge discovery with numerous applications. Previously, a prototype workflow was developed to identify closely related publications from a set of references. The workflow utilized the titles, keywords, and abstracts of all publications available in the download section of the PubMed literature database. The application was trained on a dataset of reference publications using a support vector machine. Text, titles, and abstracts were embedded using a pretrained large language model (LLM). To enhance user experience and eliminate dependency on reference publications, the project aims to create a precomputed search index of all text documents. This central index will offer greater flexibility and can be updated with new publications as required. The next step is to identify an appropriate search framework or algorithm, such as vector databases or retrieval-augmented generation (RAG) frameworks that utilize large language models (LLMs) to query the search index.

Objective
The first step is to implement a small download mechanism that will allow us to update the search index based on PubMed database releases. Next, we will compare tools such as FAISS to identify an appropriate, user-friendly data structure for the search index. The third step is to implement the search function to identify related publications or extract specific information directly from the index. Future plans include extending the search function to support pre-selection of full-text PDFs, potentially using additional tools such as Docling.

Additionally, this pipeline will be integrated into the NBREATH-DB database to facilitate information updates and reduce manual workload.
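To make the idea of a precomputed search index concrete, the following sketch implements a brute-force cosine-similarity search in plain Python. It is a deliberately simple stand-in for a library such as FAISS and does not use its API; the class name, document ids, and two-dimensional toy embeddings are assumptions for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class BruteForceIndex:
    """Tiny in-memory index; new documents can be appended at any time."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, doc_id, embedding):
        self._ids.append(doc_id)
        self._vectors.append(normalize(embedding))

    def search(self, query, k=3):
        """Return the k most similar documents as (doc_id, similarity) pairs."""
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy embeddings; real ones would come from a pretrained language model.
index = BruteForceIndex()
index.add("doc_a", [1.0, 0.0])
index.add("doc_b", [0.0, 1.0])
index.add("doc_c", [1.0, 1.0])
hits = index.search([1.0, 0.1], k=2)
```

A brute-force scan compares the query with every stored vector, which becomes slow for a collection the size of PubMed. This is the main reason to evaluate dedicated index structures.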

Requirements
• A good knowledge of Python is recommended.
• Basic knowledge of LLMs, text mining, or vector databases is a plus.

References:
• FAISS on GitHub
• pubmed_parser on GitHub
• Docling on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Architectures for deep learning based antimicrobial peptide generation (Internship / Thesis)

• Keywords: neural networks, dense, convolutional, recurrent, embeddings

Overview
The demand for new antibiotic treatments or alternative medications remains unmet. In a recent work (see the preprint below), we compared different model types for their ability to generate antimicrobial peptides, a promising class of molecules for new antibiotics. In this project, we want to examine whether any of the network architectures used in these models is clearly preferable for sequence generation. Therefore, a systematic comparison of different neural network architectures for variational and Wasserstein autoencoders is planned.

Objective
We want to systematically compare several neural network architectures with respect to sequence generation. Here, architecture refers to the type of layers used: dense, convolutional, or recurrent. In simple terms, we want to see whether there is a single architecture that performs best for a given autoencoder in sequence generation. Different sequence encodings can also be tested alongside the architectures.

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about workflow management systems like nextflow or snakemake is an advantage
• Knowledge about genome-scale metabolic models is also a plus

References:
• metaGEM on GitHub
• CarveMe on GitHub (metabolic model creation tool)
• Zorrilla, Francisco, et al. "metaGEM: reconstruction of genome scale metabolic models directly from metagenomes." Nucleic acids research 49.21 (2021): e126-e126. (It is for metagenomics)

Contact: lukas.beierle

________________________________________________________________________________

pymCADRE (Internship)

• Keywords: mCADRE, algorithms, optimization, Python packaging

Overview
In certain cases, analyzing the metabolism of a particular tissue or cell requires developing a context-specific metabolic model. This process is frequently referred to as contextualization. In previous studies, members of our research group contributed to the development of pymCADRE, a contextualization algorithm for metabolic models. pymCADRE is essentially the Python implementation of the original mCADRE algorithm, which was written in MATLAB. In certain studies, mCADRE-based algorithms have demonstrated superior accuracy in contextualizing mammalian cells compared to analogous algorithms. The objective of this project is to modernize and refine pymCADRE's implementation, with the overarching goal of enhancing its performance, usability, and long-term maintainability.

Objective
The first step is to review the current implementation of pymCADRE to assess its quality and performance. In addition to dependency updates, potential changes that could improve runtime and computational performance should be considered. To improve maintainability, a modern Python packaging toolchain, such as uv, should be integrated into the project. Furthermore, additional tests with pytest and a comprehensive user and developer manual are under consideration.

Requirements
• A very good knowledge of Python is mandatory.
• Initial experience with the following libraries is also an advantage: cobrapy and numpy (optional: numba and polars).
• A good knowledge of algorithms and mathematics is also required.
• Knowledge about Python packaging is also recommended.
• Initial experience with contextualization or metabolic models is a plus.

References:
• mCADRE publication
• pymCADRE publication
• pymCADRE GitHub
• UV tool on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Metabolic tasks for model validation (Internship)

• Keywords: metabolic models, metabolic reactions, simulation

Overview
Metabolic models constitute a versatile framework for in silico studies of metabolic processes. Recent studies have demonstrated that incorporating specific metabolic tasks during model development and validation can yield beneficial outcomes. In general, these tasks refer to essential reactions that most cells and tissues must perform. Richelle et al. published a list of 210 of these tasks.

Objective
The objective of this project is to implement a Python script that evaluates a cobrapy model against the 210 metabolic tasks. It is imperative that the script be free of any external dependencies, except for cobrapy. The model's evaluation across all tasks should be summarized or visualized at the end.

Requirements
• Good knowledge of Python is essential.
• First experience with SBML models or cobrapy is a plus.

References:
• Troppo, a tool implementing these tasks
• Gopalakrishnan et al.
• Richelle et al.

Contact: lukas.beierle

________________________________________________________________________________

Proof of concept: Namespace translation for metabolic models (Internship)

• Keywords: Metabolic models, BiGG, VMH, MetaNetX, data mining

Overview
The primary namespace of a metabolic model is derived from identifiers specific to the designated database used to create the model at hand. This may result in complications during subsequent tasks, including model evaluation and contextualization. The utilization of specific namespaces is a prerequisite for certain systems biology applications. Conversely, some systems biology applications are incompatible with the namespaces of different databases. Presently, no application is available to facilitate the translation of a model's primary namespace from one database to another. A critical concern is the potential for entities to be omitted during translation, as it is not guaranteed that all objects from one database will have an entry in the target database.

Objective
The objective of this project is to demonstrate the feasibility of translating one model’s primary namespace to another target namespace. If entities are absent after translation, it is necessary to cross-check them against multiple databases. The present investigation uses the Ensembl Biomart tool and the MetaNetX database as proxies for the most accurate translation.
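The translation step with a cross-check against further databases can be sketched as a chain of lookup tables. The function name and all identifiers below are hypothetical; real mapping tables would have to be built from the database content listed in the references.

```python
def translate_ids(model_ids, primary_mapping, fallback_mappings=()):
    """Translate identifiers and report the ones without a target entry.

    model_ids:         identifiers in the source namespace
    primary_mapping:   source id -> target id from the preferred resource
    fallback_mappings: further source id -> target id tables, tried in order
    Returns (translated, missing), where translated maps source to target ids.
    """
    translated = {}
    missing = []
    for source_id in model_ids:
        for table in (primary_mapping, *fallback_mappings):
            if source_id in table:
                translated[source_id] = table[source_id]
                break
        else:
            # No table knows this entity, so it would be lost in translation.
            missing.append(source_id)
    return translated, missing


# Hypothetical identifiers, used only for illustration.
primary = {"src_glucose": "tgt_glucose", "src_atp": "tgt_atp"}
fallback = {"src_water": "tgt_water"}
translated, missing = translate_ids(
    ["src_glucose", "src_atp", "src_water", "src_unknown"], primary, [fallback]
)
```

Returning the untranslated identifiers explicitly keeps omitted entities visible, so they can be reviewed manually rather than dropped silently.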

Requirements
• Good knowledge of Python is essential.
• First experience with SBML models, cobrapy, or metabolic modeling is a plus.

References:
• BiGG database content
• VMH database content
• MetaNetX database content
• Biomart database
• Mergem, a tool for partially translating namespaces

Contact: lukas.beierle

________________________________________________________________________________

Large-scale draft reconstructions of microbial communities (Internship / B.Sc Thesis)

• Keywords: Workflow, reconstruction, genome-scale metabolic models

Overview
A particular concern that arises in the context of genome-scale metabolic models is the potential unavailability of the code utilized for their creation. This is notable in studies that have developed multiple models, such as a series of draft reconstructions or a community model. Ensuring reproducibility is of great importance in scientific research, as it guarantees that, in the event that a created model is later adopted by other researchers, they will be able to recreate it with the most up-to-date annotations and references. Furthermore, the utilization of a reproducible pipeline during the model creation process proves to be of considerable benefit when conducting research on microbial communities. This approach ensures that all models are constructed employing the same tools and under identical conditions, thereby enhancing the reliability of the research outcomes. Another important aspect in terms of modeling a microbial community is that the creation of high-quality models for each member can take years (in the worst case). The objective of this workflow is to initiate the process with annotated genomes, aiming to generate a draft community model. This approach is designed to guarantee that each member has at least a draft model and uses a high level of automation.

Objective
The objective of this project is to develop a small workflow for the creation of multiple genome-scale metabolic models directly based on annotated genomes. A comparative analysis of the available tools for reconstruction and annotation is necessary, followed by their integration into the workflow. Our group maintains a list of the members of a particular microbial community, which can serve as a direct test case for reconstructing a set of models from the genomes of the community.

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about workflow management systems like nextflow or snakemake is an advantage
• Knowledge about genome-scale metabolic models is also a plus

References:
• metaGEM on GitHub
• CarveMe on GitHub (metabolic model creation tool)
• Zorrilla, Francisco, et al. "metaGEM: reconstruction of genome scale metabolic models directly from metagenomes." Nucleic acids research 49.21 (2021): e126-e126. (It is for metagenomics)

Contact: lukas.beierle

________________________________________________________________________________

Quality reporting and visualization of metabolic models (Internship / B.Sc project)

• Keywords: Workflow, visualization, genome-scale metabolic models

Overview
Metabolic models are abstract representations of the complete metabolism of a microorganism, a specific tissue, or a cell. Such a model is typically comprised of interconnected reactions, metabolites, and genes. These models tend to be very complex, especially for eukaryotic organisms. Moreover, such models are not stored in a format that is easily accessible or manually readable by humans. This factor makes their analysis a challenging task. A detailed analysis is crucial for understanding metabolic models. However, there is a paucity of tools capable of generating comprehensive and readily comprehensible reports on metabolic models. This project has as its primary focus the exploration of diverse methodologies for visualizing the interconnected components of metabolic models, with the objective of enhancing analysis and making these models more accessible and user-friendly.

Objective
The objective of this project is to produce a comprehensive and accessible report for metabolic models. This encompasses the retrieval of metadata, the analysis of annotations relating to reactions and metabolites in the model, and the visualization of individual components (compartments, reactions, genes). A further consideration is the visualization of the biomass objective function, or more precisely, the objective function of the model, inclusive of all its reactants and products. The final step in the process involves compiling all figures, along with their respective descriptions, into a single PDF file.

Requirements
• Good knowledge of Python
• Knowledge of metabolic models is a plus.

References
Several tools for the analysis of metabolic models:
• refineGEMs on GitHub
• Memote on GitHub
Sample visualization of different reactions: CORDA algorithm on GitHub

Contact: lukas.beierle

_______________________________________________________________________________

Gene expression in genome-scale metabolic models (Internship / Thesis)

• Keywords: Genome-scale metabolic models, gene expression, modeling

Overview
A substantial proportion of genome-scale metabolic models focuses on prokaryotic organisms. The transition to models for eukaryotic organisms poses a significant challenge. For instance, modeling eukaryotic gene expression with a GEM is possible. However, each component interacting with the gene, such as enzymes, cofactors, or transcription factors, must be defined through its own reactions. This results in a sizable collection of sub-reaction networks for expressing individual genes.

Objective
This project aims to develop a proof-of-concept model for the preliminary design of components associated with eukaryotic gene expression reactions. In this regard, genome-scale metabolic models of plants may serve as a valuable template. In the event of a favorable outcome, our objective is to incorporate the reactions of the eukaryotic gene expression into our model of the Epstein-Barr virus.

Requirements
• Basic knowledge of Python and the Linux command-line
• Knowledge about genome-scale metabolic models is an advantage
• No fear of literature research

References:
• Lynch, Michael, and Georgi K. Marinov. "The bioenergetic costs of a gene." Proceedings of the National Academy of Sciences 112.51 (2015): 15690-15695.
• Feist, Adam M., et al. "Reconstruction of biochemical networks in microorganisms." Nature Reviews Microbiology 7.2 (2009): 129-143.

Contact: lukas.beierle

________________________________________________________________________________

Updating / re-implementing popular systems biology software (Internship)

• Keywords: Systems biology, software development

Overview
Like other bioinformatics domains, systems biology relies on open-source software developed by different research groups or individuals. In many instances, the termination of software project funding results in the subsequent abandonment of these projects.

Objective
Most systems biology projects focused on genome-scale metabolic model reconstruction rely on tools such as BOFdat, which creates biomass objective functions. Another example is NCMW, a tool for creating nasal microbial communities. The goal is to completely rewrite one of these tools and to add the bug fixes and useful features that have accumulated in recent years without any development activity. The precise implementation details and objectives will be defined with the supervisor at the start of the project.

Requirements
• Good knowledge of Python is essential
• Knowledge about the Linux command line and software development is a plus
• Knowledge about the tools mentioned is a plus

References:
• BOFdat on GitHub
• Lachance, Jean-Christophe, et al. "BOFdat: Generating biomass objective functions for genome-scale metabolic models from experimental data." PLoS computational biology 15.4 (2019): e1006971.
• NCMW on GitHub

Contact: Reihaneh.Mostolizadeh

___________________________________________________________________________

Kmer analysis in Python (Internship)

• Keywords: Peptides, Kmers, visualization, statistics

Overview
Kmers have been demonstrated to be useful tools for various sequence analysis applications in bioinformatics. Kmer analysis entails multiple steps, including counting, comparison, visualization, and statistical analysis.

Objective
This project aims to augment a Kmer counter, written in Python, with additional functionality. In addition to the counting function, the software should be capable of simultaneously comparing multiple groups of peptides. A range of visualization and statistical assessments should be employed to conduct a comprehensive comparative analysis.
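As a minimal sketch of the counting and comparison steps, the functions below count overlapping Kmers and convert the counts of each peptide group into relative frequencies. The function names and the example peptides are assumptions and do not reflect the existing Kmer counter.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping substrings of length k in one sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def group_frequencies(groups, k):
    """Compute relative Kmer frequencies per group.

    groups: group name -> list of peptide sequences (layout assumed here)
    Returns group name -> {kmer: relative frequency within the group}.
    """
    result = {}
    for name, peptides in groups.items():
        counts = Counter()
        for peptide in peptides:
            counts.update(count_kmers(peptide, k))
        total = sum(counts.values())
        # Relative frequencies make groups of different size comparable.
        result[name] = {kmer: n / total for kmer, n in counts.items()} if total else {}
    return result


# Made-up peptide sequences for illustration.
frequencies = group_frequencies({"group_a": ["AAA"], "group_b": ["AK", "KA"]}, k=2)
```

The resulting frequency tables could then be passed to statistical tests or plotting libraries for the comparative analysis.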

Requirements
• Basic knowledge of Python
• Experience with the following Python libraries is a plus:
‣ Matplotlib, Seaborn, Bokeh
‣ Polars, scipy

Contact: lukas.beierle

________________________________________________________________________________

Peptide clustering benchmark (Internship / B.Sc Thesis)

• Keywords: Clustering, Embeddings, Benchmark, UMAP, t-SNE

Overview
Clustering peptides or proteins is a common task in bioinformatics. Since computers are not inherently capable of directly comprehending sequences, a range of techniques has been devised to facilitate the encoding of these sequences in a manner that is more amenable to digital processing. However, the selection of an appropriate encoding technique remains a crucial challenge, given the multifaceted nature of these tasks.

Objective
The objective is to establish a set of general criteria for evaluating the efficacy of peptide clustering methodologies employing manifold learning techniques such as UMAP and t-SNE. A comparison of the various methods for encoding peptides is essential for a comprehensive evaluation. This evaluation should be based on a set of reference peptides and the established criteria. The following encoding methods were selected for further analysis: peptide property-based, pseudo amino acid composition, and embeddings based on neural networks.
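To show what encoding a peptide as a fixed-length numeric vector means, the sketch below computes the plain amino acid composition. This is a simpler relative of the pseudo amino acid composition named above, not that method itself; the function name and the example peptide are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def aa_composition(peptide):
    """Encode a peptide as the relative frequency of each standard amino acid.

    Returns a list of 20 floats ordered like AMINO_ACIDS. Every peptide maps
    to a vector of the same length, regardless of its sequence length.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    unknown = set(peptide) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


# Made-up peptide for illustration.
vector = aa_composition("AAKK")
```

Vectors of this kind can be stacked into a matrix and passed to a manifold learning method. Since the composition ignores residue order, it is a useful baseline against which order-aware encodings can be compared.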

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about manifold learning algorithms like UMAP or t-SNE is a plus

References:
• openTSNE on GitHub
• umap-learn on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Re-implementation of the Wasserstein Auto-Encoder in Keras 3 (Internship)

• Keywords: Auto-encoders, deep learning, generative models

Overview
Generative models, such as auto-encoders, have found widespread application in tasks related to generating peptides or molecules in the context of drug discovery. The Wasserstein auto-encoder (WAE) represents a sophisticated variation of the conventional variational auto-encoder, yet it frequently remains unnoticed due to the complexity of its underlying theory.

Objective
This project aims to re-implement an existing WAE in the most recent version of the deep learning framework Keras. Moreover, implementing additional kernel functions, in addition to the current ones, is necessary.
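To indicate what a kernel function does in this setting, the sketch below implements two common kernels and an unbiased estimate of the maximum mean discrepancy (MMD) between samples from the prior and encoded samples. It is written in plain Python for readability and says nothing about the kernels of the existing implementation; a Keras version would express the same arithmetic with tensor operations. The inverse multiquadratic kernel is, to our understanding, the one used in the experiments of the cited WAE paper. All function names are assumptions.

```python
import math


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel."""
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))


def imq_kernel(x, y, c=1.0):
    """Inverse multiquadratic kernel: c / (c + ||x - y||^2)."""
    return c / (c + squared_distance(x, y))


def mmd(prior_samples, encoded_samples, kernel):
    """Unbiased MMD estimate between two sets of latent vectors.

    Within-set terms skip the diagonal (i == j), so the estimate
    can become slightly negative for very similar samples.
    """
    n, m = len(prior_samples), len(encoded_samples)
    if n < 2 or m < 2:
        raise ValueError("each sample set needs at least two points")
    within_prior = sum(
        kernel(prior_samples[i], prior_samples[j])
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    within_encoded = sum(
        kernel(encoded_samples[i], encoded_samples[j])
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    cross = sum(
        kernel(p, q) for p in prior_samples for q in encoded_samples
    ) / (n * m)
    return within_prior + within_encoded - 2.0 * cross


# Toy one-dimensional latent vectors.
prior = [[0.0], [1.0]]
close = [[0.1], [1.1]]
far = [[10.0], [11.0]]
```

Because the kernel is passed as an argument, an additional kernel function can be evaluated without changing the MMD code.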

Requirements
• Good knowledge of Python is essential
• Knowledge of Keras and TensorFlow is a plus
• No fear of higher mathematics (Kernel functions)

References:
• Website of the Keras framework
• Tolstikhin, Ilya, et al. "Wasserstein auto-encoders." arXiv preprint arXiv:1711.01558 (2017).

Contact: lukas.beierle

________________________________________________________________________________

Large-scale genome analysis of Epstein-Barr viruses (EBVs)

Overview
The Epstein-Barr virus (EBV) is a common virus and a member of the herpes virus family. It is best known as the cause of infectious mononucleosis, often referred to as “mono” or the “kissing disease,” a condition characterized by fever, sore throat, swollen lymph nodes, and extreme fatigue. Currently, there are no approved antivirals for this virus (1). To better understand EBV, the idea is to conduct a large-scale genomic analysis of all available viral genomes.

Objective
The objective is to implement an extensible and high-throughput workflow for comparing all available EBV genomes to related viral genomes, with a focus on viral strains and the core differences from the most closely related groups of viruses.

Requirements
• Good knowledge of genomics.
‣ Focus on viruses is a plus.
• Experience with workflow management systems like Nextflow or Snakemake.
• Good knowledge of Python.

References
1. Young, Lawrence S., and Alan B. Rickinson. "Epstein–Barr virus: 40 years on." Nature Reviews Cancer 4.10 (2004): 757-768.

Contact: lukas.beierle

________________________________________________________________________________

Energetic parameters for genome-scale metabolic models of DNA viruses

Overview
Given the persistent threat posed by viral outbreaks, a comprehensive understanding of their mechanisms is imperative for developing efficacious countermeasures. Genome-scale metabolic models (GEMs) have emerged as a powerful tool for simulating viral replication, particularly in the case of RNA viruses. However, a critical gap remains in the lack of comprehensive GEMs for DNA viruses. We want to contribute to closing this gap by developing and refining GEMs for DNA viruses, thereby expanding the applicability of these models to a more extensive range of pathogens. To simulate the replication of the virus with a GEM, information about metabolic reactions and metabolites (and precursors) is crucial.

Objective
The goal is to create a foundation model for DNA viruses based on current literature and existing GEMs for eukaryotes like plants. This involves some literature research and work with existing GEMs.

Based on that, the following steps should be completed:
• Identify the energetic requirements (ATP) for reactions involved in RNA synthesis
• Add missing reactions and metabolites to a base model
• Research the baseline levels of metabolites in a human cell
‣ Or find a method to identify/estimate the metabolite levels

Requirements
• Good knowledge of Python.
• Background with viral infection is a plus.
• Knowledge about GEMs, SBML, and COBRAPy is also a plus.

References
1. Morens, David M., Peter Daszak, and Jeffery K. Taubenberger. "Escaping Pandora’s box—another novel coronavirus." New England Journal of Medicine 382.14 (2020): 1293-1295.
2. Aller, Sean, et al. "Integrated human-virus metabolic stoichiometric modelling predicts host-based antiviral targets against Chikungunya, Dengue and Zika viruses." Journal of The Royal Society Interface 15.146 (2018): 20180125.

Contact: lukas.beierle

________________________________________________________________________________

Improvements for MCC (Internship)

• Keywords: Metabolic modeling, Software Development, Curation

Objective
We identified several ways to improve the MCC code base. In addition to improving the code’s general performance, we plan to enhance its structure and replace some dependencies. We hope these changes will improve the user and developer experience.

Requirements
• Good knowledge of Python is essential
• Basic knowledge about metabolic models
• Initial experience with libraries like polars or numpy is a plus

References:
• MCC
• Polars on GitHub
• UV on GitHub

Contact: lukas.beierle
________________________________________________________________________________

Assessment of the Functionality in the Reconstructed Genome-scale Metabolic Models (Internship / Thesis)

• Keywords: Python, metabolic modeling, MEMOTE, Pathways

Requirements
• A good knowledge of Python is mandatory.
• Initial experience with metabolic models in SBML format is a plus.
• Knowledge / experience with databases like KEGG/BRENDA/MetaCyc is also a plus.

References:
• MEMOTE on GitHub
• Cobrapy on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Curation of gene annotation in metabolic models (Internship)

• Keywords: Metabolic models, Reference Genome

Requirements
• Good knowledge of Python is essential.
• Initial experience with metabolic models in SBML/cobrapy is a plus.
• Basic knowledge about genome file formats is also a plus.

Contact: lukas.beierle

________________________________________________________________________________

Antiviral peptide simulation against Epstein-Barr virus (Thesis)

• Keywords: Molecular dynamics simulation, Epstein-Barr virus, antiviral peptides

Requirements
• A strong background in maths or physics is recommended.
• Knowledge of GROMACS or other MD-simulation software is a plus.
• Knowledge of the Linux command-line and Python is required.

Contact: lukas.beierle

________________________________________________________________________________

The impact of annotation quality on genome-scale metabolic reconstructions
(Internship / Thesis)

• Keywords: Metabolic modeling, Metabolic reconstruction, genome annotation

Requirements
• A good knowledge of Python is recommended.
• Any knowledge about genome annotation or metabolic models is a plus.

References:
• CarveMe on GitHub
• Gapseq on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Large-scale Literature and Text-mining pipeline (Internship (master) / Thesis)

• Keywords: Python, PubMed, Text-mining, Literature crawling

Requirements
• A good knowledge of Python is recommended.
• Basic knowledge of LLMs, text mining, or vector databases is a plus.

References:
• FAISS on GitHub
• pubmed_parser on GitHub
• Docling on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Architectures for deep learning based antimicrobial peptide generation (Internship / Thesis)

• Keywords: neural networks, dense, convolutional, recurrent, embeddings

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about workflow management systems like Nextflow or Snakemake is an advantage
• Knowledge about genome-scale metabolic models is also a plus

References:
• metaGEM on GitHub
• CarveMe on GitHub (metabolic model creation tool)
• Zorrilla, Francisco, et al. "metaGEM: reconstruction of genome scale metabolic models directly from metagenomes." Nucleic acids research 49.21 (2021): e126-e126. (It is for metagenomics)

Contact: lukas.beierle

________________________________________________________________________________

pymCADRE (Internship)

• Keywords: mCADRE, algorithms, optimization, Python packaging

References:
• mCADRE publication
• pymCADRE publication
• pymCADRE GitHub
• UV tool on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Metabolic tasks for model validation (Internship)

• Keywords: metabolic models, metabolic reactions, simulation

Requirements
• Good knowledge of Python is essential.
• First experience with SBML models or cobrapy is a plus.

References:
• Troppo, a tool implementing these tasks
• Gopalakrishnan et al.
• Richelle et al.

Contact: lukas.beierle

________________________________________________________________________________

Proof of concept: Namespace translation for metabolic models (Internship)

• Keywords: Metabolic models, BiGG, VMH, MetaNetX, data mining

Requirements
• Good knowledge of Python is essential.
• First experience with SBML models, cobrapy, or metabolic modeling is a plus.

References:
• BiGG database content
• VMH database content
• MetaNetX database content
• Biomart database
• Mergem, a tool for partially translating namespaces

Contact: lukas.beierle

________________________________________________________________________________

Large-scale draft reconstructions of microbial communities (Internship / B.Sc Thesis)

• Keywords: Workflow, reconstruction, genome-scale metabolic models

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about workflow management systems like Nextflow or Snakemake is an advantage
• Knowledge about genome-scale metabolic models is also a plus

References:
• metaGEM on GitHub
• CarveMe on GitHub (metabolic model creation tool)
• Zorrilla, Francisco, et al. "metaGEM: reconstruction of genome scale metabolic models directly from metagenomes." Nucleic acids research 49.21 (2021): e126-e126. (It is for metagenomics)

Contact: lukas.beierle

________________________________________________________________________________

Quality reporting and visualization of metabolic models (Internship / B.Sc project)

• Keywords: Workflow, visualization, genome-scale metabolic models

Requirements
• Good knowledge of Python
• Knowledge of metabolic models is a plus.

References
Several tools for the analysis of metabolic models:
• refineGEMs on GitHub
• Memote on GitHub
Sample visualization of different reactions: CORDA algorithm on GitHub

Contact: lukas.beierle

_______________________________________________________________________________

Gene expression in genome-scale metabolic models (Internship / Thesis)

• Keywords: Genome-scale metabolic models, gene expression, modeling

Requirements
• Basic knowledge of Python and the Linux command-line
• Knowledge about genome-scale metabolic models is an advantage
• No fear of literature research

Contact: lukas.beierle

________________________________________________________________________________

Updating / re-implementing popular systems biology software (Internship)

• Keywords: Systems biology, software development

Overview
Like other bioinformatics domains, systems biology relies on open-source software developed by different research groups or individuals. When funding for such a software project ends, the project is often abandoned.

Requirements
• Good knowledge of Python is essential
• Knowledge about the Linux command line and software development is a plus
• Knowledge about the tools mentioned is a plus

References:
• BOFdat on GitHub
• Lachance, Jean-Christophe, et al. "BOFdat: Generating biomass objective functions for genome-scale metabolic models from experimental data." PLoS computational biology 15.4 (2019): e1006971.
• NCMW on GitHub

Contact: Reihaneh.Mostolizadeh

___________________________________________________________________________

Kmer analysis in Python (Internship)

• Keywords: Peptides, Kmers, visualization, statistics

Overview
Kmers are useful tools for various sequence analysis applications in bioinformatics. Working with them involves several steps, including counting, comparison, visualization, and statistical analysis.
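
To make the counting and comparison steps concrete, here is a minimal sketch in plain Python. It is only an illustration, not part of any existing code base: the function names `count_kmers` and `jaccard_similarity` and the example peptide strings are assumptions made for this example, and Jaccard similarity is just one of several possible comparison measures.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k over the sequence, one position at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def jaccard_similarity(a: Counter, b: Counter) -> float:
    """Compare two k-mer profiles by the overlap of their distinct k-mers."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0  # two empty profiles share nothing
    return len(keys_a & keys_b) / len(union)


# Hypothetical peptide sequences, used only to show the calls.
counts_a = count_kmers("ACDEACDE", 3)
counts_b = count_kmers("ACDEFGHI", 3)
similarity = jaccard_similarity(counts_a, counts_b)
```

The resulting `Counter` objects can be passed directly to plotting or statistics libraries, for example by converting them into a data frame.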

Requirements
• Basic knowledge of Python
• Experience with the following Python libraries is a plus:
‣ Matplotlib, Seaborn, Bokeh
‣ Polars, scipy

Contact: lukas.beierle

________________________________________________________________________________

Peptide clustering benchmark (Internship / B.Sc Thesis)

• Keywords: Clustering, Embeddings, Benchmark, UMAP, t-SNE

Requirements
• Good knowledge of Python and the Linux command-line is essential
• Knowledge about manifold learning algorithms like UMAP or t-SNE is a plus

References:
• openTSNE on GitHub
• umap-learn on GitHub

Contact: lukas.beierle

________________________________________________________________________________

Re-implementation of the Wasserstein Auto-Encoder in Keras 3 (Internship)

• Keywords: Auto-encoders, deep learning, generative models

Objective
This project aims to re-implement an existing WAE in the most recent version of the deep learning framework Keras. In addition, kernel functions beyond the ones currently available need to be implemented.
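
As a rough idea of what such kernel functions look like, the sketch below writes two kernels in plain Python: a Gaussian (RBF) kernel and the inverse multiquadratic kernel used by Tolstikhin et al. It is not taken from the existing WAE implementation; the function names and the default parameters `sigma` and `c` are assumptions made for illustration, and a Keras version would operate on tensors rather than Python lists.

```python
import math
from typing import Sequence


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))


def imq_kernel(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Inverse multiquadratic kernel: c / (c + ||x - y||^2)."""
    return c / (c + squared_distance(x, y))
```

Both kernels return 1 for identical inputs and decay towards 0 as the inputs move apart; the inverse multiquadratic kernel decays more slowly, that is, it has heavier tails.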

Requirements
• Good knowledge of Python is essential
• Knowledge of Keras and TensorFlow is a plus
• No fear of higher mathematics (Kernel functions)

References:
• Website of the Keras framework
• Tolstikhin, Ilya, et al. "Wasserstein auto-encoders." arXiv preprint arXiv:1711.01558 (2017).

Contact: lukas.beierle

________________________________________________________________________________

Large-scale genome analysis of Epstein-Barr viruses (EBVs)

References
1. Young, Lawrence S., and Alan B. Rickinson. "Epstein–Barr virus: 40 years on." Nature Reviews
Cancer 4.10 (2004): 757-768.

Contact: lukas.beierle

________________________________________________________________________________
