Course data sheet
This course explores the intersection of genomics and machine learning through the lens of genomic language models. Students will learn about recent advances in bioinformatics that apply natural language processing techniques to interpret genomic sequences and to predict their functions and interactions.
Weekly topics:
- Introduction to genomics and machine learning
- Basics of natural language processing
- Data acquisition and processing in genomics
- Overview of genomic language models
- Practical session: Trying out different genomic language models
- Techniques for sequence modeling
- Training models on genomic data
- Benchmarking genomic language models
- Case studies: Applications of genomic language models
- Interpretability of models in genomics
- Ethical considerations in genomic research
- Advanced topics in genomic predictions
Selected literature:
• Goodfellow, Ian, et al. "Deep Learning." MIT Press, 2016, ISBN 9780262035613 (specific pages on neural networks).
• Lesk, Arthur M. "Introduction to Bioinformatics." Oxford University Press, 2019, ISBN 9780198794141 (relevant chapters on genomics).
• Ligeti, Balázs, et al. "ProkBERT family: genomic language models for microbiome applications." Frontiers in Microbiology, vol. 14, sec. Evolutionary and Genomic Microbiology, published 12 January 2024, https://doi.org/10.3389/fmicb.2023.1331233.
• Zhou, Zhihan, et al. "DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome." International Conference on Learning Representations (ICLR), 2024.
• Dalla-Torre, Hugo, et al. "Nucleotide Transformer: building and evaluating robust foundation models for human genomics." Nature Methods, 28 November 2024.
Required skills:
• Ability to apply machine learning techniques to biological data
• Understanding of genomic sequence analysis using computational models
• Skill in interpreting results from bioinformatic models for practical applications