Genome-Scale Metabolic Model Reconstruction and Validation
Abstract
Microorganisms play a fundamental role in the production of valuable products. Understanding their complex metabolism is essential for improving bioprocesses and thereby profitably increasing both the quality and quantity of products. Genome-scale metabolic models make it possible to explore cellular metabolism and cell behavior under different environmental conditions.
This thesis presents a comprehensive exploration of a genome-scale metabolic model for a recently sequenced microorganism, the thraustochytrid strain T18, with a particular focus on the effect of the carbon source (glucose or xylose) in the culture medium and on the production of docosahexaenoic acid (DHA). The first genome-scale metabolic model for T18 was constructed with 2252 reactions and 1952 metabolites. Analysis of the model revealed that the fundamental difference between growth on glucose and growth on xylose lies in the utilization of cofactors such as NADPH and NADH. Furthermore, we identified 148 reactions that are essential for growth on xylose but not required for growth on glucose. However, owing to the substantial incompleteness of the T18 genomic data, approximately 40 percent of the metabolites in the model were dead ends, and no amount of gap-filling was sufficient to simulate the production of fatty acids such as DHA.
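To make the notion of a dead-end metabolite concrete, the following minimal sketch flags metabolites that can only be produced or only be consumed in a small network. The toy reactions, metabolite names, and the function `find_dead_ends` are illustrative assumptions and are not taken from the T18 model; a real analysis would run on the full stoichiometric model.

```python
from collections import defaultdict


def find_dead_ends(reactions):
    """Return metabolites that cannot be both produced and consumed.

    `reactions` maps a reaction name to (stoichiometry, reversible), where
    stoichiometry maps metabolite -> coefficient (negative = consumed).
    A reversible reaction counts as both producer and consumer.
    """
    produced = defaultdict(bool)
    consumed = defaultdict(bool)
    metabolites = set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            metabolites.add(met)
            if coeff > 0 or reversible:
                produced[met] = True
            if coeff < 0 or reversible:
                consumed[met] = True
    return sorted(m for m in metabolites if not (produced[m] and consumed[m]))


# Hypothetical toy network (not the T18 model).
toy_reactions = {
    "uptake": ({"glc": 1}, False),             # supplies glucose
    "r1": ({"glc": -1, "g6p": 1}, False),
    "r2": ({"g6p": -1, "pyr": 1}, False),
    "r3": ({"g6p": -1, "x5p": 1}, False),      # x5p has no consumer
    "export": ({"pyr": -1}, False),            # removes pyruvate
}

dead_ends = find_dead_ends(toy_reactions)
all_mets = {m for stoich, _ in toy_reactions.values() for m in stoich}
dead_end_fraction = len(dead_ends) / len(all_mets)
print(dead_ends, dead_end_fraction)
```

In this toy case one of four metabolites is a dead end; the same fraction, computed over the whole model, is the kind of quantity reported above.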
Consequently, to identify the most suitable gap-filling tool for our model, we conducted a comprehensive and systematic assessment of the available tools. We developed our own straightforward evaluation framework, which enabled us to demonstrate that gap-filling performance is primarily model-dependent. Even with widely used tools such as COBRApy, Meneco, and CarveMe, only about 50 percent of the removed essential reactions (the gaps) could be identified across all 108 published models available in the BiGG database.
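One way to score such an experiment is the fraction of deliberately removed reactions that a tool proposes again. The sketch below assumes this simple recovery metric; the function `recovery_rate` and the reaction identifiers are hypothetical and do not reproduce the evaluation framework itself.

```python
def recovery_rate(removed, suggested):
    """Fraction of removed reactions that appear among the suggestions."""
    removed = set(removed)
    if not removed:
        raise ValueError("at least one reaction must have been removed")
    return len(removed & set(suggested)) / len(removed)


# Hypothetical example: four essential reactions were removed from a model,
# and a gap-filling tool proposed three reactions, two of them correct.
removed_reactions = ["PGI", "PFK", "FBA", "TPI"]
suggested_reactions = ["PGI", "FBA", "XYLI"]

rate = recovery_rate(removed_reactions, suggested_reactions)
print(f"recovered {rate:.0%} of removed reactions")
```

Averaging such a rate over many models and removal sets gives a single figure per tool that can be compared across tools.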
Furthermore, we developed and implemented a novel ranking approach in the Python-based SBPRank package. SBPRank combines network topology properties, including betweenness and proximity, with phylogenetic information (similarity) to rank reactions for the gap-filling process. Our trimming approach narrows a large reaction database down to a smaller and more specific subset, significantly improving the speed of the gap-filling process and enhancing model validity. The results indicate that when 10-30 percent of reactions are missing from a model, searching the top 5 percent of the ranked universal pool can be sufficient for gap-filling. Moreover, when 10-85 percent of reactions are missing, exploring only the top 20 percent of the ranked universal pool can identify suitable reactions. This reduction in the size of the universal database yields more manageable simulation times while still achieving effective and accurate gap-filling results.
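The rank-then-trim idea can be illustrated with a minimal sketch. It assumes that each candidate reaction already has three precomputed scores (similarity, betweenness, proximity) and that they are combined by an equally weighted sum; the weighting, the function names, and the toy scores are assumptions for illustration and are not the SBPRank API or its actual scoring scheme.

```python
import math


def rank_reactions(scores, weights=(1.0, 1.0, 1.0)):
    """Order reaction IDs by a weighted sum of their three scores (descending).

    `scores` maps reaction ID -> (similarity, betweenness, proximity).
    Ties are broken alphabetically so the ordering is deterministic.
    """
    def combined(rid):
        return sum(w * s for w, s in zip(weights, scores[rid]))

    return sorted(scores, key=lambda rid: (-combined(rid), rid))


def trim_pool(ranked, fraction):
    """Keep the top `fraction` of a ranked list (at least one reaction)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical universal pool with made-up scores.
pool_scores = {
    "R_A": (0.9, 0.8, 0.7),
    "R_B": (0.1, 0.2, 0.3),
    "R_C": (0.5, 0.9, 0.9),
    "R_D": (0.4, 0.4, 0.4),
    "R_E": (0.0, 0.1, 0.0),
}

ranked = rank_reactions(pool_scores)
top_subset = trim_pool(ranked, 0.4)  # keep the top 40 percent of this toy pool
print(ranked, top_subset)
```

Only the trimmed subset would then be passed to the gap-filling step, which is where the reduction in search space and simulation time comes from.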