Where do ensembles come from?

In the constraint-based reconstruction and analysis (COBRA) field, there are a variety of analyses and approaches that have alternative optimal solutions. Examples of this come up often when analyses involve integer programming, where variables in a problem are switched on/off to maximize or minimize an objective value. When integer programming is applied to fill in gaps in a genome-scale metabolic model (GEM) to fulfill a function, such as production of biomass, there are generally multiple sets of equally likely solutions to the problem (e.g. there is more than one unique set of reactions of the same size that enable growth).

Beyond examples of analyses where multiple solutions with the exact same objective value might exist, keeping track of multiple sub-optimal solutions might be valuable because a biologically-correct solution is almost never the exact same as the most likely numerical solution. Instead, we can hedge our bets by maintaining an ensemble of feasible solutions to a problem that all meet some minimum acceptable likelihood.

The primary aim of medusa is to make ensemble analyses more accessible to the COBRA field so that we can start accounting for this uncertainty to improve our predictions and improve our reconstructions more efficiently. The software is written and developed with usability as the top priority, secondary to performance and novelty. Thus, user feedback is essential–please do not hesitate to provide your thoughts or ask questions via an issue on the medusa github repository.