Introduction
Domesticated ruminants are a crucial source of high-quality protein that helps meet human protein and nutrient requirements. They depend on the rumen microbiome to produce their primary source of energy, nutrients, and precursors for protein production. This unique microbiome comprises remarkably diverse microbes from multiple kingdoms, including bacteria and archaea as prokaryotes, protozoa and fungi as eukaryotes, and viruses. Bacteria are the most abundant and diverse, encompassing thousands of species (Creevey et al. 2014; Kim et al. 2011). Rumen archaea are primarily methanogens, while protozoa are nearly exclusively ciliates. Despite being less diverse and less abundant, at only 10⁴ – 10⁵ individuals per ml of rumen fluid, protozoa can match bacteria in terms of biomass (Andersen et al. 2023). Rumen fungi represent only 10 – 16% of total rRNA transcript abundance (Elekwachi et al. 2017) and less than 20% of the rumen microbial biomass (Rezaeian et al. 2004). Despite being the smallest, rumen viruses are diverse and abundant (Gilbert et al. 2020; Yan et al. 2023). These microbes form a dynamic and finely tuned ecosystem. Their populations and metabolism can shift in response to changes in diet, allowing ruminants to adapt to different nutritional regimes.
Previous studies have provided fundamental information on the capability of rumen microbes, primarily bacteria. As the most abundant microbes, bacteria play the most crucial role in rumen functions, such as feed digestion, fermentation, and microbial protein synthesis. Archaea produce enteric CH4, a potent greenhouse gas that raises significant environmental concerns associated with ruminant production. Protozoa participate in feed digestion and fermentation, but as predators, they engulf microbial cells and degrade microbial protein, significantly contributing to the intraruminal recycling of microbial protein, a process primarily responsible for the lower nitrogen utilization efficiency in ruminants than in nonruminants. Fungi are not abundant but possess a unique ability and high activity to digest feed fiber (Bhagat et al. 2023). Rumen viruses do not directly digest or ferment feed. Still, by lysing their hosts or providing auxiliary metabolic genes and other genes, they can profoundly impact the functions and metabolic activities of various rumen microbes, including those that form the core rumen microbiome, in both top-down and bottom-up manners (Yan et al. 2023). The rumen microbes constitute an intricate ecosystem by interacting with each other, the diet, and the host, and this ecosystem is responsible for converting feed into energy, nutrients, and precursors that ruminants can utilize. Therefore, the rumen microbiome can profoundly affect feed efficiency, animal health, productivity, quality of products (meat, milk, and wool), and the environmental footprint of the ruminant industry. Understanding the diversity, composition, and functions of the rumen microbiome and its interaction with diet and host has been a long-term pursuit of research over the past century.
Despite considerable progress in understanding the rumen microbiome, knowledge gaps remain regarding the interactions of most rumen microbes with diet and host and their contributions to animal nutrition and productivity. First, the role and significance of specific microbial species in enhancing nutrient utilization and reducing CH4 emissions are not yet fully understood. Second, the crosstalk between different microbial taxa and their dynamic interactions with the diet and the host requires further investigation. Third, the heritability (h²) of rumen microbes, which reflects the influence of host genotypes on shaping the rumen microbiome and its functions (Martinez Boggio et al. 2022), needs to be explored further. Fourth, the microbiability (m²) of key animal production traits, calculated as the proportion of variance in a specific production trait explained by the rumen microbiome (Difford et al. 2018), has only started to be assessed. Fifth, comprehensive investigations into the resilience of the rumen microbiome to environmental stressors, such as heat stress, are necessary to develop sustainable livestock management practices. Finally, the exploration of the rumen virome is in its initial stages, and its influence on the populations of rumen microbes or the overall rumen functions remains to be determined.
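As a hedged illustration of the two variance ratios named above, both can be written as fractions of the total phenotypic variance. The symbols below, and the assumption that the variance components are estimated from a linear mixed model, are ours for illustration and are not taken from the cited studies.

```latex
% Assumed notation (not from the source):
%   \sigma_P^2 : total phenotypic variance of the quantity being modelled
%   \sigma_g^2 : variance attributed to host additive genetic effects
%   \sigma_m^2 : variance attributed to the rumen microbiome
% Heritability of a microbial feature (phenotype = abundance of that microbe):
h^2 = \frac{\sigma_g^2}{\sigma_P^2}
% Microbiability of a production trait (phenotype = the trait itself):
m^2 = \frac{\sigma_m^2}{\sigma_P^2}
```

Both ratios lie between 0 and 1. A value of m² near 0 would indicate that little of the trait variance is associated with differences in the rumen microbiome.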
Understanding the complex relationship between diet, rumen microbiome, host, and specific production traits presents several challenges. First, the composition of the rumen microbiota can vary considerably even among cohorts of the same breed fed the same diet, making it challenging to attribute different production traits to differences in the rumen microbiome. Second, experimental design and analyses, including sequence data processing and bioinformatic analyses, lack standardization, which makes it difficult to compare results across studies. Third, the microbiome data generated from metataxonomics and metagenomics are sparse, high-dimensional, zero-inflated, and compositional, necessitating complicated statistical analyses. Fourth, although correlations can be observed among diets, the rumen microbiome, rumen functions, and production traits, the complex and dynamic nature of this diverse microbiome makes it difficult to establish unequivocal causal relationships. Fifth, most studies identify rumen microbes only at the genus level, but species, and even strains, can vary significantly in their metabolism, activity, and contributions to overall rumen functions. Finally, from an analytical perspective, the data layers generated by individual meta-omics technologies can exhibit interactions with animal production traits. Therefore, the integration of various omics technologies and data analyses is crucial for a comprehensive understanding of the rumen microbiome and its relationship with diet and animal nutrition. This integrated multi-omics approach, combined with integrated analysis of the multiple layers of data, is referred to as “rumen microbiome nutriomics.”
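To make the point about compositional, zero-inflated data concrete, the following is a minimal sketch of one common preprocessing step, the centered log-ratio (CLR) transform. The text does not prescribe this method. The function name, the pseudocount of 0.5, and the example counts are illustrative assumptions.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount (an assumed value, chosen here for illustration) is added
    to every count so that zeros, which are common in microbiome tables,
    do not break the logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log centres the values around zero, so each value
    # is expressed relative to the geometric mean of the sample.
    return [value - mean_log for value in logs]


# Hypothetical read counts for five taxa in one rumen sample (two are zero).
sample = [120, 0, 35, 0, 845]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

CLR values always sum to zero within a sample, which reflects the fact that compositional data carry only relative information. The choice of pseudocount can influence downstream results and should be reported.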
Omics technologies for rumen microbiome nutriomics
Since the early to mid-2000s, omics have become the primary technologies in microbiome research, including metataxonomics, metagenomics, metatranscriptomics, metaproteomics, and metabolomics coupled with bioinformatics. These meta-omics technologies have enabled comprehensive investigations of the rumen microbiome, leading to an unprecedented understanding and appreciation of its vast diversity, composition, functional capacity, and association with diets and key animal production traits, such as feed efficiency and methane emission. However, each of these omics technologies has its inherent limitations.
Metataxonomics
The 16S rRNA gene is among the few phylogenetic markers analyzed through high-throughput sequencing in early studies profiling microbiomes, including the rumen microbiome. Metataxonomics involves PCR amplification, high-throughput sequencing of phylogenetic markers, and bioinformatic analysis to taxonomically identify the microbes within microbiomes (Denman et al. 2018). It was the first omics technology used to comprehensively profile the rumen microbiome, greatly contributing to our understanding of its extensive diversity. Although it can help taxonomically identify most cellular rumen microbes, it has several limitations (Denman et al. 2018). First, the preparation of amplicon sequencing libraries involves PCR, which introduces biases (Silverman et al. 2021) stemming from the choice of phylogenetic regions targeted and the primers used (Laursen et al. 2017; Tremblay et al. 2015; Yu and Morrison 2004). These biases can compromise differential abundance analysis (DAA), especially for the minor taxa, posing challenges in comparing DAA results among studies that use different marker regions and primers. Second, while metataxonomics can cost-effectively detect and identify most cellular microbes, the short amplicon sequences lack the necessary taxonomic resolution to support species-level classification (Johnson et al. 2019). This limitation is particularly pronounced in the analysis of rumen ciliates due to the highly conserved nature of their 18S rRNA gene (Somasundaram and Yu 2024). Third, although comparing the marker sequences to databases with specific tools like PICRUSt2 (Douglas et al. 2020) and CowPI (Wilkinson et al. 2018) can help predict the functional capability of the rumen microbiome, it does not provide direct evidence of its functional capacities. Additionally, the lack of species-level identification and the detection of numerous unclassified microbes constrain the depth of functional insights. Finally, metataxonomics cannot detect viruses or phages because they do not have conserved phylogenetic markers. Nevertheless, metataxonomics is still valuable in rumen microbiome studies. Sequencing alternative markers, such as the internal transcribed spacers and 23S or 28S rRNA genes, can help enhance the taxonomic resolution, particularly when the entire length of these markers is sequenced. The full lengths of all the commonly used phylogenetic markers can be sequenced using synthetic long-read sequencing technologies, such as LoopSeq (Callahan et al. 2021), or long-read sequencing technologies, such as MinION (https://nanoporetech.com/) and Sequel II (https://www.pacb.com/technology/hifi-sequencing/sequel-system/), enhancing the accuracy and resolution of taxonomic assignments (Abellan-Schneyder et al. 2021). Furthermore, as demonstrated by Greengenes2 (McDonald et al. 2023), amalgamating databases of phylogenetic markers and genomes can improve the utility of metataxonomics in analyzing the rumen microbiome.
Metagenomics
In brief, metagenomics encompasses shotgun sequencing and a series of bioinformatic analyses of DNA extracted directly from microbiome samples. This omics technology is commonly used to unveil the taxonomic diversity and functional capacities of microbiomes, and it has proven to be one of the most powerful omics technologies in rumen microbiome research (e.g., Amin et al. 2022). Through taxonomic assignments of metagenomic sequences, contigs, or metagenome-assembled genomes (MAGs), metagenomics can potentially identify all microbes, including viruses, providing insights into the overall diversity, composition, and structure. For sequence-based taxonomic assignments, several bioinformatics programs are available, such as MetaPhlAn2 (Truong et al. 2015), Kraken2 (Lu and Salzberg 2020), mOTUs2 (Milanese et al. 2019), and Kaiju (Menzel et al. 2016). Contig-based taxonomic assignment enhances classification accuracy, and several bioinformatics tools are available for this purpose, including DIAMOND (Buchfink et al. 2015), CAT (von Meijenfeldt et al. 2019), and MetaBinG2 (Mirdita et al. 2021). MAG-based taxonomic assignment further enhances taxonomic classification. Species-level taxonomic assignment can be achieved with GTDB-Tk v2 (Chaumeil et al. 2022) and its genome database and taxonomy (Parks et al. 2018). As sequencing costs decrease, metagenomic sequencing depth increases, increasing the number of high-quality MAGs (>90% complete with <5% contamination) and thus filling some of the gaps in genome databases.
Improvements in reference genome databases will facilitate profiling of microdiversity and population dynamics at the species and even strain levels. Notably, strain-level profiling of metagenomes has been demonstrated using inStrain (Olm et al. 2021) in a recent study on the interactions between the rumen microbiome and virome (Yan and Yu 2024). Therefore, genome-centric and genome-resolved metagenomics will further enhance taxonomic profiling of the rumen microbiome, particularly at the species and strain levels.
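The high-quality MAG criterion quoted above (>90% complete with <5% contamination) can be applied as a simple filter. The sketch below assumes that completeness and contamination estimates have already been produced by a quality-assessment tool. The `MagQuality` class, the bin names, and the values are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MagQuality:
    """Quality estimates for one MAG (hypothetical container)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent


def is_high_quality(mag, min_completeness=90.0, max_contamination=5.0):
    """Apply the >90% complete, <5% contamination thresholds from the text."""
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


# Illustrative values only; real estimates would come from a tool's report.
mags = [
    MagQuality("bin_001", 96.4, 1.2),
    MagQuality("bin_002", 91.0, 7.5),   # complete enough, but too contaminated
    MagQuality("bin_003", 72.3, 0.8),   # clean, but too incomplete
]

high_quality = [mag.name for mag in mags if is_high_quality(mag)]
print(high_quality)
```

Both comparisons are strict, so a MAG at exactly 90% completeness is excluded, matching the ">" and "<" signs in the stated criterion.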
Metagenomics can uncover the functional potential of the entire rumen microbiome, along with discovering novel genes, enzymes, and pathways. Indeed, early metagenomic studies revealed an incredible repertoire of various genes, shining new light on the functional diversity and potential of the rumen microbiome (Brulc et al. 2011). However, metagenomics cannot distinguish genes from dead versus viable microbes. Additionally, the “bag-of-genes” generated through gene-centric metagenomics provides scant insight into genomic architecture. This approach also has limited capacity to unveil new microbial species or reconstruct the metabolic networks (MN) of individual microbes (Frioux et al. 2020). Genome-centric and genome-resolved metagenomics can address some of the limitations by constructing MAGs and also provide opportunities to estimate the growth rates of individual prokaryotes represented by MAGs (Joseph et al. 2022; Korem et al. 2015), illuminating the population dynamics of individual microbes within rumen microbiomes (Zhang et al. 2022). Nevertheless, genome-centric and genome-resolved metagenomics face several challenges. First, metagenomic sequences are often short (<300 bp), making it challenging and computationally demanding to assemble MAGs, particularly for rumen microbes at low abundance, including ciliates and fungi, which also have large complex genomes. Metagenomic sequences from multiple samples of the same individual or treatment can be co-assembled and binned to help recover MAGs of low-abundance species, but this approach leads to poor results when the samples have high intraspecies diversity, and it is computationally expensive (Delgado and Andersson 2022). Second, genome reconstruction also has biases (Nelson et al. 2020), leading to over- or under-representation of specific microbial taxa, affecting the accuracy of metagenomic analysis. Third, assigning functions to some genes in metagenomic datasets can be challenging due to gaps in reference genome databases and many unknown or hypothetical genes. Indeed, about one-third of the protein-coding genes from bacterial genomes could not be functionally annotated (Bileschi et al. 2022). Deep learning models are emerging as an effective tool to enhance functional annotation (Bileschi et al. 2022).
Ongoing research efforts are focusing on addressing the above challenges. Integrating short- and long-read sequencing technologies can improve sequence assembly, increasing the number of high-quality MAGs. Developing and refining bioinformatics tools can enhance the quality of MAGs and streamline the metagenomics process. For example, using machine learning, CheckM2 improves MAG quality assessment (Chklovski et al. 2022). Further, expanding and refining reference databases can improve the accuracy of taxonomic classification and functional annotation (Stewart et al. 2019; Xie et al. 2021; Yan and Yu 2024). Integration of metagenomics, particularly genome-centric and genome-resolved metagenomics, with other omics technologies, such as metatranscriptomics and metaproteomics, along with the continued refinement of bioinformatics tools, can provide more comprehensive insights into the rumen microbiome and its complex interactions with diets, animals, and production traits. It should also be noted that genome-centric and genome-resolved metagenomic studies have predominantly focused on rumen bacteria and archaea, thereby neglecting rumen protozoa, fungi, and viruses. Leveraging recent bioinformatics tools specifically developed for viral sequence analyses, such as VirSorter2 (Guo et al. 2021), VIBRANT (Kieft et al. 2020), and CheckV (Nayfach et al. 2021), several recent studies have revealed that the rumen virome is highly diverse and can infect a wide range of rumen microbes, including the core rumen microbiome (Yan et al. 2023), responds to diets (Anderson et al. 2017), and is associated with microbial diversification, community dynamics, and specific production traits (Yan and Yu 2024). New bioinformatics tools capable of discerning eukaryotic signals amid metagenomic sequences, coupled with newly sequenced genomes of rumen protozoa and fungi, will significantly enhance the analysis of these rumen eukaryotic microbes within rumen metagenomic datasets.
Metatranscriptomics
Through sequencing and bioinformatic analysis of RNA, metatranscriptomics reveals actively expressed genes, collectively referred to as the transcriptome. rRNA is commonly removed before conducting RNA-Seq to enhance sequencing efficiency and allow for more precise sequencing of mRNA alongside non-coding RNA and small RNA. Hence, metatranscriptomics illuminates the ongoing metabolic and other biological processes within microbiomes. This omics technology has yielded valuable insights into how the rumen microbiome responds to dietary alterations or interfaces with specific rumen functionalities and production traits at the transcriptional level. Previous metatranscriptomic investigations have focused on genes exhibiting differential expression between diets or feed additives (e.g., Jize et al. 2022; Pitta et al. 2022), animal productivities (e.g., Park et al. 2022; Xue et al. 2022), or breeds (e.g., Li et al. 2019; Zhang et al. 2020). Linking the expressed genes to the specific host microbes can be challenging with such a gene-centric metatranscriptomic approach. Furthermore, rumen metatranscriptomes have been analyzed for transcripts of prokaryotes. The eukaryotic transcripts and the genomes of RNA viruses should also be analyzed in future studies.
Genome-centric metatranscriptomics focuses on the analysis of transcriptional activity within microbiomes, potentially at the level of individual genomes. This approach employs RNA-Seq and compares transcript sequences to individual genomes or MAGs. Hence, it enables researchers to (i) associate transcripts with the expressing genomes or MAGs and (ii) reconstruct genome-scale MN or models for individual microbes. Such information facilitates a more precise evaluation of the contributions of those microbes to critical metabolic processes, such as feed digestion and fermentation, protein synthesis, and CH4 emissions. Additionally, differential gene expression (DGE) and pathway enrichment analyses are crucial in revealing how microbial activities respond to variations in diets and interface with rumen functions and animal productivity. Furthermore, genome-centric metatranscriptomics facilitates identifying the rumen fungi that produce microRNA-like RNAs (no evidence is available indicating that bacteria, archaea, and protozoa produce microRNAs) and the cellular rumen microbes that produce small RNAs. While genome-centric metatranscriptomics can potentially provide dynamic insights into rumen functions at the genome level and dynamics in the rumen microbiome, it faces several challenges with low-abundance transcripts. Gaps in reference genome databases and the presence of unknown or hypothetical genes further hinder the identification of some expressed genes (Shakya et al. 2019). Furthermore, genome-centric metatranscriptomics can be biased toward cultured microbes with well-annotated genomes. As sequencing costs decrease and reference genome databases expand, genome-centric metatranscriptomics is poised to surpass gene-centric metatranscriptomics.
Metaproteomics
Metaproteomics, the study of all the proteins expressed in a microbiome (the metaproteome), offers a snapshot of the proteins actively at work, revealing the actual metabolic landscape at the sampling time. Pathway enrichment analysis can help identify the pathways corresponding to the identified proteins, providing insights into the activities of a microbiome. This extends beyond the capabilities of metagenomics or metatranscriptomics, furnishing a more direct perspective on the actual functional processes and their connection to animal productivity (Andersen et al. 2021). Studies have used metaproteomics to investigate the metabolic influence of rumen protozoa within the rumen microbiome (Andersen et al. 2023) and the responses of the rumen microbiome to dietary interventions (Trautmann et al. 2023) and heat stress (Li et al. 2021). Metaproteomics can also help identify biomarkers associated with specific microbial functions, microbiome dysbiosis, rumen functions, or production traits. However, metaproteomics can face several challenges, as demonstrated in other microbiomes, including the complexity and diversity of the rumen microbiome, limitations in detecting low-abundance proteins, and issues with identical peptides from homologous proteins (Heyer et al. 2017; Lohmann et al. 2020; Miura and Okuda 2023). Finally, the lack of complete genomes and protein databases for many rumen microbes, particularly rumen fungi and protozoa, hinders precise annotation and taxonomic assignment, leaving some identified proteins with unknown origins. These challenges are further exacerbated by the presence of dietary proteins in the rumen.
To fully harness the potential of genome-centric metaproteomics for studying the rumen microbiome, comprehensive reference genome databases specific to this microbiome are essential. The Rumen Microbial Global Network or a similar international network can facilitate collaborative efforts to compile existing and future genomics data, including MAGs. These databases, designed to minimize gaps in the representation of key rumen microbes, will enable genome-centric metaproteomics. Such an approach promises unprecedented insights into the roles of key rumen microbes and their impacts on various rumen functions and production traits. Ultimately, this information will empower efforts to optimize the rumen microbiome for improved animal health and productivity.
Metabolomics
Metabolomics leverages proton nuclear magnetic resonance (NMR) spectroscopy and gas or liquid chromatography coupled with mass spectrometry (GC-MS or LC-MS) or tandem MS (GC-MS/MS or LC-MS/MS) to separate and identify individual metabolites. Targeted metabolomics analyzes a predefined set of related metabolites, whereas untargeted metabolomics involves global metabolic profiling. Metabolomics enables the elucidation of the complex metabolic profiles within the rumen microbiome, offering valuable insights into its functional activities. Univariate and multivariate statistical analyses can help identify specific metabolites that differ between animals or treatments. Moreover, pathway enrichment analysis can help identify the metabolic pathways that are influenced or differentially expressed. Metabolomics has been used to examine how the rumen metabolic profiles respond to dietary shifts (Ali et al. 2023; Ren et al. 2023), dietary supplements (de Poppi et al. 2021; Li et al. 2022a), stresses (Feng et al. 2022; Li et al. 2023), and health status (Eom et al. 2021; Mu et al. 2022). Furthermore, metabolomics aids in the identification of rumen metabolites or pathway enrichment indicative of divergent rumen functions or production traits, including residual body weight gain (Idowu et al. 2023), residual feed intake (RFI) (Liu et al. 2022b), and efficiency of milk production in dairy cows (Xue et al. 2022).
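When univariate tests are run on many metabolites at once, the resulting p-values are usually adjusted for multiple testing before metabolites are declared different. The text does not name a correction method. The sketch below uses the Benjamini-Hochberg procedure as one common choice, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-metabolite p-values from a two-group comparison.
raw_p = {"acetate": 0.01, "propionate": 0.04, "butyrate": 0.03, "lactate": 0.005}
adjusted_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
for metabolite, q in adjusted_p.items():
    print(f"{metabolite}: adjusted p = {q:.3f}")
```

The adjusted values can then be compared against a chosen false discovery rate. The threshold itself is a study design decision and is not implied by the text.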
Rumen metabolomics also faces several challenges. First, the rumen microbiome produces a myriad of metabolites at various concentrations, but only a relatively small number of them can be detected or identified. Second, identifying and annotating rumen microbiome metabolites face challenges because many metabolites lack known reference standards, leading to uncertainties in result interpretation. Third, the accurate assessment of the metabolic response of the rumen metabolome necessitates the quantification of metabolites, but the complex matrices of rumen samples may compromise the reliability of quantification. Fourth, several metabolomic databases like BMDB (www.bovinedb.ca), MetaboBank (https://metabo.ca), and MetaboLights (http://www.ebi.ac.uk/metabolights/), as well as pathway databases like KEGG (https://www.genome.jp/kegg/) and MetaCyc (https://metacyc.org/) can be used to map metabolites to their corresponding metabolic pathways. However, gaps in these databases constrain the reliable identification of metabolites and linking metabolites or metabolic pathways to the producers. The development of metabolomic databases specific to the rumen ecosystem and advancements in bioinformatics tools for metabolite annotation and pathway analysis will contribute to a more accurate and meaningful interpretation of rumen microbiome metabolomic data. Furthermore, integration with other omics technologies, such as genomics, genome-centric metagenomics, metatranscriptomics, and metaproteomics, is essential to further enhance the capability of metabolomic analysis of the rumen microbiome. As demonstrated in a recent study (Idowu et al. 2023), integrating currently used LC-MS with other techniques, such as isotope labeling, can increase the sensitivity of metabolite detection.
Bioinformatics and databases
Bioinformatics is essential for data analysis in all meta-omics. The power of omics technologies depends on the capabilities of available bioinformatic tools in identifying and classifying microbial species, annotating sequences and proteins, predicting functional capabilities, and unveiling metabolic activities. Many bioinformatics algorithms and tools are available to analyze the omics data derived from various microbiomes. Most bioinformatics tools were initially developed for other microbiomes and then applied to omics investigations of the rumen microbiome. However, unlike other host-associated microbiomes, the rumen microbiome has diverse eukaryotic microbes (protozoa and fungi), which play significant roles in ruminant nutrition. These eukaryotic microbes are often overlooked and understudied due to the lack of appropriate bioinformatics tools. Bioinformatics algorithms employing machine learning are now available to analyze eukaryotes (Karlicki et al. 2022; Levy Karin et al. 2020; West et al. 2018). A recent bioinformatics tool, GutEuk, developed specifically for rumen eukaryotes, can markedly enhance the analysis of these eukaryotes (Yan et al. 2024). Machine learning-based bioinformatics algorithms have also been developed to extract the largely underexplored viral sequence data (Du et al. 2023; Guo et al. 2021; Kieft et al. 2022; Nayfach et al. 2021) and mobile genetic elements (Tang et al. 2023). The advent of novel bioinformatics tools will greatly enhance comprehensive analyses of all domains and kingdoms within the rumen microbiome.
It is envisaged that the near future will witness a substantial surge in data volumes capturing various facets of the rumen microbiome with unparalleled depth and resolution. Advancements in bioinformatics algorithms and tools, particularly those that can seamlessly integrate datasets from multi-omics sources, are needed to analyze this anticipated influx of diverse data effectively and adequately.
The experience from the preceding decades has shown that general-purpose genome databases have gaps, with inadequate representations of numerous microbial species. This deficiency becomes particularly evident when these databases are employed in rumen microbiome investigations. For example, a substantial portion of the biomass in the rumen is attributed to microbial eukaryotes, particularly protozoa (Andersen et al. 2023). However, the current databases have few genomes of rumen protozoa. Rumen viruses and fungi are also underrepresented in general-purpose databases. The recent bioinformatics tools tailored for viral sequence analysis have enabled the development of the first global comprehensive rumen virome database (Yan et al. 2023). A genome database of rumen protozoa must be developed for multi-omics investigations into this important group of rumen predators. The recently sequenced 52 single-cell amplified genomes (SAGs) are a valuable initial resource (Li et al. 2022b). Since zoospores of rumen fungi can be singularly picked, the single-cell genome sequencing approach used to sequence the SAGs of rumen protozoa may be used to sequence the genomes of rumen fungi.
Rumen microbiome nutriomics – connecting the rumen microbiome and nutrition
The intricate interplay among diet, the rumen microbiome, and ruminants establishes a dynamic nexus that forms the foundation for rumen functions, nutritional processes, and, ultimately, productivity. Investigating this nexus and identifying the rumen microbes or metabolic pathways that influence specific rumen functions or animal production traits has long been a focus of research. Through integrating multi-omics technologies and data analyses, rumen microbiome nutriomics can advance our comprehension of the roles played by rumen microbes in rumen functions and nutrition.
Rumen microbiome nutriomics through integrated omics and data analysis
Each omics technology has distinct capabilities and limitations. This recognition has led to the utilization of multiple omics in some recent studies, resulting in a more comprehensive characterization of the rumen microbiome (Liu et al. Reference Liu, Sha and W.2022a; Mu et al. Reference Mu, Qi and Zhang2022; Xu et al. Reference Xu, Liu and Sun2021). However, few studies have sufficiently integrated the analysis of the data derived from different omics technologies or established the links between the data and rumen functions or animal production traits. This challenge is attributed, in part, to the multiple layers of high-dimensional microbiome data generated by individual omics technologies (Pedersen et al. Reference Pedersen, Forslund and Gudmundsdottir2018). Several strategies can reduce data dimensionality. These include combining data normalization, binning of co-abundant features (genes or metabolites), integration with prior biological knowledge (Pedersen et al. Reference Pedersen, Forslund and Gudmundsdottir2018), and clustering MAGs into metagenomic species (Zhang et al. Reference Zhang, Wang and Liu2023b). Additionally, identifying modules of related microbiome features, such as modules of microbiome, gene expression, and metabolites, can contribute to a more cohesive analysis. Bioinformatic approaches are continually evolving to integrate data derived from multi-omics technologies. For instance, a recent study utilized weighted gene co-expression network analysis and structural equation modeling (SEM) to integrate metataxonomic, metagenomic, and metabolomic data, revealing informative connections from rumen microbes to metabolites and milk protein yield (Zhang et al. Reference Zhang, Wang and Liu2023b). 
Furthermore, combinatorial network and machine learning methods have demonstrated utility in identifying metagenomic and host genotypes potentially linked to CH4 emissions and feed efficiency in dairy cows (Cardinale and Kadarmideen Reference Cardinale and Kadarmideen2022). In line with these advancements, we propose an integrated genome-centric and genome-resolved multi-omics approach to holistically characterize all the rumen microbes (i.e., prokaryotes, eukaryotes, and viruses) and key aspects of the rumen microbiome and establish connections with diets, rumen functions, and animal phenotype and production traits (Fig. 1).
In brief, existing high-quality MAGs, such as the large sets of MAGs of prokaryotes (Andersen et al. Reference Andersen, Altshuler and Vera-ponce de Leon2023; Stewart et al. Reference Stewart, Auffret and Warr2019; Xie et al. Reference Xie, Jin and H.2021) and viruses (Wu et al. Reference Wu, Gao and Sun2024; Yan and Yu Reference Yan and Yu2024) reported recently, and genomes of the rumen microbiome such as those of the Hungate1000 project, ciliates (Li et al. Reference Li, Wang and Zhang2022b; Park et al. Reference Park, Wijeratne and Meulia2021), and anaerobic fungi (Brown et al. Reference Brown, Swift and Mondo2021; Haitjema et al. Reference Haitjema, Gilmore and Henske2017; Youssef et al. Reference Youssef, Couger and Struchtemeyer2013), along with high-quality MAGs generated from ongoing studies, are combined to develop a comprehensive genome database (rumen microbiome genome database, RMGD). The MAGs and genomes of prokaryotes are taxonomically annotated using the taxonomy implemented in GTDB, which supports species-level classification based on the phylogeny derived from a concatenated set of 120 single-copy marker proteins. The RMGD is used for taxonomic classification and functional annotation of metagenomic and metatranscriptomic data. The RMGD can also be used in classifying operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) generated by metataxonomics, potentially at the species level, by sequence mapping. The existing 52 SAGs of rumen ciliates (Li et al. Reference Li, Wang and Zhang2022b) and the recent rumen virome database (RVD) (Yan et al. Reference Yan, Pratama and Somasundaram2023) can be expanded to support analyses of rumen ciliates and the rumen virome, respectively. However, as discussed above, concerted efforts are needed to sequence more rumen protozoan and fungal genomes to develop a comprehensive rumen eukaryotic genome database and genome-based taxonomy.
The AA sequences translated from all the open reading frames (ORFs) of the RMGD are then used to prepare a rumen microbiome proteome database (RMPD) to aid in metaproteomic investigations of the rumen microbiome. Metabolic networks are assembled from the pathways reconstructed from individual MAGs and genomes to assist in identifying the transcripts and metabolites detected through metatranscriptomic and metabolomic analyses, respectively. The multiple layers of omics data are analyzed in an integrated manner (Subramanian et al. Reference Subramanian, Verma and Kumar2020). Although it has not yet been used in rumen microbiome nutriomics studies, xMWAS (Uppal et al. Reference Uppal, Ma and Go2018) may be a valuable software tool for data integration, network visualization, clustering, and differential network analysis of data derived from two or more omics platforms. This integrated omics approach and data analysis will comprehensively characterize the rumen microbiome with respect to its many key features (Fig. 1). Furthermore, the integration of multiple omics data and analyses will enhance the accuracy of functional annotations. Such detailed data can be further interrogated in the context of diet, rumen functions, animal genotypes, and production traits.
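As a simple illustration of one dimensionality-reduction strategy mentioned in this section, the binning of co-abundant features, the following Python sketch greedily groups features whose abundance profiles across samples correlate strongly with the first member (seed) of a bin. It is a minimal sketch under stated assumptions, not the procedure used in any of the cited studies: the function names, the Pearson threshold of 0.9, and the toy abundance values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return sxy / sqrt(sxx * syy)

def bin_coabundant(profiles, threshold=0.9):
    """Greedily group features whose profiles correlate with a bin's seed.

    profiles: dict mapping feature name -> abundances across samples.
    threshold: assumed minimum Pearson correlation for joining a bin.
    Returns a list of bins (lists of feature names) in order of creation.
    """
    bins = []  # each entry is (seed_profile, [feature names])
    for name, profile in profiles.items():
        for seed, members in bins:
            if pearson(seed, profile) >= threshold:
                members.append(name)
                break
        else:
            # No existing bin matched, so this feature seeds a new bin.
            bins.append((profile, [name]))
    return [members for _, members in bins]

# Hypothetical toy data: abundances of five genes across six samples.
toy = {
    "gene_A": [10, 20, 30, 40, 50, 60],
    "gene_B": [11, 19, 33, 41, 48, 62],  # tracks gene_A
    "gene_C": [60, 50, 40, 30, 20, 10],  # anti-correlated with gene_A
    "gene_D": [58, 52, 41, 28, 22, 9],   # tracks gene_C
    "gene_E": [5, 5, 5, 5, 5, 5],        # constant across samples
}
bins = bin_coabundant(toy)
```

With the toy data, gene_A and gene_B fall into one bin and gene_C and gene_D into another, while the constant gene_E remains alone. Greedy seeding depends on the order of the features, so real analyses would use a more robust clustering method.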
Deciphering the interdependent labyrinth within the rumen ecosystem – Advancing toward establishing causality in the nutriomics of the rumen microbiome
The central goal of rumen microbiome nutriomics investigations is to delve into connections between various sets of data encompassing diets, rumen microbiome features identified through the omics technologies, rumen functions (i.e., feed digestion and fermentation characteristics), key production traits (e.g., feed digestibility, feed efficiency, growth, lactation performance, and CH4 emissions), responses to stress (e.g., heat stress), and nutritional disorders (e.g., subacute rumen acidosis). However, determining the causal relationships among these datasets remains an arduous task. Hence, researchers have used several analyses, such as DAA, correlation, and association analyses, to infer potential relationships. DAA can identify microbial taxa (primarily genera, OTUs, or ASVs), functional categories of genes, and, less frequently, enriched pathways, transcripts, and proteins that are differentially abundant between diets, animal groups, treatments, and animal production traits. Several analysis methods, including analysis of compositions of microbiomes with bias correction (ANCOM-BC), which address the data features of the rumen microbiome, in particular zero inflation and compositional effects, along with partial least squares discriminant analysis (PLS-DA) and linear discriminant analysis effect size (LEfSe), have been commonly used in DAA. Studies in ruminant nutrition frequently involve repeated measurements of the same subjects (for example, using a Latin square design) and experimental designs incorporating fixed and random effects (such as the randomized complete block design). For these studies, DAA methods capable of analyzing mixed effects, like LinDA (Zhou et al. Reference Zhou, K. and Chen2022), should be used. However, all these methods have certain limitations (Nearing et al. Reference Nearing, Douglas and Hayes2022). 
To further improve DAA of microbiome data, some new methods that can better address the microbiome data features have been developed, such as ZicoSeq (Yang and Chen Reference Yang and Chen2022), LOCOM (Hu et al. Reference Hu, Satten and Hu2022), and CDEMI (Wang et al. Reference Wang, Liang and Chen2023a). Future rumen microbiome nutriomics studies should employ these new methods. Like DAA, DGE analysis unveils variations in the expression of microbial genes and pathway enrichment; these variations can be associated with differences in diets, rumen microbiome structure, rumen functions, or animal production traits. Correlations among these datasets can also be evaluated with appropriate, non-parametric methods. Differentially abundant microbial taxa and other microbiome features between, or those correlated with, specific rumen functions and animal production traits may guide further investigations into the causal relationships in rumen microbiome nutriomics.
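The compositional effect mentioned above can be made concrete with a short Python sketch. It applies the centered log-ratio (CLR) transform, a generic compositional-data technique that is not the algorithm of ANCOM-BC, LinDA, or any other method named here, and shows that CLR values do not change when the same community is sequenced ten times deeper. The function name, the pseudocount of 0.5 for zero counts, and the read counts are assumptions made for illustration.

```python
from math import log

def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of one sample's count vector.

    A positive pseudocount (an assumption; 0.5 is one common choice) is
    required when the sample contains zeros, because log(0) is undefined.
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical read counts for four taxa in one rumen sample.
shallow = [120, 30, 600, 50]
# The same community sequenced ten times deeper.
deep = [c * 10 for c in shallow]

clr_shallow = clr(shallow)
clr_deep = clr(deep)

# Sequencing depth cancels out: the two vectors agree up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(clr_shallow, clr_deep))

# A sample with a zero count (zero inflation) needs the pseudocount.
clr_with_zero = clr([0, 30, 600, 50], pseudocount=0.5)
```

Raw counts or relative abundances would differ between the two depths, whereas the CLR values are identical. This is why log-ratio-based methods are often preferred for compositional microbiome data. The choice of pseudocount affects the results for sparse taxa and should be reported.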
Association analyses are used to reveal specific rumen microbiome taxa, often at the OTU (or ASV) and genus levels, as well as categories of functional genes linked to diets and animal production traits. Recently, microbiome, microbiota, or metagenome-wide association studies (MWAS) have been conducted to discover all taxa or functional gene categories associated with a specific host phenotype or disease status in humans and animals (Wang and Jia Reference Wang and Jia2016). Despite having been frequently used to unveil gut microbes associated with diseases in humans (e.g., (Liu et al. Reference Liu, Jiang and Gu2021)) and feed efficiency (Aliakbari et al. Reference Aliakbari, Zemb and Cauquil2022) along with intramuscular fat content (Wang et al. Reference Wang, Zhou and Zhou2022b) in pigs, MWAS has only been used in a few recent studies on ruminants (Boggio et al. Reference Boggio, Christensen and Legarra2023a; Wang et al. Reference Wang, Zhang and Zhang2023b). In a sheep study, MWAS did not identify any OTUs associated with dairy traits (Boggio et al. Reference Boggio, Christensen and Legarra2023a). The rumen microbiome has thousands of OTUs. Individual OTUs may lack sufficient “weight” to exhibit significant association. Therefore, MWAS might be more effectively applied to genera. Until now, MWAS has only been utilized to associate microbes detected through metataxonomics with animal production traits. However, assessing associations between rumen microbes and variations in diets and rumen functions will be equally applicable. Furthermore, MWAS should be able to examine associations between diet or animal production traits with rumen microbiome data derived from other omics technologies. An analogous approach, virome-wide association studies (VWAS), could be developed to identify rumen viruses associated with diets, specific rumen microbes, rumen functions, and animal production traits. 
This would represent a viral version of MWAS, extending the scope of broad association studies to include the viral component of the rumen ecosystem.
Many statistical or data analytics approaches, such as correlation, regression, probability, random forest, and deep learning, can be used in MWAS. However, the distinct data features of microbiomes may pose challenges to the robustness of MWAS. New methods are being developed to improve and streamline MWAS further. Recent examples include omnibus metagenome-wide association study with robustness (OMARU) (Kishikawa et al. Reference Kishikawa, Tomofuji and Inohara2022), MiATDS (Sun et al. Reference Sun, Huang and Fu2021), and multiMiAT (Sun et al. Reference Sun, Wang and Xiao2023a). OMARU rigorously controls the statistical significance of MWAS results, including the correction of hidden confounding factors and the application of multiple test comparisons (Kishikawa et al. Reference Kishikawa, Tomofuji and Inohara2022). Additionally, OMARU can evaluate pathway-level links between metagenomes, as well as links between taxa and genes in metagenomes. MiATDS performs adaptive microbiome-based association analysis to detect microbial association signals with diverse sparsity levels (i.e., sparse, low sparse, non-sparse). This is achieved by defining the probability degree to measure the associations between microbes and host phenotypes and introducing the adaptive weighted sum of powered score tests by considering both probability degree and phylogenetic information (Sun et al. Reference Sun, Huang and Fu2021). Divergently, implementing the multinomial logit model framework, multiMiAT supports MWAS between microbiomes and ordinal or nominal multicategory host phenotypes or traits (Sun et al. Reference Sun, Wang and Xiao2023a). Additionally, genome-wide association studies (GWAS) using rumen microbes as traits can identify heritable rumen microbes (Mani et al. Reference Mani, Aiyegoro and Adeleke2022; Wang et al. Reference Wang, Zhang and Zhang2023b). 
The integration of MWAS with GWAS, referred to as microbiome genome‐wide association studies (mGWAS), provides a comprehensive approach for identifying heritable microbes associated with a specific phenotype (Wang et al. Reference Wang, Wang and Sun2022a). For example, in growing lambs, mGWAS helped identify four genera of heritable rumen microbes associated with body weight (Wang et al. Reference Wang, Zhang and Zhang2023b). This integrated methodology holds promise in identifying rumen microbes that can serve as potential markers, facilitating selective breeding for enhanced production traits.
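The basic logic of a single-feature MWAS, testing each taxon against a trait and then correcting for multiple comparisons, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of OMARU, MiATDS, or multiMiAT: it uses a permutation test on the Pearson correlation and the Benjamini-Hochberg adjustment, and the function names, genus labels, and data are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps p above zero

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def mwas(trait, abundances, n_perm=2000):
    """Return {taxon: (p-value, adjusted p-value)} for each taxon."""
    names = list(abundances)
    pvals = [permutation_pvalue(trait, abundances[n], n_perm) for n in names]
    qvals = benjamini_hochberg(pvals)
    return {n: (p, q) for n, p, q in zip(names, pvals, qvals)}

# Hypothetical data: one trait and three genera measured in eight animals.
trait = [1, 2, 3, 4, 5, 6, 7, 8]
abundances = {
    "genus_1": [2, 4, 5, 8, 10, 13, 14, 17],  # rises with the trait
    "genus_2": [1, 2, 2, 1, 1, 2, 2, 1],      # unrelated to the trait
    "genus_3": [5, 3, 6, 4, 5, 4, 6, 3],      # unrelated to the trait
}
results = mwas(trait, abundances)
```

In this toy example only genus_1 retains a small adjusted p-value. With thousands of OTUs, the multiple-testing penalty grows accordingly, which is consistent with the suggestion above that MWAS may be more effective at the genus level. A real analysis would also need to handle covariates, compositionality, and confounding.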
It should be emphasized that association studies can only identify rumen microbes or other microbiome features exhibiting statistical correlations with diets, rumen functions, or specific animal production traits. Although prior knowledge of rumen microbes and their metabolism, meta-analyses of relevant studies, and longitudinal studies can help infer the biological plausibility of associations, caution must be exercised in interpreting these associations as causal relationships. To transcend mere correlation and association, it is crucial to establish causal relationships between diets, features of the rumen microbiome, rumen functions, and key animal production traits. Experimental testing or verification of the above causal relationships is arduous due to the complexity of the rumen microbiome and its intricate interactions with diet and animals. At the microbiome level, causality can be deduced through modeling approaches that integrate causality principles (Munoz-Tamayo et al. Reference Munoz-Tamayo, Davoudkhani and Fakih2023). Several methods have been used to predict the microbes potentially driving a specific production trait in ruminants. These methodologies include SEM (Hertel et al. Reference Hertel, Heinken and Fassler2023) and causal Bayesian networks (CBNs) (Stebliankin et al. Reference Stebliankin, Sazal and Valdes2022). SEM allows researchers to investigate the direct and indirect effects of variables on one another, furnishing comprehensive insights into their complex relationships within a theoretical framework. On the other hand, CBNs utilize directed edges to represent causal relationships or data dependencies between variables, providing causal inference between rumen microbes and diet or animal production traits. SEM has proven valuable in identifying rumen microbial modules in dairy cows that potentially regulate milk protein yield (Zhang et al. Reference Zhang, Wang and Liu2023b) and CH4 emissions (Saborio-Montero et al. Reference Saborio-Montero, Gutierrez-Rivas and Garcia-Rodriguez2020). While CBN analysis has not been used in rumen microbiome nutriomics studies, it has demonstrated utility in inferring causality between the gut microbiome and colorectal cancer (Kharrat et al. Reference Kharrat, Assidi and Abu-Elmagd2019) and between infant gut microbiome and resistome (Stebliankin et al. Reference Stebliankin, Sazal and Valdes2022) in humans. Causal inference models have also found applications in other human gut microbiome research (Hughes et al. Reference Hughes, Bacigalupe and Wang2020; Sun et al. Reference Sun, Gao and Wu2023b). These analysis methods can be instrumental in deducing causal relationships in rumen microbiome nutriomics studies.
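The distinction between direct and indirect effects in SEM can be illustrated with its simplest special case, a three-variable path model with observed variables only (diet to microbe to trait). The Python sketch below estimates the paths by ordinary least squares and decomposes the total effect of diet on the trait into a direct part and an indirect part mediated by the microbe. It is a hedged illustration, not the modeling used in the cited studies: the variable names and data are hypothetical, and a full SEM would add latent variables, multiple mediators, and fit statistics.

```python
def mean(values):
    return sum(values) / len(values)

def cov(x, y):
    """Sample covariance (variance when x is y)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def path_effects(x, m, y):
    """Decompose the effect of x on y into direct and indirect (via m) parts.

    Fits two least-squares regressions in closed form:
      m ~ x       gives path a (x -> m)
      y ~ x + m   gives path b (m -> y) and the direct path (x -> y)
    """
    vxx, vmm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    det = vxx * vmm - cxm ** 2  # zero if x and m are perfectly collinear
    if det == 0:
        raise ValueError("x and m are collinear; paths are not identifiable")
    a = cxm / vxx
    b = (vxx * cmy - cxm * cxy) / det
    direct = (vmm * cxy - cxm * cmy) / det
    return {
        "a": a,
        "b": b,
        "direct": direct,
        "indirect": a * b,      # effect transmitted through the mediator
        "total": cxy / vxx,     # slope of y ~ x alone
    }

# Hypothetical data for six animals.
concentrate_pct = [0, 10, 20, 30, 40, 50]       # diet variable (x)
genus_abund = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1]    # mediator (m)
trait_value = [2.0, 3.9, 6.2, 7.8, 10.1, 12.2]  # production trait (y)

effects = path_effects(concentrate_pct, genus_abund, trait_value)
```

For least-squares estimates, the total effect equals the sum of the direct and indirect effects, which provides a useful check on the decomposition. As emphasized above, such estimates support a causal interpretation only if the assumed path structure is itself justified.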
Machine learning has been increasingly used in investigating microbiomes, including rumen microbiomes. Compared with the traditional linear models commonly used in animal science research, machine learning is advantageous in analyzing large multidimensional data and inferring non-linear relationships. Machine learning has proven effective in predicting animal performance, exemplified by its application in forecasting CH4 emissions from sheep (Zhang et al. Reference Zhang, Lin and Moraes2023a), feed efficiency in dairy cows (Xue et al. Reference Xue, Xie and Zhong2022), and animal health conditions (Zhong et al. Reference Zhong, Xue and Sun2020) based on high-dimensional rumen microbiome data. As in human microbiome research, machine learning is likely to prove useful in rumen microbiome nutriomics investigations, particularly in facilitating causal inference. Furthermore, recent studies have demonstrated the potential of artificial intelligence in human microbiome research (e.g., (Gao et al. Reference Gao, Gao and Zhu2023)). This suggests the prospect of applying artificial intelligence in rumen microbiome nutriomics research, including discovering causality by analyzing large datasets of diets, rumen microbiomes, and animals and identifying their patterns and associations with animal production traits.
Assessing the magnitude of the rumen microbiome contribution to animal production traits
Because of its vital role in ruminant nutrition, the rumen microbiome significantly influences key animal production traits. Recent studies have quantitatively assessed the microbiability (m²) of specific animal traits, shedding light on the contribution of the rumen microbiome. Noteworthy among these traits are CH4 emissions (m² of 13%) (Difford et al. Reference Difford, Plichta and Lovendahl2018), RFI (m² of about 20%), milk energy (m² of about 25%) (Boggio et al. Reference Boggio, Monteiro and Lima2023b), and milk fatty acid composition in dairy cows (m² > 30% for some fatty acids) (Buitenhuis et al. Reference Buitenhuis, Lassen and Noel2019), as well as milk composition (m² of up to 7%) (Boggio et al. Reference Boggio, Christensen and Legarra2023a) and body weight (m² of 20%) in sheep (Wang et al. Reference Wang, Zhang and Zhang2023b). These studies estimated the plausible contribution of the entire rumen microbiome to the production traits. Furthermore, MWAS based on single-OTU regression has revealed a small number of fecal OTUs significantly or suggestively linked to traits like RFI, feed conversion ratio (FCR), daily feed intake, and back fat thickness in pigs (Aliakbari et al. Reference Aliakbari, Zemb and Cauquil2022). In a recent study utilizing machine learning to develop prediction models for CH4 emissions from sheep, certain genera of rumen microbes were selected as predictor variables alongside animal data (Zhang et al. Reference Zhang, Lin and Moraes2023a). The incorporation of microbial prediction variables not only enhanced prediction accuracy but also bolstered model robustness. This machine learning approach identifies rumen microbes potentially associated with CH4 emissions and provides insights into their effect sizes through the coefficients of the microbial predictor variables.
No study has assessed the microbiability of rumen functions, such as feed digestibility and VFA profiles. Significant microbiability has been demonstrated for digestive efficiency in pigs (Deru et al. Reference Deru, Tiezzi and Carillier-Jacquin2022). Given the direct correlation between rumen functions, feed efficiency, and other key animal production traits, future research is warranted to investigate the microbiability of major rumen functions and the rumen microbial taxa (genera or species) contributing to those functions. Furthermore, heritable rumen bacteria contribute more to the microbiability of lactation performance than their nonheritable counterparts (Zang et al. Reference Zang, Sun and Xue2022). Given that host genetics significantly influence heritable rumen microbes, a novel metric called “holobiality” (ho²) has been proposed. This metric combines the heritability of specific production traits with their microbial contribution (microbiability). By quantifying the joint influence of the host genome and rumen microbiome, holobiality offers promising potential for predicting improvement in these traits.
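For reference, microbiability is commonly estimated from a linear mixed model that partitions phenotypic variance into host additive genetic, microbiome, and residual components. The formulation below is a generic sketch of that variance-ratio definition. The symbols are chosen here for illustration, and the individual studies cited above may have fitted different models (for example, with additional random effects).

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{m} + \mathbf{e},\\
\mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\sigma^2_g),\quad
\mathbf{m} \sim N(\mathbf{0}, \mathbf{M}\sigma^2_m),\quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e),\\
h^2 &= \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_m + \sigma^2_e},\qquad
m^2 = \frac{\sigma^2_m}{\sigma^2_g + \sigma^2_m + \sigma^2_e}.
\end{aligned}
```

Here y is the vector of phenotypes, b contains the fixed effects, and G and M denote host genomic and microbial relationship matrices, respectively. Under this definition, a microbiability of 13%, as reported above for CH4 emissions, means that 13% of the phenotypic variance in the trait is attributed to variation in the rumen microbiome.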
Concluding remarks and future perspectives
The ruminant industry faces challenges in optimizing feed efficiency, minimizing environmental impact, and enhancing the quality of milk and meat to meet growing global demands for dairy and meat products. The diverse rumen microbiome, as the primary supplier of metabolizable energy, protein, and precursors of milk and muscle protein, requires a more profound understanding with respect to its composition and functions, as well as its interactions with diet, animal genotypes, and production traits. Furthermore, it is essential to unveil the causal relationships among these layers of variables, including data from ruminants. While various omics technologies have been used to investigate the rumen microbiome, each has limitations. Recently, the combined use of two or three omics technologies has shown promise in providing a more comprehensive characterization of the rumen microbiome. However, the data generated through these technologies are often not sufficiently analyzed or interpreted in an integrated manner. The proposed concept of rumen microbiome nutriomics advocates for the integration of multiple omics and data analyses. This integrated approach aims to advance our comprehensive understanding of the rumen microbiome and its intricate interactions with both diet and animals. The Animal Nutriomics journal provides a vital platform for disseminating and exchanging novel findings from rumen microbiome nutriomics investigations and for facilitating collaboration within the scientific community.
Central to rumen microbiome nutriomics is the development of a comprehensive RMGD. While some researchers have developed in-house databases, these vary considerably in completeness and accuracy of curation, which significantly influences the analysis outcomes (Smith et al. Reference Smith, Glendinning and Walker2022). A serious undertaking is needed to compile the genomes and high-quality MAGs of rumen microbes into a publicly accessible RMGD. Concerted efforts among researchers in the realm of the rumen microbiome are needed to sequence more genomes of rumen protozoa, fungi (in particular), and viruses or mine the existing rumen metagenomes. Moreover, comparisons of results and findings among studies have been compromised or difficult due to the lack of sufficient metadata and technical variations across studies, such as study design; sampling; extraction or isolation of DNA, RNA, protein, and metabolites; as well as bioinformatic analyses (Hagey et al. Reference Hagey, Laabs and Maga2022; McGovern et al. Reference McGovern, Waters and Blackshields2018; Rintala et al. Reference Rintala, Pietila and Munukka2017). A set of criteria for the above technical aspects and workflows will be valuable to standardize the processes of rumen microbiome nutriomics investigations. Such standardization will be particularly valuable for data reanalysis using big data analytics.
The “holy grail” of rumen microbiome nutriomics is to reveal and understand the causal relationships between different layers of data, ranging from diet, rumen microbiome, and rumen functions to animal genotypes and key production traits. DAA, correlation, and association analyses may help identify potential interactions and indicators of some important aspects, such as feed efficiency, product quality, or methane emissions. However, causal inference is urgently needed to establish their cause–effect relationships. Several approaches, including modeling and CBNs, can be used. Machine learning and artificial intelligence also hold potential in this pursuit in future investigations.