Genomes contain regions between protein-coding genes that produce lengthy RNA molecules that never give rise to a protein. These long intergenic non-coding RNAs (lincRNAs) are thought to have essential functions, such as regulating responses to environmental change. However, a paucity of well-annotated lincRNA data, especially for crop plants, has precluded a deeper understanding of their roles.
Until now, no systematic genome-wide study had both confirmed the DNA sequences that produce lincRNAs and proposed functions for those lincRNAs. In addition, data are reported differently across studies, making direct comparisons difficult.
These barriers inspired researchers at the Boyce Thompson Institute (BTI) to take a comprehensive look at the identity, production, and function of lincRNAs in four species in the mustard family, including the model organism Arabidopsis thaliana and Brassica rapa, the species that produces bok choy, turnips, and other food crops.
The group found locations across all four genomes that encoded lincRNAs and proposed functions for them. It confirmed the function of some lincRNAs involved in germination in Arabidopsis, creating an approach that could help researchers further understand the enigmatic molecules in all species, from crops to humans.
Their results were published in The Plant Cell.
“Our goal was to generate extensive and more actionable data for researchers to understand the lincRNA function,” said Kyle Palos, first author on the study and a postdoctoral fellow in the lab of BTI Assistant Professor Andrew Nelson.
“This project started small and mushroomed after we realized we couldn’t begin to figure out lincRNA function without having thoroughly annotated genomes to know what lincRNAs were even present,” said Nelson, the corresponding author of the paper. “Kyle really led the charge on everything that went into this paper.”
The team hypothesized that lincRNA production and function are limited to certain cell types and environmental conditions. The more common data sets don’t cover that level of detail, “so it’s easy to miss a lot due to limited sampling,” Palos said. “Our comprehensive approach merges a high-throughput, top-level analysis that identifies lincRNAs with a deeper dive into their likely functions, to give the full picture.”
The study used a unified approach to gathering and annotating lincRNA data that other groups could easily adopt. According to Palos, this would facilitate comparisons across different experiments and species as the body of plant lincRNA data continues to grow.
The team uploaded its results to CyVerse, a free and open-science workspace where researchers can store, access and analyze data all in one place.
“We made it as simple as possible for others to search our results for lincRNAs involved in a plant trait or pathway of interest, and in responses to temperature and other environmental stressors,” Palos said.
The team’s methods could also help resolve long-standing questions with genome-wide association studies (GWAS) that identify correlations between plant traits and gene variants: What is happening with variants that fall outside of protein-coding regions? Are these variants within other genes (i.e., lincRNAs) or regulatory elements?
“You need a properly annotated genome to know that, before you can determine the variant’s effect and how to modulate it to produce your crop of choice,” said Nelson. He is also an adjunct assistant professor in the School of Integrative Plant Science at Cornell University.
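To make the GWAS question concrete, here is a minimal sketch of how a variant's position might be checked against an annotation that includes lincRNAs. The function name, feature labels, and coordinates are hypothetical and purely illustrative; they do not come from the study or its data.

```python
def classify_variant(chrom, pos, annotations):
    """Return the type of the first annotated feature that contains the variant.

    annotations: list of (feature_type, chrom, start, end), with inclusive coordinates.
    """
    for feature_type, f_chrom, start, end in annotations:
        if f_chrom == chrom and start <= pos <= end:
            return feature_type
    # No annotated feature overlaps this position.
    return "unannotated"


# Hypothetical annotation: one protein-coding gene and one lincRNA on the same chromosome.
annotations = [
    ("protein_coding", "chr1", 1000, 2500),
    ("lincRNA", "chr1", 4000, 5200),
]

# Three hypothetical variant positions.
labels = [classify_variant("chr1", p, annotations) for p in (1500, 4500, 3000)]
```

Without the lincRNA entry in the annotation, the second variant would simply look like it fell in an empty stretch of the genome, which is the gap Nelson describes.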
In the study, the team processed more than 20,000 publicly available RNA sequence data sets from the four mustard species, supplemented with its own sequencing data, to identify thousands of lincRNAs, which it then annotated with genomic, structural and other information.
The researchers assigned putative functions to the lincRNAs based on how closely their expression patterns matched those of protein-coding genes with known functions. Next, the team deleted a subset of lincRNAs that appeared to play roles in seed germination and development in Arabidopsis. The deletions reduced germination, validating the team's approach to determining lincRNA function.
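The idea of inferring function from shared expression patterns can be illustrated with a small sketch. This is not the team's pipeline: it assumes a plain Pearson correlation, an arbitrary cutoff (`min_r`), and made-up gene names and expression values, all chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def putative_function(linc_expr, coding_genes, min_r=0.9):
    """Return (gene, function, r) for the best-correlated annotated gene, or None.

    min_r is an assumed cutoff, not a value taken from the study.
    """
    best = None
    for gene, (expr, function) in coding_genes.items():
        r = pearson(linc_expr, expr)
        if r >= min_r and (best is None or r > best[2]):
            best = (gene, function, r)
    return best


# Hypothetical expression values across five samples.
coding_genes = {
    "geneA": ([1.0, 2.0, 8.0, 9.0, 1.5], "seed germination"),
    "geneB": ([7.0, 6.5, 1.0, 0.5, 7.2], "heat stress response"),
}
linc_expr = [0.5, 1.1, 4.2, 4.6, 0.7]

result = putative_function(linc_expr, coding_genes)
```

In this toy example the lincRNA tracks `geneA` across samples, so it inherits the putative label "seed germination". Such a label is only a hypothesis, which is why deletion experiments like the ones described above are needed to confirm it.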
In addition to ongoing studies of the germination-related lincRNAs, the team is applying its methods to lincRNAs in four more essential crops for which a wealth of RNA sequence data is available – rice, maize, sorghum, and Setaria italica (foxtail millet) – and has plans to expand into another nine well-sequenced species.
“Plant genome research often falls behind mammalian research, but with lincRNAs, we’re still very much in the dark across all species,” Nelson said. “Researching lincRNA in plants could impact human health and crops alike, by helping us understand their fundamental properties, regardless of the species.”
Source: Cornell University