The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project drew together collaborators from research institutions worldwide, including MIT's Whitehead Institute for Biomedical Research, and was finally completed in 2003.
Now, over two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to present the first comprehensive functional map of genes expressed in human cells. The data from this project, published online in Cell, ties each gene to its job in the cell and is the culmination of years of collaboration on the single-cell sequencing method Perturb-seq.
The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, a member of the Whitehead Institute and an investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you're going to be looking at, you have this map of the genotype-phenotype relationships, and you can go in and screen the database without having to do any experiments.”
The screen allowed the researchers to delve into diverse biological questions. They used it to explore the cellular effects of genes with unknown functions, investigate the response of mitochondria to stress, and screen for genes that cause chromosomes to be lost or gained, a phenotype that has proved difficult to study in the past. “I think this dataset is going to enable all sorts of analyses that we haven't even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says former Weissman Lab postdoc Tom Norman, a co-senior author of the paper.
Pioneering Perturb-seq
The project takes advantage of the Perturb-seq approach that makes it possible to follow the impact of turning on or off genes with unprecedented depth. This method was first published in 2016 by a group of researchers, including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes at great expense.
The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper. Replogle, in collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, set out to create a new version of Perturb-seq that could be scaled up. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020.
The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells. It then uses single-cell RNA sequencing to capture information about the RNAs that are expressed as a result of a given genetic change. Because RNAs control all aspects of how cells behave, this method can help decode the many cellular effects of genetic changes.
Since their initial proof-of-concept paper, Weissman, Regev, and others have used this sequencing method on smaller scales. For example, the researchers used Perturb-seq in 2021 to explore how human and viral genes interact throughout an infection with HCMV, a common herpes virus.
In the new study, Replogle and collaborators, including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, scaled up the method to the entire genome. Using human blood cancer cell lines and noncancerous cells derived from the retina, Replogle performed Perturb-seq across more than 2.5 million cells and used the data to build a comprehensive map tying genotypes to phenotypes.
Delving into the data
Upon completing the screen, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” says Tom Norman. “No one knows entirely the limits of what you can get out of that kind of dataset. Now, the question is, what do you do with it?”
The first, most obvious application was to look into genes with unknown functions. Because the screen also read out phenotypes of many known genes, the researchers could use the data to compare unknown genes to known ones and look for similar transcriptional outcomes, suggesting the gene products worked together as part of a larger complex.
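As a rough illustration of this kind of comparison, the toy sketch below ranks known genes by how closely their knockdown profiles match that of an uncharacterized gene. The gene names, the numbers, and the use of Pearson correlation as the similarity measure are all assumptions made for the example, not the study's actual data or pipeline.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def most_similar(query, profiles):
    """Rank every other gene by similarity to the query gene's profile."""
    q = profiles[query]
    scores = {g: pearson(q, p) for g, p in profiles.items() if g != query}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: change in four readout genes after each knockdown.
profiles = {
    "UNKNOWN_1": [1.0, -0.8, 0.1, 0.9],
    "KNOWN_A": [0.9, -0.7, 0.0, 1.1],
    "KNOWN_B": [-0.5, 0.6, 0.9, -0.4],
}
ranked = most_similar("UNKNOWN_1", profiles)
print(ranked[0][0])  # the known gene with the most similar profile
```

In this made-up example, the unknown gene's profile tracks KNOWN_A closely and runs opposite to KNOWN_B, which would hint that the unknown gene and KNOWN_A act in the same pathway or complex.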
The mutation of one gene called C7orf26, in particular, stood out. Researchers noticed that genes whose removal led to a similar phenotype were part of a protein complex called Integrator that played a role in creating small nuclear RNAs. The Integrator complex comprises many smaller subunits — previous studies had suggested 14 individual proteins — and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.
They also discovered that the 15 subunits worked in smaller modules to perform specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” says Saunders.
Another perk of Perturb-seq is that because the assay focuses on single cells, the researchers could use the data to look at more complex phenotypes that become muddied when studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman says. “But sometimes when you knock down a gene, different cells losing that same gene behave differently, and the average may miss that behavior.”
The researchers found that a subset of genes whose removal led to different outcomes from cell to cell was responsible for chromosome segregation. Their removal was causing cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn't predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says. “We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we've done the first genome-wide screen for factors required for the correct segregation of DNA.”
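One way to picture such a composite phenotype is sketched below: for a single cell, average the expression of all genes on each chromosome relative to unperturbed control cells, then flag chromosomes whose average is unusually high or low. The gene-to-chromosome table, the expression values, and the cutoffs are hypothetical, and the approach is a simplified stand-in rather than the researchers' published method.

```python
from collections import defaultdict

def chromosome_ratios(cell_expr, control_expr, gene_to_chrom):
    """Average one cell's expression relative to controls, per chromosome."""
    by_chrom = defaultdict(list)
    for gene, value in cell_expr.items():
        by_chrom[gene_to_chrom[gene]].append(value / control_expr[gene])
    return {chrom: sum(r) / len(r) for chrom, r in by_chrom.items()}

def call_copy_changes(ratios, gain=1.3, loss=0.7):
    """Label chromosomes whose average ratio crosses the assumed cutoffs."""
    calls = {}
    for chrom, ratio in ratios.items():
        if ratio >= gain:
            calls[chrom] = "gain"
        elif ratio <= loss:
            calls[chrom] = "loss"
    return calls

# Hypothetical inputs: four genes on two chromosomes.
gene_to_chrom = {"g1": "chr1", "g2": "chr1", "g3": "chr7", "g4": "chr7"}
control_expr = {"g1": 10.0, "g2": 10.0, "g3": 10.0, "g4": 10.0}
cell_expr = {"g1": 10.5, "g2": 9.5, "g3": 15.0, "g4": 14.0}

ratios = chromosome_ratios(cell_expr, control_expr, gene_to_chrom)
calls = call_copy_changes(ratios)
print(calls)
```

The intuition is that a cell carrying three copies of a chromosome instead of two would be expected to express that chromosome's genes at roughly 1.5 times the usual level, so a chromosome-wide shift stands out even when individual genes are noisy.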
“I think the aneuploidy study is the most interesting application of this data so far,” Norman says. “It captures a phenotype you can only get using a single-cell readout. You can’t go after it any other way.”
The researchers also used their dataset to study how mitochondria responded to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. Within the nuclear DNA, around 1,000 genes are somehow related to mitochondrial function. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.
The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded similarly to many different genetic changes. However, the mitochondrial genome responses were much more variable.
“There’s still an open question of why mitochondria still have their DNA,” says Replogle. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”
“If you have one mitochondrion that’s broken and another one that is broken differently, those mitochondria could respond differently,” Weissman says.
In the future, the researchers hope to use Perturb-seq on different types of cells besides the cancer cell line they started in. They also hope to continue to explore their map of gene functions and hope others will do the same. “This is the culmination of many years of work by the authors and other collaborators, and I’m pleased to see it continue to succeed and expand,” says Norman.
Written by Eva Frederick