Machine learning helps determine success of advanced genome editing

Researchers at the Wellcome Sanger Institute have developed a new tool to predict the chances of successfully inserting a gene-edited sequence of DNA into the genome of a cell, using a technique known as prime editing.

An evolution of CRISPR-Cas9 gene editing technology, prime editing has huge potential to treat genetic disease in humans, from cancer to cystic fibrosis. But thus far, the factors that determine whether an edit succeeds have been poorly understood.

Human genome sequencing – artistic impression. Image credit: NIH NHGRI via Flickr, CC BY-NC 2.0

The study, published in Nature Biotechnology, assessed thousands of different DNA sequences introduced into the genome using prime editors. These data were then used to train a machine learning algorithm to help researchers design the best fix for a given genetic flaw, which promises to speed up efforts to bring prime editing into the clinic.

Developed in 2012, CRISPR-Cas9 was the first easily programmable gene editing technology1. These ‘molecular scissors’ enabled researchers to cut DNA at any position in the genome in order to remove, add or alter sections of the DNA sequence. The technology has been used to study which genes are important for various conditions, from cancer to rare diseases, and to develop treatments that fix or turn off harmful mutations or genes.

Base editors were an innovation expanding on CRISPR-Cas9 and were called ‘molecular pencils’ for their ability to substitute single bases of DNA. The latest gene editing tools, created in 2019, are called prime editors. Their ability to perform search and replace operations directly on the genome with a high degree of precision has led to them being dubbed ‘molecular word processors’.

The ultimate aim of these technologies is to correct harmful mutations in people’s genes2. Over 16,000 small deletion variants – where a small number of DNA bases have been removed from the genome – have been causally linked to disease. These diseases include cystic fibrosis, where 70 per cent of cases are caused by the deletion of just three DNA bases. In 2022, base edited T-cells were successfully used to treat a patient’s leukaemia, where chemotherapy and bone marrow transplant had failed3.

In this new study, researchers at the Wellcome Sanger Institute designed 3,604 DNA sequences of between one and 69 DNA bases in length. These sequences were inserted into three different human cell lines, using different prime editor delivery systems in various DNA repair contexts4. After a week, the cells' genomes were sequenced to see whether the edits had been successful.

The insertion efficiency, or success rate, of each sequence was assessed to determine common factors in the success of each edit. The length of sequence was found to be a key factor, as was the type of DNA repair mechanism involved.
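As a rough illustration of the kind of calculation described above, the sketch below treats insertion efficiency as the fraction of sequenced reads that carry the intended insertion, then averages it by insert length. This definition, the function names and the read counts are all assumptions made for illustration; they are not taken from the study.

```python
from collections import defaultdict


def insertion_efficiency(edited_reads, total_reads):
    """Fraction of sequenced reads carrying the intended insertion (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return edited_reads / total_reads


def mean_efficiency_by_length(records):
    """Average efficiency for each insert length.

    records: iterable of (insert_length, edited_reads, total_reads) tuples.
    Returns a dict mapping insert length to mean efficiency, ordered by length.
    """
    grouped = defaultdict(list)
    for length, edited, total in records:
        grouped[length].append(insertion_efficiency(edited, total))
    return {length: sum(vals) / len(vals) for length, vals in sorted(grouped.items())}


# Hypothetical read counts, invented purely to show the calculation.
example = [(3, 50, 100), (3, 30, 100), (40, 10, 100)]
summary = mean_efficiency_by_length(example)
```

Grouping by length is only one possible cut of the data; the same pattern could be applied to cell line or DNA repair context.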

Source: Sanger Institute