New model acts as search engine for large databases of pathology images, has potential to identify rare diseases and therapies.

Hospital diagnostics laboratory - illustrative photo.

Hospital diagnostics laboratory – illustrative photo. Image credit: Max Pixel, CC0 Public Domain

Rare diseases are often difficult to diagnose, and predicting the best course of treatment can be challenging for clinicians. To help address these challenges, investigators from the Mahmood Lab at Harvard Medical School and Brigham and Women’s Hospital have developed a deep-learning algorithm that can teach itself to learn features that can then be used to find similar cases in large pathology image repositories.

SISH (self-supervised image search for histology), the new tool, acts like a search engine for pathology images. It has many potential applications, including identifying rare diseases and helping clinicians determine which patients are likely to respond to similar therapies. A paper describing the self-teaching algorithm is published in Nature Biomedical Engineering.

“We show that our system can assist with the diagnosis of rare diseases and find cases with similar morphologic patterns without the need for manual annotations and large datasets for supervised training,” said senior author Faisal Mahmood, assistant professor of pathology at HMS at Brigham and Women’s. “This system has the potential to improve pathology training, disease subtyping, tumor identification, and rare morphology identification.”

Modern electronic databases can store vast reams of digital records and reference images, particularly in pathology, using whole slide images (WSIs). However, the gigapixel size of each WSI and the ever-increasing number of images in large repositories means that the search and retrieval of WSIs can be slow and complicated. As a result, scalability remains a pertinent roadblock for efficient use.

To solve this issue, the research team developed SISH, which teaches itself to learn feature representations that can be used to find cases with analogous features in pathology at a constant speed regardless of the size of the database.

In their study, the researchers tested the speed and ability of SISH to retrieve interpretable disease subtype information for common and rare cancers. The algorithm successfully retrieved images with speed and accuracy from a database of tens of thousands of WSIs from over 22,000 patient cases, with over 50 different disease types and over a dozen anatomical sites. The speed of retrieval outperformed other methods in many scenarios, including disease subtype retrieval, particularly as the image database size scaled into the thousands of images. Even while the repositories expanded in size, SISH was still able to maintain a constant search speed.

The algorithm, however, has some limitations, including a large memory requirement, limited context awareness within large tissue slides and the fact that it is limited to a single imaging modality.

Overall, the algorithm demonstrated the ability to retrieve images independently of repository size and in diverse datasets. It also demonstrated proficiency in diagnosis of rare disease types and the ability to serve as a search engine to recognize certain regions of images that may be relevant for diagnosis. This work may greatly inform future disease diagnosis, prognosis, and analysis approaches.

“As the sizes of image databases continue to grow, we hope that SISH will be useful in making identification of diseases easier,” said Mahmood. “We believe one important future direction in this area is multimodal case retrieval, which involves jointly using pathology, radiology, and genomic and electronic medical record data to find similar patient cases.”

Source: HMS