In a comprehensive systematic review and meta-analysis published in Translational Vision Science & Technology, researchers evaluated the diagnostic performance of artificial intelligence (AI) models in detecting inherited retinal diseases (IRDs), including retinitis pigmentosa (RP), Stargardt disease, and familial exudative vitreoretinopathy (FEVR).
The researchers reviewed 5,412 articles from multiple databases and ultimately included 22 studies in the systematic review and 21 in the meta-analysis. Using imaging modalities such as optical coherence tomography (OCT), fundus photography, widefield and ultrawidefield imaging, and fundus autofluorescence (FAF), eligible studies reported sensitivity, specificity, and confusion matrix data for AI-based diagnosis of IRDs. Most studies used deep learning with convolutional neural networks (CNNs) such as ResNet, Inception, and Xception architectures. Quality assessment was performed with the QUADAS-2 tool, and pooled estimates were calculated using random-effects models.
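For readers unfamiliar with random-effects pooling, the sketch below shows one common approach: the DerSimonian-Laird estimator applied to logit-transformed proportions such as per-study sensitivity. It is an illustration only. The article does not say which estimator or software the researchers used, and the function name `pool_random_effects`, its inputs, and the 0.5 continuity correction are assumptions made here.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_random_effects(events, totals):
    """Pool proportions with the DerSimonian-Laird random-effects estimator.

    events[i] / totals[i] is the proportion in study i, for example
    true positives / diseased eyes for sensitivity.
    Returns (pooled proportion, between-study variance tau^2).
    """
    effects, variances = [], []
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction: keeps the logit finite
        # when a study reports e == 0 or e == n.
        e_c, n_c = e + 0.5, n + 1.0
        effects.append(logit(e_c / n_c))
        variances.append(1 / e_c + 1 / (n_c - e_c))

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of between-study variance
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return inv_logit(pooled_logit), tau2


# Hypothetical study counts, not taken from the review
pooled, tau2 = pool_random_effects(events=[94, 45, 180], totals=[100, 50, 200])
print(f"pooled sensitivity={pooled:.3f}, tau^2={tau2:.3f}")
```

Compared with a fixed-effect average, the added between-study variance shifts weight toward smaller studies, which is why random-effects models are usually preferred when study populations and imaging setups differ.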
AI models demonstrated consistently high accuracy in diagnosing IRDs, with their strongest results for RP. Across studies, pooled sensitivity for RP was 94% and specificity 99%, with a diagnostic odds ratio (DOR) of 2,486, aligning with reported accuracies of up to 99.9%. For Stargardt disease, pooled sensitivity was 96% and specificity 99% (DOR 2,236), also within the upper ranges reported. Detection of FEVR was slightly less robust, with pooled sensitivity of 85% and specificity of 99%, though still comparable to the highest published values.
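To make these measures concrete, the following minimal sketch computes sensitivity, specificity, and DOR from a single 2x2 confusion matrix. The counts are hypothetical, chosen only to reproduce 94% sensitivity and 99% specificity, and the function name `diagnostic_metrics` is an assumption. The resulting single-table DOR need not match the pooled DOR reported above, since a pooled DOR is typically estimated across studies rather than derived from pooled sensitivity and specificity.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and diagnostic odds ratio from a 2x2 table."""
    sensitivity = tp / (tp + fn)  # diseased eyes correctly flagged
    specificity = tn / (tn + fp)  # healthy eyes correctly cleared
    # Add 0.5 to every cell if any cell is zero, so the odds ratio stays finite
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    # DOR = odds of a positive test in disease / odds of a positive test in health
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


# Hypothetical counts: 100 diseased and 100 healthy eyes
sens, spec, dor = diagnostic_metrics(tp=94, fp=1, fn=6, tn=99)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} DOR={dor:.0f}")
# prints: sensitivity=0.94 specificity=0.99 DOR=1551
```

Because the DOR divides by the product of false positives and false negatives, it grows very quickly as specificity approaches 100%, which helps explain why values in the thousands accompany specificities of 99%.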
Performance by Imaging Modality
FAF showed high accuracy for RP (area under the curve [AUC] = 0.998 to 0.999, sensitivity 71% to 100%, specificity 97% to 99.5%) and Stargardt disease (AUC = 0.981 to 0.998, sensitivity 96% to 98%, specificity up to 100%).
OCT showed accuracy of 97.9% to 99.6%, sensitivity of 97.9% to 99.8%, and specificity of 98% to 100%.
Widefield and ultrawidefield imaging showed significantly higher specificity for RP detection (99% vs 98% in studies that did not use this imaging; P=.03). “This may have to do with the wider view of the retina in a wide-field image, where more peripheral retinal abnormalities–one of the key features for the progression of RP–could be taken into consideration,” the researchers wrote. In FEVR cases, deep learning models with wide-angle retinal imaging were 89% to 94% accurate. Sensitivity ranged from 75% to 91%, and specificity was up to 98%.
Color fundus photography varied in accuracy (85.3% to 100% for RP detection), sensitivity (88% to 100%), and specificity (70.2% to 99.5%).
The researchers found no significant difference in sensitivity, specificity, or overall performance between studies that used fluorescence imaging and those that did not, or between studies that used classification alone and those that combined classification with segmentation or object detection. Still, the lower sensitivity but comparable specificity seen in studies that used fluorescence imaging may reflect subtle early-stage changes that are invisible even to AI, the authors noted.
Error analysis showed that false positives were often due to high myopia mimicking RP, whereas false negatives were frequently caused by media opacities, such as cataracts, and by reduced image contrast. The researchers reported moderate certainty for pooled sensitivity and specificity but low certainty for DOR because of possible publication bias, particularly in Stargardt disease studies (P=.04). Heterogeneity across studies was moderate (I² = 45%).
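As background on the heterogeneity figure, I² expresses the share of variation across study results that exceeds what sampling error alone would produce. The sketch below derives it from Cochran's Q using inverse-variance weights. The function name `i_squared` and the example inputs are hypothetical, and the article does not state which effect measure the reported I² was computed on.

```python
def i_squared(effects, variances):
    """I^2 heterogeneity (in percent) from per-study effects and variances.

    effects could be, for example, per-study log DORs or logit
    sensitivities; variances are their within-study variances.
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # no observed dispersion, so no heterogeneity
    # Negative values are truncated to zero by convention
    return max(0.0, (q - df) / q) * 100


# Hypothetical two-study example: Q = 4 with 1 degree of freedom
print(i_squared([0.0, 2.0], [0.5, 0.5]))  # prints: 75.0
```

By the usual rough convention, values around 25%, 50%, and 75% are read as low, moderate, and high heterogeneity, which is consistent with 45% being described as moderate.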
Conclusion
“Based on the results of our review, AI models exhibit high diagnostic performance,” noted the authors, who said the results align with previous research on AI diagnosis of RP. They added that Xception outperformed the CNN models used in other studies, saying that this model may be “better suited to capturing the complex patterns in retinal images necessary for accurate classification.”
Transfer learning contributed to improved model performance, especially in smaller datasets, the authors noted.
They concluded: “Whereas the potential for AI to revolutionize retinal disease diagnosis is evident, further research is needed to address current limitations and ensure that these models are robust and applicable across diverse clinical settings. In addition to conventional diagnostic methodologies, AI can potentiate early diagnosis and hence provide personalized care to patients with IRD.”