Robert T. Chang, MD, explains how aggregated real-world data will drive practice patterns and algorithms going forward.
Artificial intelligence (AI) is the subject of numerous articles that tout how machines perform better than human ophthalmologists, but why is this happening?
In April 2018, the first FDA approval of autonomous AI for detecting referable diabetic retinopathy (DR) from fundus photos, the IDx-DR camera system (IDx Technologies Inc.), legitimized and essentially “jumpstarted” AI in ophthalmology, according to Robert T. Chang, MD, associate professor at the Byers Eye Institute of Stanford University, Palo Alto, CA.
“The most important thing to understand if the technology will become widespread is how quickly doctors and patients will trust an AI system, such as understanding its strengths and limitations, and how easily the technology will be integrated into current eyecare workflows, especially in terms of liability and business models,” he said.
The FDA was careful in approving the first specific AI doctorless screening method for detecting DR in fundus images with a heavy emphasis on safety (what could be missed).
The IDx-DR breakthrough-device, prospective, multicenter trial included exacting requirements, such as specific camera type, single primary reason for DR screening, a narrow asymptomatic population not previously evaluated for DR, and specific minimal cutoffs for specificity and sensitivity to detect DR that exceeded mild disease, a threshold which likely would not result in a bad outcome given a false negative.
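To make the idea of minimum cutoffs concrete, here is a minimal sketch of how sensitivity and specificity could be computed for a screening algorithm and checked against preset thresholds. The function names, the sample labels, and the cutoff values are illustrative assumptions and are not taken from the IDx-DR trial.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = referable DR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disease correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disease missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def meets_cutoffs(y_true, y_pred, min_sensitivity, min_specificity):
    """True only if both metrics reach their preset minimums."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
    return sensitivity >= min_sensitivity and specificity >= min_specificity


# Made-up example: 4 patients with referable DR, 6 without.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Hypothetical cutoffs chosen only for illustration.
print(sensitivity_specificity(truth, preds))
print(meets_cutoffs(truth, preds, min_sensitivity=0.70, min_specificity=0.80))
```

In a safety-focused screening setting, the sensitivity cutoff matters most because a false negative is a missed case of disease.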
Though the narrow confines of the 2017 trial may limit generalizability or slow adoption of telemedicine screening, an AI-driven screening approach may be ideal for ruling out disease in negative cases, which frees up doctor time for positive cases, Dr. Chang explained.
“AI-based screening algorithms can achieve economy of scale and increase access to care at lower cost, but high-quality, inexpensive image capture remains a barrier,” he said.
Currently, deep learning has been deployed in the case of DR using supervised learning techniques requiring more than 100,000 labeled images (or subimages) to train the algorithm. With such a large number of examples, modern computational power helps fine-tune a “neural network” mathematical model to detect the most important features within an image and to classify it properly with a certain degree of statistical certainty.
“With constant refinement, the model can achieve a performance that is equal or even superior to human pattern recognition, depending on the consensus ground truth (predetermined right answer),” Dr. Chang said.
This is in contrast to older AI algorithms in which human expert-driven features of DR were programmed, but these algorithms were not able to achieve superhuman performance.
Obtaining data sets
The rate-limiting step in deep learning is clean “Big Data,” that is, sufficient variety and quality of training examples without too many artifacts, and ideally the ability to ensure that the AI could calculate the relative risk of misclassification, essentially knowing when not to produce an answer, according to Dr. Chang.
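One simple way to picture an algorithm "knowing when not to produce an answer" is a classifier that abstains when its confidence is low. The sketch below is only an illustration under stated assumptions: the function name, the class labels, and the 0.9 threshold are hypothetical, and a model's raw output probability is not necessarily a well-calibrated estimate of misclassification risk.

```python
def classify_or_abstain(probabilities, threshold=0.9):
    """Return the most likely label, or None when confidence is below the threshold.

    `probabilities` maps each class label to the model's estimated probability.
    The threshold value is a hypothetical choice, not a clinical standard.
    """
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else None


# A confident prediction is returned as-is.
print(classify_or_abstain({"referable": 0.97, "not_referable": 0.03}))

# An uncertain prediction is withheld, for example to be routed to a human grader.
print(classify_or_abstain({"referable": 0.55, "not_referable": 0.45}))
```

Raising the threshold trades coverage for safety: fewer images get an automated answer, but the answers that are given carry higher stated confidence.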
This raises the question: Where do all the data to train algorithms come from?
The answer is “de-identified” public data sets that are essentially open sourced by various organizations. Every company is trying to aggregate its own unique or proprietary data set as a competitive advantage since AI algorithm architectures generally are published and cloud computational power is affordable now.
The only real differentiator is the training data quality, quantity, and diversity. Dr. Chang recounted the original Kaggle data science competition in 2014, which drew more than 600 teams with the aim of training an AI algorithm to screen for DR. The event kicked off interest in neural networks applied to ophthalmology.
A similar follow-up contest took place last year, sponsored by the Fourth Asia Pacific Tele-Ophthalmology Society (APTOS). The data set for this competition comprised thousands of images from the Aravind Eye Institute in India, which became public domain when released. More than 3,000 teams were interested in training a better algorithm to screen for DR. However, people quickly realized that the definitions of disease in this data set from India differed subtly from those in the earlier data set, and models from the prior competition did not achieve similar performance.
Humans may not see the small differences, but machines can.
The biggest problem faced in AI is having sufficient longitudinal, real-time, high-quality representative data cleaned up for the appropriate task. The narrower the task, the easier it is to acquire relevant diverse data. Going forward, there is a problem assessing algorithms against each other. Once a validation set is used, it is hard to keep those data private and to prevent them from being incorporated into training data.
Also, data generally are stored all over the place in private data silos. Ideally, AI needs a globally shared data system to be truly generalizable.
Amassing enough data to garner trust in AI raises all kinds of ethical issues and considerations, including who owns an individual’s health data: the patient, the hospital or hospital system, the insurance company, or the government? Where will all the data be stored securely? Who is benefiting from monetizing it?
Blockchain technology involves decentralized sharing of information without a centralized owner, such as a Google or Facebook. The first mainstream association of the term “blockchain” was with bitcoin.
However, blockchain is really a platform technology, Dr. Chang explained.
“Basically, it is a type of shared public database across many computers, with no single owner, that records a series of time-sequenced, permanent transactions, which people can trust to be secure,” he said.
What makes blockchain technology special is that it is updated in real time across many computer nodes, with a competitive, mathematically based validation system to verify the transactions, which are grouped into “blocks.”
The trust comes from the fact that it is too expensive for the shared database ledger to be manipulated, and thus the first use case was a decentralized digital currency record-keeping system.
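The tamper resistance of such a ledger can be sketched with a toy hash chain, in which each block stores the hash of the block before it. This is a simplified illustration, not Bitcoin's actual implementation: it runs on a single machine and omits the network of nodes and the competitive validation step, and all function names and sample transactions are hypothetical.

```python
import copy
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, timestamp, data, prev_hash):
    """SHA-256 digest over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data, timestamp):
    """Add a time-sequenced record that is linked to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(index, timestamp, data, prev_hash)
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    expected_prev = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["timestamp"], block["data"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


ledger = []
append_block(ledger, "A pays B 1 coin", timestamp=1)
append_block(ledger, "B pays C 1 coin", timestamp=2)
append_block(ledger, "C pays A 1 coin", timestamp=3)

# Altering an old record breaks the chain of hashes.
tampered = copy.deepcopy(ledger)
tampered[1]["data"] = "B pays C 100 coins"

print(is_valid(ledger))    # True
print(is_valid(tampered))  # False
```

Because every block's hash depends on the one before it, quietly changing an old record would require recomputing every later block. In a real blockchain, that work would also have to be repeated across the many independent copies of the ledger, which is what makes manipulation so expensive.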
In the healthcare realm, using blockchain platform technology combined with encryption, users can submit private image data, for example, which can be shared without actually becoming public domain.
Then, through secure computation, the image data can be used to train or validate AI algorithms without letting any user actually download the original data. This is the concept of the privacy-preserving blockchain as a platform for training and testing AI algorithms. Then a validation set can truly stay private.
“The key factor that attracts people to blockchain technology is that there is no centralized owner, and, thus, information stored on it cannot be manipulated by anyone without taking down the entire system,” Dr. Chang said.
How blockchain works for Bitcoin
The first and best-known use case of blockchain was to create an alternate decentralized online payment system, like Bitcoin, in contrast to a centralized electronic payment system, such as PayPal.
Bitcoin transactions or exchanges are stored on a public ledger in real time for all computer nodes to see and verify. These computers also double as “miners,” so there is an incentive to have many participants, since the miners supporting the nodes also have a chance to win a bitcoin reward. The value of the bitcoin is solely dependent on the supply-demand curve, like digital gold.
“The computation underlying the blockchain communication consists of hash functions or strings of numbers, and the decentralization component makes it difficult to hack all the computers simultaneously,” Dr. Chang said. “While the blockchain public ledger is immutable, hacks can occur at the cryptocurrency digital wallet level, where you hold your bitcoin for example but not at the exchange trading level.”
Connecting the dots
Imagine if all the images obtained in ophthalmology clinics were somehow uploaded easily to this decentralized blockchain system, and patients, not hospitals, owned their data. The data then could truly cross systems and borders without fear of loss of privacy, breaking down our current data silos.
In this new system, patients could give permission for any organization to train or validate off their de-identified data and not worry about whether they can trust the owner (like a Google), Dr. Chang explained.
“There could also be a new data marketplace where secure computing allows data seekers to contract with data providers via a blockchain public ledger,” he said. “This is executed via a process called ‘smart contracts,’ whereby multiple companies training AI algorithms could access data privately, not needing public domain datasets or data silos.”
The final question
How does the health data get into this secure, decentralized data marketplace for AI training?
Dr. Chang explained that his efforts are focused on speed and ease of use to capture the data, for example, by taking a picture of the test result and immediately labeling it prospectively, rather than going back to export data and label it retrospectively.
He suggested that at the point of care, when patients are receiving their result, they could immediately upload a de-identified test result to the cloud that would be available for AI training through the privacy-preserving blockchain public ledger. This would be the start of a shareable data marketplace without relying on a central owner.
“If enough people worldwide got together and started populating a database like this, it may be faster than trying to fund an international multicenter registry or get organizations to sign data sharing use agreements,” Dr. Chang concluded.
Robert T. Chang, MD
Dr. Chang has previously received AI research funding from Santen and was the recipient of a Stanford Center for Innovation in Global Health grant but has no AI financial disclosures.