AI and Dementia Detection: What Machine Learning Can and Cannot Do
Artificial intelligence applied to speech, drawing tests, retinal images, and brain scans is achieving near-clinical accuracy in some cognitive assessment domains. Here is what the evidence shows.
What AI-based dementia detection involves
Artificial intelligence, specifically machine learning and deep learning models, is being applied to a wide range of data types to detect or predict cognitive decline and dementia. The underlying principle is that AI models can identify subtle patterns in complex, high-dimensional data that human evaluators cannot reliably detect — and that these patterns appear early in the disease process, before clinical symptoms are obvious.
The data types being studied include: spontaneous speech (recorded during clinical picture description tasks or everyday conversation), clock drawing tests (analyzed digitally rather than by a clinician), retinal fundus photographs and OCT scans, brain MRI and PET scans, neuropsychological test performance patterns, and combinations of multiple data types.
The approach differs from traditional diagnostic criteria in that it does not try to classify based on explicit rules — it learns from thousands of examples of people who did and did not develop dementia, and finds statistical regularities in their data that predict outcomes. This makes AI potentially sensitive to patterns that have not been explicitly theorized, but also makes the models difficult to interpret and challenging to validate rigorously.
Current evidence: what AI can detect and how well
Speech analysis for Alzheimer's detection has produced some of the most impressive results. A landmark study published in the Lancet Digital Health in 2021 demonstrated that an AI model analyzing speech from the DementiaBank dataset (recordings of the Boston Cookie Theft picture description task) could predict Alzheimer's disease with approximately 83% accuracy — comparable to specialist clinical assessment. The model analyzed acoustic features, semantic coherence, and syntactic complexity.
Clock drawing test analysis has been transformed by AI. The traditional clock drawing test, where a clinician scores a patient's hand-drawn clock, is subjective and coarse. AI analysis of digitized clock drawings, which measures pen pressure, stroke timing, spatial organization, and hundreds of other features, achieves substantially higher discrimination between MCI and normal cognition than human scoring. Researchers at the University of Toronto and MIT have published extensively in this area.
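To make "stroke timing" and "pen pressure" concrete, here is a minimal sketch of how a few such features could be computed from a digitized drawing. The input format (a list of strokes, each a list of (time, x, y, pressure) samples) and the feature names are assumptions for illustration, not the format or feature set used by any published system.

```python
from math import hypot

def stroke_features(strokes):
    """Summarize a drawing given as a list of strokes.

    Each stroke is a list of (t, x, y, pressure) samples, with t in seconds.
    This input format is hypothetical.
    """
    # Time the pen spent on the surface, summed over strokes.
    pen_down_time = sum(s[-1][0] - s[0][0] for s in strokes)
    # Gaps between the end of one stroke and the start of the next.
    pauses = [b[0][0] - a[-1][0] for a, b in zip(strokes, strokes[1:])]
    # Total distance traveled by the pen while drawing.
    path_length = sum(
        hypot(q[1] - p[1], q[2] - p[2])
        for s in strokes
        for p, q in zip(s, s[1:])
    )
    pressures = [sample[3] for s in strokes for sample in s]
    return {
        "pen_down_time": pen_down_time,
        "longest_pause": max(pauses, default=0.0),
        "path_length": path_length,
        "mean_pressure": sum(pressures) / len(pressures),
    }

# Two short strokes with a pause between them (made-up data).
example = [
    [(0.0, 0, 0, 0.5), (1.0, 3, 4, 0.7)],
    [(2.5, 3, 4, 0.6), (3.0, 3, 5, 0.6)],
]
features = stroke_features(example)
```

A real system would compute far more features than these four and feed them to a trained classifier; the sketch only shows the kind of measurement that human scoring of a finished drawing cannot capture.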
For brain imaging, AI models analyzing MRI volumetrics and white matter patterns have shown strong performance in predicting progression from MCI to Alzheimer's — in some studies exceeding 85% accuracy for 3-year conversion prediction. These models integrate multiple imaging features that would be impractical for a radiologist to systematically quantify.
Multimodal AI models — those that combine speech, cognitive testing, imaging, and biomarker data — generally outperform any single-modality approach. A 2023 study published in Nature Aging demonstrated that combining plasma biomarkers with cognitive test performance and speech features in a machine learning model achieved 94% accuracy for distinguishing early Alzheimer's from healthy controls.
What this means for people managing cognitive health today
AI-based cognitive assessment tools are beginning to enter clinical practice, primarily as decision-support tools for clinicians rather than as autonomous diagnostics. Tools like Winterlight Labs' speech analysis and BioSymetrics' cognitive test analytics are being piloted in memory clinics to enhance clinical evaluation rather than replace it.
For consumers, the AI cognitive assessment landscape is a mix of genuinely promising research tools and wellness apps making inflated claims. Caution is warranted: a high accuracy figure in a published study does not necessarily translate to reliable performance in the general population or on data collected differently from the training set. Validated AI tools should have published performance data, defined populations, and known limitations.
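One reason a study's accuracy figure may not carry over to the general population is prevalence: research samples often contain roughly equal numbers of cases and controls, while in broad screening most people tested are healthy. The sketch below applies Bayes' rule to show the effect. The sensitivity, specificity, and prevalence values are illustrative assumptions, not figures from any study cited here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test, two different settings.
study_ppv = positive_predictive_value(0.85, 0.85, 0.50)      # balanced study sample
screening_ppv = positive_predictive_value(0.85, 0.85, 0.05)  # broad screening

print(f"balanced study sample: {study_ppv:.2f}")
print(f"broad screening:       {screening_ppv:.2f}")
```

With these assumed numbers, a positive result is right 85% of the time in the balanced sample but only about 23% of the time when 5% of those tested have the condition, even though the test itself has not changed.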
One area where AI is already contributing meaningfully for individuals is in the analysis of structured cognitive test data over time. If you take the same cognitive tasks repeatedly, AI approaches can better distinguish meaningful decline from normal variation than single-point assessment — because they can model your personal baseline and the full distribution of your performance across time.
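As a rough illustration of what modeling a personal baseline can mean, the sketch below scores a new result against a person's own earlier results rather than against a population norm. It uses the median and median absolute deviation, a simple robust-statistics choice made for this example, not a description of any specific product's method. The scores are made up.

```python
from statistics import median

def robust_z(baseline, new_score):
    """Distance of new_score from a personal baseline, in robust units.

    Median and median absolute deviation (MAD) are used so that a single
    unusual day in the baseline does not distort the estimate.
    """
    center = median(baseline)
    mad = median([abs(x - center) for x in baseline])
    if mad == 0:
        raise ValueError("baseline has no variation; cannot scale")
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data are roughly normally distributed.
    return (new_score - center) / (1.4826 * mad)

# Ten earlier sessions on the same task (hypothetical scores).
baseline = [52, 50, 49, 51, 53, 50, 48, 51, 50, 52]

ordinary_day = robust_z(baseline, 49)  # within this person's usual range
unusual_day = robust_z(baseline, 45)   # well below this person's usual range
```

The same score of 45 could be unremarkable for someone whose results normally range from 40 to 55, which is why a single-point assessment against a population cutoff carries less information than a comparison with a person's own history.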
The most important limitation to understand is that AI models inherit the biases and limitations of their training data. Most AI dementia detection models have been trained primarily on White, highly educated, English-speaking cohorts. Performance in diverse populations has often been lower and less well-studied. This is an active area of methodological concern in the field.
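A basic way to surface this kind of gap is to report performance separately for each demographic group instead of as one pooled number. The sketch below does that for accuracy. The group names, labels, and records are invented for illustration; a real audit would also examine sensitivity, specificity, and sample sizes per group.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Accuracy per group from (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Made-up evaluation results for two hypothetical groups.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
per_group = accuracy_by_group(records)
pooled = sum(t == p for _, t, p in records) / len(records)
```

In this toy data the pooled accuracy is 75%, which hides the fact that one group is classified perfectly and the other only half the time.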
The bigger picture: AI as part of an early detection ecosystem
AI is not going to replace the clinical evaluation, the patient relationship, or the judgment call about when and how to communicate a concerning finding to a patient and family. But it is increasingly likely to become an essential component of an early detection ecosystem — one that processes the large volumes of data that accumulate from continuous monitoring, biomarker testing, and cognitive assessment, and flags people who warrant closer attention.
The combination of AI with passive digital biomarkers is particularly promising. If AI can reliably extract cognitive signal from smartphone typing patterns, voice recordings in natural conversation, or wearable motion data, it becomes possible to continuously monitor large populations at low cost — something impossible with periodic clinical assessment.
Regulatory frameworks are catching up to these developments. The FDA has cleared several AI-based medical devices in neurology, and the regulatory pathway for AI-assisted cognitive assessment tools is being actively developed. Validation standards, performance transparency requirements, and bias auditing are becoming part of the regulatory conversation.
For Keel specifically, AI approaches are relevant to making sense of the longitudinal data that daily testing generates. Distinguishing a real cognitive trend from normal day-to-day variation is fundamentally a pattern recognition problem — one where machine learning approaches that model individual baselines and account for confounding factors (sleep, mood, time of day) can add genuine value.
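The idea of separating a trend from a confounding factor can be sketched with ordinary least squares. The example below estimates how a score changes per day after removing the linear effect of sleep, by residualizing both the scores and the days on sleep and then regressing one set of residuals on the other. This is a simplified illustration with one confounder and invented data; it is not a description of Keel's actual method, and the function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def adjusted_trend(days, scores, sleep_hours):
    """Change in score per day, holding sleep constant.

    By the Frisch-Waugh-Lovell theorem this equals the coefficient on
    days in a regression of score on both days and sleep.
    """
    score_resid = residuals(sleep_hours, scores)
    day_resid = residuals(sleep_hours, days)
    slope, _ = fit_line(day_resid, score_resid)
    return slope

# Synthetic data: a true decline of 0.2 points per day, plus a sleep effect.
days = list(range(10))
sleep_hours = [7, 6, 8, 5, 7, 6, 8, 7, 5, 6]
scores = [50 - 0.2 * d + 1.5 * s for d, s in zip(days, sleep_hours)]

raw_slope, _ = fit_line(days, scores)
trend = adjusted_trend(days, scores, sleep_hours)
```

On this synthetic data the adjusted estimate recovers the built-in decline of 0.2 points per day, while the unadjusted slope is distorted by the nights of short sleep. Real data are noisy, so any such estimate would need an uncertainty range before being read as a genuine change.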
Frequently asked questions
Can an AI app accurately detect Alzheimer's disease?
Research AI models have achieved high accuracy in controlled studies, but consumer apps vary enormously in validation quality. No AI-based consumer app has been FDA-cleared as a diagnostic tool for Alzheimer's. Published research tools, like those analyzing the Boston Cookie Theft speech task, show genuine promise but are designed for research or clinical support settings, not direct-to-consumer diagnosis.
What is the clock drawing test and how does AI improve it?
The clock drawing test is a classic neuropsychological assessment where patients draw a clock face from memory, showing a specific time. Human scoring captures gross errors but misses subtle patterns. AI analysis of digitized drawings can measure hundreds of features — stroke timing, pressure, spatial organization — that predict MCI and early Alzheimer's with much higher sensitivity than human scoring alone.
Is AI cognitive assessment biased toward certain populations?
Yes, this is a recognized and significant concern in the field. Most AI cognitive assessment models have been trained predominantly on White, English-speaking, highly educated populations. Performance tends to be lower and less well-validated in other demographic groups. Researchers are actively working to create more diverse training datasets, but users and clinicians should be aware of this limitation when interpreting AI-generated assessments.
Related resources
The data inputs — typing, gait, speech — that AI models are being applied to for cognitive detection.
Retinal Scans and Cognitive Decline
One of the imaging modalities where AI is showing strong performance for cognitive health assessment.
Bad Days vs. Real Changes
How Keel distinguishes genuine cognitive trends from normal daily variation.