The University of Leeds has helped train an AI system called Optimise, that looked at health records of more than two million people.
…
Of those two million records that were scanned, more than 400,000 people were identified as being high risk for the likes of heart failure, stroke and diabetes.
How it works
The input: Health records
Health records can be structured or unstructured
Structured: can be stored in a table
Unstructured: can’t be stored in a table, different shapes/sizes (e.g. text, audio, images)
Example of structured data in health records
ID
BMI
Age
IMD Decile
Smoker
Blood Pressure
1
17
49
3
1
110/70
2
25
67
1
1
129/70
3
20
39
8
0
140/90
4
28
81
6
0
130/85
5
29
41
4
0
120/80
Data is consistent within each column in the table.
Example of unstructured data in health records
ID
Notes
1
Shortness of breath
2
Patient attended clinic following one week of fever, vomiting, and abdominal pain.
The length of each sentence is different - data not consistent.
A simple approach to classifying data: KNN
Clustering algorithms like K Nearest Neighbours (KNN) are on the more basic end of the scale, requiring very little computational power.
1
A simple approach to classifying data: Decision Tree
1
There are many different models out there! 🥴
1
What makes a model simple or complex?
There are dozens of different algorithms out there
Each algorithm has different strengths and weaknesses
What makes a model simple or complex is the amount of computational power required and how much the model needs to “learn” - how many parameters there are
Is the input or the computation complex?
“We used UK primary care EHR data from 2,081,139 individuals aged ≥ 30 years…
We trained a random forest classifier using age, sex, ethnicity and comorbidities (OPTIMISE).”1
Pros and cons of simple “A.I.” approaches
Pros:
Simple models are more easily explained
Can sometimes find new patterns in the data
Cons:
The quality of the data determines the quality of the model
Not able to handle very complex tasks
🚩 Issues to look out for 🚩
How complex is the input, or the computational approach?