Identifying patients at risk

Is this AI?

10 October 2024

AI could help identify high-risk heart patients1

The University of Leeds has helped train an AI system called Optimise, which analysed the health records of more than two million people.

Of the people whose records were scanned, more than 400,000 were identified as being at high risk of conditions such as heart failure, stroke and diabetes.

How it works

  • The input: Health records
  • Health records can be structured or unstructured
    • Structured: can be stored in a table
    • Unstructured: can’t be stored in a table, different shapes/sizes (e.g. text, audio, images)

Example of structured data in health records

ID | BMI | Age | IMD Decile | Smoker | Blood Pressure
1  | 17  | 49  | 3          | 1      | 110/70
2  | 25  | 67  | 1          | 1      | 129/70
3  | 20  | 39  | 8          | 0      | 140/90
4  | 28  | 81  | 6          | 0      | 130/85
5  | 29  | 41  | 4          | 0      | 120/80

Data is consistent within each column in the table.
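As a rough illustration of what "consistent within each column" means, the sketch below stores the rows from the table as Python dictionaries and checks that every column holds a single kind of value. The field names (`bmi`, `imd_decile` and so on) and the helper `column_types` are made up for this example.

```python
# Rows copied from the example table; the field names are illustrative only.
records = [
    {"id": 1, "bmi": 17, "age": 49, "imd_decile": 3, "smoker": 1, "blood_pressure": "110/70"},
    {"id": 2, "bmi": 25, "age": 67, "imd_decile": 1, "smoker": 1, "blood_pressure": "129/70"},
    {"id": 3, "bmi": 20, "age": 39, "imd_decile": 8, "smoker": 0, "blood_pressure": "140/90"},
    {"id": 4, "bmi": 28, "age": 81, "imd_decile": 6, "smoker": 0, "blood_pressure": "130/85"},
    {"id": 5, "bmi": 29, "age": 41, "imd_decile": 4, "smoker": 0, "blood_pressure": "120/80"},
]


def column_types(rows):
    """Map each column name to the set of Python type names found in it."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


types_by_column = column_types(records)
# A column is "consistent" here if all of its values share one type.
consistent = all(len(kinds) == 1 for kinds in types_by_column.values())

print(types_by_column)
print("Every column holds a single type:", consistent)
```

Because each column has one predictable type, a model can treat a column such as `age` as a single numeric feature without any extra interpretation.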

Example of unstructured data in health records

ID | Notes
1  | Shortness of breath
2  | Patient attended clinic following one week of fever, vomiting, and abdominal pain.

Each note has a different length and wording, so the data is not consistent from one record to the next.
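The difference can be seen by measuring the two notes above. The sketch below counts the words in each note, then shows one very naive way to turn free text into table-like columns by checking for keywords. The keyword list is a hypothetical choice for this example, and real clinical text processing is far more involved.

```python
# The two free-text notes from the example above, keyed by patient ID.
notes = {
    1: "Shortness of breath",
    2: "Patient attended clinic following one week of fever, vomiting, and abdominal pain.",
}

# Unlike a table column, each note has a different length.
word_counts = {patient_id: len(text.split()) for patient_id, text in notes.items()}

# Hypothetical keywords: each one becomes a yes/no column.
KEYWORDS = ("breath", "fever", "vomiting", "pain")
flags = {
    patient_id: {keyword: keyword in text.lower() for keyword in KEYWORDS}
    for patient_id, text in notes.items()
}

print(word_counts)
print(flags)
```

Simple substring matching like this misses negations ("no fever") and spelling variants, which is part of why unstructured input is harder to work with.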

A simple approach to classifying data: KNN

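Classification algorithms like k-nearest neighbours (KNN) are on the more basic end of the scale, requiring very little computational power. KNN labels a new datapoint with the class that is most common among its k closest labelled neighbours.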

Example of k-nearest neighbour classification, with red triangles representing one class, blue squares representing another, and a new datapoint as a green circle which has two close red triangle neighbours and one blue square neighbour.1
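The situation in the figure can be sketched in a few lines of Python. The coordinates below are invented so that the new point has two nearby red triangles and one nearby blue square; the function name `knn_predict` is likewise just for illustration.

```python
import math
from collections import Counter


def knn_predict(training, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Sort the labelled points by straight-line (Euclidean) distance.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented coordinates: ((x, y), class label).
training = [
    ((1.0, 1.2), "red triangle"),
    ((1.3, 0.8), "red triangle"),
    ((0.6, 1.9), "blue square"),
    ((3.0, 3.2), "blue square"),
    ((3.4, 2.7), "blue square"),
    ((2.9, 0.2), "red triangle"),
]
new_point = (1.0, 1.0)  # the "green circle"

print(knn_predict(training, new_point, k=3))
```

With k = 3, the three nearest neighbours are two red triangles and one blue square, so the new point is classified as a red triangle. There is no training step: the "model" is just the stored data plus a distance calculation.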

A simple approach to classifying data: Decision Tree

Example of a decision tree1
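A decision tree classifies a record by asking a sequence of yes/no questions until it reaches a leaf. The sketch below writes one small tree by hand, using columns from the structured-data example. The questions, thresholds and labels are invented purely to show the mechanics; they are not clinical rules, and in practice the tree would be learned from data.

```python
# A hand-written tree. Each inner node asks "is feature <= threshold?".
# All thresholds and labels here are made up for illustration.
tree = {
    "feature": "age",
    "threshold": 60,
    "left": {  # age <= 60
        "feature": "smoker",
        "threshold": 0.5,
        "left": "lower risk",    # non-smoker
        "right": "higher risk",  # smoker
    },
    "right": "higher risk",  # age > 60
}


def classify(node, patient):
    """Walk from the root to a leaf, following one branch per question."""
    while isinstance(node, dict):
        goes_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node


print(classify(tree, {"age": 39, "smoker": 0}))
print(classify(tree, {"age": 49, "smoker": 1}))
print(classify(tree, {"age": 81, "smoker": 0}))
```

Because every prediction is a readable path of questions, a tree like this is easy to explain to a non-specialist.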

There are many different models out there! 🥴

Complex diagram showing a decision tree for choosing the right estimator for different machine learning problems1

What makes a model simple or complex?

  • There are dozens of different algorithms out there
  • Each algorithm has different strengths and weaknesses
  • A model's complexity depends on how much computational power it requires and how much it needs to “learn”, that is, how many parameters it has

Is the input or the computation complex?

“We used UK primary care EHR data from 2,081,139 individuals aged ≥ 30 years…

We trained a random forest classifier using age, sex, ethnicity and comorbidities (OPTIMISE).”1
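A random forest combines many decision trees, each trained on a random resample of the data, and takes a majority vote. The sketch below is a toy version of that idea and not the OPTIMISE implementation: every "tree" is a single-split stump, only two numeric features are used (age and a count of comorbidities), and all of the data and labels are invented.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def majority(labels, fallback):
    """Most common label, or the fallback label when the group is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature with the lowest weighted Gini impurity."""
    overall = majority(labels, None)
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left, overall), majority(right, overall))
    return (feature,) + best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote across all stumps in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Invented data: (age, number of comorbidities) and a 0/1 "high risk" label.
rows = [(35, 0), (42, 0), (48, 1), (51, 0), (58, 1),
        (63, 2), (67, 3), (72, 2), (78, 4), (81, 3)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, (30, 0)), predict(forest, (85, 4)))
```

The input here is a small structured table and the computation is a repeated search for good thresholds, which is why this family of models sits at the simpler end of the scale.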

Pros and cons of simple “A.I.” approaches

Pros:

  • Simple models are more easily explained
  • Can sometimes find new patterns in the data

Cons:

  • The quality of the data determines the quality of the model
  • Not able to handle very complex tasks

🚩 Issues to look out for 🚩

  • How complex is the input, or the computational approach?
  • How is the model’s performance measured?
  • Does the model get updated?
  • Where did the data come from?
  • Have issues of bias or ethics been considered?
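
On the question of how performance is measured, one reason to ask is that a single number can hide a lot. The sketch below uses invented labels and predictions for ten patients to compute three common measures for a yes/no classifier: accuracy, sensitivity (the share of truly high-risk patients that were flagged) and specificity (the share of low-risk patients that were correctly left unflagged).

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts


# Invented example: 1 = high risk, 0 = not high risk.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

c = confusion_counts(actual, predicted)
accuracy = (c["tp"] + c["tn"]) / len(actual)
sensitivity = c["tp"] / (c["tp"] + c["fn"])
specificity = c["tn"] / (c["tn"] + c["fp"])

print(c)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

In this made-up case the accuracy is 80%, yet one in three high-risk patients is missed, so it is worth asking which measure a reported figure refers to.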