Data science in the UK NHS

Building a model and building a community

6 June 2025

The New Hospital Programme (NHP) Demand Model

Diagram showing the processes of the model

Diagram showing the model infrastructure layout

A diagram showing the architecture of the model and apps. Data is processed from the database by the targets package and stored in an Azure storage account. The model and the inputs and outputs apps collect data from there. Selections in the inputs app feed into the model.

Principles

  • Deploy alongside develop
  • Reproducible analytical pipelines
  • Transparent
  • Open (FOSS where possible)
  • Team skills and work management

Tools and platforms

  • Data pipelines: parquet, CSV
  • Model: Python, Docker
  • Apps: {shiny} and {golem}, Posit Connect
  • Infrastructure and storage: Azure
  • Documentation: Quarto
  • Version control and collaboration: Git, GitHub

Data for NHP

  • HES data stored on Azure
  • Databricks and PySpark
  • CSVs and TSVs for reference data
  • parameters derived and stored in JSON

Model running

Parameters JSON passed to model via API
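
As a rough illustration of this hand-off, the sketch below serialises a parameters dictionary to JSON and wraps it in an HTTP POST request using only the Python standard library. The endpoint URL, the parameter names and the `build_run_request` helper are all hypothetical; the real NHP API and parameter set are not described in these slides. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the real API address is not given in this talk.
MODEL_API_URL = "https://example.org/api/run_model"


def build_run_request(params: dict, url: str = MODEL_API_URL) -> urllib.request.Request:
    """Wrap a parameters dict in a JSON POST request (without sending it)."""
    body = json.dumps(params, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative parameters only; these names are assumptions, not the NHP schema.
example_params = {"scenario": "example", "seed": 1, "model_runs": 256}

request = build_run_request(example_params)
# urllib.request.urlopen(request) would submit it; omitted so the sketch runs offline.
```

Keeping the request construction separate from the network call makes the payload easy to inspect and test before anything is submitted.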

  • Docker image stored on Azure Container Registry
  • Runs in Azure Container Instance
  • Built-in parallelisation features of the Python language
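
One way to picture the built-in parallelisation is with the standard library's `concurrent.futures` module. This is a minimal sketch under stated assumptions, not the NHP model's actual code: `simulate_once` is a stand-in for a single model run, and the choice of `ProcessPoolExecutor` is an assumption about which built-in feature is used.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate_once(baseline: float, seed: int) -> float:
    """One illustrative model run: scale a baseline by a seeded random factor."""
    rng = random.Random(seed)  # a per-run seed keeps each run reproducible
    return baseline * rng.uniform(0.9, 1.1)


def run_model(baseline: float, n_runs: int, use_processes: bool = True) -> list[float]:
    """Run n_runs independent simulations in a pool; results are in seed order."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls() as pool:
        # map() preserves input order, so results line up with seeds 0..n_runs-1
        return list(pool.map(simulate_once, [baseline] * n_runs, range(n_runs)))
```

Calling `run_model(1000.0, 8)` from inside a script's `if __name__ == "__main__":` block would spread eight runs across worker processes. Passing `use_processes=False` switches to threads, which gives the same results and is convenient for quick checks.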

Interfaces

Coded in R with Shiny. Open repos: nhp_inputs and nhp_outputs

Modular design

  • Each function is in its own .R
  • Each module is separate
  • UIs and servers are separate
  • Packaged using {golem}, so we can use R CMD check, devtools::document(), etc.

Outputs

Many formats:

We use a _brand.yml file to keep the styling of outputs (Quarto and Shiny) consistent and easy to maintain.

Reproducible Analytical Pipeline (RAP) principles

Data versioning

Data and model follow the same semantic versioning.
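
To make the idea concrete, here is a small sketch of how a data version could be checked against a model version. The compatibility rule (major and minor must agree, patch may differ) is an assumption made for illustration, as are the function names; the slides do not spell out the exact rule. Only the core MAJOR.MINOR.PATCH form is handled, not pre-release or build suffixes.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into a Version."""
    match = _SEMVER.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def data_matches_model(data_version: str, model_version: str) -> bool:
    """Assumed rule: data and model are compatible when major and minor agree."""
    data, model = parse_version(data_version), parse_version(model_version)
    return (data.major, data.minor) == (model.major, model.minor)
```

Because `Version` is a named tuple, versions also compare in the expected order, so `parse_version("2.10.0")` sorts after `parse_version("2.9.5")`.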

Data Log

The Strategy Unit GitHub

RAP

  • Modularised code
  • Styling and Linting
  • Open source code
  • Version control
  • Environment management:
    • Docker
    • {renv}
    • conda

RAP continued

  • Documentation
  • JSON schema {✔❌} for validating the parameters passed to the model
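
The sketch below shows the general shape of schema-based parameter validation. It is a hand-rolled stand-in that understands only the `type`, `required` and `properties` keywords, written so that it runs with the standard library alone; it is not a full JSON Schema validator, and a real pipeline would more likely rely on one (for example the third-party `jsonschema` package). The schema and its field names are hypothetical, not the real NHP schema.

```python
import json

# Hypothetical schema: the field names are illustrative, not the real NHP schema.
PARAMS_SCHEMA = {
    "type": "object",
    "required": ["scenario", "seed"],
    "properties": {
        "scenario": {"type": "string"},
        "seed": {"type": "integer"},
        "model_runs": {"type": "integer"},
    },
}

_JSON_TYPES = {"object": dict, "string": str, "integer": int, "number": (int, float)}


def _has_type(value, type_name: str) -> bool:
    # bool is a subclass of int in Python, but JSON Schema treats it separately
    if isinstance(value, bool):
        return False
    return isinstance(value, _JSON_TYPES[type_name])


def validation_errors(params, schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the params passed."""
    if not _has_type(params, schema["type"]):
        return [f"expected top-level {schema['type']}"]
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in params]
    for name, rule in schema.get("properties", {}).items():
        if name in params and not _has_type(params[name], rule["type"]):
            errors.append(f"{name}: expected {rule['type']}")
    return errors


good = json.loads('{"scenario": "example", "seed": 1}')
bad = json.loads('{"seed": "one"}')
```

Returning a list of problems rather than raising on the first one lets an inputs app report every issue with a parameters file in one pass.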

Deployment (CI/CD)

How do we maintain clean, safe, working code, centrally, when we have open repositories and up to 10 people collaborating on maintaining that code alongside its active deployment?

Continuous Integration

Automated checks and tests when merging code into main

  • On pull request submission
  • On merge to main

Continuous Deployment

Automated checks and tests when deploying (to dev or to prod)

  • On merge to main
  • On release

GitHub Actions

Actions like:

  • Formatting ({Air}) to pick up stylistic inconsistencies
  • Linting - for logical, syntactic and stylistic issues
  • Rendering the README.Rmd
  • Package checks (benefits of creating a Shiny app as a package)
  • Deploys to a ‘preview’ site
  • Assess code coverage ({codecov})

How we work

We are Agile and use Scrum (light)

  • 3-week sprints with a 1-week fallow period
  • Weekly sprint catch-up meetings
  • Kickoff and retro
  • Promote T-shape expertise while reducing ‘bus-factor’
  • Transparent prioritisation processes
  • Distinct team roles:
    • Scrum master: keeps the sprint on track
    • Product owner: steers work towards the goals
    • Project director: has overall responsibility for delivery
    • Development board: defines the goals and priorities
    • QA: oversees quality

Tip

Roles have enough specificity to provide clarity, but are also shared. Flat management structure.

Agile and Scrum in GitHub

Leverage a LOT of GitHub’s excellent tooling

  • Projects
  • Issues with bespoke labelling
  • Branch protection rules
  • CODEOWNERS
  • Clear and consistent collaboration guidance
  • Checklists

CODEOWNERS

A simple but powerful idea!

Product Team

‘The model’ is a product - it has current and potential use cases and user groups.

We need a team responsible for understanding the software business as well as the software product 🚀.

“What should we build next and why?”

NHS-R

  • I learned R in 2009 in the bad old days
  • I swore that I’d make sure that others didn’t suffer like I did
  • NHS-R is the fulfilment of this promise, created in 2018

Core values of open source

  • Transparency
  • Collaboration
  • Release early and often
  • Inclusive meritocracy
  • Community
  • Work across organisational boundaries (obviously)

Core values

  • Flat hierarchy
  • Sharing
  • Cooperate across organisational boundaries
  • We cooperate across international boundaries
  • We love beginners
  • We make mistakes and learn together

What is NHS-R?

  • Culture > Strategy
  • Doing > Talking
  • NHS-R is your permission to work your way
  • Nobody ever asks us to do our best work
    • NHSRplotthedots
  • We know what to build, we know what to learn
  • “Computer (department) says no”

Open source

  • We believe in open source
  • All NHS-R solutions are open source
  • We teach git and GitHub and encourage organisations to share their code
  • We build stuff together because we believe in the value of the community

No such thing as a free lunch

  • We believe in R, we believe in the NHS-R community, and we believe in each other
  • NHS-R is not a LinkedIn certification
  • It cannot be bought, sold, or exchanged
  • You can’t buy a community
  • But you can and must buy the glue that binds them together

Force multiplier

  • Code is a force multiplier
  • Wickham, 2014 https://bit.ly/3jQ5SuJ
  • So is a community
  • NHS-R is absurdly cheap and its ROI is absurdly high
  • NHS-R is making people happy and productive
  • NHS-R is changing the lives of its members and improving healthcare for everyone in the UK

What’s the connection between the NHP model and NHS-R?

  • We’re not perfect but we’ve done the right thing
    • Open code
    • Open technologies
    • Standard datasets
    • Documentation
    • Modularity

“Free as in piano”

  • There are huge obstacles for other teams in using and contributing to the code
  • Some are inherent, e.g. access to data
  • But across the system people lack access to:
    • Skills
    • Software installs (just Python!)
    • “Kit” - cloud compute, Posit Connect, etc.

How can NHS-R help?

  • NHS-R showcases the benefits of RAP
  • NHS-R demystifies the “risks” to IT departments who refuse to install R/Python
  • NHS-R gives people the training and the community to learn things together
  • NHS-R loves beginners but NHS-R also shows off the best data science going on in the NHS