HACA 2023
Jul 11, 2023
Parameters are expressed as intervals of expected future change in activity, for example:

“We expect in the future to see between a 25% reduction and a 25% increase in this activity”

“We expect in the future to see between a 20% reduction and a 90% reduction in this activity”

“We expect in the future to see between a 2% reduction and an 18% reduction in this activity”
id | age | sex | specialty | los | f |
---|---|---|---|---|---|
1 | 50 | m | 100 | 4 | 1.00 |
2 | 50 | m | 110 | 3 | 1.00 |
3 | 51 | m | 120 | 5 | 1.00 |
4 | 50 | f | 100 | 1 | 1.00 |
5 | 50 | f | 110 | 2 | 1.00 |
6 | 52 | f | 120 | 0 | 1.00 |
Start with the baseline data: we are going to sample each row exactly once, so the sampling-rate column f starts at 1.00.
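As a minimal pandas sketch of this starting point (the column names mirror the table above; the data frame name `baseline` is our own):

```python
import pandas as pd

# Baseline data, with every row given a sampling rate of exactly 1
baseline = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6],
    "age":       [50, 50, 51, 50, 50, 52],
    "sex":       ["m", "m", "m", "f", "f", "f"],
    "specialty": [100, 110, 120, 100, 110, 120],
    "los":       [4, 3, 5, 1, 2, 0],
})
baseline["f"] = 1.0  # sample each row exactly once (in expectation)
```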
id | age | sex | specialty | los | f |
---|---|---|---|---|---|
1 | 50 | m | 100 | 4 | 1.00 |
2 | 50 | m | 110 | 3 | 1.00 |
3 | 51 | m | 120 | 5 | 1.00 |
4 | 50 | f | 100 | 1 | 1.00 |
5 | 50 | f | 110 | 2 | 1.00 |
6 | 52 | f | 120 | 0 | 1.00 |
age | sex | f |
---|---|---|
50 | m | 0.90 |
51 | m | 1.10 |
52 | m | 1.20 |
50 | f | 0.80 |
51 | f | 0.70 |
52 | f | 1.30 |
f |
---|
1.00 × 0.90 = 0.90 |
1.00 × 0.90 = 0.90 |
1.00 × 1.10 = 1.10 |
1.00 × 0.80 = 0.80 |
1.00 × 0.80 = 0.80 |
1.00 × 1.30 = 1.30 |
We perform a step where we join on age and sex, then update the f column by multiplying in the joined factors.
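Continuing the sketch above, this step might look like the following (the factor-table name `age_sex_factors` is our assumption):

```python
# Factor table for this step, keyed on age and sex
age_sex_factors = pd.DataFrame({
    "age": [50, 51, 52, 50, 51, 52],
    "sex": ["m", "m", "m", "f", "f", "f"],
    "f":   [0.90, 1.10, 1.20, 0.80, 0.70, 1.30],
})

# Join on age and sex, then multiply the factors into the running f column
baseline = baseline.merge(age_sex_factors, on=["age", "sex"],
                          suffixes=("", "_step"))
baseline["f"] = baseline["f"] * baseline.pop("f_step")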
id | age | sex | specialty | los | f |
---|---|---|---|---|---|
1 | 50 | m | 100 | 4 | 0.90 |
2 | 50 | m | 110 | 3 | 0.90 |
3 | 51 | m | 120 | 5 | 1.10 |
4 | 50 | f | 100 | 1 | 0.80 |
5 | 50 | f | 110 | 2 | 0.80 |
6 | 52 | f | 120 | 0 | 1.30 |
specialty | f |
---|---|
100 | 0.90 |
110 | 1.10 |
f |
---|
0.90 × 0.90 = 0.81 |
0.90 × 1.10 = 0.99 |
1.10 × 1.00 = 1.10 |
0.80 × 0.90 = 0.72 |
0.80 × 1.10 = 0.88 |
1.30 × 1.00 = 1.30 |
The next step joins on the specialty column, again updating f. Note: if there is no matching row to join on (specialty 120 here), we multiply by 1, leaving f unchanged.
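In pandas terms this is a left join with missing factors defaulting to 1; a sketch under the same assumptions as above:

```python
# Factor table keyed on specialty; note there is no entry for specialty 120
specialty_factors = pd.DataFrame({
    "specialty": [100, 110],
    "f":         [0.90, 1.10],
})

# Left join so unmatched rows are kept; missing factors default to 1
baseline = baseline.merge(specialty_factors, on="specialty", how="left",
                          suffixes=("", "_step"))
baseline["f"] = baseline["f"] * baseline.pop("f_step").fillna(1.0)
```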
id | age | sex | specialty | los | f | n |
---|---|---|---|---|---|---|
1 | 50 | m | 100 | 4 | 0.90 | 1 |
2 | 50 | m | 110 | 3 | 0.90 | 0 |
3 | 51 | m | 120 | 5 | 1.10 | 2 |
4 | 50 | f | 100 | 1 | 0.80 | 1 |
5 | 50 | f | 110 | 2 | 0.80 | 0 |
6 | 52 | f | 120 | 0 | 1.30 | 3 |
id | age | sex | specialty | los |
---|---|---|---|---|
1 | 50 | m | 100 | 4 |
3 | 51 | m | 120 | 5 |
3 | 51 | m | 120 | 5 |
4 | 50 | f | 100 | 1 |
6 | 52 | f | 120 | 0 |
6 | 52 | f | 120 | 0 |
6 | 52 | f | 120 | 0 |
Once all of the steps have been performed, we sample a random value n for each row from a Poisson distribution with λ = f, then select each row n times.
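A sketch of this resampling with numpy (the generator and the intermediate column `n` are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

# Draw n ~ Poisson(lambda = f) for each row, then repeat each row n times;
# rows that draw n = 0 are dropped, rows that draw n > 1 appear n times
baseline["n"] = rng.poisson(baseline["f"])
resampled = (
    baseline.loc[baseline.index.repeat(baseline["n"])]
    .drop(columns=["f", "n"])
    .reset_index(drop=True)
)
```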
id | age | sex | specialty | los | g |
---|---|---|---|---|---|
1 | 50 | m | 100 | 4 | 0.75 |
3 | 51 | m | 120 | 5 | 0.50 |
3 | 51 | m | 120 | 5 | 1.00 |
4 | 50 | f | 100 | 1 | 0.90 |
6 | 52 | f | 120 | 0 | 0.80 |
6 | 52 | f | 120 | 0 | 0.80 |
6 | 52 | f | 120 | 0 | 0.80 |
id | age | sex | specialty | los |
---|---|---|---|---|
1 | 50 | m | 100 | 2 |
3 | 51 | m | 120 | 1 |
3 | 51 | m | 120 | 5 |
4 | 50 | f | 100 | 0 |
6 | 52 | f | 120 | 0 |
6 | 52 | f | 120 | 0 |
6 | 52 | f | 120 | 0 |
After resampling, we apply efficiency steps: similar joins are used to create a column g, which is then used to sample a new LOS from a binomial distribution.
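One way to read the example above is that each of the los bed days is retained with probability g, i.e. a Binomial(los, g) draw; a sketch under that assumption, with the g column already joined on as in the table:

```python
import numpy as np

rng = np.random.default_rng()

# New LOS ~ Binomial(n = los, p = g): each existing bed day is kept with
# probability g, so g must lie in [0, 1] for this draw
resampled["los"] = rng.binomial(resampled["los"], resampled["g"])
```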
The model is built with numpy and pandas. Data is stored in .parquet format for efficiency, and model parameters are supplied as a .json file. The data pipeline uses {targets} and SQL.

A {shiny} app that allows the user to set parameters, and submit as a job to run the model with those values.

A {shiny} app that allows the user to view the results of model runs.
View the slides at https://tinyurl.com/haca23nhp