Episode 27 – Predicting Mortality

Welcome to the data science podcast.
My name is Lexi and I’m your host. This podcast is free and independent
thanks to member contributions. You can help by signing up to support
[email protected] for just $5 per month. You’ll get access to the members-only podcast, Data Science Ethics and Pop Culture, at the $10 per month level. You will also be able to attend live chats and debates with Marie and me. Plus, you’ll be helping us to deliver more and better content. Now, on with the show.

Hello everybody, and welcome to the Data Science Ethics Podcast. This is Marie Webber and Lexi Cason, and
today we are going to be talking about AI that is being used
to predict mortality. So this is an article that Lexi actually found on Live Science, and it’s talking about an AI that was developed based on data that’s actually in the UK Biobank. They were looking at people from the ages of 40 to 69 years old, and the algorithm that they developed performed
better than other techniques they had used before. So do you want to talk a little bit about
those techniques and how they differ? Sure. There have been algorithms developed, they’ve been out there for a few decades now, to try to predict, I’m going to say survival, and in this case it’s literal survival, but in a lot of cases it’s when something will end, when the likely end point of something is. So for example, I’ve used this to predict a customer churning out of a contract. In this case it would be somebody passing away. The technique that was most commonly used was a Cox survival analysis, or Cox regression model. That’s one of the algorithms
that they compared in this study. The other ones they used were a machine learning technique called a random forest and then a deep learning technique that used a number of layers within a neural net that would then construct its own rules. And as we’ve talked about
on this podcast before, neural networks and deep learning techniques are a little bit more black box, in that you put in the layers that you want it to go through but you don’t necessarily know all of the rules it came up with. They were specifically using these to look for which of these 500,000 people that they were studying were more likely to die prematurely from chronic diseases. The Cox model was about 44% accurate and tended to overpredict. The random forest was about 64% accurate and the deep learning model was about 76%.
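Since we name-check these techniques quickly, here is a minimal sketch of what the classical approach, a Cox proportional hazards model, looks like in code. It is not the study’s pipeline: the cohort, the column names (age, smoker, follow_up_years, died), and the simulated hazard are invented for illustration, and it assumes the lifelines Python library is available.

```python
# Hedged illustration only: a toy Cox proportional hazards model, not the
# UK Biobank study's actual code. All data below is simulated.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter  # assumes `pip install lifelines`

rng = np.random.default_rng(42)
n = 1_000

# Hypothetical cohort aged 40-69 with a binary smoking flag.
df = pd.DataFrame({
    "age": rng.integers(40, 70, size=n),
    "smoker": rng.integers(0, 2, size=n),
})

# Simulated time-to-death: older ages and smoking raise the hazard.
hazard = 0.02 * np.exp(0.04 * (df["age"] - 40) + 0.6 * df["smoker"])
time_to_event = rng.exponential(1.0 / hazard)

# Censor follow-up at 6 years, mirroring a fixed-length longitudinal study.
df["follow_up_years"] = np.minimum(time_to_event, 6.0)
df["died"] = (time_to_event <= 6.0).astype(int)

# Fit the classical survival technique mentioned above.
cph = CoxPHFitter()
cph.fit(df, duration_col="follow_up_years", event_col="died")
cph.print_summary()                         # hazard ratios per covariate
print("Concordance index:", cph.concordance_index_)
```

The random forest and deep learning variants the study compared would be fit to the same kind of covariate and follow-up data, just with different model classes.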
The other part that the article talked about was that the factors these different algorithms came up with as being important varied a bit. All of them used age, gender, smoking history and a prior cancer diagnosis as their top factors. But then it started to get a little bit varied. The Cox technique used ethnicity and physical activity as its next most important factors. The random forest used body fat percentage, waist circumference and skin tone, which, while potentially linked to ethnicity and physical activity, are a little bit more indicative of that individual’s characteristics as opposed to the more general characteristics of groups, and, we can also say, could be tied to health risks like the risk of developing skin cancer. Absolutely, and a prior cancer diagnosis was part of that, so think about the different types of cancer that somebody could potentially get. That’s absolutely one of the factors. The deep learning model, on the other hand, saw job hazards, air pollution, alcohol consumption, and the taking of certain medications as its next highest factors.
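To make the “top factors” comparison concrete, here is a hedged sketch of reading importances off a random forest. It flattens the survival problem into a binary “died within follow-up” label, which is a simplification rather than the study’s method, and every column name and coefficient is hypothetical; for the Cox model, the hazard ratios printed in the earlier sketch play the analogous role.

```python
# Hedged illustration only: feature importances from a random forest trained
# on a simplified binary outcome ("died within follow-up"), with made-up data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2_000

# Hypothetical covariates loosely echoing the factors discussed above.
X = pd.DataFrame({
    "age": rng.integers(40, 70, size=n),
    "smoker": rng.integers(0, 2, size=n),
    "waist_circumference_cm": rng.normal(95, 12, size=n),
    "body_fat_pct": rng.normal(28, 7, size=n),
})

# Toy outcome: risk rises with age, smoking, and waist circumference.
logit = -8 + 0.08 * X["age"] + 0.9 * X["smoker"] + 0.02 * X["waist_circumference_cm"]
died = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, died)

# Rank the factors the forest leaned on most heavily.
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>24s}: {imp:.3f}")
```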
What fascinated me about this wasn’t so much that we could predict it, as cool as that is, but I had to wonder, should we tell people this stuff? What it makes me think of is, I know personally that I’m somebody that’s interested in health, and I’ve looked at different things like, okay, what are things that I can do to live longer? I once worked with somebody who said “triple digits, baby,” and they wanted to live to be a hundred years or older. So you know, looking at things like, okay, your physical activity, what you eat, do you get enough movement during the day? Even like where you live. Those types of things can be factors for how long you live, so there are definitely people out there that are interested in this and who have an interest outside of it being part of their profession. I think for those people, they’re very
interested in something like that. It’s like, Oh, you can tell
me that based on what I eat, you know I could potentially live longer
and I think there are people who do take those steps. I think there are
other people potentially that are like, nope, I enjoy what I eat. For example, I know people that have had doctors prescribe them a diet that is maybe lower in salt or lower in fat, and they’re like, nope. Or lower in sugar. Right, exactly. They decided that, no, I’d rather continue to eat the things that are salty and bad for me, because I don’t like it otherwise. Or they don’t put in the effort to make the change and see how it helps them feel. So maybe I’m just cynical. That’s the reason that I thought, should we tell people that they’re more likely to die prematurely? And 76%, for what it’s worth, is a pretty good prediction in this case. Yeah. So out of a hundred people, it could help give 76 people an opportunity to make a change that would help them live longer, potentially. If all of those hundred people were the ones predicted to die prematurely, yes. Yeah. That said, the reason I think about it is, is it better to know or not to know? And this is one of those
kind of paradoxical questions
of what happens if you know you’re going to die tomorrow? Are you going to go out and do something
horrible or are you gonna take that time to do something
great? Or in this case, I mean obviously it’s
not going to be tomorrow. To clarify, this algorithm was predicting their more or less long-term mortality. These are already people that were in their forties to sixties. Let’s say we’re just taking somebody who’s in their sixties; the average is probably 20 years that they have. So you’re trying to predict who is going to die earlier than that 20 years, potentially, right? Yeah. So there is still a considerable
amount of time. That said, what do you do if you know, if you know, that your history of smoking for the last 40 years has contributed to the fact that you’re very likely to die prematurely? Do you try to quit now, or do you go, too late, I guess, I’m just stuck, I’m going to die prematurely, I’m still gonna smoke? And this, to some degree, makes me think of how people have
acted in the face of a lot of medical evidence up to this point. I kind of wonder, if we’re able to more specifically pinpoint that a given individual is very likely to die prematurely, whether it takes away that unrealistic optimism bias of, I know that smoking causes cancer and premature death, but that won’t happen to me even though I’m a smoker. That’s an optimism bias, or
unrealistic optimism bias. Does it take away that
safety blanket and say, nope, it really is going to happen to you.
You need to change your behavior. You specifically need
to change your behavior. I don’t know the intent of the people that developed this algorithm, but given that it was developed in the UK, and they have a form of health insurance in the UK that’s more accessible to everyone, I would guess that they are trying to reach those people where they could have an impact, and give them opportunities to understand the risks, and that they might die prematurely, and then hopefully be able to correct it, if there are things within their control to change. If it’s air quality or work conditions, that might need the institutions to step in and help address it. That is not something that an individual can address that easily. But the hope would be, if I give you this
information and I can show you that it can provide you with a different outcome, that you would take actions towards
that better outcome for yourself. And when we talk about
institutional changes here, these are things like regulations
that would be put in place, changes to social programs
for assistance. For example, for mobility, meaning for a geographic mobility so
people could move out of areas that are problematic to their health, potentially having additional funding for
things like smoking cessation programs or alcohol counseling or other types
of things that are factors that these models have been using that were
identified as major risk factors. It could be retraining for jobs,
it could be all kinds of things. So those would be institutional
changes that would come in. Exactly. Another example would be even here in
the US in the seventies, I believe it was, there was a problem with more lead being in our environment because there was lead in gasoline. So there was an initiative to get unleaded gasoline and get that lead out of the environment. Exactly. So it was a regulation that was put in place that changed the way that we consumed the product and led to less lead. The other one that comes to
mind, of course, is tobacco, where over time we’ve seen larger, more prominent warnings
on tobacco products. There’s been stricter regulation around
who’s able to purchase tobacco products. There’s been a lot more taxation to
make it economically less feasible for people to purchase tobacco products
and consume tobacco products. Those are all institutional changes, and things like that could potentially be affected here. It’s not a scenario tester. It doesn’t say, if you change your job
right now from one where you’re put into hazardous situations to one where you’re
not put into hazardous situations, you will gain five years. It doesn’t
give you that level of specificity. It doesn’t say if you move to the country
and out of the air pollution right now, you’ll get another
three years. None of that. All it says is, based on everything that has happened till now, here’s where you’re at: you’re likely or unlikely to die prematurely. So without that more
scenario based information, it really leaves it to the physicians
still to advise their patients to try to take action around their health and
around the circumstances that are contributing hazards to their health. But again, I feel like we’ve been
down this road forever. Physicians have always been trying to
tell people to take care of themselves, and people have largely ignored it, unlike you, Marie. Or eat healthier, or like you were saying, smoke less, or exercise more, or do more to reduce stress, or don’t drink alcohol. But when you talk about enough sleep, that wasn’t in any of the models, I will have you note. There are some very large things that are much more systemic that these models are pointing to, like job hazards. Are you going to be able to immediately
change your career path simply because a model told you that your job is putting you at risk? Or air pollution? How much can you realistically affect the air pollution around you? Not very much. You would have to move in order to get to a place that has better air quality in order to avoid that risk. And again, at that point, how
much are people willing to do or how much are people in different
economic situations able to do? Very true. Like, some people might not really have an option to pick up and move to the country, especially if they don’t have the means to start a business out there or if they can’t find a job in those areas. So there are implications where the data could be saying, you know, because you worked in XYZ industry for 20 years maybe, and maybe you’re not even doing that anymore, but because you did that in your past, and you can’t change your past, yeah, right now this is going to potentially
lead to premature death. Hopefully there would be things that they
could recommend because the human body is always healing and rebuilding. So to the extent that you can give it a
chance to heal or recover from some of those things, you know, hopefully doctors would be able
to point people in that direction. But if there are some things
that are baked in at that point, it goes back to your question, is it
better for people to know or not to know? Yeah. The other thing, and this is seriously shaky ground for ethics, the other thing is, is it a good thing for us to always
try to increase life expectancy? Yeah, that’s a good question, because if you play it out, and I mean there are definitely people that are working towards this as a goal, of increasing life expectancy so more people can live to a hundred, triple digits, maybe, or maybe 120. Some people are even starting to theorize that you could push human life expectancy to 200 or even 300. Probably not any of us that are in our professional careers right now, but there are some people that theorize that there are children being born today that could live to be well over a hundred. So the idea of doing that at an entire population scale, if we can get the whole population so they’re not just having a life expectancy of 80 but a hundred, how does that impact the whole system? There’s a lot of back and forth in it
because if everyone’s living to a longer age and contributing to
more overpopulation in
certain areas and so forth, does that affect air pollution, which then affects premature death
or does that affect other types of risk? Or, as we start to live longer and as we
continue to automate a lot of processes, does it take people out of
having hazardous jobs and
now robots are doing those jobs, so now it’s less likely that
they would die from a job hazard. All of these different things. And I think there’s an important
distinction between living longer and also having more productive years. And I think ultimately the goal will
be not for people just to live longer, but how to increase the
productive years that people have. So you don’t have extra burdens on society where everybody needs to be in, like, an assisted living facility starting at the time that they’re 70, and that lasts for 50 years until they’re 120. Part of what we’re talking about here is not just related to the data science ethics, but also related to other layers. So when we talk about these
institutional changes, that’s then society making a judgment call, saying that it’s important to help people live as long as possible and be as healthy as possible, and making those changes, and showing that that’s a value that you want to move towards. That means, as we talk about this and we talk about anticipating adversaries, it’s also important to think about how this could potentially be used in the wrong hands, because you’d want to make sure that if governments or organizations were making decisions based on this data, that they were making those decisions in an ethical way. A couple of ways I think about
anticipating adversaries here. One is at the patient level, meaning that if you tell somebody
they’re likely to die prematurely, that they’re going to handle
that in a responsible manner. They’re going to do more for their health. They’re going to get
their affairs in order, if they don’t think that they’ll be able to kind of escape the algorithm, escape their premature death. The flip side to that, though, is
the people who would say, well, I’m going to die anyway. Guess I’m gonna live it up however I
want to live it up and do all kinds of irresponsible things. That is
one form of adversarial behavior. Another would be having this algorithm
in the wrong hands and not equally conveying information to
patients. So for example, if we know this algorithm exists, but we only are able to talk
to people who are wealthier, who are of a specific ethnic
background and so forth, those are the people who are likely
to then have the opportunity to take responsible actions and live longer versus
being able to equally convey this to everyone in the population so that they
could all benefit from knowing whether they’re at increased risk. We need to think about this from multiple
different perspectives as to how it could be mishandled. The other thing that that points to is getting physicians more comfortable with algorithms like this, understanding them and being able to convey to their patients what the algorithm is telling them and what their patients could or should be doing to take charge of their own health and try to extend their life expectancy. Right now, there’s not been a lot of training for physicians out in the field, who may have been practicing for decades, on new algorithms and new modeling techniques and on these new types of research that are coming out. It’s interesting to think about where
this algorithm is right now and if you train it well, the assumption would be that it would
get more accurate in its predictions. So right now, today, if somebody looks at it and it’s a 76% chance that it’s going to be right, does that impact somebody’s behavior as much as when the accuracy is maybe up to 95%? There’s something to be said for a more accurate algorithm, and indicating to someone that you’re 95% confident that they’re going to die prematurely versus 76% confident that they’re going to die prematurely. However, again, we have been down this road many
times before in many different medical situations where we’ve conveyed to the
population that certain substances, certain actions lead to outcomes that
they don’t want and they still do them regardless. So how confident do you have to be for
people to get past that optimism bias? Is there ever a number that you could
tell somebody where they could not dismiss it and it would cause them
to change their action? If you tell them you’re 100% confident that they will die prematurely, they’re going to say, oh, well, I guess that’s that. It becomes dismissed. So does it matter how close you get to 100%? I think it could potentially matter in
terms of helping doctors communicate the value of the algorithm, especially with something like this and
trying to communicate it to a larger population. If they can point to it
as being more accurate, then I think that’s going to help certain
segments of the population trust it more. One of the things I find really interesting about this algorithm is that most of the factors that it’s pointed to are behavioral. Very true. It’s keeping your body fat percentage down, keeping your diet in check, not drinking alcohol in excess, not using tobacco products, trying to stay in a healthy environment. A lot of the factors that it’s pointing to are things that we can try to change. Even when we go back to the question
that you had about how do we communicate this with the people that are represented in this study, based on their data in the UK Biobank, they’re going to have to talk with people from a lot of different walks of life, and not just the people that have heard about the service and are knocking on their doctor’s door saying, hey, I want to see what my results were. This is going to include basically everybody in that age population of 40 to 69, where they’re working with them on probably, you know, an annual basis, or whatever their appointment schedule is, to say, here’s information that we have that’s important for you to understand, and trying to get them to understand how to take action on it. To be clear, though, this data predicted deaths within the
six years of the longitudinal study that they had, so from 2010 to 2016. Those people are already gone. So now the question is, what is the incremental prediction for the next six years, or however long, for the people who remain for whom they had data? And is it transferable to the next population of 40 to 69 that’s coming up? It’s looking at a pretty large
population and saying, okay, we studied this 10 to 15 years ago, we started collecting data. How many of the people who are now coming up into that age range have the same factors or have the same issues? To look at some of the variables that the deep learning technique pointed to, where one was the taking of certain medications, makes me wonder which medications those were and if they’re still on the market, because there’s a very real possibility that some medications may have been provided at various times and then later it was found that they were harmful, or whatever it might be, and they were pulled from the market.
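That transfer question lends itself to a simple check. Here is a hedged sketch, with entirely synthetic data and hypothetical column names, of training on one enrollment wave and scoring a later wave in which one risk factor, a medication that has since been pulled, no longer matters. It illustrates the idea, not the study’s methodology.

```python
# Hedged illustration only: does a model trained on an earlier cohort still
# discriminate well on the next cohort? All data here is simulated.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_cohort(n, med_effect):
    """Simulate one enrollment wave; med_effect mimics a risk factor (a
    hypothetical medication) whose influence changes between waves."""
    X = pd.DataFrame({
        "age": rng.integers(40, 70, size=n),
        "smoker": rng.integers(0, 2, size=n),
        "on_medication_x": rng.integers(0, 2, size=n),
    })
    logit = -7 + 0.07 * X["age"] + 0.8 * X["smoker"] + med_effect * X["on_medication_x"]
    died = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
    return X, died

# Earlier wave: the hypothetical medication carries real risk.
X_old, y_old = make_cohort(3_000, med_effect=1.0)
# Later wave: the medication has been withdrawn, so its signal disappears.
X_new, y_new = make_cohort(3_000, med_effect=0.0)

# Hold out part of the earlier wave for an honest same-wave comparison.
X_tr, X_te, y_tr, y_te = train_test_split(X_old, y_old, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("AUC on held-out people from the same wave:",
      round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
print("AUC on the next wave:",
      round(roc_auc_score(y_new, model.predict_proba(X_new)[:, 1]), 3))
```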
applicable to the next group of people, for instance, and if that was one of the major factors
influencing the accuracy of the model, how accurate is the model going to be
next time around and that goes back to train transparently. What are the factors that you’re
putting into your algorithm? Exactly. Hence why the data science
process is cyclical, so cyclical: always going back, always revisiting, always revising. All right, well, thanks everybody for joining us for this quick take on the algorithm that was developed based on the UK Biobank study. This is Marie Webber and Lexi Cason. Talk to you next time. Thanks so much.

We hope you’ve enjoyed listening to
this episode of the Data Science Ethics podcast. If you have, please like and
subscribe via your favorite podcast app. Join in the [email protected] or on Facebook and Twitter at DS Ethics. Also, please consider supporting us for just $5 per month. You can help us deliver more and better content. See you next time, when we discuss model behavior.

This podcast is copyright Alexis Cason, all rights reserved. Music for this podcast is by DJ Shaw Money. Find him on SoundCloud or YouTube as DJ Shadow Money Beats.
