More Intelligent Tomorrow: a DataRobot Podcast

Making Data Science Bilingual - Najat Khan

February 15, 2022 More Intelligent Tomorrow Podcast Season 2 Episode 4

Featuring: 

Dr Najat Khan, Chief Data Science Officer, Janssen Global sits down with Ari Kaplan, AI Evangelist, DataRobot, for this episode of the More Intelligent Tomorrow podcast. 

Najat Khan thrives in her mission to solve the world's most challenging health problems. With data science, AI, and machine learning, she endeavors to bring critical medicines to the public with efficiency, effectiveness, and inclusivity. 

When COVID-19 hit, working for Janssen Global and with Johnson and Johnson, Khan set about constructing a team of more than 100 “bilingual” data scientists. Not only did they understand the language of data science, they also brought medical expertise like a PhD in neuroscience or a background in oncology.  

“Every single thing we do needs to be purpose driven. You can’t start with ‘I saw this really cool algorithm that was published. We should do something with it’. If you go after the shiny objects, it doesn't work. You have to continually ask, ‘What’s the business problem we’re trying to solve?’ To me that is the critical foundation.”

With the business problem fully articulated, she embedded her team into every product, regulatory, clinical, and operational group tackling COVID-19. They worked “shoulder to shoulder” through every critical decision point along the vaccine building workflow. 

To accelerate the process of identifying global hotspots and potential risks for vaccine trials, they partnered with MIT to build what would become one of the world’s largest machine learning programs. The resulting longitudinal dataset included socioeconomic, racial, and health data, with higher-than-census diversity and inclusiveness factors. 

“The hardest part was not just the technical aspect, but also the cultural change management. When a company has never used one of the biggest machine learning programs, and all of humanity is watching to see if it is successful, it takes a lot of guts, will, and fortitude to see it all the way through to measurable impact.”

Listen to this episode of More Intelligent Tomorrow, to learn: 

  • How data science can help us to be better prepared for the next global health crisis
  • Where the life sciences research and industry is headed in the next 5, 10, and 50 years 
  • How data science, AI, and machine learning can change health care 

“We can never say what we're doing in data science today is enough. We need to constantly push boundaries.”


Najat Khan (00:00):
It's the what that people get wrong. They go after the wrong question, or they go after something where it's technology first and not business problem first. That happens a lot. Someone will email me, "Hey, I saw this really cool algorithm that was published. We should do something with it." But to what extent? See being purpose-driven is really, really important, or else you can lose focus. And you can only get so many shots and only so much time in a large organization to show credibility that you're able to deliver value. It's extremely important to have high quality, high caliber data scientists, statisticians, epidemiologists, that know how to do this well. One of the things is there's so many studies being done, Ari, and some of them, the methodologies are all over the place. And that can lead to not very good decision-making.

Ari Kaplan (00:54):
Dr. Najat Khan, thank you so much for being on the podcast.

Najat Khan (00:58):
Hi, Ari. It's great to be here. Thanks for having me.

Ari Kaplan (01:01):
Sure. So why don't we start off telling us a little bit about yourself?

Najat Khan (01:06):
Sure, happy to. I'm Najat Khan. I'm the chief data science officer at Janssen R&D, and I'm also the global head of strategy and operations. So in my role, I head up both, how do we think about the right R&D strategy pipeline? Is this truly the engine of the medicines that we make that drives any life science and pharmaceutical company? And then also combining that with how do we apply AI machine learning, digital health and real-world evidence, all that encompasses data science for us, to enhance how we make our medicines better, faster for our patients?

Ari Kaplan (01:39):
Yeah. Very exciting. And one interesting thing. I love your background. Many data scientists at your level have diverse careers, typically coming from STEM before data science, but one thing that's very unique to you, you have a PhD in organic chemistry from UPenn.

Najat Khan (01:57):
Yes. So yeah, it's been quite a non-linear career, which I think is really the way of the future because it's the interface of a lot of different disciplines where I think we're going to see value going forward, value that's above and beyond what we're doing today. So I started off actually in my undergrad doing chemistry and computer science and econ, so quite a strange blend as everybody said, but I really wanted to understand how can you code, but then also how does that tie into understanding science better. And then when I did my PhD, it was a real combination of organic chemistry, which is physically making molecules and then testing them in vivo and in vitro to see how they work. But very early on in my research, I realized that being able to predict what could work versus not would make you much more targeted and much more effective.

Najat Khan (02:49):
So I actually leveraged my computer science background as one of the strange grad students who was going downstairs in the supercomputer labs and trying to build code and predict. And that really helped me have a very productive PhD. And that's where I saw the interface of computer science and science really play together. After that, I spent a number of years in management consulting. And the reason for going there is, while it's great to know different disciplines, how do you put it to action to have impact? How does the whole healthcare ecosystem work? How do pharmaceutical companies, life science companies work? In a nutshell, I would say my background is really a confluence of computer science, science, and then business strategy and leadership. And it's all those three components that I really use on a daily basis in my current roles today. Sounds very nice when I talk about it retrospectively, but prospectively, it was really a focus on having the core mission of helping patients and how do we use every innovative approach to do what we do today better and keep progressing.

Ari Kaplan (04:02):
Yeah. Yeah. That's really exciting. I love seeing the work that you did. And yeah, one of the interesting things is you also helped arrange partnerships between UPenn and leading large pharmaceutical companies, and now you're in that position. So what do you think about the partnership of the academic world and pharma or healthcare in general?

Najat Khan (04:23):
It's a very important collaboration. I remember when I was at UPenn, so I was actually a fellow for our tech transfer office, which I volunteered for because I wanted to understand how are all these great innovations by grad students and professors really translating to a better medicine, a better diagnostic device, something for patients. And that arc is sometimes hard to see when you're in grad school, but when you partner with a pharmaceutical company, you get to see the end to end. How does it move from an idea to a true product medicine impact for patients? Now being in a pharmaceutical company, what I really enjoy is that a lot of these cutting edge innovations are coming from academic institutions. And I try to give back by ensuring that we accelerate the adoption, the progression, whether it's in science or data science of these great ideas into reality, again, benefiting patients in the healthcare ecosystem overall.

Ari Kaplan (05:23):
Yeah, that's great. And now with your role at Janssen and also working along with Johnson & Johnson, your role is largely that of the chief data science officer, and that's relatively new in companies. So let's start off, what does the CDSO do?

Najat Khan (05:40):
It's a great question, and especially, I would say you're right, Ari. In life science companies, it's a very new role. So the role of the CDSO, I would say, is being able to pull together both the strategy, but then also drive impact for actually applying the core elements of data science - AI, machine learning, digital health, and real-world evidence - to the most challenging problems as you're designing and developing a medicine to do two things: (a) to improve the likelihood of success that a medicine's actually going to work or therapy's actually going to work, and number two is to find ways to do it better and faster. And both of them at the end of the core is how do we get a medicine to a patient faster? That's what the role is of a CDSO.

Najat Khan (06:32):
And the reason, and I would say one other thing that at Janssen and specifically Janssen R&D is unique, is I wear both hats of the CDSO, but then also head of strategy and operations for R&D. So it's that integration of the core product pipeline engine that drives a pharmaceutical company, coupled with data science, that truly allows you to make it more applied and have a singular focus of bringing data science from this abstract idea to actionable. And I'm happy to share some specific examples of what we have done, but I think without that role, but then without that intent, it really becomes difficult to take data science from a cool transformation initiative that a senior leader talks about to embedding it into our ways of working and thinking in an organization that has done things in a very different way for the past few decades.

Ari Kaplan (07:24):
I love that topic. And in your group, I think I heard was there a hundred or so data scientists and engineers? How do you work? You're talking about shoulder to shoulder with the scientists. How does that relation look like?

Najat Khan (07:39):
Yeah, no, it's a great question. And I'll tell you, it's a big change exercise, right? It's not something that happened on day one. So I would say when I took on this role a little over a year ago, maybe a year and a half ago, there are a few things that I focused on. Number one is what are the priorities? What's the impact that you're trying to drive? Having that deep knowledge of the pipeline, but then working with our scientists and therapeutic areas to understand what's a good question to answer that's compelling, that's one dimension. But the second dimension is, where can data science have an impact? Ari, in many cases, the data sets are not even there, or the methodologies might not be robust enough. No point doing things that are not going to be rigorous, right?

Najat Khan (08:23):
So you have to do a diligence both from a scientific perspective and a data science perspective. That was step one, to ideate and understand what are those big questions we will try to answer. And to do it well, Ari, you have to work with scientists and data scientists shoulder to shoulder, right? You cannot have them not be equal co-citizens or else the best outcome doesn't come through. So that was step one for us. But while I did that, it was also important to build the core foundations that would allow you to do this at scale. One, as you mentioned, is having a really good team. We had about 10 data scientists last, I don't know, May or June of last year. Now we have over a hundred data scientists and data engineers, but they're all bilingual. They understand the language of data science, of course, but then also medical science, PhDs in neuroscience or backgrounds in oncology.

Najat Khan (09:20):
Why is that important? Because that ideation phase that I just mentioned, building that strategy and priorities with clinicians, it's a much easier bridge to do when you understand and respect and appreciate clinical development, regulations, patient privacy, right? That's why it's important to have that duality of understanding. So that's one of the ways we've done it. The other way that we have done it that was a core foundation was embedding data scientists in every sort of critical decision point across the current workflow of how we build a medicine. Sounds kind of boring to say that like, "Oh my gosh, decision," but believe me, we all know if you're not around the table when the core decisions are being made or the core plans, if you're not around the table, you are never going to fully be able to impact something. So in life science companies, when you make a medicine or product, you have a product team. We have data scientists on the product team, along with regulatory folks, clinical folks, operations folks. They're part of one team. That's really important so that you're embedded. It's not a cool idea that you drop in for a second, but it's part of our fabric of how we do work.

Najat Khan (10:32):
Those aspects, and it's a huge cultural change. And if you ask me what's been one of the hardest things, that is that, right? How do you actually balance change while also bringing the organization along and not have it feel like data science is coming in and changing everything that we do today, but balancing and respecting how things are done, but augmenting it. That's a tricky balance, and I think we've done a good job. There's more to do. But one of the core ways of doing that is to not just have a lot of process and strategy, but to actually show concrete examples of impact. So it was very important to fly the plane and then build it at the same time. And that's what gets traction and that pull-through in the organization where that change and the tolerance for change is accepted.

Ari Kaplan (11:22):
Yeah. I love that. Where do you start seeing then the collaboration of the human data scientists? And then what artificial intelligence or the automation of some of those steps are like? How does that collaboration look of humans and data science?

Najat Khan (11:37):
I always think that they're one and the same, right? Because in order to apply data science the right way, the human element of understanding the question, understanding the patient perspective, understanding what medicine or product you're making is so critically important. And then what you do, what algorithm you use, how do you automate? How do you curate the data? That's a how, Ari, to me. It's the what that people get wrong. They go after the wrong question, or they go after something where it's technology first and not business problem first. That happens a lot. Someone will email me, "Hey, I saw this really cool algorithm that was published. We should do something with it." But to what extent? See, being purpose-driven is really, really important, or else you can lose focus and you can only get so many shots and only so much time in a large organization to show credibility that you're able to deliver value.

Najat Khan (12:32):
If you go after the shiny objects, it doesn't work. And that's why I keep coming back to what's that business problem you're solving and being honest early on by doing good diligence that you can actually solve for it. Right? And you're going to test and learn and see what works. So to me, honestly, that is the critical foundation. The how, which is, do you use a deep learning algorithm or is it a plain old linear regression? It doesn't really matter. You go for the best solution, but it's what you're answering that I would probably put more emphasis on or folks I have seen it get it wrong sometimes.

Ari Kaplan (13:04):
What are some of the big healthcare, life science problems you're tackling?

Najat Khan (13:10):
That's a great question. So when I formally took on this role May of last year, think about the timing. That's when COVID hit, right? Here, I was trying to look at the pipeline, figuring out what are the right questions to answer. It was a blessing in a way, because it's really accelerated a lot of what I was trying to push through the organization as I was thinking through it, but using COVID as a way to make it real. When I look at the pipeline overall, we have about 90% of our clinical development projects where we're applying data science. That's a huge increase. Last year, it was about 10%. So it's been a massive momentum in the last year. For COVID-19, and you asked me about the aspect of working shoulder to shoulder, that was a first big example of doing that.

Najat Khan (13:54):
So when COVID-19 hit, we had our clinical team, our amazing vaccines therapeutic area, our operations team who are actually making things happen on the ground, and then we had data scientists, that third pillar. And so we started working shoulder to shoulder from day one, and there were so many questions we could answer, Ari, but I focused on two, three big ones. The first one, the first big, big question was vaccine trials are event-driven. That means that the higher the incidence where you have a site, a clinical site, the more data you can collect. Richer data set means just better analysis, but then also you can do things faster. And in a pandemic, you and I both know every day counts. So what we did is we actually partnered with MIT, going back to your point around partnerships with academic institutions, where we looked at models that they had built to predict COVID not at a country level, but at a county level. So high precision is needed. And not just US, but globally, because it's a global pandemic.

Najat Khan (15:01):
So we partnered with MIT, but we customized the model. We worked together. We would have check-ins every day at 7:00 PM. That's how closely we worked together because we needed to customize it for trial. We factored in aspects like socioeconomic status. Where is there more racial diversity? Right? Because we wanted to create a model that would over-index on getting it right in terms of the right hotspots globally, but also we knew that certain minorities were more at risk. So you got to build a vaccine and have data that's representative of the people you will vaccinate. So long story short, we built the model. It was one of the largest machine learning programs, and we actually won awards for it, but we were backtesting it, Ari, every two weeks to see does it work. And that's what I mean by being super thoughtful, not just doing something because it's a cool thing to do.

Najat Khan (15:51):
And then fast forward, it helped us shave off about six weeks of our development timeline. That's a long time. Every day counts. We also had a very diverse and inclusive trial, with higher-than-census diversity numbers. We also ended up reducing our sample size because 90% of our predictions ranking the right counties globally were accurate. And so we were able to go from 60,000 to 45,000 people in our trial. That makes it faster, also smaller because we had so much data we were collecting. We had more incidence than even the models predicted. It was really high. And then it also allowed us to get data on variants. We were in Brazil. We were in South Africa. So you see the impact, but believe me, the hardest part of it was not just the technical aspect, but also the cultural change management. Think about using a machine learning model that a company has never used for one of the biggest programs it was doing with the entire world watching. I mean, it takes a lot of guts, a lot of will and fortitude to actually believe in it and go for it for that.

Najat Khan (17:00):
Back in January, February of this year, we knew that all vaccine manufacturers would vaccinate their placebo arm, so the arm that's not given the vaccine, for ethical reasons, right? The incidence was going up and you just have to do that. So then the question is, how do you know if your vaccine is holding up against new variants, against new subgroups? It's really important for public decision-making reasons. So we started working with external partners. Everything you'll hear me say is always with external partners as well. Like for instance, Israel has beautiful longitudinal data because of their one health system. We don't have that in the US. But do you just sit there and say we don't and that's what it is? Or do you try to solve for that problem?

Najat Khan (17:42):
So being proactive, we pulled together the right data sets and we actually did one of the largest real-world effectiveness studies, which showed that the Janssen vaccine was stable in terms of effectiveness, both pre the Delta period and post the Delta period as well. Again, the most important aspect, and you'll see there's a lot of real-world effectiveness studies, is the methodology. They are not all created the same. So the methodology has to be done really well. We partnered with Harvard Medical School and a couple of other companies as well. And it's also important to use this as a monitoring device, because again, it's completely unbiased. The data tells you whatever is happening and you can get much richer output than you could get in clinical trials. This is something that was a really important part of our evidence generation for our booster discussions and so forth.

Najat Khan (18:31):
So you see how it became central. Something that we had never used before has now become central to what we do. So those are a couple of examples of what we have done for the COVID-19 vaccine. If you want, I'm happy to talk about... There's plenty of other examples that are non-COVID-19 as well that we're working on. There are... We know the kind of viruses that can lead to a pandemic like this, right? We kind of know. There's hundreds, but there's not millions. So why not start using AI machine learning, and some of this work is happening, to be able to predict which variants, which mutations could actually be problematic going forward? So I'm saying what I told you is a lot about what we're doing in clinical development, but go back in discovery, right? Start doing that analysis in terms of there are other coronaviruses, right? What could that impact be?

Najat Khan (19:22):
mRNA and other technologies are great in order to develop a vaccine, but if you can prepare even earlier from a discovery perspective and a research perspective, I think that is part of pandemic preparedness and I know there's a lot of talk about that. That should be happening now. And AI machine learning is great because there's a lot you can do to map out not just the genomic aspect, but then what is the phenotypical outcome as well, mapping the entire immune profile, immune system. That is one of the studies that we are doing for those who have gotten COVID-19, understanding what's happening to their immune system. That's a different study that we're also running. So these are all efforts we can learn now, which would help us make better vaccines of course, but then also therapeutics, right? I mean, therapeutics are catching up now, but why not have more earlier on?

Najat Khan (20:11):
And then the other thing I would say is... So that's the whole thing around research. That's one point. The second point is if a pandemic hit, which I mean, it will happen at some point or it's just what it is, how do we respond? One of the biggest challenges we had, if you remember, I live in New York City, March, April of last year, was even getting good data. What standard of care works? Our first-line workers, they were inundated with patients. Generating data and having good data for health systems was not something that was easy to do real time. I mean, you're trying to treat the patients. You have to make those priorities. Now is the right priority to make, but having really good longitudinal, high quality databases where you can actually in real time understand how patients are doing outcomes, who's at risk, right? Get real data.

Najat Khan (20:59):
I think we're starting to get some of that with some of the work that I just mentioned in terms of real-world effectiveness. Millions of de-identified claims, and you can actually get real-time EHR and lab results to really react in real time. Imagine if you didn't have Waze, right? And you were just going to do a cross-country trip. You have no idea where there's traffic. You have no data and you're just using a map. I mean, we should not do that anymore. And for a pandemic, it's even more important because minutes, hours matter. So that's the second piece where I think we've made progress, but I wouldn't want us to regress. It's really important that we have longitudinal near real-time data sets.

Najat Khan (21:35):
And then the third thing I will say is it's extremely important to have high quality, high caliber data scientists, statisticians, epidemiologists, that know how to do this well. One of the things is there's so many studies being done, Ari, and some of them, the methodologies are all over the place. And that can lead to not very good decision-making, and we need to be careful. I always worry about that, right? In a new field, it only takes a couple of bad examples for the whole field to regress, and I think there's so much opportunity that it would be bad if we don't do that. So I think building that next generation of data scientists that are bilingual. I have a lot of folks that came bilingual. I have a lot of folks that have become bilingual by working with clinicians, just the nature of being in an organization where you have clinicians, operations folks, et cetera. That's our future. We need that. We need more of that because if we don't have them, then we're not going to be able to solve for the first couple of points that I mentioned well.

Ari Kaplan (22:35):
Speaking of the future, this podcast is called More Intelligent Tomorrow. We want to talk about where is humanity headed 3, 5, 10 years, 100 years. So where do you see humanity headed?

Najat Khan (22:48):
Where I see humanity headed, and a lot of it I think is going to be driven by AI machine learning and real-world evidence because that helps you make more data-driven decisions. I think first of all, the way we even define diseases is going to completely turn on its head. In terms of today, we define a lot of diseases based on phenotypic signatures. You see something, okay, this person has this disease. That person has X disease. We don't have sufficient granularity or where it's biomarker-driven to say it's not that, Ari, you have a certain cancer. It's actually that you have an EGFR, FGFR mutation, and that cancer can be a lung cancer, liver cancer, colon. So do you know what I mean? So turning it on its head, it's not about where the cancer shows up, the tissue, but it's actually what's driving. What's the driver mutation of that cancer?

Najat Khan (23:39):
I think if we can do that, we actually know the root cause much better of what's causing the disease versus today, a lot of it is outcome-based, "Oh, I see colon cancer." So that's what it's called. And that would lead us to more precision medicine. Precision medicine is something we've talked about for a long time, but I think we can actually put it in action and make it real. We're starting to see that. So I think the way we define diseases. So for instance, Parkinson's, we're probably going to have 10 different types of Parkinson's. We're going to have 10 different types of diabetes. It's not going to be just one because as we learn more about the mechanistic underpinnings of what's driving a disease, and this is where AI machine learning is really important, we have so much data and more and more RNA-Seq data, genomics data, proteomics data, claims, EHR, wearables, this, that, signal processing. What is it all doing? It's helping us better understand what's actually going wrong.

Najat Khan (24:36):
If you use AI machine learning to figure out the right target that's causing the disease, everything you do downstream changes. So I think that's going to be one big sort of change that we're going to have, which is our diseases are going to be driven by the underpinnings of what's happening versus today, which is like, okay, I see that this is a certain kind of disease based on the tissue that it's impacting. So that's one. I think the second thing, and this is something I hope in terms of drug development and drug discovery, we are, I think, going to be much more virtual and digital and decentralized in our trials. I think this whole concept of today, we will go out to certain sites and say, "Hey, do you have patients?" And we run our trials and patients have to come in. There's a huge burden on patients.

Najat Khan (25:25):
I think that's completely going to change. I think it's going to be... We are already trying to do that. We will know which locations have patients based on de-identified data so that we can open the aperture of where we go, not just the large academic medical centers we know, but also community clinics. And guess what, Ari? That's where you have diverse patients. And we actually have started some fully remote trials, fully remote. And I think that's going to become more of the future because if you can do that, then more patients can enroll in the trials, and patients can actually get more of their data as to what's happening. So I think that's going to totally turn on its head in terms of taking so many years to run a trial. This is my vision. You can do trials much faster, much more agile, more decentralized, and where you're measuring endpoints, endpoints or outcomes, right, that are not just this esoteric thing we do in a trial, but it's actually what's measured in the real world.

Najat Khan (26:24):
Imagine if you could do that, because then you could have more pragmatic trials in a health system. You don't need to do the separate trial altogether. It adds cost, complexity, timelines. So the whole clinical operation space, I think, is going to completely turn on its head in the next 10 years, which will benefit patients and also bring more engagement and better data that we can collect. And then the third thing I would say is how patients... I mean, we are all patients. We will be at some point, probably. How we get our data and how we are engaged is completely going to be more consumer-like. Again, we've talked about it, but I think that's happening with FHIR, with so many other advances and also different regulations where our data is going to be more of our own.

Najat Khan (27:12):
And therefore we can actually make better decisions and be more proactive in terms of earlier diagnosis. There are so many rare diseases, Ari, where we don't even diagnose patients early enough. So one of the things that my team worked on is for pulmonary arterial hypertension, patients are misdiagnosed by four or five years, and a lot of them are women in their mid-thirties. I mean, think about that. It's a travesty. And now we have developed algorithms where you can actually use ECG. And that's one of the first procedures everybody gets, right? If we can deploy it, you can actually pick up the signature that somebody has this disease two, three years in advance. Now I don't want that to be just for one disease that we have medicines for, which improves outcomes for patients if you're diagnosed earlier. How about if we could do it for a suite of diseases?

Najat Khan (27:59):
So you pick it up earlier, right? And so therefore you can actually have all of the medicines. We talk about kind of better treatments. So this whole early diagnosis, finding patients in a different way, running trials in a different way, defining diseases in a different way, I think there's so much we can do. And that's what gets me out of bed every morning, which maybe I'm a little bit biased, just a tiny bit, but I think that kind of fourth revolution you were talking about before in the green room, Ari, I do think healthcare and especially life sciences is so ready for this, right? Data science has disrupted and transformed, in a positive way, financial services, retail, etc. Why not healthcare? And COVID just underscores the need for us to be able to do that.

Ari Kaplan (28:44):
Yeah. And I get super excited about all this and very much respect all the work you're doing, since it's saving and prolonging lives. Three to five years for one of those major challenges with the heart is tremendous. And as we get better and better, you're going to keep cancer at bay. How long do you think a typical human can live once most of the major cancers are kept at bay?

Najat Khan (29:07):
That's a great question, and I would love to give you a number, but here's what I think. There's always going to be new diseases that also show up. And this is the constant struggle, right? That we're working with. Think about it. When the average life expectancy was in the sixties, right? There were a lot of diseases we talk about today that weren't an issue, but as you increase longevity, maybe it's a hundred, 120 years, I do think you're going to see more and more new things that come up. And that's why we can't rest. You have to constantly keep innovating. And there's also other aspects, environmental factors. There's so many things that are changing that are also contributing. Our lifestyles, it's not the same healthy lifestyle, the level of activity. Maybe I'm a little bit of a pessimist, but to me, if we can make all of these chronic diseases manageable and improve the quality of life, that's a first win.

Najat Khan (30:00):
I mean, even look at COVID-19 with the vaccines and the therapeutics. Is COVID-19 ever going to go away? No, but if we can turn it into a cold, like a normal cold, then all of the implications we've had, the death, the hospitalization, that's what really shuts an economy down and really the pain and suffering that so many have gone through, right? That I wish would never ever happen again. So I think again, life expectancy, we can probably get in the three digits. However, the more important question is to be vigilant about transforming the diseases now and really understanding it well enough to transform it, what exists today, but then proactively doing work to anticipate what new diseases would come up with everything that's changing. And that's why we can't just say what we're doing in data science today is enough, but we need to keep pushing the boundary. Sometimes people say, "Do you think in five years it's just going to be the norm and we're done?" I said, "Have we ever said that about any type of technological innovation?" The word innovation in itself means you're not going to just do enough and then it's done and you're using it. Can't get complacent.

Ari Kaplan (31:05):
Yeah. And then yeah, in terms of being vigilant, not being complacent, how hard is it... And I think countries might already be doing it to genetically modify and potentially unleash something. And if so, how easy is that to do? And if so, is the work that you're doing in data science kind of a net to help stop that faster?

Najat Khan (31:28):
I think that's one of the ways. So more and more countries are starting to do longitudinal patient cohort data sets, right? Like for instance, UK Biobank has half a million patients or participants, and they're collecting samples, ECGs, all the different data sets that you'd want. And then there are larger data sets that are also being put in place, another one, which is 5 million people. Why is that important? Because to be able to follow patients longitudinally or just people longitudinally, you can actually... And having all those data points, that's the way you unlock why did somebody progress to a certain disease? Why did somebody not? You have genomic data, proteomic data. That is the core because it's like solving any problem. If you're just looking at the outcome, Ari, and after the fact, there's only so much you can figure out, but having that kind of rich, longitudinal, connected, diverse data sets, that is going to be the core to us then being able to apply analytics to answer those questions, like why did somebody progress? Why did someone not? So some countries are starting to do it. It requires investments. We are investing along with many governments, but I think that's going to be one of the bedrocks of how we answer some of these questions. Without that, you're flying blind, quite frankly.

Ari Kaplan (32:42):
So Najat, thank you so much for being on. It was a wonderful time and we appreciate it.

Najat Khan (32:48):
Thank you so much, Ari. Thank you to the entire team for having me on. This is a great discussion. It actually energizes me as I think about what more can be done, and it's a good forcing function to step back, so thank you again.

Ari Kaplan (33:00):
Thanks.