Combining Secondary & APLD Advanced Analytics and Primary Analytics

What do you do if your primary research and secondary analytics don't reach the same conclusions? At a recent PMSA conference, Igor Rudychev, head of U.S. digital, data, analytics and innovations at AstraZeneca Oncology, gave a presentation that delved into this issue.

Here's the crux of it.

Historically, primary market research has driven pharmaceutical decision-making. Senior leadership makes major strategic and tactical marketing decisions based on a variety of factors, including:
  • Awareness and familiarity with the drug
  • Percentage of trialists, prescribers or switchers
  • Perceptions (including perceptions of efficacy and tolerability)
  • Likelihood to prescribe in the future
  • Discussions with sales reps
  • Barriers to prescribing
  • Brand perception and satisfaction
  • Influencer nominations/mapping
  • Inputs to forecasts
  • Market shares
Secondary analytics, by contrast, draws primarily on patient-level data and is used for:
  • Optimization
  • Targeting
  • Segmentation
  • Prescriber analysis, including early or late adopters and historical prescribing patterns
  • Sources of business
  • Durations of therapy
  • Spheres of influence
  • Inputs to forecasts
  • Market shares
But here's the problem. In pharma, strategic decisions are made using primary data, but that data is incomplete, while sales decisions are made using secondary data. Combining the two is the optimal way to improve patient outcomes. Yet the results of those two methods, even when measuring the same thing, come out different.

Two Methods Measure the Same Thing, With Different Results. Now What?

One way to look at this is to compare the goals of primary research and secondary analytics. Many of those goals overlap to answer the same questions.

Primary goals:
  • Market shares
  • Inputs to forecast
  • Influencer nominations/mapping
  • Likelihood to prescribe in the future
  • Had discussion with sales rep
  • Sources of business
  • Durations of therapy
Secondary goals:
  • Market shares
  • Inputs to forecast
  • Influence mapping
  • Innovators/Laggards Analysis, probability to prescribe
  • Call execution
  • Sources of business
  • Durations of therapy
Some of those goals overlap, but the two methods can still come up with different results. Say that primary research finds a market share of 30%, a likelihood to prescribe of 80% and a sales-rep discussion rate of 40%. Great! But your secondary analytics find that market share is 41%, that probability to prescribe in the future (based on analogs) is 50% and that the sales-rep discussion rate is 70%.

This happens much of the time because the two methods rest on different assumptions. So, now what?

Triangulation!

The key is to triangulate the data and look at the subset where the triangles meet.

Let's use one example. Using a machine learning/AI model, you can create a subset of, say, claims data that imitates the complete HCP and patient populations and is representative of the payor and patient mix. You can then train the ML/AI model on this subset and estimate market shares, making sure to capture the parameters driving the initial data skews.

Then, you can apply the model to the primary research subset and compare the numbers. This improves the model.
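To make the triangulation concrete, here is a minimal sketch in which simple post-stratification stands in for the ML/AI model: a survey panel skewed toward high-volume prescribers is reweighted to the claims-derived segment mix so the two market-share estimates can be reconciled. All segment names, sample sizes and prescribing rates here are invented.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical claims-derived HCP records: (segment, prescribes_brand).
# The claims data covers the full universe; the survey panel below
# over-samples high-volume prescribers, who favor the brand more.
claims = [("high_volume", random.random() < 0.55) for _ in range(2000)] + \
         [("low_volume",  random.random() < 0.30) for _ in range(8000)]
survey = [("high_volume", random.random() < 0.55) for _ in range(400)] + \
         [("low_volume",  random.random() < 0.30) for _ in range(100)]

def share(rows):
    """Brand market share: fraction of HCPs prescribing the brand."""
    return sum(rx for _, rx in rows) / len(rows)

# The naive estimates disagree because the survey panel is skewed.
print(f"claims-based share: {share(claims):.2f}")
print(f"survey-based share: {share(survey):.2f}")

# Triangulate: reweight the survey's per-segment shares to the claims
# segment mix, capturing the parameter (segment) that drives the skew.
mix = Counter(seg for seg, _ in claims)
total = sum(mix.values())
adjusted = sum(
    (mix[seg] / total) * share([r for r in survey if r[0] == seg])
    for seg in mix
)
print(f"survey share reweighted to claims mix: {adjusted:.2f}")
```

In the talk's framing, the trained ML/AI model plays the role of this reweighting step, learning the skew-driving parameters rather than having them hand-picked.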

It's further possible to create behavioral HCP segmentation based on both primary and secondary data using that overlap and use the results for targeting.

The point is to link primary and secondary data to train the ML/AI model. It's about linking attitudinal primary variables with secondary variables in claims.

Pros and Cons

When it comes to projecting attitudinal variables onto every HCP for targeting, the standard approach is simply to use secondary variables from claims to create secondary segments.

What if, with the Qual Variables Projection approach, you instead project variables from primary research onto the HCPs in the secondary data? Here are the pros and cons.

Standard approach
  • Uses only secondary data for individual HCP parametrization
  • Uses only secondary data for targeting
  • Often secondary data is not enough to create uniform segments
Qual Variables Projection approach
  • Probabilistically introduces qual variables into the secondary data
  • Models the data first and improves the projectability of segmentation
  • Creates more uniform segments
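As a minimal sketch of what "probabilistically introducing qual variables" can mean, assume a small overlap of HCPs who both answered a survey (an "advocate" yes/no attitude, primary data) and appear in claims (a volume decile, secondary data). Estimate the probability of the attitude per claims-based bucket from the overlap, then project it onto never-surveyed HCPs. All names and numbers here are invented.

```python
from collections import defaultdict

# Hypothetical overlap: (claims volume decile, surveyed advocate flag).
overlap = [(9, 1), (9, 1), (8, 1), (8, 0), (7, 1), (7, 0), (6, 0),
           (5, 0), (5, 1), (4, 0), (3, 0), (3, 0), (2, 0), (1, 0)]

# Estimate P(advocate | bucket) from the overlap sample.
counts = defaultdict(lambda: [0, 0])           # bucket -> [advocates, total]
for decile, advocate in overlap:
    bucket = "high" if decile >= 6 else "low"  # coarse bucket: small sample
    counts[bucket][0] += advocate
    counts[bucket][1] += 1
p_advocate = {b: a / n for b, (a, n) in counts.items()}

# Project the qual variable onto HCPs who were never surveyed,
# using only their claims-derived decile.
universe = {"hcp_001": 8, "hcp_002": 2, "hcp_003": 7}
projected = {
    hcp: p_advocate["high" if d >= 6 else "low"]
    for hcp, d in universe.items()
}
print(projected)
```

The projected probability is then a modeled qual variable attached to every HCP in the secondary data, giving segmentation more uniform inputs.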
So, what do we learn from all of this?

AI and ML allow us to bridge primary research and secondary analytics. They also allow us to resolve major differences between the results of primary and secondary data analysis. Many primary research techniques can be improved with secondary data analytics.

Bottom line: It's important to communicate to the leadership and decision-makers that pure primary research data could be skewed. Data enhanced with secondary analytics should be used in their strategic and tactical decision-making instead.

Complete data allows us to understand which medicines work best for which patients. In the end, it's about saving lives.


We Were Data Scientists Before Data Science Was Cool: New Challenges for the Profession

Suddenly, being a data scientist is cool. And in high demand.

Why? Because these days, data makes the world go round. Nearly every industry in our economic ecosystem is clamoring for it.

If a company, no matter the industry, is not using Big Data to chart and forecast customers' journeys, better connect with them, ferret out their wants and needs before they even know what they are (thank you, Netflix, for creating that perception in people's minds) and otherwise enhance the customer experience, it will be left in the dust by competitors that do.

The increased demand for data in all sectors of the economy has created a boom in the data science field. According to Forbes magazine, the fastest-growing jobs in the country today are data scientist, machine learning engineer and big data engineer. In the blink of an eye, every company needs people who can make sense of data. LinkedIn conducted a survey and found there are 6.5 times as many data scientists working today as there were just five years ago. For machine learning engineers, that multiple jumps to 9.8.

"The field has exploded within the past four or five years," says Nuray Yurt, head of enterprise data science at Novartis. But, she points out, while the need for data pros continues to ramp up, which is a good thing on many levels for the profession, it also brings with it some challenges for the data scientists themselves.

Challenges for Data Scientists Today

The situation can be loosely compared to the disruption the corporate training field went through back when the internet was first starting to change the way every company on earth worked. People got into the training profession because they liked teaching in front of a classroom, which is where the bulk of training happened pre-internet. But very soon after the screech of dial-up technology began connecting every desk in every office to the World Wide Web, someone got the idea that training should happen online, so trainees could sit at those very desks and get the knowledge they needed on their own schedule. Suddenly, trainers had to learn an entirely new skillset — creating online learning modules. It was not what they signed up for, but it quickly became an essential part of the job.

Data scientists are finding themselves in a similar predicament today. The nuts and bolts of analyzing data are always evolving, but the core skills of the job, like statistical analysis, computing and business knowledge, remain the same. What's new for data scientists are the so-called soft skills that are becoming necessary parts of the job.

"Data scientists need to be curious, open minded, quick learners and have the right personality fit now," Yurt explains.

Communication skills are a vital part of that. Why? Because industries that are newly reliant on data, like sales, customer service and hospitality, are hiring data scientists to help them make sense of it all. And, gently put, the people who run those companies are not data scientists, and most have never had one on staff. As Yurt notes, everyone now knows they need data, but few know what it takes to glean that data, analyze it and translate it into actionable goals and strategies for companies to implement. So, data scientists are suddenly put in the position of emerging from the offices where they've been happily crunching numbers on their own and explaining to higher-ups what the data science actually means, in language they can understand.

The temptation may be to "dumb down" the explanation, but Yurt says that's a mistake.

"The challenge for data scientists today is being able to communicate complex concepts to people who don't understand them without diluting the complexity," she says. That last part is the key. People in industries new to data need to understand the complexity of the process, or it diminishes the data science field as a whole. It also puts funding and potentially jobs at risk if people don't entirely get the fact that analyzing and interpreting data is a science that Hal from accounting wasn't trained for.

"We need to communicate why and how what we do makes a difference," she says.

Another challenge for data scientists is the need to be more open minded. "We need to be OK with change," Yurt says. "Our jobs won't be the same as they always were, and we need to be OK with that."


Application of NLP to Detect Adverse Events in Patients

At the last PMSA conference, Ketan Walia, senior associate of decision science at Axtria, and his colleague Rushil Goyal, also a senior associate of decision science at Axtria, presented "Application of NLP to Detect Adverse Events in Patients," which generated a lot of interest. They looked into the automated detection of adverse drug reactions in social media text, leveraging natural language processing and machine learning, and gave conference attendees a rundown of what they found. In case you missed it, here's a recap.

Why Are Adverse Drug Reactions (ADR) Significant?

Getting a handle on ADR will significantly benefit the industry, leading to huge savings in healthcare costs and better patient compliance.

"ADR detection is a very significant task which typically doesn’t get as much traction as it needs," Walia says. "Especially considering the fact that adverse reactions related to a drug could affect the entire life cycle of the drug from clinical trials to the time it is launched in the market. Around 90% of ADR are underreported and there is often a big delay by the time they get formally reported and registered. This creates a huge lag in the system called a delayed feedback syndrome. This eventually hurts drug performance in the long run, greatly impacting safety of patients and commercial gains for the manufacturer."

ADR is a top cause of morbidity. Here are the grim stats:
  • 6.7% of hospitalized patients have a serious ADR with a fatality rate of 0.32%
  • Adverse reactions to drugs cause 100,000 deaths yearly
  • ADRs are the 4th leading cause of death in the U.S.
  • 90% are underreported
There's an urgent need for action: adverse drug reactions, and their effect on drug approval, are seriously damaging the commercial outlook of drugs.

One place to start, Walia found, was Twitter.

Wait, Twitter?

Yes, Twitter. The social media giant is a potential gold mine of information about ADR. Its 645 million users generate about 9,100 tweets every second, some of them about their own health and response to medications.

Twitter has widely been used in other frontline industries like retail, e-commerce, consumer durables, service and more for opinion mining, customer intelligence and gauging customer satisfaction levels.

However, Twitter as a data source has not been widely used by the pharmaceutical and life science industry as it is not a standard practice. Here's what led Walia and his team to Twitter:

Delayed feedback syndrome: "For our topic while doing literature review we realized that around 90% of ADR cases are underreported, which results in delayed feedback syndrome and many times ADR are officially registered only after their market launch. This hurts entire USP (unique selling point) of the product/drug and continues to affect drug performance throughout its lifecycle," Walia explains. "To mitigate these shortcomings, we were looking to build a pharmacovigilance system which could provide automated feedback, possibly in real time."

While researching, Walia and his team realized that although ADR are underreported, patients do not hesitate to go online and vent about their experiences in almost real time. So the case for Twitter as a data source stems both from the shortcomings of the present system and from the nature of the problem being solved.

Lack of data sources: Not much ADR data is collected and made publicly available for commercial use. Twitter solves this problem: all of its data is publicly available, coming directly from affected patients themselves.

How Walia Used Twitter Data for Pharmacovigilance

Step 1: Data Acquisition
The first part of the process was to collect tweets as a source of potential ADR. Arizona State University collected 10,000 tweets corresponding to a list of 81 drugs as per IMS Health Top 100 Drugs. What they found was raw, unstructured data: people's thoughts, feelings and experiences. Next, it was time to remove the "noisy" information, like retweets, advertisements and URL links, boiling the tweets down to the information they really needed: a patient's reaction to the drug they were taking.

Step 2: Tweet Pre-Processing
Pre-processing involved segmentation of the raw text, sentence splitting and tokenization: breaking the text into units and ultimately converting words into numbers so the data could be analyzed.
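A minimal sketch of these pre-processing steps on a single invented tweet (not the Axtria pipeline itself, which the presentation doesn't detail at code level):

```python
import re

raw_tweet = "Day 3 on DrugX... the headaches are UNREAL :( might stop taking it https://t.co/abc"

# 1. Noise removal: strip URLs, then lowercase and drop punctuation.
text = re.sub(r"https?://\S+", "", raw_tweet).lower()
text = re.sub(r"[^a-z0-9\s]", " ", text)

# 2. Sentence splitting would happen here for longer posts;
#    a tweet is usually short enough to treat as one unit.

# 3. Tokenization: split the cleaned text into word tokens.
tokens = text.split()
print(tokens)

# 4. Map tokens to integer ids so downstream models can consume them.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
print(ids)
```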

Step 3: Feature Engineering
The third stage in the process involved coming up with a representation for groups of words with similar meanings. A common starting point is the "bag of words" representation, which describes a piece of text simply by the words it contains, ignoring their order. (The familiar word-cloud image, with some words very large and some tiny, is a visualization of the same idea.) Ketan Walia explains: "The way it works is that you feed in 'Bag of word' representation of words to this algorithm and it runs a neural network on the background and converts bag of word representation into a more generalizable vector representation called 'Word Embeddings.' Once you get word embeddings for all the words in your data set you can now feed these word embeddings instead of bag of words to a machine-learning algorithm."
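A toy sketch contrasting the two representations; the two-dimensional "embeddings" are invented stand-ins for vectors a neural network such as Word2Vec would learn:

```python
from collections import Counter

sentences = [
    "this drug gave me a terrible headache",
    "terrible headache after taking this drug",
]

# Bag of words: each sentence becomes word counts; order is discarded.
vocab = sorted({w for s in sentences for w in s.split()})
def bow(sentence):
    c = Counter(sentence.split())
    return [c[w] for w in vocab]

print(vocab)
print(bow(sentences[0]))

# Word embeddings replace each word with a dense vector so that similar
# words sit near each other. These toy 2-d vectors are invented stand-ins
# for learned Word2Vec output.
toy_embeddings = {"terrible": [0.9, 0.1], "headache": [0.8, 0.2],
                  "drug": [0.1, 0.9]}
def sentence_vector(sentence):
    vecs = [toy_embeddings[w] for w in sentence.split() if w in toy_embeddings]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

# One simple way to embed a whole sentence: average its word vectors.
print(sentence_vector(sentences[0]))
```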

Step 4: Binary Classification
This step involved categorizing sentences as ADR or not-ADR, then testing and evaluating the model using various cross-validation techniques. Here, deep learning comes in: its main advantage is that it can handle highly complex, unstructured data like text.
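A self-contained sketch of the binary classification step. A pure-Python logistic regression stands in for the deep learning model described in the talk (the principle of learning a decision boundary from labeled examples is the same), and the labeled tweets are invented:

```python
import math

# Toy training data: tweets labeled ADR (1) or not-ADR (0). Invented examples.
data = [
    ("this drug gave me terrible headaches", 1),
    ("severe nausea since starting the medication", 1),
    ("dizzy and exhausted after the new pills", 1),
    ("picked up my prescription today", 0),
    ("the pharmacy was very helpful", 0),
    ("finally feeling great on this medication", 0),
]

vocab = sorted({w for text, _ in data for w in text.split()})
def featurize(text):
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in vocab]

# Logistic regression trained by gradient descent on log-loss.
w = [0.0] * len(vocab)
b = 0.0
for _ in range(500):
    for text, y in data:
        x = featurize(text)
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y                      # gradient of log-loss w.r.t. z
        w = [wi - 0.5 * g * xi for wi, xi in zip(w, x)]
        b -= 0.5 * g

def predict(text):
    z = sum(wi * xi for wi, xi in zip(w, featurize(text))) + b
    return 1.0 / (1.0 + math.exp(-z))

print(predict("terrible headaches from this drug"))   # scores as likely ADR
print(predict("the pharmacy was helpful today"))      # scores as likely not-ADR
```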

Step 5: Named Entity Recognition
Walia and his team used a Hidden Markov Model to annotate the words and phrases directly related to the ADR. The trained Hidden Markov Model achieved roughly 63% accuracy in automatically annotating ADR-positive tweets.
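To show the mechanics, here is a minimal Viterbi decoder over a two-state HMM that tags each token as inside an ADR mention or outside it. The states, probabilities and example tweet are hand-set for illustration; the team's actual model was trained on annotated tweets:

```python
# Two states: "O" (outside an ADR mention) and "ADR" (inside one).
states = ["O", "ADR"]
start = {"O": 0.8, "ADR": 0.2}
trans = {"O": {"O": 0.7, "ADR": 0.3}, "ADR": {"O": 0.4, "ADR": 0.6}}
emit = {
    "O":   {"this": 0.3, "drug": 0.3, "gave": 0.2, "me": 0.1,
            "headaches": 0.05, "terrible": 0.05},
    "ADR": {"this": 0.05, "drug": 0.05, "gave": 0.05, "me": 0.05,
            "headaches": 0.4, "terrible": 0.4},
}

def viterbi(tokens):
    # V[t][s]: probability of the best path ending in state s at step t.
    V = [{s: start[s] * emit[s].get(tokens[0], 1e-6) for s in states}]
    back = []
    for tok in tokens[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] * trans[p][s])
            row[s] = V[-1][prev] * trans[prev][s] * emit[s].get(tok, 1e-6)
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

tokens = ["this", "drug", "gave", "me", "terrible", "headaches"]
tags = viterbi(tokens)
print(list(zip(tokens, tags)))
```

With these hand-set probabilities the decoder marks "terrible headaches" as the ADR span and the rest of the tweet as outside it.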

Conclusion

Maximizing knowledge of a drug’s safety profile and integrating it into commercial planning will have greater influence with regulators, payers, and ultimately patients and prescribers.

The end result is an Artificial Intelligence-based modeling framework that can automatically stream drug-related posts online (Twitter in this case), interpret the text and classify whether it pertains to an ADR. If it does, the ADR-positive tweets are analyzed further within the framework to tag the words and phrases directly pertaining to the ADR, surfacing the most relevant and concise intelligence. The framework gives the user the tools to perform this pharmacovigilance and extract the most relevant information in an automated fashion.


Industrializing Machine Learning in Pharma

Last year at the PMSA conference, Daniel Kinney, senior director, data and analytics platforms for the Janssen Pharmaceutical Companies of Johnson & Johnson, and two colleagues gave a presentation that generated a lot of interest among conference attendees. "Industrializing Machine Learning in Pharma" looked at common problems and myths about industrializing AI/ML, and how best to tackle those issues.

Even one year later, the topic still resonates. In case you missed the presentation, here's a recap:

Challenges of Industrializing Machine Learning

Nobody questions that machine learning and data science have great power; that's now a given. Still, challenges exist. While most pharma companies work with AI/ML in different parts of the organization, few actually leverage it beyond proof-of-concept projects. AI/ML proponents in the pharma industry face three main challenges in industrializing AI/ML to create large-scale impact.
  1. Downplaying the role of iterative automation
  2. Data readiness
  3. The adoption barrier
Let's look at each of those in more detail, and the myths behind them.

Three Myths About Machine Learning

If these common myths aren't "busted," they contribute to an inability to show ROI: ML underperforms when fed poor data on short timeframes, and inefficient one-off analyses get built on unnecessarily complicated ML algorithms.

Myth: ML is a black box. If you don't already know this term, it refers to a model that hands you a result without any reason why. For instance, you'll get: "When targeting physicians, the year they graduated from medical school is an important variable," with no explanation of why that variable is there.
Busted! Most use cases do not require, and are not a fit for, black-box algorithms like neural networks. If you're worried about not getting buy-in because of the black box factor, it's a nonissue now.

Myth: You'll see immediate improvement.
Busted! The improvement of processes depends on data availability and quality, and it takes iterations.

Myth: You'll be disrupting today's processes with ML.
Busted! Machine learning models do not need to be developed from scratch. They can incorporate business rules or sit on top of rule-based, knowledge-based systems.
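A sketch of what "sitting on top of" a rule-based system can look like: the existing business rules produce a baseline score, and the ML model only adjusts it. All field names, rules and weights here are hypothetical:

```python
def rule_score(hcp):
    # Existing knowledge-based business rules, kept intact.
    score = 0.0
    if hcp["specialty"] == "oncology":
        score += 0.5
    if hcp["recent_rx_count"] > 10:
        score += 0.3
    return score

def ml_adjustment(hcp):
    # Stand-in for a trained model's output; a real system would call
    # something like model.predict_proba(features) here.
    return 0.2 if hcp["opened_last_email"] else -0.1

def hybrid_score(hcp):
    # ML sits on top of the rules instead of replacing them, so today's
    # processes keep working while the model adds incremental lift.
    return max(0.0, min(1.0, rule_score(hcp) + ml_adjustment(hcp)))

hcp = {"specialty": "oncology", "recent_rx_count": 14, "opened_last_email": True}
print(hybrid_score(hcp))
```

Because the rules stay intact, the hybrid can be rolled out without disrupting today's processes, and the ML adjustment can be tuned or removed independently.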

Data Readiness: Garbage In, Garbage Out

Another problem is the data itself. How clean, accessible and quality-assured your data is matters enormously, but somebody's got to do that work. Data scientists and analysts sometimes feel like "data janitors," spending the majority of their time cleaning and prepping data. Creating training data can be equally daunting and time consuming.

But those pain points also come with solutions.

Accessibility is no longer the issue it once was. Cloud and big data tech are increasingly available for the entire ML ecosystem. Also, publicly available data can be labeled quickly and cheaply by online resources.

Data strategy, governance and management are a big piece of this pie, and you can also use reinforcement learning to enrich training data.

The Adoption Barrier: Ability to Impact Decisions with ML Insights

If ML insights can't influence decision-making, the ROI of machine learning eventually suffers, making sustainable development of machine learning in an organization difficult. That's why it's vital to help guide sales and marketing teams, who are likely not data experts. How you engage these teams, introduce the concept and win buy-in can mean the difference between success and failure. The data can be spot on, but if sales and marketing don't use it, what good is it? Ways you can help encourage buy-in:
  • Field adoption: Show the relationship between increased sales and adoption.
  • Non-personal marketing: Use tools to automatically track target responses to create training data and improve marketing vendor management with a clear tracking system.
  • Other strategic decisions: Identify pain points, including financial, productivity, processes and support. Avoid "technology for its own sake."

A Word About ROI

Yes, that's what ultimately makes the world go round, but you shouldn't get too hung up on ROI at the outset. Let the models run, let some time pass and the data will get better. You'll identify new sources and the model will improve. Over time, you'll refine and refine again.

Key takeaway:
Your team doesn't need to be a machine learning unicorn. But for success, your organization should combine three key aspects of machine learning:
  • Data science. Develop and adjust your ML models. Validate and continuously monitor your model's performance.
  • Business analytics. Ensure your ML model is set up to answer important business questions. Relevancy is key. Communicate results to realize the impact of machine learning on your business.
  • Data engineering. Set up an infrastructure to enable data combination. You can even scale models developed by data scientists in big data environments.
The most important thing? Machine learning is the flash and sizzle of data science, but in order for it to really impact your business, you have to start with the endgame in mind. Design your infrastructure. Engage your customer base. And plan for success, because it will come.