Improving Accuracy in Rare Disease Patient Identification Using Pattern Recognition Ensembles

Tim Hare, Sr. Manager, CE-Analytics, Symphony Health Solutions; Pratibha Sharan, Senior Consultant, Commercial Effectiveness Team, Symphony Health Solutions; Ewa J. Kleczyk, PhD, Vice President of Client Analytics, Symphony Health Solutions; Derek Evans, Sr. Vice President & General Manager, Symphony Health Solutions
Finding patients who are appropriate candidates for therapy has always been a primary goal for pharmaceutical marketing teams. However, reaching the right patient at the right time has never been more critical given the strategic shift toward highly specialized, personalized therapies such as immunotherapy, gene therapy, and therapies for rare diseases. Reimbursement for these high-cost therapies typically requires extensive demonstration to payers that the patient meets clinical requirements. For pharmaceutical manufacturers, identifying these patients through traditional data sources alone is no easy task: many rare disease patients go undiagnosed or misdiagnosed for years, many rare diseases lack specific diagnosis codes, access to electronic medical records (EMR) is limited, and the results of lab or genetic testing are not visible in traditional reimbursement claims. However, by coupling the historic claims data of specific patient populations with machine learning techniques, an algorithm can be created to identify "high likelihood" future candidates. In this paper, we review studies in which models were used to predict which patients were potential candidates for orphan drugs treating two very rare diseases (fewer than 5,000 known cases each). In these disease states, patient identification is particularly challenging given the lack of definitive diagnosis codes and symptoms that mimic those of extremely common conditions. The full claims histories of patients receiving the therapy and of randomly selected control patients were used to build, train, and test multiple predictive models (single tree, boosted tree, and bagged random subspace trees, although many other algorithms might be suitable). Each of these predictive models reached high levels of out-of-sample positive predictive value (PPV) in distinguishing target patients from controls. Two ensemble predictive models have been deployed and have identified patients at rates well above disease prevalence.

Keywords: rare disease, patient finding, advanced analytics, predictive modeling, healthcare claims data

Appropriate rare disease patient identification represents a significant opportunity for both clinical and commercial stakeholders. Although, by the US definition, a single rare condition affects fewer than 200,000 patients, it is estimated that 7% of the developed world's population suffers from one of the 7,000 known rare diseases.16 With 25-30 million affected individuals in the United States alone, a physician is likely to encounter at least one rare disease patient in practice. These patients' path to an appropriate diagnosis is long, averaging 7.2 years.16 Minimizing the time to correctly diagnose these patients is critical, particularly for conditions in which therapies are available to limit progression or relieve key symptoms of the disease. Identifying highly likely patients prior to diagnosis can facilitate timely and targeted disease education efforts.

Examination of longitudinal patient-level health claims data,18,19,20 laboratory and EMR data, and socio-demographic information, together with linkage to physician attributes, enables a comprehensive understanding of a patient's medical history: diagnosis, treatment, and testing. There are two primary approaches to identifying potentially undiagnosed rare disease patients: top-down searching18, 21 and bottom-up predictive analytics. In the top-down approach, clinical expertise is leveraged to search for patients exhibiting known rare disease symptoms and presentations. In the bottom-up approach, the pre-diagnosis data of known patients are used to build models that reflect underlying patterns in claims data.8, 17 This approach reduces human bias and accounts for the potentially wide variation in patient presentation. High dimensional machine learning, in which computers learn from a very large number of variables without explicit programming, is well suited to this effort. Further combining these machine learning algorithms into an ensemble, to offset the weaknesses of any one approach, can achieve a level of accuracy well beyond that of any individual ensemble member model. This article reviews currently leveraged machine learning techniques for rare disease patient identification and discusses case studies that demonstrate the value of such an ensemble approach.
Non-Parametric Machine Learning Complements Classical Parametric Statistical Approaches
The field of machine learning grew out of research in computer science,4 and is defined as the development of "computer programs that automatically improve with experience."2 These computer programs adapt during exposure to data, in effect learning from experience.22, 23 Machine learning algorithms also learn which variables are important and can be used with what would otherwise be considered a prohibitively large number of variables, known as high dimensional data,13, 24 without requiring the investigator to choose a subset. Importantly, non-parametric machine learning approaches such as those used here have no fixed functional form prior to exposure to the data, which can reduce bias in the final model.

In contrast, classical statistical modeling relies on human agency in choosing a particular model form that fits the data best, as well as a reasonably small set of variables to be tested during modeling.  The assumption is that the investigator has sufficient knowledge of the data to choose the correct model form and a small subset of variables that are likely to be useful. The model form is static, exists before exposure to the data, and does not change during exposure to the data.  A lack of correspondence between inherent structure in the data and the chosen model form will result in poor modeling results, as will an incorrect or incomplete choice of variables to be tested.   

For this reason, the non-parametric machine learning algorithms used here complement classical statistical approaches by offering additional flexibility as well as the ability to search directly in high dimensional spaces. However, this flexibility carries a greater risk of over-fitting (lower bias at the cost of higher variance, the "bias-variance trade-off"),3 which can reduce out-of-sample performance. Algorithm-specific error reduction methods ("regularization")3 are often used to reduce over-fitting. For rare disease patient identification, however, we propose a different approach: a simplified form of regularization using ensemble agreement.3,5
Evaluation of an Ensemble Approach to Machine Learning
Given patient rarity, large sales territories, and the cost of promotional effort in the rare disease space, it is desirable to reduce false positive (FP) rates as far as possible. Precision, or positive predictive value (PPV = TP/(TP+FP)),1 is strongly affected by class imbalance: when true positives (TP) are rare relative to true negatives (TN), even a modest false positive rate (FPR) produces many false positives for every true positive. For example, if a population has a prevalence of 0.1%, there are 999 TN per 1,000 patients evaluated. If a single model misclassifies 25% of the TN class (i.e., FPR = 25%) and has a false negative rate (FNR) of 0%, screening 1,000 patients yields 1 TP and roughly 250 FP, so precision equals about 1/(1+250) = 0.003984, or ~0.4%, despite the perfect sensitivity. That is, on average only ~4 of every 1,000 patients flagged by the model can be expected to be true patients.
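The precision arithmetic above can be checked with a few lines of Python. This is a minimal sketch; `ppv_from_rates` is a hypothetical helper name, and it uses exact expected counts (0.999 × 0.25 = 249.75 false positives per 1,000 screened) rather than the rounded 250, so it returns 0.003988 instead of 0.003984.

```python
def ppv_from_rates(prevalence: float, fpr: float, fnr: float) -> float:
    """Expected precision, TP / (TP + FP), when screening a population.

    prevalence: fraction of screened patients who truly have the disease
    fpr: fraction of true negatives wrongly flagged (false positive rate)
    fnr: fraction of true positives missed (false negative rate)
    """
    tp = prevalence * (1.0 - fnr)      # expected true positives per patient screened
    fp = (1.0 - prevalence) * fpr      # expected false positives per patient screened
    return tp / (tp + fp)


# Worked example from the text: prevalence 0.1%, FPR 25%, FNR 0%.
ppv = ppv_from_rates(prevalence=0.001, fpr=0.25, fnr=0.0)
print(f"PPV = {ppv:.6f}")  # PPV = 0.003988
```

Working from rates rather than counts makes the dependence on prevalence explicit: holding FPR fixed, halving prevalence roughly halves PPV.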

Going beyond a single best model approach, we can leverage ensembles to compensate for the classification bias of any one algorithm. For example, under the simplifying assumption that each model misclassifies true negatives independently, and given a false positive probability of P = 0.25 for each model, the probability that all 3 models make a false positive error on the same patient is (0.25)^3 = 0.0156 when all 3 models are required to agree. Our hypothetical ensemble-level false positive rate is now 1.56%, which yields a positive predictive value of 1/(1+15.6) = 0.0602, or 6.02%. Model error rates are unlikely to be fully independent, and the precision gains will depend on the data; for illustrative purposes, however, this is a ~15-fold higher ensemble PPV relative to the single-model PPV of 0.4%. This improved precision comes at the cost of a higher false negative rate, because a true patient is flagged only if every model flags it; this trade-off must be considered when using the ensemble approach.
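The agreement rule and its cost can be sketched under the same independence assumption. In this illustration `unanimous_ensemble_rates` and `simulate_unanimous_fpr` are hypothetical names, the Monte Carlo check draws independent errors by construction, and the 10% per-model FNR in the last line is an invented value used only to show the false negative trade-off.

```python
import math
import random


def unanimous_ensemble_rates(fprs, fnrs):
    """Ensemble (FPR, FNR) when a patient is flagged only if ALL models flag it.

    Assumes, as in the text, that the member models err independently.
    """
    fpr = math.prod(fprs)                            # every model must wrongly flag a negative
    fnr = 1.0 - math.prod(1.0 - q for q in fnrs)     # any single miss drops a true positive
    return fpr, fnr


def simulate_unanimous_fpr(fprs, n_negatives=200_000, seed=0):
    """Monte Carlo check of the product rule using independent random errors."""
    rng = random.Random(seed)
    flagged = sum(
        all(rng.random() < p for p in fprs) for _ in range(n_negatives)
    )
    return flagged / n_negatives


prevalence = 0.001
fpr, fnr = unanimous_ensemble_rates([0.25, 0.25, 0.25], [0.0, 0.0, 0.0])
tp = prevalence * (1.0 - fnr)
fp = (1.0 - prevalence) * fpr
print(f"ensemble FPR = {fpr:.6f}, ensemble PPV = {tp / (tp + fp):.4f}")
print(f"simulated ensemble FPR = {simulate_unanimous_fpr([0.25] * 3):.4f}")

# The cost: with a hypothetical 10% FNR per model, unanimity misses ~27% of true patients.
print(f"ensemble FNR = {unanimous_ensemble_rates([0.25] * 3, [0.10] * 3)[1]:.3f}")
```

With real models the errors are positively correlated, so the observed ensemble FPR will sit somewhere between the product and the best single-model rate.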

Algorithms should be selected for the ensemble so that their weaknesses complement one another. In the ensemble case studies described here, Tree, Random Forest, and AdaBoost algorithms were chosen for the reasons outlined below, as well as for their computational efficiency on large, high dimensional data sets.
Tree Algorithms
Tree-based algorithms, such as the RPART algorithm used here,25 partition the feature space (independent variables) into rectangular cuboid regions through a series of binary splits, and assign a constant value to all members of each region. A simple two-dimensional tree (two features, X1 and X2) is shown in Figure 1, along with the corresponding partition of the feature space (from Hastie, Tibshirani, and Friedman).11
Figure 1: Tree Algorithms

Tree models operate on a data set of N observations, where each observation (x_i, y_i), for i = 1, 2, …, N, is associated with P features, such that x_i = (x_{i1}, x_{i2}, …, x_{iP}). Tree algorithms result in a piecewise-constant10,11 functional representation. It can be helpful to consider the final functional form alongside the machine learning approach that produced it. For a tree with M regions R_1, …, R_M, the functional form is:

$$ f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m) $$

where I(·) is the indicator function and c_m is the constant assigned to region R_m.

A single tree can be viewed as composed of many sub-models, called "nodes," with the particular model applied to any input x_i being the rule set associated with the constant c_m for its region R_m. For example, region R_1 in Figure 1 is defined by X1 <= t1 and X2 <= t2, and the value c_1 associated with that region is the result f(x) for all points x_i that satisfy these conditions:

$$ f(x_i) = c_1 \quad \text{for all } x_i \text{ such that } x_{i1} \le t_1 \text{ and } x_{i2} \le t_2 $$
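The piecewise-constant form f(x) = Σ c_m I(x ∈ R_m) can be made concrete with a hypothetical two-feature tree. In this sketch the thresholds T1 to T4 and the region constants are invented for illustration and do not come from Figure 1. Each region is a membership test, and because the five rectangles are disjoint and together cover the plane, exactly one indicator is 1 for any input.

```python
from typing import Callable, List, Sequence, Tuple

# A region is a membership test on a feature vector plus its constant c_m.
Region = Tuple[Callable[[Sequence[float]], bool], float]

T1, T2, T3, T4 = 0.5, 0.3, 0.8, 0.6  # hypothetical split thresholds


def make_regions() -> List[Region]:
    """Hypothetical two-feature tree: five disjoint rectangles covering the plane."""
    return [
        (lambda x: x[0] <= T1 and x[1] <= T2, 0.1),   # R1
        (lambda x: x[0] <= T1 and x[1] > T2, 0.4),    # R2
        (lambda x: T1 < x[0] <= T3, 0.2),             # R3
        (lambda x: x[0] > T3 and x[1] <= T4, 0.7),    # R4
        (lambda x: x[0] > T3 and x[1] > T4, 0.9),     # R5
    ]


def tree_predict(x: Sequence[float], regions: List[Region]) -> float:
    """f(x) = sum_m c_m * I(x in R_m); exactly one indicator is 1 for any x."""
    return sum(c * (1.0 if contains(x) else 0.0) for contains, c in regions)


regions = make_regions()
print(tree_predict([0.2, 0.1], regions))  # falls in R1 -> 0.1
print(tree_predict([0.9, 0.9], regions))  # falls in R5 -> 0.9
```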

Tree-based algorithms can rapidly model large amounts of data (many observations) given their computational efficiency. They also cope well with high dimensional data (many features, even where P >> N), because each split involves only a single feature: the form of the model does not depend on the dimensionality of the data, and the cost of the split search grows only linearly with the number of features.

Error minimization at each step is implemented by defining two sub-regions, parameterized by the splitting variable j and the split point s:

$$ R_1(j, s) = \{X \mid X_j \le s\}, \qquad R_2(j, s) = \{X \mid X_j > s\} $$

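The splitting variable and split point are then chosen by minimizing the sum of the squared errors over the two sub-regions:

$$ \min_{j,\,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right] $$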

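The two inner minimizations are trivial: for any choice of j and s, the squared error in each sub-region is minimized by the average response in that region:

$$ \hat{c}_1 = \operatorname{ave}(y_i \mid x_i \in R_1(j,s)), \qquad \hat{c}_2 = \operatorname{ave}(y_i \mid x_i \in R_2(j,s)) $$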
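A brute-force version of this split search fits in a few lines. This is a minimal sketch, not RPART's implementation: it applies the squared-error criterion to a tiny hypothetical data set, and `best_split` and `sse` are illustrative names. For a 0/1 outcome, the within-node sum of squared errors equals n·p(1 − p), which is proportional to the size-weighted Gini impurity that rpart uses by default for classification, so the two criteria pick the same split.

```python
from typing import List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the inner minimization over c)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float]) -> Optional[Tuple[int, float, float]]:
    """Exhaustive search over splitting variable j and split point s.

    Returns (j, s, total_sse) minimising SSE(R1) + SSE(R2), where
    R1 = {x : x[j] <= s} and R2 = {x : x[j] > s}.
    """
    best = None
    for j in range(len(X[0])):
        # Candidate split points: each distinct observed value except the largest.
        candidates = sorted(set(row[j] for row in X))[:-1]
        for s in candidates:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0], [5.0, 2.0]]
y = [0.0, 0.0, 0.0, 1.0, 1.0]
print(best_split(X, y))  # (0, 3.0, 0.0)
```

Here both features separate the classes perfectly; ties keep the first split found, so feature 0 wins.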

Strengths of the recursive binary tree algorithm include the conditionality of each node, the ability to analyze high dimensional data, and computational speed. Weaknesses include splits based only on the single best variable, sampling bias, and over-fitting.3 Additionally, regions must be rectangular cuboids, so non-rectangular, irregular convex, and concave regions can only be approximated by combinations of many small rectangles. The trade-off between under-fitting and over-fitting can be controlled within an individual tree by empirically selecting good parameters, such as the minimum group size required to split and the maximum number of levels. The onus is on the data scientist to determine optimal settings through a series of experimental runs with checks against out-of-sample (OOS) data (data not used in developing the model).

Ensemble Regularization Using Tree, Random Forest, and Boosted Tree Algorithms
As noted above, individual trees can have high error rates, especially if optimal parameter settings are not explored manually. To reduce these error rates as part of a turn-key ensemble regularization approach, we look for algorithms that are unlikely to make errors in the same way as single trees, while keeping our focus on algorithms that work directly with high dimensional data and offer sufficient computational speed on large data sets. Random Forest primarily reduces variance (random error) by averaging over many trees, each grown on a bootstrap sample of the data, while restricting each split to a random subset of features so that the trees are less correlated with one another. Boosting, unlike the algorithms above, tracks errors at the level of individual records during tree growing: records misclassified by earlier trees receive more weight, so later trees concentrate on them. The motivation for combining these particular approaches as a form of ensemble regularization is the recognition that each algorithm has a distinct bias toward certain types of errors, which can be offset by requiring agreement across the ensemble.
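The record-level reweighting that distinguishes boosting can be sketched as discrete AdaBoost over one-split trees ("stumps"). This is an illustrative pure-Python sketch assuming labels coded as −1/+1 and a toy one-feature data set. It is not the implementation in the R AdaBoost package cited in the references, and the names `fit_stump`, `adaboost`, and `adaboost_predict` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index j, threshold s, polarity +1 or -1)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, s, polarity = stump
    return polarity if x[j] <= s else -polarity


def fit_stump(X, y, w) -> Tuple[Stump, float]:
    """One-split tree minimising the weighted misclassification error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, s, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n                     # per-record weights carry the error history
    model = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)      # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)     # vote weight of this tree
        model.append((alpha, stump))
        # Misclassified records (y * h(x) = -1) are up-weighted, correct ones down-weighted.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, x) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]     # no single split separates these labels
model = adaboost(X, y, n_rounds=10)
accuracy = sum(adaboost_predict(model, x) == yi for x, yi in zip(X, y)) / len(y)
print(f"training accuracy after {len(model)} rounds: {accuracy:.2f}")
```

The weight vector `w` is the "individual record error tracking" described above: after each round it concentrates on the records the ensemble still gets wrong.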
Results and Pertinent Case Studies
The results and case studies below use the following terminology: receiver operating characteristic (ROC) curve, out-of-sample (OOS) testing, confusion matrix, and positive predictive value (PPV). A ROC curve is a plot of the true positive rate against the false positive rate of a classification model; it shows how the two change as the stringency of the classification threshold is varied. Out-of-sample testing uses data not used in training a model to gauge the model's performance. A confusion matrix is a table that describes the performance of a classification model on a set of test data for which the true values are known (Figure 2).
Figure 2: Confusion Matrix

PPV is the number of true positives divided by the sum of true positives and false positives [TP/(TP+FP)]. Also known as precision, it measures the purity of the assigned classes in a classification model.
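The quantities defined above can be computed directly from labels and model scores. This is a minimal sketch using hypothetical labels and scores. `confusion_metrics`, `roc_points`, and `roc_auc` are illustrative names, and the AUC is computed with the trapezoidal rule over the empirical ROC points.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      cutoff: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix counts and rates for the rule 'predict 1 if score >= cutoff'."""
    tp = fp = tn = fn = 0
    for yt, sc in zip(y_true, scores):
        pred = 1 if sc >= cutoff else 0
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 1 and yt == 0:
            fp += 1
        elif pred == 0 and yt == 0:
            tn += 1
        else:
            fn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "PPV": ratio(tp, tp + fp),   # precision
        "TPR": ratio(tp, tp + fn),   # sensitivity / recall
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),   # always equals 1 - TPR
    }


def roc_points(y_true, scores) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the cutoff down through the observed scores."""
    points = []
    for c in sorted(set(scores), reverse=True):
        m = confusion_metrics(y_true, scores, cutoff=c)
        points.append((m["FPR"], m["TPR"]))
    return points


def roc_auc(y_true, scores) -> float:
    """Area under the empirical ROC curve by the trapezoidal rule."""
    pts = [(0.0, 0.0)] + roc_points(y_true, scores)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.95, 0.80, 0.40, 0.30, 0.60, 0.10, 0.20, 0.70]
print(confusion_metrics(labels, scores, cutoff=0.5))
print(roc_auc(labels, scores))  # 0.9375
```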
The goal is to provide the client with high probability leads for clinicians whose patients are likely to be undiagnosed rare disease patients.  Expanding the population of patients on therapy drives revenue growth and also shortens the period of time between first contact with patients and correct diagnosis. 

The ensemble methods outlined above were used to score patients based on secondary data such as prescriber information, diagnostic information, procedures, and socio-demographic information. The particular mixture of base learning agents and ensemble algorithms was motivated by the rare disease patient identification context outlined in the introduction. That is, given budgetary parameters and sales force sizing, clients typically have us carry out ensemble modeling that focuses on FPR reduction, leading to higher PPV at the expense of increased FNR. Put another way, this approach may not be appropriate if one is concerned with capturing most of the patient population rather than an accessible fraction of it. Likewise, a client with a limited sales force allocation may prefer to focus on small volumes of very high probability patients, again at the expense of increased FNR.
Case Study 1: Leveraging Predictive Analytics to Expand Patients on Therapy
In this case, an ensemble of three models, as detailed above, was used to identify untreated patients pre-diagnosis within a rare disorder with very low prevalence worldwide. The models were trained on the available patients identified within the US population, far fewer than 1,000 in total. Given the volume and probability cutoffs desired by the client, the ensemble identified a number of patients as high likelihood untreated patients with the rare disease.

Assuming that all patients were evaluated and all true positives for the disease were given the opportunity to choose therapy, this represents a 172,000-fold improvement over random picking at the native prevalence. In practice, this rate is subject to many factors, such as whether a diagnosis is made, willingness to go on treatment, ability to pay for treatment, insurance status, limited physician detailing, and physician response to the intervention; hence the true rate may be higher.

The overall false positive rates for the individual ensemble member models were 14% (Tree), 4% (Random Forest), and 12% (AdaBoost). Given the volume of patients that must be screened, seemingly small single-model error rates are insufficient when disease prevalence is so low. For example, a 4% false positive rate (choosing the best performing model, Random Forest) applied to 320 million screened patients (~ the United States population) results in 12.8 million false positives. However, under the simplifying assumption of independent model errors noted above, the ensemble-level error rate would be as low as P(false positive for Tree, Random Forest, and AdaBoost) = (0.14)(0.04)(0.12) = 0.000672, or 0.0672%, an ~60-fold (4%/0.0672% = 59.5) improvement relative to the best performing single model in the ensemble.
Figure 3: Case Study 1, Tree Model

Tree Model: ROC plot (AUC = 0.84) (Figure 3)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.78; Sensitivity or True Positive Rate: TPR = 0.84; FPR = 0.19; FNR = 0.14
Figure 4: Case Study 1, Random Forest Model

Random Forest Model: ROC plot (AUC = 0.96) (Figure 4)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.97; Sensitivity or True Positive Rate: TPR = 0.89; FPR = 0.04; FNR = 0.11
Figure 5: Case Study 1, AdaBoost Model

AdaBoost Model: ROC plot (AUC = 0.97) (Figure 5)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.91; Sensitivity or True Positive Rate: TPR = 0.86; FPR = 0.12; FNR = 0.14
Of the ~9,000 variables interrogated, 112 were used in the final ensemble. A subset of the variables selected across the three models (those ranked highly by Random Forest's Mean Decrease in Accuracy) was consistent with the therapeutic area and significant in an asymptotic linear-by-linear association test against controls; these variables and their P-values are shown in Table 1.
Table 1
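For reference, the asymptotic linear-by-linear association test compares an ordered variable between cases and controls using M² = (N − 1)r², where r is the Pearson correlation between the row and column scores, referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes 0/1 scores for control/case and integer scores for the ordered levels. The original analysis software and scoring are not stated, so this textbook form may differ in detail from the computation behind Table 1, and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_by_linear_test(group: Sequence[float], level: Sequence[float]) -> Tuple[float, float]:
    """Asymptotic linear-by-linear association test, M^2 = (N - 1) * r^2.

    group: score per patient for the row variable (e.g. 1 = case, 0 = control)
    level: score per patient for the ordered column variable (e.g. a claim-count bin)
    Returns (M^2, p-value) from the chi-square distribution with 1 df.
    """
    n = len(group)
    mean_g = sum(group) / n
    mean_lv = sum(level) / n
    sxy = sum((g - mean_g) * (lv - mean_lv) for g, lv in zip(group, level))
    sxx = sum((g - mean_g) ** 2 for g in group)
    syy = sum((lv - mean_lv) ** 2 for lv in level)
    r = sxy / math.sqrt(sxx * syy)            # Pearson correlation of the two score vectors
    m2 = (n - 1) * r * r
    p_value = math.erfc(math.sqrt(m2 / 2.0))  # P(chi-square with 1 df > m2)
    return m2, p_value


# Hypothetical data: 5 cases and 5 controls, ordered level 0/1/2 per patient.
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
level = [2, 2, 1, 2, 1, 0, 1, 0, 0, 0]
print(linear_by_linear_test(group, level))
```

The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids any dependency beyond the standard library.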

Case Study 2: Leveraging Predictive Analytics to Expand Patients on Therapy
In this case, an ensemble of three models, as detailed above, was used to identify untreated patients pre-diagnosis within a rare disease population with a prevalence of approximately 1 in a million or lower. The models were trained on the available patients; fewer than 5,000 patients were identified within the US population.

We calculate the fold increase based on all patients who were offered treatment, not just those who actually used it. The rate of patient identification (model rate / prevalence) represents a 257-fold improvement over random picking. In practice, this rate is subject to many factors, such as whether a diagnosis is made, willingness to go on treatment, ability to pay for treatment, insurance status, limited physician detailing, and physician response to the intervention; hence the true rate may be higher. Of the ~1,775 variables interrogated, 10 were used in the final ensemble. Variables found across the three models were consistent with the therapeutic area (data not shown).

The overall false positive rates for the individual ensemble member models were 4% (Tree), 8% (Random Forest), and 4% (AdaBoost). However, again under the simplifying assumption of full model error independence, the ensemble-level error rate would be as low as P(false positive for Tree, Random Forest, and AdaBoost) = (0.04)(0.08)(0.04) = 0.000128, or 0.0128%, a ~313-fold reduction (4%/0.0128% = 312.5) relative to either of the two best performing single models in the ensemble.
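The population-scale arithmetic for both case studies can be reproduced in a few lines. This sketch assumes, as the text does, independent member-model errors and a screened population of 320 million. It also treats every screened patient as a true negative, which changes the false positive counts negligibly at these prevalences; the dictionary layout is illustrative.

```python
import math

POPULATION = 320_000_000  # ~US population, as used in the text

# Single-model false positive rates stated in the text.
case_studies = {
    "Case Study 1": {"Tree": 0.14, "Random Forest": 0.04, "AdaBoost": 0.12},
    "Case Study 2": {"Tree": 0.04, "Random Forest": 0.08, "AdaBoost": 0.04},
}

for name, fprs in case_studies.items():
    ensemble_fpr = math.prod(fprs.values())   # all three must err (independence assumed)
    best_single = min(fprs.values())
    print(f"{name}: best single-model FPR {best_single:.2%} -> "
          f"{best_single * POPULATION:,.0f} expected false positives; "
          f"ensemble FPR {ensemble_fpr:.4%} -> "
          f"{ensemble_fpr * POPULATION:,.0f}; "
          f"{best_single / ensemble_fpr:.1f}-fold reduction")
```

For Case Study 1 this gives 12,800,000 false positives for the best single model versus about 215,040 for the unanimous ensemble.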
Figure 6: Case Study 2, Tree Model

Tree Model: ROC plot (AUC = 0.97) (Figure 6)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.96; Sensitivity or True Positive Rate: TPR = 0.96; FPR = 0.04; FNR = 0.04
Figure 7: Case Study 2, Random Forest Model

Random Forest Model: ROC plot (AUC = 0.99) (Figure 7)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.92; Sensitivity or True Positive Rate: TPR = 0.98; FPR = 0.08; FNR = 0.04
Figure 8: Case Study 2, AdaBoost Model

AdaBoost Model: ROC plot (AUC = 0.99) (Figure 8)
True positive recovery vs. false alarms as a function of model probability score
Positive Predictive Value: PPV = 0.98; Sensitivity or True Positive Rate: TPR = 0.96; FPR = 0.04; FNR = 0.10
Case Study 1 and 2: Results Comparison
Two tunable parameters affect the final patient discovery rates. The shape of the ROC plots above indicates that higher model probabilities are associated with lower false positive rates. In light of this finding, the client preferred to raise each model's probability threshold for classifying a patient from the default of > 0.5 to >= 0.9, and the number of patients found reflects this. Within-patient "ensemble agreement" was also preferred, such that a patient needed to meet this higher threshold in all three models. These two parameter values affect both volume and PPV.
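The two parameters, a per-model probability cutoff and within-patient agreement across all members, combine into a simple decision rule. This is a minimal sketch; `ensemble_flags` and the per-patient scores below are hypothetical.

```python
from typing import Dict, List, Sequence


def ensemble_flags(scores_by_model: Dict[str, Sequence[float]], cutoff: float = 0.9) -> List[bool]:
    """Flag a patient only if every model's probability score is >= cutoff.

    scores_by_model maps a model name to one score per patient, in the same patient order.
    """
    per_patient = zip(*scores_by_model.values())
    return [all(s >= cutoff for s in patient_scores) for patient_scores in per_patient]


scores = {
    "tree":          [0.95, 0.92, 0.40, 0.99],
    "random_forest": [0.91, 0.97, 0.95, 0.93],
    "adaboost":      [0.93, 0.85, 0.96, 0.90],
}
print(ensemble_flags(scores, cutoff=0.9))  # [True, False, False, True]
print(ensemble_flags(scores, cutoff=0.5))  # [True, True, False, True]
```

Any patient flagged at a stricter cutoff is also flagged at a looser one, so raising the cutoff can only shrink the flagged volume. Likewise, at a shared cutoff the ensemble's false positives are a subset of each member model's false positives.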
Table 2

Table 3

We can compare the estimates above with the out-of-sample PPV and FPR observed when ensemble agreement is imposed. For example, in the Case Study 1 ensemble no FP errors were made on the small (N < 100) hold-out sample until the model probability cutoff fell below 0.55, giving a 100% PPV for all cutoffs above 0.55. This provides an opportunity to lower each individual model cutoff well below the level at which a single model would otherwise produce large FP rates. Table 3 shows that only 3,843 false positives would be expected at an ensemble cutoff of >= 0.9.

An important aspect of the ensemble regularization process proposed here is balancing the cost of the false negative rate increases that come from both a high probability cutoff and the ensemble member agreement requirement against the value of matching a volume of high quality candidate patients to available resources. Because any threshold (even one below 0.5) can be used in an ensemble regularization process, volume can be scaled up to fit larger sales force sizes, with the added assurance that, at the same cutoff, the ensemble's FP rate is no higher than that of any single member model. Our Case Study 1 and 2 examples used very stringent cutoffs (>= 0.9); however, the cutoffs could be set at or below 0.5 as part of an ensemble regularization process that targets higher volume, accepting a higher FPR in exchange for a lower FNR.
Case Study 3:  Ensemble Regularization and Model Probability Cutoffs
Because the hold-out samples for the rare disease case studies above were very small, estimates of PPV as a function of cutoff are very sparse. For example, no FP errors were made in the Case Study 1 ensemble until the model probability cutoff fell below 0.55, giving a 100% PPV on a balanced hold-out sample of fewer than 100 patients. To show a robust distribution of ensemble PPV as a function of model probability cutoff, we turn to a third rare disease case, for which a larger balanced hold-out sample (N > 1,000) was available. As shown in Figure 9, ensemble-level precision on the out-of-sample data remains above 90% even when the cutoff for each model is lowered to 70% probability.
Figure 9: OOS Ensemble-Level PPV as Function of Probability Cutoff

A three-model ensemble showing ensemble-level precision (positive predictive value, PPV) on the y-axis vs. the model probability cutoff on the x-axis, where all three models share the same cutoff.
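A curve like Figure 9 can be produced by sweeping a shared cutoff over a labeled hold-out set. The data here are synthetic: a hypothetical balanced hold-out set of 2,000 patients and three models whose scores carry independent Gaussian noise, which overstates the benefit relative to real, correlated models. The printed values are illustrative only and are not the Case Study 3 results; `ensemble_ppv_curve` is a hypothetical name.

```python
import random


def ensemble_ppv_curve(y_true, scores_by_model, cutoffs):
    """(cutoff, flagged volume, ensemble PPV) for each shared cutoff; all models must agree."""
    curve = []
    for c in cutoffs:
        tp = fp = 0
        for i, yt in enumerate(y_true):
            if all(model_scores[i] >= c for model_scores in scores_by_model):
                if yt == 1:
                    tp += 1
                else:
                    fp += 1
        ppv = tp / (tp + fp) if tp + fp else float("nan")
        curve.append((c, tp + fp, ppv))
    return curve


# Hypothetical balanced hold-out set: three noisy models with independent errors.
rng = random.Random(42)
n = 2000
y = [i % 2 for i in range(n)]
models = [
    [min(1.0, max(0.0, 0.3 + 0.4 * yt + rng.gauss(0, 0.2))) for yt in y]
    for _ in range(3)
]

for cutoff, volume, ppv in ensemble_ppv_curve(y, models, [0.5, 0.6, 0.7, 0.8, 0.9]):
    print(f"cutoff >= {cutoff:.1f}: {volume:4d} flagged, ensemble PPV = {ppv:.3f}")
```

Volume falls monotonically as the shared cutoff rises, while PPV stays high across the range; with correlated real models the PPV curve would be expected to sit lower.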
Identifying undiagnosed and untreated rare disease patients8 represents a significant opportunity to shorten time to diagnosis. As noted in the introduction, minimizing the time to correct diagnosis for these patients is critical, particularly for conditions in which there are therapies available to limit the progression or relieve key symptoms of the disease. Identifying highly likely patients prior to diagnosis can facilitate timely and targeted disease education efforts. From a clinical perspective, predictive analytics represents a means to expedite early and accurate diagnosis and care. From a commercial perspective, identifying patients and their associated physicians supports ongoing promotional and educational efforts in a highly targeted fashion.  

A predictive analytics approach to rare disease patient identification is well served by high dimensional13, 14 machine learning.4 In this process, potential human bias is reduced because computers learn without being explicitly programmed and very large numbers of variables are evaluated. Tree, Random Forest, and AdaBoost algorithms are well suited to this task given their computational efficiency and their utility in an ensemble error reduction process.

The ensemble approach to error reduction is particularly appropriate for rare diseases, where true patient rarity, large sales territories, and promotional effort costs result in a low tolerance for false positives (FP). The case studies in this article demonstrate the efficacy of this approach in building predictive model ensembles for two very rare diseases. In each case, high levels of ensemble predictive accuracy were achieved, well beyond those of any of the individual models. Precision, or positive predictive value, is the most useful measure in the rare disease patient identification context because it captures the purity of the pre-diagnosis rare disease patients we identify for clients. In this context, large volumes of false positives will accumulate when screening a large number of patients unless a model has a zero (or near-zero) false positive rate. Through ensembles, false positive rates can be reduced by orders of magnitude below that of any single model, to the extent that member model errors are independent, provided a higher false negative rate can be tolerated. For the very rare diseases modeled here, a prohibitively high false positive rate would have prevented implementation had we not used an ensemble approach.

The choice of ensemble members depends on a variety of factors. The machine learning algorithms used here were chosen to maximize value to the client based on budget, timeline, computational efficiency, and robustness to ultra-high dimensional data. The ability to interrogate the algorithms after modeling, to set the stage for additional explanatory modeling, was also a key factor. A single "greedy" tree (choosing the best variable at each binary split) generates explicit rules that expose all conditionality, and Random Forest generates "Mean Decrease in Accuracy" scores for key variables; both provide information that can be used to choose subsets of variables for final modeling.
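Mean Decrease in Accuracy is a permutation importance: shuffle one variable, re-score, and measure how much accuracy falls. The randomForest package computes it per tree on out-of-bag records. This simplified sketch instead permutes columns of a single held-out set scored by any fitted model. `permutation_importance` and `toy_model` are hypothetical names, and the data are synthetic.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[Sequence[float]], int],
                           X: List[List[float]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly permuted."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == yi for r, yi in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)   # break the link between feature j and the outcome
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model that only uses feature 0.
def toy_model(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [1 if row[0] > 0.5 else 0 for row in X]
print(permutation_importance(toy_model, X, y))
```

Feature 1 scores exactly zero because the model ignores it, while permuting feature 0 drops accuracy by roughly half, which is how such scores can shortlist variables for later explanatory modeling.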

Additional exploratory modeling on subsets of variables using classical statistical approaches such as logistic regression would add value, although project overhead might not justify the additional investigation time and cost. Accordingly, the models chosen here allow analysis to take place directly in ultra-high dimensional space (6,000-15,000 variables), help streamline the workflow in a turn-key fashion, produce less biased models, and shrink the variable space6, 7 for explanatory modeling methods that require low dimensional data sets.

One improvement to this turn-key process that might add value would be to combine patient finding models with additional differentiation models where possible. For example, the Case Study 1 disease belongs to a group of very similar conditions, and developing an additional model to differentiate among these conditions might improve accuracy. Likewise, expanding the ensemble with additional member models built within specific data silos, such as the diagnosis code space or the prescription code space, forces learning and variable selection within a restricted domain. Information from weaker silos might otherwise be lost or under-exploited when all data sources are combined during training, if the strongest variables come from a single silo.

A final point regarding model selection concerns diversity. While all three algorithms use a tree as the base learning agent, their strengths and weaknesses are very different. The complementary nature of these differences makes the final ensemble approach particularly powerful; however, many other algorithm types can and should be evaluated. For example, mixing tree-based algorithms with support vector machines or artificial neural networks might improve model precision.
About the Authors
Tim Hare, Sr. Manager, CE-Analytics, Symphony Health Solutions, is a statistician and computational biologist with 13+ years of robotic high throughput drug discovery experience, and 14+ years of pre-clinical biochemical research experience. Tim was Sr. Scientist in Informatics at a pharmaceutical firm prior to joining the commercial effectiveness team at Symphony. Tim has an extensive publication record and has received 10 industry awards. He has a bachelor’s degree in Biology from Earlham College, and a master’s degree in Applied Statistics from West Chester University of Pennsylvania, where his thesis focused on the application of ensembles of artificial neural networks in the prediction of drug-target interactions.

Pratibha Sharan is a Senior Consultant with the Commercial Effectiveness Team at Symphony Health Solutions in Conshohocken, PA. Pratibha earned her MBA from the University of Pittsburgh and has over 5 years of experience in functions such as Commercial Effectiveness, Marketing Operations, and Market Research across large pharmaceutical companies. She is adept at analyzing different types of healthcare datasets, developing actionable insights, and managing projects. She aims to drive continuous improvement, process efficiency, and business value by using machine learning as a mainstream tool to shape client strategies.

Ewa J. Kleczyk, PhD, is a Vice President of Client Analytics with Symphony Health Solutions in Conshohocken, PA, and an Affiliated Graduate Faculty member of the School of Economics at the University of Maine in Orono, ME. She holds a doctorate in Economics from Virginia Tech. Ewa leverages over 12 years of industry experience and a strong passion for healthcare to develop key insights from healthcare claims data. Going forward, she is excited about the new and expanded role of machine learning in providing more exact and refined optimization of sales and targeting strategies and resources, as well as their deployment timing.

Derek Evans, Sr. Vice President & General Manager, Symphony Health Solutions, is an intuitive business leader with over twenty years of marketing, analytical, strategic and operational experience in the pharmaceutical and consulting industry. In his current position, Derek leads a multi-million dollar health informatics consulting practice, which is focused on helping improve the delivery of healthcare via real world evidence. Under his leadership, his practice has generated breakout growth for Symphony Health. Over the years, Derek has been recognized by Janssen, Pfizer, Daiichi Sankyo and Novartis for leadership on programs that have improved company profitability, service, delivery and product development.
1 Petersen L, Wright S, Normand S, Daley J. (1999), Positive Predictive Value of the Diagnosis of Acute Myocardial Infarction in an Administrative Database. JGIM 14: 555-558.
2 T.M. Mitchell, Machine Learning, McGraw-Hill Publishing, ISBN-13: 978-0071154673
3 G. James, D. Witten, T.J. Hastie, R.J. Tibshirani, An Introduction to Statistical Learning, Springer Publishing, ISBN-13: 978-1-4614-7137-0
4 A.L. Samuel (1959), Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 3:210–229
5 P. Sollich, A. Krogh (1996), Learning with ensembles: How over-fitting can be useful, In D.S.Touretzky, M.C.Mozer, and M.E.Hasselmo, editors, Advances in Neural Information Processing Systems, 8:190-196, MIT Press.
6 R. Genuer, J. Poggi, C. Tuleau-Malot, Variable Selection Using Random Forests, Pattern Recognition Letters 31(14): 2225-2236
7 A. Hapfelmeier, K. Ulm, A New Variable Selection Approach Using Random Forests, Computational Statistics & Data Analysis, 60:50-69
8 M. Khalilia (2011), Predicting Disease Risks from Highly Imbalanced Data Using Random Forest, BMC Medical Informatics and Decision Making, 11(1):51
9 D. Wackerly, W. Mendenhall, R. L. Scheaffer (2008), Mathematical Statistics with Applications, Seventh Edition, ISBN-13: 978-0495110811
10 T.J. Hastie, R.J. Tibshirani (1990), Generalized Additive Models,  First Edition, Chapman and Hall Publishing, ISBN-13: 978-0412343902
11 T.J. Hastie, R.J. Tibshirani, J. Friedman (2016) Elements of Statistical Learning, Second Edition, ISBN-13: 978-0387848570
12 T.K. Ho, (1998), The Random Subspace Method for Constructing Decision Forests, IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.
13 R.E. Bellman (1957), Dynamic programming. Princeton University Press.  ISBN 978-0-691-07951-6., Republished: R.E. Bellman (2003), Dynamic Programming. Courier Dover Publications. ISBN 978-0-486-42809-3.
14 R.E. Bellman:
15 B. Kégl (2013), The return of AdaBoost.MH: multi-class Hamming trees, arXiv:1312.6086 [cs.LG]
16 L. Stokowski, W. Gahl (2015), Pursuing Elusive Diagnoses for Rare Diseases,
17 M.A. Vedomske, D.E. Brown, J.H. Harrison (2013), Random Forests on Ubiquitous Data for Heart Failure 30-Day Readmissions Prediction in Machine Learning and Applications (ICMLA), 12th International Conference on Machine Learning
18 J. Wennberg, N. Roos, L. Sola, A. Schori, R. Jaffe (1987), Use of Claims Data Systems to Evaluate Health Care Outcomes. Mortality and Reoperation Following Prostatectomy. JAMA 257(7): 933-936.
19 Y. Zhao, R. Ellis, A. Ash, D. Calabrese, J. Ayanian, J. Slaughter, L. Weyuker, B. Bowen (2001), Measuring Population Health Risks Using Inpatient Diagnoses and Outpatient Pharmacy Data. Health Services Research, 36(6): 180-193.
20 P. Hougland, J. Nebeker, S. Pickard, M. Tuinen (2008), Using ICD-9-CM Codes in Hospital Claims Data to Detect Adverse Events in Patient Safety Surveillance. In Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 1: Assessment), Rockville (MD): Agency for Healthcare Research and Quality; Publication No.: 08-0034-1
21 D. Bertsimas, D. Czerwinski, M. Kane (2013), Measuring quality in diabetes care: an expert-based statistical approach. SpringerPlus 2:226
22 S. Marsland (2009), Machine Learning: An Algorithmic Perspective.  Chapman & Hall Publishing, First Edition, ISBN-13: 978-1420067187
23 M. Negnevitsky (2005), Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley Publishing, Second Edition.  ISBN: 0321204662
24 N. Meinshausen and P. Buhlmann (2010). Stability Selection. J Royal Statistical Society Series B, 72(4):417–473.
25 R package RPART:
26 R package RandomForest:
27 R package ADABoost: