Friday, November 24, 2023

How Confident Are We of Machine Learning Model Predictions?

When we build binary classification models using algorithms like Neural Networks, XGBoost, Random Forests, etc., the model outputs a prediction that ranges from 0 to 1. But how sure are we that this prediction is stable? Does a score of 0.8 really mean 0.8? There is a difference between 0.8 +/- 0.05 and 0.8 +/- 0.4, after all!

One reason we love models grounded in statistics is that, because of the strong assumptions they make, we can compute many metrics that tell us how confident we are that the coefficients are correct and what confidence intervals exist for model predictions. For example, see "Calculating Confidence Intervals for Logistic Regression" (https://stats.stackexchange.com/questions/354098/calculating-confidence-intervals-for-a-logistic-regression) or books like "Applied Linear Statistical Models" (https://a.co/d/a7BR3pa).
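
As a small illustration of that statistics-grounded case, here is a minimal sketch of coefficient and prediction confidence intervals for a logistic regression, assuming statsmodels; the synthetic data, the variable names, and the 95% level are my own choices for the example, not anything taken from the references above.

    import numpy as np
    import statsmodels.api as sm

    # Made-up data for illustration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = (rng.uniform(size=500) < 1 / (1 + np.exp(-(0.5 + X @ np.array([1.0, -2.0]))))).astype(int)

    X_design = sm.add_constant(X)
    fit = sm.Logit(y, X_design).fit(disp=0)
    print(fit.conf_int(alpha=0.05))        # 95% confidence intervals for the coefficients

    # 95% interval for a predicted probability: build it on the linear predictor,
    # then push the endpoints through the logistic function.
    x_new = np.array([1.0, 0.2, -0.3])     # includes the constant term
    eta = x_new @ fit.params
    se = np.sqrt(x_new @ fit.cov_params() @ x_new)
    lo, hi = eta - 1.96 * se, eta + 1.96 * se
    print(1 / (1 + np.exp(-lo)), 1 / (1 + np.exp(-hi)))
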

However, for other model types (non-parametric ones, for example), we don't have the benefit of these kinds of measures. Over the past decade or more, when I've needed this kind of information, I've used bootstrap sampling this way (a rough code sketch follows the list):

  1. Build my model. This is the baseline. Each record in the testing data set gets a score.
  2. Create 100 bootstrap samples of the training data.
  3. Build 100 models (one for each bootstrap sample) using the same protocol as the baseline model.
  4. Run the testing set through each model. We now have 100 scores for every record in the test set...a distribution of scores.
  5. Compute the 90% confidence interval equivalent by identifying the probabilities (or model scores) at the 5th and 95th percentiles (ranks 5 and 95 of the 100 scores). For the 95% confidence interval, one would need to interpolate between the 2nd and 3rd ranked scores and between the 97th and 98th ranked scores.
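
Here is a minimal sketch of that procedure, assuming scikit-learn; the random forest classifier, the synthetic data, and the seeds are placeholders for whatever baseline model and training data you actually have.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Placeholder data and baseline model (step 1): one score per test record.
    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    baseline_scores = baseline.predict_proba(X_test)[:, 1]

    # Steps 2-4: 100 bootstrap samples -> 100 models -> 100 scores per test record.
    n_boot = 100
    rng = np.random.default_rng(0)
    boot_scores = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, len(X_train), size=len(X_train))   # sample with replacement
        model = RandomForestClassifier(random_state=b).fit(X_train[idx], y_train[idx])
        boot_scores[b] = model.predict_proba(X_test)[:, 1]

    # Step 5: the 90% interval is the 5th and 95th percentiles of each record's scores.
    lower, upper = np.percentile(boot_scores, [5, 95], axis=0)

Each test record then carries its baseline score plus a (lower, upper) band, which is exactly the 0.8 +/- 0.05 versus 0.8 +/- 0.4 distinction above.
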
This works fine for any algorithm. However, I had never seen a formal treatment of this topic; that is really to my discredit, as I had never done a significant search for any theory related to it.

At this year's Machine Learning Week Europe (https://machinelearningweek.eu/), there was a talk on this subject given by Dr. Michael (Naatz) Allgöwer (https://www.linkedin.com/in/allgoewer/) entitled "CONFORMAL PREDICTION: A UNIVERSAL METHOD FOR UNCERTAINTY QUANTIFICATION" that introduced another way of accomplishing this objective. A Wikipedia summary of the approach is here (https://en.wikipedia.org/wiki/Conformal_prediction). I like what I heard from Dr. Allgöwer at the conference and would like to experiment with this approach to learn how it works and what limitations it might have.
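
To give a flavor of the idea, below is a hedged sketch of split (inductive) conformal prediction for a binary classifier, based on my reading of the Wikipedia summary rather than on the talk itself; the classifier, the nonconformity score, and the 90% coverage target are all assumptions made for the example.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Placeholder data split into training, calibration, and test sets.
    X, y = make_classification(n_samples=3000, random_state=1)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

    model = RandomForestClassifier(random_state=1).fit(X_train, y_train)

    # Nonconformity score: 1 - predicted probability of the true class, on the calibration set.
    calib_proba = model.predict_proba(X_calib)
    nonconformity = 1.0 - calib_proba[np.arange(len(y_calib)), y_calib]

    # Threshold for roughly 90% coverage (alpha = 0.1), with the finite-sample correction.
    alpha = 0.1
    n = len(y_calib)
    q = np.quantile(nonconformity, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

    # Prediction sets: every class whose nonconformity score falls within the threshold.
    # A set can contain one class (confident), both classes, or in principle neither.
    test_proba = model.predict_proba(X_test)
    prediction_sets = (1.0 - test_proba) <= q   # boolean matrix: rows = records, columns = classes

The output here is a prediction set per record rather than an interval around the score, which is one of the differences I want to understand before comparing it with the bootstrap approach.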

I hope to compare the approaches, with pros and cons, in the coming weeks. Stay tuned!



Thursday, November 02, 2023

What if Generative AI Turns out to be a Dud?

I follow posts on Twitter from different sides of the generative AI debates, including Yann LeCun (whom I've followed for decades) and Gary Marcus (whom I discovered just in the past few years). I'll post my own views at some other time, but I found this post by Marcus intriguing. I first published my comments here on LinkedIn.


Key quotes from the end of the article:


"Everybody in industry would probably like you to believe that AGI is imminent. It stokes their narrative of inevitability, and it drives their stock prices and startup valuations. Dario Amodei, CEO of Anthropic, recently projected that we will have AGI in 2-3 years. Demis Hassabis, CEO of Google DeepMind has also made projections of near-term AGI.

I seriously doubt it. We have not one, but many, serious, unsolved problems at the core of generative AI — ranging from their tendency to confabulate (hallucinate) false information, to their inability to reliably interface with external tools like Wolfram Alpha, to the instability from month to month (which makes them poor candidates for engineering use in larger systems)."

This is exactly how it comes across to me, and it is consistent with my own experience and with that of my closest colleagues who have used generative AI.