Data Skeptic

Paxos is a protocol for arriving at consensus in a distributed computing system that accounts for the unreliability of the nodes.  We discuss how this might be used in the real world in the event of a massive disaster.
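For readers who want to see the idea in code, here is a minimal in-memory sketch of single-decree Paxos. It is an illustration under simplifying assumptions, not a production implementation: the `Acceptor` class, the `propose` function, and the `up` flag (which stands in for a crashed or unreachable node) are all hypothetical names, and there is no networking, persistence, or retry logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    up: bool = True                              # False simulates an unreachable node
    promised: int = -1                           # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None   # (number, value) last accepted

    def prepare(self, n: int):
        """Phase 1: promise to ignore proposals numbered below n."""
        if not self.up or n <= self.promised:
            return None
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was already made."""
        if not self.up or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prior = [acc for _, acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda pair: pair[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


nodes = [Acceptor() for _ in range(5)]
nodes[0].up = False
nodes[1].up = False                              # two of five nodes are down
first = propose(nodes, 1, "evacuate north")
second = propose(nodes, 2, "evacuate south")     # a later, competing proposer
```

With three of five nodes reachable, the first proposal is chosen, and the later proposer ends up re-proposing that same value rather than overwriting it. That is the safety property the protocol is designed to provide.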

Direct download: paxos.mp3
Category:general -- posted at: 8:00am PDT

Machine learning models are often criticized for being black boxes. If a human cannot determine why the model arrives at the decision it made, there's good cause for skepticism. Classic inspection approaches to model interpretability are only useful for simple models, which are likely to only cover simple problems.

The LIME project seeks to help us trust machine learning models. At a high level, it takes advantage of local fidelity. For a given example, a separate, simpler model trained on neighbors of the example is likely to reveal which features matter in that local region of the input space, and therefore why the model arrives at its conclusion.

In this episode, Marco Tulio Ribeiro joins us to discuss how LIME (Local Interpretable Model-Agnostic Explanations) can help users trust machine learning models. The accompanying paper is titled "Why Should I Trust You?": Explaining the Predictions of Any Classifier.
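The local-fidelity idea can be sketched in a few lines: perturb the example, weight the perturbed points by how close they stay to it, and fit a weighted linear model to the black box's outputs. The code below is a simplified illustration, not the LIME library's API. The `black_box` function, the `explain_locally` helper, and the `scale` and `width` parameters are all assumptions made for this example, and details from the actual method, such as interpretable representations and sparsity, are omitted.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model, nonlinear in both features."""
    return math.sin(x[0]) + x[1] ** 2


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def explain_locally(model, x0, n_samples=2000, scale=0.3, width=0.3, seed=0):
    """Fit a proximity-weighted linear model around x0; return one slope per feature."""
    rng = random.Random(seed)
    d = len(x0)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0, scale) for _ in range(d)]        # neighbor offset
        y = model([a + e for a, e in zip(x0, delta)])          # query the black box
        weight = math.exp(-sum(e * e for e in delta) / width ** 2)
        row = [1.0] + delta                                    # intercept + offsets
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve(xtwx, xtwy)[1:]                               # drop the intercept


slopes = explain_locally(black_box, [0.0, 1.0])
```

Near the point (0, 1) the fitted slopes come out close to 1 and 2, the local gradient of the black box, which suggests that the second feature has about twice the local influence of the first.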

Direct download: trust-in-ml.mp3
Category:general -- posted at: 8:00am PDT

Analysis of variance is a method used to evaluate differences between two or more groups.  It works by breaking down the total variance of the system into the between-group variance and the within-group variance.  We discuss this method in the context of wait times getting coffee at Starbucks.
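The decomposition described above can be worked through by hand. The sketch below computes the between-group and within-group sums of squares and the resulting F statistic for a one-way ANOVA. The wait times are made-up numbers for three hypothetical stores, not data from the episode.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Hypothetical wait times in minutes at three stores.
wait_times = [[3, 4, 5], [5, 6, 7], [7, 8, 9]]
ss_between, ss_within, f_statistic = one_way_anova(wait_times)
```

Here the between-group sum of squares is 24, the within-group sum of squares is 6, and the two add up to the total of 30. The F statistic is (24 / 2) / (6 / 6) = 12; a large value like this indicates that the group means differ by more than the spread within each group would explain.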

Direct download: anova.mp3
Category:general -- posted at: 8:00am PDT

When humans describe images, they have a reporting bias, in that they report only what they consider important. Thus, in addition to considering whether something is present in an image, one should consider whether it is also relevant to the image before labeling it.

Ishan Misra joins us this week to discuss his recent paper Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels, which explores a novel architecture for learning to distinguish presence and relevance. This work enables web-scale datasets to be useful for training, not just well-groomed, hand-labeled corpora.

Direct download: ishan.mp3
Category:general -- posted at: 8:00am PDT

Survival analysis techniques are useful for studying the longevity of groups of elements or individuals, taking into account time considerations and right censoring. This episode explores how survival analysis can describe marriages, in particular, using the semi-parametric Cox proportional hazards model.

This episode discusses some good summaries of survey data on marriage and divorce which can be found here.

The Python lifelines library is a good place to get started for people who want to do some hands-on work.
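To see how right censoring enters the arithmetic, here is a small standard-library sketch of the Kaplan-Meier estimator. Note that this is a simpler technique than the Cox model discussed in the episode, and it does not use lifelines. The `kaplan_meier` function and the marriage durations are made up for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: time until the event or until the subject left the study.
    observed:  True if the event (here, divorce) was seen, False if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    for t in event_times:
        # Censored subjects still count as "at risk" up to their last known time.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical marriages: years observed, and whether a divorce was recorded.
years = [2, 5, 5, 8, 10, 10]
divorced = [True, True, False, True, False, False]
curve = kaplan_meier(years, divorced)
```

The estimated probability of a marriage surviving past year 2 is 5/6, past year 5 is 2/3, and past year 8 is 4/9. The still-married couples never count as events, but they do stay in the denominator for as long as they were observed, which is how censored records contribute information.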

Direct download: survival-analysis.mp3
Category:general -- posted at: 8:00am PDT

This week is an insightful discussion with Claudia Perlich about some situations in machine learning where models can be built, perhaps by well-intentioned practitioners, to appear to be highly predictive despite being trained on random data. Our discussion covers some novel observations about ROC and AUC, as well as an informative discussion of leakage.

Much of our discussion is inspired by two excellent papers Claudia authored: Leakage in Data Mining: Formulation, Detection, and Avoidance and On Cross Validation and Stacking: Building Seemingly Predictive Models on Random Data. Both are highly recommended reading!

Direct download: Predictive_Models_on_Random_Data.mp3
Category:general -- posted at: 8:00am PDT

An ROC curve is a plot that shows the trade-off between the true positive rate and the false positive rate of a binary classifier under different thresholds. The area under the curve (AUC) is useful in determining how discriminating a model is. Together, ROC and AUC are very useful diagnostics for understanding the power of one's model and how to tune it.
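As a concrete illustration, the sketch below sweeps a threshold over a handful of made-up scores, records the false positive rate and true positive rate at each step, and integrates the resulting curve with the trapezoid rule. The function names and the data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Integrate the curve with the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier output: 1 = positive class, scores are model confidences.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = area_under_curve(points)
```

For this data the AUC is 8/9, which matches another reading of the same number: of the nine possible (positive, negative) pairs, the model scores the positive higher in eight.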

Direct download: roc-auc.mp3
Category:general -- posted at: 8:00am PDT

I'm joined by Chris Stucchio this week to discuss how deliberate or uninformed statistical practitioners can derive spurious and arbitrary results via multiple comparisons. We discuss p-hacking and a variety of other important lessons and tips for proper analysis.

You can enjoy Chris's writing on his blog, and you may also like his recent talk Multiple Comparisons: Make Your Boss Happy with False Positives, Guaranteed.
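A little arithmetic shows why multiple comparisons produce spurious results so easily. The sketch below assumes the tests are independent and that every null hypothesis is true; it computes the chance of at least one false positive, then shows the effect of a Bonferroni correction. The function name is made up for this example.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests


alpha = 0.05
num_tests = 20

# Testing 20 hypotheses at the usual 0.05 level, with no real effects anywhere.
uncorrected = familywise_error_rate(alpha, num_tests)

# Bonferroni correction: require each individual test to pass at alpha / num_tests.
corrected = familywise_error_rate(alpha / num_tests, num_tests)
```

With 20 tests the uncorrected chance of at least one "significant" result is roughly 64%, even though nothing real is going on. The Bonferroni threshold of 0.0025 per test brings that back under 5%, at the cost of reduced power to detect real effects.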

Direct download: multiple-comparisons.mp3
Category:general -- posted at: 8:00am PDT

If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. For those without access to time travel technology, we need to avoid including information about the future in our training data when building machine learning models. Similarly, any other feature whose value would not actually be available in practice at the time you'd want to use the model to make a prediction is a feature that can introduce leakage into your model.
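Two simple habits guard against the kinds of leakage described above: drop any feature that would not yet exist at prediction time, and split training and test data by time rather than at random. The sketch below illustrates both. The churn scenario, feature names, dates, and helper functions are all hypothetical.

```python
from datetime import date

# Hypothetical catalog: the date each feature's value first becomes known for a
# customer whose churn we want to predict on 2016-06-01.
feature_known_on = {
    "tenure_months": date(2016, 5, 31),
    "support_tickets_last_90d": date(2016, 5, 31),
    "cancellation_reason": date(2016, 7, 15),    # recorded only after the outcome
    "final_invoice_amount": date(2016, 7, 20),   # also information from the future
}


def usable_features(known_on, prediction_date):
    """Keep only features whose values exist on or before the prediction date."""
    return sorted(name for name, day in known_on.items() if day <= prediction_date)


def temporal_split(rows, cutoff):
    """Train strictly on the past and evaluate on the future, never the reverse."""
    train = [r for r in rows if r["day"] < cutoff]
    test = [r for r in rows if r["day"] >= cutoff]
    return train, test


safe = usable_features(feature_known_on, date(2016, 6, 1))

# Hypothetical monthly records for January through August 2016.
rows = [{"day": date(2016, month, 1), "churned": month % 2 == 0}
        for month in range(1, 9)]
train, test = temporal_split(rows, date(2016, 6, 1))
```

Only the two features known before the prediction date survive the filter, and the model would be trained on January through May and evaluated on June through August.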

Direct download: leakage.mp3
Category:general -- posted at: 8:00am PDT

Kristian Lum (@KLdivergence) joins me this week to discuss her work at @hrdag on predictive policing. We also discuss Multiple Systems Estimation, a technique for inferring statistical information about a population from separate sources of observation.

If you enjoy this discussion, check out the panel Tyranny of the Algorithm? Predictive Analytics & Human Rights, which was mentioned in the episode.
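The simplest case of Multiple Systems Estimation uses just two lists and is known as the Lincoln-Petersen (capture-recapture) estimator. The sketch below is an illustration of that two-list case only, not the method used in Kristian's work. It assumes the two sources record individuals independently, the population does not change between them, and records can be matched perfectly. The identifiers are made up.

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from two overlapping lists of identifiers."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("the lists share no records, so no estimate is possible")
    # If list B captured overlap/len(a) of list A, assume it captured the same
    # fraction of the whole population: len(b) = N * overlap / len(a).
    return len(a) * len(b) / overlap


# Hypothetical identifiers of individuals documented by two separate sources.
source_a = [1, 2, 3, 4, 5, 6]
source_b = [4, 5, 6, 7, 8, 9]

documented = len(set(source_a) | set(source_b))
estimated_total = lincoln_petersen(source_a, source_b)
```

The two sources together document 9 distinct individuals, but because only half of each list appears on the other, the estimate for the full population is 12. This suggests that about three individuals were missed by both sources.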

Direct download: predictive-policing.mp3
Category:general -- posted at: 8:00am PDT