Technical · December 2024

LIME and Why Interpretability Matters


LIME (Local Interpretable Model-agnostic Explanations) explains individual predictions from any ML model by approximating the model locally with something interpretable — usually a linear model.

The idea: you have a model that says "this is a wolf." LIME asks: which parts of the input, if changed, would change the prediction? It perturbs the input, observes what happens to the output, and fits a simple model to that local relationship.
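That loop can be sketched from scratch. This is an illustrative toy, not the lime library's API: it assumes the input has already been mapped to binary "interpretable features" (e.g., superpixels toggled on/off), and the kernel and defaults are made-up placeholders.

```python
import numpy as np

def lime_explain(model, x, num_samples=1000, kernel_width=0.75, seed=0):
    """Toy LIME for a binary feature representation.

    model: black-box function mapping a 0/1 feature vector to a score.
    x: the instance to explain, as a 0/1 vector (all features present).
    Returns one weight per feature: its local influence on the prediction.
    """
    rng = np.random.default_rng(seed)
    d = len(x)
    # 1. Perturb: randomly switch features off.
    Z = rng.integers(0, 2, size=(num_samples, d))
    Z[0] = 1  # keep the original instance in the sample
    # 2. Query the black box on each perturbed input.
    y = np.array([model(z * x) for z in Z])
    # 3. Weight samples by proximity to the original instance.
    dist = 1.0 - Z.mean(axis=1)            # fraction of features turned off
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear model: solve (Z' W Z) beta = Z' W y.
    Zb = np.hstack([Z, np.ones((num_samples, 1))])  # add intercept column
    A = Zb.T @ (w[:, None] * Zb)
    b = Zb.T @ (w * y)
    beta = np.linalg.solve(A, b)
    return beta[:-1]  # per-feature local weights (intercept dropped)

# Hypothetical "wolf" model that mostly keys on the snow feature.
snow_model = lambda z: 0.05 * z[0] + 0.9 * z[1]   # z = [wolf_shape, snow]
weights = lime_explain(snow_model, np.array([1, 1]))
# weights[1] (snow) dominates weights[0] (wolf shape)
```

The fitted weights are the explanation: here they would show the snow feature carrying almost all of the prediction, which is exactly the signal the wolf example below turns on.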

The result: you can see that the model focused on snow in the background, not the wolf — which means it learned "wolf = snow context," not "wolf = wolf features." The model was right for the wrong reason.

Why this matters: models can learn the right answers from the wrong evidence. Without interpretability, you won't know until deployment exposes the failure mode.
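The failure mode is easy to reproduce in miniature. A hedged toy sketch, with invented data and a deliberately simple one-feature learner (not anything from the paper): every training wolf appears against snow, so a model that latches onto the background looks flawless until deployment breaks the correlation.

```python
# Each example is ((wolf_shape, snow_background), label): 1 = wolf, 0 = not.

def train_stump(data):
    """Pick the single feature whose value best matches the labels."""
    best_i, best_acc = 0, -1.0
    for i in range(2):
        acc = sum(x[i] == y for x, y in data) / len(data)
        if acc > best_acc:
            best_i, best_acc = i, acc
    return best_i

def accuracy(feature, data):
    return sum(x[feature] == y for x, y in data) / len(data)

# Training set: every wolf is in snow; a few blurry wolves lack wolf shape,
# so "snow" is actually the more reliable feature on this data.
train = [((1, 1), 1)] * 35 + [((0, 1), 1)] * 5 + [((0, 0), 0)] * 40
# Deployment: wolves on grass, huskies in snow.
deploy = [((1, 0), 1)] * 10 + [((0, 1), 0)] * 10

learned = train_stump(train)       # picks snow (feature 1)
print(accuracy(learned, train))    # 1.0 -- looks flawless
print(accuracy(learned, deploy))   # 0.0 -- the correlation was the model
```

An i.i.d. test set drawn like `train` would also score perfectly, which is the point: only inspecting *which* feature the model uses reveals the problem before deployment does.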

My IEEE paper used LIME for species identification from images — looking at what features a classifier actually learned to distinguish species, and whether those features made biological sense. In several cases, they didn't. The model had learned artifacts of how the images were captured, not features of the organisms.

The general lesson: a model that scores well on a test set is not the same as a model that learned the right thing. These are different claims. Interpretability is how you check.