No c'mon. I skimmed 10 pages of this and got as far as discovering that Parzen Window density functions are a thing, but this isn't my language. Care to mile-high why this is important? 'cuz that's gonna be a much more interesting discussion to me than the raw paper.

There are some learning algorithms where you need to know more math than the average undergraduate just to understand what sort of objects the inputs and outputs are, and in real applications some voodoo often happens that the authors don't really understand either. We've talked about this. What these guys propose is learning a simpler model of the model learned by the complicated algorithm, one with inputs and outputs that are easy to explain (and, usually, to visualize), and for which the relationship between input and output is also relatively simple. They propose that, if the relationship between the inputs and outputs of the second model is close enough to the first *for points near those particular inputs and outputs*, then a visualization, or some other hint at what's going on, of the second model can serve as an explanation for what actually happened. If that really is an acceptable use of "explanation", then we have a way for black-magicky learning algorithms to be at least a little transparent.

That's what I figured; I guess I'm a little flabbergasted, because this sort of thing doesn't seem particularly rare; for example, Netflix, Google Music, Last.fm and everybody else will recommend something to you and then say "based on watching X, Y and Z", which is kind of what we're talking about here, right?

So how is that different from what's going on here? Because in reading what I can understand, it doesn't look like a mile-high "here's how you convince the savages to like you"; it looks deeper than that. So is the magic in the generation of the simpler model?

Recommenders are algorithmically pretty simple (hence so many companies implementing them), and so not the type of algorithm that needs something like this. They're hard to engineer, because you're juggling really big sparse matrices and you need to be a little clever to make them tractable, but algorithmically there's not much to them. Here's a bare-bones but representative recommendation algorithm:

1) Assign some index to each item in your inventory, and one to each of your customers

2) Form a matrix A such that a[i][j] is 1 if user i purchased item j, and 0 otherwise

3) Perform singular value decomposition on A, and keep only the first some-arbitrary-number of singular values. This gives you a lower-dimensional space that approximates the very high-dimensional original space. The intuition here is that purchases of related things will be correlated, so by mapping into the lower-dimensional space produced by the SVD, which combines correlated axes, you're going from particular items to general interests.

4) Find k users nearest to a given user in the lower-dimensional space. Cosine similarity is the favorite way to define nearness. Rank items based on how many of those k purchased them, and recommend either the first some-arbitrary-constant or those with a rank greater than some-other-arbitrary-constant.

You can generate a "based on watching X, Y and Z" explanation from that by looking at the rankings of the items the user themselves purchased.
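Those four steps can be sketched in plain numpy. Everything here is made up for illustration (the tiny purchase matrix, k=2 dimensions, 2 neighbors), but the shape of the algorithm is the one above:

```python
import numpy as np

# Made-up purchase matrix: rows = users, cols = items; A[i, j] = 1 if
# user i bought item j. Users 0-2 cluster on items 0-2, users 3-4 on 2-4.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Step 3: truncated SVD -- keep only the top k singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
user_vecs = U[:, :k] * s[:k]  # users mapped into the k-dim "interest" space

# Step 4: cosine similarity to find the nearest users.
def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0
sims = [(i, cosine(user_vecs[target], user_vecs[i]))
        for i in range(len(A)) if i != target]
neighbors = [i for i, _ in sorted(sims, key=lambda t: -t[1])[:2]]

# Rank items by how many neighbors bought them; skip already-owned items.
scores = A[neighbors].sum(axis=0)
recs = [j for j in np.argsort(-scores)
        if A[target, j] == 0 and scores[j] > 0]
print(recs)  # items recommended to user 0
```

User 0 bought items 0 and 1; their nearest neighbors in the interest space also bought item 2, so that's the recommendation. The "based on X, Y and Z" explanation would come from `scores` restricted to the items user 0 already owns.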

The models that are hard to explain aren't that simple.

Awright. So if I understand you correctly, you just explained (concisely, and in an easy-to-follow format) how a recommendation engine works. What's different here is that the paper is discussing a *justification* engine, which, if I'm following along correctly, is kind of the equivalent of working backwards.

Am I close?

So since "working backwards" is a stupidly-wrong analogy, what makes the "justification engine" so much harder than the recommendation engine, and what's the cleverness here? Because if I'm prepped, I might just understand instead of feeling like my knuckles are hairy.

The justification engine shouldn't be hard, otherwise it's no good for making explanations. What they're doing is giving criteria for what makes a good simple model of a very complicated model, and using that to learn a member of a particular class of simple models (sparse linear classifiers: draw a few lines, classify points based on which side of each line they lie on) to approximate some arbitrary complicated model.
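For concreteness, here's a toy version of the "sparse" part. This isn't the paper's implementation, just L1-regularized logistic regression trained by proximal gradient descent in numpy, on made-up data where only the first two of ten features actually matter; the L1 penalty is what drives most of the weights to exactly zero, which is what makes the model readable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: the label depends only on features 0 and 1 out of 10,
# so a sparse classifier should zero out the other eight weights.
X = rng.normal(size=(200, 10))
y = (2.0 * X[:, 0] - 3.0 * X[:, 1] > 0).astype(float)

w = np.zeros(10)
lr, lam = 0.1, 0.05  # step size and L1 penalty strength
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
    w -= lr * (X.T @ (p - y) / len(y))           # logistic-loss gradient step
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold

print(np.round(w, 2))  # most weights end up exactly zero
```

The two surviving weights have the right signs (positive for feature 0, negative for feature 1), and the zeros are the explanation: "only these features mattered."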

So, say you have a bazillion-layer deep learning model that you're using to classify people as terrorists or not terrorists. No one understands the bazillion-layer model, not even the authors; they just know that it performs well enough on the testing set. You're just asking your users to trust you when you tell them that Little Timmy's kitten is a threat to national security.

Now, you probably couldn't use a simpler model to do the classification, otherwise you would have saved yourself a lot of trouble and a lot of waving your hands at scary guys with crew cuts. But you might be able to approximate it locally with a simpler model, in a way roughly analogous to approximating a complex surface with tangent planes, and then you get the explainability of the simpler model but the accuracy of the complex model.

Then when your algorithm tells the FBI to investigate Little Timmy's kitten, you don't have to shrug and mumble about doing funny things with tensors that have no relation to kittens and terrorists anyone can see, much less explain. You can use the human-understandable approximation to see that Little Timmy has a chemistry hobby, that his parents thought it would be cute to give him some supplies as a present from the cat, and that those chemicals happen to also be useful for making bombs. Then you don't lose the trust of your users when your algorithm does a stupid thing, because they can see that it was stupid in the idiot-savant way computers are stupid, and not because it just doesn't work.

edit: so tldr, the clever thing here is using simple models to approximate a complicated model locally, so you can use the complicated model to give you better classifications and the simpler model to give an explanation of why it gave the classification it did, and you're justified in explaining the complicated model in terms of the simple one because the simpler model is a good *local* approximation of the complicated model.
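The tangent-plane idea fits in a few lines. This isn't the paper's code, just the general recipe: perturb the input near one point, query the black box, and fit a proximity-weighted linear model to the answers. The black box here is a stand-in nonlinear function, and the kernel width and sample count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the inscrutable model: any function you can query.
def black_box(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.5, 1.0])  # the prediction we want explained

# 1) Sample perturbations around x0 and ask the black box about them.
Z = x0 + rng.normal(scale=0.1, size=(500, 2))
f = black_box(Z)

# 2) Weight each sample by its closeness to x0 (Gaussian kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.1 ** 2))

# 3) Weighted least squares: intercept + slopes form the local surrogate.
A = np.column_stack([np.ones(len(Z)), Z])
coef = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * f))

print(coef[1:])  # slopes: how each feature locally drives the output
```

The fitted slopes come out close to the black box's actual gradient at `x0`, so "feature 1 pushed the score up about twice as hard as feature 0" is an honest local explanation, even though the surrogate would be badly wrong far from `x0`.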

Okay. So if I understand you correctly, what's being described here is an algorithm and a process whereby an unknowable AI model can be synthesized down to a knowable AI model, basically by highlighting and relating the big peaks that caused the unknowable AI model to make its prediction in such a way that it's giving a relatable "slice" through the data.

So while the algorithm as presented works on "is this dot black or white" or "is 7 more or less than 10" (I looked up "sparse linear classifier"), the theory would be that this method of computation could eventually lead to "The Weather Channel predicts it's going to rain Tuesday afternoon because this pressure profile has led to rain 30% of the time, there's a wave of humidity sweeping north out of the Gulf, the jet stream is acting weird and there are half again as many sunspots as normal" out of a dataset that includes all of the above plus eleventy dozen other things.

Close?

Appreciate your patience. Statistics was a long time ago...

Yes, that's pretty much it. Sparse linear classifiers aren't as simplistic as your googling led you to believe; look at figure 4, where they carved out the pixels of the image that led to the three classifications they got for it. Their algorithm could also use some other easily comprehensible model instead of sparse linear classifiers; just like with all learning algorithms, you have to decide in advance the sort of model you're going to learn.

Copy copy. Thanks. I saw the dog and his guitar and learned what a superpixel was, but the math was too rigorous for me to follow along without a spotter. Last question: what is it about their approach that's novel, and why hasn't an approach like this been attempted before? "Parzen windows", whatever they are, appear to be like 50 years old, so I have to assume attempts at doing stuff like this have to have been around for as long as AI itself... but again, I'm a plebian.

There have been a lot of symbolic AI programs that could explain themselves, because it's relatively easy to explain what your program is thinking when your program does its thinking by constructing a proof. I'm not aware of many attempts to do it with learning algorithms, and the authors only cite three.

If you're looking at machine learning as modeling a kind of thinking instead of just computational statistics (always a thing to be cautious about), it's modeling the kind of unconscious thinking you have a hard time explaining yourself. How do you know that's your friend standing in the crowd over there? You just recognize them, that's all. How do you walk without falling down? You just do it. How do you interpret a bunch of sounds as words? ...

That's why papers contain conclusions in the end ;)

- In this paper, we argued that trust is crucial for effective human interaction with machine learning systems, and that explaining individual predictions is important in assessing trust. We proposed LIME, a modular and extensible approach to faithfully explain the predictions of any model in an interpretable manner. We also introduced SP-LIME, a method to select representative and non-redundant predictions, providing a global view of the model to users. Our experiments demonstrated that explanations are useful for trust-related tasks: deciding between models, assessing trust, improving untrustworthy models, and getting insights into predictions.

It boils down to finding a way where an AI can assess whether it can trust feedback or not, and fall back on other models intelligently. At least that was my understanding.