“It shouldn’t be able to take an image, slightly tweak the pixels, and completely confuse the network,” he said. “Neural networks blow all previous techniques out of the water in terms of performance, but given the existence of these adversarial examples, it shows we really don’t understand what’s going on.”

    The research has its limitations: for now, the attacker needs to know the inner workings of the algorithm they're trying to fool. However, past attacks have been shown to work on black-box systems, or proprietary algorithms unknown to the attacker. Athalye says the team will pursue that area of research next.



veen:

Clever! Here's the paper by the way and here's the clever part:

    ...prior work has shown adversarial examples’ inability to remain adversarial even under minor perturbations inevitable in any real-world observation (Luo et al., 2016; Lu et al., 2017). To address this issue, we introduce Expectation over Transformation (EOT); the key insight behind EOT is to model such perturbations within the optimization procedure. In particular, rather than optimizing the log-likelihood of a single example, EOT uses a chosen distribution T of transformation functions t taking an input x′ generated by the adversary to the “true” input t(x′) perceived by the classifier.
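
For the curious, here's roughly what that optimization looks like in code. This is only a minimal sketch, assuming a generic PyTorch classifier, with random rotations standing in for the paper's chosen transformation distribution T; the function name and hyperparameters are my own, not the authors'.

    # Minimal EOT sketch (my own reconstruction, not the authors' code).
    # Assumes `model` is a PyTorch classifier mapping images to logits,
    # x is a [1, C, H, W] image in [0, 1], and `target` is a class-index
    # tensor like torch.tensor([281]).
    import torch
    import torch.nn.functional as F
    import torchvision.transforms.functional as TF

    def eot_attack(model, x, target, steps=200, lr=0.01, samples=10, eps=0.1):
        x_adv = x.clone().requires_grad_(True)   # x', the adversarial input
        opt = torch.optim.Adam([x_adv], lr=lr)
        for _ in range(steps):
            loss = 0.0
            for _ in range(samples):
                # Sample t ~ T: here, a random rotation of up to 30 degrees.
                angle = float(torch.empty(1).uniform_(-30.0, 30.0))
                t_x = TF.rotate(x_adv, angle)
                # Minimizing cross-entropy to the target class maximizes the
                # expected log-likelihood of that class under T.
                loss = loss + F.cross_entropy(model(t_x), target)
            opt.zero_grad()
            (loss / samples).backward()
            opt.step()
            with torch.no_grad():  # keep x' within an eps-ball of the original
                x_adv.clamp_(x - eps, x + eps).clamp_(0.0, 1.0)
        return x_adv.detach()

The thing to notice is that the loss is averaged over many sampled transformations every step, so the perturbation that survives is the one that stays adversarial on average across the whole distribution, rather than only for a single pristine view of the image.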

If anyone here wants more explanation than just 'the algorithm is a mystery' and has an hour to spare, I really like 3Blue1Brown's explanation. It is a bit slow-paced but explains the logic and intuition behind ML math without resorting to too many shortcuts. Here are all three videos.

