Speaking as a programmer: there is a lot of misinformation in that article. I have never used a program where being able to access more than 4 GB of physical RAM mattered. All of the reasons below apply even if you only have 256 MB of RAM:
1. Being able to address more than 4 GB of virtual memory (e.g. with mmap), or even just being able to leave holes in the address space without worrying about fragmentation, is a huge win.
2. The 64-bit instruction set is significantly different from the 32-bit one (on x86, at least). First, there are twice as many general-purpose registers (16 instead of 8), so there is much less spilling to the stack. The default C ABI passes the first several arguments in registers instead of passing everything on the stack, so again there is less stack access. And 64-bit processors are guaranteed to include SSE2, the (2x-4x faster) SIMD instructions; since those are not present on all 32-bit processors, generic 32-bit builds will not use them even if your processor supports them.
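To make point 1 concrete, here is a rough Python sketch of mapping a file that is mostly hole and touching only its two ends. The helper name `touch_sparse_mapping` and the constant `POINTER_BITS` are made up for this illustration, and the sketch assumes a Unix-like system whose temp directory sits on a filesystem that supports sparse files.

```python
import mmap
import os
import struct
import tempfile

# Pointer width of the running interpreter: 64 on a 64-bit build, 32 on a 32-bit one.
POINTER_BITS = struct.calcsize("P") * 8


def touch_sparse_mapping(size):
    """Map a temp file of `size` bytes and write only its first and last byte.

    Hypothetical helper for illustration. Returns (first byte, last byte, mapped length).
    """
    fd, path = tempfile.mkstemp()
    try:
        # Extend the file to `size` bytes. On a sparse-file-capable filesystem
        # this creates a hole rather than writing data blocks.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as view:
            # Only the pages we touch need to be backed by physical memory.
            view[0] = 1
            view[size - 1] = 2
            return view[0], view[size - 1], len(view)
    finally:
        os.close(fd)
        os.unlink(path)
```

On a typical 64-bit setup, something like `touch_sparse_mapping(8 * 2**30)` should go through even on a machine with very little RAM, because only two pages are ever touched. In a 32-bit process the same call is expected to fail, since an 8 GB mapping cannot fit in a 4 GB address space no matter how much physical memory is installed.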