Assuming you don't need the extra address space, what are those? End-user improv...

Scaevolus · on Feb 1, 2017

Assuming the most performance-intensive task most users do is browsing the web, it's probably a win. WebKit's garbage collector is conservative (it scans all memory looking for possible pointers), so it runs more efficiently on 64-bit architectures, where valid pointers are unlikely to occur at random. Lots of JIT tricks like tagged pointers really work best on 64-bit architectures. https://webkit.org/blog/7122/introducing-riptide-webkits-ret...
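To make the conservative-scanning point concrete, here is a minimal sketch in C. It is not WebKit's code: the function names and the single contiguous heap range are assumptions made for illustration. A scanner of this kind treats any word whose value lands inside the heap range as a possible pointer, so the smaller the heap is relative to the range of values a word can hold, the less likely an ordinary integer is mistaken for one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a conservative scanner cannot tell integers from
   pointers, so any word inside [heap_lo, heap_hi) counts as a candidate. */
static int looks_like_heap_pointer(uintptr_t word,
                                   uintptr_t heap_lo, uintptr_t heap_hi)
{
    return word >= heap_lo && word < heap_hi;
}

/* Count the words in a scanned region that would be treated as pointers
   and would therefore keep whatever they "point" at alive. */
static size_t count_candidates(const uintptr_t *words, size_t n,
                               uintptr_t heap_lo, uintptr_t heap_hi)
{
    assert(heap_lo <= heap_hi);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)looks_like_heap_pointer(words[i], heap_lo, heap_hi);
    return hits;
}
```

With 32-bit words, a heap of a few hundred megabytes covers a noticeable fraction of all possible word values; with 64-bit words the same heap covers a vanishingly small fraction, so false candidates become rare.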

It's not just extra registers and address space. ARM64 is a large-scale restructuring to better enable modern processor design, including things like removing barrel shifting on every instruction (expensive and not very useful) and removing predication on every instruction (expensive and makes OoO machinery slower). http://stackoverflow.com/a/26841196

comex · on Feb 1, 2017

On x86, it's just the extra registers, and indeed 32-bit can be faster. But AArch64 is essentially a completely different ISA from AArch32. Here's an old benchmark of Apple's first 64-bit CPU that shows a significant improvement in 64-bit versus 32-bit mode on the same device:

http://www.anandtech.com/show/7335/the-iphone-5s-review/4

Of course, this may be an idiosyncrasy of that processor implementation, or it may not generalize well to real applications which tend to stress the instruction cache more...

johncolanduoni · on Feb 1, 2017

Being able to rely on at least SSE2 is a big win for 64-bit on x86.

Technically you could use 32-bit addresses on a 64-bit processor and not have your data be any bigger, but most languages don't offer a good way to do this (IIRC the HotSpot JVM's UseCompressedOops option does something along these lines).
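As a rough illustration of the idea (not HotSpot's implementation), a 32-bit reference can be stored as a scaled offset from a heap base. The names `compressed_ref`, `compress`, and `decompress` are made up for this sketch, and it assumes every object is 8-byte aligned and that the heap fits in 32 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit "compressed" reference: an offset from the heap base
   counted in 8-byte units, so 2^32 * 8 bytes = 32 GiB are addressable. */
typedef uint32_t compressed_ref;

enum { REF_SHIFT = 3 };  /* assumption: objects are 8-byte aligned */

static compressed_ref compress(uintptr_t heap_base, uintptr_t addr)
{
    assert(addr >= heap_base);
    uintptr_t offset = addr - heap_base;
    assert((offset & 7u) == 0);                    /* alignment assumption */
    assert((offset >> REF_SHIFT) <= UINT32_MAX);   /* must fit in 32 bits */
    return (compressed_ref)(offset >> REF_SHIFT);
}

static uintptr_t decompress(uintptr_t heap_base, compressed_ref ref)
{
    return heap_base + ((uintptr_t)ref << REF_SHIFT);
}
```

The data structures stay the size they were with 32-bit pointers, at the cost of a shift and an add on every dereference. This sketch leaves out null handling.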

andreyv · on Feb 1, 2017

This is actually implemented on Linux/glibc: https://en.wikipedia.org/wiki/X32_ABI

You can compile C/C++ programs using this ABI, and get all 64-bit benefits without 64-bit pointers.

ktta · on Feb 1, 2017

> higher cache miss rate from the larger code and data sizes.

Can you explain how these are related? What do you mean by "larger code"? I'm assuming you're talking about the binaries, rather than actual code.

I'd agree with you on the large data size but I really don't think 32->64 bit increase will make much difference overall unless someone is using a lot of existing numerical data. Most of the data is taken up by multimedia resources and this won't really be affected by the architecture.

More registers are always good, unless you have a terrible compiler. And since Apple uses LLVM a lot and actually takes care of the compiler backend, I don't really see caching problems. AFAIK there will be more registers to store data in, so caches will actually be used less than usual.

berkut · on Feb 1, 2017

Pointer size doubles, so any data structure (linked list, tree, etc, etc) which uses raw pointers directly will need more memory to store the same items.

The normal way of negating this increase is to replace the pointers with uint32_t offsets into a master array of the items the pointers represent, which reduces the size again and has the side benefit of possibly making the items more coherent (localised) in memory.
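A minimal sketch of that technique in C, with hypothetical names (`ptr_node`, `idx_node`, `NIL`, `sum_list`): the list links are 32-bit indices into one array instead of raw pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NIL UINT32_MAX  /* hypothetical sentinel meaning "no next node" */

/* Pointer-linked node: typically 16 bytes on a 64-bit target, because the
   8-byte pointer also forces 8-byte alignment. */
struct ptr_node { int32_t value; struct ptr_node *next; };

/* Index-linked node: 8 bytes regardless of pointer width. */
struct idx_node { int32_t value; uint32_t next; };

/* Sum a list stored in a master array ("pool"), following 32-bit indices. */
static int64_t sum_list(const struct idx_node *pool, uint32_t head)
{
    int64_t total = 0;
    for (uint32_t i = head; i != NIL; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```

Because all nodes live in one array, they also tend to sit close together in memory, which is the locality benefit mentioned above. The trade-off is a cap of about 2^32 items per pool and an extra base-plus-index calculation on each access.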

seepel · on Feb 1, 2017

I believe there are actually some cool things they do with the extra space in the Objective-C runtime. It's been a while, so I may be mistaken, but I think they store the reference count in the pointer to avoid a lookup in a separate table. There might also be some trickery in storing NSNumber values in the pointer itself.
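For readers unfamiliar with the trick, here is a generic tagged-value sketch in C. It is not Apple's actual bit layout, and all names are hypothetical. It assumes real pointers are at least 2-byte aligned, so their lowest bit is always 0 and can be used to mark a small number stored inline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheme: low bit 1 = inline unsigned integer in the upper
   bits, low bit 0 = ordinary pointer to a heap object. */
typedef uintptr_t tagged_ref;

static tagged_ref make_small_uint(uintptr_t value)
{
    assert(value <= (UINTPTR_MAX >> 1));  /* one bit is spent on the tag */
    return (value << 1) | 1u;
}

static tagged_ref from_pointer(void *p)
{
    assert(((uintptr_t)p & 1u) == 0);     /* alignment assumption */
    return (uintptr_t)p;
}

static int is_inline_value(tagged_ref ref)
{
    return (ref & 1u) != 0;
}

static uintptr_t small_uint_value(tagged_ref ref)
{
    assert(is_inline_value(ref));
    return ref >> 1;
}
```

A number stored this way needs no allocation and no memory access to read, which is where the saving comes from. On a 64-bit target there are 63 payload bits available, compared with 31 on a 32-bit one.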

EDIT: Grammar

saurik · on Feb 1, 2017

As the 64-bit CPUs used on the iPhone actually only have 33 bits of address space (undermining the benefits of having more address space ;P), Apple does use the extra space in a pointer to store "tagged" values in some cases, but for the most part this can be seen as a "mitigation": as the size of a pointer is now 64 bits, and as the alignment of a lot of things is now 64 bits because of that, processes are larger and heavier than on 32-bit systems, and thereby require more memory bandwidth. Sharing some of the bits of a pointer is essentially required to win back this lost performance. (Another great option is a 32-bit ABI using the 64-bit instruction set, as with x32 on Intel, but Apple did not go that direction.)

pjmlp · on Feb 1, 2017

Improved security with care for performance, little things like SGX, MPX, IO and GPU DMA in hypervisors, for example.

saurik · on Feb 1, 2017

How about coming up with an example for ARM, one that doesn't come from using the new chipset but which requires the new instruction set (and particularly one which somehow precludes supporting older applications)? AFAIK, ARM64 maybe offers some minor benefits to floating point operations, but mostly just provides access to some more registers... an advantage which sometimes matters but is often so dubious that on 32-bit ARM a lot of performance-oriented code would be compiled to Thumb-1 even though it had half as many registers, as the code would load faster and take up less space in the cache.

pjmlp · on Feb 1, 2017

Parent stated:

> In practice only two architectures matter: x86 and ARM.

So I replied about x86, because that is the architecture I know best.