Big-Endian “is effectively dead”

bandrami · on April 28, 2015

ARM still lets you switch endianness, but nobody other than me ever seems to (and they claim they will probably deprecate that going forward).

What is really alarming to me is that I occasionally run into middle-endian systems on 64-bit chips (two little-endian doubles in big-endian relative order, to signify a single quad). This is an abomination and must be killed with fire.
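
To make that layout concrete, here is a minimal Python sketch of one reading of the description: a 64-bit value split into two 32-bit halves, each half stored little-endian, with the high half placed first. The function name and the 32/64-bit interpretation of "double" and "quad" are assumptions for illustration, not taken from any particular chip.

```python
import struct

def pack_middle_endian64(value: int) -> bytes:
    """Pack a 64-bit unsigned value as two little-endian 32-bit words, high word first."""
    high = (value >> 32) & 0xFFFFFFFF
    low = value & 0xFFFFFFFF
    return struct.pack("<II", high, low)

value = 0x0102030405060708
middle = pack_middle_endian64(value)   # hypothetical mixed layout
little = struct.pack("<Q", value)      # plain little-endian
big = struct.pack(">Q", value)         # plain big-endian

print("middle:", middle.hex())  # 0403020108070605
print("little:", little.hex())  # 0807060504030201
print("big:   ", big.hex())     # 0102030405060708
```

The mixed layout matches neither of the two plain orders, which is why code that only handles "little" and "big" gets it wrong.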

_wldu · on April 28, 2015

They do. They are bi-endian and default to little on most systems I have seen. The only real big endian system I have left is an old sun Netra.

panzi · on April 28, 2015

Well, the XBox 360 was big endian. Not very recent but newer than Sun Netra.

tritium · on April 28, 2015

And it's funny you should mention Sun Microsystems, because this is the influence behind Java's endian-ness, and subsequently Dalvik (mentioned below in another comment), which happens to be... you guessed it... big endian.

So the big endian gene lives on in the JVM and its relatives.

gradstudent · on April 28, 2015

Java is seriously nasty if you need to work with bitpacked representations of data structures. Big endian nonsense, no unsigned types, booleans that cannot be converted to integer types, no typedefs... gahhhh! It's really freaking hard to write optimised code in this stupid language.

Alupis · on April 28, 2015

> no unsigned types

Java 8 has unsigned types[1]

> boolean that cannot be converted into integer types

Don't know why you would want to do that when a boolean is a primitive type... and also has an object variant, but...

int myInt = (myBoolean) ? 1 : 0;

> no typedefs

A typedef is really like a java bean or class object... it's just a custom data structure.

> It's really freaking hard to write optimised code in this stupid language.

Not true, some of the most highly performant systems on the planet run Java... HFT, stock exchanges, banking, nuclear plant control systems, etc...

There's also the JVM with its optimizing compiler... one of the best (the best?) optimizing compilers around. Long-running Java applications eventually compile hot paths down to native machine instructions, achieving C performance without a lot of the hassle.

But then again, Java wasn't intended to do bit-twiddling, it's a higher abstraction.

[1] https://docs.oracle.com/javase/8/docs/api/java/lang/Integer....

TheLoneWolfling · on April 29, 2015

> Java 8 has unsigned types

Right. Now try that with a long, and you're replacing every piece of code with something that's, what, 20x more verbose? If not more? And because of how bad the JVM is, substantially slower.

Look at JGit.

> int myInt = (myBoolean) ? 1 : 0;

Now try to do that throughout the code. Not to mention which, that's a conditional branch for what should be (and is in bytecode) a no-op. (Well, assuming your bytecode is well-conditioned. But the JVM being the JVM, there's no such thing as a boolean, which means you can have the "boolean" value 2, for instance, which seriously messes all sorts of things up.)

Sure, the JVM is supposed to optimize that out. But you can't rely on it being able to do so. At least not if you're not going to use a mainstream JVM.

And no, the JVM is not "one of the best" optimizing compilers around. It's not even a good optimizing compiler. Nowhere near. For a quick counterexample, this: https://github.com/RS485/LogisticsPipes/commit/bb8a57665c4f8...

Halving the time taken. Why? Because the JVM wasn't smart enough to realize that copying an EnumSet for a readonly foreach loop was unnecessary. Oh, and there's more low-hanging fruit there w.r.t. the amount of work (read: reflection) EnumSet has to do behind the scenes to work around Java's type system. Simple. But no, the Java compiler doesn't optimize it out, and the JVM doesn't either.

And that's with adding an additional unnecessary layer of abstraction (unmodifiableSet) that you wouldn't need if Java had a sane type system.

Copying an EnumSet of a small number of elements would be, in a sane language, quite literally just a register move. Ditto, containsAll a bitwise-not and a bitwise-and. But no, "one of the best" optimizing compilers around cannot even do that.
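
The "register move" point is easier to see with a set stored as a bitmask. A minimal Python sketch, assuming a hypothetical four-member enum with one bit per member; it illustrates the technique being described, not how `EnumSet` is actually implemented.

```python
# Hypothetical enum members, one bit each.
NORTH, SOUTH, EAST, WEST = 1, 2, 4, 8

def contains_all(container: int, wanted: int) -> bool:
    """True if every bit set in `wanted` is also set in `container`.

    This is the bitwise-not plus bitwise-and described above.
    """
    return (wanted & ~container) == 0

have = NORTH | EAST | WEST

# "Copying" the set is just copying one integer.
snapshot = have

print(contains_all(snapshot, NORTH | WEST))  # True
print(contains_all(snapshot, SOUTH))         # False
```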

Trying to write high-performance code in Java generally requires tossing out all the supposed advantages of Java. Write your own object pools, whee! Hardcode your own primitive types, whee! Avoid temporary objects, whee! Avoid using polymorphism, whee! Manually unpack arrays inside objects, whee!

mreiland · on April 29, 2015

My experience with a lot of 'Javaheads' is that they get a little too emphatic about Java's optimizations, most especially the claim that it can out perform C.

BUT

In this case, to be fair, I took his statement to include Hotspot, which can do some pretty cool stuff provided you meet the requirements for it, i.e., be long running, have enough memory and horsepower available on the machine to run hotspot, and have paths through the code that rarely jump around (meaning most executions go through the same path).

If you can meet those requirements, my understanding is that the Java ecosystem does a damned fine job.

The issue is that a lot of javaheads will extrapolate that out to the rest of the language and tech and start making claims about Java being the best overall at X, or as fast as language Y (C or C++, take your pick).

TheLoneWolfling · on April 29, 2015

This was something that was supposed to be about the single best case for hotspot: a single code path almost always, inside a single while loop that's doing the same thing over and over, with a couple sanity / bail-out checks that are rarely (if ever) called.

And it still didn't optimize something as trivial as avoiding an unnecessary copy that it was taking ~50% of the time doing.

It's too bad - Java is a fine language in many ways (though it tends to be rather overly verbose for no good reason, but meh. Looking at you getters and setters and lack of operator overloading), but it's saddled with a reliance on the arcane to actually get non-hideous performance out of it. I mean: 13ms per copy of what should be a single integer? (Milliseconds! I'm not joking. 595ms inside 47 calls to EnumSet.copyOf (mainly inside Object.clone))

That's, and I'm saying this quite literally, more than a million times slower than what it should be.

Assuming it needs to be done at all, and you can trivially show that it doesn't.

(That being said, I need to explicitly check that Hotspot does do its full optimization pass on that chunk of code. I see no reason why it wouldn't, but maybe Hotspot doesn't want to. Though that'd be a WTF in and of itself.)

(On a side note: does Java cache hotspot optimizations? I think it does, in which case there's definitely no excuse. And if it doesn't that's a wtf in and of itself.)

(On another side note: is there a Java bytecode-to-bytecode optimizer that'll do optimizations based on the code you've got in front of you now?)

mreiland · on April 29, 2015

I honestly don't know too much about Java and its technologies outside of a general understanding of it. I specifically chose to stay out of the Java ecosystem years ago because I disliked the Java community as a whole. They had a real beef with C and C++ being more performant and constantly pushed and railed against both C and C++ to the point of being what I considered completely divorced from reality.

Java as a tech is strong, but Java as a community was full of pretentious assholes who had a complex about performance (in my opinion of course).

I have no doubt your example was most likely due to some technical issue preventing hotspot from doing what it should have. When hotspot can do its work it's amazing, you just have to enable it, and you're right about doing arcane things to get performance. That's true in any GC'd language though, even .Net has its boogeymen.

TheLoneWolfling · on April 29, 2015

This isn't anything to do with GC.

This is purely an optimization issue, and one that can be done regardless of if a language is GC'd or not.

mreiland · on April 29, 2015

You're speaking of this specific issue, I'm speaking in general.

KMag · on April 28, 2015

> no unsigned types,

... except char.

You also forgot to mention their odd utf-8-ish encoding of strings in class files.

TkTech · on April 28, 2015

Eh, there's been far worse and their justification for it kinda-almost makes sense. They were trying to avoid 00 in the strings at all costs. It's simple enough to encode and decode: https://github.com/TkTech/Jawa/blob/master/jawa/util/utf.py#...

KMag · on April 28, 2015

Avoiding nulls wasn't the messed up part. Encoding UTF-16 surrogate pairs separately as pseudo-UTF-8 was the weird part. Some codepoints that would be 4 bytes in UTF-8 or 4 bytes in UTF-16 wind up as 6 bytes in Java's pseudo-UTF-8.

TkTech · on April 28, 2015

The encoding for the surrogate pairs is a spec (not a standard) called CESU-8[1]. The modified UTF-8 in the ClassFiles is really just CESU-8 with an exception for U+0000.

1: http://www.unicode.org/reports/tr26/
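
For anyone who wants to see the byte counts, here is a minimal Python sketch of the encoding as described in this subthread: each UTF-16 code unit is encoded on its own (so a surrogate pair becomes two 3-byte sequences), and U+0000 becomes the two bytes C0 80. The function name is made up for illustration, and this is a sketch of the idea rather than a validated class-file writer.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text one UTF-16 code unit at a time, avoiding raw 0x00 bytes."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit == 0:
            out += b"\xc0\x80"              # U+0000 special case
        elif unit < 0x80:
            out.append(unit)                # plain ASCII, one byte
        elif unit < 0x800:
            out.append(0xC0 | (unit >> 6))  # two-byte form
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three-byte form; surrogate halves land here too.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)

emoji = "\U0001F600"
print(len(emoji.encode("utf-8")))        # 4
print(len(emoji.encode("utf-16-be")))    # 4
print(len(encode_modified_utf8(emoji)))  # 6
print(encode_modified_utf8("\x00"))      # b'\xc0\x80'
```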

Dylan16807 · on April 29, 2015

And that spec is basically "We screwed up the UTF-16 support and wrote a UCS-2 to UTF-8 converter instead. Um, well, it works, don't touch it."

dtech · on April 28, 2015

It's also not really meant/suited for that level of optimization. At some point it's better to use JNI or something.

justincormack · on April 28, 2015

I still have my Apple G5, a rare bigendian only powerpc...

marssaxman · on April 28, 2015

Eh? All the PowerPCs were big-endian.

justincormack · on April 29, 2015

No, almost all were dual endian.

weland · on April 28, 2015

Linkbait much? Further down below, Torvalds explains precisely why BE is anything but dead. All that's dead is general-purpose CPUs using it, mostly because x86 happened.

http://geekandpoke.typepad.com/geekandpoke/2011/09/simply-ex...

(Edit: which is precisely why I prefer BE for anything, unless there is a pressing - usually hardware- and performance-related - requirement for the opposite. Computers are good at reading bytes in swapped order. I suck at it.)

loopbit · on April 28, 2015

I think Linus' comment is more on the lines of picking an endian-ness and sticking with it rather than having to check the specific stream and swap bytes conditionally.

He specifically mentions cases where the data is stored in BE and the program reads it, swaps it, operates with it, swaps it back again and writes it.

The BE comment seems to me to be of much less importance than the other message (kind of like a side comment). And as you say, BE is very much alive today.

weland · on April 28, 2015

Absolutely. Encoding byte order and swapping or not swapping dynamically is a silly solution to a problem no one had in the first place.

justincormack · on April 28, 2015

Well, and IBM switched Power to little endian recently (the machines are dual endian, but all the new OSs are little endian only). Power was the main BE architecture left. Most arm machines are dual endian, and NetBSD supports both, but almost everyone uses little endian on arm. There is a bit of big endian mips around still, eg the Cavium network appliances.

weland · on April 28, 2015

In the meantime, almost every bytestream sent over networks on planet Earth is BE, regardless of whether it's sent by or to a LE or BE machine.

JoeAltmaier · on April 28, 2015

Which is the great tragedy of our age. What a colossal waste

smorrow · on April 28, 2015

It's the same C code doing IP, going through the same steps and the same number of steps, whether the underlying machine is little- or big-endian. So if you're saying about wasted cycles...

JoeAltmaier · on April 28, 2015

The waste is every little-endian machine (almost all machines) converting every packet to big-endian for 'the wire', then converting back again when received. It's all a pointless self-flagellation.

smorrow · on April 28, 2015

Any sensible code is going to have to do that stuff whether the machine's endianness matches IP's or not, though. It happens on BE boxes too. It's either that or write completely different code for different CPUs.

limeyx · on April 28, 2015

Not for networking...ntohl/htonl and friends are no-ops (macros) on BE machines ... zero overhead
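
A small Python sketch of that behaviour, using the standard `socket.htonl`/`socket.ntohl` wrappers. The helper `htonl_by_hand` is a hypothetical name added here only to spell out what the conversion means; on a big-endian host it returns its argument unchanged, while on a little-endian host it swaps the four bytes.

```python
import socket
import sys

def htonl_by_hand(x: int) -> int:
    """Return the host integer whose in-memory bytes are x in big-endian order."""
    return int.from_bytes(x.to_bytes(4, "big"), sys.byteorder)

x = 0x0A000001  # 10.0.0.1 written as a 32-bit number

print("host byte order:", sys.byteorder)
print("htonl:", hex(socket.htonl(x)))
print("by hand:", hex(htonl_by_hand(x)))
print("round trip ok:", socket.ntohl(socket.htonl(x)) == x)
```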

JoeAltmaier · on April 28, 2015

The OP's point was, there are essentially no more BE boxes.

astrodust · on April 28, 2015

Shuffling bits around takes basically zero effort, there's often a CPU instruction to flop them. Re-doing everything to be the opposite polarity takes tons of effort.

wyager · on April 28, 2015

> What a colossal waste

How expensive do you think endianness-swapping is?

jcranmer · on April 28, 2015

<http://www.agner.org/optimize/instruction_tables.pdf>

About 1 clock cycle per word.

jdmichal · on April 28, 2015

Fixed link:

http://www.agner.org/optimize/instruction_tables.pdf

JoeAltmaier · on April 28, 2015

Expensive in having to do it at all - code complexity, confusion between HBO and NBO buffers, heck the fact that nobody can even receive an opaque packet at all! You have to know every field to correctly normalize it on receive. This impacts code at every layer of the network.
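
The "you have to know every field" complaint can be shown with a toy header. This is a minimal Python sketch; the three-field layout (16-bit type, 16-bit length, 32-bit sequence number) is invented for illustration and is not any real protocol.

```python
import struct

# Hypothetical wire layout, network byte order: type (u16), length (u16), sequence (u32).
HEADER = struct.Struct("!HHI")

def parse_header(packet: bytes) -> tuple:
    """Decode the header field by field, using knowledge of each field's width."""
    return HEADER.unpack_from(packet)

def blind_word_swap(packet: bytes) -> tuple:
    """Swap every 32-bit word without knowing the fields, then read little-endian."""
    words = struct.unpack(">II", packet)
    swapped = struct.pack("<II", *words)
    return struct.unpack("<HHI", swapped)

wire = bytes.fromhex("0001 0010 0000002a")

print(parse_header(wire))     # (1, 16, 42)
print(blind_word_swap(wire))  # (16, 1, 42): type and length traded places
```

The blind swap gets the 32-bit field right and scrambles the two 16-bit ones, so the receiver cannot normalize a buffer it treats as opaque.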

curun1r · on April 28, 2015

Minuscule multiplied by massive.

StillBored · on April 28, 2015

And while it was probably a "good" idea, the implementation was terrible. Absolutely zero binary compatibility between SLES11 on a POWER7 and SLES12 on a POWER8 (same for RHEL). I think the official line was that compatibility comes in the form of being able to run the older OS in a VM...

This is worse than the HP-UX PA to IA transition, the mac 68k to PPC, or PPC to x86, transition/etc.

The least they could have done was some kind of ELF loader, and some fat libraries that flip the execution mode for one generation while everyone transitions.

The stated claim was to make it easier for people to transition to POWER from x86 (aka little porting effort), but what they really did was screw their existing customers by requiring them to port their software (big porting effort). I question why, if I'm going to spend the effort to port from BE POWER to LE POWER, I wouldn't just port to x86...

bandrami · on April 28, 2015

IBM also for years screamed at devs that you can't rely on a PPC system being BE. Which means all oldworld MacOS software was at least theoretically written by people who couldn't tell you what byte order they were writing for. (How seriously the devs took that admonition is a different question.)

duskwuff · on April 28, 2015

The PowerPC architecture was bi-endian, but classic Mac OS was very definitely big-endian. The only situation where the CPU would switch to little-endian would be if it was running Virtual PC. (And even then, it'd switch back for other applications.)

bandrami · on April 29, 2015

Sorry, yeah, I stated that badly:

Apple kept screaming at devs that IBM couldn't be guaranteed to keep the arch bi-endian, so Apple couldn't guarantee a big-endian platform and developers shouldn't assume one.

vonmoltke · on April 28, 2015

I was writing real-time signal processing software for POWER as recently as three years ago. I'm not even aware that was IBM's position, and they were our partners on the venture.

That said, it really didn't matter because all our software had to build and run on x86-64 anyway, so all the endian-specific code was wrapped in preprocessor directives. We could have changed the endianness with a single makefile change.

bandrami · on April 29, 2015

> I'm not even aware that was IBM's position

Yeah, I said it badly. IBM wouldn't guarantee Apple that Power would always support BE, so Apple pushed (or tried to) developers not to assume that.

vonmoltke · on April 29, 2015

That makes sense. IBM knew we were focused on the POWER7 blades and that our backup plan was their Intel blades, so they probably figured the warning wouldn't be relevant to us.

aduitsis · on April 28, 2015

SPARC is also big endian.

eliben · on April 28, 2015

PowerPCs are still 32 bit though, and fairly popular in the uController space [http://en.wikipedia.org/wiki/PowerPC]

vonmoltke · on April 28, 2015

POWER != PowerPC. POWER processors through the POWER4 generally implemented the PowerPC ISA, but were not "PowerPC" chips. Also, there is a 64-bit PowerPC ISA; that is what the POWER4 implemented. It just doesn't get much use in embedded contexts because it isn't necessary and costs more.

jk · on April 28, 2015

There are quite a few big endian MIPS devices. Many Chinese processor vendors (Mediatek, Rockchip, etc.) still use MIPS and have big endian configuration. Most common are TV SoCs and networking chips like WiFi SoCs.

Shivetya · on April 28, 2015

Odd, iSeries and pSeries are BigEndian, even Z is.

On the iSeries it has no bearing on application programmers except when using transformation products to other platforms.

restalis · on April 28, 2015

"Computers are good at reading bytes in swapped order. I suck at it."

Byte endianness is about binary content, which isn't supposed to be read by humans. Humans read only interpretations of the binary data. You shouldn't come anywhere near the question of whether you are or aren't good at reading binary.

wyager · on April 28, 2015

>Byte endianness is about binary content, which isn't supposed to be read by humans.

You can tell a lot about someone's programming history from statements like this.

Anyone who habitually works on low-level code has plenty of experience working with raw binary data.

restalis · on April 28, 2015

Working with raw binary data, like having it represented as hexadecimal pairs per byte? That is an interpretation. Having some representation for each bit in the group is still an interpretation, regardless if you consider those bits individually or in groups forming masks. "Reading" that binary content is by representing it in some form or another and interpreting it as such. Like I said, humans read interpretations of that binary data.

P.S.: You are a bad judge of characters. I worked on low-level code (microcontrollers programming, for a living).

mreiland · on April 29, 2015

Initially I disagreed with you until I thought about it for a second and realized Hex was technically an interpretation as well. You're mostly right, although I do think calling hex an interpretation is maybe not completely accurate. It's accurate enough, but I could see someone arguing the other way.

But regardless, I agree that no one is really going to be looking at raw binary. At least I can't think of a good reason why anyone would.

weland · on April 28, 2015

Well, reading the hexadecimal interpretation of a memory dump full of little-endian data is annoying as fuck.

restalis · on April 28, 2015

Although I believe your sentiment, I'm not sure on the other hand how much easier it appears to you with BE dumps. Reading data is perceived to be difficult when it shares little similarity with what we're already accustomed to. In time it will become more familiar, especially when you'll do a lot of it.

weland · on April 29, 2015

I've done enough of it that I hate it routinely, rather than occasionally.

Roboprog · on April 28, 2015

od -t x1, much?

As somebody who had to generate and read back tokenized files decades ago, I can vouch that little endian sucks.

jpindar · on April 28, 2015

Let me guess, the software you write runs on desktop computers, web servers and such?

tempodox · on April 28, 2015

I'm not much into low-level CPU architectures, so I have to ask: Is there a technical reason (performance or whatever) why LE would be objectively better than BE?

I do read hex dumps & disassemblies, and there LE is a huge pain in the backend. Everything is in reverse. Why would a sane programmer voluntarily use LE?

God Save the Network Byte Order!

andrewla · on April 28, 2015

I think the main advantage I see in LE architectures is that integral types don't have to relocate. For example, the int32 value 30 is represented in memory as "1E 00 00 00". The int16 value 30 is represented in memory as "1E 00". So casting downwards means that the pointers are the same, and you just ignore the trailing 0s. Contrast this to BE architectures, where you have "00 00 00 1E" vs "00 1E" -- casting from the larger to the smaller type means that you have to move the pointer.
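
The same example in a few lines of Python, using `struct` to stand in for memory. This is only a sketch of the byte layouts described above; the variable names are made up.

```python
import struct

value = 30
le32 = struct.pack("<i", value)  # bytes 1e 00 00 00
be32 = struct.pack(">i", value)  # bytes 00 00 00 1e

# "Cast" to int16 by reading two bytes at the SAME address (offset 0).
le16_same_address = struct.unpack_from("<h", le32, 0)[0]
be16_same_address = struct.unpack_from(">h", be32, 0)[0]

# On big-endian the low half lives two bytes further along.
be16_moved_pointer = struct.unpack_from(">h", be32, 2)[0]

print(le32.hex(), be32.hex())  # 1e000000 0000001e
print(le16_same_address)       # 30
print(be16_same_address)       # 0
print(be16_moved_pointer)      # 30
```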

bodyfour · on April 28, 2015

Yes, exactly. Maybe an example that's easier to visualize is when you learned to do arithmetic as a child. If I gave you a bunch of numbers to add like 57, 9, and 318 the first thing you would do is right justify them all. For any math operation, you start at the least significant digit and work up. This is ultimately equivalent to the int32/int16 example you gave -- if you are trying to operate on numbers of different length you have to first ensure that their LSBs line up. Little-endian is like writing your decimal integers right-justified on the blackboard.

So little-endian is (slightly) more convenient for machines and big-endian is (slightly) more convenient to humans reading memory dumps.

The unfortunate thing is that "network byte order" is big-endian so it's still the traditional endianness to use for wire protocols or on-disk formats. In my own designs I've switched to specifying fixed-width integers as little-endian which makes more sense for 99.9% of computers today.
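
The blackboard analogy can be sketched in Python as byte-wise addition with carry. With little-endian operands of different lengths, offset 0 is always the least significant byte, so no alignment step is needed. The function name is hypothetical and the code is an illustration, not how any particular CPU does it.

```python
from itertools import zip_longest

def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two unsigned little-endian numbers of possibly different lengths."""
    out = bytearray()
    carry = 0
    # Offset 0 is the least significant byte in both operands,
    # so we can walk them in lockstep from the start.
    for x, y in zip_longest(a, b, fillvalue=0):
        total = x + y + carry
        out.append(total & 0xFF)
        carry = total >> 8
    if carry:
        out.append(carry)
    return bytes(out)

n57 = (57).to_bytes(1, "little")
n9 = (9).to_bytes(1, "little")
n318 = (318).to_bytes(2, "little")

total = add_little_endian(add_little_endian(n57, n9), n318)
print(int.from_bytes(total, "little"))  # 384
```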

cesarb · on April 28, 2015

Little-endian also feels more natural for big integers: the byte at offset x has weight 256^x.

Contrast with a big-endian representation, where the byte at offset x would have weight 256^(length - 1 - x).
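
Both weight formulas can be checked directly against Python's built-in `int.from_bytes`. A minimal sketch with made-up function names:

```python
def value_from_little_endian(data: bytes) -> int:
    """Byte at offset x has weight 256**x."""
    return sum(byte * 256**offset for offset, byte in enumerate(data))

def value_from_big_endian(data: bytes) -> int:
    """Byte at offset x has weight 256**(length - 1 - x)."""
    length = len(data)
    return sum(byte * 256**(length - 1 - offset) for offset, byte in enumerate(data))

data = bytes([0x12, 0x34, 0x56, 0x78, 0x9A])

print(value_from_little_endian(data) == int.from_bytes(data, "little"))  # True
print(value_from_big_endian(data) == int.from_bytes(data, "big"))        # True
```

Note that the little-endian formula never needs to know the total length, which is the point being made.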

KMag · on April 28, 2015

Except that it's often convenient for encodings to sort lexicographically and numerically the same way. There are several multi-precision number formats with this property, but they're all either equally slow to load on any endian machine, or faster to load on big-endian machines.

Also, on a 64-bit big-endian architecture, you don't need specialized string comparison instructions to get efficient string comparisons. The code is only slightly more complicated on BE architectures that only support aligned 64-bit loads.
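
A quick Python sketch of the sorting property, using fixed-width unsigned integers (a simpler case than the multi-precision formats mentioned above). Sorting the raw big-endian bytes gives numeric order; sorting little-endian bytes does not. The last lines show the related string-comparison trick: comparing two equal-length byte strings gives the same answer as comparing them as big-endian integers.

```python
values = [1, 256, 2, 65535, 300]

# Sort by raw bytes (what a bytewise memcmp-style comparison would do).
sorted_big = sorted(v.to_bytes(4, "big") for v in values)
sorted_little = sorted(v.to_bytes(4, "little") for v in values)

big_order = [int.from_bytes(b, "big") for b in sorted_big]
little_order = [int.from_bytes(b, "little") for b in sorted_little]

print(big_order)     # [1, 2, 256, 300, 65535]
print(little_order)  # [256, 1, 2, 300, 65535]

# Equal-length strings: bytewise order equals big-endian integer order.
a, b = b"endian-a", b"endian-b"
print((a < b) == (int.from_bytes(a, "big") < int.from_bytes(b, "big")))  # True
```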

wyager · on April 28, 2015

>Little-endian also feels more natural for big integers: the byte at offset x has weight 256^x.

This is literally the opposite of how most programmers in the world write numbers naturally.

See "1234". The digit at offset x from the right has weight 10^x. That corresponds to big-endian.

Considering that the only important difference between little and big endian is when people have to read or write it by hand, we should probably model it after common human representation...

hamstergene · on April 28, 2015

By the way, this "natural" way of writing numbers comes unchanged from the right-to-left writing system of Arabic. They naturally read 1234 starting from the least significant digit (here, '4').

Think about the irony :)

KMag · on April 29, 2015

I'm not sure about Arabic, but my understanding of Hebrew (another member of the Semitic language family) is that they still pronounce the numbers left to right, and if a line break forces digits to be split across lines, the most significant digits are put on the top line.

hamstergene · on April 29, 2015

You got me curious. I googled and found[1] that indeed they only pronounce numbers 21-99 that way, e.g. for 31 the order of words is "one thirty", but higher order components go left to right, e.g. for 25031 the order of words is "five twenty thousand one thirty".

1. http://arabic.tripod.com/VocabNumbers.htm

wyager · on April 28, 2015

Modern processors can perform this cast in a negligible amount of time, and due to modern memory architecture there's no advantage to reading 2 bytes instead of 4.

gecko · on April 28, 2015

Little-endian has a major advantage over big-endian...on eight-bit architectures: as you expand the size of the number, you can keep using up new digits at the end of the range, rather than the beginning. This means that you can more easily expand field widths.

Beyond that, it's called after two stupid factions in Gulliver's Travels for a reason...

KMag · on April 28, 2015

Though, it's sometimes convenient to have a lexicographical sort result in a numeric ordering. For instance, if you're storing timestamps as multi-precision integers using a Rice-Golomb encoding that sorts the same way lexicographically and numerically, decoding is faster if your CPU is big-endian.

cliffbean · on April 28, 2015

Little-endian is generally better for compilers, to the extent that it matters. For example, with SIMD, it makes most sense to number the lanes in the same direction as addresses in memory, so that lane 0 is at offset 0 when doing a load or store. And if you want to reinterpret the bits of a SIMD value as a different type, little-endian is the only sane way to do it. And so on.

I've become fond of thinking about little-endian becoming known as a universal "CPU byte order", to complement "network byte order". Each order makes the most sense for its domain.

mungoman2 · on April 28, 2015

I won't give you a technical reason, but rather respond to the second part about LE and sanity.

Actually LE is the natural order and the question is why anybody sane would prefer BE. The reason you think otherwise is because you view hexdumps the wrong way.

A hexdump shows hexadecimal bytes. Isolate one byte and number the bits -- 76543210, right to left. Number the nibbles -- 10, right to left. Now number the bytes -- 0123..., left to right?

Display your hexdumps with addresses increasing right to left and LE makes perfect sense. This is how numbers are written, so the convention should be followed when displaying numbers.
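
Here is a minimal Python sketch of that display convention, with address 0 drawn at the far right. The two function names are made up for illustration.

```python
import struct

def hexdump_left_to_right(data: bytes) -> str:
    """Conventional dump: address 0 on the left."""
    return " ".join(f"{byte:02x}" for byte in data)

def hexdump_right_to_left(data: bytes) -> str:
    """Dump with addresses increasing right to left: address 0 on the right."""
    return " ".join(f"{byte:02x}" for byte in reversed(data))

memory = struct.pack("<I", 0x12345678)  # little-endian value in "memory"

print(hexdump_left_to_right(memory))  # 78 56 34 12
print(hexdump_right_to_left(memory))  # 12 34 56 78
```

With the right-to-left view the little-endian value reads the way the number is written, at the cost of any text in the same dump appearing reversed.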

phkahler · on April 28, 2015

Why on earth would anyone number the bytes right to left? Do you want strings to appear backward as well? But only the portion of a string on one line, the fragments will still be in correct order. Or do you advocate numbering from bottom to top too?

mungoman2 · on April 28, 2015

I am talking about the view of the hex data. That view is about showing numbers, and the common complaint about LE is that numbers are reversed, but this is only the case when you break the convention of displaying numbers right to left.

For showing strings it's better to use left to right, as that's the convention for text.

mreiland · on April 29, 2015

Most hex editors will show you the ASCII value for hex that falls within the range of ASCII, which can be useful to humans.

kahirsch · on April 28, 2015

No, it's just that Intel won, mostly because of IBM choosing it.

marssaxman · on April 28, 2015

Big-endian has always made more sense to me, since digits are laid out in memory the same way they are laid out in writing and therefore in my brain. Little-endian just seems like a cheap '70s kludge that accidentally became dominant thanks to Intel.

SFjulie1 · on April 28, 2015

Money.

When Motorola 68000 was out, it was doing 32bits linear addressing.

The mov operation would load directly from the memory by using a direct addressing.

Wiring 32 wires is expensive.

Wiring two partially overlapping 16-bit address buses cost less. Intel CPU ASM works big endian because we all prefer to write AND 0xFF, DSI to get the 8 lower bit of a number.

The first register would hit segment, and the second the offset.

Hence you would first need to load the page (that would wire the multiplexing) to get the offset in the page/segment.

The addressing bus would be accessed with a stack. So first you push the lower 16 bits, and then the highest 16 bits. Which would revert since it's a FIFO in first loading the segment address, then the offset address. The cost of this "savings" in terms of CPU cycle was marginal while the gain in money was tremendous.

DonHopkins · on April 28, 2015

I once heard somebody jokingly refer to MIPS as SPIM when booted in "other-endian" mode. If SPARC could do that, then it would be CRAPS.

SeanLuke · on April 28, 2015

An odd joke. SPIM is a well-known MIPS emulator, used in a great many university compiler classes.

http://spimsimulator.sourceforge.net/

JosephRedfern · on April 28, 2015

Going against what Linus suggests, the Dalvik DEX format (used by Android) has a flag representing the endianness of the enclosed bytecode: https://source.android.com/devices/tech/dalvik/dex-format.ht... (see "ENDIAN_CONSTANT and REVERSE_ENDIAN_CONSTANT"). By default, it's Little Endian.

I've always wondered WHY this flag is present - why would an implementation wish to change the byte-order? I could understand if the file contained machine code, and would vary with different architectures - but given it's running in a VM, why bother?
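
For reference, a reader of such a tag could look roughly like the Python sketch below. The two constant names come from the linked format page; the numeric values are given as I understand that page and should be checked against it. The function name is hypothetical, and the sketch only classifies a 4-byte tag, it does not parse a real file.

```python
import struct

# Values as described in the format documentation (treat as an assumption).
ENDIAN_CONSTANT = 0x12345678
REVERSE_ENDIAN_CONSTANT = 0x78563412

def byte_order_from_tag(tag: bytes) -> str:
    """Classify a 4-byte endian tag by reading it as a little-endian uint32."""
    (as_little,) = struct.unpack("<I", tag)
    if as_little == ENDIAN_CONSTANT:
        return "little"
    if as_little == REVERSE_ENDIAN_CONSTANT:
        return "big"  # the file was written byte-swapped
    raise ValueError(f"unrecognised endian tag: {tag.hex()}")

print(byte_order_from_tag(bytes.fromhex("78563412")))  # little
print(byte_order_from_tag(bytes.fromhex("12345678")))  # big
```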

danfuzz · on April 29, 2015

You can blame me for those wasted four bytes (or, perhaps, that wasted terabyte in the aggregate).

The original idea was that on big-endian machines, the resulting .odex would be endian-swapped compared to the original .dex, and this field would provide a reasonably-blatant indication. This would probably help with debugging more than anything else (maybe help with security, but not much), because there's no reason anyone would drop a little-endian .odex file on a big-endian machine (or a big-endian one on a little-endian machine).

We wrote (most of?) the code to get the vm working on big-endian systems, but within Google we never shipped any big-endian hardware AFAIK. I have my doubts that anyone ever did.

davtbaum · on April 28, 2015

Even though the bytecode is being executed on a VM, the VM itself has been ported to the architecture, which has a native endianness. Byte order in the dex that matches the host OS's will likely result in a performance gain as the VM can cast more efficiently.

JosephRedfern · on April 28, 2015

That's true - do you suppose that this flag might be used if an app developer was targeting a specific CPU architecture, then?

I guess it could be argued that if you're that concerned about performance and know your target architecture, that it might be worth going native rather than running on a VM.

yincrash · on April 28, 2015

This is just a guess, but I think the odex pass on dalvik switches the endianness to the architecture's native.

hsivonen · on April 28, 2015

The Web is now little-endian thanks to ArrayBufferViews exposing endianness and everyone testing only the little-endian case. It doesn't make sense to make big-endian hardware that needs to run a Web browser anymore, so chances are that big-endian will never come back.

dtech · on April 28, 2015

I have a hard time believing that the whole web uses ArrayBufferViews.

blueflow · on April 28, 2015

IP and TCP are still Big-Endian. Above that, both formats and ascii numbers are used.

bryanlarsen · on April 28, 2015

We use DataView wherever we can, but it's quite likely we do have endianness bugs in our code. AFAICT, some browsers on BE machines use LE typed buffers.
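
The same class of bug exists outside browsers. As a rough Python analogue, offered only as an illustration and not as the browser API: `array.array` stores items in the host's native byte order, much like a typed array, while `struct` with an explicit `<` or `>` prefix behaves like a `DataView` call with an explicit endianness argument.

```python
import array
import struct
import sys

value = 0x0102

# Native order: the bytes depend on the machine running the code.
native = array.array("H", [value])
native_bytes = native.tobytes()

# Explicit order: the bytes are the same on every machine.
explicit_little = struct.pack("<H", value)
explicit_big = struct.pack(">H", value)

print("host:", sys.byteorder)
print("native:  ", native_bytes.hex())
print("explicit:", explicit_little.hex(), explicit_big.hex())  # 0201 0102
```

Code that serialises the native bytes passes every test on a little-endian machine and only breaks on hardware nobody tested.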

Do you know of any modern browsers that use BE typed buffers with full WebGL support for a cheap BE machine that we could buy for testing? Does anybody actually use such? Is it worth the bother?

krazydad · on April 28, 2015

From Wikipedia: [Gulliver's Travels] describes an intra-Lilliputian quarrel over the practice of breaking eggs. Traditionally, Lilliputians broke boiled eggs on the larger end; a few generations ago, an Emperor of Lilliput, the Present Emperor's great-grandfather, had decreed that all eggs be broken on the smaller end after his son cut himself breaking the egg on the larger end. The differences between Big-Endians (those who broke their eggs at the larger end) and Little-Endians had given rise to "six rebellions... wherein one Emperor lost his life, and another his crown". The Lilliputian religion says an egg should be broken on the convenient end, which is now interpreted by the Lilliputians as the smaller end. The Big-Endians gained favour in Blefuscu.

http://en.wikipedia.org/wiki/Lilliput_and_Blefuscu

restalis · on April 28, 2015

This got me wondering whether, in general (outside computing), representing numbers the way we do was a good choice. What if we had 1234 read as "four thousand, three hundred and twenty one"? Or, departing even further from how we pronounce numbers today, what if we read it left-to-right and spelled it "one and twenty and three hundred and four thousand"? This way the last digit lingering in your mind would be the one representing the most significant part of the given number (unlike the current practice of hearing it first and then tuning out for the less interesting smaller parts). Would it have been practical?

jerf · on April 28, 2015

If you take into account the fact that you can obtain the order-of-magnitude for most "human" numbers without actually focusing on them, then putting the most-significant digit first in the stream makes sense. 30 is easy to see, even if you look at the 3 you can at least tell there's two digits. 12,048 is pretty easy. Even 8,293,254 isn't that hard to order-of-magnitude from the near-fovea while you're focused on the 8. By the time you get into the larger numbers that are hard, well, they're hard under either order so it's a wash.

restalis · on April 28, 2015

When reading, one may indeed be hit first by the most significant digit ('8' in your last example), which is somewhat useful, but the reader can't easily figure out digit's order of magnitude without reading the entire number. There's more, actually. The reader can't spell the number right away, as it has to be read entirely and compiled in the mind first. This compilation involves counting all the digits to determine the number's size, then starting again, interpreting and spelling each digit and its designation in order. This double pass doesn't seem that efficient to me. On the other hand, if the number from your last example were written backwards (452'392'8), it could be read and interpreted efficiently in one pass ("four, fifty and two hundred; three, ninety and two hundred thousand; and eight million"). Even when all we're interested in is the most significant digit and its order of magnitude, that would still be faster to determine (i.e. again, in one pass).

jerf · on April 28, 2015

"but the reader can't easily figure out digit's order of magnitude without reading the entire number."

Yes, they can; that was my entire point. Human-sized numbers fit within the eye's fovea. Numbers that don't fit easily within the fovea require effort under either scheme. Your model of having to go digit-by-digit is a computer's scanning model, not a human perception model; we do not scan that way, we scan in fovea-sized chunks, which are smaller than people may think because the brain is very good at interpolating before the information hits the conscious mind, but is still large enough to fit "millions" in quite comfortably.

_kst_ · on April 29, 2015

It wasn't exactly a deliberate choice.

The number system we use was imported to Europe from Arabia. In Arabic, the number 1234 is written with the most significant digit on the left -- but Arabic is written right-to-left, so it's actually little-endian. When the numerals were introduced to Europe (with the help of Fibonacci), the convention of putting the most significant digit on the left was retained, but since European languages are written left-to-right the notation magically became big-endian.

The fact that the Roman numeral system that it replaced was also big-endian was likely an influence.

chucksmash · on April 28, 2015

> Would it have been practical?

My gut feeling is that it would have been less so than the current solution.

Our current representation is great for skimming in a language written LtR. Placing the most significant digit all the way to the right would be like putting the lead paragraph of an article last.

aikah · on April 28, 2015

I wonder if some languages write figures RtL; I'm curious about it.

zerocrates · on April 28, 2015

Arabic, which is of course normally right-to-left, uses left-to-right ordering for numerals.

antirez · on April 28, 2015

I would not follow Linus's advice here. Stick with a given endianness for a protocol? Sure! But why use BE if everything is LE? If you use LE, the conversion calls can be turned into no-ops at compile time: zero cost. The cost will be incurred only on BE systems, which are less common every day. So my suggestion is to stick to a specific endianness in protocols and serialization formats, but to pick LE instead of BE. Macros that convert the value, and do nothing if the host is already LE, are trivial to write.
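A hedged sketch of that idea in Python rather than C macros: the converter is chosen once, at import time, which is the closest Python gets to a compile-time decision. `host_to_le` and `le_to_host` are hypothetical names, and each call is assumed to receive the bytes of a single fixed-width field in host order.

```python
import sys


def _swap(raw: bytes) -> bytes:
    """Reverse the bytes of one fixed-width field."""
    return raw[::-1]


def _identity(raw: bytes) -> bytes:
    """Host order already matches the wire order; nothing to do."""
    return raw


# Picked once when the module loads: a no-op on LE hosts, a swap on BE hosts.
host_to_le = _identity if sys.byteorder == "little" else _swap
# Reversing bytes is its own inverse, so the same function converts back.
le_to_host = host_to_le

if __name__ == "__main__":
    field = (0x11223344).to_bytes(4, sys.byteorder)  # bytes as the host stores them
    print(host_to_le(field).hex())
```

This prints `44332211` on either kind of host. In C the same selection would normally be made by the preprocessor, so on a little-endian build the call disappears entirely instead of costing a function call as it does here.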

toyg · on April 28, 2015

That's what Linus is saying: hey, BE is basically dead and you shouldn't use it, but if you really really want to use it, that's fine as long as you don't go for dynamic switching.

loopbit · on April 28, 2015

I might be missing something here, but I'd say that you are following exactly Linus' advice :)

njloof · on April 28, 2015

On modern architectures the byte swap is effectively free, if it is done at the time the data is read or written. Worry more about code that touches data just to swap bytes, which may have to page data in.
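To show the shape of that advice (Python will not show the hardware cost, so treat this purely as a structural sketch): the conversion is folded into the loop that reads the data anyway, instead of a separate pass that touches every value just to swap it. `iter_be_u32` and `checksum` are made-up names, and the buffer is assumed to hold big-endian 32-bit values.

```python
import struct
from typing import Iterator


def iter_be_u32(buf: bytes) -> Iterator[int]:
    """Yield each big-endian 32-bit value, converting it as it is read."""
    # iter_unpack requires len(buf) to be a multiple of 4.
    for (value,) in struct.iter_unpack(">I", buf):
        yield value


def checksum(buf: bytes) -> int:
    """Consume the values in the same pass that decodes them."""
    return sum(iter_be_u32(buf)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(checksum(bytes([0, 0, 0, 1, 0, 0, 1, 0])))
```

The alternative this avoids is swapping the whole buffer up front and then reading it again, which is the "touching data just to swap bytes" case.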

rwmj · on April 28, 2015

When POWER8 came out (ppc64le) the writing was pretty much on the wall. Is there any major or even minor architecture which still defaults to BE? SPARC possibly?

sanxiyn · on April 28, 2015

Yes, SPARC and zSeries. Among the architectures supported by the recently released Debian, zSeries is the only one that defaults to BE. (SPARC support has been dropped.)

on April 28, 2015

[deleted]

mmf · on April 28, 2015

Muauahahahah

nudpiedo · on April 28, 2015

Another subjective opinion from someone who doesn't look much outside his box.

olm · on April 28, 2015

As opposed to an objective opinion?