
You may wonder: what was in this for IBM? The answer is fairly straightforward: IBM used to make proprietary chipsets for Intel chips!

The pride of the xServer/xSeries line was the "complex" setups -- multiple chassis -- with up to 32 sockets and 512 GB of RAM. These required a lot of internal IBM engineering: IBM built pin-compatible sockets for whatever Intel was offering at the time and glued those chips into _hugely_ different topologies than Intel had in mind.

These systems sound small today, but back in 2001 this was a really big deal for x86. Systems built on the IBM-proprietary chipsets were much more expensive than off-the-shelf boxes, but still a fair bit cheaper than going with NCR or Unisys, competing vendors with their own proprietary x86 MP designs.

IBM had a lot at stake when that socket changed. Achieving pin compatibility is hard! Intel guards its documentation jealously and prefers to hand out watermarked paper copies only. It's like owning a Gutenberg Bible. Engineering something pin-compatible with an Intel x86 CPU has never been easy.

It was no doubt worth it to their "big" x86 server business to ask Intel to make a special run of chips with the old socket layout but the new EM64T extensions. I bet it was a complete no-brainer compared to the costs of integrating a new socket!



In the era of dual-socket, single-core Xeons, ServerWorks also produced non-Intel chipsets (northbridge and southbridge) for motherboard makers to use. There were Tyan and Supermicro boards that used them, common at the time in 1RU and 2RU servers, and also found on big quad-socket Supermicro boards.

From 2002: https://www.extremetech.com/extreme/73498-serverworks-ships-...

I actually think the IBM systems used the Grand Champion chipset.

https://www.supermicro.com/products/motherboard/Xeon/GC-HE/P...

ServerWorks was later acquired by Broadcom.


IBM was locked in a brutal fight with HP and Dell to take share in the early x86 server market. Their chipsets were neat, but they looked silly next to an HP Opteron box. Integrated memory controllers ended the chipset race.

Still, their chipsets had neat features worth remembering, such as the ability to use local main memory as a last-level cache for memory read from remote nodes. And of course they went to 64 sockets, which was respectable.
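
To make the remote-cache idea concrete, here is a toy sketch of the principle only, not IBM's actual design: a node reserves a slice of its own DRAM to hold lines it has already fetched over the interconnect, so repeated reads of remote data pay local-DRAM latency instead of crossing the fabric again. All names and latency numbers below are made up.

    # Toy model of a NUMA "remote cache": a node sets aside local DRAM to hold
    # cache lines fetched from other nodes. Latencies are invented for illustration.
    LOCAL_NS, REMOTE_NS = 100, 400   # hypothetical access latencies

    class Node:
        def __init__(self, capacity):
            self.capacity = capacity
            self.remote_cache = {}           # address -> data, held in local DRAM

        def read_remote(self, addr, fetch):
            """Return (data, latency_ns) for a line owned by another node."""
            if addr in self.remote_cache:    # hit: served from local DRAM
                return self.remote_cache[addr], LOCAL_NS
            data = fetch(addr)               # miss: cross the fabric
            if len(self.remote_cache) >= self.capacity:
                self.remote_cache.pop(next(iter(self.remote_cache)))  # crude eviction
            self.remote_cache[addr] = data
            return data, REMOTE_NS

    node = Node(capacity=2)
    remote_memory = {0x10: "a", 0x20: "b"}
    print(node.read_remote(0x10, remote_memory.get))   # ('a', 400): first touch
    print(node.read_remote(0x10, remote_memory.get))   # ('a', 100): served locally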


These are pretty bold words, my friend.

Opteron had a vastly better architecture than the Intel chipsets of the day, but it topped out at four or eight sockets, I forget which.

IBM, at the time, could offer you Opteron-like architecture and performance, with up to 32 sockets, using Intel chips. That was worthwhile to some customers. "Intel" wasn't the selling point; "big" x86 or x86-64 was.

I'm not here to apologize for Intel. I'm just saying, those IBM proprietary chipsets had their nice bits.


Early Opteron (with Socket 940) topped out at 8 sockets glueless; with the same chipset one could drive a Socket 754 chip.

However, with custom glue logic one could expand it much further. The AMD offering in that space was the "Horus" chipset, which connected 4-socket boards over an external fabric (InfiniBand, IIRC) to create 64-socket systems. A similar tactic was (and still is) used by SGI in their UltraViolet systems, which apply the same principle using NUMAlink fabric and Xeon CPUs.


But in the end, Xeon systems with anything more than 4 sockets (8, 16, or 32 CPUs) were a rare, weird niche market compared to, on one hand, things like zSeries mainframes and big Sun and SGI machines, and on the other, people who learned to write software that distributed workloads across a couple of dozen $2000 to $4000 1RU dual-socket servers rather than buying one beastly proprietary thing with a costly support contract.


As I recall, HP also used ServerWorks chipsets for a lot of dual-socket 2RU systems, for people who didn't want to buy Opteron (and from 2000 to 2003, before Opteron was released).


I started following hardware news maybe a year or two before these IBM P4 systems were manufactured. I remember all the buzz on sites like Geek.com and TheInquirer.net about the soon-to-be-released AMD Clawhammers, Intel constantly pushing Itanium, and the slow feature crawl of Intel Xeons marching over Big Iron regardless of what the Itanium team wanted.


IBM flirted with Itanium. They spent a lot of money on software support for Itanium. They sold a few hardware Itanium systems to customers who really wanted one.

There just wasn't enduring customer demand to make an ongoing hardware/software product out of it.


I think The Register and The Inquirer helped steer a lot of people away from Itanium in that era, showing the $/performance that could be achieved with a larger number of much less costly dual-socket Xeon and Opteron boxes. People who really wanted to centralize everything on one godly giant machine went to things like zSeries mainframes, not Itanium. Everyone else started running Linux on x86...


I don't think the press had anything to do with it.

Customers got engineering samples in their hands and they were very, very slow.


Everyone was promised that sufficiently smart compilers would make great use of them, but that never works out as a plan.


They very much underestimated the complexity of such compilers. But the concept is fine. Itanium had a lot of raw power, but you had to do demoscene-level trickery to get at that performance.


> But the concept is fine.

The concept is not fine. Itanium was predicated on saving transistors from the OoO machinery and spending them on making the machine wider, and thus faster. However, it turns out that without any OoO the machine is terrible at hiding memory latency, and the only way to get that back was to ship it with heroically large, low-latency caches. And implementing those caches was harder and many times more expensive than just using OoO.

In the end, Itanium saved transistors and power on one side, only to have to spend much more on the other to recoup even part of the performance that was lost.
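
A toy illustration of the latency-hiding point, with made-up latencies and a deliberately simplified machine model (nothing Itanium-specific): an in-order core stalls on the first load that misses, while an out-of-order core keeps issuing independent work from its window while the miss is outstanding.

    # Toy cycle counter: the same instruction stream on an in-order core vs. an
    # out-of-order core. Latencies and window size are invented for illustration.
    MISS_LATENCY = 300   # cycles for a load that misses cache
    ALU_LATENCY  = 1

    # (kind, depends_on_miss) -- one missing load, independent ALU work,
    # then one instruction that consumes the loaded value
    stream = [("load_miss", False)] + [("alu", False)] * 50 + [("alu", True)]

    def in_order_cycles(insts):
        cycles = 0
        for kind, _ in insts:
            # an in-order core cannot issue past a stalled instruction
            cycles += MISS_LATENCY if kind == "load_miss" else ALU_LATENCY
        return cycles

    def out_of_order_cycles(insts, window=64):
        # crude model: independent ALU work within the window overlaps the miss
        independent = sum(1 for k, dep in insts if k == "alu" and not dep)
        overlapped  = min(independent, window) * ALU_LATENCY
        total_alu   = sum(ALU_LATENCY for k, _ in insts if k == "alu")
        return MISS_LATENCY + (total_alu - overlapped)

    print(in_order_cycles(stream))      # 351 cycles: everything waits on the miss
    print(out_of_order_cycles(stream))  # 301 cycles: the ALU work hides under the miss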


The concept was fine based on knowledge available at the time. Processor design occurs on a long cycle, sometimes requiring some guesswork about surrounding technology. The issue of having to hope/guess that compilers would figure out how to optimize well for "difficult" processors had already arisen with RISC and its exposed pipelines. In fact, reliance on smart compilers was a core part of the RISC value proposition. It had worked out pretty well that time. Why wouldn't it again?

VLIW had been tried before Itanium. I had some very minimal exposure to both Multiflow and Cydrome, somewhat more with i860. The general feeling at the time among people who had very relevant knowledge or experience was that compilers were close to being able to deal with something like Itanium sufficiently well. Turns out they were wrong.

Perhaps the concept is not fine, but we should be careful to distinguish knowledge gained from hindsight vs. criticism of those who at least had the bravery to try.


> Perhaps the concept is not fine, but we should be careful to distinguish knowledge gained from hindsight vs. criticism of those who at least had the bravery to try.

So how many times do you have to fail before being brave is just a bad business decision? The "concept" wasn't fine for Multiflow or the i860 (I used both, and would call it terrible on the i860). It didn't work for Cydrome. TriMedia is gone. Transmeta flamed out. There are, what, a couple of DSP VLIW chips that are actually still sold?

But, hey, let's bet the company on Itanium and compilers that will be here Real Soon Now. I remember the development Merced boxes we got.

> The general feeling at the time among people who had very relevant knowledge or experience was that compilers were close to being able to deal with something like Itanium sufficiently well.

That's revisionism. There was a general feeling we were getting good at building optimizing compilers, but I don't recall any consensus that VLIW was the way forward. The reaction to Itanium was much less than universally positive, and not just from the press.

> Turns out they were wrong.

Very, very wrong. Again.


> how many times do you have to fail before being brave is just a bad business decision

That's a very good question. More than once, certainly. How many times did Edison fail before he could produce a working light bulb? How many times did Shockley/Bardeen/Brattain fail before they could produce a working transistor? Even more relevantly, how many ENIAC-era computer projects failed before the idea really took off? Ditto for early consumer computers, mini-supers, etc. Several times at least in each case, sometimes many more. Sure, Multiflow and Cydrome failed. C6X was fairly successful. Transmeta was contemporaneous with Itanium and had other confounding features as well, so it doesn't count. There might have been a couple of others, but I'd say three or four or seven attempts before giving up is par for the course. What kind of scientist bases a conclusion on so few experiments?

> The reaction to Itanium was much less than universally positive, and not just from the press.

Yes, the reaction after release was almost universal disappointment/contempt, but that's not relevant. That was after the "we can build smart enough compilers" prediction had already been proved false. During development of Itanium, based on the success of such an approach for various RISCs and C6X, people were still optimistic. You're the one being revisionist. It would be crazy to start building a VLIW processor now, but it really didn't seem so in the 90s. There were and always will be some competitors and habitual nay-sayers dumping on anything new, but that's not an honest portrayal of the contemporary zeitgeist.


Mhm, yeah, it depends on how you look at it, I guess. I meant that if you fine-tuned your code to take advantage of its strengths, it could be very good for those workloads. But maybe they built something which, from afar, if you squint a lot, has more of the strengths of a programmable GPU today, while they pitched it as a general-purpose CPU.


> But the concept is fine.

Is it? What’s the point of a processor that we don’t know how to build compilers for? We still don’t know how to schedule effectively for that kind of architecture today.


The point is that they didn't know it was impossible. They had good reasons for believing otherwise (see my other comment in this thread), and it's the nature of technological progress that sometimes even experts have to take a chance on being wrong. Lessons learned. We can move on without having to slag others for trying.


So we would never build a computer because back in 1950 we didn't know how to make compilers, only raw bytes of machine code (we couldn't even "compile" assembly language)? Life sometimes requires that you create a prototype of something that you think will work to see if it really does in the real world.


But with 1950s-era machines, it was expected that programmers were capable of manually scheduling instructions optimally, because compilers simply didn't exist back then.

VLIW architectures are often proposed as a way to simplify the superscalar logic, but the problem with VLIW is that it forces a static schedule, which is incompatible with any code or architecture where the optimal schedule is dynamic, depending on the actual data. In other words, any code that involves unpredictable branches, or memory accesses that may hit or miss the cache -- and in general CPU terms, that describes virtually all code. VLIW architectures have only persisted in DSPs, where the set of algorithms being optimized is effectively small and closed.
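
A tiny, made-up example of the hit-or-miss problem: the compiler has to pick, at compile time, how far ahead of its use to hoist a load, while an OoO core effectively picks that distance at run time. The latencies and distances below are invented.

    # The compiler commits to one hoist distance; at run time the load either
    # hits (short latency) or misses (long latency). Numbers are invented.
    HIT, MISS = 3, 300

    def stall_cycles(hoist_distance, actual_latency):
        # cycles an in-order machine sits idle waiting for the load's value
        return max(0, actual_latency - hoist_distance)

    for hoist in (3, 300):              # schedule tuned for a hit vs. for a miss
        for latency in (HIT, MISS):     # what actually happens at run time
            print(f"hoist={hoist:3d} latency={latency:3d} "
                  f"stall={stall_cycles(hoist, latency)}")
    # Scheduling for a hit stalls ~297 cycles on a miss; scheduling for a miss
    # avoids the stall only if the compiler can find ~300 cycles of independent
    # work and hold the value in a register that whole time, which it usually
    # can't. An OoO core adapts the effective distance dynamically instead.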


> So we would never build a computer because back in 1950 we didn't know how to make compilers

No, that's different: the big idea with Itanium was specifically to shift the major scheduling work to the compiler. We didn't build the first computers with the idea that we'd build compilers later.


But we did build an awful lot of RISC machines with exactly that idea. And it worked.


It raises the question of whether any current compiler optimizations for a new theoretical VLIW-ish machine (the Mill?) would prove to be an effective leg up over the Itanium.


Being a bit more charitable, I think the problem is that people look at VLIW-generated code and think 'wow, that's so wasteful, look at all the empty slots' without realising those 'slots' (in the form of idle execution-unit pipeline stages) are empty in OoO processors right now anyway. The additional cost is in the icache, as already described.
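
A toy sketch of that point (not real IA-64 code; the instruction names and the 3-wide width are made up): pack a short sequence containing a dependency chain into fixed-width bundles. The nop slots the packer emits correspond to cycles where a 3-wide OoO core would also have idle execution units, because the chain admits no more parallelism either way; the nops just make the idleness visible and cost icache space.

    # Greedy packer: fill 3 slots per bundle with instructions whose inputs were
    # produced by earlier bundles; pad the rest of the bundle with nops.
    WIDTH = 3
    # (name, inputs, output) -- a dependency chain plus a little independent work
    prog = [("ld r1",      [],     "r1"),
            ("add r2, r1", ["r1"], "r2"),
            ("mul r3, r2", ["r2"], "r3"),
            ("ld r4",      [],     "r4"),
            ("st r3",      ["r3"], None)]

    def pack(insts):
        done, bundles, pending = set(), [], list(insts)
        while pending:
            bundle, rest = [], []
            for name, ins, out in pending:
                if len(bundle) < WIDTH and all(i in done for i in ins):
                    bundle.append((name, out))
                else:
                    rest.append((name, ins, out))
            done |= {out for _, out in bundle if out}
            bundles.append([n for n, _ in bundle] + ["nop"] * (WIDTH - len(bundle)))
            pending = rest
        return bundles

    for b in pack(prog):
        print(" | ".join(f"{op:12}" for op in b))
    # 4 bundles, 12 slots, only 5 real ops -- the same dependence chain would
    # leave a 3-wide OoO core's units just as idle; here the waste is explicit.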

Also, these days you would pretty much just need to fix LLVM, C2, ICC, and the MS compiler, and almost everyone would be happy.


The focus on vertically scaled servers and FP performance was a mistake. Had they focused on single-socket servers and INT performance, history might have been different. Also, today's compilers are much more capable, so maybe Itanium was simply too early.


This article makes me nostalgic for the days when every office building on the planet was carpet bombed with black and charcoal Dell Dimensions.

I spent a lot of time pulling drives and reselling systems that were end-of-life. I mostly dealt with switches and networking gear, but I was still encountering old 32-bit dual- and quad-socket xSeries boxes semi-regularly when I moved on around 2015.



