
Would really love to see some before/after cost calculations for cloud migrations. Techies, especially the more senior ones at CTO level, can easily be scared into thinking they need to use cloud, partly because they are so far removed from the tech, and partly because they just follow the trend and everyone else is doing it (similar to "no one got fired for buying IBM"). Cloud is great for many use cases, but a lot of companies just go all in even where it doesn't make sense. But perhaps everyone is getting 90% discounts and having the last laugh...


I've seen several large-scale cloud migrations and the bill has always been higher, usually egregiously so. In one case in particular, we would've been able to re-buy all the (perfectly adequate) hardware in our racks every 45 days if we had been pouring the cloud spend into hardware.

In another case, I've seen a company that spends tens of thousands of dollars a month on cloud infrastructure to run a site that serves a max of 50 concurrent users. The truth of the matter is that the production site could run just fine on any developer's laptop, and a one-time spend on a pair of geographically-dispersed dedicated servers would free up huge amounts of cash without any measurable impact, but the bosses won't feel very important if they acknowledge that. It boosts their self-image to have a big cloud bill and feel like a grown-up company because they're paying big invoices, plus the CxOs can prance around and tell everyone how forward-looking they are because they're "in the cloud".

It seems like the most common pitch is "cloud is usage-based billing" and people operate under some vague theory that this will translate to savings somewhere, but despite popular belief, most workloads are reasonably static and you're just going to pay a lot more for that static workload.

The fantasies of huge on-demand load are mostly a delusion of circular self-flattery, aggressively pushed by rent-seekers and eaten up all too eagerly by people who are supposed to be reasonable stewards and sometimes even dare to call themselves "engineers".

By all means set up the cloud stuff and have the account ready to take true ad-hoc resource demands, but the number of cases where AWS and friends are an actual net savings over real hardware is infinitesimal. Most companies would be much better off if they invested in owning at least the baseline 24x7 infrastructure.
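To make the "own the baseline" argument concrete, here's a back-of-envelope break-even check sketched in Python. Every figure is a hypothetical placeholder, not a real quote; plug in your own numbers:

```python
# Hypothetical figures for illustration only -- substitute your own quotes.
hardware_capex = 40_000.0   # one-time cost of owned baseline servers
colo_monthly = 800.0        # rack space, power, bandwidth per month
cloud_monthly = 9_000.0     # equivalent always-on cloud spend per month

# Months until owning the hardware beats renting the same capacity.
breakeven_months = hardware_capex / (cloud_monthly - colo_monthly)
print(f"Break-even after {breakeven_months:.1f} months")  # -> 4.9 months
```

With these (made-up) numbers the hardware pays for itself in under five months; the point is that for a static 24x7 workload, the break-even horizon is usually far shorter than the hardware's useful life.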

I guess the issue there is that since most companies don't really have the dynamic demand they imagine, if they actually used cloud providers for elasticity, they'd almost never use them and then they couldn't feel cool enough.

If you're a random guy, it's going to be cheaper and better to run on a Linode or small AWS instance than it will be to rent and stock a rack. If you have more than 5 employees, this is almost certainly not true.


In another case, I've seen a company that spends tens of thousands of dollars a month on cloud infrastructure to run a site that serves a max of 50 concurrent users. The truth of the matter is that the production site could run just fine on any developer's laptop, and a one-time spend on a pair of geographically-dispersed dedicated servers would free up huge amounts of cash without any measurable impact,

I find it hard to believe that, even if I went out of my way to throw in every single bit of AWS technology I know, I could architect a system with only 50 customers that costs that much. I could build that with a pair of EC2 servers, a hosted database, a load balancer, and an autoscaling group with a min/max of 2 for HA. That includes multi-AZ redundancy; multi-region redundancy would double the price. It couldn’t possibly cost more than $500 a month.
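To sanity-check that ballpark, the architecture described above can be priced out line by line. The hourly rates below are rough illustrative assumptions, not current AWS list prices; check the official pricing pages for real figures:

```python
# Illustrative on-demand rates (assumed, NOT official AWS pricing).
HOURS = 730  # average hours in a month

monthly = {
    "ec2_app_servers_x2":    2 * 0.0416 * HOURS,  # pair of small instances in an ASG
    "rds_multi_az_database": 0.136 * HOURS,       # hosted database, multi-AZ
    "load_balancer":         0.0225 * HOURS + 10.0,  # base hourly rate plus some LCU usage
}

total = sum(monthly.values())
for item, cost in monthly.items():
    print(f"{item:24s} ${cost:8.2f}")
print(f"{'total':24s} ${total:8.2f}")
```

Even with generous padding for bandwidth and storage, a stack like this lands well under the $500/month figure, which is what makes a five-figure monthly bill for 50 users so hard to explain without architectural excess.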


Here the software side of the fad rears its head: there are about four dozen microservices involved, each with its own RDS instance, load balancer, the works. 5-6 different implementation languages were used and a large number depend on the JVM or other memory-hungry runtimes. There are a couple of so-called "data analysts" who don't really know what they're doing, never produce anything, and spend lots of money on EMR et al. Buzzwords abound.

The workload is containerized and orchestrated (of course, since a company so self-conscious about its tech fashions would never not be) but one can only increase the density so far, and obviously optimizing the infrastructure spend on "sexy cloud stuff" hasn't been the top priority.

Even hinting that hardware may be appropriate for a certain use case will bring out the bean counters in force. At a third company, I almost gave the "Global VP of Cloud Computing" an aneurysm by suggesting that there may be a use for some of the tens of millions of dollars of hardware that they'd recently purchased. In shock and disbelief, he shouted "What, now you're talking hybrid cloud?!" I said "if that's what you want to call it" as the rest of the room jumped to inform me that the R&D departments at the cloud providers ensure customers will always be using the latest datacenter technology, hastening to add that Microsoft is building a datacenter underwater somewhere, and thus it's a lost cause for anyone to run their own hardware. Some of the shadier cronies in the room chimed in to add that the hardwareless course of action had been confirmed as the ideal by both IBM and Accenture in studies commissioned by the VP.

Cloud resources are a useful tool in the toolbox, but as an industry, we have gone way overboard and lost all reason. At some point, when cloud inevitably loses its shine, the bubble must pop. If you're in the market for server hardware, this is a great time to buy.


Crazy costs for simple stuff can easily happen with on-premises systems as well - I once had an in-house infrastructure team quote £70K for infrastructure to host a single static HTML page that would be accessed by about 10 people.

There was even a kind of daft logic to their costing - didn't make it any less crazy.


If you do your cloud migration as a pure lift and shift, without changing your processes or people (retrain, reduce, and automate), it will always cost more. The problem is that too many AWS consultants are just old-school netops people who watched one ACloudGuru training video, passed a multiple-choice certification, and can click around in a GUI and replicate an on-prem architecture.

I’ve never met any who come from a development or DevOps background and also know the netops side.


What could you do with your private server room if you were willing to spend that much time and money, though?


Well, seeing that there are only 24 hours in a day and that I refuse to work more than 40-45 hours a week....

There are two parts to any implementation: the parts that only you or your company can do (i.e., turning custom business requirements into code) and the parts that anyone can do, the "undifferentiated heavy lifting" like maintaining standard servers. Why would I spend energy doing the latter instead of focusing on the former?

If I have an idea, how fast can I stand up and configure the resources I need with a private server room as compared to running a CloudFormation Template? What about maintenance and upgrades?

How many people would our company have to hire to babysit our infrastructure? Should we also hire someone overseas to set up a colo for our developers there so they don’t have to deal with the latency?


We are talking about a situation where you already have a server room and employees.

Typically what I've seen is that the developers are being starved out for resources in the on-prem hardware, and no amount of complaining or yelling or saber-rattling seems to do anything about it. But along comes cloud and we are willing to spend many times more money. The devs are happy because they can spin up hardware and apologize later, which feels really good until you find out people are spinning up more hardware instead of fixing an n^2 problem or something equally dumb in their code (like slamming a server with requests that always return 404).


We are talking about a situation where you already have a server room and employees.

And by “changing your processes” I guess I should also include “changing your people”. Automate the processes where you can, reduce headcount, and migrate to managed services where it makes sense.

The devs are happy because they can spin up hardware and apologize later, which feels really good until you find out people are spinning up more hardware instead of fixing an n^2 problem or something equally dumb in their code (like slamming a server with requests that always return 404).

I hate to say it, but throwing hardware at a problem long enough to get customers, prove the viability of an implementation, and (in the startup world) get to the next round of funding or go public, and only then optimizing, is not always the wrong answer - see Twitter.

But if you have bad developers, they could also come up with less optimal solutions on-prem and cause you to spend more.

With proper tagging, it’s easy to know where to point the finger when the bill arrives.
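As a sketch of what tag-based attribution looks like in practice, here's a minimal Python pass over a made-up billing export, grouping line-item cost by a hypothetical "team" tag (the data and tag name are invented for illustration; a real cost/usage report has the same shape):

```python
from collections import defaultdict

# Hypothetical billing line items, as you'd get from a cost/usage export.
line_items = [
    {"service": "AmazonEC2", "cost": 412.50,  "tags": {"team": "search"}},
    {"service": "AmazonRDS", "cost": 980.00,  "tags": {"team": "search"}},
    {"service": "AmazonEMR", "cost": 2150.00, "tags": {"team": "analytics"}},
    {"service": "AmazonEC2", "cost": 75.25,   "tags": {}},  # untagged spend
]

# Roll up cost per team; anything without the tag lands in "untagged".
cost_by_team = defaultdict(float)
for item in line_items:
    team = item["tags"].get("team", "untagged")
    cost_by_team[team] += item["cost"]

for team, cost in sorted(cost_by_team.items(), key=lambda kv: -kv[1]):
    print(f"{team:12s} ${cost:9.2f}")
```

The "untagged" bucket is the tell: if it dominates the report, the tagging policy isn't being enforced and the finger-pointing doesn't work.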



