> Designing scalable systems when you don't need to makes you a bad engineer.
> In general, RDBMS > NoSql
These two bullet points resonate with me so much right now. I'm a consultant, and a lot of my clients absolutely insist on using DynamoDB for everything. I'm building an internal-facing app that will have users numbering in the hundreds, maybe. The hoops we are jumping through to break this app up into "microservices" are absolutely astounding. Who needs joins? Who needs relational integrity? Who needs flexible query patterns? "It just has to scale"!
As an engineer-turned-manager, I spend a lot of time asking engineers how we can simplify their ambitious plans. Often it’s as simple as asking “What would we give up by using a monolith here instead of microservices?”
Forcing people to justify, out loud, why they want to use a specific technology or trendy design pattern is usually sufficient to scuttle complex plans.
Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume, even if it’s not necessarily best for the company. It doesn’t help that some companies screen out resumes that don’t have the right signals (Microservices, ReactJS, NoSQL, ...). There’s a certain amount of FOMO that makes early-career engineers feel like they won’t be able to move up unless they can find a way to use the most advanced and complex architectures, even if their problems don’t warrant those solutions.
>Forcing people to justify, out loud, why they want to use a specific technology or trendy design pattern is usually sufficient to scuttle complex plans.
Does that really work?
Usually these guys have read the sales pitch from some credible source. Then you need to show them that the argument is "X works really well for scenario Y", but your scenario Z is not really similar to Y, so the reasons X is good for Y don't really apply. To do that you usually have to rely on experience, so you need to expand even further.
And the other side is usually attached to their proposal and starts pushing back. Because you're the guy arguing against something and need a deep discussion to prove your point, chances are people give up and you end up looking hostile. Even if you win you don't really look good - you just shut someone down and spent a lot of time arguing. Unless the rest of the team was already against the idea, you'll just look bad.
I just don't bother - if I'm in a situation where someone gives these kinds of people decision power, they deserve what they get - I get paid either way. And if I have the decision-making power I just shut it down without much discussion - I invoke some version of 'I know this approach works and that's good enough for me'.
Yeah, god help you if a higher-up is a zealot about a technology. They will suggest it at every opportunity, and arguing against it makes you stand out like a sore thumb; after a while you wonder why you even bother.
> Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume
The sad thing is, they might well be right.
People used to not get hired for a job involving MySQL because their DB experience was with Postgres, but usually more enlightened employers knew better. Today, every major cloud provider offers the basic stuff like VMs and managed databases and scalable storage, and the differences between them are mostly superficial. However, each provider has its own terminology and probably its own dashboard and CLI and config files. Some of them offer additional services that manage more of the infrastructure for you one way or another, too. There is seemingly endless scope for not having some specific combination of buzzwords on an application even for a candidate and a position that are a good fit.
I don’t envy the generation who are applying for relatively junior positions with most big name employers today, and I can hardly blame them for the kind of job-hopping, résumé driven development that seems to have become the norm in some areas.
Agreed, I found it really hard to get good roles 5 years ago. Then I worked on some cool shiny stuff - in general I don't like microservices, k8s, or React/JS, but it opens up a whole new world of jobs.
> As an engineer-turned-manager, I spend a lot of time asking engineers how we can simplify their ambitious plans. Often it’s as simple as asking “What would we give up by using a monolith here instead of microservices?”
Funny you mentioned this. I have the exact opposite problem.
That is, I am an engineer trying to push back against management mandating the use of microservices and microfrontends because they are the new “hot” tech nowadays.
On my reading, this is the exact same problem, not the exact opposite problem. The break-even bar for a reasonable monolith is a lot lower than for microservices, so the GP's question is specifically asking, under a hypothetical where the team simply uses a monolith, what benefits the team would miss out on relative to microservices. If there are none, or they aren't relevant to the project scenario, then microservices probably isn't justifiable.
(I, too, am in the position of pushing back against microservices for hotness' sake.)
This. I'm a consultant, and 90% of the time the technology has already been decided - before a line of code has been written - by our fancy management team, who haven't written code in 10+ years. But they know the buzzwords like the rest of us, and they know the buzzwords sell.
Problem is, they no longer have to implement, so they are even more inclined to sell the most complicated tech stacks, the ones with marketing pages claiming they scale to basically infinity.
In my company we store financial data for hundreds of thousands of clients in a SQL DB. It's a decade-old system and we have hundreds of tables, stored procedures (some touching a dozen-plus tables), and we rely on transactions.
It took me weeks to convince my managers not to migrate to the new hot NoSQL solution because "it's in the cloud, it's scalable and it also supports SQL queries".
> Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume, even if it’s not necessarily best for the company.
Probably nobody is using NoSQL for their resume. It's because picking a relational database, while usually the correct choice, is HARD when you're operating in an environment that changes quickly and has poorly defined specifications.
When you start seeing engineers have difficulty reasoning about what the data model should be and nobody willing to commit to one, it's the clearest sign that organizationally things are sour and you need to start having very firm and precise conversations with product.
I'm facing this issue now. App is supposed to deliver to clients after this sprint - and the data model still isn't locked down. After arguing through about 10 hours worth of meetings this week, I think I need a new job.
Personally, when I want speed or easy upkeep and intend to do dumb simple things.
Postgres is more featureful, but if you don't intend on using those features, MySQL is consistently faster and historically smoother to update and keep running.
Also in the Enterprise, if you're doing a lot of sharding and replication across networks, Percona MySQL is a very compelling product. I say that as a Postgres diehard.
Traditionally it was because you needed replication or sharding that you didn't have to boil half an ocean for, or at least half-decent full-text indices. These days, however, I believe the differences are smaller and in other areas.
Most often, you choose a database because of what your application supports and is tested with, not the other way around. Or what your other applications already use. Complete greenfields aren't all that common.
> It's because picking a relational database, while usually the correct choice, is HARD when you're operating in an environment that changes quickly and has poorly defined specifications.
Wouldn't this apply if you were using a statically typed language too? What's harder about changing the schema in the DB?
You (mostly) don't have to deal with data migrations with statically typed languages. Releasing a new version of some code is usually easier than making structural changes to a database that's in active use.
> Releasing a new version of some code is usually easier than making structural changes to a database that's in active use.
Yes, and on top of that, code-only changes need to be internally consistent to make sense but DB schema changes almost inevitably require some corresponding code change as well to be useful. Then you have all the fun of trying to deploy both changes while keeping everything in sync and working throughout.
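To make that concrete, the usual workaround is the expand/contract pattern, and even that is several coordinated steps interleaved with deploys. A toy sketch (SQLite just to keep it self-contained; the table and column rename are made up for illustration):

```python
# Sketch of the "expand/contract" dance described above, using a column rename
# (username -> login) as the toy example. Each SQL step has to be interleaved
# with a code deploy, which is exactly the coordination cost being discussed.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
db.execute("INSERT INTO users (username) VALUES ('alice')")

# 1. Expand: add the new column. Old code keeps working, new code writes both.
db.execute("ALTER TABLE users ADD COLUMN login TEXT")

# 2. Backfill while the app is live.
db.execute("UPDATE users SET login = username WHERE login IS NULL")

# 3. Deploy code that reads/writes only `login`...
# 4. Contract: only now is it safe to drop the old column.
db.execute("ALTER TABLE users DROP COLUMN username")   # needs SQLite >= 3.35

print(db.execute("SELECT id, login FROM users").fetchall())
```

With a code-only change you just ship the new version; here every step has to stay compatible with whatever code is running against the database at that moment.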
You've hit on something there as well, but essentially it comes down to forced rewrites and flexibility. We tend to choose the more flexible systems to avoid forced upfront work when changes are needed even when it's the wrong choice for the project in the long run.
Eh, not so much databases, but in terms of code, super flexible 'all things to all people' generic abstractions tend to be a lot more work and a lot more difficult to debug than a tight solution tailored to the problem it's solving, written in domain terminology.
If only I had a nickel for every hour I've spent debugging abstractions and indirection that were just there for the sake of adding flexibility that would never be needed.
I'm in agreement with you, actually. I was (badly) suggesting that that kind of up-front flexibility bites us in the ass later, and it's a bad impulse that's often followed.
Is every NoSQL database non-relational, and every relational database SQL? It sure seems to me that you could have a relational database without SQL. Something non-text could be nice. It might be a binary data format or compiled code.
One might even convert SQL to binary query data at build time, with a code generator. It could work like PIDL, the code generator Samba uses to convert DCE/RPC IDL files to C source with binary data. Binary data goes over the wire. Another way is that both client code and server code could be generated, with the server code getting linked into the database server.
If I were an evil tech giant, I would open source a bunch of libraries that require significantly more effort to use than necessary, and pitch them as the One True Solution. Just to slow my competitors down.
So I bought all the XP books, dog-eared them, and left them on my desk. My team nearly mutinied. I asked them to wait and see. Two weeks later, my nemesis announced his team was all in for XP, Agile, pair programming, etc.
They never recovered, didn't make another release.
Not a joke. When I worked at Pivotal Labs, sales / executives were very excited about the synergy between helping clients build microservice architectures and selling them Cloud Foundry.
The question one should ask is: what $hot_tech can we adopt without the product becoming significantly worse than with $old_tech? That is, which things do we adopt only or mostly to make the project or company attractive to work with or invest in?
Saying “why would you ever do that rather than building the solution as cheaply and with as little risk as possible” doesn’t fully appreciate the importance of attractiveness.
I’m selling my hours of work time for a salary, fun, and resume points. My employer pays me in all 3. I’ll always push for $fun_tech or $hot_tech despite it not always being in the short term interest of anyone but myself or my fellow developers. I’ll keep justifying this by “if we do this in $old_tech then me and half the team will leave, and that’s a higher risk than using $new_tech”.
(By tech I here mean things like languages and frameworks, not buzzwords like microservices, blockchain, AI, ...)
IMO it's a no-brainer to choose complex technology, the incentives are much more attractive.
Go the simple way: if it works, you get paid and you're off to the next project; if it doesn't work, it's your fault.
But on-the-job experience with scaling tech is valued way more than doing some online course; you get paid to learn by doing on company time, and you don't lose anything by possibly wasting company resources. So you tick the box that job postings often have - "proven track record of <insert scale buzz here>" - which could possibly lead to a much better salary. It's all incentives.
To me this is a little bit weird, because while OP is totally correct in that monoliths are totally fine too when it's the best tool for the job, the default should still be microservices. It's not really harder to use once you have the practice in place and advantages will usually be quite visible in time. But of course there are times when there are great monoliths you can just use and you should use them.
There are challenges with microservices that push me to build monoliths by default unless that's not viable.
Things that are trivial in monoliths are hard in microservices: error propagation, profiling, line-by-line debugging, log aggregation, orchestration, load balancing, health checking, and ACID transactions.
It can be done but requires more complex machinery and larger teams.
Do you mean “a monolith” or “the monolith?” The essential characteristic of monoliths is that you don’t get to start new ones for new projects.
The real skill of architecture is understanding everything your company has built before and finding the most graceful way to graft your new use case onto that. We get microservices proliferation because people don’t want to do this hard work.
I don't understand splitting an API into a bunch of "microservices" for scaling purposes. If all of the services are engaged for every request, they're not really scaled independently. You're just geographically isolating your code. It's still tightly coupled but now it has to communicate over http. Applications designed this way are flaming piles of garbage.
The idea is that you can scale different parts of the system at different rates to deal with bottlenecks. With a monolith, you have to deploy more instances of the entire monolith to scale it, and that’s if the monolith even allows for that approach. If you take the high load parts and factor them out into a scalable microservice, you can leave the rest of the system alone while scaling only the bottlenecks.
All of this is in the assumption you need to scale horizontally. With modern hardware most systems don’t need that scalability. But it’s one of those “but what if we strike gold” things, where systems will be designed for a fantasy workload instead of a realistic one, because it’s assumed to be hard to go from a monolith to a microservice if that fantasy workload ever presents itself (imho not that hard if you have good abstractions inside the monolith).
I understand how microservices work, but I'm referring to a specific kind of antipattern where an application is arbitrarily atomized into many small services in such a manner that there's zero scaling advantage. Imagine making every function in your application a service, as an extreme example.
This seems to be an example of a more general antipattern in software development, where a relatively large entity is broken down into multiple smaller entities for dogmatic reasons. The usual justification given is how much simpler each individual entity now is, glossing over the extra complexity introduced by integrating all of those separate entities.
Microservice architectures seem to be a recurring example of this phenomenon. Separating medium to long functions into shorter ones based on arbitrary metrics like line count or nesting depth is another.
Assuming every function is called the same amount of times and carries the same cost it would indeed be silly to cut up a system like that. But in the real world some parts of the system are called more often or carry a high cost of the execution. If you can scale those independently of the rest of the system, that is a definite advantage.
For me the antipattern poses itself when the cutting up into microservices is done as a general practice, without a clearly defined goal for each service to need to be separate.
(And by the way, I’ve seen a talk before about an application where the entire backend was functions in a function store, exactly as you described. The developer was enthusiastic about that architecture.)
> you have to deploy more instances of the entire monolith to scale it,
That's a common argument for microservices and one that I always thought was bunk.
What does that even mean? You have a piece of software that provides ten functions; running 100 instances of it is infeasible, but running 100 of one, 50 of three, and 10 of six is somehow not a problem?
That must be really the perfect margin call of some VSZ-hungry monstrosity. While not an impossible situation in theory, surely it can't be very common.
There are plenty of reasons to split an application but that seems unlikely at best.
I have seen multiple production systems, in multiple orgs, where "the monolith" provides somewhere in the region of 50-100 different things, has a pretty hefty footprint, and the only way to scale is to deploy more instances, then have systems in front of the array of monoliths sectioning off input to the monolith-for-this-data (sharding, but on the input side, if that makes sense).
In at least SOME of these cases, the monolith could have been broken up into a smaller number of front-end microservices, with a graph of microservices behind "the thing you talk to", for a smaller total deployed footprint.
But, I suspect that it requires that "the monolith" has been growing for 10+ years, as a monolith.
> imho not that hard if you have good abstractions inside the monolith
And that is the big if! The big advantage of microservices is that they force developers to think hard about the abstractions, and they can’t just reach over the border, breaking them, when they are in a hurry. With good engineers in a well-functioning organisation, that is of course superfluous, but those preconditions are unfortunately much rarer than they should be.
Especially true when the services are all stateless. If there isn’t a Conway-esque or scaling advantage to decoupling the deployment... don’t.
I had a fevered dream the other night where it turned out that the bulk of AWS’s electricity consumption was just marshaling and unmarshalling JSON, for no benefit.
I recently decided to benchmark some Azure services for... reasons.
Anyway, along this journey I discovered that it's surprisingly difficult to get an HTTPS JSON RPC call below 3ms latency even on localhost! It's mind-boggling how inefficient it actually is to encode every call through a bunch of layers, stuff it into a network stream, undo that on the other end, and then repeat on the way back.
Meanwhile, if you tick the right checkboxes on the infrastructure configuration, then a binary protocol between two Azure VMs can easily achieve a latency as low as 50 microseconds.
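If anyone wants to poke at this themselves, here's a rough stdlib-only sketch of the kind of measurement I mean (plain HTTP rather than HTTPS, so TLS isn't even included; the port and payload are made up, and absolute numbers will vary a lot by machine):

```python
# Minimal sketch comparing a localhost JSON-over-HTTP round trip with plain
# in-process JSON encode/decode. Illustrative only, not a rigorous benchmark.
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = {"id": 42, "items": list(range(100)), "name": "example"}

class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        out = json.dumps(json.loads(body)).encode()   # decode, then re-encode
        self.send_response(200)
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):                     # keep the output clean
        pass

server = HTTPServer(("127.0.0.1", 8099), EchoHandler)  # assumes port 8099 is free
threading.Thread(target=server.serve_forever, daemon=True).start()

N = 200
start = time.perf_counter()
for _ in range(N):
    req = urllib.request.Request(
        "http://127.0.0.1:8099/",
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
print(f"HTTP+JSON round trip: {(time.perf_counter() - start) / N * 1000:.2f} ms")

start = time.perf_counter()
for _ in range(N):
    json.loads(json.dumps(PAYLOAD))                   # serialization cost alone
print(f"In-process JSON only: {(time.perf_counter() - start) / N * 1000:.3f} ms")
server.shutdown()
```

Even this stripped-down version shows how much of the time goes into the layers around the call rather than the work itself.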
The first thing that comes to my mind is that there are different axes that you may need to scale against. Microservices are a common way to scale when you’re trying to increase the number of teams working on a project. Dividing across a service api allows different teams to use different technology and with different release schedules.
I don't necessarily disagree, but I believe that you have to be very careful about the boundaries between your services. In my experience, it's pretty difficult to separate an API into services arbitrarily before you've built a working system - at least for anything that has more than a trivial amount of complexity. If there's a good formula or rule of thumb for this problem, I'd like to know what it is.
I agree. From my perspective, microservices shouldn’t be a starting point. They should be something you carve out of a larger application as the need arises.
People always talk about NoSQL scaling better, but some of the largest websites on the internet are MySQL-based. I'm sure some people have problems where NoSQL is genuinely an appropriate solution, but I find it hard to believe that most people get anywhere near that level of scalability.
Exactly, and from a features standpoint Postgres can do everything Dynamo can do and so much more. I think a lot of software devs don't really know SQL or how RDBMS work so they don't know what they are giving up.
Postgres even has JSONB support, so if you really want to store whole documents NOSQL-style, you can - and you can still use all the usual RDBMS goodness alongside it!
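For anyone who hasn't tried it, a minimal sketch of what that looks like (assumes psycopg2 is installed and a local database named "appdb" you can create tables in; names are made up):

```python
# Sketch: store documents in a jsonb column while keeping normal SQL around them.
import json
import psycopg2

conn = psycopg2.connect("dbname=appdb")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id      bigserial PRIMARY KEY,
        user_id integer NOT NULL,      -- plain relational column
        payload jsonb   NOT NULL       -- the "document" part
    )
""")
# A GIN index makes containment queries on the document fast
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING gin (payload)")

cur.execute(
    "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
    (7, json.dumps({"action": "login", "device": "mobile", "tags": ["beta"]})),
)

# Query inside the document with @> (containment) and pull out a field with ->>
cur.execute(
    "SELECT user_id, payload->>'device' FROM events WHERE payload @> %s::jsonb",
    (json.dumps({"action": "login"}),),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```

And the jsonb column can still sit next to foreign keys, joins, and transactions like any other column.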
Those very large mysql deployments typically use it as a nosql system, with a sharded database spread over dozens or hundreds of instances, and referential integrity maintained by the business layer, not by the database.
For a good example of a high-volume site using a proper RDBMS approach I would look at Stack Overflow. It can run (and has run) on a single MS SQL Server instance.
Even if that's so, it still suggests RDBMSes are a good choice.
I do know that for Wikipedia, English Wikipedia is mostly a single master MySQL DB + slaves, with most of the sharding being at the site-language level (article text contents are stored elsewhere).
Truth be told, I have yet to see a reason to use an in-memory database. Data structures - maps/trees/sets - yes. Concurrent/lock-free/skip lists/whatever - all great. I don't need a relational database when I can use objects/structs/etc.
I think that depends on what you're doing with the data. If you're just grabbing one thing and working with it, or looping through and processing everything, maybe not.
But if you're doing more complicated query-like stuff, especially if you want to allow for queries you haven't thought of yet, then the DB might be useful.
Sometimes a hybrid of query-able metadata in a DB along with plain old data files is good.
That depends very much on your data, how much things key to each other, and what you're doing with it.
That's some kind of fallacy - standard data structures would totally destroy any SQL-like thing when it comes to performance (and memory footprint). I guess it does depend on where one's background comes from when it comes to convenience - or how people tend to see their data. However, like I said, for close to 3 decades I have not seen a single reason to do so. On the contrary, I've had cases where an optimization of 3 orders of magnitude was possible.
It's easier to find devs who know basic SQL than it is to find devs who know pandas or whatever your language specific SQL-like library is. And the more complicated the queries, the more the gulf widens.
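To make that concrete, here's the same little aggregation both ways (a toy sketch; assumes pandas is installed, the column and table names are made up):

```python
# Toy illustration of one query written in pandas vs. SQL (via stdlib sqlite3).
import sqlite3
import pandas as pd

df = pd.DataFrame({
    "dept":   ["eng", "eng", "sales", "sales", "sales", "hr"],
    "salary": [100,   120,   80,      90,      95,      70],
})

# pandas: average salary per department, only departments with more than one person
pandas_result = (
    df.groupby("dept")
      .filter(lambda g: len(g) > 1)
      .groupby("dept")["salary"]
      .mean()
)

# The equivalent SQL, which far more people can read at a glance
conn = sqlite3.connect(":memory:")
df.to_sql("employees", conn, index=False)
sql_result = conn.execute("""
    SELECT dept, AVG(salary)
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) > 1
""").fetchall()

print(pandas_result)
print(sql_result)
```

Once you add window functions or multi-table logic, the gap between "most devs can read this" and "only the pandas expert can read this" gets a lot wider.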
There is no D from ACID. For the D to happen, it takes transaction logs + write barriers (on non-volatile memory).
Doing atomic, consistent, and isolated is trivial in memory (esp. in a GC setup), and a lot faster: no locks needed.
Validations and constraints are simple if-statements; I'd never think of them as SQL.
It sounds like you're talking about toy databases which don't run at a lot of TPS. Let me point out some features missing from your simple load-a-map-into-memory architecture.
You also have to do backup and recovery. And for that, you need to write to disk, which becomes a big bottleneck, since besides backup and checkpointing there is no other reason to ever write to disk.
Then, you have to know that even in an in-memory database, data needs to be queried, and for that you need special data structures like a cache-aware B+tree. Implementing one is non-trivial.
Thirdly, doing atomic, consistent, and isolated transactions is certainly trivial in a toy example, but in an actual database where you have a high number of transactions it's a lot harder. For example, when you have multiple cores you will certainly have resource contention, and then you do need locks.
And a last thing about GC: again, GC is great, but there has to be a custom GC for a database. You need to make sure the transaction log in memory is flushed before committing. And malloc is also very slow.
I'd suggest reading more of the in-memory research to understand this better. But an in-memory DB is certainly not the same as a disk DB with a cache or a simple hash map/B+tree structure.
It's the actual garbage collection that might be expensive, but since that process deals with the fragmentation, there is no need to keep a data structure with available blocks of memory around.
That's also the reason why, depending on the patterns of memory usage, a GC can be faster than malloc+free.
Correct. So we're talking about in-memory databases like MongoDB, and all of the things I listed here are true of MongoDB. For example, MongoDB migrated its database memory manager away from mmap and towards a custom memory manager (the point being that GC and memory management for databases is not something you can just use JVM or operating system constructs for).
You _can_ have forms of durability if you wish to. You can get "good enough" (actually fairly impressive...) performance for most problems (vs only in-memory) with SQLite making memory the temp store, turning on synchronous and WAL. Then fsync only gets called at checkpoints and you have durability at the checkpoint.
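Something along these lines, roughly (a sketch; the pragmas are standard SQLite, the checkpoint cadence and filenames are up to you):

```python
# Sketch of the SQLite setup described above: WAL journaling, relaxed fsync,
# and in-memory temp storage. Durability then means "up to the last checkpoint"
# rather than per-transaction.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")      # append writes to a write-ahead log
conn.execute("PRAGMA synchronous=NORMAL")    # fsync at checkpoints, not every commit
conn.execute("PRAGMA temp_store=MEMORY")     # temp tables/indices live in RAM

conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
with conn:                                   # implicit transaction
    conn.executemany(
        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
        [("a", "1"), ("b", "2")],
    )

# Force a checkpoint when you want a durable point-in-time state on disk.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
conn.close()
```

For a lot of workloads that trade-off (near in-memory speed, durability at checkpoint granularity) is plenty.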
Oh, that’s nothing. My company took over a contract from another company that had two DBAs writing a schema to store approximately one hundred items in a database! We converted it to a JSON file.
Eh, the second one is probably the only point I was kind of meh on. You should almost always start with an RDBMS, and it will scale for most companies for a long long time, but for some workloads or levels of scale you're probably going to need to at least augment it with another storage system.
Are there other constraints that might make DynamoDB a good fit? For example, I made an app at a client. We could use RDS or we could use Dynamo. I went with Dynamo because it could fit our simple model. What’s more, it doesn’t get shut off nightly to save money the way the RDS systems do. This means we can work on it when people have to time-shift due to events in life, like having to pick up the kids.
The problem with NoSQL is that your simple model inevitably becomes more complex over time and then it doesn't work anymore.
Over the past decade I've realised that using an RDBMS is the right call basically 100% of the time. Now that pgsql has jsonb column types that work great, I cannot see why you would ever use a NoSQL DB, unless you are working at such crazy scale that Postgres wouldn't work. In 99.999% of cases people are not.
There are specific cases where a non-SQL database is better. Chances are, if you haven't hit problems you can't solve with a SQL database, you should be using a SQL database. Postgres is amazing and free; why would you use anything else?
Time series is one. Consider an application with 1,000 time series, one host, and 1,000 RPS. You are trivially looking at 1M writes per second per host. This usually requires something more than "[just] using an RDBMS".
A bit more context: high-velocity transactional systems (e.g. any e-commerce store with millions of users all trying to shop at the same time). I helped to build one such system 10 years ago; here is the presentation - https://qconlondon.com/london-2010/qconlondon.com/dl/qcon-lo...
We just ported a system that kept large amounts of data in postgres jsonb columns over to mongodb. The jsonb column approach worked fine until we scaled it beyond a certain point, and then it was an unending source of performance bottlenecks. The mongodb version is much faster.
In retrospect we should have gone with mongo from the start, but postgres was chosen because in 99% of circumstances it is good enough. It was the wrong decision for the right reasons.
Yep, I agree there are cases where mongodb will perform better. However, many use cases also require joins and the other goodness that relations provide.
So really the use case for mongo etc is 'very high performance requirements' AND 'does not require relations'.
Many projects may be OK with just one of those. But very few need to meet both of those constraints.
FWIW I've seen many cases which are sort of the opposite: great performance with mongodb, but then because of the lack of relations for a reporting feature (for example) performance completely plummets due to horrible hacks being done to query the data model that doesn't work with the schema, eventually requiring a rewrite to RDBMS. I would guess that this is much more common.
I found that for an EAV-type database, NoSQL is a much better match, as it doesn't require queries with a million joins. But that's a very specific case indeed.
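To show what I mean, here's the classic EAV shape: reassembling one "entity" into a row takes a join per attribute you want back (a toy sketch; sqlite3 is used only to keep it self-contained, and the table/attribute names are made up):

```python
# Sketch of why EAV gets painful in SQL: one self-join per attribute you want
# back as a column. In a document store the whole entity is a single fetch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attrs (entity_id INTEGER, name TEXT, value TEXT);
    INSERT INTO attrs VALUES
        (1, 'color',  'red'),
        (1, 'size',   'XL'),
        (1, 'weight', '300g');
""")

row = conn.execute("""
    SELECT c.value AS color, s.value AS size, w.value AS weight
    FROM attrs c
    JOIN attrs s ON s.entity_id = c.entity_id AND s.name = 'size'
    JOIN attrs w ON w.entity_id = c.entity_id AND w.name = 'weight'
    WHERE c.entity_id = 1 AND c.name = 'color'
""").fetchone()
print(row)   # ('red', 'XL', '300g') -- and the join list grows with every attribute
```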
At scale it’s a little bit more than a few bucks. Across the board, we spend hundreds of thousands on ec2 instances for dev/test, so turning them off at night when nobody uses them saves you quite a lot of money.
I can't speak to your specific use case, but I can tell you that a relatively small RDS instance is probably a lot more performant than you think. There is also "Aurora Serverless" now which I've just started to play with but might suit your needs.
As far as what makes Dynamo a good fit, I almost take the other approach and try to ask myself, what makes Postgres a bad fit? Postgres is so flexible and powerful that IMO you need a really good reason to walk away from that as the default.
Aurora wasn’t allowed at the time. The system is a simple stream-logging app. Wonderful for our use case. Dynamo is fine so far. Corp politics made the RDS instance annoying to pursue.