> Designing scalable systems when you don't need to makes you a bad engineer.
> In general, RDBMS > NoSql
These two bullet points resonate with me so much right now. I'm a consultant, and a lot of my clients absolutely insist on using DynamoDB for everything. I'm building an internal-facing app that will have users numbering in the hundreds, maybe. The hoops we are jumping through to break this app up into "microservices" are absolutely astounding. Who needs joins? Who needs relational integrity? Who needs flexible query patterns? "It just has to scale"!
As an engineer-turned-manager, I spend a lot of time asking engineers how we can simplify their ambitious plans. Often it’s as simple as asking “What would we give up by using a monolith here instead of microservices?”
Forcing people to justify, out loud, why they want to use a specific technology or trendy design pattern is usually sufficient to scuttle complex plans.
Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume, even if it’s not necessarily best for the company. It doesn’t help that some companies screen out resumes that don’t have the right signals (Microservices, ReactJS, NoSQL, ...). There’s a certain amount of FOMO that makes early-career engineers feel like they won’t be able to move up unless they can find a way to use the most advanced and complex architectures, even if their problems don’t warrant those solutions.
>Forcing people to justify, out loud, why they want to use a specific technology or trendy design pattern is usually sufficient to scuttle complex plans.
Does that really work?
Usually these guys have read the sales pitch from some credible source. Then you need to show them that the argument is "X works really well for scenario Y", but your scenario Z is not really similar to Y, so the reasons X is good for Y don't really apply. To do that you usually have to rely on experience, so you need to expand even further.
And the other side is usually attached to their proposal and starts pushing back. Because you're the guy arguing against something and need a deep discussion to prove your point, chances are people give up and you end up looking hostile. Even if you win you don't really look good - you just shut someone down and spent a lot of time arguing. Unless the rest of the team was already against the idea, you'll just look bad.
I just don't bother - if I'm in a situation where someone gives these kinds of people decision power, they deserve what they get - I get paid either way. And if I have the decision-making power I just shut it down without much discussion - I invoke some version of 'I know this approach works and that's good enough for me'.
Yeah, god help you if a higher-up is a zealot about a technology. They will suggest it at every opportunity, and arguing against it makes you stand out like a sore thumb; after a while you wonder why you even bother.
> Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume
The sad thing is, they might well be right.
People used to not get hired for a job involving MySQL because their DB experience was with Postgres, but usually more enlightened employers knew better. Today, every major cloud provider offers the basic stuff like VMs and managed databases and scalable storage, and the differences between them are mostly superficial. However, each provider has its own terminology and probably its own dashboard and CLI and config files. Some of them offer additional services that manage more of the infrastructure for you one way or another, too. There is seemingly endless scope for not having some specific combination of buzzwords on an application even for a candidate and a position that are a good fit.
I don’t envy the generation who are applying for relatively junior positions with most big name employers today, and I can hardly blame them for the kind of job-hopping, résumé driven development that seems to have become the norm in some areas.
Agreed, I found it really hard to get good roles 5 years ago. Then I worked on some cool shiny stuff - in general I don't like microservices, k8s, or React/JS, but it opens up a whole new world of jobs.
> As an engineer-turned-manager, I spend a lot of time asking engineers how we can simplify their ambitious plans. Often it’s as simple as asking “What would we give up by using a monolith here instead of microservices?”
Funny you mentioned this. I have the exact opposite problem.
That is, I am an engineer trying to push back against management mandating the use of microservices and microfrontends because they are the new “hot” tech nowadays.
On my reading, this is the exact same problem, not the exact opposite problem. The break-even bar for a reasonable monolith is a lot lower than for microservices, so the GP's question is specifically asking, under a hypothetical where the team simply uses a monolith, what benefits the team would miss out on relative to microservices. If there are none, or they aren't relevant to the project scenario, then microservices probably isn't justifiable.
(I, too, am in the position of pushing back against microservices for hotness' sake.)
This. I'm a consultant, and 90% of the time the technology has already been decided - before a line of code has been written - by our fancy management team, who haven't written code in 10+ years. But they know the buzzwords like the rest of us, and they know the buzzwords sell.
Problem is, they no longer have to implement, so they are even more inclined to sell the most complicated tech stacks, the ones with marketing pages claiming they scale to basically infinity.
In my company we store financial data for hundreds of thousands of clients in a SQL DB. It's a decade-old system and we have hundreds of tables, stored procedures (some touching a dozen-plus tables), and we rely on transactions.
It took me weeks to convince my managers not to migrate to the new hot NoSQL solution because "it's in the cloud, it's scalable and it also supports SQL queries".
> Frankly, many engineers want to use the latest trends like microservices or NoSQL because they believe that’s what’s best for their resume, even if it’s not necessarily best for the company.
Probably nobody is using NoSQL for their resume. It's because picking a relational database, while usually the correct choice, is HARD when you're operating in an environment that changes quickly and has poorly defined specifications.
When you start seeing engineers have difficulty reasoning about what the data model should be and nobody willing to commit to one, it's the clearest sign that organizationally things are sour and you need to start having very firm and precise conversations with product.
I'm facing this issue now. App is supposed to deliver to clients after this sprint - and the data model still isn't locked down. After arguing through about 10 hours worth of meetings this week, I think I need a new job.
Personally, when I want speed or easy upkeep and intend to do dumb simple things.
Postgres is more featureful, but if you don't intend on using those features, MySQL is consistently faster and historically smoother to update and keep running.
Also in the Enterprise, if you're doing a lot of sharding and replication across networks, Percona MySQL is a very compelling product. I say that as a Postgres diehard.
Traditionally it was because you needed replication or sharding that you didn't have to boil half an ocean for, or at least half-decent full-text indices. These days, however, I believe the differences are smaller and in other areas.
Most often, you choose a database because of what your application supports and is tested with, not the other way around. Or what your other applications already use. Complete greenfields aren't all that common.
> It's because picking a relational database, while usually the correct choice, is HARD when you're operating in an environment that changes quickly and has poorly defined specifications.
Wouldn't this apply if you were using a statically typed language too? What's harder about changing the schema in the DB?
You (mostly) don't have to deal with data migrations with statically typed languages. Releasing a new version of some code is usually easier than making structural changes to a database that's in active use.
> Releasing a new version of some code is usually easier than making structural changes to a database that's in active use.
Yes, and on top of that, code-only changes need to be internally consistent to make sense but DB schema changes almost inevitably require some corresponding code change as well to be useful. Then you have all the fun of trying to deploy both changes while keeping everything in sync and working throughout.
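To make that concrete, the usual workaround is the expand/contract pattern, and even that is several coordinated steps interleaved with deploys. A toy sketch (SQLite just to keep it self-contained; the table and column rename are made up for illustration):

```python
# Sketch of the "expand/contract" dance described above, using a column rename
# (username -> login) as the toy example. Each SQL step has to be interleaved
# with a code deploy, which is exactly the coordination cost being discussed.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
db.execute("INSERT INTO users (username) VALUES ('alice')")

# 1. Expand: add the new column. Old code keeps working, new code writes both.
db.execute("ALTER TABLE users ADD COLUMN login TEXT")

# 2. Backfill while the app is live.
db.execute("UPDATE users SET login = username WHERE login IS NULL")

# 3. Deploy code that reads/writes only `login`...
# 4. Contract: only now is it safe to drop the old column.
db.execute("ALTER TABLE users DROP COLUMN username")   # needs SQLite >= 3.35

print(db.execute("SELECT id, login FROM users").fetchall())
```

With a code-only change you just ship the new version; here every step has to stay compatible with whatever code is running against the database at that moment.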
You've hit on something there as well, but essentially it comes down to forced rewrites and flexibility. We tend to choose the more flexible systems to avoid forced upfront work when changes are needed even when it's the wrong choice for the project in the long run.
Eh, not so much databases, but in terms of code, super flexible 'all things to all people' generic abstractions tend to be a lot more work and a lot more difficult to debug than a tight solution tailored to the problem it's solving, written in domain terminology.
If only I had a nickel for every hour I've spent debugging abstractions and indirection that were just there for the sake of adding flexibility that would never be needed.
I'm in agreement with you, actually. I was (badly) suggesting that that kind of up-front flexibility bites us in the ass later, and it's a bad impulse that's often followed.
Is every NoSQL database non-relational, and every relational database SQL? It sure seems to me that you could have a relational database without SQL. Something non-text could be nice. It might be a binary data format or compiled code.
One might even convert SQL to binary query data at build time, with a code generator. It could work like PIDL, the code generator Samba uses to convert DCE/RPC IDL files to C source with binary data. Binary data goes over the wire. Another way is that both client code and server code could be generated, with the server code getting linked into the database server.
If I were an evil tech giant, I would open source a bunch of libraries that require significantly more effort to use than necessary, and pitch them as the One True Solution. Just to slow my competitors down.
So I bought all the XP books, dog-eared them, and left them on my desk. My team nearly mutinied. I asked them to wait and see. Two weeks later, my nemesis announced his team was all in for XP, Agile, pair programming, etc.
They never recovered, didn't make another release.
Not a joke. When I worked at Pivotal Labs, sales / executives were very excited about the synergy between helping clients build microservice architectures and selling them Cloud Foundry.
The question one should ask is: what $hot_tech can we adopt without the product becoming significantly worse than with $old_tech? That is, which things do we adopt only or mostly to make the project or company attractive to work with or invest in?
Saying “why would you ever do that rather than building the solution as cheaply and with as little risk as possible” doesn’t fully appreciate the importance of attractiveness.
I’m selling my hours of work time for a salary, fun, and resume points. My employer pays me in all 3. I’ll always push for $fun_tech or $hot_tech despite it not always being in the short term interest of anyone but myself or my fellow developers. I’ll keep justifying this by “if we do this in $old_tech then me and half the team will leave, and that’s a higher risk than using $new_tech”.
(By tech I here mean things like languages and frameworks, not buzzwords like microservices, blockchain, AI, ...)
IMO it's a no-brainer to choose complex technology, the incentives are much more attractive.
Go the simple way: if it works, you get paid and you're off to the next project; if it doesn't work, it's your fault.
But on-the-job experience with scaling tech is valued way more than doing some online course; you get paid to learn by doing on company time, and you don't lose anything by possibly wasting company resources. So you tick the box that job postings often have - "proven track record of <insert scale buzz here>" - which could possibly lead to a much better salary. It's all incentives.
To me this is a little bit weird, because while OP is totally correct in that monoliths are totally fine too when it's the best tool for the job, the default should still be microservices. It's not really harder to use once you have the practice in place and advantages will usually be quite visible in time. But of course there are times when there are great monoliths you can just use and you should use them.
There are challenges with microservices that push me to build monoliths by default unless that's not viable.
Things that are trivial in monoliths are hard in microservices: error propagation, profiling, line-by-line debugging, log aggregation, orchestration, load balancing, health checking, and ACID transactions.
It can be done but requires more complex machinery and larger teams.
Do you mean “a monolith” or “the monolith?” The essential characteristic of monoliths is that you don’t get to start new ones for new projects.
The real skill of architecture is understanding everything your company has built before and finding the most graceful way to graft your new use case onto that. We get microservices proliferation because people don’t want to do this hard work.
I don't understand splitting an API into a bunch of "microservices" for scaling purposes. If all of the services are engaged for every request, they're not really scaled independently. You're just geographically isolating your code. It's still tightly coupled but now it has to communicate over http. Applications designed this way are flaming piles of garbage.
The idea is that you can scale different parts of the system at different rates to deal with bottlenecks. With a monolith, you have to deploy more instances of the entire monolith to scale it, and that’s if the monolith even allows for that approach. If you take the high load parts and factor them out into a scalable microservice, you can leave the rest of the system alone while scaling only the bottlenecks.
All of this is in the assumption you need to scale horizontally. With modern hardware most systems don’t need that scalability. But it’s one of those “but what if we strike gold” things, where systems will be designed for a fantasy workload instead of a realistic one, because it’s assumed to be hard to go from a monolith to a microservice if that fantasy workload ever presents itself (imho not that hard if you have good abstractions inside the monolith).
I understand how microservices work, but I'm referring to a specific kind of antipattern where an application is arbitrarily atomized into many small services in such a manner that there's zero scaling advantage. Imagine making every function in your application a service, as an extreme example.
This seems to be an example of a more general antipattern in software development, where a relatively large entity is broken down into multiple smaller entities for dogmatic reasons. The usual justification given is how much simpler each individual entity now is, glossing over the extra complexity introduced by integrating all of those separate entities.
Microservice architectures seem to be a recurring example of this phenomenon. Separating medium to long functions into shorter ones based on arbitrary metrics like line count or nesting depth is another.
Assuming every function is called the same amount of times and carries the same cost it would indeed be silly to cut up a system like that. But in the real world some parts of the system are called more often or carry a high cost of the execution. If you can scale those independently of the rest of the system, that is a definite advantage.
For me the antipattern poses itself when the cutting up into microservices is done as a general practice, without a clearly defined goal for each service to need to be separate.
(And by the way, I’ve seen a talk before about an application where the entire backend was functions in a function store, exactly as you described. The developer was enthusiastic about that architecture.)
> you have to deploy more instances of the entire monolith to scale it,
That's a common argument for microservices and one that I always thought was bunk.
What does that even mean? You have a piece of software that provides ten functions; running 100 instances of it is infeasible, but running 100 of one, 50 of three, and 10 of six is somehow not a problem?
That must be really the perfect margin call of some VSZ-hungry monstrosity. While not an impossible situation in theory, surely it can't be very common.
There are plenty of reasons to split an application but that seems unlikely at best.
I have seen multiple production systems, in multiple orgs, where "the monolith" provides somewhere in the region of 50-100 different things, has a pretty hefty footprint, and the only way to scale is to deploy more instances, then have systems in front of the array of monoliths sectioning off input to the monolith-for-this-data (sharding, but on the input side, if that makes sense).
In at least SOME of these cases, the monolith could have been broken up into a smaller number of front-end microservices, with a graph of microservices behind "the thing you talk to", for a smaller total deployed footprint.
But, I suspect that it requires that "the monolith" has been growing for 10+ years, as a monolith.
> imho not that hard if you have good abstractions inside the monolith
And that is the big if! The big advantage of microservices is that they force developers to think hard about the abstractions, and they can’t just reach over the border, breaking them, when they are in a hurry. With good engineers in a well-functioning organisation, that is of course superfluous, but those preconditions are unfortunately much rarer than they should be.
Especially true when the services are all stateless. If there isn’t a Conway-esque or scaling advantage to decoupling the deployment... don’t.
I had a fevered dream the other night where it turned out that the bulk of AWS’s electricity consumption was just marshaling and unmarshalling JSON, for no benefit.
I recently decided to benchmark some Azure services for... reasons.
Anyway, along this journey I discovered that it's surprisingly difficult to get an HTTPS JSON RPC call below 3ms latency even on localhost! It's mind-boggling how inefficient it actually is to encode every call through a bunch of layers, stuff it into a network stream, undo that on the other end, and then repeat on the way back.
Meanwhile, if you tick the right checkboxes on the infrastructure configuration, then a binary protocol between two Azure VMs can easily achieve a latency as low as 50 microseconds.
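If anyone wants to poke at this themselves, here's a rough stdlib-only sketch of the kind of measurement I mean (plain HTTP rather than HTTPS, so TLS isn't even included; the port and payload are made up, and absolute numbers will vary a lot by machine):

```python
# Minimal sketch comparing a localhost JSON-over-HTTP round trip with plain
# in-process JSON encode/decode. Illustrative only, not a rigorous benchmark.
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = {"id": 42, "items": list(range(100)), "name": "example"}

class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        out = json.dumps(json.loads(body)).encode()   # decode, then re-encode
        self.send_response(200)
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):                     # keep the output clean
        pass

server = HTTPServer(("127.0.0.1", 8099), EchoHandler)  # assumes port 8099 is free
threading.Thread(target=server.serve_forever, daemon=True).start()

N = 200
start = time.perf_counter()
for _ in range(N):
    req = urllib.request.Request(
        "http://127.0.0.1:8099/",
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
print(f"HTTP+JSON round trip: {(time.perf_counter() - start) / N * 1000:.2f} ms")

start = time.perf_counter()
for _ in range(N):
    json.loads(json.dumps(PAYLOAD))                   # serialization cost alone
print(f"In-process JSON only: {(time.perf_counter() - start) / N * 1000:.3f} ms")
server.shutdown()
```

Even this stripped-down version shows how much of the time goes into the layers around the call rather than the work itself.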
The first thing that comes to my mind is that there are different axes that you may need to scale against. Microservices are a common way to scale when you’re trying to increase the number of teams working on a project. Dividing across a service api allows different teams to use different technology and with different release schedules.
I don't necessarily disagree, but I believe that you have to be very careful about the boundaries between your services. In my experience, it's pretty difficult to separate an API into services arbitrarily before you've built a working system - at least for anything that has more than a trivial amount of complexity. If there's a good formula or rule of thumb for this problem, I'd like to know what it is.
I agree. From my perspective, microservices shouldn’t be a starting point. They should be something you carve out of a larger application as the need arises.
People always talk about NoSQL scaling better, but some of the largest websites on the internet are MySQL-based. I'm sure some people have problems where NoSQL is genuinely an appropriate solution, but I find it hard to believe that most people get anywhere near that level of scalability.
Exactly, and from a features standpoint Postgres can do everything Dynamo can do and so much more. I think a lot of software devs don't really know SQL or how RDBMS work so they don't know what they are giving up.
Postgres even has JSONB support, so if you really want to store whole documents NOSQL-style, you can - and you can still use all the usual RDBMS goodness alongside it!
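For anyone who hasn't tried it, a minimal sketch of what that looks like (assumes psycopg2 is installed and a local database named "appdb" you can create tables in; names are made up):

```python
# Sketch: store documents in a jsonb column while keeping normal SQL around them.
import json
import psycopg2

conn = psycopg2.connect("dbname=appdb")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id      bigserial PRIMARY KEY,
        user_id integer NOT NULL,      -- plain relational column
        payload jsonb   NOT NULL       -- the "document" part
    )
""")
# A GIN index makes containment queries on the document fast
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING gin (payload)")

cur.execute(
    "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
    (7, json.dumps({"action": "login", "device": "mobile", "tags": ["beta"]})),
)

# Query inside the document with @> (containment) and pull out a field with ->>
cur.execute(
    "SELECT user_id, payload->>'device' FROM events WHERE payload @> %s::jsonb",
    (json.dumps({"action": "login"}),),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```

And the jsonb column can still sit next to foreign keys, joins, and transactions like any other column.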
Those very large mysql deployments typically use it as a nosql system, with a sharded database spread over dozens or hundreds of instances, and referential integrity maintained by the business layer, not by the database.
For a good example of a high-volume site using a proper RDBMS approach I would look at Stack Overflow. It can run (and has run) on a single MS SQL Server instance.
Even if that's so, it still suggests RDBMSes are a good choice.
I do know that for Wikipedia, English Wikipedia is mostly a single master MySQL DB + slaves, with most of the sharding being at the site-language level (article text contents are stored elsewhere).
Truth be told, I have yet to see a reason to use an in-memory database. Data structures - maps/trees/sets - yes. Concurrent/lock-free/skip lists/whatever - all great. I don't need a relational database when I can use objects/structs/etc.
I think that depends on what you're doing with the data. If you're just grabbing one thing and working with it, or looping through and processing everything, maybe not.
But if you're doing more complicated query-like stuff, especially if you want to allow for queries you haven't thought of yet, then the DB might be useful.
Sometimes a hybrid of query-able metadata in a DB along with plain old data files is good.
That depends very much on your data, how much things key to each other, and what you're doing with it.
That's some kind of fallacy - standard data structures would totally destroy any SQL-like thing when it comes to performance (and memory footprint). I guess it does depend on where one's background comes from when it comes to convenience - or how people tend to see their data. However, like I said, for close to 3 decades I have not seen a single reason to do so. On the contrary, I've had cases where an optimization of 3 orders of magnitude was possible.
It's easier to find devs who know basic SQL than it is to find devs who know pandas or whatever your language specific SQL-like library is. And the more complicated the queries, the more the gulf widens.
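To make that concrete, here's the same little aggregation both ways (a toy sketch; assumes pandas is installed, the column and table names are made up):

```python
# Toy illustration of one query written in pandas vs. SQL (via stdlib sqlite3).
import sqlite3
import pandas as pd

df = pd.DataFrame({
    "dept":   ["eng", "eng", "sales", "sales", "sales", "hr"],
    "salary": [100,   120,   80,      90,      95,      70],
})

# pandas: average salary per department, only departments with more than one person
pandas_result = (
    df.groupby("dept")
      .filter(lambda g: len(g) > 1)
      .groupby("dept")["salary"]
      .mean()
)

# The equivalent SQL, which far more people can read at a glance
conn = sqlite3.connect(":memory:")
df.to_sql("employees", conn, index=False)
sql_result = conn.execute("""
    SELECT dept, AVG(salary)
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) > 1
""").fetchall()

print(pandas_result)
print(sql_result)
```

Once you add window functions or multi-table logic, the gap between "most devs can read this" and "only the pandas expert can read this" gets a lot wider.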
There is no D from ACID. For the D to happen, it takes transaction logs + write barriers (on non-volatile memory).
Doing atomic, consistent, and isolated is trivial in memory (esp. in a GC setup), and a lot faster: no locks needed.
Validations and constraints are simple if-statements; I'd never think of them as SQL.
It sounds like you're talking about toy databases which don't run at a lot of TPS. Let me point out some features missing from your simple load-a-map-into-memory architecture.
You also have to do backup and recovery. And for that, you need to write to disk, which becomes a big bottleneck, since besides backup and checkpointing there is no other reason to ever write to disk.
Then, you have to know that even in an in-memory database, data needs to be queried, and for that you need special data structures like a cache-aware B+tree. Implementing one is non-trivial.
Thirdly, doing atomic, consistent, and isolated transactions is certainly trivial in a toy example, but in an actual database where you have a high number of transactions it's a lot harder. For example, when you have multiple cores you will certainly have resource contention, and then you do need locks.
And a last thing about GC: again, GC is great, but there has to be a custom GC for a database. You need to make sure the transaction log in memory is flushed before committing. And malloc is also very slow.
I'd suggest reading more of the in-memory research to understand this better. But an in-memory DB is certainly not the same as a disk DB with a cache or a simple hash map/B+tree structure.
It's the actual garbage collection that might be expensive, but since that process deals with the fragmentation, there is no need to keep a data structure with available blocks of memory around.
That's also the reason why, depending on the patterns of memory usage, a GC can be faster than malloc+free.
Correct. So we're talking about in-memory databases like MongoDB, and all of the things I listed here are true of MongoDB. For example, MongoDB migrated its database memory manager away from mmap and towards a custom memory manager (the point being that GC and memory management for databases is not something you can just use JVM or operating system constructs for).
You _can_ have forms of durability if you wish to. You can get "good enough" (actually fairly impressive...) performance for most problems (vs only in-memory) with SQLite making memory the temp store, turning on synchronous and WAL. Then fsync only gets called at checkpoints and you have durability at the checkpoint.
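Something along these lines, roughly (a sketch; the pragmas are standard SQLite, the checkpoint cadence and filenames are up to you):

```python
# Sketch of the SQLite setup described above: WAL journaling, relaxed fsync,
# and in-memory temp storage. Durability then means "up to the last checkpoint"
# rather than per-transaction.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")      # append writes to a write-ahead log
conn.execute("PRAGMA synchronous=NORMAL")    # fsync at checkpoints, not every commit
conn.execute("PRAGMA temp_store=MEMORY")     # temp tables/indices live in RAM

conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
with conn:                                   # implicit transaction
    conn.executemany(
        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
        [("a", "1"), ("b", "2")],
    )

# Force a checkpoint when you want a durable point-in-time state on disk.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
conn.close()
```

For a lot of workloads that trade-off (near in-memory speed, durability at checkpoint granularity) is plenty.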
Oh, that’s nothing. My company took over a contract from another company that had two DBAs writing a schema to store approximately one hundred items in a database! We converted it to a JSON file.
Eh, the second one is probably the only point I was kind of meh on. You should almost always start with an RDBMS, and it will scale for most companies for a long long time, but for some workloads or levels of scale you're probably going to need to at least augment it with another storage system.
Are there other constraints that might make DynamoDB a good fit? For example, I made an app at a client. We could use RDS or we could use Dynamo. I went with Dynamo because it could fit our simple model. What’s more, it doesn’t get shut off nightly to save money the way the RDS systems do. This means we can work on it when people have to time-shift due to events in life, like having to pick up the kids.
The problem with NoSQL is that your simple model inevitably becomes more complex over time and then it doesn't work anymore.
Over the past decade I've realised that using an RDBMS is the right call basically 100% of the time. Now that pgsql has jsonb column types that work great, I cannot see why you would ever use a NoSQL DB, unless you are working at such crazy scale that Postgres wouldn't work. In 99.999% of cases people are not.
There are specific cases where a non-SQL database is better. Chances are, if you haven't hit problems you can't solve with a SQL database, you should be using a SQL database. Postgres is amazing and free; why would you use anything else?
Time series is one. Consider an application with 1,000 time series, one host, and 1,000 RPS. You are trivially looking at 1M writes per second per host. This usually requires something more than "[just] using an RDBMS".
A bit more context: high-velocity transactional systems (e.g. any e-commerce store with millions of users all trying to shop at the same time). I helped to build one such system 10 years ago; here is the presentation - https://qconlondon.com/london-2010/qconlondon.com/dl/qcon-lo...
We just ported a system that kept large amounts of data in postgres jsonb columns over to mongodb. The jsonb column approach worked fine until we scaled it beyond a certain point, and then it was an unending source of performance bottlenecks. The mongodb version is much faster.
In retrospect we should have gone with mongo from the start, but postgres was chosen because in 99% of circumstances it is good enough. It was the wrong decision for the right reasons.
Yep, I agree there are cases where mongodb will perform better. However, many use cases also require joins and the other goodness that relations provide.
So really the use case for mongo etc is 'very high performance requirements' AND 'does not require relations'.
Many projects may be OK with just one of those. But very few need to meet both of those constraints.
FWIW I've seen many cases which are sort of the opposite: great performance with mongodb, but then because of the lack of relations for a reporting feature (for example) performance completely plummets due to horrible hacks being done to query the data model that doesn't work with the schema, eventually requiring a rewrite to RDBMS. I would guess that this is much more common.
I found that for an EAV-type database, NoSQL is a much better match, as it doesn't require queries with a million joins. But that's a very specific case indeed.
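To show what I mean, here's the classic EAV shape: reassembling one "entity" into a row takes a join per attribute you want back (a toy sketch; sqlite3 is used only to keep it self-contained, and the table/attribute names are made up):

```python
# Sketch of why EAV gets painful in SQL: one self-join per attribute you want
# back as a column. In a document store the whole entity is a single fetch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attrs (entity_id INTEGER, name TEXT, value TEXT);
    INSERT INTO attrs VALUES
        (1, 'color',  'red'),
        (1, 'size',   'XL'),
        (1, 'weight', '300g');
""")

row = conn.execute("""
    SELECT c.value AS color, s.value AS size, w.value AS weight
    FROM attrs c
    JOIN attrs s ON s.entity_id = c.entity_id AND s.name = 'size'
    JOIN attrs w ON w.entity_id = c.entity_id AND w.name = 'weight'
    WHERE c.entity_id = 1 AND c.name = 'color'
""").fetchone()
print(row)   # ('red', 'XL', '300g') -- and the join list grows with every attribute
```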
At scale it’s a little bit more than a few bucks. Across the board, we spend hundreds of thousands on ec2 instances for dev/test, so turning them off at night when nobody uses them saves you quite a lot of money.
I can't speak to your specific use case, but I can tell you that a relatively small RDS instance is probably a lot more performant than you think. There is also "Aurora Serverless" now which I've just started to play with but might suit your needs.
As far as what makes Dynamo a good fit, I almost take the other approach and try to ask myself, what makes Postgres a bad fit? Postgres is so flexible and powerful that IMO you need a really good reason to walk away from that as the default.
Aurora wasn’t allowed at the time. The system is a simple stream-logging app. Wonderful for our use case. Dynamo is fine so far. Corp politics made the RDS instance annoying to pursue.