Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Having ran MySQL in prod for a decade and PostgreSQL in prod for half a decade I can say without doubt that your data team is telling fibs.

Firstly we consider that there are multiple replication possibilities of both technologies- however I'm going to assume the defaults because that's pretty much what everyone uses except if there's an actual case for using something else. It's the exception.

But by default MySQL uses statement based replication (in a weird binary format with log positions and stuff) and postgresql does logical replication (as in, you transmit the binary differences of what you'll be doing to the replica's database files directly and the replica just follows along)

Both of these approaches have trade-offs depending on what you want.

Statement based replication is great if you want to have _different_ datasets on each side, You can transform the data or remove huge chunks of it on a slave and use it for a dedicated purpose. However that applies the other way, you can never really be 100% sure that your replica looks exactly like your master.

this bit me a few times with MySQL when I assumed that because the replica was 'up to date' with the master and it was set to read only, that the data had integrity- it absolutely did not.



I don't think the claim of MySQL replication being better is related to statement vs. row vs. binary diff. I think it is about the tooling and community knowledge about replication, and about running MySQL in large scale production environments in general.

MySQL is run more often at extremely large scale (Facebook, YouTube, Twitter, Dropbox, etc.) than Postgres. That results in very battle tested and/or featureful tooling like orchestrator (https://github.com/github/orchestrator), MHA (https://github.com/yoshinorim/mha4mysql-manager), ProxySQL (http://www.proxysql.com/), and gh-ost (https://github.com/github/gh-ost), along knowledge and best practices shared by those organization.


What are anybody's real-world use case for proxysql?

I have been toying with the idea of using it for local devs to access our prod DB for reads (for accurate data) and using a local DB on their machines for writes.

Not sure how to handle UPDATEs, though.


Use cases for ProxySQL: many.

- failover

- query routing (e.g., for sharded deployments)

- caching

- workload stats/metrics

- query rewriting

etc.


Having also supported hundreds of production MySQL databases, statement based replication is absolutely inferior. But it should also be noted that row based replication (similar to streaming replication in that the actual data changes are synced) has been supported in MySQL since v5.1.5 (current is v5.7). And row based replication is the default since v5.7.7 https://dev.mysql.com/doc/refman/5.7/en/replication-formats....


That's something I didn't know actually, so +1 to you.

I've been migrating away from MySQL for a number of years now.


Did you mean "physical replication"? Logical replication corresponds to MySQL's model, I think, whereas WAL streaming is just copying over the changed bytes as they get written to the WAL on the master database.

I like Postgres's physical replication for its straightforwardness. It's pretty easy to tell if your replica is up to date unless something really weird is going on. (undetected data corruption?).

That said, PostgreSQL doesn't really make replication appear easy, so I can understand people thinking that even a basic master-slave setup is difficult (In my experience its behaviour is much easier to understand than with MySQL). However, MySQL is ahead in multi-master user friendliness, and setting up eg. a simple galera cluster is pretty easy.

Whether an "easy" multi-master galera set up is actually production-quality is another matter entirely, but it is not difficult to get up and running.


> Did you mean "physical replication"? Logical replication corresponds to MySQL's model

nope postgresql supports logical replication since 10. https://www.postgresql.org/docs/10/static/logical-replicatio...

> Whether an "easy" multi-master galera set up is actually production-quality is another matter entirely, but it is not difficult to get up and running.

if you have regular network partitons (actually having network paritions is always the case, especially inside clouds) than a galera cluster can actually broke in several cases that are even worse than any failure even the most broken replication on postgresql/mysql non galera can do.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: