Hacker News | AlterEgo20's comments

As for number 5: Postgres already does that. The feature is called "constraints". Constraints can span multiple columns and are used both to build better query plans and to restrict DML operations.
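A minimal sketch of a multi-column constraint, using a hypothetical orders table:

```sql
-- CHECK constraint spanning two columns: a shipped order
-- can never predate the order itself
CREATE TABLE orders (
    id         bigserial PRIMARY KEY,
    order_date date NOT NULL,
    ship_date  date,
    CHECK (ship_date IS NULL OR ship_date >= order_date)
);
```

Any INSERT or UPDATE that violates the check is rejected by the database, regardless of which application issued it.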


Even a single rare rollback can kill a prod DB. We had a situation where a bug in our code caused a series of huge UPDATEs in an Oracle DB and locked some critical tables. After ~2 hours we found the session and killed it.

The problem: we had to wait two more hours with a dead prod while Oracle rolled everything back. In Oracle, killed sessions still hold their locks while rolling back.

In Postgres rollback is almost instant.


There is no such thing as a "NoSQL store". NoSQL is just a buzzword used to describe dozens (hundreds?) of very different solutions.


You are being obtuse. It is obvious what I'm talking about.


No, I don't think it's obvious nor are you right about this.


Look at every mobile gaming company and you will see document stores. Dynamo, BigTable, DocumentDB. Google's very own analytics is built on BigTable. The examples are countless.


While a document store is a subset of NoSQL databases, NoSQL is not synonymous with document store. It also includes graphs, key-value stores, etc. That's what the original person was getting at - you should've started with "document store".

While I may use JSON for storing click data, I wouldn't do it using a NoSQL database.


Neither Dynamo nor BigTable are document stores.

And there is no reason why Postgres couldn't handle this kind of stuff. Sure it doesn't come with the needed scaling tools built in but it's doable. I'm running billions of rows through a Postgres analytics setup.

A purpose built system can be more efficient but it also doesn't need to mean NoSQL. You can see some of the serious-scale datastores adopting SQL-like query interfaces. For example Hive for Hadoop or PlyQL for Druid.

NoSQL is a loose term for a mish-mash of technologies.


Dynamo started as a document store. Bigtable is not but I included it because it is another very good option for analytics and is still lumped under the nosql umbrella.

Using relational database for event analytics means that you have a trivial use case. You might have a lot of rows, but your data is dead simple. Otherwise you would refactor every week to change the schema and sharding. Not to mention how much money you would spend on hardware.


> Dynamo started as a document store

No it didn't. The original Dynamo paper doesn't even include the word "document" once! The title of the paper makes it obvious what kind of system it is: "Dynamo: Amazon’s Highly Available Key-value Store"

You should really do some fact checking on your statements.

Bigtable alone does not make an analytics system. You'll need a lot more around that.

I don't have a trivial use case. We use the jsonb data type in Postgres and have no upfront defined schema for metrics.

Postgres' jsonb can out-perform many of the open source NoSQL databases out there. Our system consists of a single digit of nodes that can handle hundreds of thousands of writes per second and as I mentioned already, billions of rows. A single CPU core can scan though more than a million rows per second. Is it the fastest that there is? No. Is it good enough for most people? Absolutely.
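A sketch of the kind of schema-less metrics setup described above, with a hypothetical events table and payload column:

```sql
-- Events table: no upfront schema for the metrics themselves
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    ts      timestamptz NOT NULL DEFAULT now(),
    payload jsonb NOT NULL
);

-- A GIN index makes containment queries on the payload fast
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- Count events whose payload contains {"country": "US"}
SELECT count(*) FROM events WHERE payload @> '{"country": "US"}';
```

The `@>` containment operator is index-assisted via GIN, which is what makes jsonb competitive with dedicated document stores for this kind of query.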


While Bigtable can be classified as a database that scales to extremely large sizes, I would not recommend it for the backend of a game at all.

It would be extremely expensive to run a workload like this, and would not be very performant.

This is assuming a very high transaction rate, lots of concurrency, and likely highly volatile data. Bigtable is just not really for that. Neither is Dynamo. Both of them are great as giant-scale multi-purpose databases, however.


Google "Postgres BDR". It is new, but it allows you to replicate changes on master to another master. With some configuration it can even change replicated data.


Thank you. I believe that bidirectional replication will solve the ETL problem, if it can be configured to stream from a replica in the source/oltp cluster (to minimize network IO on the source cluster master db) to the master of the destination/olap cluster. The olap master can mirror the oltp data, but also use triggers to denormalize oltp tables.

One thing I'd like to see is the ability to sanitize/munge data from the source cluster before it is sent to the destination cluster, for masking sensitive data in the source cluster (PCI, HIPAA, etc).


Any SQL DBMS is able to store and handle the NUMERIC data type. It is an exact decimal with any precision you may want.

Some of the DBMS's also have dedicated currency types.
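A minimal sketch of exact money arithmetic with NUMERIC, using a hypothetical prices table:

```sql
-- NUMERIC(precision, scale): exact decimal, no binary-float rounding
CREATE TABLE prices (
    item   text,
    amount numeric(12,2)  -- up to 12 digits total, 2 after the decimal point
);

SELECT 0.1::numeric + 0.2::numeric;  -- exactly 0.3, unlike float8
```

This is why NUMERIC (or a dedicated money type) is the usual choice for currency: the float equivalent of that SELECT does not produce exactly 0.3.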


Backup: http://www.postgresql.org/docs/current/static/app-pgdump.htm...

In short:

  pg_dump -Fc mydb > db.dump

Restore: http://www.postgresql.org/docs/current/static/app-pgrestore....

In short:

  pg_restore -C -d postgres db.dump

Run a command directly from the shell:

  psql -c "psql command"


and the default dumps are simple SQL files, which you can actually pipe to another server:

  pg_dump -C mydb | psql -h ${some_other_server} mydb


Easy. Just use dblink or a foreign table to execute part of the statement inside another transaction. A dblink to the same server/db is often used as a way to create an "AUTONOMOUS_TRANSACTION".
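A sketch of the loopback-dblink trick, assuming a hypothetical audit_log table already exists:

```sql
-- dblink opens a second session, which commits independently
CREATE EXTENSION IF NOT EXISTS dblink;

BEGIN;
-- This INSERT runs in the loopback session and commits there immediately
SELECT dblink_exec(
    'dbname=' || current_database(),
    'INSERT INTO audit_log(msg) VALUES (''attempted update'')'
);
ROLLBACK;  -- the caller's rollback does not undo the logged row
```

Because the loopback connection is a separate backend with its own transaction, its work survives the outer ROLLBACK, which is exactly what Oracle's AUTONOMOUS_TRANSACTION gives you.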


Why `nohup postgres 2>&1 > postgres.log &` instead of `pg_ctl start`? You could also use `pg_ctl -w start` to wait for proper server startup instead of `sleep 2`


Why? Ignorance. I'm used to running postmaster from years past (when not starting from an /etc script). Postmaster is now just called postgres, and it works for a disposable test instance setup.

I'll look into that, though, as it sounds like the right thing vs the sleep hack.


... and I put the 2>&1 in the wrong place :-(

(before the file redirection instead of after)


"modern development frameworks can handle data integrity stuff". Wrong. They cant. If you have 2+ applications working with the same db, there will be problems with data integrity that no "modern development framework" can solve.


If you have two applications making use of the same database, then build a shared ORM layer and have the applications interact with that. Modern applications do not interact with the database directly; there is an ORM.

Separate that out and share it between however many applications you want; you should be able to swap out data stores, not be confined to them.


Your solutions to these problems (like "do validation and integrity in the application layer" and "build a shared ORM layer for multiple applications") are definitely possible solutions to the problems, but you state them as if they are obviously the best solutions. They actually sound a lot harder to me and seem to be driven by some theoretical purity in having a dumb data store that I don't really understand. Building a shared ORM layer for multiple applications to use is just a hack around having a shared data store layer doing validation and integrity, which is what a smarter database already does. It's reinventing one of the really hard things that databases are already good at. What is the advantage?

I've also never made an application that doesn't eventually want to do something that the ORM I'm using doesn't know how to do, at which point it makes a lot more sense to interact directly with the database than to shrug my shoulders and say "I guess I can't do that".

Maybe my applications just aren't "modern" enough.


There are `security_barrier` VIEWs. They give a "way of restricting a given connection/user to only rows with the right foreign key in a table". Still, you have to create them all (not a simple task for a complex DB schema).


In a very simple case (an app with a user_profile table), that'd make one view per user? This doesn't sound particularly appealing...


"one view per user" - you can have a single VIEW, that filters rows based on user name ("owner_user_name" column in the base table). Or use some kind of mapping table for "user_name" <-> "table PK id". It gets more complicated with inherited user roles, but still manageable.

