CodernityDB — pure Python, NoSQL, fast database (codernity.com)
183 points by amarsahinovic on Nov 14, 2012 | 49 comments


Since these questions will inevitably come up:

  - You fsync yourself to ensure durability.  I can't see at a glance 
    what fsync settings are used for the speed tests.
  - It's not transactional, although single operations are atomic.
  - Indexes operate on a single-writer, multiple-reader basis.
  - No traditional joins, although you can of course write
    a procedural function that joins for you.
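
To illustrate the first point, here is a minimal sketch of what "fsync yourself" looks like for code that writes its own records. It uses only the Python standard library and is not CodernityDB's API; the `durable_append` helper and the file name are hypothetical.

```python
import os
import tempfile

def durable_append(path, record):
    """Append one record and force it to stable storage before returning."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its cache to the disk
    return os.path.getsize(path)

# Hypothetical usage: every call pays for a disk flush, which is why
# benchmark numbers can differ a lot with and without fsync.
path = os.path.join(tempfile.mkdtemp(), "records.log")
size_after_first = durable_append(path, b"doc-1\n")
size_after_second = durable_append(path, b"doc-2\n")
```

Skipping the `os.fsync` call is usually much faster, but data still sitting in the OS cache can be lost on a power failure.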


These have come to be common features of NoSQL in general. I wouldn't be too surprised to see them here in a product marketed as NoSQL.

The other attributes I'm not too fussed about personally; I figure I can always work around them or face the question "am I using the right tool for this job?", but the fsync one still gives me the heebie-jeebies (even though it's never bitten me despite many, many TBs of data).

Cassandra does periodic fsyncs (although this can be configured). I understand HBase and Mongo have similar shenanigans.


A cursory glance through the code revealed a bug: https://bitbucket.org/codernity/codernitydb/issue/1/_rev-not...


You can learn a lot about a community, a product, and a company based on how they respond to bug reports. Promptness, formality, details and politeness are all incredible indicators about what's going on under the hood.


And what did you make of their reply?


By the way, a collision with 10 times as many members is much more than 10 times as likely, due to the birthday paradox.


Yeah, I realised I got the probability wrong but haven't written a follow-up (typing on my phone is tough, writing code is impossible). The probability of being able to update an old version of the document is simply 1/65536. That's a very high probability given thousands of documents or a modest number of competing updates. The biggest issue is that it's undetectable and non-deterministic: the update that should fail will succeed silently and it can happen at any time.
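
A rough sketch of the arithmetic, assuming revision values are independent and uniformly distributed over 65536 possibilities (an assumption for illustration; how the values are actually generated is not confirmed here). The function names are hypothetical.

```python
import math

REV_SPACE = 65536  # distinct revision values, per the 1/65536 figure above

def p_stale_update_succeeds(attempts, space=REV_SPACE):
    """Chance that at least one of `attempts` stale updates is accepted by accident."""
    return 1.0 - (1.0 - 1.0 / space) ** attempts

def p_birthday_collision(members, space=REV_SPACE):
    """Chance that at least two of `members` random values coincide."""
    if members > space:
        return 1.0
    # Sum logs rather than multiplying many factors close to 1.
    log_no_collision = sum(math.log1p(-k / space) for k in range(members))
    return 1.0 - math.exp(log_no_collision)

for n in (1, 1000, 10000, 100000):
    print(n, "stale attempts:", p_stale_update_succeeds(n))

for n in (10, 100):
    print(n, "members:", p_birthday_collision(n))
```

Under these assumptions the per-attempt chance is tiny, but it compounds with the number of attempts, and the birthday-style collision chance grows roughly with the square of the number of members.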

I'll re-open the bug in the morning with a proof-of-concept.


"CodernityDB pure python, NoSQL, fast database"

Is the fact that it is written in "pure Python" really the most important thing to reinforce after the name of the product itself?

Why would I use this over established products like Riak, Redis, MongoDB, etc.?


> Is the fact that it is written in "pure Python" really the most important thing to reinforce after the name of the product itself?

It's definitely a selling point for Python programmers, because it's so easy to use with your Python projects. You just set the package requirement and you're done: it will work in the same environment wherever your app works, upgrades are a piece of cake, and there's no need to worry about platform support, permissions, etc.

For example, Whoosh (full-text search) gained a lot of traction in the Python world not because it was the fastest at the time or the most full-featured (compared to the more mature Java-based ones), but because of convenience. Such solutions, even when they're not the most advanced player in the league, are great for starting up fast and pushing features out the door.

On a side note, it's a pleasure for me to see it's from Poland. Will test it out on a feature project in the next few days.


The thing I like most in Python is not Python itself but the "pythonic way". "Pythonic" > Python.

In my opinion, it's not "pythonic" to use a DBMS just because it is written in Python.

So the "selling point" is flawed.

Disclosure: I've been using Python since 2006. I've done a lot of evangelizing for it. It is my weapon of choice for many tasks. And I will never use Python for many other tasks. (Don't tell me about PyPy or Stackless.)


'a selling point' != 'just because'.

I get your point, and I'm all for being language-agnostic when it comes to the tools I have to use, but that doesn't change the fact that convenience plays a role when you are at a point where you can't afford a long-term decision process. The best technology to quickly launch something is most often simply the one you know.


You make an excellent point, but when working in Python there are advantages to tying in to tools that are also written in Python. They will tend to be easier to integrate into your project than non-Python code[1] and, more than that, will generally be easier for you to extend or tweak[2] if you run into a case where you have to "look under the hood" for some reason.[3]

[1] Python works well with other languages of course, and is often used as a "glue" between other components.

[2] This assumes that you know Python better than whatever language it was made in, but for many cases that will be true.

[3] I often like to look at libraries just to understand how they work, but that is different. There are often cases, especially where the tool uses some abstraction that can leak, where you need to look under the hood just to get things working.


I'm an architect at my firm and I'm tired of: a) .NET-ters who will use something only if it is .NET or at least Microsoft, and b) Javers who think that everything non-Java is universal evil.

(We have 45% of .NET-ters, 35% of javers, 20% : devDBAs, Js-ers, others)

My propaganda is always: "Be programmers, mazafakerz!" And Python is an excellent tool for explaining ideas between these groups. A Javer will not be offended if I show him .NET code, and vice versa.

For them Python is "executable pseudo-code".

It would be sad for me if Pythonistas became a caste like .NET, Java ... and Haskell (they are not a caste but have every chance of becoming one).


If you look at the speed benchmarks, they compare it to Kyoto Cabinet. I think that CodernityDB is meant to be embedded and is thus only interesting to Python people. In that case the advantage would be to avoid the compilation step on install that Kyoto Cabinet or SQLite would incur.

It's a library database like SQLite and Kyoto Cabinet.


Looks like they also have an HTTP service and a Python client library, so you can use it in ways other than embedded, and with the Python client you can switch back and forth between the embedded one and the HTTP one with the same API. Seems pretty slick.


Python is on almost every host out there now. Riak, Redis, and MongoDB are not, and on many hosts you won't be able to compile them from the source. So yes, that's relevant.


What platforms are people trying to run Riak, Redis, and MongoDB on where a package is not available, or a compiler is not available?

add repository.... {apt-get,yum,brew,port} install {mongodb,riak,redis}

I see little benefit in compiling any of these products from source.


If you are on a host without a compiler, you don't need a NoSQL DB.


How do you figure?


I believe the insinuation is that if you're actually building a serious app that might actually benefit from NoSQL, then you should host it on a serious machine and not some shared hosting solution.

Basically, it combines two assumptions: that NoSQL is for 'real projects', and that only 'amateur hour' hosts have no compiler.


NoSQL is targeted at extreme performance. So, yes, it sounds like a reasonable assumption.


There are other advantages to NoSQL, like flexible data structure, and not having to squish your data into two incompatible data models.


Wouldn't you want to remove extraneous software, e.g. a compiler, from a dedicated database host for security? (Yes, Python could be included in that as well.)


Avoids dependency issues when you try to deploy your work on an unmodifiable target.

I've been looking at DBM::Deep (one of Perl's equivalents to CodernityDB) and App::FatPacker recently just for this case (installing a script on some Macs with Perl being the only requirement).

ref: https://metacpan.org/module/DBM::Deep | https://metacpan.org/module/App::FatPacker


For me, the db being in pure-python means a db that can be easily extended or augmented. If you have some quirky requirement your options are 1) implement this in the client, or 2) modify your db. Option 1 is probably what most people opt for (how many people have the confidence to dig into postgres/mysql source, how many of these people want to?), and a pure-python db makes option 2 more feasible.


"Is the fact that it is written in "pure Python" really the most important thing"

It is for me. It's the most unique attribute about the project. Fast? Yawn. NoSQL? Yawn.


Although pure python seems great for some apps, for a piece of software I'd like to be highly optimized, it seems to hit the wrong sweet spot -- it means a larger memory footprint and less robust multi-threading than other pure language implementations. When things need to work fast and tight in python, most implementations duck down to C, like numpy and scipy. Or maybe I've been out of touch with improvements to CPython?


Threading in Python is totally capable here. A database is more likely I/O bound than CPU bound.
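
A small sketch of the I/O-bound case, assuming CPython. Here `time.sleep` stands in for a blocking disk or network read; the helper names are hypothetical and the timings will vary by machine.

```python
import threading
import time

def fake_io(delay, results, i):
    # time.sleep stands in for a blocking read; like blocking I/O
    # in CPython, it releases the GIL while waiting.
    time.sleep(delay)
    results[i] = True

def run_threads(n, delay):
    results = [False] * n
    threads = [threading.Thread(target=fake_io, args=(delay, results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results

elapsed, results = run_threads(4, 0.2)
print("4 waits of 0.2s took %.2fs in total" % elapsed)
```

The waits overlap instead of adding up. CPU-bound work in pure Python would not overlap this way, which is the other side of the argument.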


What about memory? There's a fairly large overhead for pure python data structures compared to C/C++.
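
A quick way to see that overhead, as a sketch for CPython only (exact sizes depend on the interpreter version and platform). It compares a list of int objects with `array('q')`, which stores raw 8-byte integers much like a C array would.

```python
import sys
from array import array

N = 100_000
values = list(range(N))

# A list stores pointers to separate int objects, so count both.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('q') stores raw 8-byte signed integers contiguously.
packed = array("q", values)
packed_bytes = sys.getsizeof(packed)

print("list of ints:", list_bytes, "bytes")
print("array('q'):  ", packed_bytes, "bytes")
print("ratio: %.1fx" % (list_bytes / packed_bytes))
```

Dicts and nested documents typically add further per-object overhead on top of this.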


Perhaps it could be attracting devs, since Python is a really awesome language to work with. But really, let's quit the snarkiness.


It would be helpful to add "How is this different from other NoSQL databases" to the FAQ.


It seems as though the key differentiator is 'pure python'. For python environments, this is a huge bonus, as it makes it easy to install and support with existing tools.


This seems like a speedy solution for inserts but I'm curious about reads

"Indexes tries to reuse as much space as possible, because metadata size is fixed, during every write operation, if index finds metadata marked as removed or so, it reuses it - writes new data into that place."

I'm curious how this is implemented.
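
One way such reuse could work, as a hypothetical sketch only. This is not CodernityDB's code; the record layout, status bytes and `SlotFile` class are invented for illustration. The idea is that fixed-size records let a removed slot be overwritten in place.

```python
import io
import struct

# Hypothetical fixed-size metadata record: 8-byte key, 4-byte offset, 1-byte status.
RECORD = struct.Struct("<8sIc")
LIVE, REMOVED = b"o", b"d"

class SlotFile:
    """Toy index file that reuses slots marked as removed."""

    def __init__(self):
        self.buf = io.BytesIO()  # stands in for a file on disk

    def _slots(self):
        data = self.buf.getvalue()
        for pos in range(0, len(data), RECORD.size):
            yield pos, RECORD.unpack_from(data, pos)

    def insert(self, key, offset):
        # Prefer the first removed slot; otherwise append at the end.
        target = len(self.buf.getvalue())
        for pos, (_, _, status) in self._slots():
            if status == REMOVED:
                target = pos
                break
        self.buf.seek(target)
        self.buf.write(RECORD.pack(key, offset, LIVE))
        return target

    def remove(self, key):
        # Mark in place; fixed-size records mean nothing has to move.
        for pos, (k, offset, status) in self._slots():
            if status == LIVE and k.rstrip(b"\0") == key:
                self.buf.seek(pos)
                self.buf.write(RECORD.pack(k, offset, REMOVED))
                return True
        return False

idx = SlotFile()
first = idx.insert(b"a", 0)
second = idx.insert(b"b", 100)
idx.remove(b"a")
reused = idx.insert(b"c", 200)  # lands in the slot freed by "a"
```

The linear scan keeps the sketch short; a real index would more likely find free slots while traversing its own structure or keep a free list.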


Interesting, but I'd like to see how this compares to other NoSQL such as MongoDB.


I'd love to check this out this morning, but the page is unreadable using Safari on an iPhone. The table of contents stays floating above the body text, and takes up almost the whole screen.


What is the difference between DatabaseThreadSafe and DatabaseSuperThreadSafe?

http://labs.codernity.com/codernitydb/design.html#how-it-s-b...


I've registered ##codernitydb on freenode and will idle in there, if anybody is interested in forming a community on IRC.


No Python 3 = no thanks


python ehh


In case, like me, you like SQL: http://gadfly.sourceforge.net/


The only thing that turns me off to a product more than a poor website is poor English in the documentation. If English isn't your primary language, PLEASE get someone that speaks it fluently to either write your docs, translate your docs, or edit what you've written.


It's nice of them to write the documentation at all, even in poor English; they could launch the product only with Polish documentation and then you would never be able to use it before someone translates it.


> they could launch the product only with Polish documentation and then you would never be able to use it before someone translates it

This is true. It is also true that I am not able to use it at present because I don't understand the documentation.

A large part of modern programming is getting one's code to work with external libraries. For this to be a joy (and not a pain) the library's API should be simple and the documentation well-written.

I'm sure I could understand this library if I put effort into it. However, doing that has added problems -- I might subtly misunderstand it in such a way that my code works most of the time but not always. And why should I bother? Python comes with multiple ways of storing data, such as sqlite, which I've used before and like.

This is not to knock the people behind CodernityDB, which for all I know might be an excellent product. But at the moment it's not one I would consider using.


While this is true, it's to the detriment of their project not to have at least some English docs. English gives them the largest possible audience[1] for their project. Generating interest in their tool is one of the things that they need to do if they want it to be a successful open source project.

[1]: I guess it's possible that 'Chinese' might be close too, but I have no idea how the different dialects (Mandarin/Cantonese) would affect this.


> [1]: I guess it's possible that 'Chinese' might be close too, but I have no idea how the different dialects (Mandarin/Cantonese) would affect this.

Cantonese isn't written down, strictly speaking; when you write Chinese, there aren't dialectal differences.


nginx didn't have English docs until 2 years after the project launched. Full story from Igor here: http://www.ruby-forum.com/topic/151853

Also: don't blame people for their English language proficiency.


I forked the documentation this morning. I'll edit it and submit a pull request tonight.


Pretty rude. Why don't you offer to help?


Probably because he has other things he cares about more. I won't deny that his original statement was rude, but we have to be able to point out flaws in things without feeling obligated to try to fix them. Otherwise, we won't be able to give most kinds of constructive criticism, which is bad for everyone.

Of course, if we won't help with a problem in an open source project, we shouldn't feel entitled to a fix either.



