Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Writing a MySQL storage engine from scratch (2016) (codeproject.com)
202 points by yinso on June 15, 2017 | hide | past | favorite | 23 comments


Somewhat related, I recently started trying to reverse-engineer the MySQL wire protocol. My idea is that if this is better documented then it will be easier for people to make databases that interoperate with MySQL clients but aren't MySQL that much easier (personally I want a custom Shard-Query-ish thing).

This is obstinately documented (http://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_PROTOC...), but I can't find a nice cheat-sheet saying "to send a query, you send this packet and you receive these packets" etc. I've taken to looking at e.g. the pure-Python PyMySQL package source-code https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/conne...

Here's my progress so far: https://gist.github.com/williame/ebf003dd85d5949f551a72331ae...

Anyone know what commands you can send, and what the expected responses are?


Take a closer look at the MySQL documentation. You dont have to reverse engineer anything - it's all there and documented (source: wrote a client impl based on the docs)

http://imysql.com/mysql-internal-manual/client-server-protoc...

http://imysql.com/mysql-internal-manual/connection-phase.htm...

http://imysql.com/mysql-internal-manual/com-query.html

etc


This is not the MySQL manual (dev.mysql.com) but a third party copy that is at least a few years old.

We're working on moving the internals documentation into the server code with Doxygen: http://mysqlserverteam.com/mysql-8-0-now-documented-with-dox...


I can see what parents issue was now :) I had a look at the doxygen and all the goold old protocol documentation seems to be deleted/missing from the "official" documentation website. How come -- I found it really useful?

- https://dev.mysql.com/doc/dev/mysql-server/8.0.0/page_protoc...

- https://dev.mysql.com/doc/dev/mysql-server/8.0.0/page_protoc...


Actually I remembered now that the last time I worked with the MySQL protocol (~2014) I had a similar problem: Some of our servers were still running an old version of MySQL so the client had to be backwards compatible. However, this old version of MySQL also used the "old mysql password" authentication which was found to be unsafe/broken at some point. So the MySQL guys deleted the documentation on how the old scheme worked and replaced it with a notice saying that it is insecure. Tough luck if you still had to support/implement it. The same thing has apparently happened here. Not only is the documentation gone, it's actually gone without a trace on the official website.


The content is being re-incorporated in the Doxygen docs, which is generated from the source.

(The internals manual was out of date in many cases. I do not recommend relying on the links mentioned here.)


Crikey, thx.

There's a lot of dead links, e.g. http://imysql.com/mysql-internal-manual/com-query.html doesn't describe the response? Got any idea what a query response consists of?


Turns out that all of this stuff has apparently been deleted from the official docs some time ago, so that's probably why you didn't find it :) (see sibling comment).

When I did a quick google to pull up the docs that I knew I had read when I implemented the mysql client I found & linked you to a non-official, old copy of the mysql docs from back then.

You can still find the docs for COM_QUERY result on the net though: https://github.com/cwarden/mysql-proxy/blob/master/doc/proto...


You might also want to look at the Wireshark module for MySQL: https://github.com/wireshark/wireshark/blob/master/epan/diss...

It does all the work of figuring out what's a request/response and parsing / dumping out various parts of the protocol.


You may want to also dig into Drizzle, a BSD licensed wire-compatible client library, for some of the more obscure details.

It's a few years old now, and maybe unsupported, but it was a cleanroom re-implementation by former MySQL devs, and is likely cleaner as a result.


Am I the only one who likes the "C with classes" approach over C++?


Depends on what you mean by C++. C++11, C++14, C++17 have features that are must have to me.


Well thank goodness we live in 2017.


Never got into C++, so in a way - yes :-)


I've heard this called imperative programming with classes.


Great article, makes it very clear what parts of the database are- and are not- provided by the storage engine. I wrote a C wrapper for BoltDB recently, i wonder how MySQL would perform linked to that?

I have some memories of looking at VB6 code on codeproject.com over a decade ago, interesting to see that the site is still around and useful.


Sqlite has a fairly straightforward way to implement your own storage engine as well, see the VFS documentation: https://sqlite.org/vfs.html

Virtual tables are also interesting, but higher level, somewhat like a fuse filesystem. https://sqlite.org/vtab.html


Interesting article. Not many people work on low-level storage engines nowadays.


Oh, and it becomes painfully obvious why MySQL cannot use more than one index while executing the query.


It has been able to use more than one index per table since MySQL 5.0 (2005):

https://dev.mysql.com/doc/refman/5.7/en/index-merge-optimiza...


Can PostgreSQL use more than one index per table while executing a query?



one index per table

(Sorry for being pedantic)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: