
I was left confused after reading this article.

1. It claims Rust is ~10x faster than Ruby, based on a benchmark that reads a 23 MB file and then iterates over the data a single time. In my experience, Rust is between 20x and 100x faster than Ruby in purely CPU-bound workloads. But the author's main contention is that most work is IO-bound instead of CPU-bound, so probably not a big deal.

2. The author claims "it hardly matters that Ruby halts all code for 15ms to do garbage collection, if the fastest database-query takes 150ms". I've written applications that query Postgres databases with tens of millions of rows, where the 99th percentile response times are <10ms. I'm not sure why it just needs to be taken as a given that databases will take 150ms to return any data.

3. This flame graph from the article [0] seems to show that the vast majority of the request time is spent in Ruby parsing timestamps, rather than in the database. This seems to make the opposite of the point the author is trying to make. I'm not familiar with this stack, so maybe I'm missing something; can anyone explain?

[0]: https://berk.es/images/inline/flamegraph_sequel_read.svg



I'm also confused.

2. Agreed. 150 ms is extraordinarily slow for a point lookup from a single table. A simple lookup should take around 1 ms since it'll be cached in memory.

3. Also agreed. The actual database time looks to be the first flame. Hovering over it shows PG::Connection::exec, which accounts for 2.5% of the time.

I was curious about date parsing and dug up the source [1]. Seems like you could gain a ton of speed back by using a Postgres-specific timestamp parsing routine (rough Ruby sketch below). In Go, it's 40 lines [2].

[1] https://github.com/ruby/date/blob/d21c69450a57a1931ab6190385...

[2]: https://github.com/jackc/pgx/blob/a968ce3437eefc4168b39bbc4b...
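
A rough Ruby sketch of what I mean (illustrative only, assuming Postgres' default text output for a timestamp column, e.g. "2021-03-01 12:34:56.789012"): parse the known layout with Time.strptime instead of letting DateTime.parse guess the format on every row.

    require "date"
    require "time"
    require "benchmark"

    # Postgres emits 'timestamp' values in a fixed layout, so we can hand
    # strptime the exact format instead of having DateTime.parse guess it.
    PG_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%N"

    def parse_pg_timestamp(str)
      Time.strptime(str, PG_TIMESTAMP_FORMAT)
    end

    sample = "2021-03-01 12:34:56.789012"
    n = 100_000

    Benchmark.bm(16) do |x|
      x.report("DateTime.parse") { n.times { DateTime.parse(sample) } }
      x.report("Time.strptime")  { n.times { parse_pg_timestamp(sample) } }
    end

A real routine would also need to handle rows without fractional seconds and timestamptz offsets, but the fixed-format idea is the same.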


Author here. Sorry for the confusion.

1. I didn't want to make the point that Rust is faster, per se, but to show why it is faster, because that teaches us a lot about when it matters. Indeed, IO-bound vs CPU-bound. The collecting/reducing is bound by CPU and memory juggling; the reading of the file by IO. That IO part should hardly matter, but the processing is what makes the difference. Ruby is slow here. Rust isn't. Point being: when you are doing a lot of CPU-bound stuff, Ruby is probably a bad choice (and Rust a good one). But since in practice web services are almost all about IO (and some de/serialization), it matters less there.
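
To make that concrete, a contrived sketch (illustration only, not a benchmark from the article): the interpreter and GC overhead shows up in the CPU-bound part, while the IO-bound part mostly waits, where the language hardly matters.

    require "benchmark"

    data = Array.new(1_000_000) { rand(100) }

    Benchmark.bm(12) do |x|
      # CPU-bound: every step runs interpreted Ruby, allocates intermediate
      # arrays and feeds the GC. This is where a faster language pays off.
      x.report("cpu-bound") do
        data.map { |n| n * 2 }.select(&:even?).sum
      end

      # IO-bound: the process spends its time waiting (sleep stands in for a
      # network or database round trip). A faster language shaves off
      # almost nothing here.
      x.report("io-bound") do
        10.times { sleep 0.01 }
      end
    end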

2. I too have written PG-backed services (in both Rust and Ruby) where database calls take under 10ms. In a typical SaaS/PaaS setup, however, there will be a network between app and db, adding to latency. Localhost/socket makes it a lot faster, especially visible on queries that are themselves fast. The main point, however, wasn't the absolute measures, but the relative numbers. When your GC halting becomes significant relative to the waiting times for the database, then certainly: Ruby is a severe bottleneck. This happened to me often on convoluted (and terrible) Rails codebases, where GC locking was measured in seconds (but where sometimes database queries were measured in tens of seconds).

3. The flamegraph indeed shows that DateTime::parse is the bottleneck in that particular setup. I tried to explain that with:

> The parsing (juggling of data) takes the majority of time: DateTime::parse. Or, reversed: the DateTime::parse is such a performance-hog, that it makes the time spent in the database insignificant.

But I also tried to spend time explaining all the situations in which this is mitigated. Yes! In this case Ruby truly is the bottleneck. But when we move to more real-world cases, these bottlenecks shift towards the database again. E.g. a write-heavy app, or one that uses complex lookups that return little data.

Again, sorry for the confusion. I guess the title simply doesn't match the actual content of the article very well, which is more about "when does the bad performance of Ruby, the language, really matter, and when doesn't it". I hoped the intro solved this, but I should probably have spent more time on a better title. Sorry.


If the query selects a small number of rows and columns, and uses an index, it typically takes less than 1 ms on modern hardware. At least with MySQL.

The ORM, however, might be slow. For example, taking 100 rows from SQLite with SQLAlchemy takes 1 ms, but getting them as ORM objects takes 8 ms.
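
The Ruby side of the same comparison is easy to sketch with Sequel (a hypothetical example, assuming the sequel and sqlite3 gems; exact numbers will vary): fetching rows as plain hashes versus wrapping each row in a model object, which is where the per-row ORM cost usually sits.

    require "sequel"
    require "benchmark"

    DB = Sequel.sqlite # in-memory SQLite database

    DB.create_table(:items) do
      primary_key :id
      String :name
      Integer :value
    end
    1_000.times { |i| DB[:items].insert(name: "item#{i}", value: i) }

    class Item < Sequel::Model(:items); end

    Benchmark.bm(16) do |x|
      # Plain dataset: rows come back as hashes, little per-row overhead.
      x.report("raw hashes")    { 100.times { DB[:items].limit(100).all } }

      # Model dataset: every row is wrapped in an Item instance; that extra
      # instantiation is the "ORM tax" being discussed above.
      x.report("model objects") { 100.times { Item.limit(100).all } }
    end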


And the ORM is written in... Ruby? So then we are back to "Ruby is slow".


SQLAlchemy is Python. Which is slower than Ruby :P


That's apples to oranges. Python is slower, but the SQLAlchemy ORM might be able to fetch results faster than Ruby. Though I'd assume the difference would be negligible.


Yeah. I'm confused, too.

Anecdote: I help maintain a pretty slow Rails app. I recently did some data munging in Go against the MySQL database that backs the Rails app. The Go tool was so fast, I thought it hadn't worked. It was basically instant. Accomplishing the same goals with Rails would have been slower by a factor of 10 in my experience.

I know Ruby != Rails, but if I'm doing this sort of thing in Ruby, I'm generally doing it in Rails or with a lot of the gems that Rails uses, so it's a fair comparison for my uses.


> I'm not sure why it just needs to be taken as a given that databases will take 150ms to return any data.

Run your database on a t2.small instance on AWS (1 vCPU, 2 GiB RAM). Why would you do that, you ask? I don't know, but that's what we got at an old job.

This was also used to prove MongoDB is faster than PostgreSQL, even though Mongo was running on-prem on much better hardware.


Nope, single-digit-millisecond performance for me on those nodes when the tables are cached; anything else and you are complaining about the performance of the storage medium.


How much data? We were doing a few million writes/day on peak days and the nodes couldn't keep up.


A few million writes a day is still well within the write performance of one of those nodes. But... we were talking about querying the data, no?


Well yes, but if your database is busy writing, it's going to have less time for reading.


I'm not sure what your point is. You said the nodes are slow. They are not slow, and will handle thousands of requests a second when configured correctly.

If you aren't getting that, then you are doing something big, something inefficient, or something stupid, and that would be the same on any size node.

Size your instances accordingly.


That's covered literally on the line after the chart:

> The parsing (juggling of data) takes the majority of time: DateTime::parse


Yes, that's the part I'm confused about. This looks like a case where the vast majority of request time is spent in Ruby, so it would seem a faster language could give a significant speed up.

But right after seeming to acknowledge this, the author instead concludes that "even with a very poor performing ORM, the Database remains the primary time consumer".



