
I was left confused after reading this article.

1. It claims Rust is ~10x faster than Ruby, based on a benchmark that reads a 23 MB file and then iterates over the data a single time. In my experience, Rust is between 20x and 100x faster than Ruby in purely CPU-bound workloads. But the author's main contention is that most work is IO-bound instead of CPU-bound, so probably not a big deal.

2. The author claims "it hardly matters that Ruby halts all code for 15ms to do garbage collection, if the fastest database-query takes 150ms". I've written applications that query Postgres databases with tens of millions of rows, where the 99th percentile response times are <10ms. I'm not sure why it just needs to be taken as a given that databases will take 150ms to return any data.

3. This flame graph from the article [0] seems to show that the vast majority of the request time is spent in Ruby parsing timestamps, rather than in the database. This seems to make the opposite of the point the author is trying to make. I'm not familiar with this stack, so maybe I'm missing something; can anyone explain?

[0]: https://berk.es/images/inline/flamegraph_sequel_read.svg



I'm also confused.

2. Agreed. 150 ms is extraordinarily slow for a point lookup from a single table. A simple lookup should take around 1 ms since it'll be cached in memory.

3. Also agreed. The actual database time looks to be the first flame. Hovering over it shows PG::Connection::exec, which accounts for 2.5% of the time.

I was curious about date parsing and dug up the source [1]. Seems like you could gain a ton of speed back by using a Postgres-specific timestamp parsing routine (rough Ruby sketch below). In Go, it's 40 lines [2].

[1] https://github.com/ruby/date/blob/d21c69450a57a1931ab6190385...

[2]: https://github.com/jackc/pgx/blob/a968ce3437eefc4168b39bbc4b...
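
A rough Ruby sketch of what I mean (illustrative only, assuming Postgres' default text output for a timestamp column, e.g. "2021-03-01 12:34:56.789012"): parse the known layout with Time.strptime instead of letting DateTime.parse guess the format on every row.

    require "date"
    require "time"
    require "benchmark"

    # Postgres emits 'timestamp' values in a fixed layout, so we can hand
    # strptime the exact format instead of having DateTime.parse guess it.
    PG_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%N"

    def parse_pg_timestamp(str)
      Time.strptime(str, PG_TIMESTAMP_FORMAT)
    end

    sample = "2021-03-01 12:34:56.789012"
    n = 100_000

    Benchmark.bm(16) do |x|
      x.report("DateTime.parse") { n.times { DateTime.parse(sample) } }
      x.report("Time.strptime")  { n.times { parse_pg_timestamp(sample) } }
    end

A real routine would also need to handle rows without fractional seconds and timestamptz offsets, but the fixed-format idea is the same.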


Author here. Sorry for the confusion.

1. I didn't want to make the point that Rust is faster, per se, but to show why it is faster, because that teaches us a lot about when it matters. Indeed, IO-bound vs CPU-bound. The collecting/reducing is bound by CPU and memory juggling; the reading of the file by IO. That IO part should hardly matter, but the processing is what makes the difference. Ruby is slow here. Rust isn't. Point being: when you are doing a lot of CPU-bound stuff, Ruby is probably a bad choice (and Rust a good one). But since in practice web services are almost all about IO (and some de/serialization), it matters less there.
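
To make that concrete, a contrived sketch (illustration only, not a benchmark from the article): the interpreter and GC overhead shows up in the CPU-bound part, while the IO-bound part mostly waits, where the language hardly matters.

    require "benchmark"

    data = Array.new(1_000_000) { rand(100) }

    Benchmark.bm(12) do |x|
      # CPU-bound: every step runs interpreted Ruby, allocates intermediate
      # arrays and feeds the GC. This is where a faster language pays off.
      x.report("cpu-bound") do
        data.map { |n| n * 2 }.select(&:even?).sum
      end

      # IO-bound: the process spends its time waiting (sleep stands in for a
      # network or database round trip). A faster language shaves off
      # almost nothing here.
      x.report("io-bound") do
        10.times { sleep 0.01 }
      end
    end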

2. I too have written PG-backed services (in both Rust and Ruby) where database calls take under 10ms. In a typical SaaS/PaaS setup, however, there will be a network between app and db, adding to latency. Localhost/socket makes it a lot faster, especially visible on queries that are themselves fast. The main point, however, wasn't the absolute measures, but the relative numbers. When your GC halting becomes significant relative to the waiting times for the database, then certainly: Ruby is a severe bottleneck. This happened to me often on convoluted (and terrible) Rails codebases, where GC locking was measured in seconds (but where sometimes database queries were measured in tens of seconds).

3. The flamegraph indeed shows that DateTime::parse is the bottleneck in that particular setup. I tried to explain that with:

> The parsing (juggling of data) takes the majority of time: DateTime::parse. Or, reversed: the DateTime::parse is such a performance-hog, that it makes the time spent in the database insignificant.

But I also tried to spend time explaining all the situations in which this is mitigated. Yes! In this case Ruby truly is the bottleneck. But when we move to more real-world cases, these bottlenecks shift towards the database again. E.g. a write-heavy app, or one that uses complex lookups that return little data.

Again, sorry for the confusion. I guess the title simply doesn't match the actual content of the article very well, which is more about "when does the bad performance of Ruby, the language, really matter, and when doesn't it". I hoped the intro solved this, but I should probably have spent more time on a better title. Sorry.


If the query selects a small number of rows and columns, and uses an index, it typically takes less than 1 ms on modern hardware. At least with MySQL.

The ORM, however, might be slow. For example, taking 100 rows from SQLite with SQLAlchemy takes 1 ms, but getting them as ORM objects takes 8 ms.
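
The Ruby side of the same comparison is easy to sketch with Sequel (a hypothetical example, assuming the sequel and sqlite3 gems; exact numbers will vary): fetching rows as plain hashes versus wrapping each row in a model object, which is where the per-row ORM cost usually sits.

    require "sequel"
    require "benchmark"

    DB = Sequel.sqlite # in-memory SQLite database

    DB.create_table(:items) do
      primary_key :id
      String :name
      Integer :value
    end
    1_000.times { |i| DB[:items].insert(name: "item#{i}", value: i) }

    class Item < Sequel::Model(:items); end

    Benchmark.bm(16) do |x|
      # Plain dataset: rows come back as hashes, little per-row overhead.
      x.report("raw hashes")    { 100.times { DB[:items].limit(100).all } }

      # Model dataset: every row is wrapped in an Item instance; that extra
      # instantiation is the "ORM tax" being discussed above.
      x.report("model objects") { 100.times { Item.limit(100).all } }
    end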


And the ORM is written in... Ruby? So then we are back to "Ruby is slow".


SQLAlchemy is Python. Which is slower than Ruby :P


That's apples to oranges. Python is slower, but the SQLAlchemy ORM might be able to fetch results faster than Ruby. Though I'd assume the difference would be negligible.


Yeah. I'm confused, too.

Anecdote: I help maintain a pretty slow Rails app. I recently did some data munging in Go against the MySQL database that backs the Rails app. The Go tool was so fast, I thought it hadn't worked. It was basically instant. Accomplishing the same goals with Rails would have been slower by a factor of 10 in my experience.

I know Ruby != Rails, but if I'm doing this sort of thing in Ruby, I'm generally doing it in Rails or with a lot of the gems that Rails uses, so it's a fair comparison for my uses.


> I'm not sure why it just needs to be taken as a given that databases will take 150ms to return any data.

Run your database on a t2.small instance on AWS (1 vCPU, 2 GiB RAM). Why would you do that, you ask? I don't know, but that's what we got at an old job.

This was also used to prove MongoDB is faster than PostgreSQL, even though Mongo was running on-prem on much better hardware.


Nope, single-digit-millisecond performance for me on those nodes when the tables are cached; anything else and you are complaining about the performance of the storage medium.


How much data? We were doing a few million writes/day on peak days and the nodes couldn't keep up.


A few million writes a day is still well within the write performance of one of those nodes. But... we were talking about querying the data, no?


Well yes, but if your database is busy writing, it's going to have less time for reading.


I'm not sure what your point is. You said the nodes are slow. They are not slow, and will handle thousands of requests a second when configured correctly.

If you aren't getting that, then you are doing something big, something inefficient, or something stupid, and that would be the same on any size node.

Size your instances accordingly.


That's covered literally on the line after the chart:

> The parsing (juggling of data) takes the majority of time: DateTime::parse


Yes, that's the part I'm confused about. This looks like a case where the vast majority of request time is spent in Ruby, so it would seem a faster language could give a significant speed up.

But right after seeming to acknowledge this, the author instead concludes that "even with a very poor performing ORM, the Database remains the primary time consumer".



