
There was a giant GitHub issue about improving Rust std mutexes a few years back. Prior to that issue Rust was using something much worse, pthread_mutex_t. It explained the main reason why the standard library could not just adopt parking_lot mutexes:

From https://github.com/rust-lang/rust/issues/93740

> One of the problems with replacing std's lock implementations by parking_lot is that parking_lot allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.

> After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and const constructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm.

> This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.



> This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm.

Note that the SRW locks are gone, except on very old Windows versions. So today the Rust built-in std mutex on your platform is almost certainly futex-based. On Windows it isn't called a futex, and from some angles it's better, but the same core ideas of the futex apply: we only ask the OS to do any work when we're contended, there is no limited OS resource involved (other than memory), and our uncontended operations are as fast as they could ever be.
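For a feel for what "only ask the OS to do work when contended" means, here is a deliberately simplified, Linux-only sketch of a futex-based lock. This is not std's actual implementation (which also spins, handles poisoning, etc.), and it assumes the `libc` crate for the raw futex syscall:

    // Simplified futex-style lock (Linux only, uses the libc crate).
    // State: 0 = unlocked, 1 = locked (no waiters), 2 = locked (maybe waiters).
    use std::sync::atomic::{AtomicU32, Ordering};

    pub struct RawFutexLock {
        state: AtomicU32,
    }

    impl RawFutexLock {
        pub const fn new() -> Self {
            Self { state: AtomicU32::new(0) }
        }

        pub fn lock(&self) {
            // Fast path: an uncontended acquire is a single atomic compare-exchange.
            if self.state.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed).is_ok() {
                return;
            }
            // Slow path: mark the lock contended and sleep in the kernel until woken.
            while self.state.swap(2, Ordering::Acquire) != 0 {
                futex_wait(&self.state, 2);
            }
        }

        pub fn unlock(&self) {
            // Only make a syscall if another thread might be waiting.
            if self.state.swap(0, Ordering::Release) == 2 {
                futex_wake_one(&self.state);
            }
        }
    }

    fn futex_wait(atom: &AtomicU32, expected: u32) {
        unsafe {
            libc::syscall(
                libc::SYS_futex,
                atom.as_ptr(),
                libc::FUTEX_WAIT | libc::FUTEX_PRIVATE_FLAG,
                expected,
                std::ptr::null::<libc::timespec>(),
            );
        }
    }

    fn futex_wake_one(atom: &AtomicU32) {
        unsafe {
            libc::syscall(
                libc::SYS_futex,
                atom.as_ptr(),
                libc::FUTEX_WAKE | libc::FUTEX_PRIVATE_FLAG,
                1,
            );
        }
    }

    fn main() {
        static LOCK: RawFutexLock = RawFutexLock::new();
        LOCK.lock();
        // ... critical section ...
        LOCK.unlock();
    }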

SRW locks were problematic because they're bulkier than a futex (though mostly when contended), and they had a subtle bug that for a long time it was unclear when Microsoft would get around to fixing, which isn't a great sign for an important primitive used in all the high-performance software on a $$$ commercial OS...

Mara's work (which you linked) was probably a bigger effort, and more important, but it's not actually the most recent large reworking of Rust's Mutex implementation.


> if it is on Windows it is not called a futex

What is it called?


WaitOnAddress, or from Rust's point of view, wait_on_address


> Prior to that issue Rust was using something much worse, pthread_mutex_t

Presumably you're referring to this description, from the GitHub issue:

> > On most platforms, these structures are currently wrappers around their pthread equivalent, such as pthread_mutex_t. These types are not movable, however, forcing us to wrap them in a Box, resulting in an allocation and indirection for our lock types. This also gets in the way of a const constructor for these types, which makes static locks more complicated than necessary.

pthread mutexes are const-constructible in a literal sense, just not in the sense Rust requires. In C you can initialize a pthread_mutex_t with the static initializer PTHREAD_MUTEX_INITIALIZER instead of calling pthread_mutex_init, and at least with glibc there's no subsequent allocation when using the lock. But Rust can't do in-place construction[1] (i.e. placement new in C++ parlance), which is why Rust needs to be able to "move" the mutex. Moving a mutex is otherwise nonsensical once the mutex is visible--it's the address of the mutex that the locking is built around.
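For comparison, here is roughly what the const-constructibility requirement buys on the Rust side (a minimal sketch, not tied to any particular program): a static Mutex needs no lazy initialization, no Box, and no runtime allocation.

    use std::sync::Mutex;

    // Because Mutex::new is a const fn, a static lock needs no lazy
    // initialization, no Box, and no allocation.
    static COUNTER: Mutex<u64> = Mutex::new(0);

    fn bump() -> u64 {
        let mut n = COUNTER.lock().unwrap();
        *n += 1;
        *n
    }

    fn main() {
        println!("{}", bump());
    }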

The only thing you gain by not using pthread_mutex_t is a possibly smaller lock--pthread_mutex_t has to contain additional members to support robust, recursive, and error-checking mutexes, though altogether that's only 2 or 3 additional words because some are union'd. I guess you also gain the ability to implement locking, including condition variables, barriers, etc., however you want, though then you can't share those through FFI.

[1] At least not without unsafe and some extra work, which presumably is a non-starter for a library type where you want to keep it all transparent.


> The effect of referring to a copy of the object when locking, unlocking, or destroying it is undefined.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V...

I.e., if I pthread_mutex_init(&some_addr, ...), I cannot then copy the bits from some_addr to some_other_addr and then pthread_mutex_lock(&some_other_addr). Hence not movable.

> Moving a mutex is otherwise non-sensical once the mutex is visible

What does "visible" mean here? In Rust, in any circumstance where a move is possible, there are no other references to that object, hence it is safe to move.


Well, technically if you only have a mutable borrow (it's not your object) then you can't move out of it unless you replace the value somehow. If you have two such borrows you can swap them; if the type implements Default you can take from one borrow, which replaces its contents with the default; and if you have some other way to make a new value you can replace the one you've got a reference to with that one. But if you can't make a new one and don't have one to replace it with, then too bad, no moving the one you've got a reference to.
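A small sketch of those three options, using Mutex<String> purely for concreteness (the function and names are made up for illustration):

    use std::mem;
    use std::sync::Mutex;

    // Hypothetical helper showing the three ways to "move" out of a &mut.
    fn demo(a: &mut Mutex<String>, b: &mut Mutex<String>) -> Mutex<String> {
        // 1. Swap the contents of two mutable borrows.
        mem::swap(a, b);

        // 2. Take the value out, leaving Default::default() behind
        //    (works because Mutex<String> implements Default).
        let taken: Mutex<String> = mem::take(a);

        // 3. Replace the value behind the borrow with one we made ourselves.
        let replaced: Mutex<String> = mem::replace(b, Mutex::new(String::new()));

        drop(replaced);
        taken
    }

    fn main() {
        let mut a = Mutex::new(String::from("a"));
        let mut b = Mutex::new(String::from("b"));
        let owned = demo(&mut a, &mut b);
        println!("{}", owned.into_inner().unwrap());
    }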


You're right and I edited my comment.


> What does "visible" mean here? In Rust, in any circumstance where a move is possible, there are no other references to that object, hence it is safe to move.

And other than during construction or initialization (of the mutex object, containing object, or related state), how common is it in Rust to pass a mutex by value? If you can pass it by value then the mutex isn't (and can't be) protecting anything. I'm struggling to think of a scenario where you'd want to do this, or at least why the inability to do so is a meaningful impediment (outside construction/initialization, that is). I understand Rust is big on pass-by-value, but when the need for a mutex enters the fray, it's because you're sharing or about to share, and thus passing by reference.


Depends on the program, and it can be a very useful tool.

Rust has Mutex::get_mut(&mut self) which allows getting the inner &mut T without locking. Having a &mut Mutex<T> implies you can get &mut T without locks. Being able to treat Mutex<T> like any other value means you can use the whole suite of Rust's ownership tools to pass the value through your program.

Perhaps you temporarily move the Mutex into a shared data structure so it can be used on multiple threads, then take it back out later in a serial part of your program to get mutable access without locks. It's a lot easier to move Mutex<T> around than &mut Mutex<T> if you're going to then share it and un-share it.
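A hedged sketch of that share-then-unshare pattern (the program itself is made up; the point is Mutex::get_mut and Mutex::into_inner versus lock):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Serial phase: we own the Mutex by value, so no locking is needed.
        let mut m = Mutex::new(Vec::new());
        m.get_mut().unwrap().push(0); // &mut Mutex<T> gives &mut T, no lock taken

        // Parallel phase: move the Mutex into an Arc and share it.
        let shared = Arc::new(m);
        let handles: Vec<_> = (1..=4)
            .map(|i| {
                let shared = Arc::clone(&shared);
                thread::spawn(move || shared.lock().unwrap().push(i))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }

        // Serial again: take the Mutex back out of the Arc, then consume it
        // to get the inner Vec with no locking at all.
        let m = Arc::try_unwrap(shared).expect("no other Arc handles remain");
        let mut v = m.into_inner().unwrap();
        v.sort();
        println!("{v:?}");
    }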

Also, it's impossible to construct a Mutex without moving it at least once, as Rust doesn't guarantee return value optimization. All moves in Rust are treated as a memcpy that 'destroys' the old value. There's no way to even write 'let v = Mutex::new(value)' without a move, so movability is also a hard functional requirement.


You can pass the mutex by value and it does continue to protect its value.

https://play.rust-lang.org/?version=stable&mode=debug&editio...
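(That playground URL is truncated above; the sketch below is just an illustration of the same point, not the linked code: the Mutex is passed by value, and it keeps protecting its contents once it's shared again.)

    use std::sync::{Arc, Mutex};
    use std::thread;

    // Takes the Mutex by value (a move) and then shares it; the lock still
    // guards the data after the move.
    fn share(counter: Mutex<u32>) -> u32 {
        let counter = Arc::new(counter);
        let handles: Vec<_> = (0..8)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || *counter.lock().unwrap() += 1)
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        let total = *counter.lock().unwrap();
        total
    }

    fn main() {
        let m = Mutex::new(0);    // constructed here...
        assert_eq!(share(m), 8);  // ...and moved by value into the function
    }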


I’m actually thinking of the sheer size of pthread mutexes. They are giant. The issue says that they wanted something small, efficient, and const constructible. Pthread mutexes are too large for most applications doing fine-grained locking.


On a typical modern 64-bit Linux, for example, they're 40 bytes, i.e. 320 bits. So yeah, unnecessarily bulky.

On my Linux system today, Rust's Mutex<Option<CompactString>> is smaller than the pthread mutex type, whether it's locked with the text "pthread_mutex_t is awful" inside it or unlocked with explicitly no text (None, not an empty string). Either way it only takes 30-odd bytes, while the pthread_mutex_t alone is 40 bytes.

On Windows the discrepancy is even bigger: their OS-native mutex type is a sprawling 80-byte monster, while Mutex<Option<CompactString>> is, I believe, slightly smaller than on Linux even though it has the same features.
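If you want to check the numbers on your own machine, something like this prints the sizes; it assumes the `libc` and `compact_str` crates, and the exact figures will vary by target, libc, and toolchain version:

    use compact_str::CompactString;
    use std::mem::size_of;
    use std::sync::Mutex;

    fn main() {
        // OS lock type vs Rust's std Mutex wrapping a small string option.
        println!(
            "pthread_mutex_t: {} bytes",
            size_of::<libc::pthread_mutex_t>()
        );
        println!(
            "Mutex<Option<CompactString>>: {} bytes",
            size_of::<Mutex<Option<CompactString>>>()
        );
    }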


> On Windows the discrepancy is even bigger, their OS native mutex type is this sprawling 80 byte monster

I guess you are referring to CRITICAL_SECTION? SRWLock, which has the size of a pointer, was introduced in Windows Vista. Since Windows 8 you can use WaitOnAddress to build even smaller locks.


Yes, CRITICAL_SECTION is far too large. Mara asked some years ago whether SRWLock could guarantee what Rust actually needs for this purpose (the documentation at that time refused to clarify whether we can move it for example) and that's why her change was to SRWLock from CRITICAL_SECTION.

And yes, the newer change uses WaitOnAddress to provide the same API as the futex from the various Unix platforms. Raymond Chen's description of the differences is perhaps rather exaggerated, which isn't to say there's no difference, but it's well within what's practical for an adaptor layer.

Also, although the SRWLock itself is the same size as a pointer (thus 64 bits on a modern computer, compared to a 32-bit futex), there's a reason it's the same size as a pointer: it actually is basically a pointer, and so in some cases it's pointing at a data structure which we should reasonably say is also part of the overhead.

The pointer is to a suitably aligned object, which means the bottom bits would otherwise be zero, so SRWLock uses those for flag bits. It's a nice trick, but we should remember that it isn't really comparable to a futex, though it's certainly cheaper than CRITICAL_SECTION.
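The low-bit-tagging trick itself is easy to demonstrate in isolation; this sketch shows only the general idea, not SRWLock's actual layout:

    use std::sync::atomic::{AtomicUsize, Ordering};

    const LOCK_BIT: usize = 0b1;

    // Any pointer to an object aligned to at least 2 bytes has a zero low bit,
    // so a flag can be OR'd into the word and masked back off later.
    #[repr(align(8))]
    struct WaiterNode {
        _payload: u64,
    }

    fn main() {
        let node = Box::into_raw(Box::new(WaiterNode { _payload: 0 }));
        let word = AtomicUsize::new(node as usize | LOCK_BIT);

        let raw = word.load(Ordering::Relaxed);
        let locked = raw & LOCK_BIT != 0;
        let ptr = (raw & !LOCK_BIT) as *mut WaiterNode;

        assert!(locked);
        assert_eq!(ptr, node);

        // Clean up the heap allocation we made for the demo.
        unsafe { drop(Box::from_raw(ptr)) };
    }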


I dunno, it seems to me that the standard mutex performs very well in all scenarios and doesn't have any significant downsides, except for the hogging case, which could be fixed by assigning the non-hogging threads a higher priority.

Whereas parking_lot has a ton of problematic scenarios where, after its spinning times out, it yields the thread to the OS, which then has no idea it should wake the thread back up once the resource becomes available.

It could even be argued that preventing starvation is outside the design scope of a mutex as a construct, as it only guarantees mutual exclusion and that the highest-priority waiting thread should get access to it.


Seems like the simple solution to this problem would be to have both, no?

A simple native lock in the standard library along with a nicer implementation (also in the standard library) that depends on the simple lock?


The simplest solution is for `std::sync::Mutex` to provide a simple, efficient mutex which is a good choice for almost any program. And it does. Niche programs can pull in a crate.

I doubt `parking_lot` would have been broadly used—maybe wouldn't even have been written—if `std` had this implementation from the start.

What specifically in this comparison made you think that `parking_lot` is broadly needed? They had to work pretty hard to find a scenario in which `parking_lot` did much better on any performance metric. And as I alluded to in another comment, `parking_lot::Mutex<InnerFoo>` doesn't have a size advantage over `std::sync::Mutex<InnerFoo>` when `InnerFoo` has word alignment. That's the most common situation, I think.

If I were to make a wishlist of features for `std::sync::Mutex` to just have, it wouldn't be anything `parking_lot` offers. It'd be stuff like the lock contention monitoring that the (C++) `absl::Mutex` has. (And at least on some platforms you can do a decent job of monitoring this with `std::sync::Mutex` by watching the underlying futex activity.)


With zero prior knowledge of the context, the impression I got from the parent comment was that `parking_lot` had features that made it desirable enough that there was a decent amount of discussion about replacing the standard lib locks with it. I didn't catch that the previous standard lib option was terrible and has since been replaced with something better.


This. The standard library has a responsibility to provide an implementation that performs well enough in every possible use case, while trying to be generally as fast as possible.


My takeaway is that the documentation should make more explicit recommendations depending on the situation -- i.e., people writing custom allocators should use std mutexes; most libraries and applications that are OK with allocating should use parking_lot mutexes; embedded code or libraries that don't want to depend on allocation should use std mutexes. Or maybe parking_lot is almost useless unless you're doing very fine-grained locking. Something like that.



