The Unexpected Importance of the Trailing Slash (tookmund.com)
213 points by pabs3 on April 9, 2022 | 51 comments


Back in the day, the company I worked for required all newly hired system and software engineers to take their Unix Fundamentals class. 8 hours a day for 4 days they covered how the shell, coreutils, and filesystems work.

Sometimes people would complain that they didn't need a "beginner" class but I bet there wasn't a single person who didn't learn some important subtlety like how the trailing / works in different situations.

I've never had another company require or even offer a training like that. Do other companies still teach this stuff?

edit: typos


I’m not familiar with the “Unix Fundamentals” class (got a link?) but when I have interns I make the “missing semester”[1] course part of their onboarding. I find it’s a nice balance of pragmatism and the classroom structure they are already familiar with.

I will also note that I think skills like this end up being way more valuable to me as a mentor than hard CS knowledge.

[1] https://missing.csail.mit.edu/


Unfortunately I don't have a link. It was an internally developed class at that company. This was in 2002.


When I was much younger I worked at a small ISP/CLEC where the job started with a week of training, including editing regex mail filters, configuring bind, and sending email via telnet.

It was a tech support job for mostly dial up customers. The manager had developed a binder of training materials that included many things to help us troubleshoot.


JP Morgan Chase does (or did) offer an 18-month mainframe training program called zUniversity that combined in-house materials and IBM training programs. Not sure about new hires (it would have varied by team and line of business), but they also had lots of formal and instructor-led training courses for new technology, frameworks, and patterns.


Debian has a #debian-til chat channel where people post things they learned today. Often there will be things that someone learned that senior folks will say they didn't know. Sometimes people say that they re-learned some basic thing they forgot too :)


There was an old joke at my company that anyone who showed up for the beginner's Unix class would be fired on the spot.


That seems like a terrible joke. Even as a joke rather than a real policy, that's a great way to create a culture where people are afraid to ask questions and instead pretend to understand things they don't.


I personally always found it intuitive. But the rules are easy to remember: dir/ means "inside dir", dir means "dir and what's inside".


/dir to me is "the dir itself".


Hot tip: when making a symlink to a directory, include the trailing slash ('ln -s dirname/ linkname') so that if the directory disappears, a copy to the link results in an error instead of creating a file called 'linkname'.


Do you mean a file called dirname? Interesting behavior in either case.


No, `linkname` is correct. It’s the name of the link in the example. So the command `cp <file> linkname/` will give you an error if the target is gone.


No, `dirname` is correct. The goal of `ln -s dirname/ linkname` is that even if you accidentally do `cp file linkname` (note without slash) when dirname is gone, it will error out. Without the trailing slash in the ln command, a file (instead of directory) named dirname will be created.
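
For anyone who wants to check this on their own machine, here is a rough sketch. It uses a shell redirection instead of cp, because cp implementations differ in how they treat dangling symlinks (GNU cp, as far as I know, refuses to write through one at all). The names dir_a, dir_b, link_slash and link_plain are made up for the demo, and the result described in the comments is what I would expect on Linux; other systems may behave differently.

```shell
# Sketch: a dangling symlink with and without a trailing slash in its
# target. All names are hypothetical; everything happens in a temp dir.
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

mkdir dir_a dir_b
ln -s dir_a/ link_slash     # link target is recorded as "dir_a/"
ln -s dir_b  link_plain     # link target is recorded as "dir_b"
rmdir dir_a dir_b           # both links now dangle

# Writing through the plain link creates a regular file named dir_b.
echo data > link_plain

# Writing through the slash link is expected to fail, because "dir_a/"
# can only name a directory. The shell's own error message is hidden.
{ echo data > link_slash; } 2>/dev/null || echo "write via link_slash failed"

ls -l
```

If it behaves as expected, a regular file named after the old directory appears only in the no-slash case.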


rsync is what taught me to respect the trailing slash.


When I was implementing rclone which implements a lot of rsync's functionality but for cloud storage, I deliberately didn't copy the trailing slash feature.

This annoys people who know rsync well, but newbies are very happy with it!

Even though I know rsync extremely well I am super careful with this. Test first with `--dry-run`!


Thanks so much for rclone. Awesome piece of software.


I love rsync for what it can do, but I hate that with a passion. It's terrible design that should have never seen the light of day.


I end up reposting this comment whenever rsync's slash treatment gets mentioned.

rsync has "weird" syntax for a reason. Unlike other unix-like commands, it treats trailing slashes as significant AND consistently. If a directory has a trailing slash, it means "the contents of the directory". No slash means "the directory itself". These are two different concepts, and a program that copies directories should take the difference into account. scp (and cp, for that matter) don't take this difference into account. That leads to gotchas with recursive (-r) copying. Most importantly, scp isn't idempotent:

scp -r fromdir todir

If todir doesn't exist, scp will copy the contents of directory fromdir to a new directory named todir.

Execute the same command again (now that todir exists), and scp will copy fromdir to todir/fromdir .

On the other hand:

rsync -a fromdir/ todir

will always copy the contents of fromdir into a directory named todir (effectively, a directory rename operation), whether todir exists or not.

rsync -a fromdir todir

will always copy the directory fromdir into the directory todir, whether todir exists or not.

These rsync operations are idempotent, which is important because rsync is designed to incrementally re-sync directories. It is expected that it will commonly be run more than once, which is why it needed to address this IMHO fundamental bug/limitation in cp and scp.
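
A minimal local sketch of the difference, using plain cp -r in place of scp -r (the comment above says cp has the same limitation). fromdir and todir are the names from the comment; the temp directory, a.txt and the synced directory are only there for illustration, and the rsync part is skipped if rsync is not installed.

```shell
# Sketch: running the same recursive copy twice.
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

mkdir fromdir && echo hello > fromdir/a.txt

cp -r fromdir todir   # 1st run: todir does not exist, so it becomes a copy of fromdir
cp -r fromdir todir   # 2nd run: todir exists, so the copy lands in todir/fromdir

find todir | sort     # shows both todir/a.txt and todir/fromdir/a.txt

# With a trailing slash on the source, repeating the rsync changes nothing.
if command -v rsync >/dev/null 2>&1; then
    rsync -a fromdir/ synced
    rsync -a fromdir/ synced
    find synced | sort
fi
```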


I agree with you, but the cases when I want to copy the contents of a directory instead of the directory are extremely rare.

Because such cases are so rare, it does not make any difference for me if I have to add "/*" instead of the slightly shorter "/" (even when I also have to add "/.*" for hidden files/directories).

So I would like rsync to have a command-line option to ignore trailing slashes, like other programs have (and which I always use with them).

As it is, rsync is the only program where I have to be extra cautious before executing a command line, because the directory names are in most cases provided by auto-complete, and most shells terminate them with a trailing slash, which I have to be very careful to always delete, which slows me down a lot.

An exception is when using zsh, which sometimes, but not always, itself deletes the trailing slash added by auto-complete (zsh might be confused by the aliases I use, so it usually deletes the trailing slashes when they do not matter, but retains them when I would want them to be deleted).


> Because such cases are so rare, it does not make any difference for me if I have to add "/*" instead of the slightly shorter "/" (even when I also have to add "/.*" for hidden files/directories).

I think this is the wrong way to think of it. It's more like "rsync foo/ bar/" means "make these two directories the same" - it does change the last-modified timestamp on bar/ to match foo/. If you use "foo/*" it excludes the directory and last-modified becomes now.

For all the combinations this is what's in my head, and I find it for the most part very consistent and predictable:

* rsync -a foo bar/ -> put this file in this directory (a directory is a special type of file, so you end up with bar/foo/)

* rsync -a foo/ bar/ -> put the contents of foo/ into bar/ (but also we're syncing the directories, which is why bar/'s last-modified is changed)

* rsync -a foo bar -> put this file in here, whatever "bar" is (makes a file named bar, unless foo is a directory - you can't put a directory inside a non-directory file, so it makes directory bar and puts foo inside it)

* rsync -a foo/ bar -> This is the only one I have issues with because it doesn't particularly make sense so I never have reason to even try using it - put a directory into a file? But it's the same as "foo/ bar/" case.
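
The four combinations in the list above can be tried side by side with something like the following. This is only a sketch for the case where foo is a directory; bar1 through bar4 and x.txt are hypothetical names, and the whole thing is skipped if rsync is not installed.

```shell
# Sketch: the four source/destination slash combinations, foo being a directory.
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

if command -v rsync >/dev/null 2>&1; then
    mkdir foo bar1 bar2 && echo x > foo/x.txt

    rsync -a foo  bar1/   # the directory itself:  bar1/foo/x.txt
    rsync -a foo/ bar2/   # contents only:         bar2/x.txt
    rsync -a foo  bar3    # bar3 is created, foo goes inside it: bar3/foo/x.txt
    rsync -a foo/ bar4    # same as "foo/ bar/":   bar4/x.txt

    find bar1 bar2 bar3 bar4 -type f | sort
else
    echo "rsync not installed; skipping"
fi
```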


> If you use "foo/*" it excludes the directory and last-modified becomes now.

True, but instead of adding "/" to the source you obtain the same effect by deleting the last component of the destination (which is normally a non-action, you just do not auto-complete far enough).

If you meant that you might want to have "foo" and "bar" directories with the same content and timestamps, but with different names, you can rename the target directory before and after the rsync.

This is negligible overhead, because in many decades of using computers for all kinds of applications, I have not encountered a single case when I would have needed to have 2 directories with identical content and even with identical directory timestamps, but with different directory names.

Making a backup directory in the same directory with the original directory would not be a good example, that would be just stupid. Any backup directory should be placed as far as possible from the original directory, when not on a different filesystem, in which case they can have the same name.

The way rsync handles names without trailing slashes is completely right.

The way rsync handles trailing slashes to encode alternative behaviors might have been OK if this feature were not in conflict with the auto-complete feature of the shells.

For all the rsync behaviors that can be obtained by adding some "/", there are alternatives almost as simple as that.

A slight simplification in the encoding of the seldom used alternative behaviors is paid by a large complication in the typing of the frequently used behaviors, where you must always insert a backspace + space sequence after each directory name, to delete the dangerous trailing slash.


> you can rename the target directory before and after the rsync.

The problem is that the ergonomics of renaming a directory on a remote machine is vastly inferior to doing so on a local machine. Locally, you can use mv. There is no "rmv".

Even worse, rsync is from a different generation of unix utilities than cp/mv or rcp/rsh or scp/ssh. It is more "Swiss army knife" style, able to stand on its own. It shouldn't need to depend on a mythical rmv command.

Finally, I very frequently use the "copy a directory to a new name" feature, both locally and remotely. Locally, I use it to make a quick backup of a directory for emergency recovery. Remotely, syncing a local log directory to a remote "log.machine-name" is pretty common I think. Without renaming, you would do "log -> remote:machine-name", but machine-name would have to exist on the remote machine, necessitating a separate "rmkdir" command.


Great explanation! I prefer rsync to cp, etc., because of this consistency, but I never appreciated the idempotency implications.


Once I found the rsync --relative (-R) command mode and its /./ meta syntax, I no longer appreciated this trailing slash feature as a good idea on its own.

When you are worrying about idempotent re-sync of trees, I think it is so much more important to think about where the roots of the synchronization should match up between local and remote, and to make that explicit in the calling convention.

   rsync -Ra /src-only/./{a,b,c}  remote:/remote-only/./
I also think you should be writing this idempotent call in a script to presumably be reused over time, and this helps document the goal.
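
A local sketch of the /./ marker, for readers who have not used it. The directory layout (src-only/a/deep, src-only/b, remote-only) is invented for the demo and the destination is a local directory rather than a real remote host; the block is skipped if rsync is not installed.

```shell
# Sketch: with -R (--relative), the part of each source path to the right
# of "/./" is recreated under the destination; the part to the left is dropped.
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

if command -v rsync >/dev/null 2>&1; then
    mkdir -p src-only/a/deep src-only/b remote-only
    echo 1 > src-only/a/deep/file.txt
    echo 2 > src-only/b/file.txt

    rsync -Ra "$work/src-only/./a" "$work/src-only/./b" "$work/remote-only/"

    # Expect a/deep/file.txt and b/file.txt directly under remote-only,
    # with no src-only component in between.
    find remote-only -type f | sort
else
    echo "rsync not installed; skipping"
fi
```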


I didn't know about this meta-syntax, thanks - it looks very useful.

I would have to think about how it fits in with the philosophy of the trailing slash sometime, but I don't want to hurt my head right now.


Isn't that problem also solved with the `dir/.` syntax? Adding the `.` signifies the contents versus the directory itself.

I think it's `cp` that uses that syntax
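
A quick sketch of the `dir/.` form with cp, assuming a throwaway layout (src, dst and the file names are made up). Unlike `src/*`, it also picks up dotfiles.

```shell
# Sketch: copy the contents of src (including hidden files) into dst,
# without creating dst/src.
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

mkdir -p src/sub dst
echo v > src/visible.txt
echo h > src/.hidden
echo n > src/sub/nested.txt

cp -r src/. dst/

ls -A dst
```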


I knew most of this but it was a great refresher and I learned a few more details! Thanks!


I think that's a failure on SCP side. It should have never allowed scp -r dir target_dir with non-existent target_dir in the first place.


I get what you are saying: cp/scp should have consistent behavior when the second arg is a directory. The "problem" operation you describe is analogous to copying plain files, though (copying contents to a new name).

I don't think having to create a target directory first is obvious, although neither is the fact that bare "cp" doesn't copy directories.

Finally, for remote copying it would be a nightmare for scp to require the destination directory to exist. To copy a directory tree to a remote server with a new name you'd need to either ssh a remote mkdir command, or add a flag to scp to specify the new name.


I wish it had some kind of rsync repositories

To make backups, you clone a directory into the rsync repository, and when you later want to update the backup, you pull the newest changes from the directory into the repository.

And it would be impossible to destroy the backup by forgetting a slash or syncing another directory on the wrong backup because it would remember which directory belongs to which backup repository.


Came here to say exactly that, smiled at yours being the first comment. And further I used to hate rsync's parsing of the trailing slash and now have come to love it, I don't know when the transition happened.


I'm still scared every time I use rsync because of that and have to thoroughly check the man-page examples :)


No need to fear! Use --dry-run to confirm your command line before making the changes. It's documented in that very man page!


I have a list of aliases I pass to my minions, one is to alias rsync with this, and the real rsync is aliased as rsyncplz


That kind of stuff sounds good, but it’s dangerous. You risk getting used to it, and then getting hosed when you’re on a normal system. Some people use a similar thing where they set "rm" as an alias to "rm -i", i.e. ask interactively before removal of each file.

I once heard a story about some SunOS consultant who was used to "rm" being an alias for "rm -i", and the first thing this consultant did, as root, was to "cd /etc" and then "rm *".


One of my first introductions to the fact that GNU getopt is much different from BSD/UNIX getopt was running `rm * -i` on Solaris. Instead of prompting me for which files to delete, it removed all of my files and then printed an error about failing to remove the non existent file "-i".


In principle this could be solved by having two aliases `rsync-dry-run` and `rsync-for-real`, and aliasing `rsync` to `echo "Use rsync-dry-run or rsync-for-real"`.
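
One possible way to write that down, as shell functions rather than aliases so it also works in scripts. This is only a sketch: the names use underscores instead of hyphens because hyphenated function names are not portable across shells, and the caveat from upthread about getting used to a non-standard rsync still applies.

```shell
# Sketch: force an explicit choice between a dry run and a real run.
# "command rsync" bypasses the rsync function defined below.
rsync_dry_run()  { command rsync --dry-run "$@"; }
rsync_for_real() { command rsync "$@"; }

# Bare "rsync" only prints a reminder and fails.
rsync() {
    echo "Use rsync_dry_run or rsync_for_real" >&2
    return 1
}
```

Typical use would be `rsync_dry_run -a --delete src/ dst/` first, then the same arguments with `rsync_for_real` once the output looks right.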


That's great, I'll do that now. Thanks!


I've recently had to copy a bunch of files over two Windows Servers and was pleasantly surprised by robocopy's (basically Windows' rsync) solution to this dilemma: the first two parameters always only refer to directories, and then you can list specific files starting with the third parameter.

This way there can be no confusion whether you're copying the contents of a directory or the whole directory, and you can't create a duplicated directory within your destination by mistake.


Every time I use one of the --delete/--delete-* arguments I first try with --dry-run since I don't trust myself that I got the trailing slashes in the right spot.


Actually a trailing slash is significant not only in rsync, but in most other command-line utilities.

Unfortunately, rsync lacks a helpful option like the "--strip-trailing-slashes" of GNU cp and mv.

I never use bare cp or mv, but only aliases which replace their undesirable default options with other more useful options, including "--strip-trailing-slashes", which makes the presence or absence of a trailing slash irrelevant.


same. my rule of thumb is: always always always use the trailing slash on source/ and target/ paths and it will always do the right thing.


Yes, there are inconsistencies and surprises with trailing slash handling. For example, rmdir doesn't follow symlink-to-dir/ on Linux. I tried to at least clarify what was happening in this case recently: https://github.com/coreutils/coreutils/commit/9de1d153


Don't forget the variation where you also end it with a dot: foo/.


"I really mean it".


What does that do?


Great question, I was wondering if anyone would be interested enough to ask!

tl;dr: Basically, it's a way to say "do this operation inside this folder".

For example, if you do

  cp -r A B
and A is a directory, the behavior is different depending on whether B already exists or not. If B is a file, you get an error. If B is a directory, then it tries to make a copy of A inside B (i.e. B/A). If B doesn't exist, B itself becomes a copy of A.

This is pretty terrible behavior. The question is how do you avoid this. On Linux, if you always want A to be overlaid on top of B, then you can do

  cp -rT A B
which disambiguates in favor of overlay. On the other hand, if you always want A to be copied inside B, then you can do

  cp -r A B/.
which disambiguates in favor of creating a subdirectory (and gives you an error if B doesn't exist).

Note: I have not checked whether this behavior is uniform across other POSIX systems, but I hope it is.
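
A small sketch to try both forms in a scratch directory. A and B are the names from the comment; C, D and f.txt are made up for the demo. The -T option is a GNU extension as far as I know, so that part only runs if the local cp accepts it.

```shell
# Sketch: "B/." always means "inside B"; -T always means "B is the copy".
work=$(mktemp -d) && cd "$work" || exit 1
trap 'rm -rf "$work"' EXIT

mkdir A B && echo f > A/f.txt

cp -r A B/.        # B exists, so the copy lands in B/A

# C does not exist, so this fails instead of creating C as a copy of A.
cp -r A C/. 2>/dev/null || echo "cp refused: C does not exist"

# Overlay form: D becomes (or is overlaid with) a copy of A.
if cp -rT A D 2>/dev/null; then
    cp -rT A D     # running it again does not create D/A
else
    echo "this cp does not support -T"
fi

find B D -type f 2>/dev/null | sort
```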


I'm starting to be convinced that symlinks are way more trouble than they're worth, especially in modern filesystems with reflink support.

Reflink copies are basically invisible, except when you need to meter disk usage. Symlinks and hardlinks add a billion corner cases to file handling.


ZFS not having reflinks is still so sad. https://github.com/openzfs/zfs/issues/405


Ahh well, the pitfalls of writing portable shell scripts. There is still the myth flying around that bash scripts are the best solution for write-once, run-anywhere scripts. I tip my hat to the projects that are written in plain bash or sh.



