I think it's all about branding, the same disgrace that plagues mobile operating systems: every software maker tries its best to force users to install its logo on their screen, so that where yesterday one could use 100 services just by saving 100 bookmarks in a browser, now one needs 100 apps, with all the implications regarding waste of space and resources, security, etc.
Although perfectly legit, it's a practice that cannot scale.
Driver installers on Windows haven't been exempt from this nonsense for years either: every driver started coming with its companion app, very often a huge VB "thing" that did nothing important except show its shiny icon on the desktop. Then, years later, they started phoning home to profile users, offering purchases for consumables (as printer drivers do), etc.
In the old days, however, one had a way to control it: the .exe was often a self-extracting .zip, so it could be fed to 7-Zip and the like to extract the archive without executing it. Then one would throw away the unnecessary bloat, keep only the relevant .inf file and a few DLLs or other small files, point to that location when searching for drivers, and voilà: driver installed without the cruft.
The question is whether a giant ball of Electron bloat could be dissected in a similar fashion in order to extract the important stuff and throw away the rest. I hope so.
The cycle of crap on Linux is different; less corporate and underhanded, but more of an ever-expanding bloat. Install Clang/LLVM. It's like 300MB. I remember when almost every system had a fairly small C compiler. It had to be small, because it was the basis of everything else. Now the base is enormous. OCaml is something like 200MB. And of course, it has its own package manager. So does Python, and Ruby, and all these other things that are supposed to be the base of so much other software.
Take another example. Install LaTeX. It's something like 5GB. It's huge because it bundles enormous numbers of packages.
It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?
Take another example. Node. Building it from scratch takes a pretty beefy machine and a lot of time. (It takes over 20 minutes on my 6-core workstation with 32GB RAM.) Most of that is building V8. When I worked on V8, we periodically spent some time trying to get build times under control, but the needle barely moved before the growth got going again. We spent months and months of effort, over years, splitting V8 into more source files and more directories and enforcing header discipline, but all of it only made build times worse. Despite how cool V8 is, I feel embarrassed in retrospect that the build system is so bonkers.
Linux is like this everywhere. Monstrous and labyrinthine. It really is impossible to understand it all now.
Linux has nothing to do with Node, though. Most of the Linux world (and the Windows world, and the OSX world, etc) wishes it would just go away forever. Javascript is unadulterated pain.
As far as Linux package sizes go... why are you installing so many packages that you don't need, just to complain about it?
On Debian and Ubuntu, `dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n` will tell you the install size of things sorted by worst offender; for me, on one of my development machines, the worst is git, followed by Perl packages required by the system, neovim, and then a bunch of normal things expected on any install. `df -h` minus `/home` is a hair over 600MB.
git, being the largest thing, has an install size of 38 megs. Indeed, I cannot tell you why git is 38 megs; there may or may not be bloat there.
As a comparison, Windows uses around 6GB of space, and an MSVC install with a common set of toolchains may take up to 20GB and... arguably does less than my <1GB Linux install (when it comes to dev work, anyway).
Your dev Linux machine is a gigabyte? That's kind of surprising. Just llvm+clang on my machine, according to your query, clocks in at over a gigabyte. Are you only developing in Perl and not doing any from-source builds?
While I share some of your resentment (especially as a Gentoo user who builds Chromium quite regularly), a few extra gigabytes of storage or a few more config files I don't grok are relatively easy to ignore compared to the kind of dark patterns I see happening with 3rd party software on Windows desktops.
And Microsoft itself is increasingly willing to sink to that level as well.
Isn't this like a first-world computer problem? How many gigabytes does the current Visual Studio take? Not talking about VS Code, because it's only an IDE, which still requires the actual Visual C++ as the compiler.
That's if you install the full version that has every single package for everything, all with their own manuals etc. At least on Debian-based distros there are options to just get the ones relevant to you.
Maybe, but it feels unfair to bring software bloat into a comparison between Linux and Windows. I don't think Linux could ever come close to the degree of software bloat in Windows. Your Clang install is 300MB? Sure. My Visual Studio install is 4.5GB, and it includes at least two versions of Clang (the DirectX shader compiler is a fork of an old version of Clang). You think dependencies are a problem? You should be happy there's only one copy of each library on your computer (two if you have a multilib system). I counted more than 25 copies of the C runtime on one Windows computer. You think Linux is monstrous and labyrinthine? Have you seen the scope of the Windows API? There's no comparison.
> It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?
I dunno man, but software can get complicated, and a lot of these packages probably have features that people use. Why quibble over hundreds of megabytes when popular Windows packages like Office and Creative Cloud are multiple gigabytes? It all seems very unfair that Linux is subject to this level of scrutiny when other systems have worse bloat by literal orders of magnitude.
Modern Linux user space is monstrous and labyrinthine. Linux itself is not. I found it to be a really clean system.
The basis of everything is actually the Linux system call binary interface. We can actually trash the entire user space and start from scratch with nothing but Linux. We can even trash the ubiquitous GNU stuff if we want.
Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters. No need for libc bullshit in the middle. No need for C or any specific language. Someone could make a language today and that single feature would make it as capable as C is for systems programming. They could write software and boot Linux directly into it.
> Why can't we have a compiler with built in system call support?
Funny you should ask that. That is exactly how Virgil's compiler supports the Linux (and Darwin) kernels. Other than generating a small amount of startup assembly (10-20 instructions), the compiler just knows the ELF (and Mach-O) binary formats and the calling conventions of the respective kernels. With some unsafe escape hatches (e.g. getting a pointer into the middle of a byte array), the rest is regular Virgil code that calls the kernel directly.
Take a look, I've been working on this for more than 10 years:
The "Linux.syscall" is a special operator known to the compiler; it lets you pass an int (the syscall number) and whatever arguments you want (any types; it is implemented with flattening and polymorphic specialization) to the kernel.
With this I have implemented all kinds of stuff, including the userspace runtime system and even a JIT compiler (for my new Wasm engine).
Thanks for this, it's extremely awesome! Really happy to see others have gone so much farther than I ever did.
I started looking into this myself some years ago. Even started developing a liblinux with process startup code and everything. Abandoned it after I found the kernel itself had an awesome nolibc.h file that was much more practical for my C programming needs:
It's amazing how this really lets you do everything... Want a JIT compiler? Map some executable pages and emit some code. You can statically allocate memory at process startup and use that for bootstrapping code. This lets you implement dynamic memory allocation and even garbage collection in your own language.
Targeting POSIX standard functions like open by going through the Linux syscall table looks like just making work for yourself when porting this to other systems.
Some syscalls don't correspond to standard library functions. As an example, if you want to bind to opendir/readdir/closedir, you have to write those yourself in terms of the Linux-specific __NR_getdents64 system call.
Is your LinuxConst.SYS_open actually __NR_open? That's supposedly obsolete. glibc uses __NR_openat for open(). __NR_open is listed in the asm/unistd.h header in a section under the heading "All syscalls below here should go away really ..."
How about signal handling; are you dealing with sigreturn and all that?
You can get a small executable footprint (in terms of not requiring a dynamic C library) by maintaining all this yourself, though.
Oh, I know it's work, but I am not going to assume POSIX, as that's implemented in userspace with C code. In my universe, C code doesn't exist (except I use a little in some testing utilities in order to get going on a new platform). I never ported to Windows, but doing so would be as simple as teaching the compiler the Windows kernel calling convention, adding that little process entry code, and then writing an implementation of System using Windows calls. Oh, yeah, and generating COFF :)
Virgil has its own calling convention internally (though this is basically System V on x86-64). That only matters when getting into V3 code or out, e.g. process entry, calling the kernel, and signal entry. For signals, the compiler generates a tiny stub that copies the signal handler arguments into the V3 registers and then calls user code. To install signals, user code just needs to fill out the right sigaction buffer, as any other system call. To return from signals properly, I studied assembly examples I found online. The runtime doesn't use signals for anything other than catching fatal errors (DivideByZero and NullCheck), so it just prints a source-level stacktrace and then exits. But Wizard needs to recover from signals in order to do proper OOB handling of user programs, so it actually does the proper sigreturn dance, but Wizard only does the fancy stuff on 64-bit.
In my universe, only three things exist: Virgil, wasm, and machine code. I have no need of other languages except as means to test those others :)
Virgil runs on the JVM and on Wasm too, and those require slightly different ways of getting off the ground.
> Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters
It can be implemented as a small function, which first appeared in 4BSD. It's available in Linux.
$ man syscall
(Unfortunately, this function lives in glibc. Obviously, though, it doesn't have to. All I'm saying is that this, or a similar function, can be a linkable symbol in some small compiled object file, rather than an inline primitive that has to live in the compiler.)
You're of course correct about all this. I believe the glibc thing has created mainly cultural problems. People don't look at Linux as a separate thing.
If I look up Linux system calls on Wikipedia I get diagrams showing glibc wrapping the Linux system call interface because that's what you're supposed to be using. If I look at Linux man pages what I really get is glibc man pages with the actual system calls being almost an afterthought. Glibc wrappers actually do a ton of stuff like add cancellation mechanisms. Glibc also drops support for system calls that break their threading model such as clone.
It's the same problem with systemd. I look up Linux init system man pages and get systemd stuff instead. I expected to see kernel APIs useful for writing my own.
What you described brings back distant memories of Windows. Since switching to macOS a few years ago, none of that behavior is a thing any more. Apps 'just work' and do so in a predictable way. There is no (noticeable) attempt from app makers to bundle bloat. This alone makes macOS much more comfortable to work with. And as a bonus, there is an actual usable shell.
Have you tried installing a gaming mouse driver on macOS? Same deal.
When I used it, macOS also had the issue that some software didn't even have uninstallers. So you have to drag the app to the trash bin, then go and manually remove the various system software that it installed. Or run a script off of the internet. Software management on macOS is no better than on Windows in my opinion, and in many cases much worse.
The good news is: almost definitely! It's really hard to protect web apps from reverse engineering, because a lot of decisions were made ages ago to build them out of human-readable files, so almost any Electron app can be broken open and tinkered with. There are exceptions, though. VS Code, for instance, is an Electron app, but basically only for its UI at this point; large swathes of the app's interesting functionality are written in cross-compiled C++. But for an Electron app that's just tweaking settings in a driver? Probably. Ultimately the Electron code cannot be very tightly integrated, since that's not how drivers work, but then it becomes an arms race of weird protocols to force you into using their app instead of an alternative, and we're back to square one, only now the device has a weird, opaque, and possibly crypto-signed API for controlling it instead of just bit-banging values into memory.
> every software maker tries its best to force users install their logo on their screen, so that if yesterday one could use 100 services just by saving 100 bookmarks on a browser, now they need 100 apps,
The other side of this is Android makes it really simple for users to drop a shortcut to your website on the home screen. Apple, to encourage people into their cash cow app store, does not. I was all set to get a company going with mobile web as their primary UI, but it was too hard to get users to put the home screen shortcut down on iOS, so we ended up having to pay to have apps built and now have triple the complexity.
> The question is if a giant ball of Electron bloat could be dissected in a similar fashion
If you could assume Node and a modern browser already existed on the target machine, you could ship just a node package and a minimal installer. But you’d need to deal with potential version and browser incompatibilities.
> The question is if a giant ball of Electron bloat could be dissected in a similar fashion in order to extract the important stuff and throw away the rest. I hope so.
Perhaps OSes could be smarter. After installing an app, it could monitor the "hot" paths in the code, and only load those instructions the next time the app is loaded. Also resources that are never used could be not loaded into memory. Etc. I know, it would be difficult to build this as you're basically instrumenting an app or driver and then rewriting it, but I guess it would be a great innovative feature in a time when OS research seems stagnant.
What you're describing is some combination of stuff language runtimes and linkers do (shared libraries, runtime loading, JITting) and demand paging.
It may be that one could optimize for the case where a bunch of applications ship statically compiled but use the same underlying libraries. In this case, some agent on the system could analyze the code segments of these binaries and on demand construct shared libraries that strip the shared portion from the binaries. Subsequent invocations would load the constructed shared libraries for the redundant sections.
Still, this probably wouldn't help much and would lead to its own issues. One of the problems with these flabby things is just how the runtimes are themselves constructed. You still have per process data structures you'd need to populate and they probably have fat data structures that are not very space efficient, and so on. The size of the instructions is probably not significant relatively speaking.
One trick could be to run the program in a "lazy" way. E.g. don't run a statement like "a=b+c", but evaluate it only when a is needed. This would require a complete and automatic rewrite at the assembly level, but you wouldn't be doing anything that you don't need. Then from this you could determine the "hot" paths, and optimize those for speed (translate back into non-lazy form).
I can’t find a reference now, but I believe Windows did (does?) have a page fault tracer that would preload from disk the pages that are observed to be needed. I can’t remember if this was just at boot time or for app launches too.
It loads the required pages into memory quickly to improve application launch times, and then the usual memory paging handles keeping the required paths in memory.
Google says Windows\Prefetch is the directory for the cache. You could look there and see if there are any references to applications rather than just Windows boot stuff.