I think it's all about branding, the same disgrace that plagues mobile operating systems: every software maker tries its best to force users to install its logo on their screen, so that where yesterday one could use 100 services just by saving 100 bookmarks in a browser, now one needs 100 apps, with all the implications regarding waste of space and resources, security, etc.
Although perfectly legit, it's a practice that cannot scale.
Driver installers on Windows haven't been exempt from this nonsense for years either: every driver started coming with its companion app, very often a huge VB "thing" that did nothing important except show its shiny icon on the desktop. Then, years later, they started phoning home to profile users, offering purchases for consumables (as printer drivers do), etc.
In the old days, however, one had a way to control it: the .exe was often a self-extracting .zip, so it could be fed to 7-Zip and the like to extract the archive without executing it. Then one would throw away the unnecessary bloat, keep only the relevant .inf file and a few DLLs or other small files, point to that location when searching for drivers, and voilà: driver installed without the cruft.
The question is whether a giant ball of Electron bloat could be dissected in a similar fashion in order to extract the important stuff and throw away the rest. I hope so.
The cycle of crap on Linux is different; less corporate and underhanded, but more of an ever-expanding bloat. Install Clang/LLVM. It's like 300MB. I remember when almost every system had a fairly small C compiler. It had to be small, because it was the basis of everything else. Now the base is enormous. OCaml is something like 200MB. And of course, it has its own package manager. So does Python, and Ruby, and all these other things that are supposed to be the base of so much other software.
Take another example. Install LaTeX. It's something like 5GB. It's huge because it bundles enormous numbers of packages.
It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?
Take another example. Node. Building it from scratch takes a pretty beefy machine and a lot of time. (It takes over 20 minutes on my 6-core workstation with 32GB RAM.) Most of that is building V8. When I worked on V8, we periodically spent some time trying to get build times under control, but the needle barely moved before the growth got going again. We spent months and months of effort, over years, splitting V8 into more source files and more directories and enforcing header discipline, but all of it only made build times worse. Despite how cool V8 is, I feel embarrassed in retrospect that the build system is so bonkers.
Linux is like this everywhere. Monstrous and labyrinthine. It really is impossible to understand it all now.
Linux has nothing to do with Node, though. Most of the Linux world (and the Windows world, and the OSX world, etc) wishes it would just go away forever. Javascript is unadulterated pain.
As far as Linux package sizes go... why are you installing so many packages that you don't need, just to complain about it?
On Debian and Ubuntu, `dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n` will tell you the install size of things sorted by worst offender; for me, on one of my development machines, the worst is git, followed by Perl packages required by the system, neovim, and then a bunch of normal things expected on any install. `df -h` minus `/home` is a hair over 600MB.
git, being the largest thing, has an install size of 38 megs. Indeed, I cannot tell you why git is 38 megs; there may or may not be bloat there.
As a comparison, Windows uses around 6GB of space, and an MSVC install with a common set of toolchains may take up to 20GB and... arguably does less than my <1GB Linux install (when it comes to dev work, anyway).
Your dev Linux machine is a gigabyte? That's kind of surprising. Just llvm+clang on my machine, according to your query, clocks in at over a gigabyte. Are you only developing in Perl and not doing any from-source builds?
While I share some of your resentment (especially as a Gentoo user who builds Chromium quite regularly), a few extra gigabytes of storage or a few more config files I don't grok are relatively easy to ignore compared to the kind of dark patterns I see happening with 3rd party software on Windows desktops.
And Microsoft itself is increasingly willing to sink to that level as well.
Isn't this like a first-world computer problem? How many gigabytes does the current Visual Studio take? Not talking about VS Code, because it's only an IDE, which still requires the actual Visual C++ as the compiler.
That's if you install the full version that has every single package for everything, all with their own manuals etc. At least on Debian-based distros there are options to just get the ones relevant to you.
Maybe, but it feels unfair to bring software bloat into a comparison between Linux and Windows. I don't think Linux could ever come close to the degree of software bloat in Windows. Your Clang install is 300MB? Sure. My Visual Studio install is 4.5GB, and it includes at least two versions of Clang (the DirectX shader compiler is a fork of an old version of Clang). You think dependencies are a problem? You should be happy there's only one copy of each library on your computer (two if you have a multilib system). I counted more than 25 copies of the C runtime on one Windows computer. You think Linux is monstrous and labyrinthine? Have you seen the scope of the Windows API? There's no comparison.
> It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?
I dunno man, but software can get complicated, and a lot of these packages probably have features that people use. Why quibble over hundreds of megabytes when popular Windows packages like Office and Creative Cloud are multiple gigabytes? It all seems very unfair that Linux is subject to this level of scrutiny when other systems have worse bloat by literal orders of magnitude.
Modern Linux user space is monstrous and labyrinthine. Linux itself is not. I found it to be a really clean system.
The basis of everything is actually the Linux system call binary interface. We can actually trash the entire user space and start from scratch with nothing but Linux. We can even trash the ubiquitous GNU stuff if we want.
Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters. No need for libc bullshit in the middle. No need for C or any specific language. Someone could make a language today and that single feature would make it as capable as C is for systems programming. They could write software and boot Linux directly into it.
> Why can't we have a compiler with built in system call support?
Funny you should ask that. That is exactly how Virgil's compiler supports the Linux (and Darwin) kernels. Other than generating a small amount of startup assembly (10-20 instructions), the compiler just knows the ELF (and Mach-O) binary formats and the calling conventions of the respective kernels. With some unsafe escape hatches (e.g. getting a pointer into the middle of a byte array), the rest is regular Virgil code that calls the kernel directly.
Take a look, I've been working on this for more than 10 years:
The "Linux.syscall" is a special operator known to the compiler; it lets you pass an int (the syscall number) and whatever arguments you want (any types; it is implemented with flattening and polymorphic specialization) to the kernel.
With this I have implemented all kinds of stuff, including the userspace runtime system and even a JIT compiler (for my new Wasm engine).
Thanks for this, it's extremely awesome! Really happy to see others have gone so much farther than I ever did.
I started looking into this myself some years ago. Even started developing a liblinux with process startup code and everything. Abandoned it after I found the kernel itself had an awesome nolibc.h file that was much more practical for my C programming needs:
It's amazing how this really lets you do everything... Want a JIT compiler? Map some executable pages and emit some code. You can statically allocate memory at process startup and use that for bootstrapping code. This lets you implement dynamic memory allocation and even garbage collection in your own language.
Targeting POSIX standard functions like open by going through the Linux syscall table looks like just making work for yourself when porting this to other systems.
Some syscalls don't correspond to standard library functions. As an example, if you want to bind to opendir/readdir/closedir, you have to write those yourself in terms of the Linux-specific __NR_getdents64 system call.
Is your LinuxConst.SYS_open actually __NR_open? That's supposedly obsolete. glibc uses __NR_openat for open(). __NR_open is listed in the asm/unistd.h header in a section under the heading "All syscalls below here should go away really ..."
How about signal handling; are you dealing with sigreturn and all that?
You can get a small executable footprint (in terms of not requiring a dynamic C library) by maintaining all this yourself, though.
Oh, I know it's work, but I am not going to assume POSIX, as that's implemented in userspace with C code. In my universe, C code doesn't exist (except I use a little in some testing utilities in order to get going on a new platform). I never ported to Windows, but doing so would be as simple as teaching the compiler the Windows kernel calling convention, adding that little process entry code, and then writing an implementation of System using Windows calls. Oh, yeah, and generating COFF :)
Virgil has its own calling convention internally (though this is basically System V on x86-64). That only matters when getting into V3 code or out, e.g. process entry, calling the kernel, and signal entry. For signals, the compiler generates a tiny stub that copies the signal handler arguments into the V3 registers and then calls user code. To install signals, user code just needs to fill out the right sigaction buffer, as any other system call. To return from signals properly, I studied assembly examples I found online. The runtime doesn't use signals for anything other than catching fatal errors (DivideByZero and NullCheck), so it just prints a source-level stacktrace and then exits. But Wizard needs to recover from signals in order to do proper OOB handling of user programs, so it actually does the proper sigreturn dance, but Wizard only does the fancy stuff on 64-bit.
In my universe, only three things exist: Virgil, wasm, and machine code. I have no need of other languages except as means to test those others :)
Virgil runs on the JVM and on Wasm too, and those require slightly different ways of getting off the ground.
> Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters
It can be implemented as a small function, which first appeared in 4BSD. It's available in Linux.
$ man syscall
(Unfortunately, this function lives in glibc. Obviously, though, it doesn't have to. All I'm saying is that this, or a similar function, can be a linkable symbol in some small compiled object file, rather than an inline primitive that has to live in the compiler.)
You're of course correct about all this. I believe the glibc thing has created mainly cultural problems. People don't look at Linux as a separate thing.
If I look up Linux system calls on Wikipedia I get diagrams showing glibc wrapping the Linux system call interface because that's what you're supposed to be using. If I look at Linux man pages what I really get is glibc man pages with the actual system calls being almost an afterthought. Glibc wrappers actually do a ton of stuff like add cancellation mechanisms. Glibc also drops support for system calls that break their threading model such as clone.
It's the same problem with systemd. I look up Linux init system man pages and get systemd stuff instead. I expected to see kernel APIs useful for writing my own.
What you described brings back distant memories of Windows. Since switching to macOS a few years ago, none of that behavior is a thing any more. Apps 'just work' and do so in a predictable way. There is no (noticeable) attempt from app makers to bundle bloat. This alone makes macOS much more comfortable to work with. And as a bonus, there is an actual usable shell.
Have you tried installing a gaming mouse driver on macOS? Same deal.
When I used it, macOS also had the issue that some software didn't even have uninstallers. So you have to drag the app to the trash bin, then go and manually remove the various system software that it installed. Or run a script off of the internet. Software management on macOS is no better than on Windows in my opinion, and in many cases much worse.
The good news is: almost definitely! It's really hard to protect web apps from reverse engineering, because a lot of decisions were made ages ago to build them out of human-readable files, so almost any Electron app can be broken open and tinkered with. There are exceptions, though. VS Code, for instance, is an Electron app, but basically only for its UI at this point; large swathes of the app's interesting functionality are written in cross-compiled C++. But for an Electron app that's just tweaking settings in a driver? Probably. Ultimately the Electron code cannot be very tightly integrated, since that's not how drivers work, but then it becomes an arms race of weird protocols to force you into using their app instead of an alternative, and we're back to square one, only now the device has a weird, opaque, and possibly crypto-signed API for controlling it instead of just bit-banging values into memory.
> every software maker tries its best to force users install their logo on their screen, so that if yesterday one could use 100 services just by saving 100 bookmarks on a browser, now they need 100 apps,
The other side of this is Android makes it really simple for users to drop a shortcut to your website on the home screen. Apple, to encourage people into their cash cow app store, does not. I was all set to get a company going with mobile web as their primary UI, but it was too hard to get users to put the home screen shortcut down on iOS, so we ended up having to pay to have apps built and now have triple the complexity.
> The question is if a giant ball of Electron bloat could be dissected in a similar fashion
If you could assume Node and a modern browser already existed on the target machine, you could ship just a node package and a minimal installer. But you’d need to deal with potential version and browser incompatibilities.
> The question is if a giant ball of Electron bloat could be dissected in a similar fashion in order to extract the important stuff and throw away the rest. I hope so.
Perhaps OSes could be smarter. After installing an app, it could monitor the "hot" paths in the code, and only load those instructions the next time the app is loaded. Also resources that are never used could be not loaded into memory. Etc. I know, it would be difficult to build this as you're basically instrumenting an app or driver and then rewriting it, but I guess it would be a great innovative feature in a time when OS research seems stagnant.
What you're describing is some combination of stuff language runtimes and linkers do (shared libraries, runtime loading, JITting) and demand paging.
It may be that one could optimize for the case where a bunch of applications ship statically compiled but use the same underlying libraries. In this case, some agent on the system could analyze the code segments of these binaries and on demand construct shared libraries that strip the shared portion from the binaries. Subsequent invocations would load the constructed shared libraries for the redundant sections.
Still, this probably wouldn't help much and would lead to its own issues. One of the problems with these flabby things is just how the runtimes are themselves constructed. You still have per process data structures you'd need to populate and they probably have fat data structures that are not very space efficient, and so on. The size of the instructions is probably not significant relatively speaking.
One trick could be to run the program in a "lazy" way. E.g. don't run a statement like "a=b+c", but evaluate it only when a is needed. This would require a complete and automatic rewrite at the assembly level, but you wouldn't be doing anything that you don't need. Then from this you could determine the "hot" paths, and optimize those for speed (translate back into non-lazy form).
I can’t find a reference now, but I believe Windows did (does?) have a page fault tracer that would preload from disk the pages that are observed to be needed. I can’t remember if this was just at boot time or for app launches too.
It loads the required pages into memory quickly to improve application launch times, and then the usual memory paging handles keeping the required paths in memory.
Google says Windows\Prefetch is the directory for the cache. You could look there and see if there are any references to applications rather than just Windows boot stuff.