Imo functional programming is one of those things that makes sense from a theoretical perspective but comes with compromises in practice.
The thing about functional programming is that the confidence you get from immutability comes at the cost of increased memory usage, thanks to data duplication. That's probably going to put a ceiling on the absolute performance you can reach.
There are just other ways to solve problems like NPEs and memory safety, such as ADTs and ownership rules, which don't come at the same cost as FP.
This is actually an area where there’s room to improve FP languages.
If you track ownership in functional languages, you can statically determine if a value is used more than once.
If it’s only used once, you can apply any updates to that value in place without allocating more memory.
This gives the performance benefits of mutability with the safety benefits of immutability, in some common cases.
The main trick is adjusting memory layouts accordingly. You can keep it simple by only applying this optimisation for functions of A -> A, or if you’re replacing it with a different type you can analyze the transformations applied to a value and pre-emptively expand the memory layout.
If a value is likely to be used only once, but might be used multiple times, you can also apply the same approach at runtime: reference count, and update in place when there's only a single reference (for functions of A -> A at least).
I believe the Roc folks are aiming to have aspects of this functionality, and I also believe there’s similar stuff becoming available in Haskell under the guise of linear types.
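Roughly what the Haskell side of that looks like (a minimal sketch, assuming GHC 9.0+ with the LinearTypes extension; note GHC doesn't currently use this information for in-place reuse, but the multiplicity annotation is exactly the static guarantee such an optimisation would need):

```haskell
{-# LANGUAGE LinearTypes #-}

-- The %1 arrow promises the argument is consumed exactly once, so a
-- compiler could, in principle, reuse its heap cell for the result
-- instead of allocating a fresh pair.
swapPair :: (a, b) %1 -> (b, a)
swapPair (x, y) = (y, x)

main :: IO ()
main = print (swapPair (1 :: Int, 'a'))  -- ('a',1)
```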
Finally, if you really need a shared mutable value, that can be achieved with mechanisms like the State type in Haskell.
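A minimal sketch of that (the State monad from Control.Monad.State, in the mtl package):

```haskell
import Control.Monad.State

-- A counter that reads like mutable state but is really just a value
-- threaded through pure code by the State monad.
tick :: State Int Int
tick = do
  n <- get        -- read the "mutable" value
  put (n + 1)     -- "write" it: really, pass a new value along
  pure n

main :: IO ()
main = print (runState (tick >> tick >> tick) 0)  -- prints (2,3)
```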
In short, the pieces are there to create a functional programming language that doesn’t introduce needless memory usage overhead, but I don’t think anyone has put all the pieces together in a convenient and accessible way yet.
I get that not everyone does, but a large part of why I use Clojure is because it makes a whole class of concurrent designs easier. In particular, sharing that immutable data across multiple threads.
As a simplified example: one thread modifies the data, and another thread writes a snapshot of the data.
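Sketched in Haskell terms, since that's the thread's lingua franca (a hypothetical toy; an IORef holding an immutable Map plays the role of a Clojure atom holding a persistent map):

```haskell
import Control.Concurrent (forkIO)
import Data.IORef
import qualified Data.Map.Strict as Map

main :: IO ()
main = do
  ref <- newIORef (Map.empty :: Map.Map Int String)
  -- writer thread: swaps in successive immutable versions
  _ <- forkIO $
    mapM_ (\i -> atomicModifyIORef' ref (\m -> (Map.insert i (show i) m, ())))
          [1 .. 1000]
  -- reader: grabs a snapshot (just a pointer read); whatever version it
  -- gets stays internally consistent no matter what the writer does next
  snapshot <- readIORef ref
  print (Map.size snapshot)
```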
The programs I write would pretty much never benefit from this optimization.
This article tries to push FP as a solution to NPE??? What the?! NPEs are a problem due to dodgy type systems... it's a type system problem, not a language paradigm one... i.e. it has no relation to whether a language is functional or not.
For example, Dart 2 is null-safe! No one in their right mind would claim Dart is a FP language. Even Java can be null-safe if you use some javac plugin like the Checker Framework.
Also, a language can totally be functional and yet suffer from NPE, like Clojure or Common Lisp, but I suppose the author may be forgiven here because they are talking only about "purely functional programming languages"... (they didn't mention "statically typed" though, but that's clearly implied in the content)...
I believe the author is inadvertently pushing for two things that are pretty much unrelated to FP, even if they are a requirement in most purely-functional languages:
* immutability
* strong, static type systems
I would mostly agree with both (except that local mutability is fine and desired, as anyone trying to implement a proper quicksort will tell you - also, see the Roc language[1], which is purely functional but uses local mutability), but I write my Java/Kotlin/Dart just like that and I wouldn't consider that code purely functional. I don't think whether the code is purely functional actually matters much at all compared to these 2 properties, which can be used in any language regardless of their main paradigm.
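On the quicksort point: here's a minimal sketch of local mutability behind a pure interface in Haskell (assuming the vector package; runST guarantees the mutation can't leak out):

```haskell
import Control.Monad (when)
import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Pure in, pure out; all the in-place swapping is confined to this function.
sortVec :: Ord a => V.Vector a -> V.Vector a
sortVec v = runST $ do
  mv <- V.thaw v                       -- private mutable copy
  let partition lo hi = do             -- Lomuto partition, in place
        pivot <- MV.read mv hi
        let loop i j
              | j >= hi   = do MV.swap mv i hi; pure i
              | otherwise = do
                  x <- MV.read mv j
                  if x < pivot
                    then do MV.swap mv i j; loop (i + 1) (j + 1)
                    else loop i (j + 1)
        loop lo lo
      go lo hi = when (lo < hi) $ do
        p <- partition lo hi
        go lo (p - 1)
        go (p + 1) hi
  go 0 (MV.length mv - 1)
  V.unsafeFreeze mv                    -- safe: mv is never touched again
```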
If the first thing you talk about is performance and not quality and maintainability, then you're already missing the point. Most software just isn't in some super high-perf environment - what matters is fewer bugs, easier maintainability, better communication with other engineers (through declarative code).
The code we work on in the 2020s is much, much more complex than code written 20 years ago. We need better primitives to help our weak and feeble brains deal with this complexity. FP (particularly pure FP) gives us that. It isn't a panacea, but it's a major step in the right direction.
I just disagree that performance isn't a concern in almost every context. There's a hierarchy of concerns, to be sure, and if you haven't written reliable code which solves the problem yet you shouldn't be worried about performance, but if your PL itself imposes a performance tax, that's something which has to be paid every time your program gets executed, by every user.
As programmers, our job is not to play with abstractions, it's to move electrons and make hardware do things. We can't afford to abstract away the complexity of the hardware completely. Indeed, the trend in PL popularity over the past 20 years has been to move back closer to the hardware, away from highly abstracted environments like scripting languages and the JVM.
But Python/Ruby (and JS? Not sure how far they are with optimising that, or what the comparison would be) are very slow compared to Haskell. So people are paying that price all the time without getting any of the benefits that Haskell (etc.) can offer alongside it. I agree with the GP; performance is really not very interesting for most projects, and most programmers/companies (Py/JS are the top dev languages by far, I think) agree, judging by their use of low-performance environments that make them productive. So productivity seems to win out.
For the sake of the environment and of hardware upgrade cycles, I think we should definitely make an effort, and we can see that improvements in compilers and PL theory do help when the goal is a practical programming language built on these techniques; Rust does this, while Haskell was treated as an academic language for a long time.
I think robustness/security should come first in the hierarchy of concerns anyway; that's where things are really breaking right now.
Python and JS are top programming languages for very specific reasons. Python because:
1. It has a low barrier to entry for non-programmers, which makes it suited to applications like data science and ML.
2. There is vast library support for math and science, which makes it basically the only choice for those domains.
But Python is basically used as an API for highly optimized C libraries, since any kind of hot loop in Python is an anti-pattern unless you own stock in energy companies or hate getting results quickly.
JS has its place because until very recently it had a monopoly on web front-end development, which is one of the largest programming domains.
So for both of those examples, it's the use-case determining the PL, not the programmer.
Why do you think that functional programming results in data duplication? I would think it's rather the opposite.
With strong immutability like in Haskell, you can share values even between threads and can avoid defensive copying. Two versions of the same immutable tree-like data structure can also share part of their representation in memory.
(Haskell has other problems causing increased memory usage, but not related to data duplication in my mind)
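A toy illustration of that sharing with Data.Map (from containers):

```haskell
import qualified Data.Map.Strict as Map

-- m2 is a "new version" of m1, but internally the two trees share almost
-- all of their nodes; only the O(log n) path to the changed key is rebuilt.
main :: IO ()
main = do
  let m1 = Map.fromList [(k, k * k) | k <- [1 .. 1000 :: Int]]
      m2 = Map.insert 42 0 m1
  print (Map.lookup 42 m1, Map.lookup 42 m2)  -- (Just 1764,Just 0)
```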
> Why do you think that functional programming results in data duplication? I would think it's rather the opposite.
I suspect it has something to do with the perceptions around always returning a new thing from a function rather than the mutated input to the function. For example, if you need to mutate the property of an object based on other inputs, the perception is you would clone that object, modify the property appropriately, and then return the cloned object.
So, ELI5: What do you do instead, where you return a modified object, but still have immutability? Or do you avoid the problem by not trying to do that?
But if you modify the leaf, don't you also have to modify the branch node that points to the leaf, so that it points to the new one? And every node from there to the root?
Yes, you (or rather, core library developers) need to pick a data structure appropriate to your access patterns: linked lists for iteration, search trees for concatenation, finger trees for random access, and so on. But you should be doing this anyway for clarity, even if you face no performance constraints.
Efficient functional programming often uses tree-like data structures. These can be immutable but still avoid duplication.
Consider if you "duplicate" a Data.Sequence Seq (finger tree) for modification. You're not actually copying the whole structure, you are creating a new root node and re-using as much as possible of the common structure.
The end result is that a bit more memory is used in the simplest case, but not due to duplication I think.
The benefit is that a thread can make a modified value at cheaper cost without affecting another thread that is still using the original value. I also think it's easier for the programmer to understand the code.
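A toy sketch of that with Data.Sequence:

```haskell
import qualified Data.Sequence as Seq

-- "Copying" s1 for modification only builds a new root and the short
-- path down to position 500; the bulk of the finger tree is shared.
main :: IO ()
main = do
  let s1 = Seq.fromList [0 .. 999 :: Int]
      s2 = Seq.update 500 (-1) s1
  print (Seq.index s1 500, Seq.index s2 500)  -- (500,-1)
```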
Won't this still result in a lot of fragmentation? I.e. won't you have disjoint allocations for those new branches of the tree? Sounds pretty cache-unfriendly.
In a strict language or with poorly optimized lazy code, yes. If you can get good stream fusion, not really. If your code fuses really well (common for lists, harder elsewhere) the whole thing will happen on the stack.
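For instance, a pipeline of this shape fuses well in GHC (when the rewrite rules fire, no intermediate list is ever allocated, so there's nothing to fragment the heap):

```haskell
-- sum, map, filter and [1..n] are all fusion-friendly; GHC can compile
-- this to a single accumulator loop with no intermediate lists.
sumDoubledEvens :: Int -> Int
sumDoubledEvens n = sum (map (* 2) (filter even [1 .. n]))

main :: IO ()
main = print (sumDoubledEvens 1000000)
```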
Sure, but let's assume that the program has more than one thread and that another thread could still be using the old value. In that case, an imperative program might be required to copy the whole structure or sleep until the existing users are done, which is often less efficient and is always more complicated.
If it's ok to support only a single concurrent user of the value, then a mutable structure is indeed more efficient. Even in Haskell we have mutable structures that can be used for such purposes.
The interesting question to me is, what should be the default? I think there is a good argument that it should be the safer, simpler immutable structures.
"Simpler". If you're only single-threaded, the mutable structures are simpler.
If you're multi-threaded, you have to choose: Do I go with immutable, which means that the other thread gets something safe? It also means that the other thread gets something stale (out of date), which can have its own consequences.
Or do I use a lock or mutex, which means that I have to place guards in all the right places, which is error-prone and can break in weird ways if I miss one?
Or do I use mutable data without guards, and just let it crash? (Because it's probably going to crash if I do that...)
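For what it's worth, Haskell's STM is one answer to the lock-placement problem: the guard lives in the types, so you can't forget it (a minimal sketch using the stm package):

```haskell
import Control.Concurrent.STM

-- Both writes happen atomically or not at all; there's no lock to forget,
-- because TVars can only be touched inside an STM transaction.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 10)
  readTVarIO b >>= print  -- 10
```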
> Imo functional programming is one of those things that makes sense from a theoretical perspective but comes with compromises in practice.
Ironically, what you say is true in theory, but not in the real world.
My problem with functional programming is refactoring. If you don't have side effects, then when something deep in one branch of the call tree needs to touch data that lives in a completely different branch, you have to extract that data all the way up to the common parent and pass it through all the intermediates just so that one function deep down can access it.
It's incredibly frustrating when you work in a functional language, and yet it's the main benefit (no side effects).
I'd like to have a language that is imperative when written and functional when read :)
> I'd like to have a language that is imperative when written and functional when read :)
Depending on the language, monads give you exactly that bridge between imperative and functional.
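For instance, Reader tackles exactly the "pass it through all the intermediates" pain (a minimal sketch assuming mtl; Config and the function names are made up for illustration):

```haskell
import Control.Monad.Reader

newtype Config = Config { verbose :: Bool }

-- The deep function asks for the shared environment directly...
deepFunction :: Reader Config String
deepFunction = do
  v <- asks verbose
  pure (if v then "chatty" else "quiet")

-- ...so the intermediate layers never have to mention Config at all.
intermediate :: Reader Config String
intermediate = deepFunction

main :: IO ()
main = putStrLn (runReader intermediate (Config True))
```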
In your example, you can always choose to have side effects in that deep call.
Personally, I like a type-driven approach. Then I don't care so much where functions are (they'll be grouped logically, but could be anywhere), as long as the type in and the type out match.
I've handled this (in Clojure) either by passing a context map through the entire stack or binding some context atom at the top and using it lower down the stack. The binding is less obvious at first glance, so I prefer to pass the context, but both make testing quite easy and reduce the need for heavy refactoring.
I like that aspect because it gives you a hint that your call tree is not how it should be.
Lifting out side-effects and passing the data back in is one approach, but not the only one.
Turning a deep call tree into a flat pipeline is a popular alternative and often leads to less complexity and better composability.
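A tiny sketch of the shape change:

```haskell
-- Instead of f calling g calling h (a deep tree), express the steps as
-- one flat pipeline over the data; each stage is independently testable.
report :: [Int] -> String
report = unlines . map show . take 10 . filter even

main :: IO ()
main = putStr (report [1 .. 100])
```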
I like it when I'm done. I very much dislike it when I'm not sure yet what my code should do so it changes constantly. And that's most of the time in gamedev for example.
Immutability does not even have to be bound to functional programming. One can use persistent (= immutable from the user's perspective) data structures, like immutable.js, and get most of the benefits.
Moreover, persistent data structures can be optimized well (see Clojure), so that the performance issues may be relegated to the inner loops, as usual, and the rest of the program may use safe-for-sharing data structures.
To some degree. But it really is the case that a persistent functional data structure is going to have a slower insert operation than a traditional mutable set. There's no getting around that.
The asymptotic behavior is not the entire story. Because persistent data structures almost necessarily need to have their data allocated non-contiguously, they have terrible cache performance and prefetching/speculation behavior in comparison to data structures that take advantage of contiguous memory locations.
The duplication problem is solved by immutable data structures with structural sharing. They are part of Clojure's standard library and exist in some other languages.
In my experience, if you're not dogmatic/extremist about it, you can gain a lot, so I try to do things in a functional way in my non-functional languages.
Yeah I totally agree. I think writing code which is "functional where possible" is super powerful and offers a ton of advantages. Trying to cover that last 20% of cases where you have to bend over backwards to create a performant functional solution, to solve a problem which is trivial with a little bit of mutability, doesn't make sense.
The other problem with functional programming is it's harder to look at code and figure out the time complexity. At least with non-functional languages I can easily figure out why there are performance problems with it. Stuff like lazy evaluation may seem cool, but it's not when you have to figure out why something's slow.
That's a Haskell issue, not a functional programming issue. Haskell is pretty much the only functional programming language that is lazy by default - it's the sole one, for the reason you give. Every other one is eager by default, and you can opt in to lazy evaluation when needed.
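And in Haskell you can go the other way, opting in to strictness where laziness makes costs hard to predict (a minimal sketch):

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang forces the accumulator at each step, so the loop runs in
-- constant space instead of building a chain of thunks.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1 .. 1000000])
```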
In addition to sibling's recommendations, consider OCaml too for a strict language that is vaguely similar to Haskell and performs similarly well in benchmarks.
This is lazy vs strict, not functional vs whatever. I'm a fan of functional programming, not so much a fan of lazy evaluation. Look at PureScript or Idris perhaps?
Data duplication is coming to all mainstream languages these days, whether as a community best practice or as a requirement of some libraries (e.g. streams in Java).
I'm not saying that in-place operations don't have their uses, but to me these use-cases look more and more like niche cases, as opposed to being the default choice as they were in the past. That is well reflected in new languages like Rust, where a new name is immutable by default and must be explicitly flagged as mutable in order to vary.
It means that the benefits of immutability, and the performance tradeoffs people are willing to make, are evolving. I would assume larger codebases and faster hardware mean performance is less valuable and clarity comes at a premium.
I do agree with you that NPEs and memory safety are unrelated problems, though.
Data duplication is a tool which can be used to avoid large classes of bugs, but there's no question it comes at a cost to performance.
If anything, I would say explicit mutability allows us to decrease data duplication. By being explicit about what's mutable and what's not we don't have to resort to holding "safe copies" to ensure shared memory is not being mutated unexpectedly.