In the context of JavaScript it means some numeric keys are missing (between 0 and the highest numeric key).
This is an issue because JavaScript Arrays are not like arrays in other languages - they are dictionaries where the keys happen to be numeric. So it is possible for a numeric key not to be defined, which is different from it being defined but having "undefined" as a value.
Looks like all arrays are sparse, with the “dense” being a private case where there are no holes. And the functions behave as one would expect, ignoring the holes where it makes sense.
> A problem of the sparse arrays is that many array built-in methods just skip the holes in a sparse array.
That sure seems like a feature, not a bug. I can remove entries from an array, and still iterate the array without having to recreate it - Sure, being aware that .length is incorrect is these cases is good. Likewise, it is good to know that if the nulls are meaningful, you need to use another technique. But that is all the more reason to appreciate that forEach, etc. skips the nulls, while a for loop on .length will process them. Choose whichever is appropriate.
This is a misunderstanding of how Arrays work.
When you have an object with exactly two keys, it is very fast to iterate through them.
So what this code demonstrates is that if you have to look up a billion keys, that is going to be slower than if you have to look up two keys for an object.
const sparseArray = [];
sparseArray[9999] = 'foo'; // Creates an array with dictionary elements.
In this example, allocating a full array with 10k entries would be rather wasteful. What happens instead is that V8 creates a dictionary where we store a key-value-descriptor triplets. The key in this case would be '9999' and the value 'foo' and the default descriptor is used. Given that we don't have a way to store descriptor details on the HiddenClass, V8 resorts to slow elements whenever you define an indexed properties with a custom descriptor:
As an interesting fun fact many languages, including JavaScript, inherit complex type memory organization from C lang. So the same fundamental performance implications work the same way in all these related languages, because it is more about what happens on the metal than at the syntax.
Please note this observation only applies to access during execution.
During the late 70s a developer named Paul Heckel discovered that hash maps (what JavaScript objects are) were randomly accessed in memory until the specified key was found. That random access was faster than the access of a specified index from an array, because array indexes are accessed sequentially. Because arrays are ordered sequentially, even in memory, they are substantially faster to iterate over, though.
Interesting (and unexpected): I tried searching for "Paul Heckel" in Google and only found stuff about a jazz musician. Then I searched for "Paul Heckel's algorithm" and my writing about implementing the algoritm in JavaScript was one of the first Google results.
Sorry, I don't quite understand: running through a sparse array - running through the "undefined" - is much faster than running through a dense array. What have I missed?
they are trying to show that looping a dense array is not much slower than looping the sparse, I just did a test on FF and is 78% slower to loop the dense vs the sparse
> Invoking Array(length) or new Array(length) (with a number argument) creates a fully sparse array:
In my experience the common javascript engines are so accelerated that in practice, before anything happens to the array it is uninitialised and the engine uses the length parameter as a future hint. If the array is promptly filled contiguously it will perform as a dense array.
The types which the array is filled with and subsequently maintains can have a greater impact on performance. An array that only contains integers performs faster than one containing doubles or strings, and once types are changed or mixed in the array it slows down, generally by a factor of 2 or 3.
Can you provide a source on that? Coz the v8 blog has like a monthly dedicated warning post against the dreaded "Holey" array and how it's the archnemesis of all performance optimizations. Ok exaggeration aside, it's something along the lines of "A holey array will never _ever_ become packed...".
Just many hours of time micro-optimising quirky projects like [1]. I could write a test specifically to demonstrate it... and write a little article... but just commenting here that in my humdrum experience - you might want to look into it yourself. I have tested and never seen any benefit in eg. initializing an array as [0,0,0,0,0,0,0,0,0,0,0,0,0,0,...] compared to Array(n) for (i=0;i<n; i++) A[i]=0 ... Ymmv. Takes moments to check with a benchmarking package of choice. I recommend avoiding testing it with any functional kind of expressions though, because they can be much slower than the old imperative style and obscure the picture.
...also, from a reasoning approach - why would a competitively accelerated language engine create an array full of holes as soon as one is declared? Why would the engine not try to optimally structure the type as it is used? This is something Chromes engines and Firefoxes appear to have done, quite impressively since many years back.
Tried to think through situations where I ran into this and felt the same. Usually if I did this I saw it as a poor mutation choice and would revisit a better approach. Just asking for issues IMO.
- Sparse: Most of the data is "missing"
- Dense: Most of the data is not "missing"
The key words being "most" rather than "any" or "none."
At any rate, I suppose it depends on the language and how you view data, but to my mind that's just how arrays work.
1: https://en.wikipedia.org/wiki/Sparse_matrix