Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Sparse vs. Dense Arrays in JavaScript (dmitripavlutin.com)
42 points by kiyanwang on Nov 1, 2021 | hide | past | favorite | 21 comments


I presume "sparseness" comes from the mathematical concept[1] but the definitions I've always seen have been:

- Sparse: Most of the data is "missing"

- Dense: Most of the data is not "missing"

The key words being "most" rather than "any" or "none."

At any rate, I suppose it depends on the language and how you view data, but to my mind that's just how arrays work.

1: https://en.wikipedia.org/wiki/Sparse_matrix


In the context of JavaScript it means some numeric keys are missing (between 0 and the highest numeric key).

This is an issue because JavaScript Arrays are not like arrays in other languages - they are dictionaries where the keys happen to be numeric. So it is possible for a numeric key not to be defined, which is different from it being defined but having "undefined" as a value.


Looks like all arrays are sparse, with the “dense” being a private case where there are no holes. And the functions behave as one would expect, ignoring the holes where it makes sense.


no, this blog post is just wrong

sparse arrays are usually called associative arrays, or maps


Sparse arrays are a subset of associative arrays or maps and could have additional optimizations.


> A problem of the sparse arrays is that many array built-in methods just skip the holes in a sparse array.

That sure seems like a feature, not a bug. I can remove entries from an array, and still iterate the array without having to recreate it - Sure, being aware that .length is incorrect is these cases is good. Likewise, it is good to know that if the nulls are meaningful, you need to use another technique. But that is all the more reason to appreciate that forEach, etc. skips the nulls, while a for loop on .length will process them. Choose whichever is appropriate.


Those don't skip null or undefined though, they only skip missing keys in the array. For example:

  [1,,3].length
  // 3

  [1,,3].forEach(v => console.log(v))
  // 1
  // 3
  
  [1,undefined,null,3].forEach(v => console.log(v))
  // 1
  // undefined
  // null
  // 3


Interesting! As a quick test on Chrome ("Version 95.0.4638.69 (Official Build) (x86_64)")

  t0 = new Date();
  a = Array(100000000).fill(0);
  console.log("time elapsed: ", new Date() - t0); // 22912
  a[100000] = 1;
  a[10000000] = 1;
  a.forEach(e => e == 1 && console.log(e));
  console.log("time elapsed: ", new Date() - t0); // 28371
  
  t1 = new Date();
  a = Array(100000000);
  a[100000] = 1;
  a[10000000] = 1;
  a.forEach(e => e == 1 && console.log(e));
  console.log("time elapsed: ", new Date() - t1); // 2787
Upon doing a couple not-so-scientific runs of this, it looks like the latter isn't really faster than the former.


This is a misunderstanding of how Arrays work. When you have an object with exactly two keys, it is very fast to iterate through them.

So what this code demonstrates is that if you have to look up a billion keys, that is going to be slower than if you have to look up two keys for an object.

Additional reading for this: https://v8.dev/blog/fast-properties From this article:

const sparseArray = []; sparseArray[9999] = 'foo'; // Creates an array with dictionary elements. In this example, allocating a full array with 10k entries would be rather wasteful. What happens instead is that V8 creates a dictionary where we store a key-value-descriptor triplets. The key in this case would be '9999' and the value 'foo' and the default descriptor is used. Given that we don't have a way to store descriptor details on the HiddenClass, V8 resorts to slow elements whenever you define an indexed properties with a custom descriptor:


If the array is implemented internally as a map, then the forEach method on t1 already knows it has only two items to process.

You can show that as follows:

  t0 = new Date();
  a = Array(100000000).fill(0);
  a[100000] = 1;
  a[10000000] = 1;
  counter = 0;
  a.forEach(e => e == 1 ? console.log(e) : counter++);
  console.log("elapsed: ", new Date() - t0); // 19415
  console.log("visited but not logged: ", counter); // 99999998

  t1 = new Date();
  a = Array(100000000);
  a[100000] = 1;
  a[10000000] = 1;
  counter = 0;
  a.forEach(e => e == 1 ? console.log(e) : counter++);
  console.log("elapsed: ", new Date() - t1); // 1936
  console.log("visited but not logged: ", counter); // 0


As an interesting fun fact many languages, including JavaScript, inherit complex type memory organization from C lang. So the same fundamental performance implications work the same way in all these related languages, because it is more about what happens on the metal than at the syntax.

Please note this observation only applies to access during execution.

During the late 70s a developer named Paul Heckel discovered that hash maps (what JavaScript objects are) were randomly accessed in memory until the specified key was found. That random access was faster than the access of a specified index from an array, because array indexes are accessed sequentially. Because arrays are ordered sequentially, even in memory, they are substantially faster to iterate over, though.

http://documents.scribd.com/docs/10ro9oowpo1h81pgh1as.pdf

Interesting (and unexpected): I tried searching for "Paul Heckel" in Google and only found stuff about a jazz musician. Then I searched for "Paul Heckel's algorithm" and my writing about implementing the algoritm in JavaScript was one of the first Google results.


> it looks like the latter isn't really

Sorry, I don't quite understand: running through a sparse array - running through the "undefined" - is much faster than running through a dense array. What have I missed?


they are trying to show that looping a dense array is not much slower than looping the sparse, I just did a test on FF and is 78% slower to loop the dense vs the sparse

https://jsbench.me/ukkvgj16gg/1


> Invoking Array(length) or new Array(length) (with a number argument) creates a fully sparse array:

In my experience the common javascript engines are so accelerated that in practice, before anything happens to the array it is uninitialised and the engine uses the length parameter as a future hint. If the array is promptly filled contiguously it will perform as a dense array. The types which the array is filled with and subsequently maintains can have a greater impact on performance. An array that only contains integers performs faster than one containing doubles or strings, and once types are changed or mixed in the array it slows down, generally by a factor of 2 or 3.


Can you provide a source on that? Coz the v8 blog has like a monthly dedicated warning post against the dreaded "Holey" array and how it's the archnemesis of all performance optimizations. Ok exaggeration aside, it's something along the lines of "A holey array will never _ever_ become packed...".


Just many hours of time micro-optimising quirky projects like [1]. I could write a test specifically to demonstrate it... and write a little article... but just commenting here that in my humdrum experience - you might want to look into it yourself. I have tested and never seen any benefit in eg. initializing an array as [0,0,0,0,0,0,0,0,0,0,0,0,0,0,...] compared to Array(n) for (i=0;i<n; i++) A[i]=0 ... Ymmv. Takes moments to check with a benchmarking package of choice. I recommend avoiding testing it with any functional kind of expressions though, because they can be much slower than the old imperative style and obscure the picture.

...also, from a reasoning approach - why would a competitively accelerated language engine create an array full of holes as soon as one is declared? Why would the engine not try to optimally structure the type as it is used? This is something Chromes engines and Firefoxes appear to have done, quite impressively since many years back.

[1] https://github.com/strainer/fencache.js


Does that mean this:

    Array(n).fill()
is a permanent deoptimization? And if so, is the new array returned by:

    Array(n).fill().map(x => x)
better optimized (after the, presumably, suboptimal iteration)?


You should never use sparse arrays though.

Over 7 years as Node.js developer I have never needed to use them and I don't see use appealing use case.


Tried to think through situations where I ran into this and felt the same. Usually if I did this I saw it as a poor mutation choice and would revisit a better approach. Just asking for issues IMO.


Reminded me of my old article explaining the "forEach" polyfill that takes care of a sparse arrays: https://medium.com/@sAbakumoff/a-deeper-dive-into-javascript...


An array with holes is still dense.

An array becomes sparse when the datastructure doesn't try to manage things linearly, *not* when there's a placeholder "nothing" value stored in it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: