I was about to say "the example in the post is sorting so it's n * log(n)".
Upon thinking about it, the example is comparing the relative speed of hashing every item (n items × O(1) each == O(n)) and sorting (O(n log n)).
That means the ratio of the two is a log(n) factor: n log(n) / n == log(n).
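To make that concrete, here's a sketch of the two approaches being compared — this assumes the post's example is de-duplicating a list of items, which is a guess on my part (the function names are mine, not the post's):

```python
# Hypothetical stand-in for the post's example: de-duplicating a list.
import random

items = [random.randrange(1000) for _ in range(10_000)]

# Hash-based: one O(1) hash/insert per item -> O(n) overall.
def dedupe_hash(xs):
    return list(set(xs))

# Sort-based: O(n log n) sort, then a linear "unique" pass.
def dedupe_sort(xs):
    out = []
    for x in sorted(xs):
        if not out or out[-1] != x:
            out.append(x)
    return out

# Both produce the same set of distinct items.
assert sorted(dedupe_hash(items)) == dedupe_sort(items)
```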
good point.
As a side point, I think the main point of the post is that the time taken for the hash solution increases at a greater rate than the sort + unique solution. Now, that doesn't make sense with our big O notation.
This is because we don't have a simple hashmap; what we have is a dynamically resizing hashmap with no preset capacity, and the resize operations are O(n). Now, how often can we expect to resize? I think most hashmaps double their capacity, so there will be log(n) resize operations that each take O(m) (where m is the size of the hashmap at the time it needs to be resized).
Now at this point, I can't be bothered to think further about it. That feels like it should be less than O(n * log(n)), but it's kinda close. Either way, it's definitely larger than the simpler O(n) case we're thinking about.
Dynamically resizing a hash table by doubling its size when it reaches a threshold capacity is amortised O(1) per insert.
If there are deletions, both operations are amortised O(1) provided the implementer has been sensible and set the shrink threshold at a lower load factor than the growth threshold.
It's the same sort of pattern as dynamically resizing a growing string. As long as you resize by a constant factor > 1, rather than by a constant addition, the number of resizes is low enough that the O(n) cost of copying is amortised over the preceding appends.
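You can count the copies for a toy growable array to see this — a minimal sketch (not a real list implementation), comparing doubling growth against fixed-increment growth:

```python
# Count element copies for a growing array under two resize policies.
def copies_when_growing(n, grow):
    cap, size, copies = 1, 0, 0
    for _ in range(n):
        if size == cap:
            copies += size      # a resize copies every existing element
            cap = grow(cap)
        size += 1
    return copies

n = 1_000_000
doubling = copies_when_growing(n, lambda c: c * 2)   # factor growth
additive = copies_when_growing(n, lambda c: c + 64)  # additive growth

print(doubling)  # 1_048_575: under 2n, so amortised O(1) per append
print(additive)  # roughly n^2 / 128: amortised O(n) per append
```

The doubling total is a geometric series (1 + 2 + 4 + … < 2n), which is exactly why the per-append cost amortises to a constant.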
> Either way, it's definitely larger than the simpler O(n) case we're thinking about.
It actually isn't larger than O(n), setting aside constant factors.
If a hash map's time grows faster than O(n) for n insertions starting from empty, that suggests an implementation problem.
I should have thought about things more. I think this is a case of confirmation bias on my part.
I thought "what about dynamic resizing?" and quickly googled the complexity of rehashing a hash table, which confirmed my thought that it was O(n).
I guess I didn't consider that, since we must have inserted n values by that point, we can just say "insertion is O(1)" with a constant factor of roughly "performing two hashes" once resizing is accounted for, i.e. paying the cost of hashing once and rehashing once.
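That "roughly two hashes per key" intuition can be checked with a toy chained hash table that counts hash computations — a sketch of my own, not any real implementation (note that some real tables, e.g. CPython's, cache the stored hash and avoid recomputing it on resize):

```python
# Toy chained hash table that doubles when full, counting how many
# times keys get hashed (once on insert, once more per rehash).
class CountingTable:
    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.size = 0
        self.hash_calls = 0

    def _slot(self, key, nbuckets):
        self.hash_calls += 1
        return hash(key) % nbuckets

    def insert(self, key):
        if self.size >= len(self.buckets):      # load factor 1: double
            new = [[] for _ in range(2 * len(self.buckets))]
            for bucket in self.buckets:
                for k in bucket:                # every key is rehashed
                    new[self._slot(k, len(new))].append(k)
            self.buckets = new
        self.buckets[self._slot(key, len(self.buckets))].append(key)
        self.size += 1

t = CountingTable()
n = 100_000
for i in range(n):
    t.insert(i)
print(t.hash_calls / n)  # a small constant (between 2 and 3 here)
```

The rehash total is again a geometric series bounded by 2n, so even in the worst case each key costs a small constant number of hash computations, amortised.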
that feels like it makes sense. I'm being a bit hand-wavy as I don't want to sit down with pen and paper.
I retract my incorrect statement above. (I can no longer edit it to add postscript retraction.)