
I was about to say "the example in the post is sorting so it's n * log(n)".

Upon thinking about it, the example is comparing the relative speed of hashing every item (O(n) * O(1) == O(n)) and sorting (O(n log n)).

That means the ratio of these two is a log n factor: n log n / n == log n.
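The two dedup strategies being compared can be sketched like this (my own illustration in Python; the function names are made up, not from the post):

```python
def dedup_hash(items):
    # Hash every item once: n inserts at O(1) average each -> O(n) overall.
    return list(set(items))

def dedup_sort(items):
    # Sort first (O(n log n)), then drop adjacent duplicates (O(n)).
    out = []
    for x in sorted(items):
        if not out or out[-1] != x:
            out.append(x)
    return out

data = [3, 1, 2, 3, 2, 1]
# Both produce the same unique elements; only the asymptotics differ.
assert sorted(dedup_hash(data)) == dedup_sort(data) == [1, 2, 3]
```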

good point.

As a side point, I think the main point of the post is that the time taken for the hash solution increases at a greater rate than the sort + unique solution. Now, that doesn't make sense with our big O notation.

This is because we don't have a simple hashmap, what we have is a dynamically resizing hashmap, which we don't set a capacity on, and the resize operations are order n. Now, how often can we expect to resize? I think most hashmaps double their capacity, so there will be log(n) resize operations that each take O(m) (where m is the size of the hashmap when it needs to be resized).

Now at this point, I can't be bothered to think further about it. That feels like it should be less than O(n * log(n)), but it's kinda close. Either way, it's definitely larger than the simpler O(n) case we're thinking about.
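The resize cost can actually be simulated rather than worked out on paper (my own sketch, assuming a table that starts at capacity 1 and doubles when full):

```python
def resize_cost(n, initial_capacity=1):
    # Insert n items into a table that doubles when full,
    # counting how many resizes happen and how many items get rehashed.
    capacity, size = initial_capacity, 0
    resizes, copied = 0, 0
    for _ in range(n):
        if size == capacity:
            resizes += 1
            copied += size      # each resize rehashes every current item
            capacity *= 2
        size += 1
    return resizes, copied

resizes, copied = resize_cost(1_000_000)
# resizes grows like log2(n), but copied stays below 2n,
# so the total resize work is O(n), not O(n log n).
```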



Dynamically resizing a hash table by doubling its size when it reaches a threshold capacity is amortised O(1) per insert.

If there are deletions, both operations are amortised O(1), provided the implementation is sensible and shrinks at a lower threshold factor than it grows.

It's the same sort of pattern as dynamically resizing a growing string. As long as you resize by a constant factor > 1, rather than by a constant addition, the number of resizes is low enough that the O(n) cost of copying is amortised over prior appends.
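The multiplicative-vs-additive distinction can be made concrete with a small sketch (illustrative numbers, mine, not a real string implementation):

```python
def total_copies(n, grow):
    # Append n items to a buffer, counting total elements copied on resizes
    # under a given growth policy.
    capacity, size, copied = 1, 0, 0
    while size < n:
        if size == capacity:
            copied += size          # copy everything into the new buffer
            capacity = grow(capacity)
        size += 1
    return copied

doubling = total_copies(10_000, lambda c: c * 2)    # multiply capacity by 2
additive = total_copies(10_000, lambda c: c + 64)   # add a constant chunk
# Doubling copies O(n) elements in total; additive growth copies O(n^2).
```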

> Either way, it's definitely larger than the simpler O(n) case we're thinking about.

It actually isn't larger than O(n), setting aside constant factors.

If a hash map's time grows faster than O(n) for n insertions starting from empty, that suggests an implementation problem.


I should have thought about things more. I think this is a case of confirmation bias on my part.

I thought "what about dynamic resizing" and quickly googled the complexity of rehashing a hash map, which confirmed my thought that it was O(n).

I guess I didn't consider that, since we must have inserted n values by that point, we could just say "insertion is amortised O(1)", with the constant factor being roughly "performing two hashes" by the time it's resizing, i.e. pay the cost of hashing each element once on insert and once more when it's rehashed.

That feels like it makes sense. I'm being a bit hand-wavy as I don't want to sit down with pen and paper.

I retract my incorrect statement above. (I can no longer edit it to add postscript retraction.)



