Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I had someone in an interview in Berlin ask me to write a garbage collector. In CSC 202, our professor talked about using reference counts.

Jeez. I think my profs covered ref counting as an introduction, then also went on to cover mark/sweep and generational collectors. We also covered compaction, copying and bump allocation. I don't think it took that long. If I were running a shop that focused on, say Java, I would want to know if my new hires knew background information relevant to optimizing code running on the JVM.

Cycle detection? Man I could probably tell you back when I studied minimum spanning trees and wrote this thing to implement Dijkstra's shortest path:

Someone should have given you breadth first search and depth first search, then ran you through how those are components to other algorithms. You should have been left with those as a "toolbox" such that you automatically spend a second thinking, "what would happen to that graph if I did DFS or BFS on it." That kind of toolbox is powerful and gives you all sorts of useful insights. You are not the only one around here to say, "Man I could probably tell you back when..." What you should be thinking now is that you were not well served by some of your teachers. Fortunately, this sort of thing is easily rectifiable.

Off the top of my head, I'd hope each node had a unique identifier and I guess I'd mark them/store the keys in a hash table.

Well good on you. You just did better than most of my last 6 interviewees.

Unless there's a way to mark the data structure, you now have a second structure the size of your key space.

This is the DFS/BFS part right here. Is the 2nd part of your statement likely to be true, and how often would it be true? Good call on the marking. (EDIT: Just thought of it: Bloom filter.)

I'm sure there are better solutions, but I wouldn't expect a senior program to know them off the top of their head

Cripes! Freshmen used to be able to do this stuff!

unless you're hiring really specifically for a position writing routing algorithms or looking for a senior airport transit architect.

How about you don't want programmers who will end up debugingg an infinite loop induced by a data cycle every single time?

EDIT: I actually just wrote cycle detection for my side project in golang the day before yesterday. It took me 4 additional lines of code. If you have a shop that uses gob or some kind of object serialization, this may well be very relevant!



I had someone ask me the cycle detection question once, and I didn't care for how they phrased it. Specifically, should I find it immediately upon entering the first cycle (at a higher memory/time cost) or should I just eventually detect it (e.g. turtle/rabbit)? It was on me to clarify, but as a newb to the industry, I felt like I should just know which was expected.

And either way, if I were to ask this question, I would spend a lot of time helping the person along the way to make sure they were able to make the logical leaps that made sense to them, not to me.


Here's how I asked the question. I would present the data for a 2 node cycle and ask what would happen to this routine. Here's a response I would get far too often: A conditional clause detecting a 2 node cycle. Then I would present a 3 node cycle, then ask how to detect an n node cycle, period.

Lots of 3.5+ GPA grads can't make that logical leap!


You're trying really hard to find a connection between implementing a GC and debugging memory leaks... and failing.

The whole freaking point of a GC like say Java's is that an average programmer can use it without having to understand how exactly it's implemented. Of course it won't hurt to know that, but it's not at all mandatory knowledge.

One just has to know which situations the GC can't cope with and avoid them. For Java there's at least one open source dedicated tool for finding leaks, it nicely explains what one needs to know.


One just has to know which situations the GC can't cope with and avoid them.

Unfortunately, many programmers believe that since Java uses garbage collection, you do not have to think about GC and ownership at all.

Oracle had to replace the fast implementation of substring that just returns a slice of a String (O(1) time) by a copying implementation (O(n)), because too many programmers do not know the basics of ownership/garbage collection and would accidentally hold on to larger strings.

Seeing the implementation details of reference counting, mark-sweep collection, and perhaps a generational collector once, makes you more aware of memory and ownership issues, even if you forget the nitty gritty details later.


You're trying really hard to find a connection between implementing a GC and debugging memory leaks... and failing.

I spent years at a vendor for a Virtual Machine. That you would compose such a sentence shows that you are ignorant of some aspects of optimization. You don't even know what you don't know, and projected that ignorance on another.

The whole freaking point of a GC like say Java's is that an average programmer can use it without having to understand how exactly it's implemented.

One of my company's most frequent consulting tasks was helping clients optimize to maximize throughput for the generational GC. That you jumped to the conclusion that I was talking about memory leaks is pretty damning.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: