Github (and similar evidence of actual abilities) is already making questions like these extinct, thank god. Reading through a single project an applicant has written themselves is immensely more informative than these silly CS brain teasers: actual solutions to actual problems, style & documentation habits, testing practices, evidence of being able to code & ship, being a self-starter, etc. A classic example of actions speaking louder than words. Thankfully the technology is now in place so that resumes can be replaced with coding-community profiles, and interviews can focus on filling in the blanks and assessing the compatibility of the applicant as a person.
I definitely agree that for a startup team, Github is making CS interview questions extinct. Even GitHub itself claims as much: http://ozmm.org/posts/who_we_hire.html - "at GitHub we hire 'The Girl or Guy Who Wrote X,' where X is an awesome project we all use or admire."
However, it is a different question if you are running a company like Google or Facebook. For instance, there was recent news that Google received more than 75,000 applicants and is looking to grow by more than 5,000 employees this year. http://goo.gl/9fpkt
If you want to scale hiring to that level, you won't have time to look at Github or similar sites in detail for every candidate. There will also be great candidates who have never published their code (although they might be the minority). To maintain hiring standards efficiently in this kind of situation, asking very difficult algorithmic questions might be the only way to go - at least you will be able to hold the same bar for all 75,000 applicants.
Public code repositories are the first place I go when I look at applicants.
Surprisingly, however, public code rarely tells you anything about a user's CS fundamentals. Can they tell me the time complexity of a method they just wrote? Do they know the difference between a depth-first and a breadth-first search?
I've met individuals with good code who were unable to answer CS 101 level questions like that. It's not a deal breaker, but it needs to be given some weight and consideration in hiring. It's also a matter of seniority -- I would never offer a senior developer position to someone who didn't have CS fundamentals because I expect such a person to solve hard problems that fall outside of "hacking up a webapp".
I don't think so. CompSci 101 is gonna be stuff like fizz-buzz. You're gonna learn the absolute basics: loops, conditionals, maybe even functions, depending on how the course is structured. But depth-first vs. breadth-first? Calculating big O, or even what big O is? That's not intro-level stuff. It's certainly important, and I agree that you should think long and hard before hiring someone for a senior role if they can't handle that kind of stuff. I just strongly disagree with categorizing it as introductory.
The University of Cambridge's computer science course includes big-O notation in the first term. It is introductory.
Look at it like any other subject: you spend a lot of time analysing and understanding existing work before you make new work. Why would it be different for computer science?
While I'd love to agree, reading someone's code does not tell you:
1) If they actually wrote all of it, or if it is derivative of someone else's work.
2) Whether any particular function took them 10 minutes or 10 hours.
3) What options they considered when choosing how to architect a particular solution.
4) Why they made any particular choice in their solutions.
5) Whether or not they will fit on your team, take constructive criticism, etc.
The questions posed in the article don't give you those answers either, but the applicant's speed at answering, their confidence, their demeanor as you ask further questions and discuss answers... all that communication does answer most of the above questions.
1. Do you look at my github and wonder whether I wrote it or copied it from elsewhere?
2. What do you care if anything I wrote took me ten minutes or ten hours? And with rewrite_rails, the answer is more like ten months. Is that a problem? Are you looking for LOC/hour or for a certain approach to thinking about programming?
3. A trivial coding problem will not answer this question any more than Github will, but you can always just ask: "Reg, I'm looking at Faux, and I wonder, what other approaches did you consider before settling on pushing application logic down into the browser?"
4. See above.
5. "Reg, #andand seriously sucks. This is pervasive nil checking dressed up in metaprogramming clothing with opening core classes as a third deadly sin. Before I consign your soul to the abyss, what do you have to say in your defense?"
I can say on #2 that clients I've worked with aren't specifically thinking of LOC/hour, but they do want people who are able to be reasonably productive. Everyone's definition of reasonable is different (obviously). Looking at, say, a 50 line block of code, I'd certainly prefer to know if it took someone a couple of days to arrive at that solution vs, oh, say, 18 months.
Not everyone works at the same speed, but for certain types of problems and industries, knowing that you can put out reasonable quality work in a reasonable amount of time helps people get an idea of how to plan out other parts of the project.
I'd rather know someone can do fizzbuzz in a few minutes vs 8 hours.
I find it very hard to look at anything non-trivial and figure out what is slow, about right, and fast. For example, #andand (I keep using this example because many people have heard of it, not because I'm foolish enough to think this is good work on my part).
I guess that after Benjamin Stein and I discussed the problem and one of us (I can't remember whom) said "I wish there was an and-and-dot operator in Ruby," there was a rough implementation written fairly soon after.
But the thing on my Github now probably has a hundred hours of work refactoring, dealing with edge cases, adding new functionality like .andand { |foo| ... } and so on. I wouldn't be surprised if the LOC/hr. has dropped below one at this point. And that's without considering the rewrite_rails version which uses AST rewriting.
So for that purpose, I agree that a fizzbuzz given on the spot is far superior to looking at someone's Github projects. Thanks for the valuable comment.
I guess we're thinking about somewhat different things. "I find it very hard to look at anything non-trivial" - in most cases the interviews I've been involved in have tended to involve 'trivial' cases, where in some cases the interviewer wasn't sure of the right answer; they just knew some of the common 'wrong' ones to weed people out.
The false negative rate of the hiring strategy you advocate would be significantly higher than that of traditional programming interviews. There certainly is a fraction of otherwise competent programmers who would fail to write the linked list code because of the interview conditions (stress, time pressure, whiteboard, etc.). What is that fraction? 50%? 70%? I'm guessing no higher than that. Now, what is the fraction of competent programmers who haven't written any open-source code? I'm guessing 95%.
Also, the cost of inspecting someone's open-source code carefully enough is way higher than the cost of evaluating their answer to an interview question.
Also, at the risk of stating the obvious, if a significant fraction of employers started looking at the candidates' open-source contributions, all new grads would start producing open-source-spam projects, further decreasing the usefulness of this signal.
Are Github and open source projects turning programmers toward a portfolio based resume?
I've had success in interviews initially linking to my github repo and letting the interviewer view for themselves what I can accomplish. It has been a good diving board when you start talking in person.
Yeah, except interviewers assume by default that you don't know how to program, and that you somehow cheated and got someone else to write your open source project.
You may be asked about using Unicode strings. What the interviewer is usually looking for is:
- each character will be two bytes (so, for example, a char lookup table you may have allocated needs to be expanded from 256 to 256 x 256 = 65536 elements)
- that you would need to use wide char types (wchar_t instead of char)
- that you would need to use wide string functions (like wprintf instead of printf)
Unicode support involves more than just the minimum number of bits used per character.
Say you're writing a method to truncate a string to a certain length. What if your text contains a code point outside the basic multilingual plane, and each character can't be represented in just 16 bits? If you truncate in the middle of a character, the result will be an invalid string. What about combining characters? If you truncate in the middle of a combining character sequence, you'll end up with half-formed output (e.g., letters missing a diacritic). If you work with text, you should know this stuff!
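To make that concrete, here's a quick Python sketch (an illustration I've added, not from the original question) of both failure modes: naive slicing dropping a combining accent, and a cut mid-surrogate-pair producing an invalid string:

    # Hypothetical illustration of truncation pitfalls.
    s = "cafe\u0301"                    # "café" with a combining acute accent
    print(s[:4])                        # "cafe" -- the accent is silently lost

    emoji = "\U0001F600"                # a code point outside the BMP
    utf16 = emoji.encode("utf-16-le")   # four bytes: one surrogate pair
    broken = utf16[:2]                  # truncated mid-pair
    print(broken.decode("utf-16-le", errors="replace"))   # U+FFFD: invalid string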
The OP is a page from 2002 (or possibly 1902; didn't people remember Y2K back then?), so I expect the then-recent code points beyond U+FFFF were still outside the imagination of potential interviewers.
This little thread illustrates nicely that the smart thing to do is not to implement list or string functions yourself but to use a library that has been tried and tested. Spend your brain cycles on the important stuff.
I find it much more interesting to ask people when and why they would choose a linked or an array/buffer based list...
I think a lot of these questions don't really reflect what someone will be doing in their real job. It's more of a "how much do you remember from your CS degree" test that has little relevance to your job duties of maintaining some monolithic Java based enterprise platform.
I mean, come on, binary trees? I know what a binary tree is, but I'd have to look up the definitions of all the traversal types because there's no way I'd remember what post-/pre-/in-order traversals actually entail.
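For anyone else who'd have to look them up, a quick refresher sketch in Python (mine, not the article's):

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def pre_order(n):    # root, left subtree, right subtree
        return [] if n is None else [n.value] + pre_order(n.left) + pre_order(n.right)

    def in_order(n):     # left subtree, root, right subtree
        return [] if n is None else in_order(n.left) + [n.value] + in_order(n.right)

    def post_order(n):   # left subtree, right subtree, root
        return [] if n is None else post_order(n.left) + post_order(n.right) + [n.value]

    tree = Node(2, Node(1), Node(3))
    print(pre_order(tree), in_order(tree), post_order(tree))
    # [2, 1, 3] [1, 2, 3] [1, 3, 2]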
And this question is an absolute abomination:

What is the next line in the following sequence:
1
11
21
Answer: it's 1211 and the next is 111221
That's a joke, right? I'd personally answer 31 to this question.
It's to identify those that have 'architect potential'. If you pick the simple, easy to implement and blindingly obvious answer you don't make the cut, sorry. :D
I hate being given questions with ambiguity and not enough context to answer them, and then getting it "wrong". It's fine if you want to test my thinking processes, but make the question about listing my assumptions and reasoning rather than expecting a single empirically-derived answer.
The next item in the sequence could justifiably be 31, or 1111, or 121, or 0004, or 100. Given an ambiguous question, I'm going to take the less clever answer which is more readily supported by the most common assumptions (y = (x-1) * 10 + 1, as the problem appears to be presented as a numeric sequence and the entire world counts in base 10). In programming, any undocumented assumptions should be blindingly obvious. Anyone that works on a system with unobvious and undocumented assumptions is playing software Russian roulette.
If the kind of person you want to hire is the kind of person who comes up with unobvious, esoteric solutions to problems, I'm not sure I want to be working on your software team anyhow.
"You have 2 supposedly unbreakable light bulbs and a 100-floor building. Using fewest possible drops, determine how much of an impact this type of light bulb can withstand. (i.e. it can withstand a drop from 17th floor, but breaks from the 18th).
Note that the ever-popular binary search will give you a worst case of 50 drops. You should be able to do it with under 20."
Maybe there's something I don't understand, because my reaction to that question was "lolwut?"
In that scenario, with a binary search you drop from the 50th, 25th, 13th, 19th, 16th, 17th and 18th floors.
By my back of the postage stamp calculations you need to do (log N) + 1 drops...
My question: how did they get a number of 50 for binary search??? Is it because you're limited to 2 failures? And by 'binary search' do they mean something like: start at floor 2 and drop; if it is still intact, go to floor 4 and drop again; if it breaks, you don't know about floor 3, so you have to use the remaining bulb to check that? That's not binary search, that's linear.
To do it in 20 or fewer drops you could drop it from the 20th floor, and if it smashes, go to floor 1 and start going up in increments of 1. Otherwise go to floor 39 and drop it (etc). But that makes me think 20 is not the best starting number; we can do better. Just pull out the good old (n^2 + n)/2 formula again: the first number in that progression that ticks over the 100 threshold is 14. So we drop it at floor 14 first; if it breaks, start from floor 1 and go up. Otherwise go up 13 floors and repeat...
Our major floor increments are 14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99.
I'm not sure if there is a better solution, if you have one, feel free to chip in. :D
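In the meantime, here's a rough simulation of the decreasing-step strategy sketched above (my own throwaway code, assuming the 'threshold' is the highest floor the bulb survives); it confirms a worst case of 14 drops for 100 floors:

    import math

    def first_step(floors):
        # Smallest k such that k * (k + 1) / 2 >= floors.
        return math.ceil((math.sqrt(8 * floors + 1) - 1) / 2)

    def drops_needed(floors, threshold):
        drops, prev, step = 0, 0, first_step(floors)
        floor = min(prev + step, floors)
        while True:                            # bulb 1: jumps of k, k-1, k-2, ...
            drops += 1
            if floor > threshold:
                break                          # bulb 1 breaks here
            if floor == floors:
                return drops                   # survived the top floor: done
            prev, step = floor, max(step - 1, 1)
            floor = min(prev + step, floors)
        for f in range(prev + 1, floor):       # bulb 2: linear scan of the gap
            drops += 1
            if f > threshold:
                break
        return drops

    print(max(drops_needed(100, t) for t in range(101)))   # worst case: 14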
Suppose that bulbs have a tolerance of 8.5. Doing a naive binary search causes you to drop at floor 50, breaking a bulb, and then at floor 25, breaking your second bulb. You have no more bulbs, and cannot discriminate between 25 possible values for the tolerance, so your naive strategy was not a good idea.
The binary search strategy which works is to binary search with the first bulb until you break it, then exhaustively search with the second bulb starting from the lowest possible tolerance and increasing until it breaks, thus identifying the precise breaking point with the minimum number of drops. This has a worst-case performance of 50 drops, for when the bulb has a tolerance of 49.5: one drop with the first bulb, the bulb breaks, then 49 drops with the second bulb. (1 + 48 = 49 if we assume that it can't break on the first floor.)
Your solution is very good, but I do not believe it to be optimal. I am working on a better one.
"Your solution is very good, but I do not believe it to be optimal. I am working on a better one."
Very well sirrah, I shall consider myself slapped in the face with the object with five extrusions that is topologically identical to a plane. Your challenge is accepted. :D Duelling algorithms at dawn it is. I keenly await your opening salvo.
Coded both our algorithms. I win by a nose (more visible at larger building sizes -- 1k floors, etc.) Code quality is an abomination, please excuse exploratory programming:
Gist of my approach: dynamically recalculate number of steps to go with each jump we take, using the same approach that you do (find smallest k such that k(k + 1) / 2 > n), rather than assuming it decreases monotonically. This has superior performance around some boundary conditions.
Instructions for running script:
ruby hn.rb patio11 debug 100
ruby hn.rb stormy debug 100
Debug on or off as desired, 100 is size of building.
Then, leave off debug flags, and diff the outputs.
Do you mean "incrementally by 1 each iteration" where you wrote "monotonically", in your reference to Stormbringer's algorithm?
Mathematically, I do not see how any solution could beat Stormbringer's, by considering the cases where tolerance = k + 0.5 and tolerance = (k^2 + k)/2 + 0.5, which shows that the set of tolerances for a building of height > (k^2 + k)/2 requires >= k + 1 drops to validate. But I have not been quite formal in my analysis.
Neither of these proposals is optimal; patio11's proposal takes linear time in the worst case.
IIRC, the correct solution to this is in sqrt(N) time; for a 100-floor building, you drop at floors 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, in that order. In general, this is sqrt(N) floors, spaced sqrt(N) floors apart.
The bulb will break at floor sqrt(N) * X [10X in this example]. Now, drop the bulb from the approximately sqrt(N) [10] floors between floor sqrt(N) * (X-1) [10 * (X-1)] and sqrt(N)* X - 1 [10 * X - 1], inclusive. When it breaks, you have the exact tolerance rounded down to the nearest floor, and you've done it with at worst 2 * sqrt(N) drops.
By the way, this is a great example of a dumb interview question; it's not particularly obvious that there is a solution in sqrt(N) worst case time, but once you're told that there is one (and, perhaps, that binary search + linear search is linear search in the worst case), it's not too difficult to get it in sqrt(N).
Also, note that average case running time analysis is meaningless if you don't have a distribution for the input; simply assuming that the tolerance is distributed uniformly at random from 1 to 100 just because you have a 100-floor building makes little sense.
For the record, Stormbringer's solution does slightly better than the sqrt(n) approach in the worst case: set (k^2 + k)/2 = n and solve for k to get approximately sqrt(2 * n) drops.
The (Sqrt(n) * 2)-1 solution is less efficient, but only by a roughly constant factor of approximately sqrt(2). (14 vs 19, when n=100)
Patio11's binary+linear algorithm is almost certainly the inefficient solution that the blogger hinted at.
Patio11 claims to have coded an even more efficient solution, but has not explained it.
I for one suspect his supposedly more efficient solution has a bug, but I have not inspected or tested his code.
Drop bulb a every ten floors starting with floor ten, extending to floor 100.
When bulb a breaks, start with bulb b. Drop bulb b starting from 9 floors below the floor that bulb a broke on, and extending to the floor directly below the one that a broke on.
You can probably optimize this a bit by adjusting how many floors you jump with for bulb a, but 10 seems like an easy number.
Bulb 1: drop from the Xth floor, then the 2Xth floor, 3Xth floor, and so on until you pass 100. The bulb will survive a drop from the AXth floor, but break on the (A+1)Xth floor. For X = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 the worst case number of drops is floor(100/X) = 50, 33, 25, 20, 16, 14, 12, 11, 10, 9, 8, 7, 7, 6, 6. This narrows the break point to a range of values between AX + 0.5 and (A+1)X - 0.5, and leaves X - 1 floors whose result must be found.
Bulb 2: iterate over the X - 1 unknown floors, checking them too. Worst case performance is X - 1 further tests. Total worst case performance is floor(100/X) + X - 1 = 51, 35, 28, 24, 21, 20, 19, 19, 19, 19, 19, 19, 20, 20, 21.
So pick X = 8, 9, 10, 11, 12 or 13. For example, drop the first bulb from floors 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88 and 96. Worst case scenario is the bulb survives being dropped from floor 88 but breaks when dropped from floor 96. That's 12 drops.
Drop the second bulb from floors 89, 90, 91, 92, 93, 94 and 95. Worst case scenario is the bulb survives being dropped from floor 94 and breaks/survives when dropped from floor 95. That's 7 more drops.
Total is 19 drops.
The problem with this question is that you could just randomly pick 10 and find the right answer without having to do any calculation. Meh.
Ah, I see now though that if the bulb passes at 8 then you have reduced the problem to a 92-storey building so the "step size" upwards should be reduced. You don't even need to do that to get a good answer though. Ridiculous.
For 100 storeys, you drop the first bulb at floors 9, 22, 34, 45, 55, 64, 72, 79, 85, 90, 94, 97, 99 and 100. If the first bulb breaks, iterate over the gap that's left over. Worst case is 14 drops total.
"The problem with this question is that you could just randomly pick 10 and find the right answer without having to do any calculation. Meh."
True, but consider it as a test of intuition. Why pick 10? Because it is √100 (or maybe because it is log(100)). The intuitive approach doesn't get you the perfect answer, but it gets you into the ballpark.
Having picked 10, we might then realise that the 'worst case' changes and gets worse the higher up the building we go, so then intuition might say that we need to tweak the step size somehow to get down the size of the worst case. Or we might think that that step size works fine for a 100 storey building, but what about a 10,000 storey building? Our intuition is telling us the algorithm can still be refined.
But then maybe we look at what you wrote and think to ourselves, 'what if the best sequence isn't linear, but is based on the fibonacci sequence instead?'. And then maybe we tootle around a bit with our theoretical fibonacci-based algorithm. Maybe we decide it isn't quite as good because it has a worse worst case. But then we might have another intuition: perhaps there are other ways of measuring which algorithm is better. So let's say we measure the average, and lo and behold, the theoretical fibonacci algorithm has a better average but a worse worst case. We might be confused at that point as to which one is 'better', because each is better by a different measure. But now we can put it into a broader context and reason about which one to use in which situation.
Moreover recognising that this sort of problem can be decomposed into smaller chunks is not a bad thing. Having a good intuition towards decomposition is an important part of design.
Yes, it's because it's limited to two failures. If you drop the first bulb at floor 50 and it breaks, you have only one bulb left, so you MUST not break it until you have limited the possible range to 1, so you have to just incrementally test 1, 2, 3, 4, etc. If you went straight to floor 25 and it broke, all you know is 0 < x < 25, and you have no bulbs left.
Yes, it's because you're limited to 2 failures, and yes, "binary search" is a dumb way to describe it since you can't actually do a binary search in that situation.
Working in increments of 20 floors gives you a worst case of more than 20 drops. 20, 40, 60, 80, 100 (bang), 81, 82, ..., 99 (bang): that's 24 drops.
Your other solution is better, and indeed dropping first at floor 14 is one of the 6 initial choices that minimize the worst-case number of drops. If, subject to minimizing the worst-case number, you also want to optimize the average case (assuming all possibilities are initially equally likely) then doing floor 14 first is optimal.
Even if the lightbulbs are breakable I think you would need a maximum of (2√n)-1 drops - i.e. go up in increments of √n then when it breaks work forward from the previous drop.
It's a fairly cool recursive problem. Let N be the number of floors.
OneBulb(N) = N. With one bulb and 100 floors, you have no choice but to drop the bulb 100 times, once for each floor.
With two bulbs, a single drop at floor K has two outcomes. If the bulb survives, then you can consider all the floors above K as a separate building, and start again with two bulbs and N - K floors. If the bulb breaks, you can consider all the floors below K as a separate building, and start again with ONE bulb and K - 1 floors. So:
TwoBulbs(N) = min { 1 <= K <= N } ( max(TwoBulbs(N-K), OneBulb(K-1)) ) + 1
Now, the same is true when you go to three bulbs:
ThreeBulbs(N) = min { 1 <= K <= N } ( max(ThreeBulbs(N-K), TwoBulbs(K-1)) ) + 1
And the recursion continues.
I wrote a small Perl script which solves for arbitrary B (number of bulbs) and N (number of floors). For B = 3, N = 100, you can do it in a worst case scenario of 9 drops.
3 bulbs. 100 floors.
drops[3][100] = 9 drops.
Drop bulb at floor 8.
Bulb survives: consider floors 9 to 100 as a separate 92-floor building and observe drops[3][92] = 8 drops.
Bulb breaks: consider floors 1 to 7 as a separate 7-floor building and observe drops[2][7] = 4 drops.
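Here's a hedged reconstruction of that recurrence in Python (the original script was Perl; this assumed equivalent just memoizes the recursion):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def drops(bulbs, floors):
        # Worst-case drops needed to pin down the threshold exactly.
        if floors == 0:
            return 0
        if bulbs == 1:
            return floors                  # OneBulb(N) = N: test every floor
        return 1 + min(
            max(drops(bulbs - 1, k - 1),   # bulb breaks at floor k
                drops(bulbs, floors - k))  # bulb survives floor k
            for k in range(1, floors + 1)
        )

    print(drops(2, 100))   # 14
    print(drops(3, 100))   # 9, matching drops[3][100] above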
I know it's supposed to be purely hypothetical, but I can't help thinking that the bulbs will be more prone to break after being dropped multiple times, and might break before they normally would have under different circumstances.
I HATE interview questions that ask the candidate to write some trivial piece of code, particularly the ones that need to be written in C to give an optimal answer: "if you just do pointer math, this would be trivial..."
Not to say that some good candidates won't be excellent at C coding, but these problems often test for specific knowledge, when they should be screening for critical thinking skills. I've seen candidates who were "good" at this sort of problem (probably because it was on a CS homework in College), but didn't have much in the way of critical reasoning skills.
I prefer asking questions that require them to devise some sort of algorithm to solve a problem (typically by walking through the situation methodically), and then implement that algorithm in the language of their choice. No real ahas or tricks, and I don't care if you write it in Python or Assembler...
These questions remind me of the time when I took AP Computer Science AB, many years ago (10 years ago to be exact, my freshman year). To be honest, many of these concepts are good to know, but I find a much more useful test is to quiz applicants on something they probably haven't used before, especially a technology that you use in your project.
For instance, our project uses a good deal of Jabber, so we test whether applicants can write a program that acts as a bot, responding to the Jabber message "ping" with "pong." It tests their ability to develop rapidly, to justify their choice of a library, and how fast they can pick up our technology. It's been a great filter as well as a useful test of whether they can assemble a solution on their own. Even non-working solutions work well for us because we can examine their thought process.
Asking them to develop a sort solution isn't as useful for us as seeing whether they can develop code relevant to the components we use - because, let's face it: you aren't going to be writing your own string class from scratch, though it is good to know how one works. And in our case, they'll learn about XMPP/Jabber if successful - and it will test whether they are willing to learn. Seeing which library they use is also interesting, because it tests their system design skills and, in some cases, will indicate if they can make good design decisions. If we do have doubts about their ability to assemble a program, we will test them with computer science questions, but first we want to see if they can engineer a solution.
If you are looking for a low-level programmer to develop a compiler from scratch, maybe these are the right questions to ask. But we don't expect our applicants to build a hammer; we want them to know how it works, but we don't expect them to build us one. If we need a hammer maker, we will find one, but if we need an applications programmer, we will have them use the hammer. Just as your engineer knows how a spectrum analyzer works but won't be expected to build one - unless you are hiring them to do that.
What is the next line in the following sequence:
1
11
21
Answer: it's 1211 and the next is 111221
Why would 31 and 41 be illogical answers? I understand where 1211 comes from, but 31 and 41 are totally logical IMO. It's a linear function that goes up by 10.
All problems of the form "Provide the next number of this sequence" are attempts to demonstrate that you think, or are capable of thinking, in a fashion similar to the way the decisionmaker thinks. 31 is a totally reasonable answer to that question, mathematically. So is pi. So is "zebra". (The question was "Provide the traditional lexicographic sorting in English of the elements zebra, one, twenty one, and eleven, except o comes before e.")
The reason that 31 is not a good answer in terms of securing a job is because your decisionmaker prides himself on being intelligent and he does not see arithmetic sequences as being sufficiently impressive. The reason pi is not a good answer is, despite there being uncountably many mathematical phenomena for which {1, 11, 21, pi} are correct answers, it is insufficiently elegant/tricky for what he has in mind.
I agree about 31 but not necessarily about zebra or pi. A reasonably meaningful metric is the compressibility of the sequence, or how short a program could produce it. Although that just opens a different can of worms, it could be argued that 31, 41 is a better solution than the one given in the answer in the OP, which in turn is better than pi or zebra.
That is begging the question -- it moves the "I am totally winging this because it makes me feel superior" from the "prefer 'elegant' answers" to "prefer 'elegant' languages for writing compressed descriptions of problems".
A rigorous program expressing the sequences to which "pi" and "zebra" are the next answers takes only 0 bits and 1 bits respectively to compress in Patio11, a pedantic language created purely for the purpose of this post. It is far more rigorous than either English or C-style pseudocode, has a well-defined grammar, and could have an interpreter written in any common programming language that would fit on the side of a business card.
I stick with the original contention: any problem of this nature is about either identifying birds of a feather or lording one's intellectual superiority over the candidate.
A general "solution" to these kinds of sequences, if the sequences are mathematically related, is the method of common differences (e.g. http://www.purplemath.com/modules/nextnumb.htm). It would give 31 in this case, but it works for many other straightforward sequence-type problems. The basic idea is you calculate the difference between every two adjacent numbers to create a new sequence, then do the same for that; often the pattern becomes very clear then.
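A tiny sketch of the method on this sequence (my example, following the description above):

    # Method of common differences applied to 1, 11, 21.
    def differences(seq):
        return [b - a for a, b in zip(seq, seq[1:])]

    seq = [1, 11, 21]
    print(differences(seq))   # [10, 10] -- constant differences,
    print(seq[-1] + 10)       # so the "next number" is 31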
Ironically, the "problem" at hand appears to have come straight from that tutorial:
> Find the next number in the following sequence: 1, 11, 21, 1211, 111221,....
> This looks like a "math" sequence, but it isn't really. Instead, each term is a description of the preceding term. The first term is just one "1": 11. The second term is two "1's": 21. And so forth:
one 1: 11
two 1's: 21
one 2 and one 1: 1211
one 1, one 2, and two 1's: 111221
three 1's, two 2's, and one 1: 312211
So the next term is "312211".
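For what it's worth, the "describe the previous term" rule is just run-length encoding; here's a small illustrative sketch (mine, not the tutorial's):

    from itertools import groupby

    def look_and_say(term):
        # Run-length encode the term: "1211" -> one 1, one 2, two 1s -> "111221".
        return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

    term = "1"
    for _ in range(6):
        print(term)
        term = look_and_say(term)
    # 1, 11, 21, 1211, 111221, 312211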
An absurd question that reveals little about a person's programming ability. I would change the question entirely:
On a Cartesian grid is drawn a solid logo (e.g., show a picture of the Batman logo from the 1989 film), where each square represents a black or white pixel. How would you represent the image if you had to type the data points by hand for the computer to draw?
Probably for no reason other than it's too easy. Generally these questions seem to not be oriented around "seeing how you think" as much as they are about seeing if you come to the conclusion the interviewer wants you to come to.
I'm surprised that these are commonly asked and I'm guessing these aren't for a position building web apps.
Most of these are distant memories from a CS degree and rarely used concepts in the world of Y Combinator startups where Ruby and Python are common.
I think people building web apps look more for developers who write maintainable code, design good user interfaces, write good tests, make cross browser solutions, etc.
These would be nice to have but too low level for everyday use at most startups.
At top-tier software companies, expect all of the (applicable) questions to be accompanied by the "and what is the runtime complexity of your solution" followup question. Depending on your abilities, the interviewer will try to lead you towards an optimal solution.
I've done a fair amount of web development in PHP & Python (as well as introductory C++ & Java). I had to look up "linked list" on Google just to get some understanding of what that is.
I'm going to play devil's advocate and say that while it is good to know these things, they're no longer essential. I graduated two years ago and have been developing in Ruby since then; I haven't seen a linked list since college. Why? Because I chose to work with a newer language, with a higher level of abstraction. The programmer of the near future will have a different set of challenges than programmers did previously. In the same vein, the guys who graduated 10-15 years ago weren't learning Fortran, but 25 or 30 years ago, I'm guessing they were.
Here's a quick example of a case where knowledge of data structures and algorithmic complexity is useful, regardless of the language.
Let's say you've got two lists, named "A" and "B", each of which has 1000 integers. You want to return a list of the unique integers which are present in both lists (i.e. the intersection of the two lists). These 'lists' could be arrays or linked lists.
The brute force method would be something like this:
    def intersect_brute(A, B):
        Out = []
        for a in A:
            for b in B:
                if a == b and b not in Out:   # "not in" is an O(n) list scan
                    Out.append(b)
        return Out
But this can be very slow, because you can end up in the territory of O(n^3) operations across A, B, and Out (internally the language iterates across Out to evaluate that membership test). In this specific example, that works out to something along the lines of 1000 x 1000 x 1000 = 1,000,000,000 iterations, vastly larger than the size of the original lists.
---
A better way is to sort both lists and then walk the comparisons along each of them in parallel (this syntax assumes the lists are specifically arrays, but it'd effectively be the same for linked lists):
    def intersect_sorted(A, B):
        A, B = sorted(A), sorted(B)          # two O(n log n) sorts
        Out = []
        i = j = 0
        while i < len(A) and j < len(B):
            vala, valb = A[i], B[j]
            if vala == valb:
                if not Out or Out[-1] != vala:   # keep the output unique
                    Out.append(vala)
                i += 1
                j += 1
            elif vala < valb:
                i += 1
            else:                            # vala > valb
                j += 1
        return Out
This is definitely better than the above example. Now we're performing two sorts, each O(n log n), then doing a linear iteration across both of those lists in parallel, each O(n). So we end up with an overall complexity of O(n log n), driven by those initial sorts. Let's estimate the total number of iterations to be around, I dunno, 10000 (sort) + 2000 (iterate/compare) = 12000? The exact number of comparisons in a sort varies by the algorithm used and how the lists were ordered to begin with.
Not bad, definitely better than what we were doing before. But we might be able to go a little better...
---
Yet another way is to use hash sets, which (generally) have O(1) insertions and retrievals and only store unique values, such that if you insert the same value twice, the second insertion is effectively ignored. We can do something like this:
    def intersect_hash(A, B):
        A_set = set(A)           # O(n) to build
        Out_set = set()
        for b in B:
            if b in A_set:       # O(1) average hash lookup
                Out_set.add(b)
        return list(Out_set)     # optional: could return Out_set directly
Now we end up with an algorithm which is O(n): we iterate over the items in A once to fill A_set, then iterate over the items in B once, and each membership test is an O(1) hash lookup inside A_set. Finally we iterate over Out_set once to create the output list. This final step is optional - we could also just return Out_set directly - but it doesn't affect the algorithmic complexity either way. Now we've got 2000-3000 iterations, depending on whether we return a set or a list. And this additionally looks a bit simpler than the sort-and-iterate version I gave above.
---
So just by using our knowledge of data structures, we've turned ~1000000000 iterations into ~2000-3000 iterations. This is on the order of reducing the distance to the moon to around a kilometer. And that's just with two lists that are limited to 1000 items each.
And this sort of thing is a pretty common scenario in any language, no matter how 'abstracted' it is.
Thanks a bunch for the detailed examples. I've noticed this is a huge problem in my code: too many nested for loops. I've generally gotten better about using built in functions like set (in Python at least):
Out = list(set(A) & set(B)) # right?
Or even DISTINCT in SQL (when dealing with webapps).
And now I never doubt the usefulness of fuzzing the DB with hundreds of thousands of random entries. That quickie code that took 40ms to run over several dozen entries turns into a minute or more for several thousand... Ouch! ;-)
A linked list is a pretty elementary data structure. That being said, these questions seem like they're for a job writing client software on Windows some time before .NET existed.
Of course, way back in the dark ages when I wrote VB for client Windows apps I was all "lets keep this nice and simple, nice and simple.... arrrrrrghhh, why doesn't this POS language do what I want?! HULK SMASH", and then I'd break out the Win32 APIs and hit it with the big guns.
I feel a little sorry for the 'normal' VB programmers, because they'd be following my code and then BLAM it'd disappear into hyperspace, and re-emerge and something magical† would have happened.
It's worth noting that Python has just one linear structure - lists* (Ruby avoids lists and just has arrays). Now, they're actually much fancier under the hood, which is why you can get away with not having a bunch of choices, but it's the approach of "Oh hey, I want to store a bunch of stuff in a sequence of some sort - why should it matter how that's implemented?".
Now, I happen to agree with you that it's important to know how a basic linked list works. Beyond that, though, is far less important to some people than others.
* Okay, I lied. Halfway through writing this I remembered tuples (duh!), and I'm sure someone is going to point out several other things I've missed.
This function would be good if you knew that every other integer was in the array exactly once. However, that isn't stated in the question. A perfectly legal array under the restrictions we've been given is this:
[1, 3, 3, 9, 42]
All numbers in the array are integers between 1 and 1,000,000, and one integer is in the array twice.
As for the solution, allocate 1,000,000 bits and do a "bucket sort". If the next number in the array is n, check if the nth bit is 1. If it is, you've found the integer appearing twice. If the nth bit is 0, change it to 1 and continue.
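In code, that might look like the following sketch (assuming exactly one duplicate and values in 1..1,000,000; a bytearray of flags stands in for the bit array for simplicity):

    def find_duplicate(values, max_value=1_000_000):
        seen = bytearray(max_value + 1)   # one flag per possible value
        for n in values:
            if seen[n]:
                return n                  # second sighting: this is the duplicate
            seen[n] = 1
        return None

    print(find_duplicate([1, 3, 3, 9, 42]))   # 3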
Picture it like finding the area of a triangle that is half of a square (the long side runs between diagonally opposite corners).
The formula for the area of a square is n^2, so we need half of that.
What about the +n part? It's there because we are dealing in discrete chunks, not a smoothly continuous area.
---
Alternately, imagine a series of children's blocks
1
2
3
Now we have a 'triangle' whose left side is 3 blocks high and whose bottom is 3 blocks wide. If we make an identical triangle and rotate it around so that the two fit together to make a rectangle, it looks like this:
1 + 3
2 + 2
3 + 1
This makes a 3 x 4 rectangle, and we should be able to figure out that this gives us an area of 12. So the original triangle (1, 2, 3) gives us half that, and thus the sum from 1 to 3 is 6.
So the pattern for summing the numbers 1 through N is to take a square of side N, extend one of its sides by 1 so it has sides of length N and N+1, find the area (which is always N^2 + N), and then halve it.
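A two-line sanity check of that identity (my addition):

    # 1 + 2 + ... + N equals N * (N + 1) / 2, i.e. half of an N-by-(N+1) rectangle.
    for n in range(1, 100):
        assert sum(range(1, n + 1)) == n * (n + 1) // 2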
> The interviews were pretty evenly split between very large, large, and startup-sized tech companies.
Really? One person is looking for a job in a big company or a startup?? One of the most common interview questions is "What type of job are you looking for?" I suspect that "Oh, I don't know, maybe cubicle dweller in BigCo, but then again a startup might be fun" is not the best answer, and describing each company as your ideal destination is dishonest.
Then again, another possibility is that he looked for a startup job, and when he couldn't find the right spot he tried some medium sized companies, and so on. That might make sense in an interview: "I was really interested in exploring a startup, but when I actually looked into it I realized it was far more romantic in theory than it was practical. I like the idea of writing great software, but it turns out that the companies I met with aren't very committed to some of the best practices that interest me."
So I won't jump to conclusions about the OP, but certainly I am interested in hearing what chain of events led to what appears to be a wide spread of environments.
I honestly don't think that everyone cares about the size of the company they're working for. I'm currently applying for internships for this summer and I am far more interested in what kind of work I could be doing than how big the company is. I don't think that's uncommon.
I know that many people don't care about the size of the company they're working for, but turn it around: Do you think that employers care about whether an applicant cares?
To make this perfectly clear, let me propose this interview question: What is more important to you: The company and culture or the work that you'll be doing?
I think that people being people, they will prefer applicants who honestly are more interested in the company and culture than the work. This is absolutely true of BigCo, and applies to startups as well. What if the company needs to pivot and you are no longer working on whatever excited you about coming to work on day zero? Will you start looking for somewhere else?
Well, I'm a student, so it is entirely possible that I underestimate the importance of company and culture. That being said, I'm not sure why I should care more about culture than about my actual work. I obviously want a certain work climate (no annoying micro-management, smart and friendly co-workers, etc.), but that should be possible in big companies and in startups, no?
> What if the company needs to pivot and you are no longer working on whatever excited you about coming to work on day zero? Will you start looking for somewhere else?
Honestly: If it is a massive change for the worse, why wouldn't I? Example: I am interested in algorithms and low-level bit-fiddly stuff. I don't particularly care about HCI and web design. If I was to join a company that works on, say, databases and file systems, I would probably do it with the expectation to work on the kind of problems I find interesting. Would it really be unreasonable to look for something new if it turned out I could no longer do that?
It isn't for me to say what you should look for. I can tell you my experience of what employers want, but then again most employers probably want .NET experience, so what most employers want isn't the most important thing. The question is, how do you find the one employer that's right for you!
One way to find a middle ground between your interests and a company's concerns is to restate your interest in terms of the company culture. For example: I'm looking for a team that owns its stack and uses that ownership to do interesting and disruptive things with its product. For that reason, I'm less interested in the size of the company than in the way it approaches technology. If a team is working from algorithms and bit-twiddling on up, I'm sure I'll be happy.
That answer might apply to any company from Apple on down to a YCombinator startup.
It bears repeating: good interview(er)s will try to determine how you think, not your end result.
I do a decent number of interviews (maybe 2-3 a month), and by and large we are looking for people who can think through a problem, defend a solution, ask good questions, not give up, and seem cool to work with. That's it. Some of these questions are great, fundamental CS stuff that I would expect any engineer to have in their back pocket, but some are trivia. Trivia is great, but if you don't know it I wouldn't hold it against you.
I guess my point is to relax. If you thought about some of these things and answered them, great. You're better prepared for next time. Memorizing answer sheets doesn't help in the long run.
If I were given questions like these, I would politely refuse to answer them explaining they have nothing to do with the important things a typical software developer does.
Having a solid portfolio of past successful projects to demonstrate your abilities (including participation in open source development) is far safer than being able to solve artificial problems.
And we would politely end the interview and not call you back. Questions like these are not meant to decide who to hire -- they are meant to establish a base level of competence.
The ability to answer these questions does not prove anything.
However, the INABILITY to answer these questions says quite a lot.
> There is a lesser known variation of "find a cycle in a linked list"
There's also a lesser known alternative to Floyd's tortoise-and-hare algorithm called Brent's algorithm. It's a little faster in practice, but more importantly it represents an algorithm design technique that everyone should know.
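For reference, here's a sketch of Brent's algorithm applied to linked-list cycle detection (illustrative only; node objects are assumed to have a next attribute):

    def has_cycle(head):
        # Brent's trick: restart the tortoise at the hare's position whenever
        # the hare's step count hits a power of two; the hare advances one
        # node at a time, so it cannot step over the meeting point.
        if head is None:
            return False
        power = length = 1
        tortoise, hare = head, head.next
        while hare is not None and hare is not tortoise:
            if length == power:                  # start a new power-of-two window
                tortoise, power, length = hare, power * 2, 0
            hare = hare.next
            length += 1
        return hare is not None                  # met the tortoise: cycle found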
The bottom line here is that though today's interviewers' repertoire is a little different, it takes only a little practice to excel at the questions they ask. The author seems to make the same point.
The usual comments from "ninjas" in this thread are very amusing, though.
These sequence questions drive me up the wall. It's so unbelievably stupid that, out of the near-infinite number of functions that can produce a given list, you should come up with the one the interviewer thought was the "obviously correct" one. The sequence in the example: 1, 11, 21, x... with the information given, x = 31 is just as valid as anything the interviewer came up with.
All this is still useful. I have to fix code all the time where the original developer paid no attention to complexity, and an algorithm that was fast with the 10 items they tested with is dog slow once it makes it to production and there are 10,000 items.
Problem is, when would you ever do any of that in the real world?
Has anybody ever needed to reverse a string in place since the 80s? (Demo scene excluded of course, they swallowed a different colour pill from the rest of us)
I'm aware, for instance, of the XOR trick to swap two variables in place. I think I used it once in the last ~20 years, and really only then because I was pleased with my own excessive cleverness.
These days I wouldn't bother, because it would just make the code fractionally harder to read, it makes for a speed-bump for the mind, whereas spelling it out with a temporary variable lets your brain keep cruising at interstate speeds.
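For anyone who hasn't seen it, the trick in question (purely illustrative - in Python you'd just write a, b = b, a):

    # XOR swap: exchanges two integers without a temporary variable.
    a, b = 13, 7
    a ^= b
    b ^= a
    a ^= b
    assert (a, b) == (7, 13)   # swapped -- and harder to read than a temp variable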
Yep, me too. I'm pretty nervous at interviews already, so I reckon I'd melt down if I was presented with some of the questions without having vaguely prepared.
But I suppose that's why the page is there, not so we can feel stupid but so we can re-engage that "solve arbitrary problem X" mentality.