Is there a list anywhere of which Computer Science (and related) textbooks are worth reading? (Perhaps based on which universities are actually using them?)
Personally, I find that the research literature is usually (intentionally or unintentionally) opaque and that most ideas worth learning are at least 5 years old, so textbooks are the best place to actually learn things from a source intended to teach you the material.
The list I could actually vouch for would be:
CLRS
Russell and Norvig
Introduction to IR (Manning et al)
Database Systems: The Complete Book
Others that I've heard good things about are:
Dragon Book
Convex Optimization (Boyd)
Probabilistic Robotics
Foundations of Statistical NLP
I'd really like a good distributed systems or distributed databases book (rather than reading old Lamport papers or something), as well as a good statistics/probability book, and I'm not particularly interested in books about coding/engineering style any more (e.g., Code Complete, Pragmatic Programmer, Matz's Ruby book, K&R).
For distributed systems I'd read Lynch's book on Distributed Algorithms.
For stats, David Freedman's book, Statistics, is a good simple introduction. Someone mentioned Calculus and Statistics (http://www.amazon.com/Calculus-Statistics-Dover-Books-Mathem...) here on HN a couple of weeks ago. I had looked at it when it came out and re-reviewed it with that thread -- I like it a fair bit more than I remembered, and it's a better text for those looking for a more rigorous treatment than Freedman -- although probably still too simple if you're reading Foundations of Statistical NLP.
There's also Scott's book on Programming Languages, which is worth reading.
I've read quite a few comments on HN recommending Appel's Modern Compiler Implementation (ML, Java, and C versions exist) over the Dragon Book although I haven't read either so I can't vouch for it.
Queinnec's Lisp in Small Pieces is a superb book; if you liked that, you might also like Henderson's book on the implementation of functional PLs (particularly SECD machines).
Regarding the list: I think it's a nice effort, but lacks several excellent and remarkable books. Just quickly browsing, I found the following essential books missing:
* In algorithms:
- Aho, Hopcroft, Ullman: The Design and Analysis of Computer Algorithms.
- Wirth: Algorithms and Data Structures.
* In Compilers:
- Wirth: Compiler Construction.
- Grune, Bal, Jacobs, Langendoen: Modern Compiler Design.
- Muchnick: Advanced Compiler Design and Implementation.
* In Lambda calculus:
- Barendregt: Introduction to Lambda Calculus.
* In theoretical computer science:
- Kozen: Automata and Computability.
- Davis: Computability and Unsolvability.
* In concepts of PLs:
- Turbak, Gifford, Sheldon: Design Concepts in Programming Languages.
(In general, the list does not mention any of Wirth's books, which is a shame. The Project Oberon book should also be mentioned in OS stuff, I guess...)
Regarding compiler construction: As has been previously mentioned several times on HN, Cooper's and Torczon's "Engineering a Compiler" is a more recent and (IMHO) much more readable and accessible choice.
Lamport's papers are exceptionally readable, I should note. His discussion of e.g. Paxos is still current and interesting and still has implications for distributed databases, even though that algorithm is too expensive for frequently changing data (on the order of hundreds or thousands of changes a second).
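To make the cost concrete: classic single-decree Paxos needs two majority round trips (prepare/promise, then accept/accepted) per decision, which is where the expense for rapidly changing data comes from. A toy in-memory sketch, just to count the messages -- this is an illustration with made-up class and function names, not Lamport's pseudocode:

```python
# Toy single-decree Paxos over in-memory acceptors, counting messages
# to show why one decision costs two majority round trips.

class Acceptor:
    def __init__(self):
        self.promised = -1      # highest proposal number promised
        self.accepted = None    # (number, value) last accepted, if any

    def prepare(self, n):                       # phase 1
        if n > self.promised:
            self.promised = n
            return ('promise', self.accepted)
        return ('nack', None)

    def accept(self, n, value):                 # phase 2
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return 'accepted'
        return 'nack'

def propose(acceptors, n, value):
    messages = 0
    # Phase 1: send prepare to everyone; need a majority of promises.
    promises = []
    for a in acceptors:
        messages += 2                           # request + reply
        kind, prior = a.prepare(n)
        if kind == 'promise':
            promises.append(prior)
    if len(promises) <= len(acceptors) // 2:
        return None, messages
    # If some acceptor already accepted a value, we must re-propose it.
    prior = max((p for p in promises if p), default=None)
    chosen = prior[1] if prior else value
    # Phase 2: ask a majority to accept the chosen value.
    acks = 0
    for a in acceptors:
        messages += 2
        if a.accept(n, chosen) == 'accepted':
            acks += 1
    return (chosen if acks > len(acceptors) // 2 else None), messages

acceptors = [Acceptor() for _ in range(3)]
value, messages = propose(acceptors, n=1, value='x')
print(value, messages)   # 'x' decided after 12 messages, for one decision
```

Twelve messages to agree on a single value (with only three acceptors and no failures) is why you don't run vanilla Paxos per-write on hot data.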
The Dragon Book is mostly old and hoary by today's standards; we frequently use GLR instead of LALR(1), it has no discussion of interesting modern developments like packrat parsers, and there is next to no discussion of any part of the compiler that isn't parsing. A product of its time, but not an edifice for the ages in the way that SICP is.
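For anyone who hasn't run into the term: a packrat parser is just recursive descent over a PEG with memoization keyed on (rule, position), which makes it linear-time even with backtracking. A minimal sketch in Python over a made-up toy arithmetic grammar (not an example from any of the books discussed):

```python
from functools import lru_cache

# Toy PEG: Expr <- Term ('+' Term)* ; Term <- Factor ('*' Factor)* ; Factor <- digit
# "Packrat" = recursive descent + memoizing each (rule, position) result.

def parse(text):
    @lru_cache(maxsize=None)          # the packrat memo table for this rule
    def factor(pos):
        if pos < len(text) and text[pos].isdigit():
            return int(text[pos]), pos + 1
        return None                   # parse failure at this position

    @lru_cache(maxsize=None)
    def term(pos):
        r = factor(pos)
        if r is None:
            return None
        val, pos = r
        while pos < len(text) and text[pos] == '*':
            nxt = factor(pos + 1)
            if nxt is None:
                return None
            rhs, pos = nxt
            val *= rhs
        return val, pos

    @lru_cache(maxsize=None)
    def expr(pos):
        r = term(pos)
        if r is None:
            return None
        val, pos = r
        while pos < len(text) and text[pos] == '+':
            nxt = term(pos + 1)
            if nxt is None:
                return None
            rhs, pos = nxt
            val += rhs
        return val, pos

    result = expr(0)
    if result is None or result[1] != len(text):
        raise SyntaxError("bad input")
    return result[0]

print(parse("1+2*3"))   # 7
```

The memoization is invisible for a grammar this small, but it's what keeps exponential backtracking from happening once the PEG has real ordered-choice alternatives.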
If you find a good stats/probability book, let me know; I've been looking for one for years and come up short. Russell and Norvig is ironically superior to most texts in the field.
You're wrong about _any_ edition of the Dragon.
The 0th edition (green dragon, Aho&Ullman) was mostly about parsing, but talked about everything.
The 1st edition (red dragon, Aho, Sethi & Ullman) covered everything except JITs, vectorization, GC, and parallelization.
The 2nd edition (purple dragon, Aho,Lam,Sethi&Ullman) is about everything, and they even dropped operator precedence (Pratt) parsing to make room for more non-parsing stuff.
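For anyone who hasn't seen the technique that got dropped: Pratt (top-down operator precedence) parsing assigns each operator a binding power, and a sub-expression keeps consuming operators only while their power exceeds the current minimum. A minimal hypothetical version in Python, handling just binary + and * with correct precedence:

```python
import re

# Minimal Pratt (top-down operator precedence) parser for + and *.
# Higher binding power = binds tighter, so * groups before +.
BINDING_POWER = {'+': 10, '*': 20}

def tokenize(src):
    return re.findall(r'\d+|[+*]', src)

def parse_expr(tokens, min_bp=0):
    left = int(tokens.pop(0))             # "null denotation": a number
    while tokens and BINDING_POWER[tokens[0]] > min_bp:
        op = tokens.pop(0)                # "left denotation": binary operator
        # The recursive call only consumes operators that bind
        # tighter than op, which is what produces the precedence.
        right = parse_expr(tokens, BINDING_POWER[op])
        left = (op, left, right)
    return left

print(parse_expr(tokenize("1+2*3")))   # ('+', 1, ('*', 2, 3))
```

The whole algorithm is that one while-loop condition, which is presumably why it felt expendable next to a chapter's worth of optimization material.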
there is next to no discussion of any part of the compiler that isn't parsing
Are you sure? There are 200 pages that cover lexing and parsing -- of a 950 page book.
If anything, I'd argue they've largely punted on parsing, given that it's a lot less important than it used to be (which may explain why they didn't add GLR or packrat parsing), and focused on what were considered more interesting issues in compilers.
I agree. I hate the Dragon Book, and I particularly hate that it's the go-to book for learning compilers. It should not be. I recommend "Engineering a Compiler" by Cooper/Torczon, or "Modern Compiler Implementation in X" by Appel.
Having some background in statistics, but none in either linguistics or NLP, I found that book a revelation. If you read it and implemented all the exercises, you'd find a way to make millions, as NLP is a big deal right now. I did find a little too much concentration on the low-level stuff (character parsing, bag of words, etc.), but in conjunction with Elements of Statistical Learning it's wonderful.
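For readers who haven't met the term: a bag-of-words model just counts token occurrences per document and throws word order away. A two-function sketch in Python over a made-up toy corpus:

```python
from collections import Counter

# Bag of words: drop word order, keep per-document term counts.
def bag_of_words(text):
    return Counter(text.lower().split())

def doc_term_matrix(corpus, vocab):
    # One count row per document, columns in a fixed vocabulary order.
    return [[bag_of_words(doc)[w] for w in vocab] for doc in corpus]

corpus = ["the cat sat on the mat", "the dog sat"]
vocab = ["the", "cat", "dog", "sat"]
print(doc_term_matrix(corpus, vocab))   # [[2, 1, 0, 1], [1, 0, 1, 1]]
```

It's the "low-level" end of NLP precisely because everything linguistic (order, syntax, meaning) is discarded; the surprise is how far plain counts get you.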
Jurafsky and Martin's Speech and Language Processing complements ESL and Introduction to IR well by focusing on 'high-level' NLP, but its downside is that it focuses perhaps too much on linguistic terminology & theories instead of algorithms & implementations.
I second CLRS - I just finished an algorithms course taught from that book, and I can't imagine a single textbook being more comprehensive.
I'm surprised nobody's mentioned Knuth, though maybe that goes without saying?
As for probability and statistics, I haven't really found anything (at an advanced level) that I've been happy with. Maybe it's because my background is in statistics, so it's a perception bias (I see the flaws more easily than in other subjects), but I think most statistics textbooks are pretty rotten.
There are really only two that I'd recommend, and only one at a high level. Gelman & Hill is a great introduction to computational statistics at a high level, while still very readable (and enjoyable!)
Other than that, the only truly stellar statistics textbook I've ever seen was the one I used in my intro class in high school. That's a sad but true comment on the current state of most statistics textbooks (that I can find).
> When a free register is wanted, and there is none left on the free-storage list, a reclamation (7) cycle starts.
> (...)
> (7) We already called this process "garbage collection", but I guess I chickened out of using it in the paper -- or else the Research Laboratory of Electronics grammar ladies wouldn't let me.
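The reclamation cycle McCarthy is describing is what we now call mark-and-sweep: trace everything reachable from the roots, then return the rest to the free-storage list. A toy sketch in Python in that spirit (Cell and the heap layout here are made up for illustration):

```python
# Toy mark-and-sweep over a heap of cons cells: when the free list is
# empty, mark everything reachable from the roots, then sweep the
# unmarked cells back onto the free list.

class Cell:
    def __init__(self, car=None, cdr=None):
        self.car, self.cdr = car, cdr
        self.marked = False

def mark(cell):
    while isinstance(cell, Cell) and not cell.marked:
        cell.marked = True
        mark(cell.car)          # recurse down the car...
        cell = cell.cdr         # ...iterate down the cdr

def collect(heap, roots):
    for r in roots:
        mark(r)                 # mark phase
    free_list = [c for c in heap if not c.marked]   # sweep phase
    for c in heap:
        c.marked = False        # reset marks for the next cycle
    return free_list

heap = [Cell() for _ in range(5)]
heap[0].cdr = heap[1]           # root keeps heap[1] alive; heap[2..4] are garbage
free = collect(heap, roots=[heap[0]])
print(len(free))                # 3 cells reclaimed
```

The "free register" vocabulary in the quote is exactly this free list; the footnoted reclamation cycle is the mark-plus-sweep pair.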
I was going to just suggest adding "End to End Arguments in System Design" (http://mit.edu/6.033/www/papers/endtoend.pdf) but then realized that the entire MIT 6.033 reading list is exceptionally worthwhile.
Am I the only one who thinks it's really weird that we throw around these lists of "things programmers must read," but never lists of "things programmers must program at least X times"?
I've written essentially the same application several times in different languages, for different frameworks, or for substantially different platforms.
I'd also count a serious refactoring or structural improvement as having programmed something again.
I think this could be a really interesting discussion. My first reaction to georgieporgie's comment was the same as omaranto's. But in practice, like you say, we do end up doing a lot of code re-writing.
Suppose we are to explain omaranto's statement to someone who is not a programmer. My try is: given that (i) programs, whether in source or binary format, can be indefinitely replicated with virtually zero cost and (ii) we know how to write modular programs and (iii) there are plenty of useful modules available for re-use, we never have to substantially re-write anything from scratch.
But (iii) is arguably false due to a variety of reasons: intellectual property issues, incompatible platforms, "incompatible" programming languages, etc.