
With a parser that implements a grammar, you can prove that (a) it accepts every string the grammar defines as valid and (b) it rejects every string that is invalid. Specifying a grammar is relatively straightforward (hopefully). Proving that an ad-hoc parser does (a) and (b) is nearly impossible.

Ad-hoc parsers can be shown to accept all "OK" strings that somebody used to test the parser and to reject all "not OK" strings that somebody used to test the parser.[1] "The problem with idiots (and black-hats) is that they are so ingenious." The only way to prove that an ad-hoc parser is truly correct is to run every possible string through it, complete with a priori knowledge of which strings are OK and which are to be rejected. That is an infinite amount of work (compare the halting problem: http://en.wikipedia.org/wiki/Halting_problem).
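To make the contrast concrete, here's a minimal sketch in Python (the function names are mine, not from any particular parser): one check derived directly from the grammar for an HTTP header field name (the `token` rule in RFC 7230), next to a plausible-looking ad-hoc check that accepts strings the grammar forbids.

```python
import re

# Grammar-derived check: RFC 7230 defines a header field name as a "token":
#   token = 1*tchar
#   tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "."
#         / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TOKEN_RE = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")

def is_valid_field_name(s: str) -> bool:
    """Accepts exactly the strings the token grammar accepts, nothing else."""
    return TOKEN_RE.fullmatch(s) is not None

# Ad-hoc check: looks plausible, but was never derived from the grammar.
def adhoc_field_name(s: str) -> bool:
    return len(s) > 0 and " " not in s and ":" not in s

# The ad-hoc version wrongly accepts control characters and non-ASCII,
# exactly the kind of "not OK" string nobody thought to put in the tests.
assert is_valid_field_name("Content-Length")
assert not is_valid_field_name("Bad\x00Name")
assert adhoc_field_name("Bad\x00Name")   # the ad-hoc check lets it through
```

The point isn't that regexes are magic; it's that the first check can be audited line-by-line against the published grammar, while the second can only be tested against whatever strings somebody happened to think of.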

Guessing intent is a wormhole: how close does the request need to be? What if you guess wrong?

The combination of ad-hoc parsers with guessing intent is a potent way to introduce security flaws in your program. In the case of a web server, the "attack surface" is the whole internet, i.e. there is a huge number of idiots and black-hats that could potentially attack your program.

[1] War story: in a previous life, the company decided it needed a custom code-standards checker program (the result of a chain of four or five decisions, all of them really stupid, but that is a different war story). They contracted out the creation of the program, complete with a requirement that the contractor write the test cases (fox guarding the hen house). The program was a POS (how did you know that was coming???).

When I looked at the test cases, they had one "positive" test case (i.e. it catches a "bad" construct) and NO "negative" test cases (i.e. checks that it produces no false positives). As a result, when run on real code, the "standards checking" program was actively sabotaging good code!



Here's why I dislike it. You shouldn't rely on the HTTP request parser to save you if you have security issues elsewhere.

The HTTP parser is simple enough that, written properly, it isn't a concern in itself.

You should fix the security issues.


> You shouldn't rely on the HTTP request parser to save you if you have security issues elsewhere

This doesn't make sense. Why should a particular piece of the application not be coded with security in mind?

> You should fix the security issues.

One part of this is sanitizing user input. Why would you not do this as early as possible?
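As a hedged sketch of that principle (the function name and shape are hypothetical, not any real framework's API): values that arrived from user input are checked at the trust boundary, so a header-injection payload never reaches the rest of the application.

```python
# Hypothetical sketch: reject tainted values at the trust boundary.
# A CR or LF smuggled into a header value is the classic header-injection
# (response-splitting) vector, so it is refused before any other code runs.

def set_header(headers: dict, name: str, value: str) -> None:
    """Store a header value only if it is free of control characters."""
    if any(c in value for c in "\r\n\x00"):
        raise ValueError("control characters in header value")
    headers[name] = value

headers = {}
set_header(headers, "X-Requested-Page", "/index.html")   # fine
try:
    set_header(headers, "Location", "/\r\nSet-Cookie: pwned=1")
except ValueError:
    pass   # rejected at the door, as intended
```

Checking this early means no later layer has to remember to do it.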


Because it's inefficient, unnecessary overhead for static requests.

The place to block application specific hacky looking requests isn't in the general HTTP request parser. It's in the 'application specific' stuff.


The headers on even the most static requests still get used all over: dispatch, caching, logging, etc. The overhead is minuscule, especially compared to the cost of keeping a hand-rolled parser legible enough to be maintainable.

And the purpose isn't to "block application-specific hacky-looking requests"; it only does that as a side effect. This isn't some inane IDS bullshit sold to PHBs: it's not looking for exploit signatures, it just sanitizes all input as a consequence of correctness.


It's quite simple, really. Do you like your compiler? Or would you rather write code and hope for the best? Compilers work because they have a formal grammar of what is and is not acceptable in the programming language. The same principle is being applied to handling web requests: we have a standard, HTTP, and any requests that don't conform to the protocol are immediately rejected by Mongrel2. Since many attacks against web servers involve sending malformed requests, this approach simply rejects those requests without even beginning to process them. That certainly doesn't prevent Mongrel2 from implementing proper security at other appropriate places in the code; it simply stops a whole lot of potential exploits before they start.
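A rough sketch of that approach (Mongrel2's actual parser is machine-generated from a formal grammar; this Python regex is only an illustration of the principle): the request line either matches the grammar exactly, or the request is rejected before any further processing happens.

```python
import re

# Request line per the HTTP/1.1 grammar (simplified; the real rules are
# in RFC 7230):  request-line = method SP request-target SP HTTP-version CRLF
REQUEST_LINE = re.compile(
    rb"(?P<method>[!#$%&'*+.^_`|~0-9A-Za-z-]+) "   # method = token
    rb"(?P<target>[^ \x00-\x1f]+) "                # no spaces or controls
    rb"HTTP/(?P<version>\d\.\d)\r\n"
)

def accept_or_reject(line: bytes):
    """Return the parsed pieces, or None for any non-conforming line."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        return None   # reject: no handler code ever sees this request
    return m.group("method"), m.group("target"), m.group("version")

assert accept_or_reject(b"GET /index.html HTTP/1.1\r\n") is not None
# A smuggling-style line with a tab separator is rejected outright:
assert accept_or_reject(b"GET\t/index.html HTTP/1.1\r\n") is None
```

Like a compiler's syntax check, the decision is accept/reject against the grammar, not a guess about what the sender probably meant.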


Yeah, that's not really comparable.

HTTP requests come from millions of different browsers. Some with bugs, some with idiot creators, etc etc.

My point was that an HTTP request parser is trivial to write correctly. What you do with the headers and request later on is where you sometimes need to be careful.

TBH, though, I think I'm just in a different world from all of this Mongrel stuff.



