
Updated accordingly. We used floats instead of ints, which means it should have been 21, not 22.


It would be unusual to have non-integer occurrences of words in the text. But given that the question didn't specify how to round, both 21 and 22 could be valid answers even with only whole-number word counts: rounding down gives 21, while rounding half up gives 22.
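The actual arithmetic isn't shown in the thread, but as a hypothetical sketch, assuming the float computation landed on a .5 value like 21.5, the two rounding conventions diverge exactly as described:

```python
import math

x = 43 / 2  # hypothetical float result: 21.5

# Rounding down (floor) discards the fractional part.
round_down = math.floor(x)       # 21

# Rounding half up adds 0.5 before flooring.
round_half_up = math.floor(x + 0.5)  # 22

print(round_down, round_half_up)
```

Note that Python's built-in round() uses banker's rounding (ties to even), which is yet another convention the question left unspecified.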

Interesting to see how few of the proposed answers used an HTML parsing library (simplistic matching of potentially unknown document syntaxes is a notoriously brittle approach), and I'm surprised how few counted depth relative to the article tag.

Given embedly's business and the setup discussion, it seems like a valid solution should work with any arbitrary HTML page containing an article tag and paragraphs within it, while many of the gist entries either counted P depths by hand (!) or were hardcoded to that one particular document.

If the <article> tag, the <div> beside it, or the <p> tags had had so much as a space before the closing angle bracket (never mind classes or styles), most of them would have failed. For the most part, only the solutions pulling in an external parsing lib would have still worked. Python's lxml.soupparser comes to mind (or lxml.etree for this task), and I was happy to see several similar libs invoked.
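The original task and document aren't reproduced here, but to illustrate the point about real parsers tolerating attribute noise and stray whitespace, here is a minimal sketch using Python's stdlib html.parser (chosen over lxml only to keep the example dependency-free) that counts <p> tags inside an <article> element:

```python
from html.parser import HTMLParser


class ArticleParagraphCounter(HTMLParser):
    """Count <p> tags that appear inside an <article> element."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.p_count = 0

    def handle_starttag(self, tag, attrs):
        # The parser normalizes tag names, so "<p >", "<p class=x>",
        # and "<p>" all arrive here as tag == "p".
        if tag == "article":
            self.article_depth += 1
        elif tag == "p" and self.article_depth > 0:
            self.p_count += 1

    def handle_endtag(self, tag):
        if tag == "article":
            self.article_depth -= 1


# Stray whitespace and attributes that break naive "<p>" string
# matching are handled transparently by a real parser.
html = ('<article ><div class="x"><p >one</p>'
        '<p style="a">two</p></div></article><p>outside</p>')
parser = ArticleParagraphCounter()
parser.feed(html)
print(parser.p_count)  # 2 (the <p> outside the article is ignored)
```

A naive approach like `html.count("<p>")` would find zero or three paragraphs on this input depending on the exact markup, which is the brittleness the comment describes.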

Interesting that you had to replace the document with a cleaned up one to get more successful answers.

Thanks for sharing the results.



