Yeah, Show HN has a pretty interesting distribution compared to standard posts due to the long-term visibility on the Show page. The odds of a Show HN post breaking 10 points are significantly higher than for an average post, but among posts that clear 10 points, I recall the likelihood of breaking 100 points being similar to a regular post's.
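For concreteness, the comparison is between two rates, P(clearing 10) and P(clearing 100 | cleared 10); a toy sketch with made-up scores:

```python
# Toy illustration of the two rates with made-up scores, not real data.
def rates(scores):
    cleared_10 = [s for s in scores if s >= 10]
    p_10 = len(cleared_10) / len(scores)                     # P(score >= 10)
    p_100_given_10 = sum(s >= 100 for s in cleared_10) / len(cleared_10)
    return p_10, p_100_given_10

show_hn = [3, 12, 150, 8, 45, 22, 9, 110, 14, 6]  # hypothetical Show HN scores
regular = [1, 2, 250, 4, 3, 75, 2, 1, 16, 5]      # hypothetical regular scores
print(rates(show_hn), rates(regular))  # higher P(>=10), similar P(>=100 | >=10)
```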
As a sidenote: that clock is so cool; I was just mesmerized for multiple minutes!
The code provided is for reproducing the analytical results from the annotated data; my impression is that you're more interested in the details of the annotation process than that you've run into an issue with that code?
My company's core technology extends topic models to enable arbitrary hierarchical graphs, with additional branches beyond the topic and word branches. We expose those annotations in a SQL interface. It's an alternative/complementary approach to embeddings/LLMs for working with text data. In this case, the hierarchy broke submissions down into paragraphs, added a layer to pool them back into submissions, and added one more layer to pool those by year (on the topic branch).
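If it helps, here's a rough sketch of the pooling idea on the topic branch (simplified Python with made-up names, not our actual implementation):

```python
import numpy as np

# Simplified sketch: paragraph-level topic mixtures are pooled into
# submission-level mixtures, which are pooled into year-level mixtures.
rng = np.random.default_rng(0)
n_topics = 8

# paragraphs[submission_id] -> (n_paragraphs, n_topics) topic mixtures
paragraphs = {
    "post_a": rng.dirichlet(np.ones(n_topics), size=3),
    "post_b": rng.dirichlet(np.ones(n_topics), size=5),
}
year_of = {"post_a": 2024, "post_b": 2025}

# Layer 1: pool paragraphs into submission-level mixtures.
submissions = {sid: m.mean(axis=0) for sid, m in paragraphs.items()}

# Layer 2: pool submissions into year-level mixtures.
years = {}
for sid, mix in submissions.items():
    years.setdefault(year_of[sid], []).append(mix)
years = {y: np.mean(m, axis=0) for y, m in years.items()}
```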
Our word branch is a bit more complicated, but we have some extended documentation on our website if you are interested in digging a bit deeper. Always happy to chat more about the technical details of our topic models if you have any questions!
I totally agree that the metric is imperfect for a long-term analysis. I was initially leaning toward a quantile-based approach to really focus in on topic trends over time, but while exploring the data, the relative difficulty of getting a Show HN to become popular in 2025 compared to previous years caught my curiosity, and for this decade I felt a static cutoff provided a simple, easy-to-understand threshold.
I do think that as a metric for total reach, a static cutoff actually works reasonably well. Some form of square-root normalization by total users is probably the best balance.
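Roughly what I have in mind (made-up user counts, just to show the shape of the metric):

```python
import math

# Sketch of the normalization idea: divide a post's points by the
# square root of that year's active-user count (numbers are made up).
active_users = {2015: 100_000, 2020: 250_000, 2025: 600_000}

def normalized_score(points, year):
    return points / math.sqrt(active_users[year])

# 100 points in 2015 "reaches" more of the community than 100 in 2025.
print(normalized_score(100, 2015), normalized_score(100, 2025))
```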
Thank you! I don’t currently have much insight into this trend. At the time of this analysis I hadn’t even heard of Clawd, but that would definitely be worth revisiting.
I was planning on doing this yearly but the Clawd excitement is definitely worth diving into.
> Code gets simpler because it has to, and architecture becomes explicit.
> The real goal isn’t to write C once for a one-off project. It’s to write it for decades. To build up a personal ecosystem of practices, libraries, conventions, and tooling that compound over time. Each project gets easier not because I've memorized more tricks, but because I've invested in myself and my tools.
I deeply appreciate this in the C code bases I work in (scientific computing, small team).
I generally try to use C++ as a "better C" until the design complexity forces me to model higher-level abstractions "the C++ way". All abstractions have a cognitive cost, and C keeps that cost small and explicit.
Personally, I tried that, but it breaks down for me as soon as I try to separate allocation from initialization, so I end up back in C really quickly. And then I want to take the address of temporaries or define types inside function declarations, and C++ simply declares that not allowed.
I actually conducted a similar analysis back in December. I was more focused on discovering the topics that most resonated with the community, but I ended up digging into this phenomenon as well (specifically, the probability of getting over 100 upvotes).
The really interesting thing is that the number of posts was growing exponentially year over year, but it was only in 2025 that the probability of landing on the front page dropped meaningfully. I attributed this to the macroeconomic climate, and found some (shaky) evidence of voting rings based on the topics that had an unusually high likelihood of gaining 10 points and an unusually low likelihood of reaching 100 points given that they reached 10.
I did not conduct a deep dive into the specific examples: this was my takeaway from a slope plot comparing which topics clear a 10-point threshold (e.g. escape the New page) vs. which clear a 100-point threshold.
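Concretely, the slope plot's inputs are just two rates per topic; a quick sketch with hypothetical column names and toy data:

```python
import pandas as pd

# Sketch of the slope-plot inputs (hypothetical columns: topic, points).
posts = pd.DataFrame({
    "topic":  ["ai", "ai", "ai", "hardware", "hardware", "hardware"],
    "points": [12,   15,   8,    4,          30,         120],
})

by_topic = posts.groupby("topic")["points"]
p10 = by_topic.apply(lambda s: (s >= 10).mean())  # rate of escaping the New page
p100_given_10 = by_topic.apply(
    lambda s: (s[s >= 10] >= 100).mean()          # front-page rate, given 10 points
)
slope = pd.DataFrame({"p10": p10, "p100_given_10": p100_given_10})
print(slope)  # suspicious bucket: high p10 paired with low p100_given_10
```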
> Nearly every AI-related topic does worse after clearing the 10-point threshold than any other category. This means that either the people looking through the New and Show sections are disproportionately interested in AI, or those early points aren't organic. The former is very possible, but from my interactions with this crowd on my own posts, these users tend to be more technically minded (think DIY hardware, rather than landing-page builders).
It's good to know that this would be helpful. My tendency would be to dig a bit more into the individual examples that fall into this more suspicious bucket before presenting the evidence formally, but I'm curious whether you think these high-level results are useful on their own?
No, seriously, you should find out by emailing the mods. Footer contact link. They’re not going to be upset at you bringing tentative concerns about voting rings with shaky evidence, so long as you aren’t knocking down the door with overconfidence and denying how shaky the evidence is — which you clearly aren’t.
I’m not even remotely equipped to judge the veracity of your work, but they are, and the share of users who care enough to check is, like, 0.000001%. Take the plunge and write them a note (or simply link them your comment thread here with a one-sentence FYI email!). It’ll be fine :)
When I was in high school, I got hit head on by a car while walking. It wasn’t going fast but I got thrown 1-2 feet in the air and landed hard on my backpack.
Both my Thinkpad and I (thanks to my Thinkpad) were totally fine, and I continued to use it for 4 more years.
That might be true for today’s laptops, but back then laptops had a lot more empty space to compress. Combined with a tough but flexible shell, the Thinkpad might indeed have saved him!
I don’t love these “X is Bayesian” analogies because they tend to ignore the most critical part of Bayesian modeling: sampling with detailed balance.
This article goes into the implicit prior/posterior updating during LLM inference; you can even go a step further and directly implement hierarchical relationships between layers with H-Nets. However, even under an explicit Bayesian framework, there’s a stark difference in robustness between these H-Nets and the equivalent Bayesian model, with the only variable being the parameter estimation process. [1]
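To make the detailed-balance point concrete: in a Metropolis sampler it's the accept/reject step that enforces pi(x) * P(x -> x') = pi(x') * P(x' -> x) for a symmetric proposal. A minimal sketch for a 1-D target:

```python
import math
import random

# Minimal Metropolis sampler for a 1-D standard normal target.
# The accept/reject step is what enforces detailed balance.
def log_target(x):
    return -0.5 * x * x  # log of an unnormalized N(0, 1) density

def metropolis(n_steps, step=1.0, x=0.0):
    samples = []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)  # symmetric random-walk proposal
        accept_logprob = min(0.0, log_target(proposal) - log_target(x))
        if random.random() < math.exp(accept_logprob):
            x = proposal
        samples.append(x)
    return samples

draws = metropolis(10_000)
```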