Yeah, Show HN has a pretty interesting distribution compared to standard posts due to the long-term visibility on the Show page. The odds of a Show HN post breaking 10 points are significantly higher than for an average post, but among posts that clear 10 points, I recall the likelihood of breaking 100 points being similar to a regular post's.
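For concreteness, the comparison is between two rates, P(clearing 10) and P(clearing 100 | cleared 10); a toy sketch with made-up scores:

```python
# Toy illustration of the two rates with made-up scores, not real data.
def rates(scores):
    cleared_10 = [s for s in scores if s >= 10]
    p_10 = len(cleared_10) / len(scores)                     # P(score >= 10)
    p_100_given_10 = sum(s >= 100 for s in cleared_10) / len(cleared_10)
    return p_10, p_100_given_10

show_hn = [3, 12, 150, 8, 45, 22, 9, 110, 14, 6]  # hypothetical Show HN scores
regular = [1, 2, 250, 4, 3, 75, 2, 1, 16, 5]      # hypothetical regular scores
print(rates(show_hn), rates(regular))  # higher P(>=10), similar P(>=100 | >=10)
```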
As a sidenote: that clock is so cool; I was just mesmerized for multiple minutes!
The code provided is for reproducing the analytical results from the annotated data; my impression is that you're more interested in the details of the annotation process than that you've run into an issue with that code?
My company's core technology extends topic models to enable arbitrary hierarchical graphs, with additional branches beyond the topic and word branches. We expose those annotations in a SQL interface. It's an alternative/complementary approach to embeddings/LLMs for working with text data. In this case, the hierarchy broke submissions down into paragraphs, added a layer to pool them back into submissions, and added one more layer to pool those by year (on the topic branch).
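If it helps, here's a rough sketch of the pooling idea on the topic branch (simplified Python with made-up names, not our actual implementation):

```python
import numpy as np

# Simplified sketch: paragraph-level topic mixtures are pooled into
# submission-level mixtures, which are pooled into year-level mixtures.
rng = np.random.default_rng(0)
n_topics = 8

# paragraphs[submission_id] -> (n_paragraphs, n_topics) topic mixtures
paragraphs = {
    "post_a": rng.dirichlet(np.ones(n_topics), size=3),
    "post_b": rng.dirichlet(np.ones(n_topics), size=5),
}
year_of = {"post_a": 2024, "post_b": 2025}

# Layer 1: pool paragraphs into submission-level mixtures.
submissions = {sid: m.mean(axis=0) for sid, m in paragraphs.items()}

# Layer 2: pool submissions into year-level mixtures.
years = {}
for sid, mix in submissions.items():
    years.setdefault(year_of[sid], []).append(mix)
years = {y: np.mean(m, axis=0) for y, m in years.items()}
```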
Our word branch is a bit more complicated, but we have some extended documentation on our website if you are interested in digging a bit deeper. Always happy to chat more about the technical details of our topic models if you have any questions!
I totally agree that the metric is imperfect for a long-term analysis. I was initially leaning toward a quantile-based approach to really focus in on topic trends over time, but while exploring the data, the relative difficulty of getting a Show HN to become popular in 2025 compared to previous years caught my curiosity, and for this decade I felt a static cutoff provided a simple, easy-to-understand threshold.
I do think that as a metric for total reach, a static cutoff actually works reasonably well. Some form of square-root normalization by total users is probably the best balance.
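Roughly what I have in mind (made-up user counts, just to show the shape of the metric):

```python
import math

# Sketch of the normalization idea: divide a post's points by the
# square root of that year's active-user count (numbers are made up).
active_users = {2015: 100_000, 2020: 250_000, 2025: 600_000}

def normalized_score(points, year):
    return points / math.sqrt(active_users[year])

# 100 points in 2015 "reaches" more of the community than 100 in 2025.
print(normalized_score(100, 2015), normalized_score(100, 2025))
```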
Thank you! I don’t currently have much insight into this trend. At the time of this analysis I hadn’t even heard of Clawd, but that would definitely be worth revisiting.
I was planning on doing this yearly but the Clawd excitement is definitely worth diving into.
> Code gets simpler because it has to, and architecture becomes explicit.
> The real goal isn’t to write C once for a one-off project. It’s to write it for decades. To build up a personal ecosystem of practices, libraries, conventions, and tooling that compound over time. Each project gets easier not because I've memorized more tricks, but because I've invested in myself and my tools.
I deeply appreciate this in the C code bases I work in (scientific computing, small team).
I generally try to use C++ as a "better C" until the design complexity forces me to model higher-level abstractions "the C++ way". All abstractions have a cognitive cost, and C keeps that cost small and explicit.
Personally, I tried that, but it breaks down for me as soon as I try to separate allocation from initialization, so I end up back in C really quickly. And then I want to take the address of temporaries or define types inside function declarations, and C++ simply declares that not allowed.
I actually conducted a similar analysis back in December. I was more focused on discovering the topics that most resonated with the community, but I ended up digging into this phenomenon as well (specifically, the probability of getting over 100 upvotes).
The really interesting thing is that the number of posts was growing exponentially year over year, but it was only in 2025 that the probability of landing on the front page dropped meaningfully. I attributed this to the macroeconomic climate, and found some (shaky) evidence of voting rings based on the topics that had an unusually high likelihood of gaining 10 points and an unusually low likelihood of reaching 100 points given that they reached 10.
I did not conduct a deep dive into the specific examples: this was my takeaway from a slope plot comparing which topics clear a 10-point threshold (e.g. escape the New page) vs. which clear a 100-point threshold.
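Concretely, the slope plot's inputs are just two rates per topic; a quick sketch with hypothetical column names and toy data:

```python
import pandas as pd

# Sketch of the slope-plot inputs (hypothetical columns: topic, points).
posts = pd.DataFrame({
    "topic":  ["ai", "ai", "ai", "hardware", "hardware", "hardware"],
    "points": [12,   15,   8,    4,          30,         120],
})

by_topic = posts.groupby("topic")["points"]
p10 = by_topic.apply(lambda s: (s >= 10).mean())  # rate of escaping the New page
p100_given_10 = by_topic.apply(
    lambda s: (s[s >= 10] >= 100).mean()          # front-page rate, given 10 points
)
slope = pd.DataFrame({"p10": p10, "p100_given_10": p100_given_10})
print(slope)  # suspicious bucket: high p10 paired with low p100_given_10
```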
> Nearly every AI-related topic does worse after clearing the 10-point threshold than any other category. This means that either the people looking through the New and Show sections are disproportionately interested in AI, or those early points aren't organic. The former is very possible, but from my interactions with this crowd on my own posts, these users tend to be more technically minded (think DIY hardware, rather than landing-page builders).
It's good to know that this would be helpful. My tendency would be to dig a bit more into the individual examples that fall into this more suspicious bucket before presenting the evidence formally, but I'm curious whether you think these high-level results are useful on their own?
No, seriously, you should find out by emailing the mods. Footer contact link. They’re not going to be upset at you bringing tentative concerns about voting rings with shaky evidence, so long as you aren’t knocking down the door with overconfidence and denying how shaky the evidence is — which you clearly aren’t.
I’m not even remotely equipped to judge the veracity of your work, but they are, and the share of users who care enough to check is, like, 0.000001%. Take the plunge and write them a note (or simply link them your comment thread here with a one-sentence FYI email!). It’ll be fine :)
When I was in high school, I got hit head on by a car while walking. It wasn’t going fast but I got thrown 1-2 feet in the air and landed hard on my backpack.
Both my Thinkpad and I (thanks to my Thinkpad) were totally fine, and I continued to use it for 4 more years.
That might be true for today’s laptops, but back then laptops had a lot more empty space to compress. Combined with a tough but flexible shell, the Thinkpad might indeed have saved him!
I don’t love these “X is Bayesian” analogies because they tend to ignore the most critical part of Bayesian modeling: sampling with detailed balance.
This article goes into the implicit prior/posterior updating during LLM inference; you can even go a step further and directly implement hierarchical relationships between layers with H-Nets. However, even under an explicit Bayesian framework, there’s a stark difference in robustness between these H-Nets and the equivalent Bayesian model, with the only variable being the parameter estimation process. [1]
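To make the detailed-balance point concrete: in a Metropolis sampler it's the accept/reject step that enforces pi(x) * P(x -> x') = pi(x') * P(x' -> x) for a symmetric proposal. A minimal sketch for a 1-D target:

```python
import math
import random

# Minimal Metropolis sampler for a 1-D standard normal target.
# The accept/reject step is what enforces detailed balance.
def log_target(x):
    return -0.5 * x * x  # log of an unnormalized N(0, 1) density

def metropolis(n_steps, step=1.0, x=0.0):
    samples = []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)  # symmetric random-walk proposal
        accept_logprob = min(0.0, log_target(proposal) - log_target(x))
        if random.random() < math.exp(accept_logprob):
            x = proposal
        samples.append(x)
    return samples

draws = metropolis(10_000)
```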