BinRoo's comments

Beautiful write-up, thank you.

One aspect that still bothers me is that you claim the just-say-no-engineer "was a critical role during ZIRP." I might be in the minority here, but I don't hold that same stance. I wonder if I am alone in that?


I think the "just say no" engineers were massively sidelined during ZIRP

Actually I suspect they are just massively sidelined in software in general because of how few regulations exist

The "just say no" engineer is someone who thrives in regulated environments, because (enforced) regulations are the only things that actually slow down the growth-at-all-costs-minded PMs of the world. And even then only sometimes


I think that's it. LLMs infected companies with the Rage Virus and now they're running mindlessly at anything that moves. A "just say no" engineer has no air in such an environment. It is FOMO of the most diffuse kind, with absolutely nobody knowing what the what is we're missing, but everybody (and that includes myself) knows something is going to change.

We like to dress up in suits, but we're all apes afraid of shapes in the clouds.


Personally I'm an ape afraid of being left to starve now that they have invented a mecha ape that can hunt more food faster than I can, and won't leave any food left for me.

Ever read a quantum circuit? It's all so foreign, so let me introduce to you a simple one (no speedup) called rank-select, described through an interactive story.


Great ideas, though not a huge fan of pre-commit hooks that run full CI locally :)

I'd like to add another idea: automatic PR merge contingent on another PR getting merged.


I shouldn't have to round-trip to a central server to validate whether I'm up to par. I'm not saying the tooling is there already, but we've painted ourselves into a corner by tightly coupling our CI to a centralized server.


Python DSL for safe quantum programming that compiles to Qiskit, where the type system enforces coherence and ancilla cleanliness

https://github.com/binroot/b01t


LLMs fall victim to "garbage in, garbage out." Claude can solve open problems if you know what you're doing, but it can also incorrectly convince you it's right if you don't know what you're doing.

A PhD teaches you how to think, how to learn, and how to question the world. That's a vital set of skills no matter what tool exists.


Rousseau said "mathematical precision has no place in moral calculations," so I was tempted to see how far I could go. Started it during the holidays, but finally came around to putting a bow on it :)


The human still needs to think, of course. But, I can get to my answer or my primary source using a tool faster than a typical search engine. That's a super power, when used right!


The jump in productivity we had with the world wide web and search engines was several orders of magnitude higher than what you have right now with LLMs, yet I don't remember a single person back in the 2000s calling Google "the greatest tool in human history".

Almost sixty years after ELIZA, chatbots still seem to produce a very strong emotional reaction in some folks.


One of my favorite tricks in elementary school was to convince people I can calculate any logarithm for any number of their choosing.

> Me: Pick any number.

> Friend: Ok, 149,135,151

> Me: The log is 8.2

Of course I'm simply counting the number of digits (minus one, since 10 is the base) and guessing the decimal place, but it certainly impressed everyone.
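For anyone who wants to try the trick, here is a minimal Python sketch of the digit-counting part. The function name `digit_count_log10` is made up for illustration; the only idea taken from the comment is that the integer part of a base-10 log is the digit count minus one.

```python
import math


def digit_count_log10(n: int) -> int:
    """Integer part of log10(n) for a positive integer: digit count minus one."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return len(str(n)) - 1


n = 149_135_151
integer_part = digit_count_log10(n)  # the "8" in the guess of 8.2
exact = math.log10(n)                # what a calculator would report

print(integer_part, round(exact, 3))
```

The decimal part is still a guess here; the sketch only shows why the "8" is always right.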


You can do even better if you memorize three numbers: 301, 477, 845. These are the values of 1000 * log10(n) for n = 2, 3, 7. From these you can quickly get the values for 4 (= 2 * 2), 5 (= 10 / 2), 6 (= 2 * 3), 8 (= 2 * 2 * 2) and 9 (= 3 * 3).

For your example, 1.49 is close to 3 / 2, so the log will be very close to 0.477 - 0.301 = 0.176.

This means that your answer is near 8.176 (actual value is 8.173).

This tiny table of logs can also let you answer parlor trick questions like what is the first digit of 2^1000 (the result is very nearly 10^301 but a bit above, so 1 is the leading digit).
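A rough Python sketch of that tiny table, in case it helps to see the derivations spelled out. The names `MILLI_LOGS` and `max_table_error` are invented for illustration; the three memorized constants and the factorizations come from the comment above, and the 2^1000 check uses Python's exact integers rather than the table.

```python
import math

# Memorized values of 1000 * log10(n) for n = 2, 3, 7.
LOG2, LOG3, LOG7 = 301, 477, 845

# Everything else follows from log(a * b) = log(a) + log(b).
MILLI_LOGS = {
    1: 0,
    2: LOG2,
    3: LOG3,
    4: 2 * LOG2,      # 4 = 2 * 2
    5: 1000 - LOG2,   # 5 = 10 / 2
    6: LOG2 + LOG3,   # 6 = 2 * 3
    7: LOG7,
    8: 3 * LOG2,      # 8 = 2 * 2 * 2
    9: 2 * LOG3,      # 9 = 3 * 3
}


def max_table_error() -> float:
    """Largest gap between the table and the true 1000 * log10(n)."""
    return max(abs(v - 1000 * math.log10(n)) for n, v in MILLI_LOGS.items())


# 1.49 is close to 3 / 2, so its log is roughly log(3) - log(2).
mantissa = (LOG3 - LOG2) / 1000
estimate = 8 + mantissa

# Leading digit of 2^1000, checked with exact integer arithmetic.
leading_digit = str(2 ** 1000)[0]

print(estimate, max_table_error(), leading_digit)
```

Every derived entry stays within about a quarter of a unit of the true value, which is why three memorized numbers go such a long way.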


> 149,135,151

This is 8-point-something as you say.

1.49 is in between 1.2 and 1.6 and I have memorised log(1.2)=0.1 and log(1.6)=0.2, so I would think log(1.5) is close to 0.17, using sloppy linear interpolation.

That would make log(149,135,151) approximately 8.17. My calculator also says 8.17. Your guess was good!

I have found linear interpolation such an intuitive approximation method that the tradeoff of having to memorise more logarithms is worth it.
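A minimal sketch of that sloppy linear interpolation in Python. The function name `interpolate_log10` and its defaults are made up for illustration; the two anchor points are the rounded values memorized in the comment (the true logs are a little different), so treat the result as an approximation.

```python
import math


def interpolate_log10(x: float,
                      x0: float = 1.2, y0: float = 0.1,
                      x1: float = 1.6, y1: float = 0.2) -> float:
    """Estimate log10(x) on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


mantissa = interpolate_log10(1.49)  # log10 of the leading digits, 1.49
estimate = 8 + mantissa             # 8 comes from the digit count minus one
exact = math.log10(149_135_151)

print(round(estimate, 2), round(exact, 2))
```

Adding more memorized anchor points shrinks the gap between them, which is the tradeoff the comment describes.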


Huge fan of minimizing Kolmogorov complexity [1]. There's a balance, but typically if the same behavior can be described in less code, then the simplicity will yield dividends.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity


There's an article in the IEEE software engineering proceedings from around the turn of the millennium that said defect count was closely associated with any complexity measure you wanted, including lines of code.

I once said "nearly half of the problems are logic errors, nearly half are misused APIs" and got the retort "and another half are from concurrency".

Of course that adds up to nearly 150%, which is probably most people's actual bug count...


What bothers me is that some programmers think that writing denser code is automatically better. But I would argue it's not the number of characters or lines of code that creates the complexity, but how many logical concepts (?) you use to solve the problem.

Using a smaller set of different concepts also helps reduce the cognitive load, even if it leads to more verbosity ("less clever code").


Everything has tradeoffs, but there's value in reducing both line and character counts as well.

For example, nobody ever uses anything other than ijk for loop indices unless the index is particularly meaningful or they've written a deeply nested abomination. Why? It's not just laziness in typing; it gives more relative room for characters that matter. Longer names and patterns are acceptable if you can't make your point clearly enough with few characters, but length isn't the goal; communication is.

It's important to limit lines of code too (and their widths) because if an idea doesn't fit comfortably on your screen then you won't be able to leverage the pattern-recognition parts of your brain to figure out what's going on.

From a different perspective, you know that feeling you get when somebody dumps a 1000-line PR on you (or an excessively long HN comment...)? It's hard to digest because you can't grok the whole thing at once and have to switch to carefully analyzing each component just to even have the context to then give the thing a proper review. If that same PR could be wired together with a few high-level concepts (less code, but more involved baseline knowledge required to understand it), it would be instantly understandable to somebody with the same background.


This is close to what I would have written. It is almost never actually about the line count. Not even SLOC. For example, different languages lend themselves to breaking lines to different degrees. In Scheme I almost always write (define name \nl (lambda (arguments) \nl ...)). Did I just waste a line? Of course not. It is still the same number of concepts, and basically the same number of tokens, as in Python when I write "def name(arguments):", but Python doesn't lend itself that well to line breaking because of its (annoying) whitespace sensitivity. This neither makes the code any more or less readable nor increases the chance of bugs. The same goes for many other constructs in both languages. Take a "(cond ...)" for example. I will break some lines there, because the language makes it easy by having everything delimited with parens, while in Python I would have to type additional visual clutter to do the same.


Are you insinuating Gemini is similar in performance to o3-mini?


I've only had o3-mini for a day, but Gemini 2.0 Flash Thinking is still clearly better for my use cases.

And it's currently free in aistudio.google.com and in the API.

And it handles a million tokens.


Definitely varies by application, but the blind "taste test" vibes are very good for Gemini: https://lmarena.ai/?leaderboard


That reminds me: a week ago there was a post on Reddit (now deleted, but a copy of the content is available in the comments) where the author claimed to have manipulated, or attempted to manipulate, voting on lmarena in favor of Gemini. The goal was to tip the scale on Polymarket for a question like "which AI model will be the best one by $date", with the outcome decided by the lmarena scores. They supposedly made O(USD10k).

Original deleted post: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

A copy of the content: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...


Are you implying it isn't?

(evidence please, everyone)


Simple example: o3-mini-high gets this [1] right, whereas Gemini 2.0 Flash 01-21 gets it wrong.

[1] https://chatgpt.com/share/679d9579-5bb8-8008-ac4a-38cef65b45...


Great example. Thank you. Can confirm that none of the Gemini models warned about the exception without prompting.


This agrees with my limited testing so far, but in a different way: o3 being better at coding and objective tasks, with the most recent Flash 2.0-thinking stronger at subjective tasks. Similarly, o3 seems better at shorter output sizes, but drops off, tending to be lazy.

