Hacker News | CrazyStat's comments

If you ask 10 different humans to produce the spec with the same information (prompt and context) they will also produce 10 unique answers that will contradict each other and (depending on who you asked) may be just as confident.

There are real decisions to be made when going from a vague prompt to a spec. It's not surprising that an LLM would produce different specs for the same work on different runs. If the prompt already contained answers to all the decision points that come up when writing the spec then the prompt would already be the spec itself.


LLMs aren't people. They don't reason. They're token generators, a black box. Your analogy falls on its face with any scrutiny.


I didn’t claim that LLMs are people or that they reason.

If the behavior of the llm is the same as the behavior of reasonable people then the behavior of the llm is reasonable, regardless of how black of a box they generate tokens out of.

Reasonable people will generate divergent specs for the same prompt. Thus it is reasonable for an LLM to generate divergent specs out of the same prompt.

Edit: I use “reasonable” here in the legal sense of the “reasonable person” standard, not to imply any reasoning process.


[flagged]


Please point to where in my initial comment I indicated that LLMs are human or reason.

If you are unable to do so please withdraw your accusation of gaslighting, a serious form of psychological abuse, and apologize.


Aren’t people pattern matching neural networks as well? Why does being a token generator mean something is unreliable?

Further, why does that mean “it doesn’t reason”? Logic can be encoded in language, symbols or code. If I say “all apples are red” -> “all fruit in the bowl are apples” -> “therefore all the fruit in the bowl are red”. It doesn’t really matter if I understand the logic or what red is or fruit/apples are, the logic is contained in the structure of the syntax. If an LLM can output the conclusion reliably from predictive operations it is able to have the effect of reason and we don’t need to know or care about whether it “understands” the reasoning.


no, brah, humans are TOTALLY different. just don’t think about it too hard. we are just special.


it's an analogy, it didn't fall on its face at all. it's just a comparison to highlight that the point being made was nonsensical. example: you're just a next action generator controlled by trillions of cells and subconscious dna-based behavior. a black box.


> you're just a next action generator controlled by trillions of cells and subconscious dna-based behavior.

With moral agency and the ability to learn (even if we presume you are correct, which I don't think you are).


moral agency and the ability to learn are implicit in the description you quoted. this isn't some special superpower, all animals have the ability to learn, and many have moral agency. these aren't human specific traits


Reductio ad absurdum.


exactly my point lol


It appears they don't need to reason or be intelligent to be able to produce working solutions for code. But sure, let them run wild and unmonitored? I wrangle my LLMs like the code monkeys they are. They help materialize code, and then you need to sculpt it (and build test harnesses of varying sorts).

It really can be useful. It's very different from old world programming.


why do people insist on claiming that they don’t reason, when they clearly, for all intents and purposes, do. you can be vague; you can express your idea a thousand different ways, and you will get a unique blend of <your input bits> x <hidden reasoning layer> => semi-smoothed output. this is like some Searle Chinese Room bullshit that needs to just die. it is beyond clear that llms can interact with abstract concepts in an extremely meaningful way. this is like the “thought leader” version of the stupid-ass “it’s just smart autocomplete” argument. if you think that, it is user error— either a failure of creativity or a failure of perception or both. just because llms are not a panacea and are problematic for society and “overhyped” and whatever does not make it intellectually honest to claim that there is zero reasoning/creativity/cognition within the box.


LLMs do reason (they just sometimes don't reason well).

I assure you I've met many devs and "engineers" that reason less than LLMs, and are black boxes, especially in terms of the code they write.


> LLMs do reason

No, they don't.

They are token predictors that use statistical techniques to emit the randomly weighted next most likely token given the previous token list.

The result is a strange mimic of human reasoning, because the tokens it predicts are trained on strings that were produced by humans that were reasoning, but that's not the same thing.

Human cognition is complex and poorly understood, and the nature of the mind is an area of study almost as old as consciousness itself. We don't know exactly how it works, or what its exact relationship to the brain is, but we do know that it is not a simple token predictor.

LLMs, by their very nature are constrained to the concept of language and the relationship between existing words in a corpus. This is a box they can not escape.

Modern neuroscience suggests that the human brain is much more vast than that, and in many ways looks like it is constrained by language, but certainly not limited to it.


> They are token predictors that use statistical techniques to emit the randomly weighted next most likely token given the previous token list.

Sounds like an implementation detail. Now describe how human reasoning works and explain why that process of chemical and electrical signals results in "reasoning" whereas what LLMs do isn't.

The problem with being this reductive is you can do it to anything, including humans. You can’t be reductive about LLMs and refuse to be reductive about humans - that's poor reasoning, and an LLM would out-reason you on this point, further negating your case.


Human cognition is poorly understood and much more complex than it seems.

For an example, look at some of Julia Mossbridge's work.

If even a small part of her work is true and valid, it points to something far outside our current framework.

You don't need to go as far afield as Mossbridge, though - that's an extreme example. Pretty much any modern neuroscience will make you question a lot of assumptions, at least it did for me.


> For an example, look at some of Julia Mossbridge's work.

Never heard of her but I just spent about 5 minutes looking.

Her PhD is in communication sciences and disorders [1], but apparently she’s a quantum physicist now:

> AMELIA is built on the Causally Ambiguous Duration-Sorting (CADS) effect — a breakthrough discovery by Dr. Julia Mossbridge showing that light, under classical boundary conditions, behaves differently based on future temporal boundaries. [2]

Filed under crank, not going to bother investigating further.

[1] https://books.google.com/books/about/Have_a_Nice_Disclosure....

[2] https://americanelectrodynamics.com/#technology


You have moved goalposts from reasoning to "human cognition". I won't tolerate that sort of slippery wordplay.

Reasoning is making analogies between logical patterns found in conceptual space, with a direction of time (statements precede conclusions). For example. A => B and B => C. You may now deduce A => C. For something fuzzier, A~D and B~E, you may now deduce that D~=>E. This is the sort of thing that higher layer attention mechanism is capable of doing.

> This is a box they can not escape.

Would you say that Helen Keller was less capable of abstract reasoning because she had more constrained access to sensory input?


Reasoning requires cognition, otherwise there's nothing to reason about, no context or value system to use as a basis for reason.

Decision making can be done by trained machines following rules, but that's different than reasoning. A thermostat isn't reasoning when it decides to turn on the air conditioner; to argue otherwise expands the definition of "reason" to be so broad that it becomes useless.

LLMs are trained on human knowledge and reasoning that results from human cognition, and they are excellent at stochastic mimicry - if the argument is that they are actually reasoning, then some sort of equivalent to human cognition must be present for that to be true. Lacking that, they are nothing more than "token extrusion machines" with some potentially useful characteristics.


Why does reasoning require cognition? Isn’t a if else block or switch statement reasoning? Or a formal logic proof? If an LLM produces an output using formal logic or a python script why is that not reasoning? A human would offload the reasoning using similar methods. I know when I took the LSAT, I learned ways to diagram arguments and didn’t have to think/reason about it because the formal logic diagram did the “reasoning for me”.

Aren’t humans just “action potential” extrusion machines? What is unique about our neural pattern recognition to make our cognition different in nature rather than merely degree?

It seems clear at this point that the greatest insight that unlocked our current AI acceleration was that scaling alone would unlock emergent properties and abilities.


"acceleration was scaling alone would unlock emergent properties and abilities."

Agreed, but I would frame it in the negative: "don't worry about overfitting, the lottery ticket hypothesis just works"


Can you give a concrete example of something that is impossible for an LLM to ever do due to its lack of reasoning ability.


I try to avoid absolutes, "ever" is a long time. Who knows, maybe we crack the code of cognition at some point?

In the meantime, these [1] are pretty funny.

[1] https://x.com/huskirl


The problem with that is LLMs can output words or symbols that seem like it used "reason" to produce them. But for everything the core algorithm does, it's simply nothing like the wetware reasoning to get to the same answer. So he didn't move goalposts. He always meant the reasoning that stems from human cognition.

Technically if it has that, it'd be singularity no? So basically the premise is they are doing nothing of the sort. Probe any LLM enough and it really does show it has no qualms about contradicting itself or being bossed around. Has no belief / no orientation etc. It's truly mindless but tricks our mind and soul (or whatever) probably.


> Technically if it has that, it'd be singularity no?

reasoning is not black and white. It is possible to reason poorly. Most people cannot do basic math proofs, even math majors struggle with the hardest math proofs. Reasoning in humans is also context/token dependent. I just spent one HOUR trying to show my mom (who has mild dementia) how to use amazon fire (push DOWN until your channel shows up, push RIGHT until the channel becomes big) and she could not figure it out. Rewrote the instructions in japanese and she followed the logic relatively smoothly. Ironically, i'm pretty sure her english is better than her japanese, vocabulary wise.

> it's simply nothing like the wetware reasoning to get to the same answer.

but you don't know how wetware reasoning works, so you are incapable of making that proclamation. I'm pretty sure when I do math proofs (I'm not an amazing mathematician) sometimes I have to literally tick my way through each step of the proof, sometimes breaking it down to super-basic substeps, which to me feels an awful lot like what an LLM could be doing. For that matter we don't know how LLM reasoning works but my claim is that these LLMs are in principle capable of reasoning due to architecture.

If this doesn't make sense I suggest you look over the architecture of LLMs carefully and try to understand my point.

(BTW I'm not talking about "reasoning models" with "thinking turns", that's just marketing speak, I'm talking about ANY transformer-based model, even the "dumbest UX architecture" completion models)


Humans offload reasoning into language and syntax. Chinese encodes arithmetic into the grammar/syntax patterns better than French, for example.

Your posts are generally insightful. Thanks for the contribution. Even if it’s a bit cranky and gruff :)


The structure of language encodes logic in many ways. So the model's ability to reason may be an emergent property of the reasoning ability humanity has ejected and extracted from our neural networks and abstracted into language and symbols.


there is absolutely no line of demarcation between human reasoning and what you described


Wow, there are still people trying to claim they don't reason. What will they have to do before you'll admit that they can?


You are asking the wrong question. It's not about if you can do X which can be faked especially if you are given practically infinite tries and all failures are hidden.

The people who want to believe they actually reason just ignore all obvious evidence to the contrary and cherry pick the times reasoning was faked well enough.

The people who don't want to believe will just take a second to understand how they work and then come up with ways to reveal they were faking all along. Like asking how many letters there are in a word lol.

It's only the people who don't want to believe that count because reality is what happens regardless of what you believe.


You seem to believe that something is only "reasoning" if it works in a particular way. That it's not enough for it to observationally display reasoning skills; it has to be using a particular method to do that so it's not "faking" it. Is that correct?


It will be interesting to see the excuses people come up with when LLMs inevitably start solving Millennium Prize problems.


They very obviously reason.


it's kind of crazy to think that the transformer architecture can't encode some primitive form of reasoning.


The issue is that LLMs don't learn, despite the name. A human re-implementing a spec would strive to iterate towards what they feel is a better spec. They can take in their own input and self-correct. The work of implementing the spec gives insight into pain points and strengths, even if they never actually test the spec (they 100% should, but this is to emphasize that struggle for humans is in itself iteration, even before external feedback comes in).

An LLM isn't deterministic but also isn't iterative without an existing human. You give it the same spec 10 times and it produces 10 results that aren't far off from each other but are vastly different when you go into the weeds. And not different in a way that shows improvement.


An LLM should not "generate specs", a human should. The LLM can work from the specs. It can never infer meaning from a vague prompt. If so, it will start guessing. Every human that ever did functional specification or information analysis at some point knows this. Or has learned the hard way, something with assumptions and asses ;)


An LLM's guess for a vague prompt is better than your average developer's.

A prompt like "write these two files on disk" will very likely make the LLM do some sort of atomic write/swap operation, unlike the average developer, who will just write the two files and maybe later encounter a race condition bug. You can argue the LLM output is overkill, but it will also be more robust on average.
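For anyone unsure what an "atomic write/swap" looks like, here is a minimal sketch using only the Python standard library. The helper names `atomic_write` and `write_two_files` are made up for illustration, and this is one common pattern (write to a temp file in the same directory, then rename), not a claim about what any particular LLM emits.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Write data so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the target directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # swaps the new file into place in one step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def write_two_files(path_a: str, data_a: str, path_b: str, data_b: str) -> None:
    # Each file is replaced atomically; the pair as a whole is NOT a single transaction.
    atomic_write(path_a, data_a)
    atomic_write(path_b, data_b)
```

Note the caveat in the last comment: a reader can still observe the new first file next to the old second file, so if the two files must change together you need something more (a lock, or a directory swap).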


What kind of race condition do you have in mind?


So what’s most important is knowing those parameters and the ranges of values, not having the final result. A human, after producing a spec, can then provide the mental model of how he created it: where the inflection points are and what the range of valid results is.

What has always mattered is how you decide the specs, not the specs in themselves.


> If you ask 10 different humans to produce the spec with the same information (prompt and context) they will also produce 10 unique answers

But they didn't ask humans, they asked a machine. We expect our machines to behave in predictable ways.

> If the prompt already contained answers to all the decision points that come up when writing the spec then the prompt would already be the spec itself.

This is one of the best arguments against using LLMs I've seen.

It reduces to the classic argument- at the point where you've described a problem and solution in sufficient detail to be confident in the results, you've invented a programming language.


> We expect our machines to behave in predictable ways.

I expect LLMs to produce randomly varying output. Maybe it's the thousands of hours I spent doing monte carlo simulations for my PhD.

> This is one of the best arguments against using LLMs I've seen.

> It reduces to the classic argument- at the point where you've described a problem and solution in sufficient detail to be confident in the results, you've invented a programming language.

I'm not an LLM true believer, but I use codex for various small tasks and it often (not always) does a thoroughly decent job. Yesterday I gave it a pretty vague request to set up a new Home Assistant dashboard and it handled it just fine--I told it what I wanted to see but it figured out itself which helper variables it would need to set up to realize that vision and wrote all the config for it.

I probably could have done it in 15 minutes if I was familiar with Home Assistant's yaml configuration schema and all, but I'm not so it probably would have taken me closer to an hour. Asking codex took me 30 seconds and it did just fine.

I am skeptical that LLMs are going to kill all white collar jobs or whatever anytime soon. Not being able to truly learn things is an issue. Reality has a surprising amount of detail[1], and while codex does well at things like writing Home Assistant configs and setting up a Minecraft server, where there are thousands of examples online of how to do it, when I've asked it to do some more esoteric things it has sometimes failed spectacularly. I don't think having the LLM keep notes and then read them back (filling up the context window) is a real solution here.

[1] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...


I haven't made the argument that LLMs aren't useful, I can see cases where they are.

I don't think they include areas where correctness, determinism or human reasoning are important.

At least, not in isolation.


> It's not surprising that an LLM would produce different specs for the same work on different runs

This is what I don't understand: AI is a computer program with its own data. If we give the same input to that computer program every time, why does it produce different outputs every time? Or does the input include LLM data + our prompt + some random data that computer program picks from its Internet search?


LLMs have a temperature parameter. At zero temperature they are deterministic: they always choose the most likely next token at each step based on what came before and the model weights, and they will always generate the same output given the same input.

As you raise the temperature they will start (pseudo)randomly choosing tokens other than the single most likely token (though that one will still be the most likely to be chosen). It turns out this is almost always better than zero temperature, which has a tendency to get caught in repetitive loops. I imagine all the frontier labs have spent thousands (millions?) of CPU hours tuning the temperature parameters on their models for optimal performance.
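A toy sketch of that description, in plain Python. The logits and the `sample_token` helper are invented for illustration; real inference stacks add things like top-k/top-p filtering that are left out here.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from hypothetical logits at the given temperature."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax over logits / temperature, shifted by the max for numerical stability.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


logits = {"York": 3.0, "Jersey": 2.0, "Orleans": 1.0}  # made-up scores
rng = random.Random(0)
greedy = {sample_token(logits, 0, rng) for _ in range(100)}
warm = [sample_token(logits, 1.0, rng) for _ in range(1000)]
print(greedy)              # only the top token ever appears
print(warm.count("York"))  # still the most common pick, but no longer the only one
```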


  > LLMs have a temperature parameter. At zero temperature they are deterministic: they always choose the most likely next token at each step based on what came before and the model weights, and they will always generate the same output given the same input.
https://en.wikipedia.org/wiki/Softmax_function

"A value proportional to the reciprocal of β is sometimes referred to as the temperature: β = 1/kT, where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating."

"Temperature" in the context of softmax does not change the "winning" token; it changes how much more probable (in the sense of the softmax distribution) the winning token is than the others. If the winning token is "New York", it will be the winner at a temperature close to 0 and at a temperature of 1e9.

The actual selection of the random token is done separately, using inputs outside of the softmax distribution, for example a random number generator. I believe most LLM configs have a seed for the random number generator.
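A small sketch of both points, with made-up logits and hypothetical helper names (`softmax`, `draw`): the index of the most probable token is the same at every positive temperature, and the random draw is a separate step that becomes reproducible once the generator is seeded.

```python
import math
import random


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 2.0, 1.0]  # made-up scores; index 0 is the "winning" token
for t in (0.1, 1.0, 100.0):
    probs = softmax(logits, t)
    # The winner's index never changes; only its margin over the others does.
    print(t, probs.index(max(probs)), round(max(probs), 3))


def draw(seed: int, n: int = 5) -> list[int]:
    """Sample n token indices at temperature 1 from a freshly seeded generator."""
    rng = random.Random(seed)
    return rng.choices(range(len(logits)), weights=softmax(logits, 1.0), k=n)


print(draw(42) == draw(42))  # same seed, same draws
```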

More than that, generation of code in most programming languages is done with more guardrails, such as beam search guided by schema, syntax and semantics.


Nah. Even with zero temperature there is still variation.


In STEM fields, yes. In humanities it’s not uncommon.


The goal of a PhD is to become a world expert in a specific topic, whether or not you’re planning on staying in academia.

This may or may not be in alignment with the student’s goals, and many students don’t really understand it going in.


Yes, they don't realize it or lie to themselves because ~50% dropout.

Given the attrition, I really question if PhD programs are honest with incoming prospects. Law schools and business schools are similarly "guilty" of pimping outcomes.

ITT: people complaining about being overworked and misled in their PhD programs.


> Yes, they don't realize it or lie to themselves because ~50% dropout.

I think there's some misinterpretation here. Not staying on in academia after PhD (common/modal) is not the same as not getting to complete a PhD (rare).

In CS/tech, those who exit academia after PhDs get paid $300K-$500K in the industry. I don't think there's any misleading going on.


>is not the same as not getting to complete a PhD (rare)

BTW, your perspective is bizarre.

Not sure where you're getting the idea that PhD candidate attrition is rare. Maybe at MIT where only 20% don't finish (within 10 years -- which is generous), but these are already pre-screened superstars. Most other places converge around 50%.

As for salaries, the median salary for CS PhDs outside academia is $180k. That means a lot are lower and probably aren't working at big tech with full comp pushing them above $300k. [0]

[0] https://ncses.nsf.gov/pubs/nsf26312


PhD programs have remarkably high attrition rates prior to graduation (ie dropout). I don't know that it's 50% and obviously it varies by institution and field but it's quite large.


>In CS/tech, those who exit academia after PhDs get paid $300K-$500K

Yes, I'd like to see data on what percentile gets this and breaks even for lost wages from their PhD years. IMHO, it's not fair to generalize this outcome. I could be wrong.


> Does an approximation to pi therefore slowly creep in as you increase the sides on the polygon?

Yes, under some assumptions. As the sibling comment points out, if there’s a single allowed angle theta then the expected number of intersections is cos(theta) * L/W (-pi/2 < theta < pi/2). You can get from this fact to the standard Buffon’s needle result by integrating wrt theta to find the average probability over thetas with a uniform distribution on (-pi/2 < theta < pi/2): \int 1/pi * cos(theta) * L/W d theta.

Now suppose you have two angles, theta_1 and theta_2. The expected number of intersections for each of them is as above, and if the needle falls at one or the other with equal probability then the overall expectation is 1/2 * cos(theta_1) * L/W + 1/2 * cos(theta_2) * L/W. Passing to the case with n distinct angles with equal probabilities we have \sum_i 1/n cos(theta_i) * L/W.

Now if we make the further assumption that the angles are evenly distributed over (-pi/2 < theta < pi/2), i.e. they are the angles of the sides of a regular n-gon, then we can interpret that sum as a Riemann sum. If we write it as

1/pi \sum_i pi/n cos(theta_i) * L/W

Then pi/n is the delta_i term in the Riemann sum, and the limit is

lim_{n -> inf} 1/pi \sum_i pi/n cos(theta_i) * L/W = 1/pi \int_{-pi/2}^{pi/2} cos(theta) * L/W d theta.

We can pull the L/W out, leaving \int_{-pi/2}^{pi/2} cos(theta) d theta = sin(pi/2) - sin(-pi/2) = 2, giving the final result of 2/pi * L/W.

Essentially, as we increase the number of allowable angles we are approximating an integral of the cosine function (times constants) from -pi/2 to pi/2, which is where the pi creeps in. The angles don’t need to be strictly evenly spaced for this to work: if they are independently and randomly selected from the uniform distribution then it will also work, as you’re then performing a Monte Carlo integration.
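A quick numerical check of the argument, as a sketch. It assumes the n evenly spaced angles sit at the midpoints of n equal subintervals of (-pi/2, pi/2) (the exact placement is my choice, not something fixed above), and takes L/W = 1 by default. The function names are made up.

```python
import math
import random


def riemann_estimate(n: int, ratio: float = 1.0) -> float:
    """(1/n) * sum_i cos(theta_i) * L/W for n evenly spaced angles in (-pi/2, pi/2)."""
    width = math.pi / n
    # Midpoints of n equal subintervals of (-pi/2, pi/2).
    thetas = [-math.pi / 2 + (i + 0.5) * width for i in range(n)]
    return sum(math.cos(t) for t in thetas) / n * ratio


def monte_carlo_estimate(n: int, rng: random.Random, ratio: float = 1.0) -> float:
    """Same average, but with angles drawn uniformly at random from (-pi/2, pi/2)."""
    thetas = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(n)]
    return sum(math.cos(t) for t in thetas) / n * ratio


target = 2 / math.pi
for n in (2, 4, 16, 256):
    estimate = riemann_estimate(n)
    print(n, estimate, abs(estimate - target))  # the gap to 2/pi shrinks as n grows
print(monte_carlo_estimate(100_000, random.Random(1)), target)
```

Both the evenly spaced and the randomly drawn versions approach 2/pi * L/W; the random version just converges more slowly and noisily.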


> If I were hosting illegal malicious actors doing this stuff on my home servers and refused to even say who was doing it I would 100% get my door kicked down by the FBI. But some persons, corporate persons, are more equal than others.

If you refused to tell some random person who asked? No, you wouldn’t. If you refused to respond to a legal authority—a court-issued subpoena, for example—then there would be consequences.

As far as cloudflare is concerned you’re just a random person asking. They have no legal obligation to provide you with information.


They have a legal obligation to provide a working abuse contact address. I guess you're saying that it is working when they say, "go away." and yeah, I can see that point of view.

But it also means that any domain fronted by cloudflare won't actually have contact information for the owner of the domain, as required by their legal contract with ICANN as a registrar.


I'm a statistician. My wife does basic (biological) science. Almost every time she asks my advice on an experiment I want to tell her to 10x the sample size. But the academic community has certain ideas about how big sample sizes should be, and trying to use radically larger samples runs into all sorts of barriers ranging from ethics concerns (for animal experiments) to funding.

At the end of the day there's only so much you can learn from a sample size of 12. I'm not sure it's more ethical to have a bunch of wasted experiments with 12 mice each where you don't learn anything than to use 100 mice and actually have statistical power to identify something other than the hugest effect sizes.
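To put rough numbers on it, here is a back-of-the-envelope power calculation. It is a sketch under stated assumptions: I read "12" and "100" as animals per group, use a two-sided two-sample comparison of means with a normal approximation (a real analysis would use a t-test and the actual design), and the effect sizes are hypothetical standardized differences. `approx_power` is a made-up helper.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


for d in (0.5, 1.0):  # hypothetical "medium" and "very large" effects
    print(d, round(approx_power(12, d), 2), round(approx_power(100, d), 2))
```

Under these assumptions the 12-per-group design detects a medium effect well under half the time, while 100 per group detects it more than 90% of the time.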


Lack of appropriate funding leads to cutting corners to the point that some results may not be worth the price of the paper used to describe them. I had a passing experience with epigenetics. Even experiments with cell lines, which are basically free of ethics issues, could be screwed up by using single-end sequencing reads that are too short. Combined with too-low coverage and less-than-perfect controls, that gives input data on which state-of-the-art peak callers will just throw in the towel. So the "trick" is to use some far more forgiving peak caller and get rather crappy results. Using an outdated human genome assembly (hg19) and old genome mapping programs just puts an extra cherry on the cake...


Also, careful breeding to retain as much genetic diversity as possible is important to avoid collapse in small populations. Even if small local pockets survive, if each pocket is only able to inbreed with itself that will cause problems.

Our local children's museum is part of a network of sites working to restore red wolf [1] populations. Every few years they get new wolves as the coordinators move young wolves around to optimize mating pairs.

[1] https://en.wikipedia.org/wiki/Red_wolf


> Please don't use Hacker News for political or ideological battle. It tramples curiosity.

https://news.ycombinator.com/newsguidelines.html


I don’t like when people use this without all the important context. Showing other quotes from the same person that is giving subjective advice on topics, lends to curiosity around how to filter their advice.

> It's a complex and hard question, but the principles we apply to it have been around for a long time and are consistent with the site guidelines. If they weren't, we'd change the latter.
>
> I've explained all of this many times. If you, or anyone, would like to know how we approach the question, you could start here:
>
> https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so... [1]

[1]: https://news.ycombinator.com/item?id=47373246


In my opinion, it's relevant to Card's credibility. If he shows poor judgement in one area, why would I want to listen to his opinion on something else, even something which is considered to be in his wheelhouse? Poor judgement is poor judgement.


Everyone shows poor judgement in one area.

It sucks that he has those ideas, but he's a good writer.


He's a popular writer, but hardly good. No one is going to be reading his books in 50 years, let alone 200.


That's an extremely high bar. But to the extent that critics and awards are the metric we have, he is an objectively good writer.


Of course it's a high bar! Why should anyone care about this work outside of the time of its release? It's modern culture but it ain't gonna be passed down anytime soon.


If the bar is that people will continue reading their books in 200 years, then which fiction writers of the last few decades would go into your list of "good"?


I don't know because you're asking me something that is impossible for a human that only lives to ~80 years.

Try asking better questions if you want better responses.


Take your best guess, obviously. The question is fine.

You're already confidently predicting a negative future for Ender's Game, so you're clearly not averse to predictions.


I don't know that you can establish objectively if someone is a good writer. He's an acclaimed, award-winning writer, sure.


The not-so-short story Ender's Game was great. The novel Ender's Game was awful. I hope someone told him it was too long, too repetitive, and too Gary Stu. I wish he had taken that feedback to heart.


The novella is the only version I've read. I came away both not understanding why a longer, novel-length version would exist, and with no interest in reading anything even slightly worse than that from the same author (which I'm given to understand describes most of his other work).

The novella was an alright time, though.


Everyone is very, very wrong about something. By your logic we should ignore everyone about everything.


This but unironically!


it did not make me wish to engage in political or ideological battle, i found it an interesting reflection of a complicated person's thought process. So, any battling is on you (you no doubt have a lot of company: "don't you dare feed us raw meat, we'll jump up and down in our cages and spill the poop buckets")


We can take the AI out of the question entirely and ask how many other humans you personally as a driver would be willing to mow down to avoid your own death—driving off a bridge, say.

I would suggest that all but the most narcissistic would have some limit to how many pedestrians they would be willing to run over to save their own lives. The demand that the AI have no such limit—“that the AI will prioritize my life and safety over literally any other concern”—is grotesque.


I'm surprised this targets TeX rather than lilypond, which AFAIK is the gold standard for free (as in beer and speech) music engraving.

I checked, and lilypond also offers features for Gregorian chant notation [1]. Has anyone used both and is able to compare?

[1] https://lilypond.org/doc/v2.25/Documentation/notation/typese...


Good question, they considered it at some point:

> Lilypond is a very good tool, but the part on Gregorian chant is not maintained and very deep modifications would be needed to perfectly align the notes and text.

Source: https://gregorio-project.github.io/gregoriotex/index.html


Circa 2024[1]:

> Note, however, that there are some serious flaws in LilyPond regarding Gregorian notation (especially the non-modern version), and right now there is nobody who works on improving that...

1: https://lists.gnu.org/archive/html/lilypond-user/2024-10/msg...

