Hacker News | abathur's comments

I guess, but have you actually encountered a teacher grading an assignment solely based on word count?

I certainly wish more teachers encouraged parsimony and penalized fluff and bullshittery, but I'd be surprised to find them doing it outside of some narrow cases where the point is just to make you write something at all.

They generally want to encourage their students to engage with the topic at a certain level and practice the thinking needed to research, structure, and implement an argument of a certain length. They want you to put at least 5 pounds of idea in the 5-10 pound idea bag.

If you're convinced you've hacked word economy and satisfied the assignment except for this goshdarnpeskyminimumwordcount, you're probably misunderstanding the lesson the instructor is willing to read through a bunch of bad writing to impart, and cheating yourself in the process.


That's actually the trick. If you assign word count, MLA style, grammar, you just have to look for the errors. You don't have to engage with the ideas at all, or provide conversational feedback - just cryptic notes in the margins, like "???" or "awk"


I feel like we need a name for css-the-syntax (and maybe -the-semantics) as separate from css-the-body-of-rules/functions/units/etc-defined-by-csswg.

There's juice in it, but it's hard to talk about and survey other uses without just searching GH for code using css parsers and seeing what kind of shenanigans people are up to.

I've been playing around with a weird thing that's kinda like a template engine, but driven by a mix of a lightweight node-based markup language, css selectors for expressing what goes into the template, and a css-alike for controlling exactly how all of these parts come together.


I think it's already pretty clearly separated in the standard:

https://www.w3.org/TR/selectors-3/

and this is what the DOM spec references, too (albeit at level 4):

https://dom.spec.whatwg.org/#selectors

So the common name "CSS selector" is already correct, or simply "selector"?

"DOM selector" would be nicer maybe and not contain "CSS", but I'm also not sure it's better, because selectors in static CSS, or selectors used with a different DOM engine (XML parser, PHP DOM API, whatever) and static markup outside of a JS engine, mean we're talking about a different DOM than the one exposed to JS.

Also, some special selectors are directly tied to browser rendering and navigation, such as :hover or ::target-text.

So yes, maybe a minimal subset of the query syntax could benefit from a name that's not tied to the browser and CSS.
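To make "minimal subset" concrete, here is a rough sketch of what a browser-free core might cover: a single compound selector (tag, `#id`, `.class`) matched against a plain node tree. The `Node` class and the `matches`/`select` helpers are made up for illustration and are not taken from any spec or library; combinators, attribute selectors and pseudo-classes are deliberately left out.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical tree node; nothing here depends on HTML or a browser."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


# One compound selector: an optional tag name plus any number of #id / .class parts.
_PART = re.compile(r"([#.]?)([\w-]+)")


def matches(node: Node, selector: str) -> bool:
    """Return True if the node satisfies every part of the compound selector."""
    classes = node.attrs.get("class", "").split()
    for kind, name in _PART.findall(selector):
        if kind == "" and node.tag != name:
            return False
        if kind == "#" and node.attrs.get("id") != name:
            return False
        if kind == "." and name not in classes:
            return False
    return True


def select(root: Node, selector: str) -> Iterator[Node]:
    """Yield matching nodes depth-first, in document order."""
    if matches(root, selector):
        yield root
    for child in root.children:
        yield from select(child, selector)
```

Nothing in this subset refers to rendering or navigation state, which is roughly the part that could carry a name of its own.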


For the generic selector naming I'd suggest "cascade selector/selectors" as that gives a hint of the origins and describes the actual function of it pretty well.


I see what you mean, but what would this do except add more churn?

Words sometimes have misleading aspects, but I don't see any practical problem with the current usage of the word "selector" in web dev. The CSS part is often omitted when it's implicit.

The spec separates selectors cleanly into its own module already, and there are already implementations that don't rely on HTML rendering.

Any rename by committee wouldn't stick anyway, and the fact that this selector spec originated in CSS doesn't prevent other uses.

When you bring in "cascading", you already go close to the CSS / rendering aspect, because that's the most common use case for cascading?

Selectors don't cascade, rules do.


CSS containing Stylesheet couples it pretty tightly to styling things, though. What about CSL, as in Cascading Selector Language?


But can you trust that the things they say aren't just laundered AI blogspam?


Well I trust that the things my friends say aren't laundered AI blogspam. And if they trust the things their friends say, I can likely trust that too.


Does crapping on the average school's deep well of expertise for evaluating how effectively AI software solutions address their problems somehow fix the underlying problem (that the cost of catching cheaters is significantly higher than the cost of cheating)?

(This is roughly the same problem as evaluating software that only does an approximation of what it claims to do.)

(Aside: AI-based variations on this theme are in the early stages of proliferating across our society. They're being developed by many people using this forum and being sold to our schools, businesses, governments, and other organizations with little regard to whether they actually do what they claim.)


> that the cost of catching cheaters is significantly higher than the cost of cheating

This is tackling the problem from the wrong direction. The right direction would be to make it harder to cheat in the first place. For example: if the student submits an essay, and that student is able to coherently and accurately answer any questions asked about the essay in a face-to-face conversation, then that student is probably the genuine author of that essay.


I agree with you that a face-to-face q&a is a reasonably good way to detect low-effort cheating, but I'll still quibble a bit:

- I don't think this lowers the cost of detection as much as you imagine. You still need to know the paper better than the student and have to sacrifice already tight instruction/planning/grading time to have all of these conversations. Even if you catch enough to successfully deter most, it likely means not covering something else. It won't be too hard to catch low-effort cheaters who can't be bothered to read the paper, but you're on the low-leverage side of an arms race with the remaining students. You have experience on your side and they can't know what you'll ask, but they outnumber you and can certainly read the paper and use LLMs to quiz them on it. You have to invest your effort without knowing how each student prepared, so you'll spend about as much effort on every low-effort cheat as you do on the highest-effort cheat you are prepared to catch.

- Not sure it is "from the wrong direction" since both approaches raise the cost of cheating and lower the cost of detecting it.

- While this does avoid encouraging students to dumb down their work, it does still raise the cost of not-cheating. Unless you surprise the students with these conversations, the ones that care most will still anxiously prepare.


I don't disagree with you that a reasonable way to cope with the current problems is to ensure everything that "counts" is done in a controlled environment, but pedagogy and its goals are vast.

There are things you learn from spending several days structuring a 20-page argument that you will not learn (and cannot assess) from oral examination or a 5-paragraph essay written in a blue book.


If you have spent several days structuring a 20 page argument in October on any topic you'll have learnt a great deal about the subject matter. When you get to the exam hall in, say, May it will stand to you.

That knowledge will show up in the blue book vis-a-vis the other exam candidates.


Sure--yes--the student will learn something if they actually wrote a 20-page paper on some given topic. But how are you going to evaluate their ability to compose the 20-page argument?

I would prefer not to be confrontational here, but I am having a hard time imagining that you've deeply considered the pedagogy of how to teach and evaluate students on squishy skills like this.

Knowing a bunch of facts about something is a world apart from structuring a compelling in-depth argument about it.


In the simplest case, where we'll say the exam question was precisely the topic of the 20 page paper, the candidate would be golden. Of course, it's unlikely in a 3 hr. exam that you'll be asked to write a 20 page response; but in edited form, you could definitely produce three cogent pages about some particular aspect of the original paper - if you've done the work. If you truly wrote the 20 page paper, you can surely produce three literate, cogent, responsive and topical pages.


I think schools need to set up additional, new proctoring sessions for this type of work. This will likely be something they have to hire for. A student can come and work for four hours, then hand in their in-progress draft and leave, then return later to finish it. (And please for the love of god, let students do this on offline computers, don't make them handwrite everything!)


What stops a student from going home, asking ChatGPT to write a few bullet points on how to answer, and coming back the next day to do what ChatGPT told him to do?


They'd need a very good memory. You can't make cheating impossible; before ChatGPT they could have asked a friend for help.


This assumes that the assignments and the exam cover the same material. That's not always the case.


That would be really poor course design :-)


There are many disciplines in which students work on effectively distinct projects.

For example, the life-changingly-well-designed newswriting course I took in college assigned every single student a different story to spend several weeks reporting out so that we wouldn't all be out harassing the same poor people for interviews.


Genuinely interested. What was the final like? This seems more in the experimental science (ok, journalism) category. I may have to adjust my thinking to be more expansive and also include things like "vocational".


Grammar and AP style rules, iirc. (I may not. It's been enough years now. I did try and fail to find the syllabus in my box of five-star notebooks. We mostly used reporters notebooks for this class, and I took it over the summer. The materials are probably in a plastic bag somewhere...)


But there's no reason to expect that work to be graded. It should be a learning exercise which trains skills later tested under exam conditions.


Many students will simply not do these assignments. They should but they won’t. Continuous assessment partly solves this.


I think both of those experiments do a good job of demonstrating utility on a certain kind of task.

But this is cherry-picking.

In the grand scheme of the work we all collectively do, very few programming projects entail something even vaguely like generating an Nth HTML parser in a language that already has several wildly popular HTML parsers--or porting that parser into another language that has several wildly popular HTML parsers.

Even fewer tasks come with a library of 9k+ tests to sharpen our solutions against. (Which itself wouldn't exist without experts trodding this ground thoroughly enough to accrue them.)

The experiments are incredibly interesting and illuminating, but I feel like it's verging on gaslighting to frame them as proof of how useful the technology is when it's hard to imagine a more favorable situation.


> "it's hard to imagine a more favorable situation"

Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".

I think it's a strong enough example to disprove "they're an interesting phenomenon that people have convinced themselves MUST BE USEFUL ... either through ignorance or a sense of desperation". Not enough to claim they are always useful in all situations or to all people, but I wasn't trying for that. You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim. And I don't think you can. He isn't hyping an AI startup, he has no profit motive to delude him. He isn't a non-technical business leader who can't code being baffled by buzzwords. He isn't new to LLMs and wowed by the first thing. He gave a conference talk showing that LLMs cannot draw pelicans on bicycles so he is able to admit their flaws and limitations.

> "But this is cherry-picking."

Is it? I can't use an example where they weren't useful or failed. It makes no sense to try and argue how many successes vs. failures, even if I had any way to know that; any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden. If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?


> Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".

Chuffed you picked this example to ~sneer about.

There's a near-infinite list of problems one can solve with a hammer, but there are vanishingly few things one can build with just a hammer.

> You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim.

I don't have to do any such thing.

I said the experiments were both interesting and illuminating and I meant it. But that doesn't mean they will generalize to less-favorable problems. (Simon's doing great work to help stake out what does and doesn't work for him. I have seen every single one of the posts you're alluding to as they were posted, and I hesitated to reply here because I was leery someone would try to frame it as an attack on him or his work.)

> Is it? I can't use an example where they weren't useful or failed.

  https://en.wiktionary.org/wiki/cherry-pick

  (idiomatic) To pick out the best or most desirable items
  from a list or group, especially to obtain some advantage
  or to present something in the best possible light. 

  (rhetoric, logic, by extension) To select only evidence which supports an argument, 
  and reject or ignore contradictory evidence.

> any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden.

This smells like sleight of hand.

I'm happy to grant this (with a caveat^) if your point is that this success proves LLMs can build an HTML parser in a language with several popular source-available examples and thousands of tests (and probably many near-identical copies of the underlying HTML specs as they evolve) with months of human guidance^ and (with much less guidance) rapidly translate that parser into another language with many popular source-available answers and the same test suite. Yes--sure--one example of each is proof they can do both tasks.

But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses.

^Simon, who you noted is not ignorant about LLMs and programming, was clear that the initial task of getting an LLM to write the first codebase that passed this test suite took Emil months of work.

> If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?

The only part of this that appears to have been done for about $30 was the translation of the existing codebase. I wouldn't argue that accomplishing this task for $30 isn't impressive.

But, again, this smells like sleight of hand.

We have probably plumbed billions of sinks (and hopefully have billions or even trillions more to go), so any automation that can do one for $30 has clear value.

A world with a billion well-tested HTML parsers in need of translation is likely one kind of hell or another. Proof an LLM-based workflow can translate a well-tested HTML parser for $30 is interesting and illuminating (I'm particularly interested in whether it'll upend how hard some of us have to fight to justify the time and effort that goes into high-quality test suites), but translating them obviously isn't going to pay the bills by itself.

(If the success doesn't generalize to less favorable situations that do pay the bills, this clearly valuable capability may be repriced to better reflect how much labor and risk it saves relative to a human rewrite.)


> "Yes--sure--one example of each is proof they can do both tasks."

Therefore LLMs are useful. Q.E.D. The claim "people who say LLMs are useful are deluded" is refuted. Readers can stop here, there is no disagreement to argue about.

> "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."

Not exactly; it's common to see people dismiss internet claims of LLMs being useful. Here[1] is a specific dismissal that I am thinking of where various people are claiming that LLMs are useful and the HN commenter investigated and says the LLMs are useless, the people are incompetent, and others are hand-writing a lot of the code. No data is provided for readers to make any judgement one way or the other. Emil taking months to create the Python version could be dismissed this way as well, assuming a lot of hand-writing of code in that time. Small scripts can be dismissed with "I could have written that quickly" or "it's basically regurgitating from StackOverflow".

Simon Willison's experiment is a more concrete example. The task is clearly specified, not vague architecture design. The task has a clear success condition (the tests). It's clear how big the task is and it's not a tiny trivial toy. It's clear how long the whole project took and how long GPT ran for, there isn't a lot of human work hiding in it. It ran for multiple hours generating a non-trivial amount of work/code which is not likely to be a literal example regurgitated from its training data. The author is known (Django, Datasette) to be a competent programmer. The LLM code can be clearly separated from any human involvement.

Where my GP was going is that the experiment is not just another vague anecdote, it's specific enough that there's no room left for dismissing it how the commenter in [1] does. It's untenable to hold the view that "LLMs are useless" in light of this example.

> (repeat) "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."

The example is not proof that these things can do anything else, but why would you assume they can't do tasks of similar complexity? Through time we've gone from "LLMs don't exist" to "LLMs exist as novelties and toys (GPT-1 2018)" to "LLMs might be useful but might not be". If things keep progressing we will get to "LLMs are useful". I am taking the position that we are past that point, and I am arguing that position. We are definitely into the time "they are useful". Other people have believed that for a long time. Not just useful for that task, but for tasks of that kind of complexity.

Sometime between GPT-1 babbling (2018) and today (Q4 2025) the GPTs and the tooling improved from not being able to do this task to yes being able to do this task. Some refinement, some chain of thought, some enlarged context, some API features, some CLI tools.

Since one can't argue that LLMs are useless by giving a single example of a failure, to hold the view that LLMs are useless, one would need to broadly dismiss whole classes of examples by the techniques in [1]. This specific example can't be dismissed in those ways.

> "If the success doesn't generalize to less favorable situations that do pay the bills"

Most bill-paying code in the world is CRUD, web front end, business logic, not intricate parsing and computer science fundamentals. I'm expecting that "AI slop" is going to be good enough for managers no matter how objectionable programmers find it. If I order something online and it arrives, I don't care if the order form was Ruby on Rails emailing someone who copied the order docs into a Google Spreadsheet using an AI generated If This Then That workflow, and as long as the error rate and credit card chargeback rate are low enough, nor will the company owners. Even though there are tons of examples of companies having very poor systems and still being in business, I don't have any specific examples so I wouldn't argue this vehemently - but the world isn't waiting for LLMs to be as 'useful' as HN commenters are waiting for, before throwing spaghetti at the wall and letting 'Darwinian Natural Selection' find the maximum level of slop the markets will tolerate.

----

On that note, a pedantic bit about cherry-picking: there's a difference between cherry-picking as a thing, and cherry-picking as a logical fallacy / bad-faith argument. e.g. if someone claims "Plants are inedible" and I point to cabbage and say it proves the claim is false, you say I'm cherry-picking cabbage and ignoring poisonous foxgloves. However, foxgloves existing - and a thousand other inedible plants existing - does not make edible cabbage stop existing. Seeing the ignored examples does not change the conclusion that "plants are inedible" is false, so ignoring those things was not bad. Similarly "I asked GPT5 to port the Linux kernel to Rust and it failed" does not invalidate the html5 parser port.

Definition 2 is bad form; e.g. saying "smoking is good for you, here is a study which proves it" is a cherry-picking fallacy because if the ignored-studies were seen, they would counter the claim "smoking is good for you". Hiding them is part of the argument, deceptively.

"LLMs are useless and only a deluded person would say otherwise" is an example of the former; it's countered by a single example of a non-deluded person showing an LLM doing something useful. It isn't a cherry-picking fallacy to pick one example because no amount of "I asked ChatGPT to port Linux to Rust and it failed" makes the HTML parser stop existing and doesn't change the conclusion.

[1] https://news.ycombinator.com/item?id=45560885


I've been tasked with doing a very superficial review of a codebase produced by an adult who purports to have decades of database/backend experience with the assistance of a well-known agent.

While skimming tests for the python backend, I spotted the following:

    @patch.dict(os.environ, {"ENVIRONMENT": "production"})
    def test_settings_environment_from_env(self) -> None:
        """Test environment setting from env var."""
        from importlib import reload

        import app.config

        reload(app.config)

        # Settings should use env var
        assert os.environ.get("ENVIRONMENT") == "production"

This isn't an outlier. There are smells everywhere.
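For anyone skimming: the smell is that the only assertion checks the environment variable `patch.dict` just set, so the test passes no matter what `app.config` does. A minimal sketch of a test that exercises the settings object instead, assuming a hypothetical `Settings` class that reads `ENVIRONMENT` when constructed and falls back to `"development"` (the real `app.config` isn't shown, so both the class and the default are my stand-ins):

```python
import os
from dataclasses import dataclass, field
from unittest.mock import patch


# Hypothetical stand-in for whatever app.config exposes; not the real code.
@dataclass
class Settings:
    environment: str = field(
        default_factory=lambda: os.environ.get("ENVIRONMENT", "development")
    )


def test_settings_environment_from_env() -> None:
    """Assert on the settings object, not on os.environ itself."""
    with patch.dict(os.environ, {"ENVIRONMENT": "production"}):
        assert Settings().environment == "production"


def test_settings_environment_default() -> None:
    """With the variable absent, the assumed default should apply."""
    with patch.dict(os.environ, {}, clear=True):
        assert Settings().environment == "development"
```

Either test fails if the settings stop reading the variable, which is the property the original never checks.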


If it is so obvious to you that there is a smell here then an agent would have caught it. Try it yourself.



This is the kind of dismissive sneer the HN guidelines advise against.

You can write dev docs for humans and still want machine readability (without caring about whether some LLM can make sense of the docs).

Machine readability is how you repurpose your own documentation in different contexts. If your documentation isn't machine readable it might as well be in a .doc(x) file.


There's quite a lot of pricing data available for the energy market and it might be possible to approximate battery profitability by rerunning normal and long-tail history.

See https://www.ercot.com/mktinfo/prices and https://www.ercot.com/gridmktinfo/dashboards and https://www.ercot.com/gridmktinfo/dashboards/energystoragere... for example.
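As a very rough sketch of what "rerunning history" could look like: take one day of hourly prices, charge once at the cheapest hour and discharge once at a later, pricier hour. The function name, the 1 MWh capacity, the 85% round-trip efficiency and the sample prices are all placeholders of mine, not ERCOT data, and a real model would also need degradation, fees, multiple cycles and imperfect foresight.

```python
from typing import Sequence


def best_single_cycle_profit(
    prices: Sequence[float],
    capacity_mwh: float = 1.0,
    round_trip_efficiency: float = 0.85,
) -> float:
    """Best profit from one charge followed by one later discharge.

    prices are $/MWh in chronological order. Returns 0.0 if no trade pays.
    """
    best = 0.0
    cheapest_so_far = float("inf")
    for price in prices:
        # Buy capacity_mwh at the cheapest earlier price, sell what survives losses.
        profit = capacity_mwh * (price * round_trip_efficiency - cheapest_so_far)
        best = max(best, profit)
        # Update after computing profit so charge and discharge use different hours.
        cheapest_so_far = min(cheapest_so_far, price)
    return best


# Made-up prices: one ordinary day and one long-tail day with a scarcity spike.
ordinary_day = [22.0, 18.0, 25.0, 41.0, 30.0]
spike_day = [25.0, 20.0, 60.0, 3000.0, 150.0]
total = sum(best_single_cycle_profit(day) for day in (ordinary_day, spike_day))
```

Even with toy numbers, the spike day dominates the total, which is why the long-tail history matters so much for this kind of estimate.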

