Hacker News | samgutentag's comments

"We built it, it worked sometimes, so we threw it all out" is either the most honest engineering blog post I've read in a while or the plot summary of every project I've ever worked on.

It feels like the teams getting AI workflows right are the ones willing to iterate toward simplicity, figuring out what they're uniquely good at and letting the rest of the ecosystem handle everything else.

That loop of build, learn, simplify is quietly producing better products than "build the whole thing" ever did.


Tangential, but this reminds me of the backlash Clair Obscur got over the AI assets in its early development. https://www.polygon.com/clair-obscur-expedition-33-indie-gam...

Feels like a no-brainer at this point to "build fast with AI, prove its use, then tear parts of it down and make it good".


Just wanted to flag that the little "jump back to where I was reading" links on the footnotes are a feature I'll be implementing and using on every footnote I ever write for the rest of my life now. Thank you!


"Law Enforcement Officer"


These days, the take-home needs to very clearly be an example project/feature/whatever.

If it even remotely smells like "real work" I'd be inclined to pass. I don't need to do work for free, or use my tokens for them either.


"Generate an SVG of a pelican riding a bicycle" is the new "but can it run Crysis"


mitochondria is the powerhouse of the cell


How the heck did I even post this on this thread? God, I've got too many tabs open.


"English spelling has a reputation. And it’s not a good one." Never have I ever agreed with anything more.

Different hill, but one I would die on: the letter "c" should make the "ch" sound, since otherwise it serves no purpose not already handled by "s" or "k".


https://guidetogrammar.org/grammar/twain.htm

  For example, in Year 1 that useless letter "c" would be dropped to be replased either by "k" or "s", and likewise "x" would no longer be part of the alphabet.

  The only kase in which "c" would be retained would be the "ch" formation, which will be dealt with later.

  Year 2 might reform "w" spelling, so that "which" and "one" would take the same konsonant, wile Year 3 might well abolish "y" replasing it with "i" and iear 4 might fiks the "g/j" anomali wonse and for all.

  Jenerally, then, the improvement would kontinue iear bai iear with iear 5 doing awai with useless double konsonants, and iears 6-12 or so modifaiing vowlz and the rimeining voist and unvoist konsonants.

  Bai iear 15 or sou, it wud fainali bi posibl tu meik ius ov thi ridandant letez "c", "y" and "x" -- bai now jast a memori in the maindz ov ould doderez -- tu riplais "ch", "sh", and "th" rispektivli.

  Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.


> fiks the "g/j" anomali wonse and for all

But we'd still be arguing about how to pronounce "ᵹif"


We'll just make it g'jif


I remember a version which ends with how we'll end up speaking German.


The nice thing about this passage is that it reflects the extent of Twain's non-rhotic dialect: he keeps the R in "year"/"years", "orthographical", and "world" but drops it in "after", "letters", and "dodderers". So it's only dropped in final unstressed syllables of multi-syllable words.


Recommend X for the ‘sh’ sound, as it is pronounced that way in languages like Portuguese. Y is a common typographical substitute for theta/thorn, as in “ye olde shoppe.”


Or X -> ch, as in Greek, and footballers called Xavi?


Xavi is Catalan (a shorter form of the name Xavier), and in Catalan "x" has exactly the "sh" sound. To get the "ch" sound you need to use "tx". And yes, most people - even natives - pronounce Xavi badly, due to Castilian influence on Catalan and the lack of the "sh" sound in Castilian.


> most people - even natives - pronounce Xavi badly, due to...the lack of the "sh" sound in Castilian

Catalans seem to pronounce "caixa" fine, so I think they _could_ say "Shabi"... But this does back up your larger point about "x" -> "sh" in Catalan.


Yes, Catalans have no problem with the "x" :) it's just the name that gets mispronounced, due to the Castilian overlap. I think that in "caixa" the "i" before the "x" also makes it easier for Castilians (although it's funny to hear them pronounce it). There is also the fact that speakers of both have serious issues with words starting with "s" + consonant, so my theory is that "shavi" is also affected, while "chavi" is far easier.


I wonder how the Castilians pronounce "xaile" now.


There's no /x/ phoneme in modern English, so it's unneeded in English spelling.


By the way the source was a Mr Shield's letter to the Economist rather than Twain https://web.archive.org/web/20200311221105/http://www.letter...


There are a lot of things Mark Twain didn't say.


On the whole, most things that have been said were not said by Twain.


I'm convinced that this is Just The Right Thing To Do. Like ridiculously strong benefits, and practically no drawbacks at all.


English's spelling irregularities help with disambiguating homophones:

  cent / sent / scent
  ceiling / sealing
  cite / sight / site
  colonel / kernel
  carrot / karat
  cue / queue


Which, of course, does not help with things like Polish polish (made in Warsaw) and to produce produce (pull apples out of a bag). However you look at it, when they set up English words and spelling there were large quantities of alcohol involved.


Also read (present tense) and read (past tense) being pronounced differently despite the same spelling.


And present present (to pull a gift out of, well, a bag)


Only in writing. The disambiguation is already needed when spoken and the context does this.


If you look up these words in the dictionary, the same word with the same spelling very often has several different definitions that are often unrelated because homographs (same spelling, but different meaning) are super-common in English. Dictionaries don't account for newer or more niche meanings of words either.

How is it that you can say these words without confusion?

Language is context sensitive and you understand the word based on the context around it. Likewise, you understand homographs based on the context. Because of this, spelling isn't as important as it might appear.


On paper, yes. But not when someone speaks. If you used a homophone while speaking, the listener would be able to distinguish which variant the talker intended based on context. I would argue this is enough of a reason for written text as well.


Some other languages do the same with diacritics.

Most don't bother because context is nearly always sufficient.


And cause confusion with needless heterographs?

  practice / practise
  licence / license


"Ch" is a strange hill to die on. "Ch" has a mostly consistent pronunciation (eg chair, touch, chain, choke, recharge, etc) that no other letter combination does.

Exceptions to this are generally loan words, particularly from French (eg chaise, which sounds more like "sh"). Others are harder to explain. "Lichen" springs to mind. Yes it technically comes from Latin but we're beyond the time range to truly consider it a loan word.

There are also some "ch" words of Greek origin (IIRC) that could simply be replaced with "c" or "k" (eg chemistry, school).

"Kh" on the other hand I think is entirely loan words, particularly from Arabic. Even then we have names like "Achmed" that would more consistently be written as "Akhmed". "Khan" is obviously a loan word but I think time has largely reduced the pronunciation to "karn" rather than "kharn" if it ever was that.

But I can't think of a single "kh" word that's pronounced like "ch" in "chair".

"Sh" doesn't seem to crossover with any of these pronunciations.


In Dutch and German Ch is pronounced as 'g'.


> It’s full of silent letters, as in numb, knee, and honour. A given sound can be spelled in multiple ways (farm, laugh, photo), and many letters make multiple sounds (get, gist, mirage).

that last one is hardly fair - gist and mirage are french words. might as well complain about the silent letters in rendezvous or faux pas.


Almost every English word is French, except for the most important ones.


Touche


Call me a douche, but the e in "touche" is silent, whereas that in "touché" is voiced.


I was lazy, I didn't do the accent.


The food is French, the animal is Anglo-Saxon. At least English lacks compound words or whatever German calls those 30-character constructions.


> At least English lacks compound words or whatever German calls those 30-character constructions.

Not entirely true. English, as any other Germanic language, still likes to compound words to produce a new meaning, the main difference is that, as opposed to most other Germanic languages, spaces are usually retained in writing. But this is just a spelling difference, the underlying process is the same.

See https://en.m.wikipedia.org/wiki/Compound_(linguistics)


Does that mean that "compound word" counts as a single word? And how do I distinguish between "a" "compound" "word" and "a" "compound word"?


Depends on your definition of a word and how it relates to writing. It's not such a simple question, actually.

Let's consider "scheepskapitein", "Schiffskapitän" and "ship captain". All three are formed the exact same way and mean roughly the same thing, but it's customary in Dutch and German to spell it without a space and in English it's considered correct to have a space in between. Note that there are no spaces in speech; it's simply a writing convention. So, how many words are there in this example?


I don't know, I think German laymen have an unambiguous understanding. "der Schiffskapitän" = 2, "des Schiffes Kapitän" = 3

Sure, linguists can dissect everything and should, but how does the English layman perceive it?


"Cattle labeling meat labeling supervision task transfer act" is just as bad as Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz, English just gets to use spaces where German doesn't. The underlying construction is the same. (I definitely got that translation wrong)


Usually English will try to come up with a single, Latin-or-Greek-derived word for compound ideas like this, which is another bad habit.

So surgery is full of -ectomies instead of -cut-outs.


Medicine terms in German also use Latin or Greek, since this is the subject language, so this is a bad example.


English gets to use a sentence. It can be reworded any number of ways. I did a bit of quick googling and the clearest English I came up with for `Regulation (EC) No 1760/2000` is "Requirements for the Labelling of Minced Beef" which is a lot easier to process than Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz. The reason we split code over lines is the same reason we split sentences into words. Easier for the brain to parse.

I wonder do German brains work on a much longer context window because of the language?


> I wonder do German brains work on a much longer context window because of the language?

Maybe, but more due to the spelling of numbers and long sentences. Compound words are not an example of this, since Germans can parse these words just fine as different things. It just means that the lowest "tokenization" in everyday use is not the word, but subcomponents of them.

Do English native speakers "tokenize" expressions in words? Do you see it as '(labelling) (of) (minced)' or '(label)l(ing) (of) (minc)(ed)' ?

I can't speak for most Germans, but the algorithm I think I use is just greedy from left to right. This is also consistent with how mistokenization in common puns works, so I think this is common.

In primary school we were trained to recognize syllable boundaries. Is that just a German thing, or is this common in other countries? You need to know these for spelling and once you know these, separating word components becomes trivial.
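A rough sketch of the greedy left-to-right splitting described above, assuming a tiny hand-made lexicon (hypothetical, covering only words from this thread) and treating the linking "s" as its own entry, which is a simplification. At each position it takes the longest lexicon entry that matches and never backtracks.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon, just enough for the words in this thread.
# The linking "s" (Fugen-s) is treated as its own entry, which is a
# simplification; real segmenters model linking elements separately.
LEXICON: Set[str] = {
    "rinder", "kennzeichnung", "fleisch", "etikettierung", "überwachung",
    "aufgaben", "übertragung", "gesetz", "schiff", "kapitän",
    "rot", "kraut", "s",
}


def greedy_split(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Split `word` left to right, always taking the longest lexicon entry
    that matches at the current position. Returns None if it gets stuck."""
    word = word.lower()
    parts: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            # No lexicon entry starts at position i; greedy does not backtrack.
            return None
    return parts


print(greedy_split("Schiffskapitän"))  # ['schiff', 's', 'kapitän']
print(greedy_split("Rotkraut"))        # ['rot', 'kraut']
```

Because it commits to the longest match without looking ahead, greedy splitting can pick a wrong boundary, which is the kind of mis-tokenization the pun observation points at; a backtracking or dynamic-programming segmenter would explore the alternatives instead.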


a) the title of the regulation is not equivalent to the law (unsurprisingly), onestay42's translation is clunky but a lot closer

b) the official title of the law was "Gesetz zur Übertragung der Aufgaben für die Überwachung der Rinderkennzeichnung und Rindfleischetikettierung", so how again is it that English "gets to use a sentence" and German doesn't? German has the choice depending on context, sometimes having one word is convenient.


I'm not a German speaker. Why would someone use such a long word as a convenience?


I am. It is a semantic difference. Single entities get referred to by a single word. If you use a word group to describe it, it means you don't consider it a single "thing", but a "system" described by the relations of single "things".

The composed word also has a specific meaning that the same words with space between doesn't. For example "das rote Kraut" – "red herb" and "das Rotkraut" – "red cabbage". Also suppose "red cabbage" was grown in abnormal conditions, so it doesn't have the color pigments, it is still "red cabbage", but not "red" "cabbage". This is awkward to state in English, but no problem in German.


Maybe in speech they are similar, but not in writing. The underlying construction is as different as it can be. English puts " " between words, and German does not.


In Danish knee is 'knæ' and the K is pronounced very clearly. It's interesting that English speaking people have forgotten how to pronounce K before N, so the Danish king Knud became Canute.


But which "ch" sound? "Ch" as in "church" is just "tsh". "Ch" as in "charade" is just "sh".


Seconding this. C should be the ʃ sound, and then TC should be the "ch" in "church." The fact that there's no one letter for ʃ is the real tragedy.


Post-alveolar affricates are phonemic in English and deserve their own characters.

(To put it another way, most native speakers treat "ts" as two sounds but not "ch")

Luckily there are other wasted characters, like "x" and "q".


I imagine integrals would make a loud static-y burst of noise.


It's not an integral sign. It's U+0283 LATIN SMALL LETTER ESH.


No, crackle is the 5th derivative, not the integral.


I've played around with respelling quite a bit; one of the most difficult adaptations is forcing yourself to correctly use "dh" (few-but-common words, "thy", "either", "teethe") vs "th" (most words, "thigh", "ether", "teeth").

j -> dzh is more weird than anything.

Vowels, of course, are a cause of war between dialects; nobody can even agree how many there are.


I kinda wish English avoided Xh type digraphs because they screw up common borrowings like Thai. Sure, that's not strictly phonemic in English, but I think realistically given how readily English adopts foreign words into it without completely nativizing them phonemically, any orthography should strive to reflect that, meaning that combinations like "th" should have their obvious meanings that can be inferred by native speakers even if such a sequence never occurs in native words.

Esperanto has a nice trick where they reserve "x" as a modifier letter, so if you can't use diacritics you write "cx", "sx", "jx" etc; but it does not have a sound value of its own and can never occur by itself. We could extend this to "tx" and "dx" with obvious values, and also to vowels - "a" for /æ/ vs "ax" for /ɑ/, "i" for "ɪ" vs "ix" for /i/ etc. Using "j" the way it is today feels somewhat wasteful given how rare it is. In the x-system it would probably be best represented by "gx", and then we could have a saner use for "j" like all other Germanic languages do. Which would free up "y" so we could use it for the schwa.

One thing that occurred to me the other day is that "x" is also a diacritic, so we could just say that e.g. "sx" and "s͓" are the same thing. Then again from a purely utilitarian perspective a regular dot serves just fine and looks neater (and would be a nice homage to Old English even if ċ and ġ are really just a modern convention).
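For reference, the standard Esperanto x-system covers six letters: ĉ, ĝ, ĥ, ĵ, ŝ and ŭ are written "cx", "gx", "hx", "jx", "sx" and "ux". A minimal converter sketch for that existing convention only (the English extensions like "tx", "dx" or "ax" proposed above are not implemented here):

```python
import re

# Esperanto letters with diacritics, keyed by their base letter.
ACCENTED = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
# Reverse map: accented letter -> base letter.
PLAIN = {acc: base for base, acc in ACCENTED.items()}


def from_x_system(text: str) -> str:
    """Turn x-system digraphs ("cx", "Sx", "UX", ...) into accented letters."""
    return re.sub(r"([cghjsuCGHJSU])[xX]", lambda m: ACCENTED[m.group(1)], text)


def to_x_system(text: str) -> str:
    """Turn accented letters back into base letter + "x" (e.g. ŝ -> sx)."""
    return "".join(PLAIN[ch] + "x" if ch in PLAIN else ch for ch in text)


print(from_x_system("cxirkaux"))   # ĉirkaŭ
print(to_x_system("Ŝi manĝas"))    # Sxi mangxas
```

This only works cleanly because "x" never occurs as a letter in native Esperanto words; run on arbitrary text, the regex would also rewrite a foreign name such as "Linux".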

Vowels, yeah... I think it's pretty much impossible to do a true phonemic orthography for English vowels that is not dialect-specific. As in, either some dialects will have homographs that are not homonyms, or else other dialects will not have the ability to "write it as you speak it" because they'll need to use different letters for the same (to them) sound. In the latter case, it would become more of a morphological orthography. Which would still be a massive improvement if it's at least consistent.

OTOH if you look at General American specifically, and treat [ə] and [ʌ] as stress-dependent allophones, then you can get away with 9 vowel characters in total (ɪiʊuɛəoæɑ). That's pretty easy with diacritics.


Bring back þ!


The problem with þ is that it dates from a time when /ð/ vs /θ/ were allophones. That is, none of the minimal pairs I listed above existed (mostly due to more words having additional syllables - often, inflections at the end).


"ð" was also a thing at that time, so we can bring them both back and use them to distinguish.


I completely agree with you. I've taken an amateurish interest in linguistics over the past couple of years, and I've often thought that it might be a fun exercise to come up with a phonetic alphabet for the English language. Use the letter 'c' to represent /ch/, 'x' to represent /sh/, etc.

Maybe as a fun pet project someday!


Words of Latin origin are identifiable at a glance, and homonymic collisions are thereby avoided

-- Caeser, seizer of the day


"Caesar" was pronounced in Latin with a hard C, which is preserved in German ("Kaiser").


And v is pronounced like w, but people look at you funny if you pronounce vice versa "wikay wersa".


I made someone with a veni vidi vici tattoo when I told him that.


Changing "cube" to "kube" would just look like it's pronounced "koob" (e.g. rube, tube, lube), so we swap a minor spelling aggravation for a minor pronunciation edge case. Unless you want to go full kyube, but we're not putting that on the table.


Well, it would be a step backward in the right direction to go with spelling it 'kube' and pronouncing it 'koob'. That would hew to the original Greek. We'd also bring cybernetic back closer to kubernetes. And circle to kuklos. (Side note: It's another spelling "error" that we use 'y' in English to transliterate the Greek upsilon, which looks like 'Y' when capitalized, but is really a better match to 'u'. Hence, hyper and hypo instead of huper and hupo (like super and sub).)


kyube or kyoob would definitely be the way to go.

It's funny you use "tube" as an example though, as in my British accent I pronounce that as "chube", whereas I believe many Americans would use a "t" sound for that word. Not sure how you settle on a spelling in those cases.


Regional variations are available! I think the BBC would have had it pronounced tyoob. And don't Americans pronounce it "subway"?


In the north of England it is still commonly 'tyoob'.


Most Americans sadly never get to ride one anyway.


No but they do eat at them.


Why would it? "u" generally doesn't follow this pattern in English after "k" any more so than it does after "c".

That aside, what you describe is a distinction between yod-dropping and lack thereof, and whether and where it happens is highly dialect dependent.


This is an issue because vowel letters/digraphs are much more inconsistent than consonant letters/digraphs.


> as the letter "c" should make the "ch" sound

What’s the ch sound? My intuition from German class is that ch represents a throaty hhhh. Somehow that got spoiled into k in most English words.

Every c in Pacific Ocean is pronounced differently. C is a silly letter.


> My intuition from German class is that ch represents a throaty hhhh

If you mean the standard German from Germany, there are two variants. At the end of a syllable it is like you described (kind of throaty hhhh). For the beginning of syllables think of sh and open your mouth.


> My intuition from German class is that ch represents a throaty hhhh

It varies between dialects. Swiss German speakers tend to stick out to Germans because we pronounce the ch in a much scratchier way than is accepted in Standard German.


Trunk detects, quarantines, and eliminates flaky tests from your code base. Works with any language, any test runner, and any CI provider.


I totally get this side of things. I see the benefits of agentic coding for small tasks, minor fixes, or first drafts. That said, I don't understand the pseudo-tribalism around specific interfaces to what amounts to only a few models under the hood, and I worry about what it's doing for (or not doing for) junior devs.

Also, if we could get AI tooling to do the reviews for us reliably, I'd be a much happier developer.


The product makes notes from the transcriptions, so having this animation isn't wildly off base.

