Hacker News

I worked in a research capacity in the voice assistant org of a big tech company until very recently. There was a lot of panic when ChatGPT came out, as it became clear that the vast bulk of the org's modeling work and research essentially had no future. I feel bad for some of my colleagues who were really specialized in specific NLP technology niches (e.g. building NLU ontologies) which have been made totally obsolete by these generalized LLMs.

Personally, I'm moving to more of a focus on analytical modeling. There is really nothing interesting about deep learning to me anymore. The reality is that any new useful DL models will be coming out of mega-teams at a few companies, where improving output through a detailed understanding of modeling is less cost-effective than simply increasing data quality and scale. It's all very boring to me.



“Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.”

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


This is a great read. It captures exactly what I felt the last time I trained a CNN -- it's not fun, and I don't get to feel clever. My brain isn't wired to give me a dopamine hit when the training does its job. It's just a, "wait, that's it?"

We will always want to do the discovery ourselves, and I can see why fighting that instinct is a challenge for those in the field.


Isn't the CNN a discovery in itself? Without it, we'd be following the bitter lesson and "leveraging computation" to throw more data / compute at an MLP.

Clearly someone felt that there'd be a better inductive bias and attempted something else, and now CNNs are what's used "in the long run".
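To make the inductive-bias point concrete, here's a rough back-of-the-envelope sketch (my own illustrative numbers, not from the thread): convolution's weight sharing and locality are themselves baked-in human knowledge, and they cut the parameter count of a single layer by orders of magnitude compared to a fully connected layer of the same output size.

```python
# Compare parameter counts for one layer processing a 32x32 RGB image:
# a 3x3 convolution (locality + weight sharing hard-coded) versus a
# fully connected layer producing the same-sized output.

def conv_params(in_ch, out_ch, k):
    """Conv layer: one k*k*in_ch kernel per output channel, plus one bias each."""
    return k * k * in_ch * out_ch + out_ch

def dense_params(in_features, out_features):
    """Fully connected layer: every input connected to every output, plus biases."""
    return in_features * out_features + out_features

h, w, in_ch, out_ch = 32, 32, 3, 64

conv = conv_params(in_ch, out_ch, k=3)               # 1,792 parameters
dense = dense_params(h * w * in_ch, h * w * out_ch)  # ~201 million parameters

print(f"conv: {conv:,}  dense: {dense:,}  ratio: {dense // conv:,}x")
```

The gap is the "better inductive bias" the parent is describing: someone chose to spend human domain knowledge (images are local and translation-invariant) rather than compute, and that choice is what made training feasible at the time.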


I've seen many interpretations of this article and I'm curious as to the mainstream CS reading of it.

One could look at the move from linear models to non-linear models, or the adoption of ConvNets (yes, I know ViTs exist; to my knowledge the base layers are still convolution layers), as 'leveraging human knowledge'. Only after those shifts were made did the leveraging of computation help. It would seem to me that the naive reading of that quote only rings true between breakthroughs.


See also the GPT-4 technical report, page 37.

https://images.app.goo.gl/vRP8368Z17zW2hvC9



What's analytical modeling?



