
Really? What hardware was llama3 trained on?

Now realize there are 15 other models you haven't heard of, trained on the same brand's hardware, that didn't work out or weren't impressive enough to release.

You are literally describing Jevons paradox in action: the more efficient and practical you make the tech, the more of it we consume. And sure, inference is easy, but training is still done almost uniformly on Nvidia hardware, and the "software" is an incredibly non-trivial moat. AMD has been trying for literally 15 years and is on its third or fourth ground-up attempt to displace it; the most recent (ROCm) is over 5 years old at this point and still utterly non-competitive. Leave it to HN to imply that millions of person-hours of ecosystem building can basically be replicated in a long weekend.

(Also, a reminder that Llama 3 has a 70B model too… and it's still a lot better than the 8B one!)


