But OpenAI appears to have some sort of data moat. I doubt their model is the best in the world, but more/better data generally beats a better model, and GPT-4 definitely beats Claude, Bard, Ernie and the rest, probably because they curated the largest, highest-quality dataset. Maybe that moat doesn't last long, but if they have exclusive rights to some of that data through commercial agreements, that could be a more durable moat.
> But OpenAI appears to have some sort of data moat.
I'm willing to bet dollars to doughnuts that Google and Facebook have at least one, possibly two or more, orders of magnitude more latent training data to work with - not including Google's search index.
My uninformed opinion is that Google and Meta's ML efforts are fragmented - with lots of serious effort going into increasing existing revenue streams, while LLMs and the like are treated as hobby or R&D projects. OpenAI is putting all its effort into a handful of projects that go into a product they sell. The dynamics and headcounts will change if the LLM market grows into the billions.
> My uninformed opinion is that Google and Meta's ML efforts are fragmented -
It seems more likely that Google, at least, fell into the classic innovator's dilemma: stuck trying to apply incremental innovation to their current business model instead of seeking out an entirely different customer and market.
I got the impression that Google was running Bard on a smaller model with presumably cheaper inference costs. I imagine the unit economics of both Bard and ChatGPT are negative right now, and Google is trying to stay in the game without lighting too much money on fire.
Google and Facebook are not interested in pooling all their resources in order to build the next big thing. They are just interested in doing "enough" so that people keep using their platforms. The race is about how much time per day every person on the planet spends on either Google or Facebook/Instagram. It's about who is "the homepage of the internet". They just need to be good enough that traffic doesn't move off to ChatGPT.
I'm sure some people at Google and Meta were screaming at the top of their lungs to jump on the AI bandwagon before ChatGPT - but you know how things work in large companies.
They're not as good at innovating - that's why they acquire startups all the time. It's a blood transfusion.
Facebook.com already has decades' worth of natural-language text and audio/video from uploads and "live" sessions. That is a deep pool, and a wide one too, because Facebook probably has content in every currently spoken natural language, with the exception of those used exclusively by uncontacted peoples. That is a data moat.
I'd bet that there are any number of submarine startups out there sitting on top of full downloads of Common Crawl, archive.org, etc. who are only too happy to let OpenAI be the first penguin off the iceberg.
If OpenAI survives all the legal challenges, they'll just click "Go" and be in business in weeks to months.
If OpenAI gets smacked down, they haven't lost much.
There are probably also some submarine operations that are already doing, or have already done, the training. If OpenAI gets bankrupted for copyright violations, we'll just never hear from those.
The cost is peanuts compared to the potential profit. Apple/Google/Facebook could absolutely eat the costs for a skunkworks project to do that training and just sit quietly waiting on the yes/no from legal.
> submarine startups out there sitting on top of full downloads of Common Crawl, archive.org, etc. who are only too happy to let OpenAI be the first penguin off the iceberg.
Not sure any startup would be OK with sitting on a potentially game-changing model.
I was using Bard and ChatGPT in parallel, but lately I just default to Bard. To me it's the better model, with much more accurate answers, while ChatGPT just gives you bombastic words.