Just adding to the praise: I have a little test case I've been using lately: identify the cause of a bug I'd encountered in a Dart library, given the entire codebase and a description of the bug. The whole thing is about 360,000 tokens.
I tried it a month ago on all the major frontier models and none of them correctly identified the fix. This is the first model to identify it correctly.
360k tokens = approximately how many lines of code?
And also, if it's an open-source lib, are you sure there's no mention of this bug anywhere on the web?
Not a huge library - around 32K LoC - and no mention of the bug on the web. I was the first to encounter it (it's since been fixed), so it shouldn't be in the training data unless that data is super recent.
Impressive. I tend to think it found the bug by itself, which is pretty crazy given it can't actually run or debug anything. Then again, I haven't seen the bug description; perhaps it makes it super obvious where the problem lies.
How do you use the model so quickly? Google AI Studio? Maybe I've missed how powerful that is... I didn't see any easy way to pass it a whole codebase!
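(Not the parent, but one low-tech way to do this is to just concatenate the source tree into a single text file and paste or attach it along with the bug description. A rough Dart sketch of that, assuming the library lives under lib/ and writing to a placeholder prompt.txt:)

```dart
import 'dart:io';

// Concatenate every .dart file under lib/ into one prompt.txt,
// prefixing each file's contents with its path so the model can
// tell the files apart.
void main() {
  final buffer = StringBuffer();
  final files = Directory('lib')
      .listSync(recursive: true)
      .whereType<File>()
      .where((f) => f.path.endsWith('.dart'));
  for (final file in files) {
    buffer
      ..writeln('// FILE: ${file.path}')
      ..writeln(file.readAsStringSync())
      ..writeln();
  }
  File('prompt.txt').writeAsStringSync(buffer.toString());
  print('Wrote ${buffer.length} characters to prompt.txt');
}
```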
Interesting. I've been asking it to generate some Dart code, and it makes tons of mistakes, including lots of invalid code (static errors). When I point out the mistakes, it thanks me and promises not to make them again, then makes them again on the very next prompt.