Yes it just collapses eventually — never stabilizes.
The training process is flawed, I suspect it has to do with the fact that some weights blow up over time, you can see in “weights” tab.
But at around 4K avg score you should see it solve the env almost every time.
Just a demo :) optimized for speed over stability.
Reward structure: Step: -1 Dot: +100 Win: +1000
so ~4k is max theoretical score on 6x6.
maybe because it doesn't understand "done"? perfect play is impossible, random variance will cause scores to drop even if the model plays well and "wins". feels like it would get stuck in a loop trying to improve what can't be improved.
The optimizer doesn't need to understand anything it's just an iterated mathematical construct. The author simply didn't bother to implement the necessary details to ensure numerical stability.
Alternatively it might be a problem with the scoring model in the end game.
That is what I thought op was saying when he used the word "understood". No need to jump on people using every day language that is still easily understood in context IMO.
Not really. LLVM does support compute mode SPIR-V, and _very_ recently graphics-mode SPIR-V which is being dogfooded by the highly WIP HLSL front-end in Clang (I couldn't get a trivial fragment shader to build with it a few weeks ago). Clang does not support compiling graphics-mode SPIR-V from C or C++.
This seems similar to the Vulkan Clang Compiler (which is not in-tree of LLVM), although it has some interesting differences such as implementing shader functions with new specifiers like `__hcc_vertex` rather than repurposing attributes for it like `[[clang::annotate("shady::entry_point::vertex")]]`.
Though my understanding is that it's a complete hackjob stuck on an old version of LLVM, and not in a remotely upstream-able state and breaks other parts of the LLVM toolchain completely. I think there was talk about trying to clean that up at the same time as bringing the baseline version forward, but I have no idea about it's progress.
In projecting ARC challenge progress with a naive regression from the latest cycle of improvement (from 34% to 54%), it seems that a plausible estimate as to when the 85% target will be reached is sometime between late 2025 & mid 2026.
Supposing ARC challenge target is reached in the coming years, does this update your model of 'AI risk'? // Would this cause you to consider your article on 'The implausibility of intelligence explosion' to be outdated?
This roughly aligns with my timeline. ARC will be solved within a couple of years.
There is a distinction between solving ARC, creating AGI, and creating an AI that would represent an existential risk. ARC is a stepping stone towards AGI, so the first model that solves ARC should have taught us something fundamental about how to create truly general intelligence that can adapt to never-seen-before problem, but it will likely not itself be AGI (due to be specialized in the ARC format, for instance). Its architecture could likely be adapted into a genuine AGI, after a few iterations -- a system capable of solving novel scientific problems in any domain.
Even this would not clearly lead to "intelligence explosion". The points in my old article on intelligence explosion are still valid -- while AGI will lead to some level of recursive self-improvement (as do many other systems!) the available evidence just does not point to this loop triggering an exponential explosion (due to diminishing returns and the fact that "how intelligent one can be" has inherent limitations brought about by things outside of the AI agent itself). And intelligence on its own, without executive autonomy or embodiment, is just a tool in human hands, not a standalone threat. It can certainly present risks, like any other powerful technology, but it isn't a "new species" out to get us.
ARC as a stepping-stone for AGI? For me, ARC has lost all credibility. Your white paper that introduced it claimed that core knowledge priors are needed to solve it, yet all the systems that have any non-zero performance on ARC so far have made no attempt to learn or implement core knowledge priors. You have claimed at different times and in different forms that ARC is protected against memorisation-based Big Data approaches, but the systems that currently perform best on ARC do it by generating thousands of new training examples for some LLM, the quintessential memorisation-based Big Data approach.
I too, believe that ARC will soon be solved: in the same way that the Winograd Schema Challenge was solved. Someone will finally decide to generate a large enough dataset to fine-tune a big, deep, bad LLM and go to town, and I do mean on the private test set. If ARC was really, really a test of intelligence and therefore protected against Big Data approaches, then it wouldn't need to have a super secret hidden test set. Bongard Problems don't and they still stand undefeated (although the ANN community has sidestepped them in a sense, by generating and solving similar, but not identical, sets of problems, then claiming triumph anyway).
ARC will be solved and we won't learn anything at all from it, except that we still don't know how to test for intelligence, let alone artificial intelligence.
The worst outcome of all this is the collateral damage to the reputation of symbolic program synthesis which you have often name-dropped when trying to steer the efforts of the community towards it (other times calling it "discrete program search" etc). Once some big, compensating, LLM solves ARC, any mention of program synthesis will elicit nothing but sneers. "Program synthesis? Isn't that what Chollet thought would solve ARC? Well, we don't need that, LLMs can solve ARC just fine". Talk about sucking out all the air from the room, indeed.
Thanks, that's the detail I was looking for on the training. It's amazing results like this can be achieved at such a low costs! I thought this kind of work was out of reach for the GPU poor.
The part about the continuous control still seems weird to me though. If anyone understands that then very interested to hear more.
Qt is a large framework and also includes e.g. a large network module which supported async event based communication long before there was Go. But it's unlikely that a Go application uses this, because Go has its own network library and even native language support for asynchronous communication (buffered channels). But Qt also has cross-platform user interface features even with 3D and OpenGl support, which might be useful for people using Go on the desktop.
reply