Hacker News | c1b's comments

You can enable it in settings; works on my older iPhone.

Yes, it just collapses eventually and never stabilizes. The training process is flawed; I suspect it has to do with the fact that some weights blow up over time, which you can see in the “weights” tab.

But at around 4K avg score you should see it solve the env almost every time.

Just a demo :) optimized for speed over stability.

Reward structure: Step: -1, Dot: +100, Win: +1000, so ~4k is the max theoretical score on 6x6.
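As a rough check on that ~4k figure, here is a minimal sketch of the score arithmetic. The reward values come from the comment above; the function names and the dot count of 30 are hypothetical, since the thread does not say how many of the 36 cells actually hold a dot.

```python
# Reward values as quoted in the comment above.
STEP_REWARD = -1
DOT_REWARD = 100
WIN_REWARD = 1000


def episode_score(dots_eaten: int, steps_taken: int, won: bool) -> int:
    """Total score for one episode under the quoted reward structure."""
    score = dots_eaten * DOT_REWARD + steps_taken * STEP_REWARD
    if won:
        score += WIN_REWARD
    return score


def max_score(num_dots: int) -> int:
    """Upper bound: every dot eaten, exactly one step per dot, and a win."""
    return episode_score(num_dots, num_dots, won=True)


# Assumption (not from the thread): about 30 of the 36 cells hold a dot.
print(max_score(30))  # 3970
```

With a different dot count the bound shifts by 99 per dot, so the "~4k" ceiling is only as precise as the assumed layout.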


maybe because it doesn't understand "done"? perfect play is impossible, random variance will cause scores to drop even if the model plays well and "wins". feels like it would get stuck in a loop trying to improve what can't be improved.


The optimizer doesn't need to understand anything; it's just an iterated mathematical construct. The author simply didn't bother to implement the necessary details to ensure numerical stability.
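The thread does not say which stability details are missing. One common measure against exploding weights is clipping gradients by their global norm before each update. The sketch below is a generic illustration in plain Python, not the demo author's code; `clip_by_global_norm` and the sample gradients are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their combined L2 norm
    does not exceed max_norm. Gradients already within the limit are
    returned unchanged (as copies)."""
    total = math.sqrt(sum(g * g for layer in grads for g in layer))
    if total <= max_norm:
        return [list(layer) for layer in grads]
    scale = max_norm / total
    return [[g * scale for g in layer] for layer in grads]


# Hypothetical gradients for two layers; their global norm is 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped = clip_by_global_norm(grads, max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for layer in clipped for g in layer))
print(round(clipped_norm, 6))  # 1.0
```

Clipping keeps the update direction and only shrinks its magnitude, which is why it is often tried first when weights diverge. Whether it would fix this particular demo is an open question.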

Alternatively it might be a problem with the scoring model in the end game.


That is what I thought OP was saying when he used the word "understood". No need to jump on people using everyday language that is still easily understood in context IMO.

feels like it would get stuck in a loop trying to improve what can't be improved.

That is the point, there is nothing on an intention that we cannot improve, the goal here is no more than 1 unique iteration of the same path


Daniel, you're a legend, thanks for all you do!

One question: I see perf comparisons here are done on an L4, but isn't this SKU very rare? I'm used to T4 at that tier.


Thanks!! Oh Colab provides L4s - but the benchmarks are similar for T4!

In fact Unsloth is the only framework afaik that fits on a T4 for finetuning with reasonable sequence lengths!


So o1 pro is CoT RL and o3 adds search?


How does o3 know when to stop reasoning?


It thinks hard about it


It has a bill counter.


does this also already exist on llvm toolchain?


Not really. LLVM does support compute mode SPIR-V, and _very_ recently graphics-mode SPIR-V which is being dogfooded by the highly WIP HLSL front-end in Clang (I couldn't get a trivial fragment shader to build with it a few weeks ago). Clang does not support compiling graphics-mode SPIR-V from C or C++.

This seems similar to the Vulkan Clang Compiler (which is not in-tree of LLVM), although it has some interesting differences such as implementing shader functions with new specifiers like `__hcc_vertex` rather than repurposing attributes for it like `[[clang::annotate("shady::entry_point::vertex")]]`.


The Microsoft HLSL compiler for dx12+ is based on LLVM and supports spir-v output https://github.com/microsoft/DirectXShaderCompiler

Though my understanding is that it's a complete hackjob stuck on an old version of LLVM, not in a remotely upstream-able state, and it breaks other parts of the LLVM toolchain completely. I think there was talk about trying to clean that up at the same time as bringing the baseline version forward, but I have no idea about its progress.


The progress is that DirectX will also support SPIR-V, going forward.

https://devblogs.microsoft.com/directx/directx-adopting-spir...


There seems to be documentation about it:

https://llvm.org/docs/SPIRVUsage.html


Hi Francois, I'm a huge fan of your work!

In projecting ARC challenge progress with a naive regression from the latest cycle of improvement (from 34% to 54%), it seems that a plausible estimate as to when the 85% target will be reached is sometime between late 2025 & mid 2026.
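The naive projection can be sketched as a straight-line extrapolation. The 34%, 54%, and 85% figures are from the comment; the function name is hypothetical, and the length of one "cycle" is not stated in the thread, so the result is expressed in cycles rather than dates.

```python
def cycles_to_target(previous: float, current: float, target: float) -> float:
    """Number of further cycles needed to reach target, assuming each
    cycle repeats the gain seen in the latest one (linear extrapolation)."""
    gain = current - previous
    if gain <= 0:
        raise ValueError("no improvement in the latest cycle; cannot extrapolate")
    return (target - current) / gain


# Scores in percent, as quoted above.
print(cycles_to_target(34, 54, 85))  # 1.55
```

Converting the cycle count into a calendar estimate depends on when the latest cycle ended and how long a cycle lasts, neither of which is given here.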

Supposing ARC challenge target is reached in the coming years, does this update your model of 'AI risk'? // Would this cause you to consider your article on 'The implausibility of intelligence explosion' to be outdated?


This roughly aligns with my timeline. ARC will be solved within a couple of years.

There is a distinction between solving ARC, creating AGI, and creating an AI that would represent an existential risk. ARC is a stepping stone towards AGI, so the first model that solves ARC should have taught us something fundamental about how to create truly general intelligence that can adapt to never-seen-before problems, but it will likely not itself be AGI (due to being specialized in the ARC format, for instance). Its architecture could likely be adapted into a genuine AGI, after a few iterations -- a system capable of solving novel scientific problems in any domain.

Even this would not clearly lead to "intelligence explosion". The points in my old article on intelligence explosion are still valid -- while AGI will lead to some level of recursive self-improvement (as do many other systems!) the available evidence just does not point to this loop triggering an exponential explosion (due to diminishing returns and the fact that "how intelligent one can be" has inherent limitations brought about by things outside of the AI agent itself). And intelligence on its own, without executive autonomy or embodiment, is just a tool in human hands, not a standalone threat. It can certainly present risks, like any other powerful technology, but it isn't a "new species" out to get us.


ARC as a stepping-stone for AGI? For me, ARC has lost all credibility. Your white paper that introduced it claimed that core knowledge priors are needed to solve it, yet all the systems that have any non-zero performance on ARC so far have made no attempt to learn or implement core knowledge priors. You have claimed at different times and in different forms that ARC is protected against memorisation-based Big Data approaches, but the systems that currently perform best on ARC do it by generating thousands of new training examples for some LLM, the quintessential memorisation-based Big Data approach.

I too, believe that ARC will soon be solved: in the same way that the Winograd Schema Challenge was solved. Someone will finally decide to generate a large enough dataset to fine-tune a big, deep, bad LLM and go to town, and I do mean on the private test set. If ARC was really, really a test of intelligence and therefore protected against Big Data approaches, then it wouldn't need to have a super secret hidden test set. Bongard Problems don't and they still stand undefeated (although the ANN community has sidestepped them in a sense, by generating and solving similar, but not identical, sets of problems, then claiming triumph anyway).

ARC will be solved and we won't learn anything at all from it, except that we still don't know how to test for intelligence, let alone artificial intelligence.

The worst outcome of all this is the collateral damage to the reputation of symbolic program synthesis which you have often name-dropped when trying to steer the efforts of the community towards it (other times calling it "discrete program search" etc). Once some big, compensating, LLM solves ARC, any mention of program synthesis will elicit nothing but sneers. "Program synthesis? Isn't that what Chollet thought would solve ARC? Well, we don't need that, LLMs can solve ARC just fine". Talk about sucking out all the air from the room, indeed.


Wow, you're the most passionate hater of ARC that I've seen. Your negativity seems laughably overblown to me.

Are there benchmarks that you prefer?


This might be useful to you: if you want to have an interesting conversation, insulting your interlocutor is not the best way to go about it.


I don't think they are insulting anyone, I think they're just asking for numbers.


What numbers?


CSGO model is only 1.5 gb & training took 12 days on a 4090

https://github.com/eloialonso/diamond/tree/csgo?tab=readme-o...


Thanks, that's the detail I was looking for on the training. It's amazing that results like this can be achieved at such a low cost! I thought this kind of work was out of reach for the GPU poor.

The part about the continuous control still seems weird to me though. If anyone understands that, I'd be very interested to hear more.


What is this for exactly? Go should be used serverside, and Qt looks like it's for UIs on embedded devices?


Qt is a large framework and also includes e.g. a large network module which supported async event-based communication long before there was Go. But it's unlikely that a Go application uses this, because Go has its own network library and even native language support for asynchronous communication (buffered channels). But Qt also has cross-platform user interface features, even with 3D and OpenGL support, which might be useful for people using Go on the desktop.


It's simple -- you see a justine.lol URL, you click immediately

