The original claim was that the new image generation is direct multimodal output, rather than a second model. People provided evidence from the product, including outputs of the model that indicate it is likely using a tool. It's very easy to confirm that that's the case in the API, and it's now widely discussed elsewhere.
It's possible the tool is itself just gpt4o, wrapped for reliability or safety or some other reason, but it's definitely calling out at the model-output level.
> It's possible the tool is itself just gpt4o, wrapped for reliability or safety or some other reason, but it's definitely calling out at the model-output level.
That's probably right. It allows them to just swap it in for DALL-E, reusing any tooling/features/infrastructure they have built up around image generation, and they don't have to update all their 4o instances to this model, which, who knows, may not be ready for other tasks anyway, or may be different enough to warrant testing before a rollout, or more expensive, etc.
Honestly it seems like the only sane way to roll it out if it is a multimodal descendant of 4o.
> EDIT: And googling the tool name I see it's already been widely discussed on Twitter and elsewhere
I am so confused by this thread.