I was thinking of using it with DuckDB as well, but it seems it would be of limited benefit. Parquet objects are in the MBs, so they would be streamed directly from S3. With raw Parquet objects, it might help with S3 listing if you have a lot of them (shave a couple of seconds off the query). If you are already on DuckLake, DuckDB will use that to get the list of relevant objects anyway.
Maybe the OP is thinking of reading/writing DuckDB's native-format database files. Those require full filesystem semantics for writing. Unfortunately, even NFS or SMB are not sufficiently FS-like for DuckDB.
Parquet files are static and write-once (you only ever add new files), so DuckDB has no problems with those living on S3.
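For anyone curious, a minimal sketch of that pattern in Python (bucket, prefix, and column names are made up; assumes the httpfs extension and that S3 credentials are already configured, e.g. via a DuckDB secret or AWS environment variables):

    import duckdb

    # In-memory database; a native .duckdb file would need a real local filesystem.
    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")

    # DuckDB streams the immutable Parquet objects straight from S3 using
    # HTTP range requests; it never needs to rewrite them in place.
    rows = con.execute("""
        SELECT user_id, count(*) AS events
        FROM read_parquet('s3://my-bucket/events/*.parquet')  -- placeholder path
        GROUP BY user_id
        ORDER BY events DESC
        LIMIT 10
    """).fetchall()
    print(rows)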
Pre-compaction, the recent data can be in small files, and the delete markers will also be in small files. This would bring down fetch times, while DuckLake may already have many of the larger blocks in memory or disk cache.
Reading block headers for filtering means lots of small range reads; this could speed that up by 10x.
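To make the header-read point concrete, here's a sketch using DuckDB's parquet_metadata() function, which fetches only an object's footer and row-group statistics (small byte-range requests, no data pages); the path is a placeholder:

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")

    # Only the Parquet footer is fetched here -- the same min/max stats
    # DuckDB uses for filter pushdown, and the access pattern that gains
    # the most from lower per-request latency.
    meta = con.execute("""
        SELECT row_group_id, column_id, stats_min, stats_max
        FROM parquet_metadata('s3://my-bucket/events/part-0.parquet')
    """).fetchall()
    for row in meta[:5]:
        print(row)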
For files up to 100 kB in size, this should be effectively the same price as S3 for writes (I didn't check reads as closely, but writes/PUTs are always much more expensive than reads/GETs).
It would be really useful pre-compaction, and for dealing with the small-files issue without latency penalties.
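Back-of-envelope check of the PUT-vs-GET gap, using assumed S3 Standard us-east-1 list prices (treat the constants as placeholders; check the current pricing page before relying on them):

    # Assumed prices; may be out of date.
    PUT_PER_1K = 0.005   # USD per 1,000 PUT/POST/LIST requests
    GET_PER_1K = 0.0004  # USD per 1,000 GET requests

    n_objects = 1_000_000  # e.g. pre-compaction ~100 kB objects

    put_cost = n_objects / 1000 * PUT_PER_1K
    get_cost = n_objects / 1000 * GET_PER_1K
    print(f"PUTs: ${put_cost:.2f}, GETs: ${get_cost:.2f}, "
          f"ratio: {put_cost / get_cost:.1f}x")
    # -> PUTs: $5.00, GETs: $0.40, ratio: 12.5x (writes dominate request costs)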
When you say just "Cortex" it's ambiguous, as there are Cortex Search, Agents, Analyst, and Code.
Cortex Code is available via web and CLI. The web version is good. I've used the CLI and it's fine too, though I prefer the visuals of the web version when looking at data outputs. For writing code it is similar to Codex or Claude Code, though I gather it is more data-focused than the other options and has great hooks into your Snowflake tables. You could do similar things with Snowpark and, say, Claude Code. I find Snowflake's focus on personas more functional than purely technical, so Cortex Code fits well with that. If you want to do your own thing, you can use your own IDE and code agent, and then the Cortex Code CLI is just one option alongside Codex, Cursor, or Claude Code.
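If you go the own-IDE route, the Snowpark side is a few lines. A hedged sketch (connection parameters and the table/columns are placeholders, not anything Cortex provides):

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, sum as sum_

    # Placeholder credentials; use key-pair auth or SSO in practice.
    session = Session.builder.configs({
        "account": "my_account",
        "user": "my_user",
        "password": "***",
        "warehouse": "ANALYTICS_WH",
        "database": "SALES",
        "schema": "PUBLIC",
    }).create()

    # The same kind of table access a code agent would script against.
    top_regions = (
        session.table("ORDERS")                       # placeholder table
        .filter(col("ORDER_DATE") >= "2024-01-01")
        .group_by("REGION")
        .agg(sum_("AMOUNT").alias("TOTAL"))
        .sort(col("TOTAL").desc())
        .limit(5)
    )
    top_regions.show()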
We've (https://www.definite.app/) replaced quite a few Metabase accounts now, and we have a built-in lakehouse using DuckDB + DuckLake, so I feel comfortable calling us a "DuckDB-based Metabase alternative".
When I see the title here, I think "BI with an embedded database", which is what we're building at Definite. A lot of people want dashboards / AI analysis without buying Snowflake, Fivetran, and a BI tool and stitching them all together.
If this had happened before 4 PM Eastern, I would have been screwed on my main early-stage project. I guess it's time to move up the timeline on a real backend with failover.
yep, that's what Definite is for: https://www.definite.app/
All the data infra (datalake + ELT/ETL + dashboards) you need in 5 minutes.