Hey HN!
We just launched[0] our NFT search engine at https://fnftf.io.
NFT frauds, copies, forgeries, etc. are rampant. As one example, check out a search[1] for a random Bored Ape that shows about a dozen visually indistinguishable copies of the Ape. You can also expand the results to see the various, ummmm, “remixes” of the image submitted for search (and play with the filters).
With that out of the way, here’s how we built FNFTF:
- Blockchain... We run our own node infrastructure because the various node providers get really expensive really quickly when you need to index chains like we do. We have a custom Node backend written in TypeScript that handles this indexing (both historical and realtime) with HTTP and WebSocket connections to our various nodes. I could spend a lot of time talking about how challenging this first step is…
- We fetch and catalog all of the on-chain data, the metadata, and the actual content (wherever it may be). We do this in realtime as new NFTs are minted, sold, whatever. This has plenty of challenges too.
- We add the media content (currently all image and video formats) to our database via our perceptual hashing implementation.
- Search and comparison is the tricky one... Every so often we build an approximate nearest neighbors index for all of the content in the perceptual hashing database. This is then loaded in memory.
- The actual search comes in multiple passes. We first take submitted content and generate an abbreviated perceptual hash for it. We search the ANN index to get a first pass of results using various standard distance approaches. We then run that first pass through successively higher resolution perceptual hashes to narrow the results and generate the distance scores behind the percentage-of-content-match scoring.
- The backend for the hash and search steps is Python, powered by FastAPI.
- The API frontend is a Cloudflare Worker in Bundled mode. We currently use about 6ms of CPU time so we have plenty of room there.
- The fnftf.io page is Next.js with a lot of React components, generated statically and served via Cloudflare Pages.
- Speaking of Cloudflare, we use Workers to fetch the image results from our backend storage to resize, re-compress, and add our watermark. This is crucial because we want to provide result images but we definitely don’t want to further enable scammers.
- We cache search results in a CF Workers KV store for speedier follow-up searches and to enable search sharing on social media, etc. In terms of caching it’s not terribly effective because it relies on matching a hash of the search but it’s good for the sharing aspect.
- Our browser extension[2] is absolutely bare bones and enables two-click searches directly from about a dozen NFT marketplaces. All it does is get the image URL and launch a new tab to fnftf.io with the image URL as a parameter. Then we fetch that URL and do our thing.
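To make the multi-pass search above a bit more concrete, here is a minimal Python sketch of the general idea. It is not the FNFTF implementation: it assumes a simple average hash as the perceptual hash, uses a brute-force Hamming scan as a stand-in for the ANN index, and works on tiny grayscale grids instead of real images. All names (`average_hash`, `search`, `checkerboard`) and the thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # grayscale pixels, rows of equal length


def average_hash(pixels: Grid, size: int) -> int:
    """Block-average down to size x size cells, then emit one bit per cell:
    1 if the cell is brighter than the mean of all cells, else 0."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * h // size, (r + 1) * h // size)
            cols = range(c * w // size, (c + 1) * w // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def search(query: Grid, catalog: Dict[str, Grid], coarse_size: int = 4,
           fine_size: int = 8, coarse_max: int = 4) -> List[Tuple[str, float]]:
    q_coarse = average_hash(query, coarse_size)
    q_fine = average_hash(query, fine_size)

    # Pass 1: cheap, short hash. A real system would query a prebuilt ANN
    # index over precomputed hashes; this brute-force scan is a stand-in.
    candidates = [
        name for name, img in catalog.items()
        if hamming(q_coarse, average_hash(img, coarse_size)) <= coarse_max
    ]

    # Pass 2: longer hash on the survivors, turned into a percentage match.
    fine_bits = fine_size * fine_size
    scored = []
    for name in candidates:
        distance = hamming(q_fine, average_hash(catalog[name], fine_size))
        scored.append((name, 100.0 * (1 - distance / fine_bits)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def checkerboard(flip_first_block: bool = False, hi: int = 255, lo: int = 0) -> Grid:
    """16x16 toy image built from 4x4 blocks, optionally with one block altered."""
    img = [[hi if (x // 4 + y // 4) % 2 == 0 else lo for x in range(16)]
           for y in range(16)]
    if flip_first_block:
        for y in range(4):
            for x in range(4):
                img[y][x] = lo
    return img


query = checkerboard()
catalog = {
    "copy": checkerboard(hi=250, lo=5),            # same image, re-encoded
    "remix": checkerboard(flip_first_block=True),  # one region changed
    "unrelated": [[x * 16 for x in range(16)] for _ in range(16)],
}
results = search(query, catalog)
print(results)  # [('copy', 100.0), ('remix', 93.75)]
```

In this toy run the re-encoded copy matches at 100%, the altered image survives the coarse pass and scores lower on the fine pass, and the unrelated image is dropped before the expensive comparison ever runs.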
All in we have about 40TB of data, growing by the day as we index new content and add blockchains.
I’m the sole founder at Tovera and the only full-time employee. I’d love to hear what the HN community thinks about FNFTF.io!
[0] https://www.producthunt.com/posts/fight-nft-fraud
[1] https://fnftf.io/?results=00722cd4e7124f8aab052e31b14e301e37...
[2] https://chrome.google.com/webstore/detail/tovera/gcghgjemlna...
Is this satire or no? I seriously can't tell, this reads like something from a 90's hacker zine astroturf article directed at Unix vendors...