Yorhel was also the creator/developer of ncdu, among many other open source projects. He was a big open source advocate. The sites he hosted (vndb.org and manned.org) have automated database dumps and source code fully available. I recommend checking out his website: https://dev.yorhel.nl/
Archive.today's attack on https://gyrovague.com is still ongoing, btw. It started just over two months ago. Some IPs get through normally, but Finnish residential IPs, for example, get stuck on endless captchas. The JS snippet that starts spamming gyrovague appears after solving the first captcha.
I'm not a web developer, but I've picked up some bits of knowledge here and there, mostly from troubleshooting issues I encounter while using websites.
I know there are a number of headers used to control cross-site access to websites, and the linked blog post shows archive.today's denial-of-service script sending random queries to the site's search function. Shouldn't there be a way to prevent those from running when they're requested from within a third-party site?
You can't completely prevent the browser from sending the request—after all, it needs to figure out whether to block the website from reading the response.
However, browsers will first send a preflight request for non-simple requests before sending the actual request. If the DDOS were effective because the search operation was expensive, then the blog could put search behind a non-simple request, or require a valid CSRF token before performing the search.
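To make the CSRF-token idea concrete, here is a minimal sketch of a server-side gate in Python. Everything in it is an assumption for illustration: the names `SECRET_KEY`, `issue_csrf_token`, and `should_run_search` are hypothetical, and nothing here reflects how the blog's search actually works. The point is only that a third-party page can fire the request but cannot read the token, so the server can reject it cheaply before doing the expensive search.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side secret; a real deployment would load this from config.
SECRET_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token that the site embeds in its own search page for this session."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def should_run_search(session_id: str, submitted_token: Optional[str]) -> bool:
    """Return True only if the request carries the token issued for this session."""
    if submitted_token is None:
        return False
    expected = issue_csrf_token(session_id)
    # Constant-time comparison on bytes avoids leaking the token via timing.
    return hmac.compare_digest(expected.encode("utf-8"), submitted_token.encode("utf-8"))
```

A search handler would call `should_run_search` first and return an error straight away when it is False, so forged cross-site queries never reach the search backend. The requests still arrive, of course; this only makes each one cheap to reject.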
> I know there are a number of headers used to control cross-site access to websites
Mostly these headers are designed around preventing reading content. Sending content generally does not require anything.
(As a kind of random tidbit, this is why CSRF tokens are a thing: you can't prevent sending, so websites test to see if you were able to read the token in a previous request.)
This is partially historical. The rough rule is: if it was possible to make the request without JavaScript, then it doesn't need any special headers (preflight).
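As a rough illustration of that rule, here is a simplified Python sketch of how a browser decides whether a cross-origin request is "simple" or needs a preflight. It is an approximation written from memory of the CORS rules, not the full algorithm: the real specification has further conditions (header value limits, the `Range` header, and so on), and the function name `needs_preflight` is hypothetical.

```python
from typing import Dict

# Methods and headers that a plain HTML form or link could already produce.
SAFE_METHODS = {"GET", "HEAD", "POST"}
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SAFE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: Dict[str, str]) -> bool:
    """Approximate check: True if the request is not a 'simple' request."""
    if method.upper() not in SAFE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFE_HEADERS:
            return True
        if lname == "content-type":
            # Ignore parameters such as "; charset=utf-8" when comparing.
            mime = value.split(";", 1)[0].strip().lower()
            if mime not in SAFE_CONTENT_TYPES:
                return True
    return False
```

So a plain `GET` with a query string, like the search spam described in this thread, goes out without any preflight, while a `POST` with `Content-Type: application/json` or any custom header would trigger one first.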
One side publishes words, the other DDoSes. One side could just ignore the other and go about their business, the other cannot. One is using force, which naturally leads to resistance and additional attention, the other is not.
Both sides look like they have been bullied in the past and not found their way out of reproducing the pattern yet.
Words can have influence and can come from a place of authority, which does carry responsibility. The words of a president are very different from words published on a random blog by some random person, and different yet again from words published by a newspaper. Some of a president's words are opinion; the same words in a different context are commands, and not acting on them comes at a price.
Context matters. That is also why different rules apply, and laws exist to guard these rules. DDoS is not an acceptable response in any jurisdiction, no matter what triggered it. We’re not in the Middle Ages, even if some behave like we are. Violence does not justify violence. Unjust action does not justify unjust responses.
Ah yes, I can see the misunderstanding: I meant “acceptable” in a broader sense than just legally, but I can see how the use of “jurisdiction” implies law. It was not my intention to reference only the legality, but rather what is considered “violence” by society, where law is one level you can look at to get an idea.
Then, again: one person's illegal actions do not warrant another person's illegal actions. That’s not how society works, and not how law works.
> The blog is still online and only exists as a part of a harassment campaign targeting archive.today
The blog has a lot more posts on random topics. Why do you imply that the owner of the blog is part of a harassment campaign and that this is the "only" reason for this years-old blog to exist?
There are only two posts about archive.today on the blog, and one of them only exists because archive.today started DDoSing them. I fail to see how you could consider the entire blog to be a "harassment campaign", especially considering that the original blog post isn't even negative, it ends with a compliment towards archive.today's creator.
Okay, there's one filler post I missed. I'm sure it took a lot of time to write the 16739382nd post explaining what the various things on a boarding pass mean.
Attack? Did we read the same articles? One article is clearly defensive. The other is a piece of investigative journalism about who runs the site and how.
This is a weird way of saying that you wish gyrovague updated more frequently. You could just say “Big fan of his writing, I’d love it if he posted more” if your only complaint is that there aren’t enough recent blog posts on that website
The blog author is in Finland, so it's covered by the Article 8 right to privacy of the ECHR. The exact implementation is country-dependent; I don't know how it works in Finland, but in the UK we just extended the common law tort of "breach of confidence" to it.
That is very surprising to me. As far as I know, in Finland details of your income are publicly available, but someone reposting publicly available information is illegal.
While in hindsight I would also find it better to redact the names and details mentioned in the original article a bit, I can hardly find any real defamation in it. I guess you want to provide random unproven evidence if someone is a target of various foreign law enforcement and commercial sites.
In the article they even call for donations to archive.today. As far as I can tell, the tone of the post is full of admiration. The funny thing is that, IMHO, the rather childish JavaScript attack gives credibility to the post after all.
In all this, I somehow hope that we see a legal solution to this whole major global copyright crisis that has been reinforced by LLM training. (If you want a conspiracy theory: selling their snapshots would, I guess, be easy monetization for archive these days.)
Thinking about it, I think we might need better platform rules, maybe even regulations on this. There seems to be pretty much no line of defense, which might explain the rather desperate DoS. If you take anonymity as a right, discussions like ours here on HN are dangerous as well, as they make otherwise difficult-to-find knowledge easily visible. So while a single fan page might go unnoticed, in the case of doxing, amplification is also a problem. Just my spontaneous thought.
Edit: one afterthought. The story about hacking together a response to the GDPR takedown request with an LLM, quoting press rights and freedom of speech, actually shows the deeper problem. Rights come with obligations (at least ethical ones). At least in Europe, press standards are typically rather aware of doxing risks. While celebrities also successfully use legal defenses, I still think the defenses for activists are weak when it comes to balancing interests here (at least if you made something of public interest).
> The crucial context here is that archive.today provides a useful public service for free.
So public services should DDoS is your argument?
> Jani Patokallio runs gyrovague.net in order to harass people who provide useful public services.
I scrolled pretty far through the blog and didn't find anything of that sort. Just a bunch of travel stuff.
Now I'm curious what sort of "harassment" you hallucinated in the sites that were previously targeted by archive.today's DDoS attacks.
The person who runs archive.today decided to involve me, and every other visitor, in their dispute. They decided to use us to hurt someone else. That's a pretty big sin in my book.
By this logic, the Code Green worm is ethical; forcing a security patch upon users who didn’t install one is obviously Not Evil. And that’s why operating systems aren’t wrong to force security updates on their users using invisible phone-home systems that the users aren’t aware of: it’s a small sin that is entirely justified self defense for the users and the device maker. Clearly we should all be updated to iOS 26 without our consent.
The ‘small sin’ of wielding your userbase as a botnet is only palatable for HN’s readers because the site provides a desirable use to HN’s readers. If it were, say, a women’s apparel site that archived copies of Vogue etc. (which would see a ton of page views and much more effective takedown efforts!) and pointed its own DDoS of this manner at Hacker News, HN would be clamoring for their total destruction for unethical behavior with no such ‘it’s just a evil for so much good’ arguments.
Maintaining ethical standards in the face of desire for the profits of unethical behavior is something tech workers are especially untrained to do. Whether with Palantir or Meta or Archive.today, the conflict is the same: Is the benefit one derives worth compromising one’s ethics? For the unfamiliar, three common means of avoiding admitting that one’s ethics are compromised: “it’s not that bad”, “ethics don’t apply to that”, and “that’s my employer’s problem”. None of those are valid excuses to tolerate a website launching DDoS attacks from our browsers.
archive.today has a documented history of altering archived content; as such, they immediately lose the veil of protection of a "public good" service in my book.
Just my 2 ¢, not that it really matters anymore in this current information-warfare climate and polarization. :/
What do you want me to address? I'm just pointing out that there are no great archival services, and the only real alternative to archive.today is worse.
>Pay attention to this type of behavior, folks. It's revealing
People are painting this as a mutually exclusive ideological decision. Yet two things can be true:
1) The act of archive.today archiving stories (and thus circumventing paywalls) is arguably very low-level illegal (computer misuse, unauthorized access, etc.), but it is up for interpretation a) whether the operator or the person requesting the page carries the most responsibility and b) whether it's enforceable in third-party countries where neither archive.today nor the page requester resides
2) DDoSing a site that writes something bad about you is fundamentally wrong (and probably illegal too)
No, pschastain has malware on their computer. I just hit a ratelimit on another account I was using, and decided it'd be funny if I replied from their own account.
Not really sure if circumventing paywalls is that unlawful across the world, but basically copying and pasting an entire web page is just clear and simple copyright violation.
I know it's petty. But don't act surprised when you find your garbage strewn all over your lawn next morning after you flipped off your neighbor the fourth time.
Besides the article about archive.today, which doesn't expose much, I see one about Clash of Clans, and a random crypto product. Those are not 'public services', not sure how you can put these in the same bundle?
Archive.today being free doesn’t excuse them using their audience to DDoS someone they don’t like, nor does it excuse them modifying archived content. Also, documenting who funds a service is in the public interest.
> Jani Patokallio runs gyrovague.net in order to harass people who provide useful public services.
I mean...investigating who runs secretive yet popular websites is a useful public service, generally called "journalism". And your comments in this thread could be seen as an attempt to harass Jani.
I do not, to be clear, think you're doing anything morally wrong, but I'm also not sure I see how you can draw a bright line between your actions and Jani's. By the rather stretched logic and loose standards you've been using in these comments, it seems like you run your HN account to harass people who provide useful public services, no?
To be clear, if I have JavaScript blocked for archive.today (which is my default with NoScript; and really there is no site functionality that really needs JS on the user's end), then I don't participate in the DDOS, right?
I've been getting the endless captcha on my Finnish residential IPs, but I've also been getting that (or outright timeouts) when using VPNs, so I cannot use the site altogether. I wish there were alternatives.
While your article is insightful, can you please redact the actual names and nicks from your original blog post (including the exact places where to find the information)? This was discussed below. I think you had good intentions, but it might be good to also reflect on that person's right not to be identified.
Edit: I misread the comment initially as from someone with more insight. However, I guess it is obvious that anyone can see the JavaScript and participates involuntarily in the DoS.
I recently had the idea of somehow integrating Everything's folder size index into Explorer, and after failing to do it with Claude Code, I found out that Windhawk + Better file sizes does just that. I would have expected at least some performance degradation, but in fact it was the opposite and made it feel much snappier. A huge QoL improvement to Explorer that I've now installed on all my Windows PCs. Note that you need the alpha version (1.5) of Everything for best performance.
The weird thing is that there was nothing new in that blog post. And on top of that it couldn't conclusively say who the owner of archive.today is, so no one still knows.
I've not seen any evidence of them editing archived pages, BUT the DDoSing of gyrovague.com is true and still actively taking place. The author of that blog is Finnish, which has led archive.today to ban all Finnish IPs by giving them endless captcha loops. After solving the first captcha, the page reloads and a JavaScript snippet appears in the source that attempts to spam gyrovague.com with repeated fetches.
Yes, I have a Finnish IP, and just before I wrote that post I tested it to make sure it was still happening.
I assume it must be a blanket ban on Finnish IPs, as there have been comments about it on Reddit and none of my friends can get it to work either. 5 different ISPs were tried. So at the very least it seems to affect the majority of Finnish residential connections.
This is quite an interesting question. For a single datapoint, I happen to have access to a VPN that's supposedly in Finland, and connecting through that didn't make any captcha loop appear on archive.today. The page worked fine.
Now it's obviously possible that my VPN was whitelisted somehow, or that the GeoIP of it is lying. This is just a singular datapoint.
It’s also pretty common for VPNs to have exit nodes physically located in different countries from where they report those IPs (to GeoIP databases) as having originated.
This seems like a massive improvement for openly available local ASR. Even the 300M model outperforms whisper-large-v3 according to the paper's benchmarks.
This model is actually expected to be bad for popular languages. Just like the previous MMS, it is not accurate at all; it wins by supporting rare languages well, but it never had good ASR accuracy even for Swedish etc. It is more of a research thing than a real tool, unlike Whisper.
Moreover, you cannot tune those models for practical applications. The model is originally trained on very clean data, so the lower layers are also not very stable for diverse inputs. To fine-tune, you have to update the whole model, not just the upper layers.
In section 5.7.5, they fine-tune for "11 low-resource languages, with between 5-10 hours of training data and at least 1 hour of validation splits." "CTC fine-tuning takes ≈1 hour of walltime on 32 GPUs for the 300M scale." If that's too expensive, you also have the option of supplying additional context for the LLM-based model (section 5.5).
As for "very clean data," see section 5.7.4: "Omnilingual + OMSF ASR was intentionally curated to represent naturalistic (i.e., often noisy) audio conditions, diverse speaker identities, and spontaneous, expressive speech."
I can't tell if Anthropic is serious about "model welfare" or if it's just a marketing ploy. I mean, isn't it responding negatively because it has been trained that way? If they were serious, wouldn't the ethical thing be to train the model to respond neutrally to "harmful" queries?
VibeVoice-Large is the first local TTS that can produce convincing Finnish speech with little to no accent. I tinkered with it yesterday and was pleasantly surprised at how good the voice cloning is and how it "clones" the emotion in the speech as well.