Hacker News | stuffoverflow's comments

Yorhel was also the creator/developer of ncdu among many other open source projects. He was a big open source advocate. The sites he hosted (vndb.org and manned.org) have automated database dumps and source code fully available. I recommend checking out his website: https://dev.yorhel.nl/

There's a Yomitan dictionary (by bee) of VNDB using those dumps: https://github.com/bee-san/VNDB-Character-Names-by-Bee

Wow, a few years ago I wanted to resurrect some of my old code to do essentially what ncdu does.

Then I found ncdu, and haven’t looked back since. So it saved me a lot of time.

Thank you and RIP


ncdu is a lifesaver, RIP.

I've forgotten about ncdc. That was the one we used back in university.

Archive.today's attack on https://gyrovague.com is still ongoing btw. It started just over two months ago. Some IPs get through normally, but Finnish residential IPs, for example, get stuck on endless captchas. The JS snippet that starts spamming gyrovague appears after solving the first captcha.

I'm not a web developer, but I've picked up some bits of knowledge here and there, mostly from troubleshooting issues I encounter while using websites.

I know there are a number of headers used to control cross-site access to websites, and the linked blog post shows archive.today's denial-of-service script sending random queries to the site's search function. Shouldn't there be a way to prevent those from running when they're requested from within a third-party site?


You can't completely prevent the browser from sending the request—after all, it needs to figure out whether to block the website from reading the response.

However, browsers will first send a preflight request for non-simple requests before sending the actual request. If the DDOS were effective because the search operation was expensive, then the blog could put search behind a non-simple request, or require a valid CSRF token before performing the search.
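As a rough illustration of the CSRF-token idea above, here is a minimal Python sketch. All names in it (`SECRET_KEY`, `issue_token`, `handle_search`, `run_expensive_search`) are hypothetical and not taken from the blog's actual software. It assumes the site embeds a per-session token in its own search form, which a third-party page cannot read, so forged requests are rejected before any expensive work happens.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real site would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def issue_token(session_id):
    """Return the token the site embeds in its own search form for this session."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_token(session_id, token):
    """Constant-time check that the token was issued for this session."""
    expected = issue_token(session_id)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


def run_expensive_search(query):
    """Stand-in for the real (costly) search backend."""
    return ["result for " + query]


def handle_search(session_id, token, query):
    """Reject the request cheaply unless it carries a valid token."""
    if not is_valid_token(session_id, token):
        return 403, []
    return 200, run_expensive_search(query)
```

The request still reaches the server either way; the sketch only shows how the costly part could be skipped for callers that never saw the form.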


> I know there are a number of headers used to control cross-site access to websites

Mostly these headers are designed around preventing reading content. Sending content generally does not require anything.

(As a kind of random tidbit, this is why CSRF tokens are a thing: you can't prevent sending, so websites test whether you were able to read the token in a previous request.)

This is partially historical. The rough rule is that if it was possible to make the request without JavaScript, then it doesn't need any special headers (preflight).
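The rough rule above can be sketched in Python. This is a simplified approximation of the "simple request" check, not the full algorithm from the Fetch standard: it ignores the Range header, header value length limits, and streaming bodies, and the function name `needs_preflight` is made up for illustration. The `headers` argument stands for headers set by the calling script, not the ones the browser adds on its own.

```python
# Methods and script-set headers that a plain HTML form or link could already produce.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, headers):
    """Approximate whether a cross-origin request would trigger a preflight."""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            return True
        if lowered == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            essence = value.split(";", 1)[0].strip().lower()
            if essence not in SIMPLE_CONTENT_TYPES:
                return True
    return False
```

Under these assumptions a bare GET to a search URL is "simple", while a request carrying a custom header or a JSON content type is not.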


[flagged]


One side publishes words, the other DDoSes. One side could just ignore the other and go about their business, the other cannot. One is using force, which naturally leads to resistance and additional attention, the other is not.

Both sides look like they have been bullied in the past and not found their way out of reproducing the pattern yet.


Words can have bad consequences. We’ll see what will happen to Banksy after Reuters published words.

[flagged]


Words can have influence and can come from a place of authority, which does carry responsibility. Words of a president are very different from words published on a random blog by some random person, and different yet again from words published by a newspaper. Some presidents' words are opinion; the same words in a different context are commands, and not acting on them comes at a price.

Context matters, which is also why different rules apply, and laws exist to guard these rules. DDoS is not an acceptable response in any jurisdiction, no matter what triggered it. We’re not in the Middle Ages, even if some behave like we are. Violence does not justify violence. Unjust action does not justify unjust responses.


>DDoS is not an acceptable response in any jurisdiction

Who the fuck cares about what the law says? Seriously. The service archive.today offers is illegal in any case.

This is just a fundamentally bizarre context to be bringing up the law in.


Ah yes I can see the misunderstanding: I meant “acceptable” in a broader sense than just legally, but I can see how the use of “jurisdiction” implies law. It was not my intention to just reference the legality, but more in terms of what is considered “violence” by the society, where law is one level you can look at to get an idea.

Then, again: one person's illegal actions do not warrant another person's illegal actions. That’s not how society works, and not how law works.


> The blog is still online and only exists as a part of a harassment campaign targeting archive.today

The blog has a lot more posts on random topics. Why do you imply that the owner of the blog is part of a harassment campaign and that this is the "only" reason for this years-old blog to exist?


Because all the content in the past 4+ years is about archive.today?

Not true: https://gyrovague.com/2025/02/23/anatomy-of-a-boarding-pass-...

There are only two posts about archive.today on the blog, and one of them only exists because archive.today started DDoSing them. I fail to see how you could consider the entire blog to be a "harassment campaign", especially considering that the original blog post isn't even negative, it ends with a compliment towards archive.today's creator.


> all the content in the past 4+ years is about archive.today

But it's not? This was published between the two posts about archive.today: https://gyrovague.com/2025/02/23/anatomy-of-a-boarding-pass-...


Okay, there's one filler post I missed. I'm sure it took a lot of time to write the 16739382nd post explaining what the various things on a boarding pass mean.

They have posted twice in four years. Once doing some digging into who runs archive today, and a second time to respond to a ddos attack.

Writing about being ddos'd seems eminently reasonable. So if you elide that, you are talking about a single article in four years.

It's genuinely nothing.


The purpose of a thing is what it does.

> The purpose of a thing is what it does.

What is the purpose of the DDoS JS in the archive website then? Not DDoS?


I'm sure it's DDoS, just like the purpose of gyrovague.com is to attack archive.today

Easy stuff, no?


Attack? Did we read the same articles? One article is clearly defensive. The other is a piece of investigative journalism about who runs the site and how.

Neither of those is an attack.


Of course attempting to dox someone is an attack.

> Of course attempting to dox someone is an attack.

That's not how the judicial system works.


That might be true! But I don't think anyone in this story did that.

This is a weird way of saying that you wish gyrovague updated more frequently. You could just say “Big fan of his writing, I’d love it if he posted more” if your only complaint is that there aren’t enough recent blog posts on that website

You think DDoS (which is illegal btw) is okay as long as you don't like the target?

Considering the site itself is an illegal archive of websites, I think it's obvious most of us don't treat what's 'legal' as a guide to what's 'moral'.

Harassment and doxing are both illegal.

Doxxing is illegal? I am against it but if it's republishing public info I don't think it can be illegal in the US unless there is an intent element.

The blog author is in Finland, so it's covered by the Article 8 right to privacy of the ECHR. The exact implementation is country dependent, I don't know how it works in Finland but in the UK we just extended the common law tort of "Breach of confidence" to it.

That is very surprising to me. As far as I know, in Finland details of your income are publicly available, but someone reposting publicly available information is illegal.

While in hindsight I would also find it better to redact the names and details mentioned in the original article a bit, I hardly find real defamation. I guess you want to provide random unproven evidence if someone is the target of various foreign law enforcement and commercial sites. In the article they even call for donations to archive.today. As far as I read, the tone of the post is full of admiration. Funny thing is that IMHO the rather childish JavaScript attack gives credibility to the post after all. In all this I somehow hope that we see a legal solution to this whole major global copyright crisis that has been reinforced by LLM training. (If you want a conspiracy theory: selling their snapshots would, I guess, be easy monetization for archive these days.)

Defamation? No.

Doxing? Yes.

It's clear that the person running archive.today does not actively publicize their identity.

> As far as I read the tone of the post is full of admiration

Exactly like an unhinged fan stalking a celebrity.


Totally agreed. Thanks for raising awareness.

Thinking about it, I think we might need better platform rules, maybe even regulations on this. There seems to be pretty much no line of defense, which might explain the rather desperate DoS. If you take anonymity as a right, discussions like ours here on HN are dangerous as well, as they make otherwise difficult-to-find knowledge easily visible. So while a single fan page might go unnoticed, in the case of doxing, amplification is also a problem. Just my spontaneous thought.

Edit: one afterthought. The story about hacking together a response to the GDPR takedown request, quoting press rights and freedom of speech using an LLM, actually shows the deeper problem. Rights come with obligations (at least ethical ones). At least in Europe, press standards are typically rather aware of doxing risks. While celebrities also successfully use legal defenses, I still think the defenses for activists are weak when balancing interests here (at least if you made something of public interest).


I get the endless captcha with a Southern California IP. Something is either very broken or malicious.

Why is archive today attacking that website?

The linked blog contains a story about who funds archive today and they presumably don’t like being exposed.

Thanks. I am so confused by this social drama, I feel like I am getting too old for this.

It’s truly weird and unhinged the extent to which two rando Internet People are willing to grief each other.

Parasocialweb 2.0 I suppose.

You mean just to keep their secrets hidden they hurt others?

Like most companies or states?

As an individual, keeping their identity private is the only way to prevent oppression.


well that exposing is hurting more than 2 for sure

[flagged]


> The crucial context here is that archive.today provides a useful public service for free.

So public services should DDoS is your argument?

> Jani Patokallio runs gyrovague.net in order to harass people who provide useful public services.

I scrolled pretty far through the blog and didn't find anything of that sort. Just a bunch of travel stuff. Now I'm curious what sort of "harassment" you hallucinated in the sites that were previously targeted by archive.today's DDoS attacks.


Should providing a public service absolve all sins?

So far, the only sin archive.today has been accused of is retaliating against a guy attempting to dox them.

That's a pretty small sin in my book. To be written off as wildly unsuccessful but entirely justified self defense.

DDoSing gyrovague.com is silly, not evil.

The content on gyrovague.com which targets archive.today is evil, plain and simple.


The person who runs archive.today decided to involve me, and every other visitor, in their dispute. They decided to use us to hurt someone else. That's a pretty big sin in my book.

By this logic, the Code Green worm is ethical; forcing a security patch upon users who didn’t install one is obviously Not Evil. And that’s why operating systems aren’t wrong to force security updates on their users using invisible phone-home systems that the users aren’t aware of: it’s a small sin that is entirely justified self defense for the users and the device maker. Clearly we should all be updated to iOS 26 without our consent.

The ‘small sin’ of wielding your userbase as a botnet is only palatable for HN’s readers because the site provides a desirable use to HN’s readers. If it were, say, a women’s apparel site that archived copies of Vogue etc. (which would see a ton of page views and much more effective takedown efforts!) and pointed its own DDoS of this manner at Hacker News, HN would be clamoring for their total destruction for unethical behavior with no such ‘it’s just a small evil for so much good’ arguments.

Maintaining ethical standards in the face of desire for the profits of unethical behavior is something tech workers are especially untrained to do. Whether with Palantir or Meta or Archive.today, the conflict is the same: Is the benefit one derives worth compromising one’s ethics? For the unfamiliar, three common means of avoiding admitting that one’s ethics are compromised: “it’s not that bad”, “ethics don’t apply to that”, and “that’s my employer’s problem”. None of those are valid excuses to tolerate a website launching DDoS attacks from our browsers.


archive.today has a documented history of altering the archived content, as such they immediately lose the veil of protection of a service of "public good" in my books.

Just my 2 ¢, not that it really matters anymore in this current information-warfare climate and polarization. :/


> archive.today has a documented history of altering the archived content

Wow, I had no idea. Thanks.


Archive.org has an even worse history of this, FWIW.

It allows website owners and third parties to tamper with archived content.

Look here, for example: https://web.archive.org/web/20140701040026/http://echo.msk.r...

Archive.today is by far the best option available.


What does this example show? It shows „ad blocker detected“ for me.

Archived page from 2014 gets tampered with by this javascript from 2022: https://web.archive.org/web/20220912152218/http://echobanner...

Unless you're very technical, web.archive.org is completely untrustworthy


Deflection rather than addressing the actual accusation

Pay attention to this type of behavior, folks. It's revealing


What do you want me to address? I'm just pointing out that there are no great archival services, and the only real alternative to archive.today is worse.

>Pay attention to this type of behavior, folks. It's revealing

What does it reveal?


Lmao, did you just start bickering with yourself?

Or, wow, you just revealed your second account.


Yea, reading through the page, these two accounts have been sounding exactly the same. I suppose it is in line with the childish behavior of AT.

[flagged]


Reported you to mods via email.

Oh great, I might have to click "New Identity" in Tor Browser.

People are painting this as a mutually exclusive ideological decision. Yet two things can be true:

1) The act of archive.today archiving stories (and thus circumventing paywalls) is arguably very low-level illegal (computer misuse/unauthorized access/etc.), but it is up for interpretation a) whether the operator or the person requesting the page carries the most responsibility and b) whether it's enforceable in third-party countries that neither archive.today nor the page requester resides in

2) DDoSing a site that writes something bad about you is fundamentally wrong (and probably illegal too)


Not just something, it is PII i.e. doxxing

[flagged]


[flagged]


[flagged]


No, pschastain has malware on their computer. I just hit a ratelimit on another account I was using, and decided it'd be funny if I replied from their own account.

Sure, Jan.

He wasn't lying, someone got into my account here. The mods got after it pretty quickly, kudos to them, definitely appreciated.

> So far, the only sin archive.today has been accused of is retaliating against a guy attempting to dox them.

I think you're missing that circumventing paywalls is unlawful in most parts of the world.


Respectfully, it's not, in most parts of the world.

> I think you're missing that circumventing paywalls is unlawful in most parts of the world.

And a necessity if you want to archive the content correctly, also necessary if you want the archives to be publicly available.


Not really sure if circumventing paywalls is that unlawful across the world, but basically copying and pasting an entire web page is just clear and simple copyright violation.

I know it's petty. But don't act surprised when you find your garbage strewn all over your lawn next morning after you flipped off your neighbor the fourth time.

Besides the article about archive.today, which doesn't expose much, I see one about Clash of Clans, and a random crypto product. Those are not 'public services', not sure how you can put these in the same bundle?

Archive today being free doesn’t excuse them using their audience to DDoS someone they don’t like or excuse them from modifying archive content. Also documenting who funds a service is in the public interest.

>Also documenting who funds a service is in the public interest.

Not really, no. It's not unlikely to result in the service ceasing to exist.


> Jani Patokallio runs gyrovague.net in order to harass people who provide useful public services.

I mean...investigating who runs secretive yet popular websites is a useful public service, generally called "journalism". And your comments in this thread could be seen as an attempt to harass Jani.

I do not, to be clear, think you're doing anything morally wrong, but I'm also not sure I see how you can draw a bright line between your actions and Jani's. By the rather stretched logic and loose standards you've been using in these comments, it seems like you run your HN account to harass people who provide useful public services, no?


I don't think your logic stands up to the most basic analysis:

It's unarguably easy to demonstrate the public benefit generated by archive.today, we use the links here on HN to bypass paywalls every single day.

Please demonstrate any public benefit generated by gyrovagues blog post.


To be clear, if I have JavaScript blocked for archive.today (which is my default with NoScript; and really there is no site functionality that really needs JS on the user's end), then I don't participate in the DDOS, right?

I've been getting the endless captcha on my Finnish residential IPs, but I've also been getting that (or outright timeouts) when using VPNs, so I cannot use the site altogether. I wish there were alternatives.

While your article is insightful, can the blog author please redact the actual names and nicks from the original blog post (including the exact places where the information can be found)? This was discussed below. While I think you had good intentions, it might be good to also reflect on that person's right not to be identified.

Edit: I misread the comment initially as from someone with more insight. However, I guess it is obvious that anyone can see the JavaScript and participates involuntarily in the DoS.


I recently had the idea of somehow integrating Everything's folder size index to explorer and after failing to do it with claude code, I found out that Windhawk + Better file sizes does just that. I would have expected at least some performance degradation but in fact it was the opposite and made it feel much snappier. A huge QoL improvement to explorer that I've now installed to all my Windows PCs. Note that you need the alpha version (1.5) of Everything for best performance.


I was recently pointed to your comment. I literally created this yesterday: https://github.com/sm18lr88/win-folder-size

What are the chances.


The weird thing is that there was nothing new in that blog post. And on top of that it couldn't conclusively say who the owner of archive.today is, so no one still knows.


I've not seen any evidence of them editing archived pages, BUT the DDOSing of gyrovague.com is true and still actively taking place. The author of that blog is Finnish, leading archive.today to ban all Finnish IPs by giving them endless captcha loops. After solving the first captcha, the page reloads and a JavaScript snippet appears in the source that attempts to spam gyrovague.com with repeated fetches.


> I've not seen any evidence of them editing archived pages

There is evidence of this in the article you're commenting on.


How do you know that? Did you see it (do you have a Finnish IP?)?


Yes I have Finnish IP and just before I wrote that post I tested it to make sure it was still happening.

I assume it must be a blanket ban on Finnish IPs, as there have been comments about it on Reddit and none of my friends can get it to work either. 5 different ISPs were tried. So at the very least it seems to affect the majority of Finnish residential connections.


> just before I wrote that post I tested it to make sure it was still happening

That's awesome. I wish everyone made sure of their facts. Thanks.


This is quite an interesting question. For a single datapoint, I happen to have access to a VPN that's supposedly in Finland, and connecting through that didn't make any captcha loop appear on archive.today. The page worked fine.

Now it's obviously possible that my VPN was whitelisted somehow, or that the GeoIP of it is lying. This is just a singular datapoint.


As another datapoint with Finnish IP from Mullvad VPN: CAPTCHA loop and indeed after solving first CAPTCHA this can be found in page source:

setInterval(function(){fetch("https://gyrovague.com/tag/"+Math.random().toString(36).subst...",{ referrerPolicy:"no-referrer",mode:"no-cors" });},1400);


It’s also pretty common for VPNs to have exit nodes physically located in different countries from where they report those IPs (to GeoIP databases) as having originated.


VPNs usually don't tell you much about residential experiences.


Sanctions in this context mean visa restrictions (travel ban to US). So not financial sanctions. Just thought it would be a good thing to clarify.


This seems like a massive improvement for openly available local ASR. Even the 300M model outperforms whisper-large-v3 according to the paper's benchmarks.


Not sure, I recorded 3 seconds of voice (a single sentence) and the hf demo misrecognized about half of the words.


This model is actually expected to be bad for popular languages. Just like the previous MMS, it is not accurate at all; it wins by supporting rare languages well, but it never had good ASR accuracy even for Swedish etc. It is more of a research thing than a real tool, unlike Whisper.


And moreover, you cannot tune those models for practical applications. The model is originally trained on very clean data, so lower layers are also not very stable for diverse inputs. To finetune you have to update the whole model, not just upper layers.


In section 5.7.5, they fine-tune for "11 low-resource languages, with between 5-10 hours of training data and at least 1 hour of validation splits." "CTC fine-tuning takes ≈1 hour of walltime on 32 GPUs for the 300M scale." If that's too expensive, you also have the option of supplying additional context for the LLM-based model (section 5.5).

As for "very clean data," see section 5.7.4: "Omnilingual + OMSF ASR was intentionally curated to represent naturalistic (i.e., often noisy) audio conditions, diverse speaker identities, and spontaneous, expressive speech."


I can't tell if Anthropic is serious about "model welfare" or if it's just a marketing ploy. I mean, isn't it responding negatively because it has been trained that way? If they were serious, wouldn't the ethical thing be to train the model to respond neutrally to "harmful" queries?


"Protection against malicious use" isn't as cool as "model welfare". I'm renaming my authentication function to "examineCrest()".


VibeVoice-Large is the first local TTS that can produce convincing Finnish speech with little to no accent. I tinkered with it yesterday and was pleasantly surprised at how good the voice cloning is and how it "clones" the emotion in the speech as well.


Academictorrents has monthly dumps of all reddit submissions and comments even after the API restrictions.



Interesting. You don’t have to be an academic to access these I guess?


They have magnet links and torrent files right there on the pages, so no.

