More

qwename · on June 20, 2020

Other than using a stronger hashing algorithm that produces longer hashes, would there be any advantage in storing two or more separate hashes of an object? The extra hashes could be from a different hash function, or a hash of the reversed bits/bytes.

I wonder about the difficulty of producing collisions for a single 256-bit hash function versus two 128-bit hash functions, four 64-bit hash functions and so on.

boneitis · on June 20, 2020

It looks like you've already got the jist of this from the second and third link in a grandchild post. Anyway!

For a bit of a hands-on view, there is a terrific cryptography coding exercise that explores this idea[0], and you take it all the way in actually breaking the construction (although the exercise is concerned with collisions, rather than preimages). The linked exercise has you homebake your own de-fanged hash functions, so that you can run solution code within your lifetime (as well as giving you a closer view of the machinery).

Of course, the exercise applies to fairly raw, Merkle-Damgard-like constructions.

What I took away from the challenge was that I would feel better using a single, longer hash than two shorter, concatenated hashes, because you are bottlenecking your construction to your stronger hash (if they differ in strength).

An auditor or attacker with white box access can attack your smaller hash (if your two hash functions differ in bit-size) to generate a 2^(n/2)-way collision (that is, if it's feasible to collide the hash at all, there is a clever trick to generate a massive number of messages that all collide to the same output), then birthday attack the bigger hash using the messages from that collision-pool. (Importantly, this mention of "n" specifically refers to the bit-size of the larger hash, a.k.a., the bottleneck.)

Although there might be something about it at first glance that makes the messages feel not random enough (in order for the birthday principle to hold), it's far easier to tell you to just do the exercise in order to grok why it works than for me to figure out how to explain it :)

This exercise is the first of an awesome triple-whammy[1][2] of enlightening hash function challenges.

[0] https://cryptopals.com/sets/7/challenges/52 [1] https://cryptopals.com/sets/7/challenges/53 [2] https://cryptopals.com/sets/7/challenges/54

TicklishTiger · on June 20, 2020

That reminds me of my phone. It is kinda special. Instead of an eleven digit number, I have eleven one digit phone numbers. When I write them down, I usually just writen them all in one row. So the written version of my eleven phone numbers look just like a normal eleven digit phone number.

qwename · on June 20, 2020

I will try to show the difference between one 256-bit hash function and two 128-bit hash functions.

Let h_0(x) be a 256-bit hash function. Let h_1(x) and h_2(x) be 128-bit hash functions.

Let h_3(x) = h_1(x) + h_2(x), h_4(x) = h_2(x) + h_1(x), where '+' is the concatenation operator. Thus h_3(x) and h_4(x) are also 256-bit hash functions.

Given an object x_1, find x_2 that satisfies one of the following conditions:

(1) h_0(x_1) = h_0(x_2)

(2) h_3(x_1) = h_3(x_2) AND h_4(x_1) = h_4(x_2)

Finding a collision for a single 256-bit hash function would be trying to solve (1), but finding collisions for two 128-bit hash functions would be trying to solve (2).

Also note that (2) is not equivalent to finding collisions for two 256-bit hash functions.

I think the misconception comes from the fact that the multiple hashes form an unordered set, not an ordered concatenation.

If there's any mistake in my logic please feel free to point them out.

Edit: (2) can be simplified into h_1(x_1) = h_1(x_2) AND h_2(x_1) = h_2(x_2), as concatenation of the hashes does not turn h_3 and h_4 into "real" 256-bit hash functions.

eknkc · on June 20, 2020

This makes sense. However the implementation detail of h_0(x) could also be foo(x) + bar(x) or whatever. At the end, you have a function that returns a fixed number of bytes.

I doubt any has function that we use today changes the algorithm halfway through its operation though. So your proposal might have a benefit in security by obscurity way.

qwename · on June 20, 2020

I suppose I missed the assumption that h_0, h_1 and h_2 cannot be trivially decomposed into concatenations of other hash functions, whereas h_3 and h_4 can.

Also, there's no obscurity since h_1 and h_2 are the actual targeted hash functions, h_3 and h_4 are just to show that concatenation doesn't change the actual problem or make it harder.

SftwreEngnr · on June 20, 2020

What?

sophiebits · on June 20, 2020

A function that concatenates the results of two 64-bit hash functions _is_ a 128-bit hash function.

Just as a string of individual digits makes a larger number.

hobofan · on June 20, 2020

I mean, technically yes, but it it would be a 128-bit hash function with the security properties of a 64-bit hash function, so it offers little advantage over just using a 64bit hash (which I think was also the point you were trying to make, I think?).

However, that doesn't really address the original question on how much harder cracking 2x64bit hash would be than cracking a single 128bit hash would be.

My best guess there would be that it's really quantify as you start opening up more dimensions besides the number of bits. The gain would mostly come from protection against other properties from one of the algorithms like a potentially hidden backdoor or a undiscovered mathematical weakness. So as long as the strength of the individual hash function holds up, it probably makes sense to diversify between hash functions. E.g. SHA3-256 + BLAKE3-256, probably offers better long-term security properties than using SHA3-512.

dependenttypes · on June 20, 2020

> but it it would be a 128-bit hash function with the security properties of a 64-bit hash function

This is not true. Consider two hash functions f and g

    f(x) = md5(x)[0..63]
    g(x) = md5(x)[64..127]

and a third function

   h(x) = f(x) || g(x)

where || is concat

So no, concating multiple smaller hash functions is not any weaker than using a single big one.

hobofan · on June 20, 2020

So your point is that if you take the output of the _same_ 128bit hash function twice, split it into 64bit parts and then put it back together, you still have the full properties of the 128bit hash function? Well, no shit.

I have to admit that I'm not the greatest cryptography whiz, but I can't image that this holds up for _independent_ hash functions, where you should be able to more cheaply run a preimage attack against two 64bit hash functions than one 128bit hash function.

gruez · on June 20, 2020

>I have to admit that I'm not the greatest cryptography whiz, but I can't image that this holds up for _independent_ hash functions, where you should be able to more cheaply run a preimage attack against two 64bit hash functions than one 128bit hash function.

performance is probably an issue. SHA256 has 256 bits and SHA1 has 160 bits (1.6x more bits), but SHA256 isn't 1.6x slower, it's only 38% slower. benchmarks used: https://www.cryptopp.com/benchmarks.html

Back to the original question of "how secure are 2x 64bit hashes compared to 1x 128 bit hash?", I can't imagine how it could be any more secure, considering that if it were more secure, you could just make your 128 bit hash function be the concatenation of the two 64 bit hash functions. It might be equally secure, but I'm not sure why you'd use it over a properly designed 128 bit hash function.

dependenttypes · on June 20, 2020

> where you should be able to more cheaply run a preimage attack against two 64bit hash functions than one 128bit hash function.

Try doing that in my example.

freemint · on June 22, 2020

Your hash functions are contrustructed in a way such that the concatenation has the security properties of a 128 bit hash function (because your construction is equivalent md5). I am not sure that results holds for all concatenations of hash functions. Or for SHA concatenations.

I appreciate you making me think deeper about hash functions. Nice construction.

edoceo · on June 20, 2020

My guess: To paths to the same number string are the same number. ie: two 64 bit numbers is a 128 bit number in disguise.

hannob · on June 20, 2020

Hash security is a complicated beast. There were some research results that concatenating multiple hash functions isn't much more secure than the best of the concatenated functions. It's not a good way to produce more secure hashes.

Also please note that length is only one measurement of hash security. The fact that SHA1 is weak has more reasons. If it was a good hash function it would still have a security of 80 bits, which would not be a comfortable security margin, but still kinda not really broken for real. But further weaknesses reduced the attack complexity.

Likewise if you think about SHA256 this is not just the 256-bit-instead-of-160-bit-version of SHA1. It's a different function (although based on simliar constructions) without the weaknesses of SHA1.

qwename · on June 20, 2020

After finding the right search terms (concatenating hash functions), I found a few stackexchange discussions about this, which lead me to other methods like truncating a stronger hash function[1], and chaining hash functions[2]. Apparently TLS already concatenates MD5 and SHA1 [2][3].

Given that the article is about collision attacks and not preimage resistance, that was my main thought when thinking of the issue. I'll leave it to the experts to figure out what's the best for cryptographic hash functions.

[1] https://crypto.stackexchange.com/questions/9435/is-truncatin...

[2] https://crypto.stackexchange.com/questions/270/guarding-agai...

[3] https://en.wikipedia.org/wiki/Cryptographic_hash_function#Co...

hannob · on June 20, 2020

> Apparently TLS already concatenates MD5 and SHA1 [2][3].

Luckily TLS no longer does such things. This was a bad workaround in old versions of TLS. Which they then replaced by a "you can use secure or insecure hash functions, you choose" in TLS 1.2 (which is hard to excuse - at the time TLS 1.2 was written the weaknesses in SHA1 and MD5 were well known). In TLS 1.3 finally they did the right thing and only support secure hash functions.

qwename · on June 12, 2020

I tried translating with Baidu and it comes up with "宣传" and "鼓吹". The latter seems more appropriate for propaganda that puts something in a positive light, but doesn't apply to other types of propaganda. So I think a good translation would have to depend on the context.

qwename · on June 12, 2020

Maybe because there might exist propaganda against the Chinese government? If Twitter has evidence for Chinese propaganda on a US-based site that's inaccessible to the Chinese population, wouldn't it be reasonable to suspect that US propaganda that targets the Chinese government exists too?

The above is just my opinion, but I think we're detracting from the main issue in the parent comment, that the term "WuMao" can be considered a derogatory term. Tell me if I'm wrong, but many terms started off neutral, but turned derogatory over time. The example that comes to mind is the N-word that rhymes with aggro, I'm not even sure if it's okay to use that so I'm self-censoring.

ardy42 · on June 12, 2020

> Maybe because there might exist propaganda against the Chinese government? If Twitter has evidence for Chinese propaganda on a US-based site that's inaccessible to the Chinese population, wouldn't it be reasonable to suspect that US propaganda that targets the Chinese government exists too?

This is making the false assumption that "propaganda" is symmetrical and equivalent. From the CCP perspective, "propaganda against the Chinese government" would be to dispute its false and self-serving account of history or to suggest the Chinese people should enjoy Western-style human rights -- basically anything that questions the ultimate authority and dominance of the party.

See: https://www.nytimes.com/2013/08/20/world/asia/chinas-new-lea... and https://www.chinafile.com/document-9-chinafile-translation.

threeseed · on June 12, 2020

1) Twitter is accessible by Chinese nationals in Hong Kong, Taiwan and abroad which is whom they are trying to sway.

2) US doesn't need to spread propaganda through Twitter. Trump administration will routinely just come straight out and accuse China of doing X or Y.

s_y_n_t_a_x · on June 12, 2020

We're all replying to the article about the Chinese propaganda Twitter has evidence of, do you have evidence of US propaganda to submit into the discussion?

Regardless, I don't see how more wrong it is to call someone a "WuMao" than it is to call them a "trump supporter", they should both be allowed.

qwename · on June 12, 2020

No I don't have evidence, and I was going to include that I don't have evidence in my comment, but didn't. I was merely stating a possibility in answer to your question. Maybe you should reply to my second paragraph instead.

Edit: I don't know about the intricacies of political correctness, but somehow "trump supporter" feels ok, maybe because it has two words and "supporter" balances "trump". Personally I think "50 cent army" isn't as bad as "WuMao" even though they mean the same thing, go figure.

s_y_n_t_a_x · on June 12, 2020

The essence of my analogy was both are terms that denote the ideology of a person, something they CHOOSE, not are born into like race.

If you believed your government was good, why would you care about a label that targeted you as a supporter for them?

qwename · on June 12, 2020

Calling someone a "WuMao" is saying that the person is PAID to support the Chinese government, regardless of their actual opinions, and the target of support isn't in the term itself. Calling someone a "trump supporter" doesn't include the part about getting paid.

There seems to be a negative connotation with getting paid to support something, that you're just in it for the money and don't actually mean it.

Of course there are many nuances to the whole issue of labels which I don't have the knowledge or time to go into.

s_y_n_t_a_x · on June 12, 2020

I agree with the paid part adding an extra amount of bad connotation, that was an oversight in my analogy.

Regardless, being called any form of a shill should not be censored and the root of it's usage comes from a legitimate cause, especially when you're not using the phrase at a specific person, just an event like the GP did.

Using the phrase the way he did, he didn't target anyone but the CCP.

yorwba · on June 12, 2020

> being called any form of a shill should not be censored

It doesn't need to be censored, but calling people shills just for disagreeing makes a civilized discussion less likely.

s_y_n_t_a_x · on June 12, 2020

The GP didn't call anyone a shill, he called the group of bots shills. I don't see why we're talking about the word he chose to call the bots, not the bots themselves. He didn't attack anyone, he was referencing the high number of CCP bots found by Twitter.

qwename · on July 24, 2017

Currently, SCMP is not accessible in China.

qwename · on Dec 18, 2016

Relevant paper: "Polaris: Faster Page Loads Using Fine-grained Dependency Tracking", http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf

naasking · on Dec 19, 2016

Argh, don't people Google names before using them? Polaris was HP's virus-safe Windows project:

http://www.hpl.hp.com/techreports/2004/HPL-2004-221.html

manarth · on Dec 19, 2016

The quintessential hardest problem in IT: naming things.

Polaris is also a missile. And a star. And a PowerPC port of Solaris.

We're always going to have name duplication. At least it's fairly easy to disambiguate with a Google search.

Ar-Curunir · on Dec 19, 2016

So... completely unrelated to this project's aim?

qwename · on Dec 8, 2016

I think there will always be resistance to change, so perhaps there should be a visible indication that some idea is being tested out, with an option for users to disable it.

dang · on Dec 8, 2016

That works for software changes but not community standards, which is what we were experimenting with here.

Perhaps the lesson is simply: don't experiment with community standards.

qwename · on Dec 9, 2016

I mistakenly thought of the change as a software change, since I've yet to use the flag button.

It might be useful to tie changes in community standards to something concrete and trackable. Taking the detox week as an example, there would be a way to flag something and specify "detox week" as a reason.

That being said, can you give an example of changes in community standards on this site which effects cannot be tracked in software?

EDIT: Clarified scope of "changes in community standards" to this site.

dang · on Dec 9, 2016

> Can you give an example of changes in community standards on this site which effects cannot be tracked in software?

Story quality, thread quality, community satisfaction...

nitrogen · on Dec 8, 2016

I'd hope that at least some experimentation is still possible. Wouldn't a lack thereof lead to stagnation? I have a hard time believing that, as great as it is, HN has found the global maximum of community design.

dredmorbius · on Dec 9, 2016

Experiment, please, but wisely and with clear communications.

The 'Net needs more models tested, but also far more nuance than most sites seem to show.

qwename · on Dec 7, 2016

This came up in the openbsd-misc mailing list yesterday[1]. Someone also mentioned the link that elchief posted.

[1] https://marc.info/?l=openbsd-misc&m=148105687011923&w=2

qwename · on Dec 6, 2016

Last year I reversed engineered a (36KiB) Windows PE binary[1] to obtain C++ source code that compiled to almost the exact same bytes. What stumped me was an undocumented section that contained metadata, which I gave up on figuring out.

Assuming you compile with the same options and in the same environment, the machine instructions should be the same.

EDIT: The compiler in question though is whatever Visual Basic 6.0 uses, so I wouldn't call it modern today.

[1] https://github.com/qwename/quirky-protector

throwaway7767 · on Dec 6, 2016

One of the longstanding issues the Tor project had with their reproducible builds was that the windows builds, built with Visual Studio, would always have a small piece of data in the same place that differed for each build. It took months to figure it out, but it turned out to be a bug in the MS compiler where they were including some uninitialized memory from the compiler into the resulting binary. IIRC it was fixed after being reported to Microsoft.

Could be the same or similar issue you are facing, given the age of the compiler.

I don't have a reference to link to though, Roger Dingledine told me this in person at a conference when discussing issues related to reproducible builds. But there should be an issue in the tor bug tracker.

qwename · on Dec 6, 2016

If you are using GNU screen, you can also see how each "frame" is drawn by looking through the scrollback buffer.

qwename · on Dec 5, 2016

Did you consult the plugin author before blogging about the vulnerability? Is there a reason why you blogged about it a year later? I am guessing that you waited until the old version of the plugin wasn't (widely) used any more.