Hacker News | new | past | comments | ask | show | jobs | submit | simedw's comments

Agreed, it almost feels like we have a visual processing unit with special “opcodes” for operations like depth matching and pattern repetition.

The generator first needs a depth map, and then derives the repeating pattern from that. A normal RGB image would be far too noisy; the fine texture variations would break the repetition needed for the brain to fuse the patterns correctly.
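For anyone curious about the mechanics, the core trick can be sketched in a few lines. This is a toy illustration I'm writing here, not the actual generator: each output pixel copies the pixel roughly one pattern-width to its left, with the offset modulated by the depth map, so the repetition period (the disparity) encodes depth.

```python
import random

def autostereogram(depth, pattern_width=40, max_shift=10):
    """Naive single-image stereogram: each pixel copies the pixel
    (pattern_width - shift) to its left, where shift grows with depth."""
    height, width = len(depth), len(depth[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            shift = int(depth[y][x] * max_shift)  # nearer points repeat sooner
            link = x - (pattern_width - shift)
            if link < 0:
                row.append(random.random())       # seed with random dots
            else:
                row.append(row[link])             # repetition encodes disparity
        out.append(row)
    return out

# Flat background with a raised square in the middle.
depth = [[1.0 if 20 <= x < 40 and 5 <= y < 15 else 0.0
          for x in range(80)] for y in range(20)]
img = autostereogram(depth)
```

The same structure explains why a noisy RGB source breaks things: if the copied pixels don't repeat exactly, there is no stable pattern for the eyes to lock onto.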


That makes sense. Using a depth map first sounds almost inevitable for keeping the repetition stable enough for the visual system to lock onto it.

What I always find interesting with these images is how sensitive the brain is to those horizontal disparities. Even tiny shifts create a surprisingly strong sense of structure once the eyes fuse the patterns. It really highlights how much of “seeing” depth is reconstruction rather than direct perception.

Do you generate the depth maps manually, or are they derived procedurally from some model or scene description?


No offense, but are you a bot?

Haha, fair question. No, just a human who tends to write in complete paragraphs. I've been experimenting with the generator as a side project and got curious about how these stereograms actually work under the hood.

I think this speaks for itself:

  simedw ~  $ claude -p "random number between 1 and 10" 
  7
  simedw ~  $ claude -p "random number between 1 and 10"
  7
  simedw ~  $ claude -p "random number between 1 and 10"
  7
  simedw ~  $ claude -p "random number between 1 and 10"
  7


Great suggestion, added a toggle to see pinyin.


Thanks for the great feedback!

I've just added sandhi support; please let me know if it works better.
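For context, the best-known case is third-tone sandhi: a third tone before another third tone surfaces as a second tone. A deliberately simplistic pairwise sketch (the function name is just illustrative, and real chains of third tones also depend on prosodic grouping):

```python
def apply_third_tone_sandhi(syllables):
    """Pairwise third-tone sandhi over pinyin with tone numbers:
    a 3rd tone before another 3rd tone becomes a 2nd tone,
    e.g. ni3 hao3 -> ni2 hao3."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i].endswith("3") and out[i + 1].endswith("3"):
            out[i] = out[i][:-1] + "2"
    return out

print(apply_third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```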


Still having some issues that match my previous comment, I'll try to follow your blog and give more feedback as you work on it.

I'll note that the shorter phrases (2-4 characters) were generally accurate at normal speed, but the longer sentences still have issues.

Maybe focusing on accuracy for the smaller phrases first and then scaling up would be a good way to go, since those are already returning better results.

Again, really think this is a great initiative, want to see how it grows. :)


ACKing your comment.

Will check once the TV is off in the house. :)


Hi, thanks for the feedback. The 了 issue was a bug on the JavaScript side; that should be fixed (training did thankfully handle it correctly).

The other two are probably things that could be fixed with a bigger and more varied dataset.


It’s fairly sensitive to background noise at the moment. I’m planning to train an improved version with stronger data augmentation, including background noise.
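As a sketch of what I mean by noise augmentation (assuming plain float sample arrays; the `mix_at_snr` helper is made up for the example): scale the noise so the mixture hits a randomly chosen signal-to-noise ratio, then add it to the clean signal.

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it."""
    def power(x):
        return sum(v * v for v in x) / len(x)
    # SNR_dB = 10*log10(P_clean / (gain^2 * P_noise))  =>  solve for gain
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

# 1 second of a 440 Hz tone at 16 kHz, mixed with Gaussian noise at a random SNR.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0, 0.1) for _ in range(16000)]
augmented = mix_at_snr(clean, noise, snr_db=random.uniform(0, 20))
```

Randomizing the SNR per training example is what should make the model tolerant of background noise instead of overfitting to clean audio.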


For accents, I’ve mostly tested with a few friends so far. I’m wondering whether region should be a parameter, because training on all dialects might make the system too lax.


It would probably be a lot of work, but it would be really interesting if you had sufficient datasets to train across accents.

Highly recommend taking a look at Phonemica for this:

https://phonemica.net/


Thank you.

I had a quick look at Farsi datasets, and there seem to be a few options. That said, written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?


> written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

You can't, but Farsi dictionaries list the missing short vowels/diacritics/"eraab" for every word.

For instance, see this entry: https://vajehyab.com/dehkhoda/%D8%AD%D8%B3%D8%A7%D8%A8?q=%D8...

With the short vowel marked, it would be written حِساب (normally written as just حساب); the linked dictionary entry shows that there is a kasra ( ِ) on the first letter, ح.

But you would have to disambiguate between homographs that differ only in the eraab.
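To illustrate the dictionary-lookup approach, here's a toy sketch (the mini-lexicon and `vocalize` helper are invented for the example; a real system would need context, e.g. part-of-speech or n-gram features, to resolve the ambiguous entries):

```python
# Hypothetical mini-lexicon: undiacritized form -> possible vocalized readings.
LEXICON = {
    "حساب": ["حِساب"],                  # hesab, "account"
    # Homographs that differ only in the eraab need context to pick a reading:
    "ملک": ["مِلک", "مُلک", "مَلَک"],    # melk / molk / malak
}

def vocalize(word):
    """Return the unique vocalized reading, or raise when the word is ambiguous."""
    readings = LEXICON.get(word, [word])
    if len(readings) > 1:
        raise ValueError(f"ambiguous readings: {readings}")
    return readings[0]

print(vocalize("حساب"))  # حِساب
```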


It would be neat if it had a headless mode.


https://simedw.com, my personal site, mostly posts about various experiments

