Hacker News -- psaccounts's comments

I published a video that explains Self-Attention and Multi-Head Attention in a different way -- going from intuition to math to code, starting from the end result and walking backward to the actual method.

Hopefully this sheds light on this important topic in a way that differs from other approaches and provides the clarity needed to understand the Transformer architecture. It starts at 41:22 in the video below.

https://youtu.be/6jyL6NB3_LI?t=2482
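This isn't the video's code, just a minimal NumPy sketch of single-head scaled dot-product self-attention (the shapes, weight names, and toy sizes are my own illustration, not anything from the video):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    return softmax(scores) @ V               # each row: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention just runs several of these in parallel with smaller per-head projections and concatenates the results.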


This video tutorial provides an intuitive, in-depth breakdown of how an LLM learns language and uses that learning to generate text. The key concepts below are covered in a way that is both broad and deep, keeping the material accessible without losing technical rigor:

* Historical context for LLMs and GenAI

* Training an LLM -- a 100,000-foot overview

* What does an LLM learn during training?

* Inferencing an LLM -- a 100,000-foot overview

* 3 steps in the LLM journey from pre-training to serving

* Word Embeddings -- representing text in numeric format

* RMS Normalization -- the sound engineer of the Transformer

* Benefits of RMS Normalization over Layer Normalization

* Rotary Position Encoding (RoPE) -- making the Transformer aware of token position

* Masked Self-Attention -- making the Transformer understand context

* How RoPE generalizes well, making long-context LLMs possible

* Understanding what Causal Masking is (intuition and benefit)

* Multi-Head Attention -- improving stability of Self Attention

* Residual Connections -- improving stability of learning

* Feed Forward Network

* SwiGLU Activation Function

* Stacking

* Projection Layer -- Next Token Prediction

* Inferencing a Large Language Model

* Step by Step next token generation to form sentences

* Perplexity Score -- how well the model did

* Next Token Selector -- Greedy Sampling

* Next Token Selector -- Top-k Sampling

* Next Token Selector -- Top-p/Nucleus Sampling

* Temperature -- making an LLM's generation more creative

* Instruction finetuning -- aligning an LLM's response

* Learning going forward
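The sampling bullets above (greedy, top-k, top-p/nucleus, temperature) can be sketched in a few lines. This is my own toy illustration with a made-up 4-token vocabulary, not code from the video:

```python
import numpy as np

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a next-token id from raw logits.
    temperature < 1 sharpens the distribution, > 1 flattens it;
    top_k keeps only the k most likely tokens; top_p keeps the smallest
    set whose cumulative probability reaches p (nucleus sampling)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]          # k-th largest probability
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:
        order = np.argsort(probs)[::-1]          # most likely first
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = 1.0
        probs *= mask
    probs /= probs.sum()                         # renormalize survivors
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5, -1.0]                   # toy next-token logits
greedy = int(np.argmax(logits))                  # greedy = always the argmax
tok = sample_next(logits, temperature=0.8, top_k=2)  # one of the top 2 ids
```

With `top_k=2`, only the two most likely tokens can ever be emitted, which is why the technique tames low-probability "tail" tokens; temperature then controls how evenly the survivors are weighted.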


Could you share which 3 static analysis tools you use?

We found Coverity to be prohibitively expensive, but unfortunately there isn't any real alternative!


It is a shame that most people are unaware of the efficacy of Homeopathy to treat ADD, ADHD and autism-spectrum disorders. Do some research and you'll see how hundreds of thousands of people have been cured of these ailments using Homeopathy alone.


Do some research and you'll find homeopathy is bunk. There's no science behind it, studies have shown it completely ineffective, and besides, it's just clean water.


You are saying this based on research you have done yourself, right? And yes, I have done the research and have also experienced it first-hand -- something that Stanford doctors could not cure (an army of doctors across 3 different specializations, no less) was cured using Homeopathy. Is that the placebo effect?


Give it up. It doesn't matter if you're right or wrong, or it cured your cancer or whatever, you'll never get past the skeptics.


This is a lie. Perhaps the poster is lying to themselves, but it is a lie nonetheless.


The placebo effect can be a wonderful thing.


With the rampant abuse of market power by the drug companies with Ritalin and antidepressants, I wonder if homeopathy could be a helpful competitor. Magic water can't cause harm, and it takes some money away from you so you can't use it for something harmful.

I don't remember the names, but a whole generation of antidepressants was shown to be completely useless and harmful. The initial studies were fabricated and/or got into journals unethically. The various decade-long studies on Ritalin also show it to be largely useless and harmful.

Homeopathy is offensive to people who respect science, but it's a noble lie in the current market conditions.


>The various decade long studies on Ritalin also show it to be largely useless and harmful.

Twenty seconds on Google Scholar would lead me to the opposite conclusion. Your source?


My source is this very article we're commenting on, citing those decade long studies.


If that's the case, then I disagree with your characterization; I don't see any passage that refers to decade-long studies showing that Ritalin is "harmful" in the long term.


This is why I love HN; this post is exactly where it has to be.


the association of witch doctors recommends ... water!


The implementation seems to use Log-Structured Merge Trees.

The only paper on this data structure seems to be: http://goo.gl/CVF1l

This paper is poorly written and, quite honestly, not useful for implementing an LSM tree. Does anyone know of a better paper than this one?
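For what it's worth, the core idea is simpler than the paper makes it look: buffer writes in a sorted in-memory table, flush it as an immutable sorted run when full, and search runs newest-to-oldest on reads. A toy sketch (my own simplification; real implementations such as LevelDB add a write-ahead log, bloom filters, tombstones for deletes, and background compaction):

```python
import bisect

class TinyLSM:
    """Toy log-structured merge tree: a mutable in-memory buffer
    (memtable) plus immutable sorted on-"disk" runs."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []             # newest first; each is a sorted (k, v) list
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush the buffer as an immutable sorted run (sequential I/O
            # on a real disk -- the reason LSM writes are fast).
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:   # newest data wins
            return self.memtable[key]
        for run in self.runs:      # binary-search runs newest-to-oldest
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(i, i * i)
print(db.get(7))  # 49
```

Compaction (merging runs to bound the number a read must visit) is the part the paper spends most of its time on, and the part this sketch omits.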


Read "Cache-Oblivious Streaming B-trees": http://supertech.csail.mit.edu/cacheObliviousBTree.html It's an LSM with faster searches.
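The basic structure in that paper (the cache-oblivious lookahead array, or COLA) is easy to sketch: level i holds either 0 or 2^i sorted keys, and an insert cascades merges down the levels exactly like carries in a binary counter. A toy version (my own illustration, ignoring the paper's fractional cascading and lookahead pointers that make searches fast):

```python
import bisect

class TinyCOLA:
    """Toy cache-oblivious lookahead array: levels[i] is empty or holds
    2**i sorted keys; inserts cascade merges like binary-counter carries."""
    def __init__(self):
        self.levels = []           # levels[i]: [] or a sorted list of 2**i keys

    def insert(self, key):
        carry = [key]
        for i, level in enumerate(self.levels):
            if not level:
                self.levels[i] = carry          # slot the carry in and stop
                return
            carry = sorted(level + carry)       # stand-in for a linear merge
            self.levels[i] = []                 # level emptied; carry grows
        self.levels.append(carry)               # overflowed into a new level

    def __contains__(self, key):
        for level in self.levels:  # one binary search per non-empty level
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False

c = TinyCOLA()
for k in [5, 3, 9, 1]:
    c.insert(k)
print(3 in c, 7 in c)  # True False
```

Each key is merged O(log n) times over its lifetime, giving the amortized O((log n)/B) block writes per insert that make the structure attractive for write-heavy workloads.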


Thanks for the pointer. Are there any known open-source implementations of the same?


Not that I know of. Also, they have a patent, but that didn't stop Acunu from reimplementing and improving the algorithm.



But it doesn't really describe the LSM data structure!


It describes the approach used here (cf. all of the discussion of the tablet system). The few differences are documented:

http://code.google.com/p/leveldb/source/browse/trunk/doc/imp...


Is there a more useful technical paper on Fractal Trees? Better yet, is there an open-source implementation of the same?


Cache-oblivious streaming B-trees:

http://supertech.csail.mit.edu/papers/sbtree.pdf

I don't know of any open-source implementations for certain, but I have heard COLAs are used in HBase.


What would be the licensing model if the product is open source but NOT free (i.e., customers are charged for an open-source product in which BDB is used)?


You need to do your own analysis rather than trusting some random guy on the internet (e.g., me). But:

(1) If people need to pay to use it, it's not "open source" by the OSI's Open Source Definition: <http://www.opensource.org/docs/osd>;

(2) Contacting Oracle/BDB will let you know what they think your obligations are.


Perl followed by C


C

