Hacker News -- psaccounts's comments

I published a video that explains Self-Attention and Multi-Head Attention in a different way -- going from intuition to math to code, starting from the end result and walking backward to the actual method.

Hopefully this sheds light on this important topic in a way that differs from other approaches and provides the clarity needed to understand the Transformer architecture. It starts at 41:22 in the video below.

https://youtu.be/6jyL6NB3_LI?t=2482
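This isn't the video's code, just a minimal NumPy sketch of single-head scaled dot-product self-attention (the shapes, weight names, and toy sizes are my own illustration, not anything from the video):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    return softmax(scores) @ V               # each row: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention just runs several of these in parallel with smaller per-head projections and concatenates the results.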


This video tutorial provides an intuitive, in-depth breakdown of how an LLM learns language and uses that learning to generate text. The key concepts below are covered in a way that is both broad and deep, keeping the material accessible without losing technical rigor:

* Historical context for LLMs and GenAI

* Training an LLM -- a 100,000-foot overview

* What does an LLM learn during training?

* Inferencing an LLM -- a 100,000-foot overview

* 3 steps in the LLM journey from pre-training to serving

* Word Embeddings -- representing text in numeric format

* RMS Normalization -- the sound engineer of the Transformer

* Benefits of RMS Normalization over Layer Normalization

* Rotary Position Encoding (RoPE) -- making the Transformer aware of token position

* Masked Self-Attention -- making the Transformer understand context

* How RoPE generalizes well, making long-context LLMs possible

* Understanding what Causal Masking is (intuition and benefit)

* Multi-Head Attention -- improving stability of Self Attention

* Residual Connections -- improving stability of learning

* Feed Forward Network

* SwiGLU Activation Function

* Stacking

* Projection Layer -- Next Token Prediction

* Inferencing a Large Language Model

* Step by Step next token generation to form sentences

* Perplexity Score -- how well the model did

* Next Token Selector -- Greedy Sampling

* Next Token Selector -- Top-k Sampling

* Next Token Selector -- Top-p/Nucleus Sampling

* Temperature -- making an LLM's generation more creative

* Instruction finetuning -- aligning an LLM's response

* Learning going forward
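The sampling bullets above (greedy, top-k, top-p/nucleus, temperature) can be sketched in a few lines. This is my own toy illustration with a made-up 4-token vocabulary, not code from the video:

```python
import numpy as np

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a next-token id from raw logits.
    temperature < 1 sharpens the distribution, > 1 flattens it;
    top_k keeps only the k most likely tokens; top_p keeps the smallest
    set whose cumulative probability reaches p (nucleus sampling)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]          # k-th largest probability
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:
        order = np.argsort(probs)[::-1]          # most likely first
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = 1.0
        probs *= mask
    probs /= probs.sum()                         # renormalize survivors
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5, -1.0]                   # toy next-token logits
greedy = int(np.argmax(logits))                  # greedy = always the argmax
tok = sample_next(logits, temperature=0.8, top_k=2)  # one of the top 2 ids
```

With `top_k=2`, only the two most likely tokens can ever be emitted, which is why the technique tames low-probability "tail" tokens; temperature then controls how evenly the survivors are weighted.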


Could you share which 3 static analysis tools you use?

We found Coverity to be prohibitively expensive, but unfortunately there isn't any real alternative!


It is a shame that most people are unaware of the efficacy of Homeopathy to treat ADD, ADHD and autism-spectrum disorders. Do some research and you'll see how hundreds of thousands of people have been cured of these ailments using Homeopathy alone.


Do some research and you'll find homeopathy is bunk. There's no science behind it, studies have shown it completely ineffective, and besides, it's just clean water.


You are saying this based on research you have done yourself, right? And yes, I have done the research and have also experienced it first-hand -- something that Stanford doctors could not cure (an army of doctors across 3 different specializations, no less) was cured using Homeopathy. Is that the placebo effect?


Give it up. It doesn't matter if you're right or wrong, or it cured your cancer or whatever, you'll never get past the skeptics.


This is a lie. Perhaps the poster is lying to themselves, but it is a lie nonetheless.


The placebo effect can be a wonderful thing.


With the rampant abuse of market power by the drug companies with Ritalin and antidepressants, I wonder if homeopathy could be a helpful competitor. Magic water can't cause harm, and it takes some money away from you so you can't use it for something harmful.

I don't remember the names, but a whole generation of antidepressants was shown to be completely useless and harmful. The initial studies were fabricated and/or got into journals unethically. The various decade-long studies on Ritalin also show it to be largely useless and harmful.

Homeopathy is offensive to people who respect science, but it's a noble lie in the current market conditions.


>The various decade long studies on Ritalin also show it to be largely useless and harmful.

Twenty seconds on Google Scholar would lead me to the opposite conclusion. Your source?


My source is this very article we're commenting on, citing those decade long studies.


If that's the case, then I disagree with your characterization; I don't see any passage that refers to decade-long studies showing that Ritalin is "harmful" in the long term.


This is why I love HN; this post is exactly where it has to be.


the association of witch doctors recommends ... water!


The implementation seems to use Log-Structured Merge Trees.

The only paper on this data structure seems to be: http://goo.gl/CVF1l

This paper is poorly written and, quite honestly, not useful for implementing an LSM tree. Does anyone know of a better paper than this one?
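For what it's worth, the core idea is simpler than the paper makes it look: buffer writes in a sorted in-memory table, flush it as an immutable sorted run when full, and search runs newest-to-oldest on reads. A toy sketch (my own simplification; real implementations such as LevelDB add a write-ahead log, bloom filters, tombstones for deletes, and background compaction):

```python
import bisect

class TinyLSM:
    """Toy log-structured merge tree: a mutable in-memory buffer
    (memtable) plus immutable sorted on-"disk" runs."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []             # newest first; each is a sorted (k, v) list
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush the buffer as an immutable sorted run (sequential I/O
            # on a real disk -- the reason LSM writes are fast).
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:   # newest data wins
            return self.memtable[key]
        for run in self.runs:      # binary-search runs newest-to-oldest
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(i, i * i)
print(db.get(7))  # 49
```

Compaction (merging runs to bound the number a read must visit) is the part the paper spends most of its time on, and the part this sketch omits.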


Read "Cache-Oblivious Streaming B-trees": http://supertech.csail.mit.edu/cacheObliviousBTree.html It's an LSM with faster searches.
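The basic structure in that paper (the cache-oblivious lookahead array, or COLA) is easy to sketch: level i holds either 0 or 2^i sorted keys, and an insert cascades merges down the levels exactly like carries in a binary counter. A toy version (my own illustration, ignoring the paper's fractional cascading and lookahead pointers that make searches fast):

```python
import bisect

class TinyCOLA:
    """Toy cache-oblivious lookahead array: levels[i] is empty or holds
    2**i sorted keys; inserts cascade merges like binary-counter carries."""
    def __init__(self):
        self.levels = []           # levels[i]: [] or a sorted list of 2**i keys

    def insert(self, key):
        carry = [key]
        for i, level in enumerate(self.levels):
            if not level:
                self.levels[i] = carry          # slot the carry in and stop
                return
            carry = sorted(level + carry)       # stand-in for a linear merge
            self.levels[i] = []                 # level emptied; carry grows
        self.levels.append(carry)               # overflowed into a new level

    def __contains__(self, key):
        for level in self.levels:  # one binary search per non-empty level
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False

c = TinyCOLA()
for k in [5, 3, 9, 1]:
    c.insert(k)
print(3 in c, 7 in c)  # True False
```

Each key is merged O(log n) times over its lifetime, giving the amortized O((log n)/B) block writes per insert that make the structure attractive for write-heavy workloads.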


Thanks for the pointer. Are there any known open-source implementations of the same?


Not that I know of. Also, they have a patent, but that didn't stop Acunu from reimplementing and improving the algorithm.



But it doesn't really describe the LSM data structure!


It describes the approach used here (cf. all of the discussion of the tablet system). The few differences are documented:

http://code.google.com/p/leveldb/source/browse/trunk/doc/imp...


Is there a more useful technical paper on Fractal Trees? Better yet, is there an open-source implementation of the same?


Cache-oblivious streaming B-trees:

http://supertech.csail.mit.edu/papers/sbtree.pdf

I don't know of any open-source implementations for certain, but I have heard COLAs are used in HBase.


What would be the licensing model if the product is open source but NOT free (i.e., customers are charged for an open-source product in which BDB is used)?


You need to do your own analysis rather than trusting some random guy on the internet (e.g., me). But:

(1) If people need to pay to use it, it's not "open source" by the OSI's Open Source Definition: <http://www.opensource.org/docs/osd>;

(2) Contacting Oracle/BDB will let you know what they think your obligations are.


Perl followed by C


C

