I don't know what you mean by that. We always know what's going on under the hood: linear algebra, the attention mechanism, etc.
To a first approximation, all "chain of thought" means is that instead of having to prompt the model to discuss everything in text and then decide at the end[1], it now sort of automatically does that so you don't need to prompt it.
[1] Which used to bring about very substantial improvements in performance on some tasks
I think it was clear from context that "under the hood" wasn't referring to the math but rather to the contents of the trace. What's written (often?) isn't what's actually being "thought" about. The trace is a trained output similar to the final output, which is to say that it's fake. There are research papers on the topic, particularly that models can be trained to print other arbitrary stuff during the "thinking" phase instead.
You can easily see this for yourself by carefully walking through a given trace with a critical eye. Here's an example from myself a few days ago. https://news.ycombinator.com/item?id=47623324
Yeah, now I get what you're saying. Yes, the trace isn't what's actually happening. What's actually happening is just the attention mechanism etc. The model doesn't "think" in human language, it thinks in linear algebra. The thing is that before chain of thought it used to be necessary to get the model to output some language, because that's the only thing it had to attach processing to (so if you wanted more processing you needed to get it to generate more text). Whereas now we get the model to generate some text that is a simulacrum of the thought it might hypothetically be doing, but in actual practice chain of thought is just something they get the model to do by training it in a certain way.
Although people repeatedly say this, NYT did not in fact dox Slate Star Codex. He revealed his own information because he said they were going to reveal his name based on a draft of the article he says he saw. The Verge apparently reported that no draft had been written and the NYT was still in the news-gathering stage. Who knows what the truth of that is, but factually he released the information.
> The New York Times published an article about the blog in February 2021, three weeks after Alexander had publicly revealed his name.
This isn't a math thing[1], it's a theoretical computing model (ie instead of a Turing machine or lambda calculus, you can use this instead) that you might study as part of studying computation theory or other bits of theoretical computer science.
[1] or not pure maths anyway. It's applied maths like all computer science.
is theoretical computer science (turing machines and automata theory, lambda calculus, complexity theory, computability, decidability, etc) pure maths? or applied maths indeed?
tree calculus is theoretical computer science for sure.
and that, computer science, in its beginnings at least, until the 1950s or so, was a field of mathematics, like algebra, or analysis or logic. all of which have pure maths parts and applied maths parts, don't they?
long story short, I don't think theoretical computer science is "applied maths"; on the contrary, it can be deep in pure maths land.
> First of all, the "difference" between P and Q would be the same independently of whether P, Q, or some other distribution is the "true" distribution.
I don't think this is the case in general because in D_{KL}(P||Q) the model is weighting the log probability ratio by P(x) whereas in D_{KL}(Q||P) it's weighting by Q(x).
So let's think it through with an example. Say P is the true frequency distribution of English words and Q is the output of a model that's attempting to estimate it.
Say the model overestimates the frequency of some uncommon word (eg "ancillary"). D_{KL}(P||Q) weights by P(x), the actual frequency, so the divergence will be small. But since the model thinks the frequency of that word is high, when we take D_{KL}(Q||P) it weights by Q(x), the model's estimated frequency, so it will weight that error heavily and D_{KL}(Q||P) will be large.
That's why it's not symmetric - it's weighting by the first distribution so the "direction" of the error matters.
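To make that concrete, here's a minimal sketch with made-up toy frequencies (the distributions and the word list are hypothetical, just to show the asymmetry):

```python
import math

# P is the "true" word-frequency distribution, Q the model's estimate,
# which badly overestimates the rare word "ancillary".
P = {"the": 0.70, "cat": 0.29, "ancillary": 0.01}
Q = {"the": 0.60, "cat": 0.20, "ancillary": 0.20}

def kl(A, B):
    # D_KL(A || B) = sum over x of A(x) * log(A(x) / B(x))
    return sum(A[x] * math.log(A[x] / B[x]) for x in A)

# kl(P, Q) is small: the error on "ancillary" is weighted by the tiny P(x).
# kl(Q, P) is large: the same error is weighted by the big Q(x).
print(kl(P, Q))  # ~0.19
print(kl(Q, P))  # ~0.43
```

Swapping the arguments changes which distribution does the weighting, which is exactly why the direction of the error matters.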
You misunderstood what I was saying. I was not suggesting that the KL divergence is symmetric. I was saying that it would be symmetric (and independent of the "truth" of a distribution) if it was interpreted as the quoted measure of "difference" between two distributions. So that proposed interpretation is wrong.
Honestly, neither of these things is a remotely important problem in the UK. The whole conversation about immigration in the UK is just a dogwhistle to try to attract racist voters now that, since Brexit, they can't use conversation about the EU for that.
As everywhere, immigration is not an issue for people who do not have to live in social/cheap rented housing in ghettos and have jobs that require a good education.
For the other half of society, the half you did not meet in college, it is a problem, and they are voting for racist parties because the other parties have not reduced immigration and they think the racist parties will.
Immigration is actually one of the main issues in the UK and it's not about racism.
Productivity, population growth (and impact on housing and services), population aging (and impact on social care and NHS), societal and cultural changes and conflicts, national identity, etc are all linked to immigration.
The issue is compounded by the fact that successive governments say they want to be tough on immigration, but actually do the opposite. This is what pushes voters to Reform UK and away from the Tories, among other things. Labour is now doing the same (relatively tough talk but no actual effective action).
So you want me to convince myself that immigration is an issue?
You blame immigrants moving into the country causing what problem exactly? Too many NHS workers from foreign countries now, or too much competition for you when applying for roles?
This isn’t a serious contribution to the discussion. The overall level of NHS services would clearly be far lower (non-existent in some cases) without the contribution of immigrants.
Do you believe that foreigners should be allowed to practice as nurses with fake qualifications? Because the NHS does: they were ACTIVELY working while the NHS knew about the forgery.
Do you have evidence that there is a widespread, institutional fake qualifications problem with native NHS nurses? Please provide evidence. This is what would show that foreign workers in the NHS do not drag down standards on average.
We’re not going to figure out a practical way to improve the NHS with this level of debate. We get that you don’t like foreign nurses, but I’m not going to respond to your rhetorical questions.
You put words in my mouth, that's a dishonest way of arguing :)
I never once said all foreign nurses are bad, nor did I say I dislike them. I pointed out widespread, institutional-level fraud that puts patients at risk, exclusively by foreign NHS staff.
It's worrying you can't respond to the argument without strawmen. Is patient safety a concern for you, or does politics trump it?
You know that the NHS crucially depends on immigrant doctors and nurses and that their contribution is overwhelmingly positive. Your original comment was a completely transparent attempt to derail the discussion with a single cherry picked example.
> The issue is compounded by the fact that successive governments say they want to be tough on immigration, but actually do the opposite.
There’s a simple explanation for this. Being “tough” on immigration would be bad for the economy, bad for the NHS, and bad for the country as a whole. So once a party is actually in government, they don’t want to do it, whatever they said in order to gain votes during the election campaign.
Or, maybe massively restricting immigration is actually a great idea and the establishment is conspiring to prevent it. Just like with Brexit, right?
High immigration is the cheaper short-term solution at the "cost" of other, deeper issues. Restricting immigration is only "bad" for the economy because systemic issues are not tackled.
The NHS relies on foreign workers. Why? Because salaries and conditions are shit, so locals either do not train for those jobs or give up and move to Australia at some point. It is cheaper to keep it that way.
12% of 25-34 year olds are "economically inactive", which points to deep systemic issues.
Generally immigration also keeps salaries lower and productivity lower (which is why the left historically hasn't been too keen).
This is difficult to debate seriously because there is always someone to cry "racist".
If you actually put the fixes for those "systemic issues" on the table, it's obvious that no one is going to vote for them, so they're total non-starters. E.g. it would be great to pay nurses more, but no one is going to vote for the tax hike required to fund it.
If any of this is difficult to debate seriously, that’s because opponents of current immigration policy consistently appeal to the lowest common denominator (people’s prejudices) rather than framing a proper argument. Even in this thread, you can see someone trying (absurdly) to redirect the discussion towards some Nigerian nurses with fake qualifications.
Firstly I think the clarity in general is good. The one piece I think you could do with explaining early on is which pieces of what you are describing are the model of the system and which pieces are the Kalman filter. I was following along as you built the markov model of the state matrix etc and then you called those equations the Kalman filter, but I didn't think we had built a Kalman filter yet.
Your early explanation of the filter (as a method for estimating the state of a system under uncertainty) was great but (unless I missed it) when you introduced the equations I wasn't clear that was the filter. I hope that makes sense.
You’re pointing out a real conceptual issue: where the system model ends and where the Kalman filter begins.
In Kalman filter theory there are two different components:
- The system model
- The Kalman filter (the algorithm)
The state transition and measurement equations belong to the system model. They describe the physics of the system and can vary from one application to another.
The Kalman filter is the algorithm that uses this model to estimate the current state and predict the future state.
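A minimal 1-D sketch might make the split concrete (the numbers and the scalar model here are hypothetical, purely for illustration):

```python
# --- SYSTEM MODEL (application-specific) ---
F = 1.0       # state transition: x_k = F * x_{k-1} + process noise
H = 1.0       # measurement:      z_k = H * x_k     + measurement noise
Q_VAR = 0.01  # process noise variance
R_VAR = 0.25  # measurement noise variance

# --- KALMAN FILTER (the generic predict/update algorithm) ---
def kalman_step(x, P, z):
    # predict using the system model
    x_pred = F * x
    P_pred = F * P * F + Q_VAR
    # update with the measurement z
    K = P_pred * H / (H * P_pred * H + R_VAR)  # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1 - K * H) * P_pred
    return x_new, P_new

# usage: estimate a roughly constant quantity from noisy measurements
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.98]:
    x, P = kalman_step(x, P, z)
print(round(x, 2))  # estimate has converged toward ~1.0
```

Everything above the function is the model (it changes per application); the function itself is the filter, which stays the same whatever F, H and the noise variances are.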
I'll consider making that distinction more explicit when introducing the equations. Thanks for pointing this out.
The tutorial actually predates ChatGPT by quite a few years (first published in 2017). Today, I do sometimes use ChatGPT to fix grammar, but I am responsible for the content and it is always mine.
not a big fan of this theory, but as we've seen in other instances, money from the public coffer is 'free', so even at a substantial loss, if the result ends up in the right private account, it's still a net win for someone. and a net loss for the public even larger than "I'm suing the government for $50B, oh wait, that's me, I guess I'll just have to pay myself"
It’s definitely made-up. I worked for a Wall Street trading firm in the securities division for 8 years. People lie about comp as a matter of course. Secondly, many/most senior-ish jobs in those firms come with a guaranteed bonus for the first year or couple of years. If they were to fire the person before that period elapsed they would have to pay out the bonus.
I know of many people who were fired the day their guarantee elapsed- I can’t think of a single person on a guarantee who was fired before then. To put this into context, we had a guy on a guarantee on our prop desk who came in for one week after he joined, put on a (massively winning) trade that got him enough to get his guarantee and then literally didn’t set foot in the office for the rest of the year[1] until the day he came in to collect his bonus and resign/be fired.
[1] And it’s not like he was working from home because people in trading were (for compliance reasons) not allowed to work from home unless there was an emergency like a terrorist incident where the trading floor was closed.
It's not a "made-up term", it's shorthand for a well-known argument. Not allowing re-usable arguments is like not allowing the use of libraries in software: It wastes time better spent on moving the frontier forward.
Well, to be honest, those old enough remember when cryptography was considered something for the military and special services, and considering using encryption would put you under immediate suspicion. Now we can at least argue we need it to protect us from cybercrime, even if we really have privacy and free speech in mind.
No. Firstly, the gain is to a certain extent a matter of accounting. The most accurate method of accounting is "mark to market". So if you have some gold and you think in dollars, then every day you look at how much gold you have and you look at the price of gold in dollars, you multiply the two, and the difference between that value and the value you got the previous day is your "mark to market pnl".[1] This means you have a very accurate valuation for your asset, but the downside of this approach is that your pnl is very volatile as the gold price moves around. This is the approach taken for most assets by most Wall St firms. In fact at JPMC and Goldman it's not stretching a point too far to say mark to market is nearly a religion. In this methodology there is no such thing as "unrealised" pnl.
Another approach is "historical cost" or "cost basis" accounting. In this approach you officially hold assets at the price you bought them, and only realise pnl when you dispose of them. This means you don't get pnl volatility from marking to market, and then you get a big lump of pnl when you sell.[2] Until you sell or otherwise crystalize the pnl, the profit is "unrealised", which is just an imaginary amount that you may or may not get, but you look at it in your brokerage statement and smile if it's green or frown if it's red. The advantage of this method is you don't get the pnl volatility and you can wait until an advantageous moment to take the profits. The downside is that, if you want to, you can deceive yourself by holding these assets at a valuation that is unrealistic and store up pnl pain for the future. This methodology caused a lot of problems in the 2008 crisis, with institutions holding bonds at prices at which they could never hope to sell them.[3]
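A toy illustration of the difference, with made-up prices and a made-up holding size, just to show how the two methods book the same economic gain:

```python
ounces = 100
daily_price = [2000.0, 2010.0, 1995.0, 2030.0]  # USD/oz, hypothetical

# mark to market: pnl each day is the change in (holdings * price),
# so it's accurate but volatile
mtm_pnl = [ounces * (daily_price[i] - daily_price[i - 1])
           for i in range(1, len(daily_price))]
print(mtm_pnl)  # [1000.0, -1500.0, 3500.0]

# historical cost: no pnl until disposal, then one lump sum
cost_basis = ounces * daily_price[0]
realised_on_sale = ounces * daily_price[-1] - cost_basis
print(realised_on_sale)  # 3000.0, same total as sum(mtm_pnl)
```

The total is identical either way; the two methods only differ in when the pnl shows up.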
“Moving” the gold from NYC to Paris may not (for practical reasons) have involved actually physically taking the bars from one place to another. They may have found a buyer in NYC and then bought some bars on the IME in London and had them delivered to Paris. (This would clearly have required crystalizing the profit if they were holding them at historical cost). It sounds from a brief read of the article as if the bars were in some non-standard format so they may have had them melted down and recast, which would have required an assay and so would have triggered a new valuation, realising the profit. Assuming they were holding them at historical cost, which it sounds like they were.
[1] Technically, if you sell some gold during the day, then the pnl on the portion you sold is “trading pnl” and the pnl on the remainder is “mark to market” but whatever. It’s pretty much the same for the French reserve bank which has gold and thinks in EUR, except they not only have gold MTM pnl but also FX pnl in the EUR/USD rate (because gold prices in USD but they think in EUR).
[2] Or do some other event which requires valuation. There are rules about this kind of thing.
[3] When Lehman collapsed they had bonds marked at 100 that were trading at less than 40 cents. One weekend I'll never forget, I got a call from a very senior partner and was asked to value the European part of that portfolio as part of the US regulators' frantic attempts to find a buyer for Lehman before the market opened.