- Instruction Cache = 32kB, per core
- Data Cache = 32kB, per core
- Mid-Level Cache = 256kB, per core
- Last-Level Cache = 8MB, shared among cores (up to 4)
So I guess the confusion is that the "Mid-Level cache" here is the L2: Intel made the L2 cache private to each core (from Nehalem onwards, I think?) instead of sharing it between cores, and used that opportunity to substantially lower its latency.
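To make the hierarchy concrete, here is a rough sketch that maps a working-set size to the smallest cache level it would fit in, using the sizes listed above. The level labels and the function name `smallest_level_that_fits` are made up for illustration, and the model is deliberately naive: it ignores associativity, cache lines, code footprint, and the fact that the shared cache is contended by the other cores.

```python
KIB = 1024
MIB = 1024 * KIB

# Sizes taken from the list above; the labels are just names chosen for this sketch.
CACHE_LEVELS = [
    ("L1 data", 32 * KIB),          # per core
    ("L2 (mid-level)", 256 * KIB),  # per core
    ("L3 (shared)", 8 * MIB),       # shared among up to 4 cores
]


def smallest_level_that_fits(working_set_bytes):
    """Return the label of the smallest level whose capacity covers the working set.

    Falls back to "DRAM" when the working set exceeds every cache level.
    """
    for name, size in CACHE_LEVELS:
        if working_set_bytes <= size:
            return name
    return "DRAM"
```

For example, `smallest_level_that_fits(100 * KIB)` returns `"L2 (mid-level)"`: a 100kB working set spills out of the 32kB data cache but still fits in the per-core 256kB cache, which is where the lower latency would matter.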
[1] http://www.intel.com/content/www/us/en/processors/xeon/xeon-...