Possibly crazy thought: with wider CPUs needing more read and write ports on the register file, would it make sense to use accumulators as registers, so basic boolean and math ops could be done locally through a single read port that the ALU could tap?
Modern CPUs have a structure called the bypass network that lets ALUs forward their outputs directly to each other's inputs without having to go through the register file. It's not exactly a local accumulator, but it's a step in the same direction.
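To make the forwarding idea concrete, here is a minimal Python sketch of operand selection with a bypass path. It is a toy model, not how any particular CPU implements it: `read_operand`, `regfile`, and `bypass` are made-up names, and `bypass` simply stands for results that have been computed but not yet written back.

```python
def read_operand(reg, regfile, bypass):
    """Return the value for register `reg`, preferring an in-flight result.

    regfile: list of committed register values (hypothetical model).
    bypass:  dict mapping destination register -> result that an ALU has
             produced but not yet written back (hypothetical model).
    """
    if reg in bypass:
        # Forwarded straight from an ALU output; no register-file read port used.
        return bypass[reg]
    # Fall back to a normal register-file read.
    return regfile[reg]


regfile = [0] * 32
regfile[5] = 10
bypass = {5: 42}  # an ALU just produced a new value for x5

newest_x5 = read_operand(5, regfile, bypass)  # taken from the bypass path
plain_x6 = read_operand(6, regfile, bypass)   # taken from the register file
```

The consumer sees the newest value of x5 even though the register file still holds the old one, which is the sense in which a bypass network behaves a bit like a local accumulator.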
The idea was to have as many simple ALUs as registers. Results are kept in the ALU/register, so all reads are essentially result forwarding. For example, a simple RV32I requires 2 read ports and 1 write port on each register. If we use 2 R/W ports and put an ALU on each register, we go from 3 buses to 2, and can also do operations 2 at a time when an instruction clobbers one of its inputs, or an ALU op along with a load/store.
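A rough way to check the bus arithmetic is a toy Python model. Everything here is an assumption made for illustration: the `Op` tuple, the two shared buses, the rule that a source equal to the destination is local and costs no bus, and a simple in-order pairing that refuses to pair an op with one that touches the first op's destination. Loads and stores are left out.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Register-register op `rd = rs1 op rs2`, executed by the ALU at rd."""
    rd: int
    rs1: int
    rs2: int


NUM_BUSES = 2  # assumed: two shared R/W buses across all registers


def buses_needed(op):
    # A source that is also the destination already sits in that ALU,
    # so only the other sources have to travel over a shared bus.
    return sum(1 for src in (op.rs1, op.rs2) if src != op.rd)


def independent(first, second):
    # The second op must not read or write the first op's destination.
    return first.rd not in (second.rd, second.rs1, second.rs2)


def schedule(ops):
    """Greedy in-order pairing: at most two ops per cycle."""
    cycles = []
    i = 0
    while i < len(ops):
        group = [ops[i]]
        i += 1
        if (i < len(ops)
                and buses_needed(group[0]) + buses_needed(ops[i]) <= NUM_BUSES
                and independent(group[0], ops[i])):
            group.append(ops[i])
            i += 1
        cycles.append(group)
    return cycles


program = [
    Op(1, 1, 2),  # x1 = x1 op x2: clobbers an input, needs 1 bus
    Op(3, 3, 4),  # x3 = x3 op x4: clobbers an input, needs 1 bus
    Op(5, 6, 7),  # x5 = x6 op x7: needs both buses
    Op(8, 8, 5),  # x8 = x8 op x5: needs 1 bus
]
cycles = schedule(program)
```

Under these assumptions the first two ops share a cycle because each only needs one bus, while the three-register op takes a cycle on its own. A real design would also have to deal with x0, immediates, and memory ops.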