Cornell and NTT’s Physical Neural Nets Enable Arbitrary Physical System Training (syncedreview.com)
38 points by rch on May 30, 2021 | hide | past | favorite | 6 comments


What they're doing, I think, is compiling a trained neural net into a different form.

(1) Training input data (e.g., an image) is fed into the physical system, along with the controllable parameters.

(2) In the forward pass, the physical system applies its transformation to produce an output.

(3) The physical output is compared to the intended output (e.g., for an image of an ‘8’, a predicted label of 8) to compute the error.

(4) Using a differentiable digital model to estimate the gradients of the physical system(s), the gradient of the loss is computed with respect to the controllable parameters.

(5) The parameters are updated according to the inferred gradient.
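The steps above can be sketched as a toy loop (all names here are hypothetical, not from the paper: a noisy linear transform stands in for the physical system, and a clean digital surrogate supplies the gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the physical system: a fixed linear
# transform plus noise that the digital model doesn't capture.
W_true = rng.normal(size=(4, 4))

def physical_forward(x, theta):
    return W_true @ x + theta + 0.01 * rng.normal(size=4)

# Differentiable digital model of the same system (no noise term).
def digital_forward(x, theta):
    return W_true @ x + theta

x = rng.normal(size=4)
target = np.zeros(4)
theta = rng.normal(size=4)   # controllable parameters
lr = 0.1

for step in range(200):
    y = physical_forward(x, theta)   # (2) physical forward pass
    err = y - target                 # (3) compare to intended output
    # (4) gradient from the digital model: for squared-error loss and a
    # surrogate that is linear in theta, d(loss)/d(theta) = 2 * err.
    grad = 2 * err
    theta -= lr * grad               # (5) update the parameters

print(np.mean((physical_forward(x, theta) - target) ** 2))
```

The point of the split is that the noisy physical output drives the error, so the loop trains against reality, while the clean surrogate only has to be good enough to point the gradient in roughly the right direction.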

What they mean by a "physical system" is a series of analog elements with lots of tuning parameters. Like filters. This is a system for setting the tuning parameters. You have to be able to simulate the "physical system", and it has to be mostly differentiable, so you can tune by hill-climbing.

The control systems people ought to like this, because the output is a control system that's made of components with predictable and continuous properties. You want to know that if it does the right thing for an input of 1.0 and 1.5, it doesn't do something totally unexpected for 1.365. This may be a way to get there.

This may be the mechanism behind "muscle memory". Tasks get optimized down to a control system that executes fast, but doesn't retrain easily.

The problems they chose to work on seem strange, but that may reflect their funding or interests. This might be worth trying for, say, quadcopter control. You might be able to train a neural net controller and then hammer it down into a quick little algorithm that can fit in the onboard computer.

(I subscribe to IEEE Control Systems Journal, and I understand maybe 15% of it.)


The hybrid approach of calculating the loss with a physical system and then calculating the gradient using a model narrows the simulation-reality gap. The cost would surely be much, much slower training. If the physical process required, for example, heating and cooling an oven, training would take a very long time.


I haven’t looked at the article, but this sounds like the good old “chip in the loop” method. What’s new here?


Hard to deduce much from the article. Is it that it's a NN where the individual components are physical transforms?

> On the MNIST handwritten digit classification task, the trainable SHG transformations boost the performance of digital operations from roughly 90 percent accuracy to 97 percent.

It is hard to take seriously when 97% on MNIST is achievable with the kind of tutorials bundled at the end of PyTorch installation guides: "see, you can make a DNN model in 10 lines of code!"
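For scale, the tutorial-level baseline being alluded to really is about ten lines. A sketch of that kind of model, here as plain softmax regression in NumPy on trivially separable stand-in data (not real MNIST, so the accuracy is not comparable):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: 200 "images" of 64 pixels, 10 classes, each class
# marked by one bright pixel -- trivially separable, unlike real MNIST.
y = rng.integers(0, 10, size=200)
X = rng.normal(scale=0.1, size=(200, 64))
X[np.arange(200), y] += 1.0

W = np.zeros((64, 10))
for _ in range(300):                        # softmax regression by gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(200), y] -= 1.0             # p - onehot: cross-entropy gradient
    W -= 0.5 * (X.T @ p) / 200

acc = (np.argmax(X @ W, axis=1) == y).mean()
print(acc)
```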


The paper is linked at the bottom of the article: Deep physical neural networks enabled by a backpropagation algorithm for arbitrary physical systems -- https://arxiv.org/abs/2104.13386

> physics-aware training (PAT)... allows us to efficiently and accurately execute the backpropagation algorithm on any sequence of physical input-output transformations, directly in situ.


Just looked at the paper - yep, looks like the "chip in the loop" training method. This was very popular with analog NN accelerators in the 90s, and a few years ago I remember reading a paper that described an extension where they measured forward-pass outputs at every layer on the chip (not just the final-layer outputs, as is typically done).

I fail to see much novelty here. I wanted to do this for my PhD on hardware-aware training (I graduated before our chip was ready), but the disadvantages of the method are well known: it's extremely slow (data has to pass between the chip and a GPU server on every iteration), and that's assuming you have access to each layer's outputs.



