Or just use finetuning. He mentioned in the comments last time that he was train...

jacquesm · on May 6, 2017

I have tried finetuning extensively. A typical run over a pre-trained set before expanding the number of classes has the loss steadily increasing, without any clear indication of how long that will last. Maybe I should let a test run go for a couple of days to see if it eventually converges.

Also, keep in mind that the dataset is still tiny and that a method that works for large numbers of images may very well fail if you only have a few tens to maybe 100 or so images per class.

gwern · on May 6, 2017

For finetuning on additional data, you would have to lower the learning rate because you're only adding a few datapoints and it's almost entirely converged as it is. If your loss is increasing, that suggests overfitting to me via a too-high learning rate.
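
A toy illustration of the learning-rate point (a sketch only, nothing to do with the actual net being discussed): plain gradient descent on a one-dimensional quadratic loss, starting close to the minimum the way a nearly converged net would. The function name `run_gd` and the two step sizes are made up for the example.

```python
def run_gd(lr, steps=20, w=0.1):
    """Gradient descent on loss(w) = w**2, starting close to the minimum at 0."""
    losses = []
    for _ in range(steps):
        losses.append(w * w)
        grad = 2.0 * w          # derivative of w**2
        w = w - lr * grad       # standard gradient step
    return losses

small = run_gd(lr=0.01)   # each step multiplies w by 0.98, so the loss shrinks slowly
large = run_gd(lr=1.5)    # each step multiplies w by -2, so the loss grows every step
```

With the small step the loss creeps down; with the large one it rises on every step even though the starting point was almost at the minimum.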

Now, if you're changing the architecture (such as by adding additional categories of pieces), as I said, that's more tricky - what people usually do there is something like lop off the top layers and retrain them from scratch, possibly while freezing the rest of the NN (the assumption there being that the learned filters and lower layers ought to already be sufficient to classify a new category, which is reasonable since the lower layers tend to be learning things like lines and corners, all primitives which should be able to classify yet another square or rectangle etc).
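
A rough sketch of the "lop off the top and retrain it while freezing the rest" idea, in plain Python so it runs anywhere. Everything here is hypothetical: `FROZEN_W` stands in for the pre-trained lower layers, the dataset is made up, and a real setup would do the same thing with the framework's own layer-freezing mechanism rather than hand-written loops.

```python
import math
import random

random.seed(0)

# Stand-in for the frozen lower layers: fixed weights that are never updated.
# (Hypothetical values; in a real net these come from the pre-trained model.)
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def features(x):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def head_loss(head, data):
    """Mean cross-entropy of an output layer `head` on (input, label) pairs."""
    total = 0.0
    for x, y in data:
        f = features(x)
        p = softmax([sum(w * v for w, v in zip(row, f)) for row in head])
        total -= math.log(p[y])
    return total / len(data)

def train_head(data, n_classes, lr=0.1, epochs=200):
    """Train a fresh output layer from scratch; FROZEN_W is left untouched."""
    n_feat = len(FROZEN_W)
    head = [[random.uniform(-0.01, 0.01) for _ in range(n_feat)]
            for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            p = softmax([sum(w * v for w, v in zip(row, f)) for row in head])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # softmax cross-entropy gradient
                for j in range(n_feat):
                    head[c][j] -= lr * err * f[j]      # only the head is updated
    return head

# Tiny made-up dataset with three classes for the new output layer.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2),
        ([0.9, 0.1], 0), ([0.1, 0.9], 1), ([0.8, 0.9], 2)]
frozen_before = [row[:] for row in FROZEN_W]
untrained = [[0.0] * 3 for _ in range(3)]   # all-zero head: uniform predictions
head = train_head(data, n_classes=3)
```

The only parameters that change are the ones in `head`; the feature weights are identical before and after, which is the whole point of freezing.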

Since this is the obvious response any reader familiar with deep learning would have while reading complaints about how slow your CNN is to train from scratch, it'd be good to discuss in some detail what sort of finetuning you've tried and how it failed.

jacquesm · on May 7, 2017

I will send you an email with a re-run of my original experiments. They were roughly what you described (take a pre-trained net, remove the last layer and re-connect to a layer with the right number of classes). The learning rates I tried ranged from 1e-6 to 1e-3, and none of them gave satisfactory results.

I was about ready to give up on it when I decided to try to bring up a net from scratch and that worked quite well.

jph00 · on May 6, 2017

That's my view too. Freezing more layers at the start can also help a lot with fragile training.

jangerhofer · on May 6, 2017

Newcomer to the field here with a few questions.

Do I understand correctly that a checkpoint is just a snapshot of the model at a point in time? i.e. "Here are the probabilities of each outcome given the characteristics I have observed already."

Also, what does "fully converged" signify? Are there points in the course of training the model at which it is more appropriate to "save" progress than at other times?

minimaxir · on May 6, 2017

> Also, what does "fully converged" signify?

In machine learning/deep learning, the decrease in training loss has major diminishing returns as training continues. Eventually, training the model hits a point where the loss barely improves each epoch/iteration. (fun visualization from one of my projects: http://minimaxir.com/img/char-embeddings/epoch-losses.png)

In some cases, the loss can stop improving entirely, or increase.
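
To tie this back to the checkpoint question: a checkpoint is a saved copy of the model's parameters (weights) at some point in training. One common pattern is to keep the copy with the best loss so far and stop once the loss has plateaued. Below is a minimal sketch of that idea; the function name `train_with_checkpoints`, the `patience` and `min_delta` thresholds, and the loss curve are all made up for illustration.

```python
import copy

def train_with_checkpoints(losses_and_weights, patience=3, min_delta=1e-3):
    """Keep a copy of the best weights seen so far; stop once the loss has
    failed to improve by more than min_delta for `patience` epochs in a row."""
    best_loss = float("inf")
    best_weights = None
    stale = 0
    for epoch, (loss, weights) in enumerate(losses_and_weights):
        if best_loss - loss > min_delta:
            best_loss = loss
            best_weights = copy.deepcopy(weights)   # the "checkpoint"
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_loss, best_weights
    return None, best_loss, best_weights

# Made-up loss curve: fast early gains, then a plateau, then a slight rise.
history = [(1.00, [0.1]), (0.50, [0.2]), (0.30, [0.3]), (0.25, [0.4]),
           (0.2495, [0.5]), (0.2493, [0.6]), (0.26, [0.7]), (0.27, [0.8])]
stopped_at, best_loss, best_weights = train_with_checkpoints(history)
```

Here training stops at epoch index 6, and the checkpoint that survives is the one from the epoch with loss 0.25, since the later epochs improved by less than `min_delta` or got worse.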

carbocation · on May 6, 2017

Any recs on choosing TensorFlow vs Keras-on-TensorFlow?

jacquesm · on May 7, 2017

Start with Keras. If you run into something you want to do that is not supported by Keras, drop into TensorFlow. They are not mutually exclusive, and all of TensorFlow is available.

carbocation · on May 8, 2017

Fantastic. Thanks to you and minimaxir for the guidance.

minimaxir · on May 6, 2017

Keras does not add much overhead, if any, and yes, it is as easy as everyone claims.