Training a classifier

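We train a classifier with parameters $\theta$ on a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ to output the probability that a datapoint belongs to class $y$ given its feature vector $x$, written $p_\theta(y \mid x)$.
The optimal parameters maximize the probability of the true labels:

$$\theta^* = \arg\max_\theta \prod_{i=1}^{N} p_\theta(y_i \mid x_i) = \arg\min_\theta \; -\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)$$

When training with mini-batches $\mathcal{B} \subset \mathcal{D}$, the objective is estimated on one batch at a time:

$$\mathcal{L}_{\mathcal{B}}(\theta) = -\frac{1}{|\mathcal{B}|} \sum_{(x_i, y_i) \in \mathcal{B}} \log p_\theta(y_i \mid x_i)$$
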
[Figures: softmax | cross-entropy loss]
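
As a minimal sketch of mini-batch training with a softmax output and a cross-entropy loss, the example below fits a linear classifier with NumPy. The linear model, the toy Gaussian data, the learning rate, the batch size, and all function names (`softmax`, `cross_entropy`, `train`) are assumptions made for illustration; they are not taken from the original notes.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; the output is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true labels.
    # The small constant guards against log(0).
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def train(X, y, num_classes, lr=0.1, batch_size=32, epochs=50, seed=0):
    # Hypothetical linear model: logits = X @ W + b.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the dataset every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            probs = softmax(xb @ W + b)
            # Gradient of the mean cross-entropy w.r.t. the logits:
            # (probs - one_hot(labels)) / batch size.
            grad_logits = probs.copy()
            grad_logits[np.arange(len(idx)), yb] -= 1.0
            grad_logits /= len(idx)
            # Plain gradient-descent step on the mini-batch loss.
            W -= lr * (xb.T @ grad_logits)
            b -= lr * grad_logits.sum(axis=0)
    return W, b

# Toy two-class dataset (assumed for illustration): two Gaussian blobs.
data_rng = np.random.default_rng(0)
X = np.concatenate([
    data_rng.normal(-2.0, 1.0, size=(100, 2)),
    data_rng.normal(2.0, 1.0, size=(100, 2)),
])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

W, b = train(X, y, num_classes=2)
probs = softmax(X @ W + b)
final_loss = cross_entropy(probs, y)
accuracy = np.mean(probs.argmax(axis=1) == y)
print(f"loss={final_loss:.4f} accuracy={accuracy:.3f}")
```

With zero-initialized weights the model starts at a loss of log 2 for two classes, so a final loss well below that value indicates that the mini-batch updates are increasing the probability of the true labels.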

References

alfcnz NYU