Classification: supervised learning where the target value is class label (discrete attribute, e.g. integer, letter, word)
Training a classifier
We train a classifier with parameters on a dataset to output the probability of a datapoint belonging to the class given the feature vector , .
The optimal parameters maximize the probability of the true labels:When training with mini-batches :

