classification

Classification: supervised learning where the target value is a class label (a discrete attribute, e.g. an integer, letter, or word)

Training a classifier

We train a classifier $f_{θ}$ with parameters $θ$ on a dataset $D = \{(x_{i}, y_{i})\}_{i = 1}^{n}$ to output $P_{θ} (y ∣ x)$, the probability that a datapoint with feature vector $x$ belongs to class $y$.
The optimal parameters maximize the probability of the true labels:
$θ^{*} = \arg\max_{θ} E_{(x, y) \in D} [P_{θ} (y ∣ x)]$
When training with mini-batches $B \subset D$:
$θ^{*} = \arg\max_{θ} E_{B \subset D} \left[ \sum_{(x, y) \in B} P_{θ} (y ∣ x) \right]$
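A minimal sketch of the mini-batch objective above, in plain Python. The linear model with one weight vector per class, the toy dataset, and the names `softmax`, `predict_proba`, and `batch_objective` are all assumptions made for illustration; they are not taken from the referenced lecture.

```python
import math

def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(theta, x):
    """P_theta(. | x) for an assumed linear model: theta holds one weight vector per class."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in theta]
    return softmax(logits)

def batch_objective(theta, batch):
    """Sum over the mini-batch of P_theta(y | x), the probability of each true label."""
    return sum(predict_proba(theta, x)[y] for x, y in batch)

# Toy dataset D (made up): 2-dimensional features, labels in {0, 1}.
D = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.2, 0.8], 1)]
B = D[1:3]  # one mini-batch B, a subset of D

theta_zero = [[0.0, 0.0], [0.0, 0.0]]  # uninformative: every class gets probability 0.5
theta_good = [[5.0, 0.0], [0.0, 5.0]]  # class 0 keys on feature 0, class 1 on feature 1

obj_zero = batch_objective(theta_zero, B)
obj_good = batch_objective(theta_good, B)
print(obj_zero, obj_good)
```

The parameters that fit the labels give the larger objective, which is what the $\arg\max$ selects for. In practice the log-probability is usually maximized instead (equivalently, the cross-entropy loss is minimized); it has the same maximizer per example and is numerically better behaved.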

softmax | cross-entropy loss


References

alfcnz NYU

Max Wolf's Second Brain
