SVM Remark

The original vision for SVM: maximize the distance of the points to the separating hyperplane while keeping them on the correct sides. We can formulate the problem in the following form:

$$\max_{\beta} \min_{i} d_\beta(\mathbf{x}_i),$$

where $d_\beta(\mathbf{x})$ is the distance of feature $\mathbf{x}$ to the separating hyperplane with parameter $\beta$.
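As a concrete check of the distance $d_\beta(\mathbf{x})$, here is a minimal sketch that computes the signed distance of a point to the hyperplane $\beta^\top \mathbf{x} + \alpha = 0$; the particular hyperplane and points are hypothetical, chosen only for illustration.

```python
import numpy as np

def signed_distance(beta, alpha, x):
    """Signed distance from point x to the hyperplane beta.x + alpha = 0."""
    return (beta @ x + alpha) / np.linalg.norm(beta)

# Hypothetical hyperplane x1 + x2 - 1 = 0
beta = np.array([1.0, 1.0])
alpha = -1.0

print(signed_distance(beta, alpha, np.array([1.0, 1.0])))  # 1/sqrt(2) ≈ 0.7071
print(signed_distance(beta, alpha, np.array([0.0, 0.0])))  # -1/sqrt(2) ≈ -0.7071
```

The sign of the result tells us which side of the hyperplane the point lies on, which is exactly what the labels $y_i$ encode in the SVM formulation.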

But max-min optimization problems of this form are often hard to solve directly. Hence we construct a proxy problem to find the solution for $\beta$. If we normalize the scale of $\beta$ so that $\min_i y_i(\beta^\top \mathbf{x}_i + \alpha) = 1$, then $d_\beta(\mathbf{x})$ is computed based on $1 / \| \beta \|$. Thus, we can translate the problem into

$$\min_{\beta, \alpha} \frac{1}{2}\|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1, \quad i = 1, \dots, n.$$
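To make this concrete, here is a minimal sketch that solves the standard hard-margin program $\min_{\beta,\alpha} \frac{1}{2}\|\beta\|^2$ subject to $y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1$ with a general-purpose constrained solver. The toy data set is hypothetical; a dedicated QP solver would be the usual choice in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data; labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(w):
    beta = w[:2]
    return 0.5 * beta @ beta  # (1/2) ||beta||^2

# Margin constraints: y_i (beta.x_i + alpha) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda w, i=i: y[i] * (w[:2] @ X[i] + w[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
beta, alpha = res.x[:2], res.x[2]
print(beta, alpha)
# Every margin constraint should hold at the optimum:
print(np.all(y * (X @ beta + alpha) >= 1 - 1e-6))
```

For this data the support vectors are $(2,2)$ and $(-1,-1)$; the solver recovers $\beta \approx (1/3, 1/3)$, $\alpha \approx -1/3$, and the margin is $1/\|\beta\|$.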
It is sometimes inevitable that some errors will occur. Thus we introduce the idea of a soft margin and modify the loss function for this problem. We want to optimize the loss function:

$$\min_{\beta, \alpha, \xi} \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

When $C$ is huge, the tolerance for misclassification is low. Alternatively, we can also consider a loss function of the following form:

$$\min_{\beta, \alpha, \xi} \sum_{i=1}^{n} \xi_i + \lambda \|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

Under this scenario, if $\lambda$ is huge, then the margin will be big and the tolerance for misclassification is high.
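The $C$-weighted and $\lambda$-weighted soft-margin objectives are equivalent up to a positive rescaling, which explains why $C$ and $\lambda$ control the error tolerance in opposite directions. Assuming both use the same constraints, dividing the first objective by $C$ gives

$$\frac{1}{C}\left( \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \right) = \sum_{i=1}^{n} \xi_i + \frac{1}{2C}\|\beta\|^2,$$

so minimizing the $C$-form is the same as minimizing the $\lambda$-form with $\lambda = \frac{1}{2C}$: a huge $C$ corresponds to a tiny $\lambda$, and vice versa.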

For the loss function, we can consider the so-called hinge loss,

$$\ell(z) = \max(0, 1 - z).$$

It gets its name because the graph of the function looks like a hinge. Substituting $\xi_i = \max\bigl(0, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr)$, the output loss function is

$$\min_{\beta, \alpha} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr) + \lambda \|\beta\|^2.$$

Because $y_i(\beta^\top \mathbf{x}_i + \alpha)$ should be larger than $1$, this loss penalizes points that are wrongly classified or fall inside the margin.
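Since the hinge-loss objective is unconstrained, it can be minimized directly by subgradient descent. A minimal sketch, assuming synthetic two-cluster data and hypothetical choices of step size and $\lambda$:

```python
import numpy as np

# Hinge-loss SVM trained by subgradient descent on
#   (1/n) * sum_i max(0, 1 - y_i (beta.x_i + alpha)) + lam * ||beta||^2
# Data, learning rate, and lam below are hypothetical illustration choices.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

beta, alpha, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    margins = y * (X @ beta + alpha)
    active = margins < 1  # points inside the margin or misclassified
    # Subgradient of the averaged hinge term, plus the gradient of lam * ||beta||^2
    g_beta = -(y[active, None] * X[active]).sum(axis=0) / len(y) + 2 * lam * beta
    g_alpha = -y[active].sum() / len(y)
    beta -= lr * g_beta
    alpha -= lr * g_alpha

acc = np.mean(np.sign(X @ beta + alpha) == y)
print(f"training accuracy: {acc:.2f}")
```

Only the "active" points with $y_i(\beta^\top \mathbf{x}_i + \alpha) < 1$ contribute to the subgradient, which mirrors the fact that the hinge loss is flat (zero) for points safely outside the margin.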