SVM Remark

The original vision for SVM: maximize the distance of the points to the separating hyperplane while keeping them on the correct sides. We can formulate the problem in the following form:

$$\max_{\beta} \min_{i} d_\beta(\mathbf{x}_i),$$

where $d_\beta(\mathbf{x})$ is the distance of feature $\mathbf{x}$ to the separating hyperplane with parameter $\beta$.
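As a concrete check of the distance $d_\beta(\mathbf{x})$, here is a minimal sketch that computes the signed distance of a point to the hyperplane $\beta^\top \mathbf{x} + \alpha = 0$; the particular hyperplane and points are hypothetical, chosen only for illustration.

```python
import numpy as np

def signed_distance(beta, alpha, x):
    """Signed distance from point x to the hyperplane beta.x + alpha = 0."""
    return (beta @ x + alpha) / np.linalg.norm(beta)

# Hypothetical hyperplane x1 + x2 - 1 = 0
beta = np.array([1.0, 1.0])
alpha = -1.0

print(signed_distance(beta, alpha, np.array([1.0, 1.0])))  # 1/sqrt(2) ≈ 0.7071
print(signed_distance(beta, alpha, np.array([0.0, 0.0])))  # -1/sqrt(2) ≈ -0.7071
```

The sign of the result tells us which side of the hyperplane the point lies on, which is exactly what the labels $y_i$ encode in the SVM formulation.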

But max-min optimization problems of this form are often hard to solve directly. Hence we construct a proxy problem to find the solution for $\beta$. If we normalize the scale of $\beta$ so that $\min_i y_i(\beta^\top \mathbf{x}_i + \alpha) = 1$, then $d_\beta(\mathbf{x})$ is computed based on $1 / \| \beta \|$. Thus, we can translate the problem into

$$\min_{\beta, \alpha} \frac{1}{2}\|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1, \quad i = 1, \dots, n.$$
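To make this concrete, here is a minimal sketch that solves the standard hard-margin program $\min_{\beta,\alpha} \frac{1}{2}\|\beta\|^2$ subject to $y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1$ with a general-purpose constrained solver. The toy data set is hypothetical; a dedicated QP solver would be the usual choice in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data; labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(w):
    beta = w[:2]
    return 0.5 * beta @ beta  # (1/2) ||beta||^2

# Margin constraints: y_i (beta.x_i + alpha) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda w, i=i: y[i] * (w[:2] @ X[i] + w[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
beta, alpha = res.x[:2], res.x[2]
print(beta, alpha)
# Every margin constraint should hold at the optimum:
print(np.all(y * (X @ beta + alpha) >= 1 - 1e-6))
```

For this data the support vectors are $(2,2)$ and $(-1,-1)$; the solver recovers $\beta \approx (1/3, 1/3)$, $\alpha \approx -1/3$, and the margin is $1/\|\beta\|$.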
It is sometimes inevitable that some errors will occur. Thus we introduce the idea of a soft margin and modify the loss function for this problem. We want to optimize the loss function:

$$\min_{\beta, \alpha, \xi} \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

When $C$ is huge, the tolerance for misclassification is low. Alternatively, we can also consider a loss function of the following form:

$$\min_{\beta, \alpha, \xi} \sum_{i=1}^{n} \xi_i + \lambda \|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

Under this scenario, if $\lambda$ is huge, then the margin will be big and the tolerance for misclassification is high.
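The $C$-weighted and $\lambda$-weighted soft-margin objectives are equivalent up to a positive rescaling, which explains why $C$ and $\lambda$ control the error tolerance in opposite directions. Assuming both use the same constraints, dividing the first objective by $C$ gives

$$\frac{1}{C}\left( \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \right) = \sum_{i=1}^{n} \xi_i + \frac{1}{2C}\|\beta\|^2,$$

so minimizing the $C$-form is the same as minimizing the $\lambda$-form with $\lambda = \frac{1}{2C}$: a huge $C$ corresponds to a tiny $\lambda$, and vice versa.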

For the loss function, we can consider the so-called hinge loss,

$$\ell(z) = \max(0, 1 - z).$$

It gets its name because the graph of the function looks like a hinge. Substituting $\xi_i = \max\bigl(0, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr)$, the output loss function is

$$\min_{\beta, \alpha} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr) + \lambda \|\beta\|^2.$$

Because $y_i(\beta^\top \mathbf{x}_i + \alpha)$ should be larger than $1$, this loss penalizes points that are wrongly classified or fall inside the margin.
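Since the hinge-loss objective is unconstrained, it can be minimized directly by subgradient descent. A minimal sketch, assuming synthetic two-cluster data and hypothetical choices of step size and $\lambda$:

```python
import numpy as np

# Hinge-loss SVM trained by subgradient descent on
#   (1/n) * sum_i max(0, 1 - y_i (beta.x_i + alpha)) + lam * ||beta||^2
# Data, learning rate, and lam below are hypothetical illustration choices.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

beta, alpha, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    margins = y * (X @ beta + alpha)
    active = margins < 1  # points inside the margin or misclassified
    # Subgradient of the averaged hinge term, plus the gradient of lam * ||beta||^2
    g_beta = -(y[active, None] * X[active]).sum(axis=0) / len(y) + 2 * lam * beta
    g_alpha = -y[active].sum() / len(y)
    beta -= lr * g_beta
    alpha -= lr * g_alpha

acc = np.mean(np.sign(X @ beta + alpha) == y)
print(f"training accuracy: {acc:.2f}")
```

Only the "active" points with $y_i(\beta^\top \mathbf{x}_i + \alpha) < 1$ contribute to the subgradient, which mirrors the fact that the hinge loss is flat (zero) for points safely outside the margin.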