Perceptron is a very simple binary classification online learning algorithm, dating from the 1950s1. Such an algorithm tries to classify input (here, points in the plane) into one of two categories. It receives the correct answer each time, and aims to minimize the total number of mistakes compared to a hypothesis class. Here, the hypothesis class is the set of halfplanes. Perceptron thus aims to perform as well at classifying the input points as any halfplane, which simply assigns points to one category if they are in the halfplane, and to the other if they are out of the halfplane. In the so-called realizable case, the true classifications really do come from a halfplane in this manner. Even in the realizable case, there is no online classification algorithm which can be guaranteed to make only finitely many mistakes given an infinite stream of inputs, even if all the points lie within some radius R of the origin2.
Perceptron’s simplicitly lends it to analytic study. It can be shown that if all the points it is asked to classify (1) lie within some radius R of the origin and (2) stay separated from the boundary of the halfplane by some distance d, then it will make finitely many mistakes3.
The following demo of perceptron was made for a reading group following the notes of Shai Shalev-Schwartz4.
Using the demo: You can click to input points x for the algorithm to classify, or send in random points. It classifies the points by maintaining a halfplane with normal vector5 w and reporting which side of the halfplane the points fall. The true classification is given by the blue halfplane (which the algorithm does not have access to). You can change u, the normal vector to the blue halfplane, by dragging it or typing in coordinates. After attempting to classify each point, perceptron gets the feedback y which is +1 when x is in the blue halfplane and -1 otherwise. It updates its guess, the (normal vector for) purple halfplane, based on this feedback and the process repeats. The bound shown here is from Theorem 3.9 of Shalev-Schwartz’s notes6.
By the way, how does the perceptron algorithm update based on the feedback? When it guessed the classification of x correctly, it does nothing. When it made a mistake, it updates w as follows. If x was truly in the blue halfplane, it updates w to w - x. If x was outside, it updates w to w + x. And that's all. The only memory perceptron has is encoded in w, as the sum of all the updates.
- wc which represents the component of w,
- xc which represents the component of x,
- y which has a value of +1 when x truly is unshaded, and -1 for when x truly is shaded,
- T, which counts the number of points placed so far,
- cost which counts the number of mistakes so far,
- and R which is the norm of the largest point input yet.
- Frank Rosenblatt, 1957 technical report (pdf) ↩
- OL p. 169 ↩
- See Theorem 3.9 of OL, or Mohri, Rostamizadeh 2013 for a recent proof of new bounds. ↩
- Shai Shalev-Schwartz, Online Learning Survey (OL). OLsurvey.pdf. The perceptron algorithm itself is on p. 170. ↩
- Here, the associated halfplane to a normal vector v is everything behind the line through the origin which is perpendicular to v. ↩
- This is upper bound on the number of mistakes Perceptron can make in terms of the inner products <x,u> for each point x given, largest norm of the points x, and the norm of u. This bound is computed whenever a new point x is placed and is displayed below the demonstration, along with some other stats. Note that the bound only holds if you do not change u during the process, which (along with the existence of such a u governing the true classification) corresponds to the realizable case. Choosing a sequence of x increasingly close to the line dividing each side of the halfplane such that Perceptron makes a mistake each time will cause the bound to diverge (this should happen when sending in random points for long enough). ↩