A simple 2 x 2 Neural Network
with Linear Algebra, by Hand

Han de Bruijn, retired Engineer,
Theoretical Physicist by education
E-mail: umumenu@gmail.com


ABSTRACT

This article studies an extremely simple single-layer feedforward 2 x 2 neural network, because I feel it is important to understand some essential features of neural networks without the help of a computer. The network at hand can be completely described, mathematically, by elementary linear algebra. A working example with two inputs and one output leads to the general case. A counterexample with two outputs instead of one is presented as well. It is concluded that the network with one output has learning capability and the network with two outputs has not. The behaviour of the first network can be formulated in geometric terms: all points on the straight line through the two given points in the input plane give the desired output, and no other inputs do. The network with two outputs, by contrast, is not able to make any generalization. It does not learn from experience, so to speak. It is somewhat surprising that the more intelligent network is characterized by a singular matrix of weights, and the dumber network by a regular one.

CONTENTS

  1. The Network
  2. Linear Algebra
  3. Working Example
  4. CounterExamples
  5. General Case
  6. Conclusions
  7. Reference

The Network

In a book on Neural Networks [0], on page 21, we find this picture:

For the sake of simplicity, we replace it by a $2\times 2$ version.

   

Here $i=1,2$ labels two different instances of supervised learning, each with inputs $x_{i1}, x_{i2}$ and outputs $y_{i1}, y_{i2}$.

Linear Algebra

The relationship between inputs and outputs is determined by a matrix of weights $w_{ij}$ : $$ \begin{bmatrix} y_{i1} \\ y_{i2} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} $$ Everything in the sequel is done by hand. Worked out for $i=1,2$ : $$ y_{11} = w_{11}x_{11} + w_{12}x_{12} \\ y_{12} = w_{21}x_{11} + w_{22}x_{12} \\ y_{21} = w_{11}x_{21} + w_{12}x_{22} \\ y_{22} = w_{21}x_{21} + w_{22}x_{22} $$ Considering the matrix elements $w_{ij}$ as the new unknowns, we find that $$ \begin{bmatrix} y_{11} \\ y_{21} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} $$ $$ \begin{bmatrix} y_{12} \\ y_{22} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} $$ Therefore only one matrix needs to be inverted in order to find the weights $w_{ij}$ of the neural network for two inputs $\vec{x}_i$ and two outputs $\vec{y}_i$. For each row $k=1,2$ of the weight matrix: $$ \begin{bmatrix} w_{k1} \\ w_{k2} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} $$
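The recipe above is meant to be carried out by hand, but it can be cross-checked mechanically. The following is a minimal Python sketch, assuming exact rational arithmetic via the standard `fractions` module; the function name `solve_weights` and the row-per-instance layout of its arguments are illustrative choices, not part of the derivation itself.

```python
from fractions import Fraction

def solve_weights(inputs, outputs):
    """Return the 2x2 weight matrix W such that W applied to inputs[i] gives outputs[i].

    inputs  : [[x11, x12], [x21, x22]]  (one training input per row)
    outputs : [[y11, y12], [y21, y22]]  (one desired output per row)
    """
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in inputs]
    det = a * d - b * c
    if det == 0:
        raise ValueError("training inputs are linearly dependent; no inverse exists")
    # Inverse of the input matrix, by the 2x2 adjugate formula.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    weights = []
    for k in range(2):  # k selects the output component, i.e. row k of W
        y1k, y2k = Fraction(outputs[0][k]), Fraction(outputs[1][k])
        weights.append([inv[0][0] * y1k + inv[0][1] * y2k,
                        inv[1][0] * y1k + inv[1][1] * y2k])
    return weights
```

Only one inverse is computed, and it is reused for both rows of the weight matrix. A zero determinant of the input matrix is reported as an error, since the inverse does not exist in that case.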

Working Example

$$ \mbox{Desired output:} \quad \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} = \begin{bmatrix} y_{21} \\ y_{22} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \\ \mbox{Training set of inputs:} \quad \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad ; \quad \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} $$ The inverse matrix is $$ \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}^{-1} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \quad \mbox{because} \quad \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix}= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $$ $$ \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_{11} \\ y_{21} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} $$ $$ \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_{12} \\ y_{22} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix} $$ $$ \mbox{Therefore:} \quad \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} $$ Thus the matrix of the weights $w$ is singular. It is good to remember this, because it explains a lot.
First check if the given inputs correspond with the desired output: $$ \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$ $$ \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$ However, many other inputs also lead to the desired output: $$ \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 7 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$ For arbitrary inputs we have: $$ \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -x_1+x_2 \\ 2(-x_1+x_2) \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$ $$ \mbox{Or:}\quad -x_1+x_2=1 \quad \Longrightarrow \quad x_2=1+x_1 $$ Thus infinitely many inputs $\vec{x}$ with $x_2=1+x_1$ generate the desired output $\vec{y}=(1,2)$ .
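The claim that every input with $x_2=1+x_1$ yields $(1,2)$ can be spot-checked numerically. A small Python sketch follows; the helper name `forward` and the sample range are chosen here purely for illustration.

```python
# Weight matrix found in the working example.
W = [[-1, 1], [-2, 2]]

def forward(weights, x):
    """Apply the 2x2 network: return the matrix-vector product weights * x."""
    return [weights[0][0] * x[0] + weights[0][1] * x[1],
            weights[1][0] * x[0] + weights[1][1] * x[1]]

# The determinant vanishes, so W is singular.
determinant = W[0][0] * W[1][1] - W[0][1] * W[1][0]

# Sample points on the line x2 = 1 + x1, including (0,1), (2,3) and (6,7).
on_line = [(t, 1 + t) for t in range(-3, 7)]
all_hit = all(forward(W, x) == [1, 2] for x in on_line)

# A point off the line, here the origin, misses the desired output.
off_line_output = forward(W, (0, 0))

print(determinant, all_hit, off_line_output)  # prints: 0 True [0, 0]
```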

CounterExamples

All other inputs do not give the desired $\vec{y}=(1,2)$, for example: $$ \begin{bmatrix} -1 & 1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$
We have investigated the case of two outputs $\vec{y}_i$ that are the same. Now suppose that they are not: $$ \begin{bmatrix} y_{11} \\ y_{12} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad ; \quad \begin{bmatrix} y_{21} \\ y_{22} \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} $$ Then it follows that $$ \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1/2 \\ 1 \end{bmatrix} $$ $$ \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -5/2 \\ 2 \end{bmatrix} $$ $$ \Longrightarrow \quad \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} = \begin{bmatrix} -1/2 & 1 \\ -5/2 & 2 \end{bmatrix} $$ The matrix of weights is regular now, instead of singular, with non-zero determinant $=3/2$. This means that there is a unique output with each input and vice versa.
For the training inputs and outputs we have: $$ \begin{bmatrix} -1/2 & 1 \\ -5/2 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \Longleftrightarrow \quad \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 4/3 & -2/3 \\ 5/3 & -1/3 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} $$ $$ \begin{bmatrix} -1/2 & 1 \\ -5/2 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad \Longleftrightarrow \quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 4/3 & -2/3 \\ 5/3 & -1/3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} $$
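For the regular weight matrix, the one-to-one correspondence between inputs and outputs can be checked with exact fractions. This is a sketch in Python; the helper names `det2`, `inv2` and `matvec` are introduced here for illustration only.

```python
from fractions import Fraction as F

# Weight matrix of the counterexample with two different desired outputs.
W = [[F(-1, 2), F(1)], [F(-5, 2), F(2)]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def matvec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

W_inv = inv2(W)

# Each desired output maps back to exactly one input.
recovered = [matvec(W_inv, y) for y in ([1, 2], [2, 1])]
```

Because the inverse exists, `recovered` holds the only inputs that produce the two desired outputs, which is the sense in which this network cannot generalize.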

General Case

Let's reconsider the network with two inputs and a single output. $$ \begin{bmatrix} w_{k1} \\ w_{k2} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} = \begin{bmatrix} x_{22}/D & -x_{12}/D \\ -x_{21}/D & x_{11}/D \end{bmatrix} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} $$ with $D=x_{11}x_{22} - x_{12}x_{21}$ , $y_{11}=y_{21}=p$ , $y_{12}=y_{22}=q$ . Consequently: $$ \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} = \begin{bmatrix} x_{22}/D - x_{12}/D \\ -x_{21}/D + x_{11}/D \end{bmatrix} p $$ $$ \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} = \begin{bmatrix} x_{22}/D - x_{12}/D \\ -x_{21}/D + x_{11}/D \end{bmatrix} q $$ $$ \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} = \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} $$ This matrix is always singular: both rows are multiples of the same row vector, so its determinant is zero.
Check for the first input of the training set: $$ \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = \\ \begin{bmatrix} p(x_{22}/D - x_{12}/D)x_{11} + p(-x_{21}/D + x_{11}/D)x_{12} \\ q(x_{22}/D - x_{12}/D)x_{11} + q(-x_{21}/D + x_{11}/D)x_{12} \end{bmatrix} = \\ \begin{bmatrix} p(x_{11}x_{22} - x_{21}x_{12})/D + p(-x_{12}x_{11} + x_{11}x_{12})/D \\ q(x_{11}x_{22} - x_{21}x_{12})/D + q(-x_{12}x_{11} + x_{11}x_{12})/D \end{bmatrix} = \begin{bmatrix} p \\ q \end{bmatrix} $$
And for the second input: $$ \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \\ \begin{bmatrix} p(x_{22}/D - x_{12}/D)x_{21} + p(-x_{21}/D + x_{11}/D)x_{22} \\ q(x_{22}/D - x_{12}/D)x_{21} + q(-x_{21}/D + x_{11}/D)x_{22} \end{bmatrix} = \\ \begin{bmatrix} p(x_{11}x_{22} - x_{21}x_{12})/D + p(x_{22}x_{21} - x_{22}x_{21})/D \\ q(x_{11}x_{22} - x_{21}x_{12})/D + q(x_{22}x_{21} - x_{22}x_{21})/D \end{bmatrix} = \begin{bmatrix} p \\ q \end{bmatrix} $$
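The general formulas can also be exercised on arbitrary numbers. The sketch below, in Python with exact fractions, builds the weight matrix from the closed-form expression above and reports its determinant together with the outputs at both training inputs. The function names `general_weights`, `forward` and `check` are assumptions made for illustration.

```python
from fractions import Fraction

def general_weights(x11, x12, x21, x22, p, q):
    """Weight matrix for two training inputs sharing one desired output (p, q)."""
    D = Fraction(x11 * x22 - x12 * x21)
    if D == 0:
        raise ValueError("D = 0: the training inputs are linearly dependent")
    r1 = (x22 - x12) / D  # factor shared by the first column
    r2 = (x11 - x21) / D  # factor shared by the second column
    return [[p * r1, p * r2], [q * r1, q * r2]]

def forward(w, x1, x2):
    """Output of the network with weights w for the input (x1, x2)."""
    return [w[0][0] * x1 + w[0][1] * x2, w[1][0] * x1 + w[1][1] * x2]

def check(x11, x12, x21, x22, p, q):
    """Return (determinant of W, output at input 1, output at input 2)."""
    w = general_weights(x11, x12, x21, x22, p, q)
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    return det, forward(w, x11, x12), forward(w, x21, x22)
```

For instance, `check(0, 1, 2, 3, 1, 2)` reproduces the working example: a zero determinant and the output $(1,2)$ at both training inputs.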

Conclusions

Evidently, the only requirement for reproducing the desired outcome with arbitrary inputs $(x_1,x_2)$, assuming $p$ and $q$ are not both zero, is: $$ (x_{22}/D - x_{12}/D)x_1 + (-x_{21}/D + x_{11}/D)x_2 = 1 \\ (x_{22} - x_{12})x_1 + (x_{11} - x_{21})x_2 = (x_{11}x_{22} - x_{12}x_{21}) $$ The equation of a straight line in the $(x_1,x_2) = (x,y)$ plane springs to mind. So let's devise the following mapping: $$ \begin{cases} x \leftrightarrow x_1 &:& x_1 \leftrightarrow x_{11} &,& x_2 \leftrightarrow x_{21} \\ y \leftrightarrow x_2 &:& y_1 \leftrightarrow x_{12} &,& y_2 \leftrightarrow x_{22} \end{cases} $$ The equation of a straight line through the points $(x_1,y_1)$ and $(x_2,y_2)$ in the $(x,y)$ plane is indeed equivalent to one in the $(x_1,x_2)$ plane: $$ \frac{y-y_1}{y_2-y_1} = \frac{x-x_1}{x_2-x_1} \quad \Longleftrightarrow \quad \frac{x_2-x_{12}}{x_{22}-x_{12}} = \frac{x_1-x_{11}}{x_{21}-x_{11}} $$ $$ (x_2-x_{12})(x_{21}-x_{11}) - (x_1-x_{11})(x_{22}-x_{12}) = 0 \\ (x_{22} - x_{12})x_1 + (x_{11} - x_{21})x_2 = (x_{11}x_{22} - x_{12}x_{21}) $$ Supervised learning for this simple neural network, with two inputs $(1) = (x_{11},x_{12})$ , $(2) = (x_{21},x_{22})$ and a desired output $(y_1,y_2) = (p,q)$, results in a generalization which is a straight line in the $(x_1,x_2)$ plane supported by the two inputs.
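As a concrete instance, the line equation can be evaluated for the training inputs of the working example. The Python sketch below uses two hypothetical helpers: `line_through` returns the coefficients $(a, b, c)$ of $a x_1 + b x_2 = c$ as derived above, and `on_line` tests whether a given input satisfies that equation.

```python
def line_through(x11, x12, x21, x22):
    """Coefficients (a, b, c) of a*x1 + b*x2 = c for the line through both training inputs."""
    a = x22 - x12
    b = x11 - x21
    c = x11 * x22 - x12 * x21
    return a, b, c

def on_line(coeffs, x1, x2):
    """True if the input (x1, x2) satisfies a*x1 + b*x2 = c."""
    a, b, c = coeffs
    return a * x1 + b * x2 == c

# Training inputs (0, 1) and (2, 3) from the working example.
coeffs = line_through(0, 1, 2, 3)
```

Here `coeffs` evaluates to `(2, -2, -2)`, that is $2x_1 - 2x_2 = -2$, which is the line $x_2 = 1 + x_1$ found in the working example.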


The network has learned that all points on the straight line through the two given points give the desired output. No other inputs do the job. In other words, the network has acquired knowledge about the first two postulates of Euclidean Geometry (according to Google AI):

  1. Straight Line Segment: A straight line segment can be drawn joining any two points.
  2. Extension of Lines: Any straight line segment can be extended indefinitely in a straight line.

If we had defined two independent outputs instead of one, as in the second part of our CounterExamples, then the two inputs could be calculated uniquely from the two outputs. No other inputs lead to a desired output. In other words: this network cannot generalize; it has not learned from experience.

Conflicts of Interest: The author declares no conflicts of interest.
Funding: This research received no external funding.

Reference

0. Simon Haykin, Neural Networks: A Comprehensive Foundation, second edition.