A simple 2 x 2 Neural Network
with Linear Algebra, by Hand

Han de Bruijn, retired Engineer,
Theoretical Physicist by education
E-mail: umumenu@gmail.com

ABSTRACT

An extremely simple single-layer feedforward 2 x 2 neural network is the subject of this article, because I feel it is important to understand some essential features of neural networks without the help of a computer. The network at hand can be completely described, mathematically, by elementary linear algebra. A working example with two inputs and one output leads to the general case. A counterexample with two outputs instead of one is presented as well. It is concluded that the network with one output has learning capability and the network with two outputs has not. The behaviour of the first network can be formulated in geometric terms: all points on a straight line through two given points in the input plane give the desired output. There are no other inputs that do the job. The network with two outputs, on the contrary, is not able to make any generalization. It does not learn from experience, so to speak. It is rather surprising that the more intelligent network is characterized by a singular matrix of weights, and the dumber network by a regular one.

CONTENTS

The Network

Linear Algebra

Working Example

CounterExamples

General Case

Conclusions

Reference

The Network

In a book on Neural Networks [0], on page 21, we find this picture:

For the sake of simplicity, we replace it by a $2\times 2$ version.

Inputs from top to bottom: $(x_{i1},x_{i2})$
Outputs from top to bottom: $(y_{i1},y_{i2})$

with $i=1,2$ : two different instances of supervised learning.

Linear Algebra

The relationship between inputs and outputs is determined by a matrix of weights $w_{ij}$:
$$ \begin{bmatrix} y_{i1} \\ y_{i2} \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} $$
Everything in the sequel is done by hand. Worked out for $i=1,2$:
$$ \begin{aligned} y_{11} &= w_{11}x_{11} + w_{12}x_{12} \\ y_{12} &= w_{21}x_{11} + w_{22}x_{12} \\ y_{21} &= w_{11}x_{21} + w_{12}x_{22} \\ y_{22} &= w_{21}x_{21} + w_{22}x_{22} \end{aligned} $$
Consider the matrix elements $w_{ij}$ as the new unknowns; then we find that
$$ \begin{bmatrix} y_{11} \\ y_{21} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} $$
$$ \begin{bmatrix} y_{12} \\ y_{22} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} $$
Therefore only one matrix needs to be inverted in order to find the weights $w_{ij}$ of the neural network for two inputs $\vec{x}_i$ and two outputs $\vec{y}_i$ ($i=1,2$). With $k=1,2$ numbering the output components:
$$ \begin{bmatrix} w_{k1} \\ w_{k2} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} $$
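The hand calculation above can be retraced in a few lines of code. What follows is only a minimal sketch in Python, not part of the original derivation: the function names `invert_2x2`, `train_weights` and `apply_weights` are invented here, and the training pairs in the usage lines are hypothetical numbers chosen for illustration. Exact fractions are used so that the result matches a calculation by hand.

```python
from fractions import Fraction

def invert_2x2(m):
    """Invert a 2x2 matrix ((a, b), (c, d)) with the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the two training inputs are linearly dependent")
    return ((Fraction(d) / det, Fraction(-b) / det),
            (Fraction(-c) / det, Fraction(a) / det))

def train_weights(inputs, outputs):
    """Return the weights W with W * inputs[i] == outputs[i] for i = 0, 1.

    inputs  = ((x11, x12), (x21, x22)), one row per training instance
    outputs = ((y11, y12), (y21, y22)), one row per training instance
    """
    inv = invert_2x2(inputs)
    rows = []
    for k in range(2):  # k numbers the output components
        y1k, y2k = outputs[0][k], outputs[1][k]
        w_k1 = inv[0][0] * y1k + inv[0][1] * y2k
        w_k2 = inv[1][0] * y1k + inv[1][1] * y2k
        rows.append((w_k1, w_k2))
    return tuple(rows)

def apply_weights(weights, x):
    """Compute the network output W * x for a single input x."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in weights)

# Hypothetical training set (an assumption, not taken from the article).
X = ((1, 2), (3, 4))
Y = ((5, 6), (7, 8))
W = train_weights(X, Y)
print(W)                       # the weight matrix as exact fractions
print(apply_weights(W, X[0]))  # reproduces Y[0]
print(apply_weights(W, X[1]))  # reproduces Y[1]
```

Note that a single inverse of the input matrix serves both output components, which is the point made in the text.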

Working Example

CounterExamples

General Case

Let's reconsider the network with two inputs and a single output, that is, both training inputs are required to produce the same output $(p,q)$. For $D \neq 0$:
$$ \begin{bmatrix} w_{k1} \\ w_{k2} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} = \begin{bmatrix} x_{22}/D & -x_{12}/D \\ -x_{21}/D & x_{11}/D \end{bmatrix} \begin{bmatrix} y_{1k} \\ y_{2k} \end{bmatrix} $$
with $D=x_{11}x_{22} - x_{12}x_{21}$, $y_{11}=y_{21}=p$, $y_{12}=y_{22}=q$. Consequently:
$$ \begin{bmatrix} w_{11} \\ w_{12} \end{bmatrix} = \begin{bmatrix} x_{22}/D - x_{12}/D \\ -x_{21}/D + x_{11}/D \end{bmatrix} p $$
$$ \begin{bmatrix} w_{21} \\ w_{22} \end{bmatrix} = \begin{bmatrix} x_{22}/D - x_{12}/D \\ -x_{21}/D + x_{11}/D \end{bmatrix} q $$
$$ \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} = \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} $$
This matrix is singular, because its two rows are proportional. Check for the training set, first input:
$$ \begin{aligned} \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} &= \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} \\ &= \begin{bmatrix} p(x_{22}/D - x_{12}/D)x_{11} + p(-x_{21}/D + x_{11}/D)x_{12} \\ q(x_{22}/D - x_{12}/D)x_{11} + q(-x_{21}/D + x_{11}/D)x_{12} \end{bmatrix} \\ &= \begin{bmatrix} p(x_{11}x_{22} - x_{21}x_{12})/D + p(-x_{12}x_{11} + x_{11}x_{12})/D \\ q(x_{11}x_{22} - x_{21}x_{12})/D + q(-x_{12}x_{11} + x_{11}x_{12})/D \end{bmatrix} = \begin{bmatrix} p \\ q \end{bmatrix} \end{aligned} $$
And for the second input:
$$ \begin{aligned} \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} &= \begin{bmatrix} p(x_{22}/D - x_{12}/D) & p(-x_{21}/D + x_{11}/D) \\ q(x_{22}/D - x_{12}/D) & q(-x_{21}/D + x_{11}/D) \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} \\ &= \begin{bmatrix} p(x_{22}/D - x_{12}/D)x_{21} + p(-x_{21}/D + x_{11}/D)x_{22} \\ q(x_{22}/D - x_{12}/D)x_{21} + q(-x_{21}/D + x_{11}/D)x_{22} \end{bmatrix} \\ &= \begin{bmatrix} p(x_{11}x_{22} - x_{21}x_{12})/D + p(x_{22}x_{21} - x_{22}x_{21})/D \\ q(x_{11}x_{22} - x_{21}x_{12})/D + q(x_{22}x_{21} - x_{22}x_{21})/D \end{bmatrix} = \begin{bmatrix} p \\ q \end{bmatrix} \end{aligned} $$
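The general-case formulas can also be checked numerically. The sketch below is an illustration under stated assumptions: the function names `single_output_weights`, `determinant` and `apply_weights` are invented here, and the inputs $(1,2)$, $(3,1)$ with desired output $(p,q)=(2,5)$ are hypothetical numbers, not taken from the article. It builds the weight matrix from the closed formula, confirms that its determinant vanishes, and confirms that both training inputs give $(p,q)$.

```python
from fractions import Fraction

def single_output_weights(x1, x2, p, q):
    """Weights for the case where both inputs x1 and x2 must yield (p, q)."""
    (x11, x12), (x21, x22) = x1, x2
    D = x11 * x22 - x12 * x21
    if D == 0:
        raise ValueError("D = 0: the two training inputs are linearly dependent")
    a = Fraction(x22 - x12) / D   # shared factor in the first column
    b = Fraction(x11 - x21) / D   # shared factor in the second column
    return ((p * a, p * b), (q * a, q * b))

def determinant(w):
    """Determinant of a 2x2 matrix."""
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]

def apply_weights(w, x):
    """Compute the network output W * x."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in w)

# Hypothetical training data (an assumption for illustration only).
x1, x2 = (1, 2), (3, 1)
p, q = 2, 5
W = single_output_weights(x1, x2, p, q)
print(determinant(W))         # zero: the weight matrix is singular
print(apply_weights(W, x1))   # equals (p, q)
print(apply_weights(W, x2))   # equals (p, q)
```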

Conclusions

Obviously, the only requirement for reproducing the desired outcome with arbitrary inputs $(x_1,x_2)$, provided $p$ and $q$ are not both zero, is:
$$ \begin{aligned} (x_{22}/D - x_{12}/D)x_1 + (-x_{21}/D + x_{11}/D)x_2 &= 1 \\ (x_{22} - x_{12})x_1 + (x_{11} - x_{21})x_2 &= (x_{11}x_{22} - x_{12}x_{21}) \end{aligned} $$
The equation of a straight line in the $(x_1,x_2) = (x,y)$ plane springs to mind. So let's devise the following mapping:
$$ \begin{array}{lclcl} x \leftrightarrow x_1 &:& x_1 \leftrightarrow x_{11} &,& x_2 \leftrightarrow x_{21} \\ y \leftrightarrow x_2 &:& y_1 \leftrightarrow x_{12} &,& y_2 \leftrightarrow x_{22} \end{array} $$
Here the symbols after the colon, $(x_1,y_1)$ and $(x_2,y_2)$, denote the two given points in the $(x,y)$ plane, not the network inputs. The equation of a straight line through the points $(x_1,y_1)$ and $(x_2,y_2)$ in the $(x,y)$ plane is indeed equivalent to one in the $(x_1,x_2)$ plane:
$$ \frac{y-y_1}{y_2-y_1} = \frac{x-x_1}{x_2-x_1} \quad \Longleftrightarrow \quad \frac{x_2-x_{12}}{x_{22}-x_{12}} = \frac{x_1-x_{11}}{x_{21}-x_{11}} $$
$$ \begin{aligned} (x_2-x_{12})(x_{21}-x_{11}) - (x_1-x_{11})(x_{22}-x_{12}) &= 0 \\ (x_{22} - x_{12})x_1 + (x_{11} - x_{21})x_2 &= (x_{11}x_{22} - x_{12}x_{21}) \end{aligned} $$
Supervised learning for this simple neural network, with two inputs $(1) = (x_{11},x_{12})$, $(2) = (x_{21},x_{22})$ and a desired output $(y_1,y_2) = (p,q)$, results in a generalization which is a straight line in the $(x_1,x_2)$ plane supported by the two inputs.
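The geometric statement can be tried out on a few points. The following is a minimal sketch, with invented function names `network_output` and `point_on_line` and hypothetical training data (inputs $(1,2)$ and $(3,1)$, desired output $(2,5)$). It evaluates the network on points $\vec{x}_1 + t(\vec{x}_2 - \vec{x}_1)$ of the line through the two training inputs, and on one point that is not on that line.

```python
from fractions import Fraction

def network_output(x1, x2, p, q, x):
    """Output of the network trained so that inputs x1 and x2 both give (p, q)."""
    (x11, x12), (x21, x22) = x1, x2
    D = x11 * x22 - x12 * x21
    if D == 0:
        raise ValueError("D = 0: the two training inputs are linearly dependent")
    # s is the left-hand side of the line equation divided by D;
    # s == 1 exactly when x lies on the line through x1 and x2.
    s = Fraction((x22 - x12) * x[0] + (x11 - x21) * x[1]) / D
    return (p * s, q * s)

def point_on_line(x1, x2, t):
    """Point x1 + t * (x2 - x1) on the straight line through x1 and x2."""
    return (x1[0] + t * (x2[0] - x1[0]), x1[1] + t * (x2[1] - x1[1]))

# Hypothetical training data (an assumption for illustration only).
x1, x2 = (1, 2), (3, 1)
p, q = 2, 5

on_line = [network_output(x1, x2, p, q, point_on_line(x1, x2, t))
           for t in (-2, 0, Fraction(1, 2), 1, 7)]
off_line = network_output(x1, x2, p, q, (0, 0))

print(on_line)    # every entry equals (p, q)
print(off_line)   # differs from (p, q)
```

The values $t=0$ and $t=1$ are the training inputs themselves; all other values of $t$ are inputs the network has never seen.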

The network has learned that all points on the straight line through the two given points give the desired output. There are no other inputs that do the job. In other words, the network has acquired knowledge about the first two postulates of Euclidean Geometry (according to Google AI):

Straight Line Segment: A straight line segment can be drawn joining any two points.
Extension of Lines: Any straight line segment can be extended indefinitely in a straight line.

If we had defined two independent outputs instead of one, as in the second part of our CounterExamples, then the matrix of weights would be regular and the two inputs could be calculated from the two outputs. There are no other inputs that lead to a desired output. In other words: this network cannot generalize; it has not learned from experience.
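The lack of generalization with a regular weight matrix can be made concrete as follows. This is a sketch under assumptions, not a reproduction of the CounterExamples section: the function name `solve_input` is invented, and the weight matrix is the one that results from the hypothetical training pairs $(1,2)\to(5,6)$ and $(3,4)\to(7,8)$. Because the matrix can be inverted, each desired output is produced by exactly one input, namely the training input itself.

```python
from fractions import Fraction

def solve_input(weights, y):
    """Return the unique input x with W * x == y, for a regular 2x2 matrix W."""
    (a, b), (c, d) = weights
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular weights: the input is not unique")
    # x = W^{-1} y, written out with the adjugate formula
    return (Fraction(d * y[0] - b * y[1]) / det,
            Fraction(a * y[1] - c * y[0]) / det)

# Hypothetical weights (an assumption): they map (1, 2) to (5, 6)
# and (3, 4) to (7, 8), two independent outputs.
W = ((-3, 4), (-4, 5))

print(solve_input(W, (5, 6)))   # the first training input and nothing else
print(solve_input(W, (7, 8)))   # the second training input and nothing else
```

For the singular matrix of the single-output network the same function raises an error, which reflects that a whole line of inputs shares one output.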

Conflicts of Interest: The author declares no conflicts of interest.
Funding: This research received no external funding.

Reference

0. Simon Haykin, Neural Networks, a comprehensive foundation, second edition.
