Least Squares Best Fit
$
\def \EN {\quad \mbox{and} \quad}
$
Suppose we have a collection of points in the plane and we want to draw a
straight line through these points, in such a way that the line is a "best fit"
to them. An equation for the straight line can always be set up as follows:
$$
\cos(\theta) (x - p) + \sin(\theta) (y - q) = 0
$$
Here $\theta$ is the angle between the line's normal and the x-axis, and $(p,q)$
is an arbitrary point on the line. The distance from a point $(x_k,y_k)$ to the
line is the length of the projection of the vector $\vec{r} = (x_k - p, y_k - q)$,
which runs from $(p,q)$ to that point, onto the normal $\vec{n}$ of the line.
The normal is the unit vector $\vec{n} = (\cos(\theta),\sin(\theta))$. Hence the
length of the projection is:
$$
\left| \frac{(\vec{r} \cdot \vec{n})}{(\vec{n} \cdot \vec{n})} \vec{n} \right| =
\left| \cos(\theta) (x_k - p) + \sin(\theta) (y_k - q) \right|
$$
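The distance formula can be tried out numerically. The following is a minimal sketch, not part of the original derivation; the function name `distance_to_line` and the sample values are chosen here purely for illustration.

```python
import math

def distance_to_line(xk, yk, p, q, theta):
    """Distance from (xk, yk) to the line through (p, q) whose normal
    makes the angle theta with the x-axis."""
    return abs(math.cos(theta) * (xk - p) + math.sin(theta) * (yk - q))

# Illustrative case: the x-axis passes through (0, 0) and has its normal
# along the y-axis, so theta = pi/2. The point (3, 4) lies 4 units above it.
d = distance_to_line(3.0, 4.0, 0.0, 0.0, math.pi / 2)
print(d)  # close to 4, up to floating-point rounding
```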
For the straight line to be a "best fit", we require that the weighted sum of
the squared distances be a minimum:
$$
\sum_k w_k \left[ \cos(\theta) (x_k - p) + \sin(\theta) (y_k - q) \right]^2
= \mbox{minimum}(p,q,\theta)
$$
Expanding the square:
$$
\cos^2(\theta) \sum_k w_k (x_k - p)^2 +
\sin^2(\theta) \sum_k w_k (y_k - q)^2 +
$$ $$
2 \sin(\theta) \cos(\theta) \sum_k w_k (x_k - p) (y_k - q)
= \mbox{minimum}(p,q,\theta)
$$
Let us first solve just one part of the puzzle, namely how the point $(p,q)$
must be chosen so that a minimum is reached with respect to this choice.
For the first two sums in the above expression this means that:
$$
\sum_{k} w_k (x_k - p)^2 = \mbox{minimum} \EN
\sum_{k} w_k (y_k - q)^2 = \mbox{minimum}
$$
We have already seen that second-order moments are minimal when they are taken
with respect to the midpoint (weighted mean) of the point cloud as origin.
In our case, with the weights normalized so that $\sum_k w_k = 1$:
$$
p = \sum_k w_k x_k \EN q = \sum_k w_k y_k
$$
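As a brief check of this choice (a sketch that assumes only positive weights), differentiate the first sum with respect to $p$ and set the result to zero:
$$
\frac{\partial}{\partial p} \sum_k w_k (x_k - p)^2 = - 2 \sum_k w_k (x_k - p) = 0
\quad \Longrightarrow \quad p \sum_k w_k = \sum_k w_k x_k
$$
With normalized weights, $\sum_k w_k = 1$, this is the expression for $p$ given
above; for unnormalized weights one would divide by $\sum_k w_k$. The same
argument applies to $q$. The complete sum of squares, cross term included, is
stationary for this choice as well:
$$
\frac{\partial}{\partial p} \sum_k w_k \left[ \cos(\theta) (x_k - p) + \sin(\theta) (y_k - q) \right]^2 =
$$ $$
- 2 \cos(\theta) \left[ \cos(\theta) \sum_k w_k (x_k - p) + \sin(\theta) \sum_k w_k (y_k - q) \right] = 0
$$
because both inner sums vanish at the midpoint.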
Define the second-order moments with respect to the midpoint as usual:
$$
\sigma_{xx} = \sum_k w_k (x_k - p)^2 \EN
\sigma_{yy} = \sum_k w_k (y_k - q)^2
$$ $$
\sigma_{xy} = \sum_k w_k (x_k - p) (y_k - q)
$$
We can now concentrate on the minimization with respect to the angle $\theta$:
$$
\cos^2(\theta) \sigma_{xx} + \sin^2(\theta) \sigma_{yy}
+ 2 \sin(\theta) \cos(\theta) \sigma_{xy} = \mbox{minimum}(\theta)
$$
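For orientation, the same function can be written as a quadratic form in the unit normal $\vec{n} = (\cos(\theta),\sin(\theta))$. This is an equivalent restatement rather than a step of the original derivation; the symbols $S$ and $\lambda_{\pm}$ are introduced here only for illustration:
$$
\cos^2(\theta) \sigma_{xx} + \sin^2(\theta) \sigma_{yy}
+ 2 \sin(\theta) \cos(\theta) \sigma_{xy} = \vec{n} \cdot S\,\vec{n}
\EN
S = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{pmatrix}
$$
Over all unit vectors, such a quadratic form takes its smallest and largest values at the eigenvectors of the symmetric matrix $S$, and these values are the eigenvalues:
$$
\lambda_{\pm} = \frac{\sigma_{xx} + \sigma_{yy}}{2} \pm
\sqrt{ \left( \frac{\sigma_{xx} - \sigma_{yy}}{2} \right)^2 + \sigma_{xy}^2 }
$$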
Extreme values are found by differentiating with respect to $\theta$ and setting the result to zero:
$$
- 2 \sin(\theta) \cos(\theta) \sigma_{xx} + 2 \cos(\theta) \sin(\theta) \sigma_{yy}
+ 2 \cos^2(\theta) \sigma_{xy} - 2 \sin^2(\theta) \sigma_{xy} = 0
$$
This leads to the familiar equation:
$$
\sin(2 \theta) (\sigma_{xx} - \sigma_{yy}) - 2\,\cos(2 \theta) \sigma_{xy} = 0
$$
The above expression is also recognized as the condition that made the
"cross-correlation moment" $\sigma_{xy}$ vanish in a rotated coordinate system.
The angle enters only as $2\theta$, so if $\theta$ is a solution, then
$\theta + \pi/2$ is a solution as well. This means that besides one straight
line, the line perpendicular to it also satisfies the equation. This is quite
sensible, because differentiating and setting the result to zero finds all
extrema, not only minima. Indeed, one finds a minimum for one value of
the angle $\theta$ and a maximum for the perpendicular angle. The wanted line
(minimum) is the best fit to the points and the unwanted perpendicular line
(maximum) is the worst fit. Both lines pass through the midpoint, or center of
gravity, of the point cloud.
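
To tell the minimum from the maximum explicitly, the function being minimized can be rewritten with the double-angle identities. This is a worked sketch; the name $F$ is introduced here only for illustration:
$$
F(\theta) = \frac{\sigma_{xx} + \sigma_{yy}}{2}
+ \frac{\sigma_{xx} - \sigma_{yy}}{2} \cos(2 \theta) + \sigma_{xy} \sin(2 \theta)
$$
Setting $F'(\theta) = 0$ reproduces the equation found above, which for
$\sigma_{xx} \neq \sigma_{yy}$ can be written as:
$$
\tan(2 \theta) = \frac{2\,\sigma_{xy}}{\sigma_{xx} - \sigma_{yy}}
$$
The second derivative is:
$$
F''(\theta) = - 2 (\sigma_{xx} - \sigma_{yy}) \cos(2 \theta) - 4\,\sigma_{xy} \sin(2 \theta)
$$
A solution with $F''(\theta) > 0$ is a minimum. Replacing $\theta$ by
$\theta + \pi/2$ reverses the signs of $\cos(2 \theta)$ and $\sin(2 \theta)$,
hence the sign of $F''$, so the perpendicular solution is a maximum.
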
It follows that the two straight lines through the midpoint of the cloud, the
one with the best fit and the one with the worst fit, together form an
orthogonal coordinate system. This system coincides with the one formed by the
principal axes of inertia of the point cloud.
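
The whole procedure can be put together in a short program. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function names `fit_line` and `weighted_sq_dist` and the sample points are hypothetical, the weights are normalized inside the function so that they sum to one, and the minimizing solution is selected with `atan2`, which is one possible way to pick the right branch of the equation in $2\theta$.

```python
import math

def fit_line(points, weights=None):
    """Fit cos(theta)*(x - p) + sin(theta)*(y - q) = 0 to weighted points.

    Returns (p, q, theta), where theta is the angle of the line's normal.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    w = [wk / total for wk in weights]  # assumption: normalize to sum 1

    # Midpoint (weighted mean) of the point cloud.
    p = sum(wk * x for wk, (x, _) in zip(w, points))
    q = sum(wk * y for wk, (_, y) in zip(w, points))

    # Second-order moments with respect to the midpoint.
    sxx = sum(wk * (x - p) ** 2 for wk, (x, _) in zip(w, points))
    syy = sum(wk * (y - q) ** 2 for wk, (_, y) in zip(w, points))
    sxy = sum(wk * (x - p) * (y - q) for wk, (x, y) in zip(w, points))

    # Minimizing solution of sin(2t)(sxx - syy) - 2 cos(2t) sxy = 0.
    theta = 0.5 * math.atan2(-2.0 * sxy, syy - sxx)
    return p, q, theta

def weighted_sq_dist(points, p, q, theta, weights=None):
    """Weighted sum of squared distances to the line (weights used as given)."""
    if weights is None:
        weights = [1.0 / len(points)] * len(points)
    c, s = math.cos(theta), math.sin(theta)
    return sum(wk * (c * (x - p) + s * (y - q)) ** 2
               for wk, (x, y) in zip(weights, points))

# Illustrative data: four points scattered around the line y = x.
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
p, q, theta = fit_line(pts)
best = weighted_sq_dist(pts, p, q, theta)                 # best-fit line
worst = weighted_sq_dist(pts, p, q, theta + math.pi / 2)  # perpendicular line
print(p, q, theta, best, worst)
```

Evaluating the sum of squares at `theta` and at `theta + math.pi / 2` shows the best and the worst fit side by side; both lines pass through the same midpoint `(p, q)`.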