Least Squares Best Fit

$ \def \EN {\quad \mbox{and} \quad} $ Suppose we have a collection of points in the plane and we want to draw a straight line through them, in such a way that the line is a "best fit" to them. An equation for the straight line can always be set up as follows: $$ \cos(\theta) (x - p) + \sin(\theta) (y - q) = 0 $$ Here $\theta$ is the angle of the line's normal with the x-axis and $(p,q)$ is an arbitrary point on the line. The distance of an arbitrary point $\vec{r} = (x_k,y_k)$ to the line is given by the length of the projection of the difference vector $\vec{r} - \vec{r}_0$, with $\vec{r}_0 = (p,q)$, onto the normal $\vec{n}$ of the line. The latter is given by $\vec{n} = (\cos(\theta),\sin(\theta))$. Hence the length of the projection is: $$ \left| \frac{((\vec{r} - \vec{r}_0) \cdot \vec{n})}{(\vec{n} \cdot \vec{n})} \vec{n} \right| = \left| \cos(\theta) (x_k - p) + \sin(\theta) (y_k - q) \right| $$ For the straight line to be a "best fit", we require that the weighted sum of the squared distances be a minimum: $$ \sum_k w_k \left[ \cos(\theta) (x_k - p) + \sin(\theta) (y_k - q) \right]^2 = \mbox{minimum}(p,q,\theta) $$ Working this out: $$ \cos^2(\theta) \sum_k w_k (x_k - p)^2 + \sin^2(\theta) \sum_k w_k (y_k - q)^2 + $$ $$ 2 \sin(\theta) \cos(\theta) \sum_k w_k (x_k - p) (y_k - q) = \mbox{minimum}(p,q,\theta) $$ Let us first solve just one part of the puzzle, namely how the point $(p,q)$ must be chosen so that a minimum is reached with respect to this choice. For the first two terms of the above expression this means that: $$ \sum_{k} w_k (x_k - p)^2 = \mbox{minimum} \EN \sum_{k} w_k (y_k - q)^2 = \mbox{minimum} $$ The cross term does not disturb this: its derivatives with respect to $p$ and $q$ are $-\sum_k w_k (y_k - q)$ and $-\sum_k w_k (x_k - p)$, and these vanish under the same conditions that make the two sums above minimal. We have already seen that these minimal values are reached if the second-order moments are taken with respect to the midpoint of the point cloud as their origin.
In our case, with the weights normalized such that $\sum_k w_k = 1$: $$ p = \sum_k w_k x_k \EN q = \sum_k w_k y_k $$ Define the second-order moments with respect to the midpoint as usual: $$ \sigma_{xx} = \sum_k w_k (x_k - p)^2 \EN \sigma_{yy} = \sum_k w_k (y_k - q)^2 $$ $$ \sigma_{xy} = \sum_k w_k (x_k - p) (y_k - q) $$ We can now concentrate on minimization with respect to the angle $\theta$: $$ \cos^2(\theta) \sigma_{xx} + \sin^2(\theta) \sigma_{yy} + 2 \sin(\theta) \cos(\theta) \sigma_{xy} = \mbox{minimum}(\theta) $$ Extreme values are found by differentiating with respect to the independent variable and setting the result to zero: $$ - 2 \sin(\theta) \cos(\theta) \sigma_{xx} + 2 \cos(\theta) \sin(\theta) \sigma_{yy} + 2 \cos^2(\theta) \sigma_{xy} - 2 \sin^2(\theta) \sigma_{xy} = 0 $$ Using $2 \sin(\theta) \cos(\theta) = \sin(2 \theta)$ and $\cos^2(\theta) - \sin^2(\theta) = \cos(2 \theta)$, this leads to the familiar equation: $$ \sin(2 \theta) (\sigma_{xx} - \sigma_{yy}) - 2\,\cos(2 \theta) \sigma_{xy} = 0 $$ The above expression is also recognized as the one which caused the "cross correlation moment" $\sigma_{xy}$ to become zero in a rotated coordinate system. It is an expression in twice the angle $\theta$. This means that if $\theta$ is a solution, then $\theta + \pi/2$ must be a solution as well, so that besides the straight line itself the line perpendicular to it is also a solution. This is quite sensible, because differentiating and setting the outcome to zero only finds extrema. Indeed, one finds a minimum for one value of the angle $\theta$ and a maximum for the perpendicular angle. The wanted line (minimum) is a best fit to the points and the (unwanted) perpendicular line is a worst fit to the points. Both lines go through the midpoint, or center of gravity, of the point cloud.
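The recipe above can be sketched numerically. The following Python fragment is only a minimal illustration and not part of the derivation itself: the function name `fit_line`, the explicit normalization of the weights, and the use of `atan2` to solve the angle equation are assumptions made here. Because the angle equation only yields stationary values, the sketch evaluates the objective at both $\theta$ and $\theta + \pi/2$ and keeps the smaller one.

```python
import math

def fit_line(points, weights):
    """Weighted best-fit line; returns (p, q, theta) with theta the normal's angle.

    Hypothetical helper for illustration only.
    """
    total = sum(weights)
    w = [wk / total for wk in weights]  # assumption: normalize so the weights sum to 1

    # Midpoint (weighted centroid) of the point cloud
    p = sum(wk * x for wk, (x, _) in zip(w, points))
    q = sum(wk * y for wk, (_, y) in zip(w, points))

    # Second-order moments with respect to the midpoint
    sxx = sum(wk * (x - p) ** 2 for wk, (x, _) in zip(w, points))
    syy = sum(wk * (y - q) ** 2 for wk, (_, y) in zip(w, points))
    sxy = sum(wk * (x - p) * (y - q) for wk, (x, y) in zip(w, points))

    def objective(theta):
        c, s = math.cos(theta), math.sin(theta)
        return c * c * sxx + s * s * syy + 2.0 * s * c * sxy

    # One root of sin(2t)(sxx - syy) - 2 cos(2t) sxy = 0
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # theta and theta + pi/2 are both stationary; keep the minimum
    other = theta + math.pi / 2.0
    if objective(other) < objective(theta):
        theta = other
    return p, q, theta

# Sample points lying exactly on y = 2x + 1 (made-up data)
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
p, q, theta = fit_line(pts, [1.0, 1.0, 1.0, 1.0])
residuals = [math.cos(theta) * (x - p) + math.sin(theta) * (y - q) for x, y in pts]
print(p, q, theta, residuals)
```

For points lying exactly on a straight line, as in this sample, the returned $(p,q)$ is the midpoint of the points and every residual $\cos(\theta)(x_k - p) + \sin(\theta)(y_k - q)$ vanishes up to rounding.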
We conclude that the two straight lines through the midpoint of the cloud, the one with the best fit as well as the one with the worst fit, together form an orthogonal coordinate system. This is the same as the standard coordinate system formed by the principal axes of inertia of the point cloud.
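This conclusion can be checked numerically. The Python sketch below uses hypothetical helper names (`central_moments`, `rotated_moments`) and made-up sample data, and it assumes the weights are normalized to sum to one. It expresses the second-order moments in axes rotated to a stationary angle: the cross moment then vanishes, and the two diagonal moments are the largest and the smallest value of the objective.

```python
import math

def central_moments(points, weights):
    """Midpoint and second-order moments about it (hypothetical helper)."""
    total = sum(weights)
    w = [wk / total for wk in weights]  # assumption: normalize the weights
    p = sum(wk * x for wk, (x, _) in zip(w, points))
    q = sum(wk * y for wk, (_, y) in zip(w, points))
    sxx = sum(wk * (x - p) ** 2 for wk, (x, _) in zip(w, points))
    syy = sum(wk * (y - q) ** 2 for wk, (_, y) in zip(w, points))
    sxy = sum(wk * (x - p) * (y - q) for wk, (x, y) in zip(w, points))
    return p, q, sxx, syy, sxy

def rotated_moments(sxx, syy, sxy, theta):
    """Moments in axes u = (cos, sin) and v = (-sin, cos) rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    suu = c * c * sxx + s * s * syy + 2.0 * s * c * sxy  # objective at theta
    svv = s * s * sxx + c * c * syy - 2.0 * s * c * sxy  # objective at theta + pi/2
    suv = (c * c - s * s) * sxy - s * c * (sxx - syy)    # cross moment
    return suu, svv, suv

# Made-up sample data
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
weights = [1.0, 2.0, 1.0, 2.0, 1.0]

p, q, sxx, syy, sxy = central_moments(points, weights)
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # a stationary angle
suu, svv, suv = rotated_moments(sxx, syy, sxy, theta)

# suv is zero up to rounding: the rotated axes are the principal axes
print(suu, svv, suv)
```

With this choice of root, `suu` is the maximum of the objective (worst fit) and `svv` the minimum (best fit); their sum equals $\sigma_{xx} + \sigma_{yy}$, as expected for a rotation.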