Correlation and explained variation

Recall that the correlation coefficient $r$ is always between $-1$ and $1$.

Write $SSE=\sum (y-\hat y)^2$ (the sum of squared errors, i.e. the
error remaining after regression) and $SSTOT=\sum
(y-\bar y)^2$ (the total sum of squares, i.e. the variation before regression).

{\bf Definition:} The explained variation is

$$R^2=1-\frac{SSE}{SSTOT}$$

{\bf Note:} $$R^2=1-\frac{\sum (y-\hat y)^2}{\sum (y-\bar y)^2}=\ldots=r^2$$
{\bf Details:}
Recall the correlation $r$, given by
$$
r=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}
       {\sqrt{\sum_{i=1}^{n} (x_i-\bar x)^2
              \sum_{i=1}^{n} (y_i-\bar y)^2}},
$$
which is always between $-1$ and $1$.
The correlation is a useful concept, but one must note that
$r$ has no simple and direct interpretation other than the rather vague
``it measures how close the $x$ and $y$ data are to lying on a straight
line''.
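The formula above can be checked numerically. A minimal sketch, using hypothetical illustrative data (not from the text) and NumPy:

```python
import numpy as np

# Hypothetical illustrative data (not from the text)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Correlation r computed directly from the formula above
num = np.sum((x - x.mean()) * (y - y.mean()))
den = np.sqrt(np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2))
r = num / den
print(r)  # agrees with np.corrcoef(x, y)[0, 1]
```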

Consider therefore the sum of squared errors, i.e. the squared
deviations of the observations from the fitted line:
$$SSE=\sum_i (y_i-\hat y_i)^2 .$$

It is natural to compare this sum of squared errors
to the sum of squares which is obtained if no relationship is
assumed between $y$ and $x$. This latter, total, sum of squares is
denoted $SSTOT$ and computed with:

$$SSTOT=\sum_i (y_i-\bar y)^2 .$$

Note that $SSE$ is the variation which is still unexplained after a
linear relationship has been assumed, but $SSTOT$ is the variation to
begin with, i.e. the total variation in the $y$-data.
It is now reasonable to define the proportion of variation which
remains unexplained, $SSE/SSTOT$, and hence the explained variation,
$1-SSE/SSTOT$.
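As a small numerical sketch of these two sums of squares, again with hypothetical data and an ordinary least-squares line fitted with NumPy:

```python
import numpy as np

# Hypothetical illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Ordinary least-squares line: y_hat = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

SSE = np.sum((y - y_hat)**2)       # variation unexplained after regression
SSTOT = np.sum((y - y.mean())**2)  # total variation in the y-data
R2 = 1 - SSE / SSTOT
print(SSE, SSTOT, R2)
```

Note that $SSE \le SSTOT$ always holds for a least-squares fit, so $R^2$ lies between $0$ and $1$.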

{\bf Definition:} The explained variation is

$$R^2=1-\frac{SSE}{SSTOT}$$

It must be noted that this is the same concept as before, since
$$R^2=1-\frac{\sum (y-\hat y)^2}{\sum (y-\bar y)^2}=\ldots=r^2 .$$
We thus see that although $r$ has no simple direct interpretation,
$R^2$ has a natural interpretation and is therefore considerably more
useful.
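The identity $R^2=r^2$ (which holds for simple linear regression with an intercept) can be verified numerically; a sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data; the identity holds for any simple linear regression
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit and explained variation
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
R2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)

# Correlation coefficient
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(R2, r**2))  # True
```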