
Thursday, April 26, 2018

What is Learning?

To put it simply, learning is the process by which an entity gains the ability to predict outcomes in an unknown domain by experiencing or observing patterns in a known domain of data. To learn is to study the features of a given sample of data and arrive at an inductive generalization, an extrapolation of those characteristics to the entire population of which the sample is merely a part. For example, suppose you observe that the sun rises in the east and sets in the west, today, tomorrow, and every day for a year. If, by the end of the year, you predict that the sun will again rise in the east and set in the west, and correctly so, then you have learnt.

To an empirical scientist, this is pretty much what is known as "fitting the data", with the added caveat that the fit should generalize correctly. For example, suppose the scientist collects the heights and weights of a random sample of people and tries to fit the data in order to hypothesize a law relating the two. A hypothesis (the red curve) might look as follows.
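To make the fit concrete, here is a minimal sketch of such a least-squares linear fit in Python; the weights and heights below are made-up illustrative numbers, not real measurements:

```python
import numpy as np

# Hypothetical sample: weights (kg) and heights (cm) of five people.
# These values are invented for illustration only.
weights = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
heights = np.array([155.0, 163.0, 170.0, 176.0, 183.0])

# Fit height = a * weight + b by least squares (degree-1 polynomial).
a, b = np.polyfit(weights, heights, deg=1)

def hypothesis(w):
    """The learned linear hypothesis: predicted height for weight w."""
    return a * w + b
```

The red curve in Fig. 1 is exactly such a hypothesis: the line that minimizes the squared vertical distances to the sample points.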

Fig.1 Linear Regression from a sample of people

If this hypothesis is correct, then the height of a person grows linearly with their weight. Clearly, from experience, this is not true: one cannot grow indefinitely tall. The sample of people from which our scientist tried to learn is not a faithful representation of the entire population. Perhaps the actual response function (the blue curve) which our scientist tried to target looks as follows.

Fig.2 In-sample data not able to generalize to out-of-sample behaviour
Therefore, learning has not occurred in this example.
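We can reproduce this failure in a toy sketch. Assume, purely for illustration, that the true relation saturates like the blue curve; a linear hypothesis fitted on a narrow in-sample range of weights matches the data well there, yet its prediction far out of sample drifts badly:

```python
import numpy as np

def true_height(w):
    """Hypothetical saturating target: heights level off near 190 cm."""
    return 190.0 - 80.0 * np.exp(-w / 40.0)

# In-sample: a narrow range of weights where the curve looks almost linear.
w_in = np.linspace(50.0, 80.0, 20)
a, b = np.polyfit(w_in, true_height(w_in), deg=1)

def g(w):
    """The learned linear hypothesis."""
    return a * w + b

# The fit is good where we trained it...
in_sample_err = np.max(np.abs(g(w_in) - true_height(w_in)))

# ...but far outside the training range the line keeps rising
# while the true curve saturates, so the error blows up.
out_of_sample_err = abs(g(200.0) - true_height(200.0))
```

The in-sample error is a couple of centimetres at most, while the out-of-sample error at a weight of 200 kg is tens of centimetres: fitting the data is not the same as learning.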

Allow me to abstract out what we formally mean by learning from the above examples. Heuristically, we understand that in order for any learning to take place, we need two important ingredients.
  1. There is a large data set of which a sample is given to us to study.
  2. There is a pattern in the data. In other words, there is an unknown target function which we seek to discover from the given data.
Having said that, it is not a great leap to write down the formal definition. We define learning as the collection of structures described below.
  1. A population $\mathcal X$, containing inputs $x\in \mathcal X$, along with a probability distribution $\mathcal P(\mathcal X)$.
  2. A target space $\mathcal Y$ containing the outputs $y \in \mathcal Y$.
  3. A target function $f: \mathcal X \to \mathcal Y$, taking $x\mapsto y=f(x)$, to be discovered.
  4. A data set $(S, f(S))$, called the training examples or training data, consisting of a sample $S \subseteq \mathcal X$ accessible to us together with the desired outputs $f(S)$. Concretely, it is a finite set $\{(x_1,y_1), \dots, (x_N,y_N)\}$ where $y_1=f(x_1), \dots, y_N=f(x_N)$.
  5. A set of hypotheses $\mathcal H(\mathcal X, \mathcal Y)$, which is a subset of all functions from $\mathcal X$ to $\mathcal Y$, describing a particular model of the characteristics of the population.
  6. A learning algorithm $\mathcal A$, which uses the data set $(S, f(S))$ to select a final hypothesis $g \in \mathcal H$ as the best possible description of $f$, namely that $g \approx f$.
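To see the six ingredients side by side, here is a toy sketch in Python; the uniform distribution, the threshold target, and all the names below are assumptions made for illustration, not part of the formal definition:

```python
import random

random.seed(0)

# 1. Population X = [0, 1] with probability distribution P = uniform.
def sample_x():
    return random.random()

# 2. Target space Y = {-1, +1}.
# 3. Target function f (we pretend not to know it): an illustrative
#    threshold at 0.6.
def f(x):
    return 1 if x > 0.6 else -1

# 4. Data set (S, f(S)): N training examples (x_n, y_n) with y_n = f(x_n).
N = 100
S = [sample_x() for _ in range(N)]
data = [(x, f(x)) for x in S]

# 5. Hypothesis set H: all threshold functions h_t(x) = sign(x - t).
def h(t, x):
    return 1 if x > t else -1

# 6. Learning algorithm A: pick the threshold whose hypothesis makes
#    the fewest mistakes on the training data.
def A(data):
    candidates = [i / 100 for i in range(101)]
    def errors(t):
        return sum(1 for x, y in data if h(t, x) != y)
    return min(candidates, key=errors)

t_best = A(data)
g = lambda x: h(t_best, x)   # the final hypothesis g ≈ f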
Fig.3 The Learning Diagram (Source: http://work.caltech.edu/slides/slides02.pdf)

Let us digest the above definition for now. For an example illustrating the workings of this definition, please visit my companion post on a linear classification example. In the next post, we will discuss what it means for $g$ to be approximately equal to $f$, and whether that is possible at all.

UPCOMING: Is Learning Feasible?
