To get into any subject, there is a lot of jargon to wade through. I love simplicity, but precise terminology makes things much easier. In this post I’m going to establish basic definitions for these terms, which are necessary for deep learning:
- machine learning
- supervised learning
- cost function
- function optimization
- local minimum vs. global minimum
Everyone who reads this post has some idea of learning. What does it mean for a machine to learn?
In the machine learning context, to learn means:
to create the ability to predict from data.
The method used to make predictions from the data, learned or otherwise, is called a model. When a machine learns a model, then machine learning has happened!
A model can be thought of as a function, a math function. If you know about functions, think of the model as a function F where F(data) = prediction.
An example of data, prediction, model:
Suppose we have a month’s worth of weather data for Redwood City, California, and we want to predict the weather based on this data. The data says that it rained on 3 of the 30 days. So it rained 10% of the time.
Here is a very simple model: based on these 30 days of data, the probability of rain is 10%. This is a pretty low probability, so I predict no rain for all days. In my function language: the learning process based on historical data gave me a function F:
Learning Process (30 days of data: 27 no rain, 3 rain) = F = no rain
To elaborate on the function F:
F(any time) = no rain
If the weather continues to be the same, my model, the “no rain” model, has an accuracy of 90%. Impressive, isn’t it?
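To make the function language concrete, here is a minimal Python sketch of the “no rain” model. The names `history` and `learn`, and the rule “predict whichever outcome was more common”, are my own assumptions for illustration; the 30-day counts come from the example above.

```python
# Observed data from the example: 30 days, 3 of them rainy.
history = ["rain"] * 3 + ["no rain"] * 27

def learn(days):
    """'Learning process': compute the rain frequency and return a model F."""
    rain_rate = days.count("rain") / len(days)
    # Assumption for this sketch: predict whichever outcome was more common.
    prediction = "rain" if rain_rate > 0.5 else "no rain"

    def F(any_time):
        # The model ignores its input and always gives the same answer.
        return prediction

    return F

F = learn(history)

# Accuracy if future weather looks like the history.
correct = sum(F(day_index) == actual for day_index, actual in enumerate(history))
accuracy = correct / len(history)
print(F("tomorrow"), accuracy)
```

Note that `F` never looks at its argument: it is right on the 27 dry days and wrong on the 3 rainy ones, which is where the 90% comes from.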
Despite the “no rain” model’s high accuracy (especially in times of drought), can we point out limitations of this model? We will come back to this question. Deciding what to do with your model after you’ve computed it is really a good chunk of the work of a scientist. This model might be obviously bad, but many models suffer from exactly the problem that this model does. Care must be taken!
Notice that very little learning took place to create this model. One calculation, the ratio of rainy days to all days, is used as the basis of all predictions. We can maybe say that the “learning” method used in this weather example is just extrapolation, or trending. And the computer didn’t learn it, we did. We decided which formula to use to predict the weather.
How do we get a machine to do the learning? How do we get a computer to build a model?
We give the machine some scaffolding, and program the machine to compute the details – the “walls and floors” of the model.
Since the machine is a computer and computers work with numbers, the scaffolding is a particular class of function, and the computer fits a “best” function of that particular type that matches our observed data.
The method used in almost all of machine learning is this:
- Estimate the cost of a mistake.
- Formulate it with variable parameters: use a template formula for your model function that has variable parameters.
- Minimize that cost with respect to those parameters: find the values of those parameters that give you the least cost, the “best” values of those parameters.
That gives you the “best” function.
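The recipe above can be sketched on the same weather data. This is only an illustration under my own assumptions: the template function is “predict the same rain probability `p` every day”, the cost is the mean squared error between `p` and what actually happened, and the minimization is a brute-force search over a grid of candidate values. None of these choices is the only option.

```python
# Observed outcomes: 1 = rain, 0 = no rain (3 rainy days out of 30).
outcomes = [1] * 3 + [0] * 27

def model(p):
    """Template function: predict the same rain probability p for any day."""
    return lambda any_time: p

def cost(p):
    """Mean squared error between the predicted probability and what happened."""
    return sum((p - y) ** 2 for y in outcomes) / len(outcomes)

# Minimize the cost by brute force over candidate values of the parameter p.
candidates = [i / 100 for i in range(101)]
best_p = min(candidates, key=cost)

# The "best" function of this particular type.
best_F = model(best_p)
print(best_p, cost(best_p))
```

Here the machine, not us, picks the parameter: the search lands on `p = 0.1`, the same 10% we computed by hand. Real problems have many parameters and use smarter minimization than a grid search, but the three steps are the same.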
We will discuss this in more detail in another post.