Machine learning, as Arthur Samuel put it, is a field of study that gives computers the ability to learn without being explicitly programmed. In classical programming, you give the computer rules and data and it gives you the answer; in machine learning, you give the computer data and the answers and it works out the rules. Simple addition illustrates the difference. To compute 2 + 3 in classical programming, you would give the computer the data (the numbers 2 and 3) and the rule (add the numbers together), and the computer would produce the result 5. In machine learning, you would instead give the computer the data (the numbers 2 and 3) and the answer (5), and it would work out the rule, provided that it has seen enough examples to conclude reliably that the numbers are added each time.
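The addition example can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own rather than a standard recipe: the rule is assumed to have the form w1*a + w2*b, the names `examples` and `learn_rule` are hypothetical, and the weights are fitted by plain gradient descent. The program is never told to add; it only sees pairs of numbers and their sums.

```python
# Each example is (data, answer). The rule "add the numbers" is never stated.
examples = [((a, b), a + b) for a in range(6) for b in range(6)]


def learn_rule(examples, steps=2000, learning_rate=0.01):
    """Fit weights w1, w2 so that w1*a + w2*b matches the given answers."""
    w1, w2 = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad1 = grad2 = 0.0
        for (a, b), answer in examples:
            error = (w1 * a + w2 * b) - answer
            grad1 += error * a
            grad2 += error * b
        # Move each weight a small step in the direction that reduces the error.
        w1 -= learning_rate * grad1 / n
        w2 -= learning_rate * grad2 / n
    return w1, w2


w1, w2 = learn_rule(examples)
prediction = w1 * 2 + w2 * 3  # apply the learned rule to the numbers 2 and 3
print(w1, w2, prediction)
```

Both weights should end up very close to 1, which is the addition rule, so the prediction for 2 and 3 comes out very close to 5.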
An example of machine learning is image classification. If you are trying to get a computer to distinguish accurately between pictures of cats and dogs, you would give it a large, varied data set of pictures of both and tell it which pictures contain dogs and which contain cats. The hope is that, eventually, you could give it a new picture and it would correctly tell whether it shows a cat or a dog by associating certain features with each group. For example, the algorithm might learn that cats have a specific ear shape or that dogs’ paws are of a specific colour. This is also why the data set must be varied. If it contained only black dogs and only ginger cats, the computer might learn to rely on colour and classify a black cat as a dog. When distinguishing between cats and dogs, this doesn’t seem like too big a deal. However, if we were using the algorithm to recognise people and the data set lacked variety, for example too few women or people of colour, then the algorithm would be biased against these groups, and this could have serious, even fatal, consequences.
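The effect of a narrow data set can be shown with a toy classifier. The sketch below is only an illustration. The two features (fur darkness and ear pointiness, each scored from 0 to 1) and all the numbers are invented, and the nearest-centroid method is simply one of the simplest classifiers available. Real image classifiers work on pixels and are far more complex.

```python
def centroid(points):
    """Average each feature across a list of feature vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def train(labelled_examples):
    """Return one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in labelled_examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Pick the label whose centroid is closest to the new example."""
    def squared_distance(c):
        return sum((x - y) ** 2 for x, y in zip(features, c))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical features: (fur_darkness, ear_pointiness).
# Narrow set: only dark dogs and only light (ginger) cats.
narrow = [((0.9, 0.3), "dog"), ((0.95, 0.2), "dog"),
          ((0.3, 0.9), "cat"), ((0.35, 0.8), "cat")]
# Varied set: adds light dogs and dark cats.
varied = narrow + [((0.2, 0.25), "dog"), ((0.3, 0.3), "dog"),
                   ((0.9, 0.85), "cat"), ((0.95, 0.9), "cat")]

black_cat = (0.95, 0.7)  # dark fur, fairly pointed ears
narrow_result = classify(train(narrow), black_cat)
varied_result = classify(train(varied), black_cat)
print(narrow_result, varied_result)
```

Trained on the narrow set, the classifier leans on fur colour and calls the black cat a dog. Trained on the varied set, colour stops being a reliable signal and the same animal is classified as a cat.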
Having explained what machine learning is and how it differs from classical programming, I will now explain the different types of machine learning and how they work. First, there is supervised learning. The aim of supervised learning is to predict a target variable from predictor variables. An example is an algorithm that predicts house prices (the target variable) based on the location, square footage and condition of the house (the predictor variables). In this instance, the algorithm would be given a data set of many different houses, with their locations, square footage and condition, amongst other things, along with the prices of these houses. Based on this information, it would predict the price of a new house. The two sub-categories of supervised learning are classification and regression. In classification, the target variable is categorical (like dog vs cat), whilst in regression, the target variable is continuous (like house prices).
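The house price example is a regression problem, and a stripped-down version might look like the sketch below. It makes several simplifying assumptions: only one predictor (square footage) is used, the prices are invented and lie exactly on a straight line, and the model is ordinary least squares. A realistic model would also use location and condition and would have to cope with noisy data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: predictor (square footage) and target (price).
square_feet = [800, 1000, 1200, 1500, 2000]
prices = [170_000, 200_000, 230_000, 275_000, 350_000]

slope, intercept = fit_line(square_feet, prices)

# Predict the price of a new house that was not in the data set.
predicted_price = slope * 1300 + intercept
print(round(predicted_price))
```

Because the invented prices follow price = 150 * square footage + 50,000 exactly, the fitted line recovers that relationship and predicts roughly 245,000 for a house of 1,300 square feet.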
The second type of machine learning is unsupervised learning. The difference is that supervised learning uses a labelled data set whilst unsupervised learning uses an unlabelled one. The aim of unsupervised learning is to discover underlying patterns in the data. For example, an unsupervised algorithm might be given images of cats and dogs and told to separate them into two groups. It would group similar images together, but it would not name the groups: nothing in the output says which group is ‘cats’ and which is ‘dogs’.
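Separating unlabelled data into two groups can be sketched with k-means clustering, one common unsupervised method. The example is a simplified illustration: the points are invented two-feature vectors rather than images, and the starting centroids are simply the first two points, which keeps the run deterministic but is not how k-means is usually initialised.

```python
def k_means(points, k=2, iterations=10):
    """Group points into k clusters; returns a list of k groups of points."""
    centroids = [list(p) for p in points[:k]]  # simplistic start: first k points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        # Assign each point to its nearest centroid.
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(v) / len(group) for v in zip(*group)]
    return groups


# Hypothetical unlabelled data: no "cat" or "dog" tags anywhere.
points = [(0.9, 0.3), (0.3, 0.9), (0.95, 0.2),
          (0.35, 0.8), (0.85, 0.25), (0.25, 0.85)]

groups = k_means(points)
for index, group in enumerate(groups):
    print("group", index, group)
```

The output is two numbered groups of similar points. Deciding what each group represents is left to a person looking at the result.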
The third type of machine learning is reinforcement learning. The aim of reinforcement learning is to learn a series of actions that maximises a reward. The machine is not given a predefined data set, so it has to collect its own data by acting and observing the feedback it receives. For example, if an algorithm were to learn to distinguish between cats and dogs using reinforcement learning, it would do so through feedback: if it were given an image of a dog and classified it as a cat, we would give it negative feedback, and over many such attempts it would learn to classify the images correctly.
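To show what learning a series of actions from feedback can look like, the sketch below swaps the cat and dog example for a different toy problem of my own choosing. An agent stands in a corridor of five positions and is rewarded only when it reaches the last one. It uses tabular Q-learning, one well-known reinforcement learning method, and for simplicity it explores by picking actions at random. The corridor, the reward of 1 and the parameter values are all assumptions made for illustration.

```python
import random

N_STATES = 5        # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)  # action 0 steps left, action 1 steps right


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error; returns q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)  # explore by choosing at random
            # Walls at both ends: the agent cannot leave the corridor.
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Q-learning update: nudge the estimate towards the reward
            # plus the discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = train()
# The learned policy: the preferred action in each non-goal position.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

No data set is supplied here: every number in `q` comes from the agent's own attempts. After training, the preferred action in each of the four non-goal positions is to step right, which is the series of actions that leads to the reward.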
There are platforms on which you can set up end-to-end machine learning training pipelines. For simpler use cases, you can also use out-of-the-box services such as IBM Watson and Google Vision AI. These still require input data and output data, but you don’t need to code the backend of the algorithm yourself. This is an easier way to ‘code’ an algorithm: once it has been trained, you can give it new data and see whether it classifies the data correctly.
Machine learning is one of the most innovative and interesting fields at the moment.