
Introduction to Machine Learning

Machine Learning

Machine learning isn't new; it has been around at least since the 1970s, when the first related algorithms appeared. What has changed is that the explosion in computing power has allowed us to use machine learning to tackle ever-more-complex problems, while the explosion of data being captured and stored has allowed us to apply it to an ever-expanding range of domains.

The general idea behind most machine learning is that a computer learns to perform a task by studying a training set of examples. The computer (or system of distributed or embedded computers and controllers) then performs the same task with data it hasn't encountered before.

Learning Strategies

Machine learning employs two main strategies: supervised learning and unsupervised learning.

Supervised Learning

In supervised learning, the training set contains data and the correct output of the task with that data. This is like giving a student a set of problems and their solutions and telling the student to figure out how to solve similar problems he or she will encounter in the future.

Supervised learning includes classification algorithms, which take as input a dataset and the class of each piece of data so that the computer can learn how to classify new data. For example, the input might be a set of past loan applications with an indication of which of them went bad. On the basis of this information, the computer classifies new loan applications. Classification can employ logistic regression, classification trees, support vector machines, random forests, artificial neural networks (ANNs), or other algorithms.
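
For concreteness, here is a minimal classification sketch in Python, assuming the scikit-learn library is available; the loan-application features, labels, and a new application to classify are invented purely for illustration.

    # Train a logistic-regression classifier on labeled loan applications
    # (hypothetical data), then classify a new, unseen application.
    from sklearn.linear_model import LogisticRegression

    # Each row: [income (thousands), loan amount (thousands), years employed]
    X = [[45, 10, 2], [80, 25, 8], [30, 20, 1], [95, 15, 12],
         [25, 30, 0], [60, 12, 5], [40, 35, 2], [110, 20, 15]]
    y = [0, 0, 1, 0, 1, 0, 1, 0]   # 1 = loan went bad, 0 = loan repaid

    clf = LogisticRegression()
    clf.fit(X, y)                      # learn from the labeled training set
    print(clf.predict([[50, 28, 1]]))  # classify a new application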

Regression algorithms predict the value of an entity's attribute (regression here has a broader sense than merely statistical regression). Regression algorithms include linear regression, decision trees, Bayesian networks, fuzzy classification, and ANNs.
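
The following regression sketch, again assuming scikit-learn is available, predicts a numeric attribute from a single input; the floor areas and prices are invented.

    # Fit a linear-regression model and predict a value for an unseen input.
    from sklearn.linear_model import LinearRegression

    X = [[50], [70], [100], [120], [160]]   # floor area in square meters
    y = [150, 200, 280, 330, 430]           # price in thousands

    reg = LinearRegression()
    reg.fit(X, y)
    print(reg.predict([[90]]))              # predicted price for an unseen floor area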

Classification                Regression
Logistic regression           Linear regression
Classification trees          Decision trees
Support vector machines       Bayesian networks
Random forests                Fuzzy classification
Artificial neural networks    Artificial neural networks

Unsupervised Learning

In unsupervised learning, the training set contains data but no solutions; the computer must find the solutions on its own. This is like giving a student a set of patterns and asking him or her to figure out the underlying motifs that generated the patterns.

Unsupervised learning includes clustering algorithms, which take as input a dataset covering various dimensions and partition it into clusters satisfying certain criteria. A popular algorithm is k-means clustering, which aims to partition the dataset so that each observation lies closest to the mean of its cluster. Other clustering approaches include hierarchical clustering, Gaussian mixture models, genetic algorithms (in which the computer learns the best way to perform a task through a process of artificial selection), and ANNs.
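
A minimal k-means sketch follows, assuming scikit-learn; the two-dimensional points are invented so that they form two obvious groups. Note that no labels are supplied: the algorithm discovers the grouping on its own.

    # Partition unlabeled points into two clusters with k-means.
    from sklearn.cluster import KMeans

    X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # first loose group
         [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]   # second loose group

    km = KMeans(n_clusters=2, n_init=10, random_state=0)
    km.fit(X)
    print(km.labels_)           # cluster assigned to each observation
    print(km.cluster_centers_)  # the means the observations lie closest to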

Dimensionality reduction algorithms take the initial dataset covering various dimensions and project the data onto fewer dimensions that aim to capture the data's fundamental aspects better. Dimensionality reduction algorithms include principal component analysis, tensor decomposition, multidimensional statistics, random projection, and ANNs.
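
As a sketch of dimensionality reduction, the following example applies principal component analysis, assuming scikit-learn and NumPy are available; the synthetic data have a third dimension that is nearly a combination of the first two, so two components capture most of the variance.

    # Project 3-D observations onto the 2 directions of greatest variance.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))         # 100 observations in 3 dimensions
    X[:, 2] = X[:, 0] + 0.1 * X[:, 1]     # make the third dimension nearly redundant

    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)      # the same observations in 2 dimensions
    print(pca.explained_variance_ratio_)  # variance captured by each component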

Clustering                    Dimensionality reduction
k-means clustering            Principal component analysis
Hierarchical clustering       Tensor decomposition
Gaussian mixture models       Multidimensional statistics
Genetic algorithms            Random projection
Artificial neural networks    Artificial neural networks
