Machine Learning

From Rice Wiki

Rule-based systems follow a set of rules predefined by experts to automate the decision-making process. Because the rules must cover every scenario in advance, this approach is not sufficient for complex systems. Machine learning (ML) instead builds models from data to identify and predict patterns, make decisions, and automate processes.

Flow

Unstructured data, such as images and sentences, cannot be understood by machines without algorithms to process it. Structured data is machine-readable.

First, preprocessing (such as data cleaning and sampling) is done to make the data useful.
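As a rough illustration of this step, the sketch below cleans a small table and then samples from it. The helper names `clean` and `sample`, the toy rows, and the choice to drop (rather than fill in) incomplete rows are all assumptions made for the example, not a prescribed method.

```python
import random


def clean(rows):
    """Drop rows with missing values and exact duplicates (first copy is kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue  # assumption: incomplete rows are discarded, not imputed
        if row in seen:
            continue  # skip exact duplicates
        seen.add(row)
        cleaned.append(row)
    return cleaned


def sample(rows, k, seed=0):
    """Draw up to k rows without replacement, using a fixed seed for repeatability."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical raw data: one duplicate row and two rows with missing values.
raw = [(1.0, 2.0), (1.0, 2.0), (None, 3.0), (4.0, 5.0), (6.0, None), (7.0, 8.0)]
cleaned = clean(raw)
subset = sample(cleaned, k=2)
print(cleaned)
print(subset)
```

Whether to drop or fill in missing values depends on the dataset; dropping is simply the shortest option to show here.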

Then, exploratory data analysis (EDA), such as data visualization, allows us to understand the data and determine which types of algorithms to employ.
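Besides plots, a quick numerical summary of each feature is a common first look at the data. The sketch below is one minimal way to do that with the standard library; the function name `summarize` and the sample column are hypothetical.

```python
import statistics


def summarize(column):
    """Return basic descriptive statistics for one numeric feature."""
    return {
        "count": len(column),
        "mean": statistics.mean(column),
        "stdev": statistics.stdev(column),  # sample standard deviation
        "min": min(column),
        "max": max(column),
    }


# Hypothetical feature column.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
summary = summarize(heights)
print(summary)
```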

Third, feature selection keeps only the important features to reduce overfitting and improve the accuracy of the model.
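There are many ways to decide which features are important. One of the simplest is a variance filter, which discards features that barely change across observations. The sketch below assumes that technique; `select_by_variance`, its threshold, and the toy rows are illustrative only, and this filter does not look at the target variable at all.

```python
import statistics


def select_by_variance(rows, threshold=0.0):
    """Keep the feature columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [
        index
        for index, column in enumerate(columns)
        if statistics.pvariance(column) > threshold
    ]
    reduced = [tuple(row[index] for index in keep) for row in rows]
    return keep, reduced


# Hypothetical observations: the middle feature is constant and carries no information.
rows = [(1.0, 5.0, 0.2), (2.0, 5.0, 0.9), (3.0, 5.0, 0.4)]
keep, reduced = select_by_variance(rows)
print(keep)
print(reduced)
```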

Constructing ML Models

A dataset is used to train ML models.

The training dataset is the data used to train a model. It contains a set of observations, each composed of a set of features. In a regression or classification model, training yields a hypothesis function that predicts the target variable y from those features.

The test dataset is the data used to test the performance of the trained model.
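To make the roles of the two datasets concrete, the sketch below fits a hypothesis function on training data and measures its error on held-out test data. It assumes a single feature and ordinary least squares as the model; `fit_line`, `mean_squared_error`, and the synthetic data are hypothetical names and values chosen for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by least squares and return it as a function h."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(h, xs, ys):
    """Average squared difference between predictions h(x) and targets y."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic observations (feature x, target y) generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

# Simple split for illustration; real splits are usually shuffled first.
train, test = data[:8], data[8:]

h = fit_line([x for x, _ in train], [y for _, y in train])
test_mse = mean_squared_error(h, [x for x, _ in test], [y for _, y in test])
print(h(10.0), test_mse)
```

The test error is computed only on observations the model never saw during fitting, which is what makes it an estimate of performance on unknown data.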

Classifications

Machine learning models are classified by the way they learn and by the output they produce.

Supervised learning is the process by which a model's output is checked against the expected output in a known (labeled) dataset, in the hope that the model will produce a good approximation on unknown data.

Unsupervised learning is the process by which a model attempts to find clusters or patterns in data without known outputs. For example, it is used by Amazon to find trends.

Classification produces a categorical output (a label).

Regression predicts a value.
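The difference between the two output types can be seen by running the same simple method both ways. The sketch below assumes k-nearest neighbours on a single feature, which is a technique chosen here only for illustration: for classification the neighbours vote on a label, and for regression their values are averaged. All function names and data are hypothetical.

```python
from collections import Counter


def nearest(train, x, k):
    """Return the k training pairs whose feature value is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_classify(train, x, k=3):
    """Classification: majority vote over the neighbours' labels."""
    labels = [label for _, label in nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression: mean of the neighbours' numeric targets."""
    values = [value for _, value in nearest(train, x, k)]
    return sum(values) / len(values)


# Hypothetical training data: same features, categorical vs numeric targets.
labelled = [(1.0, "small"), (2.0, "small"), (3.0, "small"),
            (8.0, "large"), (9.0, "large"), (10.0, "large")]
valued = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0),
          (8.0, 80.0), (9.0, 90.0), (10.0, 100.0)]

label = knn_classify(labelled, 2.5)  # a category
value = knn_regress(valued, 2.5)     # a number
print(label, value)
```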

Clustering finds groups of similar observations within a dataset.
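As a minimal sketch of clustering, the code below runs k-means on one-dimensional points. K-means is only one of several clustering algorithms and is assumed here for illustration; `kmeans_1d`, the starting centers, and the points are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            # Assign the point to the index of its closest center.
            index = min(range(len(centers)), key=lambda i: abs(point - centers[i]))
            clusters[index].append(point)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied at any point, which is what makes this an unsupervised method; the result can depend on the starting centers.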