Machine Learning

From Rice Wiki
Revision as of 18:43, 1 April 2024 by Rice (talk | contribs)

Rule-based systems follows a set of pre-defined rules defined by experts to cover all scenarios to automate the decision making process. This is not sufficient for complex systems. Machine learning (ML) builds models to identify and predict patterns, make decisions, and automate processes.

Flow

Unstructured: Images, sentences

Unstructured data is not understood by machines without some algorithms to process it. Structured data is machine-readable.

First, preprocessing (such as data cleaning and sampling) is done to make the data useful.

Then, exploratory data analysis (EDA), such as data visualization, allows us to understand the data and determine what types of algorithm to employ.

Thirdly, feature selection selects the important feature to reduce overfitting and improve accuracy of the model.

Constructing ML Models

The training dataset is the data used for training a model, containing a set of observations composed of a set of features. In a regression/classification model, we gain a hypothesis function such that it predicts the target variable y.

The test dataset is the data used to test the performance of the trained model.