Revision as of 18:43, 1 April 2024

Rule-based systems follow a set of predefined rules, written by experts to cover every anticipated scenario, to automate the decision-making process. This approach is not sufficient for complex systems, where the scenarios cannot all be enumerated in advance. Machine learning (ML) instead builds models from data to identify and predict patterns, make decisions, and automate processes.

Flow

Unstructured: Images, sentences

Machines cannot interpret unstructured data without algorithms that first process it. Structured data, by contrast, is machine-readable.
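As a rough illustration of turning unstructured input into structured input, the sketch below converts a sentence into a fixed-length vector of word counts (a bag-of-words encoding). The function name `bag_of_words`, the vocabulary, and the whitespace tokenization are assumptions made for this example, not part of the original notes.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Turn a free-text sentence into a fixed-length count vector."""
    # Lowercase and split on whitespace; real tokenizers are more careful.
    tokens = sentence.lower().split()
    counts = Counter(tokens)
    # One entry per vocabulary word, always in the same order.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary chosen for the example.
vocabulary = ["cat", "dog", "sat", "the"]
vector = bag_of_words("The cat sat on the mat", vocabulary)
print(vector)  # [1, 0, 1, 2]
```

Every sentence now maps to a row with the same columns, which is the tabular form most ML algorithms expect.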

First, preprocessing (such as data cleaning and sampling) is done to make the data useful.
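A minimal sketch of the two preprocessing steps named above, assuming that "cleaning" means dropping rows with missing values and "sampling" means drawing a uniform random subset. The column names and the helpers `clean` and `sample` are hypothetical.

```python
import random


def clean(rows):
    """Drop every row that contains a missing value (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def sample(rows, k, seed=0):
    """Draw k rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, k)


# Made-up records for illustration only.
raw = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": None, "weight_kg": 80.0},   # missing height
    {"height_cm": 182.0, "weight_kg": 90.0},
    {"height_cm": 158.0, "weight_kg": None},  # missing weight
    {"height_cm": 175.0, "weight_kg": 72.0},
]

cleaned = clean(raw)
subset = sample(cleaned, k=2)
print(len(cleaned), len(subset))  # 3 2
```

Dropping rows is only one cleaning strategy. Filling missing values with a mean or median is a common alternative when data is scarce.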

Then, exploratory data analysis (EDA), such as data visualization, allows us to understand the data and determine which types of algorithm to employ.
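EDA can start with a few summary statistics and a crude plot. The sketch below uses only the Python standard library and prints a text histogram in place of a real chart. The `ages` values and the helper names are invented for illustration.

```python
import statistics


def summarize(values):
    """Collect basic descriptive statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, e.g. '20-29 | ###'."""
    bins = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return [
        f"{start}-{start + bin_width - 1} | {'#' * bins[start]}"
        for start in sorted(bins)
    ]


ages = [21, 25, 29, 33, 34, 38, 41, 57]  # hypothetical column
summary = summarize(ages)
print(summary["min"], summary["max"], summary["mean"])  # 21 57 34.75
for line in text_histogram(ages, bin_width=10):
    print(line)
```

A skewed or multi-peaked histogram is the kind of observation that might steer the choice of model or suggest a transformation of the column.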

Third, feature selection keeps only the important features, which reduces overfitting and improves the accuracy of the model.
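One simple way to select features is a filter method: score each feature by the absolute Pearson correlation with the target and keep the top k. This is only one of several possible approaches and is used here as an assumption, since the notes do not name a specific method. The feature names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, k):
    """Keep the k feature names most correlated (in magnitude) with target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical housing data: price is exactly 3 * size_m2.
features = {
    "size_m2": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [3, 1, 4, 1, 5],
}
price = [150, 180, 240, 300, 360]

selected = select_features(features, price, k=2)
print(selected)  # ['size_m2', 'rooms']
```

Correlation only captures linear relationships, so a feature with a strong nonlinear effect on the target could be discarded by this filter.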

Constructing ML Models

The training dataset is the data used to train a model. It contains a set of observations, each composed of a set of features. In a regression or classification model, training yields a hypothesis function $y = g(x)$ that predicts the target variable $y$ from the features $x$.
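To make the hypothesis function concrete, the sketch below fits $g(x) = wx + b$ to a tiny training set with ordinary least squares. Linear regression on a single feature is an assumption chosen for brevity. The notes do not commit to a particular model, and the training values are invented.

```python
def fit_linear(xs, ys):
    """Fit g(x) = w * x + b by ordinary least squares and return g."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for a single feature.
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - w * mean_x

    def g(x):
        return w * x + b

    return g


# Hypothetical training observations generated from y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

g = fit_linear(train_x, train_y)
print(g(5.0))  # 11.0
```

Here `g` is the learned hypothesis: it was never shown $x = 5$, yet it predicts a target for it from the pattern in the training observations.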

The test dataset is the data used to test the performance of the trained model.
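A common way to obtain a test dataset is to hold out part of the available data before training and then score the model on it. The sketch below shows a random split and a mean squared error score. The 25% test fraction, the helper names, and the stand-in model `g` are assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(model, rows):
    """Average squared difference between predictions and true targets."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Hypothetical (feature, target) pairs generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
train, test = train_test_split(data)


def g(x):
    # Stand-in for a model that would normally be fitted on `train` only.
    return 2.0 * x + 1.0


print(len(train), len(test))        # 6 2
print(mean_squared_error(g, test))  # 0.0
```

Keeping the test rows out of training is what makes the score an estimate of performance on unseen data rather than a measure of memorization.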

@@ Line 8: / Line 8: @@
 First, ''preprocessing'' (such as data cleaning and sampling) is done to make the data useful.
-Then, ''exploratory data analysis (EDA)'' allows us to understand the data and determine what types of algorithm to employ.
+Then, ''exploratory data analysis (EDA),'' such as data visualization, allows us to understand the data and determine what types of algorithm to employ.
+Thirdly, ''feature selection'' selects the important feature to reduce overfitting and improve accuracy of the model.
+= Constructing ML Models =
+The '''training dataset''' is the data used for training a model, containing a set of observations composed of a set of features. In a regression/classification model, we gain a hypothesis function <math>y = g(x)</math> such that it predicts the target variable ''y''.
+The '''test dataset''' is the data used to test the performance of the trained model.
 [[Category:Computer Science]]

Machine Learning: Difference between revisions
