Dataset: Difference between revisions

From Rice Wiki
(Created page with "In machine learning, a model operates on a '''dataset'''. = Attributes of a dataset = The '''completeness''' of a dataset is the extent to which it contains all relevant '''features''' necessary for a given task. A dataset needs to have a sufficient number of observations, measured by the '''size''' of the dataset. The '''validity''' of the dataset is how accurate, clean, and relevant the data in the dataset is. A dataset can be '''high dimensional''', meaning that i...")
 
Line 1: Line 1:
In machine learning, a model operates on a '''dataset'''.
In machine learning, a model operates on a '''dataset'''.


= Attributes of a dataset =
= Performance attributes =
Several attributes determine how good a dataset is for a problem.
 
The '''completeness''' of a dataset is the extent to which it contains all relevant '''features''' necessary for a given task.
The '''completeness''' of a dataset is the extent to which it contains all relevant '''features''' necessary for a given task.


Line 7: Line 9:


The '''validity''' of the dataset is how accurate, clean, and relevant the data in the dataset is.
The '''validity''' of the dataset is how accurate, clean, and relevant the data in the dataset is.
= Usage attributes =
Some attributes of the dataset determines the way we use them.


A dataset can be '''high dimensional''', meaning that it has very high number of features, which can make calculations difficult.
A dataset can be '''high dimensional''', meaning that it has very high number of features, which can make calculations difficult.
The '''linearity''' of a dataset is a big factor in determining what model to use.


[[Category:Machine Learning]]
[[Category:Machine Learning]]

Revision as of 06:29, 26 April 2024

In machine learning, a model operates on a dataset.

Performance attributes

Several attributes determine how good a dataset is for a problem.

The completeness of a dataset is the extent to which it contains all relevant features necessary for a given task.

A dataset needs to have a sufficient number of observations, measured by the size of the dataset.

The validity of the dataset is how accurate, clean, and relevant the data in the dataset is.

Usage attributes

Some attributes of the dataset determines the way we use them.

A dataset can be high dimensional, meaning that it has very high number of features, which can make calculations difficult.

The linearity of a dataset is a big factor in determining what model to use.