Dataset
Latest revision as of 06:37, 26 April 2024
In machine learning, a model operates on a dataset.
Performance attributes
Several attributes determine how well suited a dataset is to a problem.
The completeness of a dataset is the extent to which it contains all relevant features necessary for a given task.
A dataset needs to have a sufficient number of observations, measured by the size of the dataset.
The validity of the dataset is how accurate, clean, and relevant the data in the dataset is.
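As a rough illustration of the attributes above, the following sketch reports the size of a small dataset, which required features are absent, and how many observations contain empty values. The `summarize` helper, the list-of-dicts layout, and the feature names (`age`, `income`, `region`) are assumptions made for this example only; real validity checks depend on the task.

```python
def summarize(rows, required_features):
    """Report basic quality attributes of a dataset given as a list of dicts."""
    # Features that appear in at least one observation.
    present = set().union(*(row.keys() for row in rows))
    # Completeness: required features that the dataset does not contain at all.
    missing_features = sorted(set(required_features) - present)
    # Validity (one narrow aspect): observations with an empty value
    # in a required feature that the dataset does contain.
    checked = [f for f in required_features if f in present]
    incomplete_rows = sum(
        1 for row in rows if any(row.get(f) is None for f in checked)
    )
    return {
        "size": len(rows),  # number of observations
        "missing_features": missing_features,
        "incomplete_rows": incomplete_rows,
    }


# Hypothetical data for illustration.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": 61000},
]
report = summarize(rows, required_features=["age", "income", "region"])
print(report)
```

Here `region` is reported as a missing feature and one observation is flagged as incomplete because its `age` value is empty.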
Usage attributes
Some attributes of a dataset determine how we use it.
A dataset can be high dimensional, meaning that it has a very large number of features, which can make calculations difficult.
The linearity of a dataset, that is, whether the relationships in the data are approximately linear, is a big factor in determining what model to use.
Outliers are samples that lie at an abnormal distance from other samples. They can reduce the accuracy of the model.
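One common way to make "abnormal distance" concrete is the interquartile range (IQR) rule, sketched below for a single numeric feature. The rule and the multiplier `k = 1.5` are a conventional choice assumed here, not something this article prescribes, and the `iqr_outliers` helper and sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Quartiles of the sample; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical samples: most values are close together, one is far away.
samples = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(samples)
print(outliers)
```

For multi-feature datasets, other definitions of distance may be more appropriate; this sketch only covers the one-dimensional case.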
The skewness of a dataset measures how asymmetric its distribution is, and its sign indicates the side on which the longer tail, and often the outliers, lie. It also impacts model accuracy.
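Skewness can be estimated for a single numeric feature with the moment coefficient of skewness, $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the second and third central moments. The sketch below assumes this particular estimator (others exist); the `skewness` helper and the sample values are illustrative only.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3, 4, 5])        # zero: no asymmetry
right_tail = skewness([1, 1, 1, 2, 10])      # positive: long tail toward large values
left_tail = skewness([-10, -2, -1, -1, -1])  # negative: long tail toward small values
print(symmetric, right_tail, left_tail)
```

A positive result suggests that extreme values sit mostly above the bulk of the data, a negative one that they sit below it. The function is undefined when all values are identical, since the second moment is then zero.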