Outlier

From Rice Wiki
Revision as of 06:48, 26 April 2024 by Rice (talk | contribs)

Outliers are samples in a dataset that lie at an abnormal distance from the other samples. They can distort a model fitted to the data and reduce its accuracy.

Detection

Outliers are usually detected during exploratory data analysis. Several detection methods are listed below.

  • Background knowledge, such as flagging impossible values like a negative age
  • Visualization, such as a scatter plot
  • Data analysis, such as a box plot
  • ML algorithms, such as One-Class SVM
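
As an illustration of the background-knowledge approach, the sketch below flags records whose age is impossible. The record layout, the field name "age", and the plausible range of 0 to 120 are assumptions made for this example, not part of any particular dataset.

```python
# Hypothetical records; the "age" field and the 0-120 range are assumptions.
records = [{"age": 34}, {"age": -5}, {"age": 61}, {"age": 230}]


def violates_age_rule(record, low=0, high=120):
    """Return True when the age falls outside the assumed plausible range."""
    return not (low <= record["age"] <= high)


# Keep only the records that break the domain rule.
suspect = [r for r in records if violates_age_rule(r)]
print(suspect)
```

Rules like this only catch values that are impossible by definition; unusual but valid values still need one of the other methods.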

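Numerically, a common rule (the one used for box plot whiskers) defines outliers as values more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3), where IQR = Q3 - Q1 is the interquartile range.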
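
A minimal sketch of the 1.5 × IQR rule follows, assuming the common convention that the cutoffs are measured from the first and third quartiles (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR). The function name iqr_outliers, the parameter k, and the sample values are made up for illustration. Quartile estimates differ slightly between methods; this example uses the "inclusive" method of Python's statistics.quantiles.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative sample with one obviously extreme value.
sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(sample)
print(outliers)
```

The multiplier 1.5 is a convention rather than a law; a larger k flags only more extreme values.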