Outlier

From Rice Wiki

Outliers are samples in a dataset that lie abnormally far from the other samples. They can distort summary statistics and reduce the accuracy of a model fit to the data.

Detection

Outliers are typically detected during exploratory data analysis. Common detection methods include:

  • Background knowledge, e.g. impossible values such as a negative age
  • Visualization, e.g. a scatter plot
  • Data analysis, e.g. a box plot
  • ML algorithms, e.g. One-Class SVM

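Numerically, a common rule (the one used by box plots) flags a value as an outlier if it lies more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3), where IQR = Q3 - Q1 is the interquartile range. The distance is measured from the quartiles, not from the minimum or maximum.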
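
The 1.5 × IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name iqr_outliers, the parameter k, and the sample data are all made up for this example, and the quartiles are computed with the standard library's "inclusive" method. Other quartile conventions can give slightly different fences on small samples.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside [Q1, Q3].

    Hypothetical helper for illustration; k=1.5 is the usual box-plot factor.
    """
    # Quartiles via linear interpolation over the sorted data ("inclusive").
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Keep only the points that fall outside the fences.
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up sample: seven closely spaced values and one extreme value.
sample = [12, 13, 14, 15, 16, 17, 18, 95]
print(iqr_outliers(sample))
```

For this sample, Q1 = 13.75 and Q3 = 17.25, so IQR = 3.5 and the fences sit at 8.5 and 22.5; only 95 falls outside them. A larger k (3 is a common choice for "extreme" outliers) makes the rule more conservative.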