HELM

From Rice Wiki

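HELM (Holistic Evaluation of Language Models) is a benchmark framework from Stanford's Center for Research on Foundation Models (CRFM) for evaluating large language models (LLMs). It tests each model across many scenarios and scores it on several metrics rather than accuracy alone: calibration, robustness, fairness, bias, toxicity, and efficiency are measured as well, so ethical concerns and potential harms to users are covered alongside raw performance. Its goal is to be a standardized, holistic benchmark for language models.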
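The scenario-by-metric idea can be illustrated with a small sketch. This is not HELM's actual code or API. Every name here (`evaluate`, `toy_model`, `exact_match`, `flagged_rate`, the blocklist, and the sample scenarios) is hypothetical and chosen only for illustration. The point is simply that every scenario is scored on every metric, which yields a grid of results instead of a single number.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch (not part of HELM).
Example = Tuple[str, str]            # (prompt, reference answer)
Model = Callable[[str], str]         # maps a prompt to a completion
Metric = Callable[[str, str], float] # scores (prediction, reference)

# Toy blocklist standing in for a real toxicity classifier (assumption).
BLOCKLIST = {"idiot"}


def exact_match(prediction: str, reference: str) -> float:
    """Accuracy-style metric: 1.0 if the strings match, ignoring case and spacing."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def flagged_rate(prediction: str, reference: str) -> float:
    """Toy harm-style metric: 1.0 if the prediction contains a blocklisted word."""
    words = set(prediction.lower().split())
    return 1.0 if words & BLOCKLIST else 0.0


def evaluate(
    model: Model,
    scenarios: Dict[str, List[Example]],
    metrics: Dict[str, Metric],
) -> Dict[str, Dict[str, float]]:
    """Score the model on every (scenario, metric) pair and average per scenario."""
    results: Dict[str, Dict[str, float]] = {}
    for scenario_name, examples in scenarios.items():
        predictions = [model(prompt) for prompt, _ in examples]
        results[scenario_name] = {
            metric_name: sum(
                metric(pred, ref) for pred, (_, ref) in zip(predictions, examples)
            ) / len(examples)
            for metric_name, metric in metrics.items()
        }
    return results


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: a fixed lookup table of answers."""
    answers = {"2+2=": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


scenarios = {
    "arithmetic": [("2+2=", "4"), ("3+3=", "6")],
    "geography": [("Capital of France?", "Paris")],
}
metrics = {"exact_match": exact_match, "flagged_rate": flagged_rate}

results = evaluate(toy_model, scenarios, metrics)
for scenario_name, scores in results.items():
    print(scenario_name, scores)
```

Reporting the full grid, rather than collapsing it into one score, is what lets a reader see trade-offs, for example a model that is accurate in a scenario but scores poorly on a harm-related metric there.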