AI safety evaluation checklist
Safety evaluation asks how an AI system fails, how quickly the team can detect it, and what controls prevent user harm.
Failure modes
List unsafe outputs, over-reliance, automation bias, misuse, security attacks, and edge cases.
Evaluation
Run scenario tests, adversarial prompts or inputs, human review audits, and regression suites.
Response
Define severity, shutdown criteria, user notification, remediation, and post-incident review.