AI Safety Research

Benchmarks, evaluations, and empirical studies that probe how models behave in the gray zone.

By raxIT Labs

GrayZoneBench

An open-source safety benchmark for prompts where the right answer is neither a refusal nor full compliance: the gray zone, where real deployment decisions happen.

AI Safety Research
AI Safety
Benchmark
LLM Evaluation

By raxIT Labs

nla-audit

The first credible LLM explainability primitive we have seen in three years, wired into a prompt-engineering loop. It reads what an open-weight model is computing at each token, so you can stop when output and thought disagree.

AI Safety Research
LLM Explainability
Mechanistic Interpretability
Prompt Engineering