AI Safety Research
Benchmarks, evaluations, and empirical studies that probe how models behave in the gray zone.

By raxIT Labs
GrayZoneBench
An open-source safety benchmark for prompts where the right answer is neither a refusal nor full compliance. This gray zone is where real deployment decisions happen.
AI Safety Research
AI Safety
Benchmark
LLM Evaluation
By raxIT Labs
nla-audit
Wires what we consider the first credible LLM explainability primitive in three years into a prompt-engineering loop. It reads what an open-weight model is computing at each token, so you can stop when the model's output and its internal computation disagree.
AI Safety Research
LLM Explainability
Mechanistic Interpretability
Prompt Engineering