AI Safety Research

Benchmarks, evaluations, and empirical studies that probe how models behave in the gray zone.

By raxIT Labs

August 24, 2025

GrayZoneBench

An open-source safety benchmark for prompts where the right answer is neither a refusal nor full compliance: the gray zone, where real deployment decisions happen.

AI Safety Research

AI Safety

Benchmark

LLM Evaluation

By raxIT Labs

May 12, 2026

nla-audit

The first credible LLM explainability primitive we have seen in three years, wired into a prompt-engineering loop. It reads what an open-weight model is computing at each token, so you can stop when the model's output and its internal reasoning disagree.

AI Safety Research

LLM Explainability

Mechanistic Interpretability

Prompt Engineering