A single unauthenticated connection gives attackers a full shell; credential theft observed in under three minutes on honeypot servers.
Prompt quality is often judged subjectively. This project provides a repeatable simulation pipeline with transparent scoring, configurable weighting, and benchmark workflows that can be run from both ...