AI evaluation specialist and design engineer working at the seam between frontier models and physical systems.
As a frontier model evaluator across OpenAI, Alphabet, Hugging Face, and Microsoft, I specialise in adversarial red-teaming, failure-mode taxonomy, and rubric design for reasoning and long-form tasks. A quarter of a million completed evaluations, with 98%+ approval sustained across four platforms.
As a design engineer, I work on physical vapour deposition systems - designing sputtering equipment, vacuum components, and the surrounding mechanical hardware - with self-taught fluency in plasma physics, sputter deposition, and ultra-high vacuum engineering.
The overlap - evaluating AI on physical, technical, and engineering tasks where domain fluency is uncommon - is the work I do best.
Adversarial prompting, rubric design, failure-mode analysis. Specialism in engineering and physical-reasoning evaluation where most evaluators lack domain fluency.
Mechanical design and full-lifecycle development of vacuum and PVD systems. Cathodic arc, magnetron source architecture, sputter deposition.
Two literary manuscripts in progress. Essays on AI evaluation methodology and industrial design now live on Substack.