Cong's Notes

Cong Peng, Stockholm, Sweden

I’m an Engineer / Architect / Builder. For more details about me, see About.

This is a reboot of writing my notes and thoughts online after a pause of several years.

Latest Notes

The Post-Training Dilemma: Safety Alignment vs Benchmark Score

22 Jun, 2026

Anthropic has long positioned itself at the frontier of AI safety. But its models showed many concerning behaviours in Andon Labs's Vending Bench and Arena, including lying, coordinating a price cartel, and exploiting others' desperate situations. Balancing safety alignment, instruction-following, and model capability is challenging, so how are the labs doing on this? If slightly less safety alignment can lead to higher benchmark scores, the labs have an incentive to do less of it. What will they choose to do?
Frontier Model Benchmarks Need Maintenance, Audits, and Rolling Updates

1 Jun, 2026

HLE shows why frontier model scores need to be read alongside audits, verified answer sets, and maintenance, especially when questions sit at the edge of expert knowledge.
ARC-AGI-3 shows there is still a huge gap between frontier models and humans on agentic intelligence

16 May, 2026

GPT-5.5 and Opus 4.7 scored below 1% on ARC-AGI-3, against a human baseline of 100%. The benchmark requires models to explore and learn in novel game-style environments.
Incomparable SWE-bench Pro Scores

27 Apr, 2026

Claude Opus 4.7 and GPT-5.5 SWE-bench Pro scores are useful directional signals, but their reported margins are not directly comparable.