Notes
Short posts, notes, and ideas.
-
Incomparable SWE-bench Pro Scores
The SWE-bench Pro scores reported for Claude Opus 4.7 and GPT-5.5 are useful directional signals, but their reported margins are not directly comparable.
-
Make Yourself an Outlier
The only way to avoid being replaced by AI is to make yourself an outlier. LLMs and agents can easily make you average, but they won't make you excellent, because excellence is, by definition, an outlier.
-
About Scaling of Sleep-time Compute for Agents
You probably often hear about scaling laws: pre-training scaling, post-training scaling, or test-time scaling. You may not often hear about sleep-time scaling. It's worth a few minutes to talk about what it is and how it scales along context/memory volume, latency, and proactiveness.
-
Human-Written AGENTS.md Still Wins
LLM-generated repo context files can be redundant and can even hurt coding-agent performance, while human-written ones are more likely to close the real context gap.
-
Context Learning Is Still Harder Than It Looks
CL-bench shows that frontier models still struggle to genuinely learn and apply novel knowledge that was not part of pre-training.
-
Semantic Layer as Context Infrastructure for Agents
A semantic layer can turn scattered business definitions and local knowledge into explicit context that both humans and agents can use reliably.
-
The Missing Layer in Agent Systems
Explicit context is only part of the problem. The harder part is capturing the reasoning that links information, decisions, and actions.