Notes
Short posts, notes, and ideas.
-
Incomparable SWE-bench Pro Scores
The SWE-bench Pro scores reported for Claude Opus 4.7 and GPT-5.5 are useful directional signals, but their reported margins are not directly comparable.
-
Make Yourself an Outlier
The only way to avoid being replaced by AI is to make yourself an outlier. LLMs and agents can easily make you average, but they won't make you excellent, because excellence is, by definition, an outlier.
-
About Scaling of Sleep-time Compute for Agents
You probably often hear about scaling laws: pre-training scaling, post-training scaling, or test-time scaling. You may not often hear about sleep-time scaling. It's worth a few minutes to talk about what it is and how it scales along context/memory volume, latency, and proactiveness.
-
Human-Written AGENTS.md Still Wins
LLM-generated repo context files can be redundant and can even hurt coding-agent performance, while human-written ones are more likely to close the real context gap.
-
Context Learning Is Still Harder Than It Looks
CL-bench shows that frontier models still struggle to genuinely learn and apply novel knowledge that was not part of pre-training.
-
Semantic Layer as Context Infrastructure for Agents
A semantic layer can turn scattered business definitions and local knowledge into explicit context that both humans and agents can use reliably.
-
The Missing Layer in Agent Systems
Explicit context is only part of the problem. The harder part is capturing the reasoning that links information, decisions, and actions.