I’m an Engineer / Architect / Builder. For more details about me, see About.
This is a reboot of writing my notes and thoughts online after a pause of several years.
Latest Notes
-
Frontier Model Benchmarks Need Maintenance, Audits, and Rolling Updates
HLE shows why frontier model scores need to be read alongside audits, verified answer sets, and maintenance, especially when questions sit at the edge of expert knowledge.
-
ARC-AGI-3 shows there is still a huge gap between frontier models and humans on agentic intelligence
GPT-5.5 and Opus 4.7 scored below 1% on ARC-AGI-3, while the human baseline is 100%. The benchmark requires models to explore and learn in novel game-style environments.
-
Incomparable SWE-bench Pro Scores
Claude Opus 4.7 and GPT-5.5 SWE-bench Pro scores are useful directional signals, but their reported margins are not directly comparable.
-
Make Yourself an Outlier
The only way to avoid being replaced by AI is to make yourself an outlier. LLMs and agents can easily bring you up to average, but they won't make you excellent, because excellence is, by definition, an outlier.
Recent Posts
-
Beyond Answers, Below Autonomy: How Proactive AI Agents Offload Humans Without Overstepping
Proactive agents can relieve humans of the glue work between insight and execution: they watch your systems, gather context, and turn signals into decision-ready options and actions. The key is staying below autonomy, because implicit context and accountability still sit with humans.
-
Using BERT to perform Topic Tag Prediction for Technical Articles
Updated: Experiments using BERT Mini embeddings and linear SVM for multilabel tag prediction on LinkedInfo articles.
-
A Walk Through of the IEEE-CIS Fraud Detection Challenge
Walkthrough of the IEEE-CIS fraud detection challenge with feature analysis and model experiments.
-
Skin Lesion Image Classification with Deep Convolutional Neural Networks
Updated: Deep CNN experiments (DenseNet/ResNet) on HAM10000 for skin lesion image classification.