Instead of using static position increments (+1) per token, RoPE-based language models can learn per-token and per-layer position increments. This has no detectable effect on model performance but allows us to see what the model thinks the distance is between each position and how this varies per-layer.
I’m working on an experiment comparing the internal representations of two architectures when solving a sequential algorithm, but training models to use a sequential algorithm is surprisingly hard. The optimization landscape makes it easier for models to learn parallel algorithms or memorize lookup tables, so I needed to make some specific architectural and training decisions to get models to actually learn the sequential algorithm. Even with all of these tricks, the results are seed-dependent and I needed to inspect the resulting models to prove that they did or didn’t learn the expected algorithm.
In this post, I’ll document what did and didn’t work, and the techniques I used to prove whether or not the model learned a sequential algorithm.
I was inspired by Turntrout to optimize my website more, and two changes took page load times from “fast enough” to “effectively instant: Switching from CloudFront to Bunny CDN to optimize CDN cache misses, and efficiently navigating with Micromorph.
I’ve been working on two fairly large vibe-coded apps, and my process has converged on:
Write a GitHub issue
(If complicated enough) tell an agent to make a plan and then update the issue
Have another agent read the issue and implement it
As the features get more complicated, I spend more and more time on step (1), and I’m finding that just taking the time to write a detailed enough issue is 90% of the work (and if I have a problem, going back and writing a much more detailed issue usually fixes it). The thing I realized this morning is that writing these issues and working through the plans is very similar to participating in a system design interview: You don’t need to implement anything, but you do need to have a good high-level design, and think through all of the edge cases and tradeoffs.