GPT-5.5 scores nearly twice as high as Opus 4.7 on the Senior Engineer benchmark

Every.to (a media outlet for AI builders and operators) ran GPT-5.5 through their tests right after release — this is OpenAI's second major drop in a week, following Images 2.0.

– The senior-coding gap. On the new Senior Engineer benchmark — where the model has to rewrite raw production code the way an experienced developer would — GPT-5.5 scores 62.5 points. For comparison, Opus 4.7 gets only 33.5. There's still ground to cover before reaching human level (real engineers score around 80–90), but here's the funny part: GPT-5.5 hit its absolute peak when working from an architecture plan *written by Opus* ¯_(ツ)_/¯

– OpenAI can write again. This is their best writing model in the past year, finally with clean structure and logically coherent prose.

– Routine work got more reliable. The model beats Opus 4.7 on dashboard builds, client reports, and boilerplate support responses.

– It loves structure. GPT-5.5 shines when it has a clear plan, an existing system, or a tight feedback loop. For pure vibes-based coding from scratch, PowerPoint decks, Ruby, or abstract product design, Opus 4.7 still holds the lead.

If Opus is good for broad creative strokes, GPT-5.5 is a confident mid-to-senior dev you can hand a legacy codebase to and go grab a coffee.