Agentic Engineering

How I Build Apps with an AI That Lives on My Ubuntu Server

Alexander Gomez · 2026 · v2.1

The Numbers Are In

90%
of code at Anthropic is written by AI

30%
of Google's new code is AI-generated

25%
of YC's Winter 2025 batch had codebases 95% AI-generated

These are not predictions. These are facts. From the last 12 months.

Anthropic/Amodei · Google/Pichai · YC/TechCrunch

The Scale of the Shift

Shopify · E-commerce, ~10% of US online retail
CEO Tobi Lutke: "Prove why AI can't do it before requesting a new hire"
Klarna · Swedish fintech, "buy now pay later"
5,000 → 3,000 employees. Remaining devs got a 60% raise.
Goldman Sachs · Investment bank, founded 1869
Piloting thousands of autonomous AI coders alongside 12,000 humans

9.6 → 2.4
Average PR time in days — GitHub Copilot study

The question is: how deep should we go?

Shopify/CNBC · Klarna/CNBC · Goldman Sachs/CNBC · PR time/Opsera

Now, you might think — okay, tech companies, of course they're doing this. But it's not just tech companies anymore.

Shopify. Tobi Lutke, their CEO, sent an internal memo: before you request a new hire, you need to prove why AI can't do the job instead. That's the new bar.

Klarna went from 5,000 employees to 3,000. And the remaining developers? They got a 60% raise, because they're now doing the work of two.

And then there's Goldman Sachs. Founded in 1869. Not exactly a Silicon Valley startup. They're now piloting thousands of autonomous AI coders working alongside 12,000 human developers.

And maybe the most practical number: average pull request time dropped from 9.6 days to 2.4 days. That's not hype — that's your daily workflow getting four times faster.

So the tools are here. The question is — how deep do you go?

But Let's Be Honest

METR study: experienced devs were 19% slower with AI — but believed they were faster
Stack Overflow: only 29% trust AI accuracy (down from 40%)
66% spend MORE time fixing "almost-right" AI code

Kent Beck — Creator of TDD & Extreme Programming, Agile Manifesto signatory

"90% of my skills dropped to $0. The leverage for the remaining 10% went up 1000x. I need to recalibrate."

So which is it? Faster or slower?
It depends on how you use it.

METR study · Stack Overflow 2025 · Kent Beck/Substack

Now — before you think I'm some kind of AI fanboy who's going to tell you everything is amazing... let's be honest.

There's a study by METR — a respected AI evaluation research group. They gave experienced developers AI tools and measured what happened. Those developers were 19% slower. Not faster. Slower.

And here's the kicker: going in, they expected to be about 24% faster, and even afterwards they still believed they had been about 20% faster. They felt faster while being slower.

Stack Overflow's 2025 survey: only 29% trust AI accuracy, down from 40%. And 66% say they spend more time fixing 'almost right' AI code.

Kent Beck — the guy who invented Test-Driven Development, who signed the Agile Manifesto — he said: '90% of my skills dropped to zero. The leverage for the remaining 10% went up 1000x. I need to recalibrate.'

So which is it? Is AI making us faster or slower? The answer is: it depends on how you use it.

Vibe Coding vs. Agentic Engineering

Andrej Karpathy — OpenAI co-founder, ex-Tesla AI Director

"Fully give in to the vibes, embrace exponentials, and forget that the code even exists."

Coined "vibe coding" · Feb 2025 · Collins Word of the Year 2025

Two camps: Vibe coding (no review, prototypes) vs. Agentic engineering (AI does the work, you review and orchestrate)

I'm firmly in the second camp. And I'll show you why.

Vibe coding/Wikipedia · Collins WotY 2025

What is Claude Code?

Not a chatbot. Not an IDE plugin. An autonomous engineering agent.

Lives in your terminal (or VS Code / JetBrains)
Full codebase awareness — reads, searches, understands context
Edits files, runs commands, commits to git, triggers CI/CD
Plan mode, subagents, MCP servers, hooks

Think of it as: a junior dev that never sleeps, reads every file, follows instructions exactly… and never asks for a raise.

The Three Levels

Level	How	Who Drives	Example
1. Chat	Copy-paste prompts, manual integration	You type everything	ChatGPT, Claude.ai
2. IDE	AI in your editor, inline suggestions	You + AI side by side	Cursor, Copilot
3. Agentic	AI on your server, autonomous execution	AI executes, you review	Claude Code, Codex

Level 1 is talking to a tutor. Level 2 is pair programming.
Level 3 is having a developer on your team that you manage.

Where are most of you?

I think about AI-assisted development in three levels.

Level 1: Chat. You open ChatGPT or Claude in a browser, describe your problem, get code back, copy-paste it. You are the driver.

Level 2: IDE integration. Cursor, GitHub Copilot. The AI is inside your editor, giving inline suggestions. You and the AI work side by side. This is where most of you probably are.

Level 3: Agentic. The AI is on your server. It reads the codebase, makes a plan, edits files, runs tests, commits, and deploys. You review and approve.

Level 1 is talking to a tutor. Level 2 is pair programming. Level 3 is having a developer on your team that you manage.

And by the way, that METR study? The one where developers were 19% slower? It mostly tested Level 2 tools, primarily Cursor. My argument is that Level 3 is a fundamentally different experience.

Quick show of hands — where are most of you right now?

How I Work

I don't vibe code — I orchestrate. Plan mode: Claude proposes, I review, THEN it executes
Claude Code on my VPS via SSH — commits → GitHub Actions → auto-deploys
CLAUDE.md as persistent memory — Claude remembers my server, my conventions, my rules
I work from anywhere — phone, laptop, any terminal

Armin Ronacher — Creator of Flask & Jinja2

"Review every line, shape the architecture, and carry responsibility."

My AI doesn't just suggest code. It ships code.

Ronacher/lucumr.pocoo.org

So what does my workflow actually look like?

First: plan mode. I describe what I want. Claude reads the codebase, proposes a step-by-step plan. I read the plan, say 'yes' or 'change this part.' Only then does it write code.
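
As a rough sketch of that propose-review-execute gate: this is not Claude Code's actual internals, and `Plan` and `run_plan` are made-up names for illustration. The only point it makes is that nothing runs until a human has approved the plan.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    """A proposed list of steps; nothing runs until it is approved."""
    steps: List[str]
    approved: bool = False


def run_plan(plan: Plan, execute: Callable[[str], None]) -> List[str]:
    """Execute each step in order, but only if a human approved the plan."""
    if not plan.approved:
        raise PermissionError("plan has not been approved yet")
    done = []
    for step in plan.steps:
        execute(step)
        done.append(step)
    return done


# Hypothetical usage: the agent proposes, the human reviews, then it runs.
plan = Plan(steps=["read codebase", "edit handler", "run tests"])
log: List[str] = []
plan.approved = True  # the human said "yes"
run_plan(plan, log.append)
```

Calling `run_plan` on an unapproved plan raises `PermissionError`, which is the whole idea: the review step is enforced rather than optional.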

Claude Code runs on my VPS. When it commits and pushes, GitHub Actions picks that up and auto-deploys. The cycle is: I talk to Claude, it writes code, it commits, it deploys. All from the terminal.

I have a file called CLAUDE.md — a memory file. It knows my Docker setup, my routing rules, my conventions. It persists across sessions. When I start a new conversation, Claude already knows my entire server.

Armin Ronacher — the guy who created Flask — said it best: 'Review every line, shape the architecture, and carry responsibility.' That's exactly how I work.

My AI doesn't just suggest code. It ships code.

It also doesn't complain about code reviews.

EMX

"Turn your moments into cinematic soundtracks"

Upload a photo → AI analyzes the scene → generates lyrics, music, album art, video
Powered by: Gemini (vision + composition), Kie AI (music), Kling (video)
Python/FastAPI backend + Node.js BFF + single-file HTML frontend — all in Docker
Real-time SSE events drive the progressive UI

Photo → Vision → Compose → Image + Music (parallel) → Video
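
The pipeline above can be sketched in a few lines of asyncio. This is a minimal illustration, not the EMX source: every stage function is a stub with a placeholder return value, and the event names are assumptions. It shows the two ideas from the slide: image and music run in parallel, and each finished stage is emitted as a Server-Sent Events message.

```python
import asyncio
import json
from typing import AsyncIterator, List


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events message (event name + JSON payload)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


# Stub stages: placeholders standing in for the real model calls.
async def vision(photo: bytes) -> dict:
    return {"scene": "placeholder scene"}


async def compose(scene: dict) -> dict:
    return {"lyrics": "placeholder lyrics"}


async def make_image(song: dict) -> str:
    return "cover.png"


async def make_music(song: dict) -> str:
    return "track.mp3"


async def make_video(image: str, music: str) -> str:
    return "clip.mp4"


async def pipeline(photo: bytes) -> AsyncIterator[str]:
    """Run the stages in order, yielding an SSE message after each one."""
    scene = await vision(photo)
    yield sse("vision", scene)
    song = await compose(scene)
    yield sse("compose", song)
    # Image and music do not depend on each other, so run them concurrently.
    image, music = await asyncio.gather(make_image(song), make_music(song))
    yield sse("image", {"file": image})
    yield sse("music", {"file": music})
    video = await make_video(image, music)
    yield sse("video", {"file": video})


async def collect(photo: bytes) -> List[str]:
    """Drain the pipeline into a list (a browser would consume it as a stream)."""
    return [msg async for msg in pipeline(photo)]


events = asyncio.run(collect(b"fake-photo-bytes"))
```

In a FastAPI app, a generator like `pipeline` could be handed to `StreamingResponse` with `media_type="text/event-stream"`; whether EMX does exactly that is an assumption.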

LIVE DEMO

EMX

emx.firebots.cloud

Emotion Pad — built with Claude Code
Upload a photo, watch face detection move the dot
Walk through the pipeline as it happens
Album cover, music, video — all generated live

Vibe Deployer

"Describe an app → AI builds it → deployed in 60 seconds"

Chat interface → your prompt hits an n8n workflow → Claude API writes the app
WebSocket relay pushes real-time progress back to the browser
Auto-deployed to apps.firebots.cloud/{slug}

Four containers. One prompt. Deployed app.
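
One small piece of this, turning an app name into the `{slug}` part of the URL, might look like the sketch below. How Vibe Deployer actually derives its slugs is not stated here, so `slugify`, `app_url`, the length limit, and the `https` scheme are all assumptions for illustration.

```python
import re

BASE_URL = "https://apps.firebots.cloud"  # host from the slide; scheme assumed


def slugify(title: str, max_len: int = 40) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a fixed name if nothing usable is left (hypothetical choice).
    return slug[:max_len].rstrip("-") or "app"


def app_url(title: str) -> str:
    """Build the public URL for a deployed app from its title."""
    return f"{BASE_URL}/{slugify(title)}"


print(app_url("My Todo List!"))
```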

LIVE DEMO

Vibe Deployer

vibe.firebots.cloud

Type a prompt — something fun
Watch it generate in real-time
Open the deployed app
Show it actually works

The New Developer Workflow

Multiple Claude Code sessions — one per project, all on my VPS
Background agents researching while I work on something else
Git, PRs, deployment — all from the conversation. I don't leave the terminal.
CLAUDE.md as persistent memory — Claude remembers my entire server

Addy Osmani — Director at Google Cloud AI, 14 yrs leading Chrome DevEx

Uses 2+ LLMs in parallel to cross-check approaches — calls it "model musical chairs."
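
A minimal sketch of that cross-checking idea, assuming each model is just a callable that takes a prompt and returns text. The two "models" here are stubs, not real API clients, and `cross_check` is a hypothetical helper, not something from Osmani's write-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def cross_check(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Stub "models": placeholders for real API clients.
models = {
    "model_a": lambda p: f"A says: use a queue for '{p}'",
    "model_b": lambda p: f"B says: use a cron job for '{p}'",
}
answers = cross_check("retry failed uploads", models)
agree = len(set(answers.values())) == 1  # the stubs disagree, so a human decides
```

When the answers disagree, that is the signal to slow down and review the approaches yourself.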

The developer of 2026 is a conductor, not a typist.

Osmani/addyosmani.com

What I Learned Along the Way

Plan mode is non-negotiable
"Measure twice, cut once" applies to AI too
You must understand the code
You're the architect, not the AI
The AI makes mistakes
But so do humans, and AI is faster at fixing them
It's not magic — it's a workflow
The magic is in knowing when to use which level
It changed how I think
I think in systems and architecture, less in syntax

My Challenge to You

You're all experienced developers. You know how to code.
The question isn't whether AI can help you.
The question is: at which level are you going to engage?

Move up one level this month.
Try Claude Code for one week. Use plan mode. Stay in control.
I guarantee you won't go back.

This talk started as a vague idea at 2 AM. My agent turned it into slides, speaker notes, and a deployed website. I mostly supervised from the couch.

Questions?