
The $25 prompt, and optimizing agentic coding for desktop and web

Coding agents are having a moment.

The model gets most of the credit, but the experience lives in the details.

The craft is everything that makes the agent feel dependable—smooth streaming, clear progress, reliable tools, and safe defaults.

That’s the context for this post: a recent Memex Web release where we shipped a bunch of behind-the-scenes improvements that make the agent feel faster, clearer, and more reliable.

How we got here: we started desktop-first

Memex started as a desktop-native coding agent. That choice wasn’t about nostalgia—it was pragmatic. On desktop, you can do real work with fewer compromises: local files, a local terminal, local compute, and a tight feedback loop.

But desktop isn’t “easy mode.” Shipping a real agent means a lot of invisible integration work:

  • Windows / macOS / Linux support

  • Intel + ARM support

  • a one-click install where dependencies just work

  • cross-platform compiled bits that behave consistently everywhere

Getting an agent to feel solid across that matrix is its own kind of craftsmanship—turning “works on my machine” into “works on every machine.”

Then we took the agent to the web (a different kind of hard)

The web flips the problem. The environment isn’t local anymore—we have to provide it.

That means we’re responsible for things the desktop gets “for free”:

  • where code runs (a managed sandbox, not your laptop)

  • security isolation and permissions

  • long-lived streaming connections (terminal output, tool progress)

  • publishing pipelines (so shared apps are real, not screenshots)

  • performance under real network conditions, across browsers and screen sizes

The real challenge: one agent that works well everywhere

Here’s the part that’s harder than shipping “desktop” or “web” in isolation: we’re building one coding agent that behaves consistently across all of those environments.

Same core agent. Same expectations. Same “this feels solid” experience—whether you’re on desktop, in a browser tab, or on a phone.

This is also where the gap between POC and product shows up. You can prototype an agentic app builder quickly. But shipping it as a product means getting the details right, consistently: responsive streaming, legible progress, reliable tool execution, safe defaults, predictable cost, and gracefully handling the edge cases that only appear when real users bring real data.

Below are five changes from this release that are representative of that craft layer. None of them are flashy. All of them matter.

1) Streaming that actually streams

One place the web adds real complexity is streaming. Every extra hop (databases, usage tracking, polling) becomes another opportunity for jitter to leak into the experience.

If the agent is “streaming,” it should feel like streaming—not like a stuttery sequence of half-pauses.

This is one of those places where the implementation details show up immediately in UX. A 50–150ms delay per chunk doesn’t sound dramatic until it happens dozens of times in a single response. Then the whole product feels sluggish, even if the model is fast.

We found a classic mistake: we were doing a bunch of network work while streaming. Things like:

  • saving conversation state (database writes + retries)

  • checking for interrupts (“did the user hit stop?”)

  • recording usage / metering

Individually, each one is reasonable. In aggregate, they add jitter.

What we changed: we treated the streaming loop as sacred.

  • push non-critical work into background tasks

  • replace inline interrupt polling with fast in-memory checks (event-based)

  • keep the streaming path focused on one job: ship the next chunk

The principle is simple: anything that touches the network will jitter eventually. The craft is keeping that jitter from leaking into the user experience—so the agent feels smooth regardless of where it’s running.
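A minimal sketch of that principle in asyncio, assuming hypothetical `send` and `save_state` functions (these names are illustrative, not Memex’s actual API): the hot loop only ships chunks and does a fast in-memory interrupt check, while persistence runs as background tasks that are settled after the stream ends.

```python
import asyncio

async def stream_response(chunks, send, save_state, interrupt: asyncio.Event):
    """Keep the hot path to one job: ship the next chunk.

    `chunks` is an async iterator of model output, `send` pushes a chunk
    to the client, `save_state` persists conversation state. All names
    are illustrative sketches, not a real API.
    """
    background = []
    async for chunk in chunks:
        # Fast in-memory check instead of a network round trip per chunk.
        if interrupt.is_set():
            break
        await send(chunk)
        # Non-critical work (persistence, metering) runs off the hot path.
        background.append(asyncio.create_task(save_state(chunk)))
    # Settle background work after streaming ends so errors still surface.
    await asyncio.gather(*background, return_exceptions=True)
```

The key design choice is that nothing between two `send` calls awaits the network, so database jitter can no longer show up as streaming jitter.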

2) Progress you can trust (even mid-tool)

Another place the web changes the game is tool visibility. When the agent is operating inside a managed runtime, the UI has to do more work to keep the user oriented and confident.

When a coding agent is doing real work, it isn’t just “writing.” It’s:

  • editing files

  • running terminal commands

  • calling tools

  • producing outputs incrementally

And during the long moments, users want one thing: “What’s happening right now?”

Previously, we usually fell back to generic status text like “Working…” or “Executing terminal command…”—especially during file edits and terminal operations. It’s not wrong, but it’s not helpful.

This is how it used to look…

The tricky part is that tool calls often arrive as partial JSON fragments while streaming. You might get half a filepath long before you get something a normal JSON parser accepts. If you wait for “perfectly valid JSON,” you end up waiting too long to show anything meaningful.

What we changed: we leaned into partial information—carefully.

  • backend attempts speculative / partial parsing as the stream arrives

  • as soon as we can infer something stable (“editing x”), we emit it

  • frontend uses a small state machine (thinking vs terminal vs file edit) so the UI shows the right header at the right time

  • large tool outputs render progressively to avoid UI jank

This is a trust multiplier. When people can see what’s happening, they can stop it, correct it early, or just relax and let it run.

…and this is how it looks today.
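The speculative-parsing idea can be sketched in a few lines, assuming a hypothetical `"path"` field in the tool-call payload (the key name and status strings are assumptions, not Memex’s actual schema): scan the partial JSON fragment for a stably completed field, and only upgrade the status text once that field has fully arrived.

```python
import re

# Match a completed "path" field inside a partial JSON fragment.
# Only a fully quoted value matches, so we never show half a filepath.
PATH_RE = re.compile(r'"path"\s*:\s*"([^"]+)"')

def progress_from_fragment(fragment: str) -> str:
    """Infer a meaningful status from a partial tool-call fragment.

    Illustrative sketch: falls back to a generic status until the
    fragment contains enough stable information to say more.
    """
    match = PATH_RE.search(fragment)
    if match:
        return f"Editing {match.group(1)}"
    return "Working…"
```

Being conservative here matters: emitting only completed values means the UI never flashes a garbled half-path, at the cost of showing the generic status a beat longer.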

3) Terminal streaming & publishing without the “proxy tax”

Long-lived streams are a weird fit for a managed runtime if you route them through the wrong layer.

Terminal streaming via SSE can be open for minutes (or longer). If you proxy that stream through a service that has to remain alive, you end up paying—and scaling—for an “idle waiting room.”

We were doing exactly that: proxying terminal output through infrastructure that needed to stay up for the duration of the connection.

What we changed: we connect the browser directly to the sandbox stream.

  • authenticate once

  • mint a read-only token

  • return an HTTP redirect so the browser streams direct-to-sandbox

  • enforce role-based access (VIEWER vs ADMIN) at the route level

This does two important things:

  1. removes a hop (lower latency)

  2. removes the “keep a proxy alive” cost/scaling constraint
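The authenticate-once, redirect-to-sandbox flow might look something like this framework-agnostic sketch (the signing scheme, role names, and sandbox URL format are all assumptions for illustration; a real route would return the redirect via its web framework and load the key from a secret store):

```python
import hashlib
import hmac
import time

SECRET = b"example-signing-key"  # illustrative; load from a secret store in practice

def mint_readonly_token(sandbox_id: str, ttl_s: int = 300) -> str:
    """Sign a short-lived, read-only token the sandbox can verify itself."""
    expires = str(int(time.time()) + ttl_s)
    payload = f"{sandbox_id}:{expires}:viewer"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def redirect_to_stream(role: str, sandbox_id: str):
    """Authenticate once, then hand the browser a direct-to-sandbox URL.

    Returns (status, headers) as a framework-agnostic stand-in for an
    HTTP response. Role names and the URL scheme are hypothetical.
    """
    if role not in ("VIEWER", "ADMIN"):  # role-based access at the route level
        return 403, {}
    token = mint_readonly_token(sandbox_id)
    url = f"https://{sandbox_id}.sandbox.example.com/terminal/stream?token={token}"
    return 302, {"Location": url}
```

After the redirect, the browser holds the SSE connection open against the sandbox itself, so nothing in the middle has to stay alive just to relay bytes.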

Bonus: publishing got faster (and safer)

Publishing is where “coding agent” turns into “real product.” On the web, publishing means we’re packaging and running something in the managed environment for other people to access.

Two changes helped a lot:

  • move dependency installs to build time (so caching actually helps, and cold starts hurt less)

  • let users specify which uploads and secrets are included, instead of implicitly shipping “everything”

That translates to shorter cold starts, smaller deployments, and fewer accidental exposures.

4) Micro-latency wins that compound

When you’re building one agent that runs everywhere, a lot of the pain shows up as small overhead repeated at scale. Tiny delays that are invisible in a single tool call become obvious across a full session.

One tool call doesn’t matter. Ten tool calls kinda matter. Fifty tool calls absolutely matter.

Agent workflows tend to repeat a lot of overhead that looks harmless in isolation:

  • validate a session

  • create a client

  • poll for completion

  • read a file to compute a diff

  • read it again to write history

  • sprinkle in sleeps “just to be safe”

This is how you get “it feels slow” even though nothing is catastrophically broken.

What we changed: we removed a bunch of accidental N+1s and unnecessary waiting.

  • reuse validated contexts instead of recreating them

  • remove sleeps when the protocol already tells us completion status

  • pass already-read file content through the pipeline to avoid rereads

  • fix a reliability edge case: propagate “terminal expired” correctly instead of masking it

This category is pure craft: shaving friction users feel, without sacrificing correctness or safety.
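The “pass already-read content through the pipeline” fix is the easiest to illustrate. A minimal sketch, assuming a hypothetical per-step cache (nothing here is Memex’s actual code): read each file once, then let diffing, history, and display all reuse the same content instead of hitting the disk again.

```python
class FileCache:
    """Read each file at most once per agent step and reuse the content.

    Illustrative sketch: `reader` is any callable that loads a file,
    e.g. `lambda path: open(path).read()`. The `reads` counter exists
    only to make the savings visible in this example.
    """

    def __init__(self, reader):
        self._reader = reader
        self._cache = {}
        self.reads = 0

    def get(self, path: str) -> str:
        if path not in self._cache:
            self._cache[path] = self._reader(path)
            self.reads += 1
        return self._cache[path]
```

Three downstream steps that each used to reread the file now trigger exactly one read, which is precisely the N+1 shape being removed.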

5) The $25 prompt (and why “view file” can be dangerous)

This last one is a reminder that consistency across environments isn’t only about speed—it’s also about safe defaults. On a local machine, “opening a file” feels harmless. In an agent, “opening a file” can turn into context, and context can turn into spend—unless you’re careful.

This one started as a support message from a user. Here’s the story.

A user bought $25 in credits and burned through all of them in a single tool call… just by viewing a file.

Here’s the kicker. The file didn’t even have many lines. It wasn’t a 200-page report. The agent didn’t run for a long time. It just opened a file that happened to be a SQL dump with a huge single-line INSERT.

If you’ve built tool-augmented LLM systems, you can probably guess what happened:

  • long context is enabled

  • the “read file” tool returns a lot of content

  • an odd file shape turns into unbounded tokens

  • everything works as expected… technically. But the user paid a lot of money for no result.

That’s not user error. That’s a product bug.

What we changed: we added guardrails so “view file” can’t surprise you.

  • cap extremely long lines (e.g., 2,000 characters per line)

  • cap total returned content (e.g., ~30k characters)

  • make truncation explicit in the UI so it’s obvious what happened

  • if you truly need everything, force an intentional next step instead of silently doing the expensive thing

The principle is simple: defaults should fail safe, not fail expensive.
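Those guardrails can be sketched in a short function, using the caps mentioned above (the function name, return shape, and truncation markers are illustrative, not Memex’s actual implementation):

```python
MAX_LINE_CHARS = 2_000    # cap per line, per the example above
MAX_TOTAL_CHARS = 30_000  # cap total returned content

def safe_view(text: str) -> tuple[str, bool]:
    """Fail-safe file viewing: cap line length and total size.

    Returns the (possibly truncated) content plus a flag so the UI can
    make truncation explicit instead of silently hiding it.
    """
    truncated = False
    lines = []
    for line in text.splitlines():
        if len(line) > MAX_LINE_CHARS:
            line = line[:MAX_LINE_CHARS] + " …[line truncated]"
            truncated = True
        lines.append(line)
    result = "\n".join(lines)
    if len(result) > MAX_TOTAL_CHARS:
        result = result[:MAX_TOTAL_CHARS] + "\n…[output truncated]"
        truncated = True
    return result, truncated
```

The SQL-dump case above is exactly what the per-line cap catches: one enormous single-line INSERT gets cut at 2,000 characters instead of flooding the context window.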

The through-line: craft is what makes an agent trustworthy

We hope this was an interesting behind-the-curtain look at what it takes to build a single AI coding agent that operates well across desktop, the web, and mobile—where runtimes, constraints, and expectations shift, but the experience still has to feel coherent.

We take craft seriously because we think that’s where a useful product actually starts—and where it earns trust over time: smooth streaming, progress you can follow, tool runs that don’t drag, and defaults that keep you safe from surprises.

If you try the new release and anything feels off (or delightfully smoother), we’d love to hear it. We’re excited for you to put it through real work.

Happy building,