Leading From the Frontlines Means Tightening the Contracts

Most AI writing is still too high-level.

The real work is lower in the stack: execution contexts, environment boundaries, operator surfaces, fallback policy, idempotency, and diagnostics that explain which layer is broken instead of just saying "failed."

Over the last few sessions I kept seeing the same pattern in different forms:

  • a UI action that looked available but did not expose enough state for an operator to trust it
  • a CLI that appeared healthy until it inherited a non-interactive stdin and blew up inside a TUI library
  • an "AI system" problem that turned out to be a configuration-contract problem
  • a toolchain that needed stricter policy at the edges more than it needed a smarter model in the middle

This is what building and leading from the frontlines feels like right now: less "prompt wizardry," more contract engineering.

1. A button is not an operator surface

One of the more useful fixes recently was on an ad-review-and-push workflow. The anti-pattern was familiar: the product had a push button, but the operator could not tell whether the external destination was actually configured, whether the downstream ads would be created safely, or what exact state each job was in after the action.

The important change was not "add an integration." The important change was to expose the actual state model all the way through the UI.

At the action boundary, the system now blocks if the destination is not known:

const resolvedAdAccountId =
  adAccountId ?? import.meta.env.VITE_FACEBOOK_AD_ACCOUNT_ID ?? "unknown";
const destinationKnown = resolvedAdAccountId !== "unknown";
const disabled = approvedNotPushedCount === 0 || !destinationKnown;

return (
  <>
    <Button
      variant="primary"
      disabled={disabled}
      onClick={() => setOpen(true)}
    >
      {destinationKnown
        ? `Push to Facebook (${approvedNotPushedCount})`
        : "Meta destination missing"}
    </Button>
    {!destinationKnown && approvedNotPushedCount > 0 ? (
      <a href="/diagnostics">Open diagnostics</a>
    ) : null}
  </>
);

That is a small interface decision, but it changes the trust model completely. Instead of pretending the system is ready and letting the user discover failure later, it turns missing configuration into an explicit blocked state.

The confirmation step also got more honest:

<p>
  {destinationKnown
    ? "Safety mode: campaign, ad sets, and Meta ads are created paused."
    : "Meta destination is unknown in the frontend environment. Run diagnostics and set VITE_FACEBOOK_AD_ACCOUNT_ID before creating external ads."}
</p>

Again, the interesting part here is not the integration itself. It is the contract the UI establishes with the human operator.

If a product claims to safely create external ads, the interface should make all of the following legible:

  • whether the destination is known
  • whether the action is blocked
  • whether the action is idempotent
  • whether created entities are paused or live
  • whether a retry is safe
  • where to look next when the workflow is not ready

Without that, the model may be "working," but the product is still lying.
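That contract can be made literal. As a sketch in Python (rather than the UI layer, with hypothetical field and state names), the same readiness rules become a small record the interface can render directly instead of recomputing ad hoc:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PushReadiness:
    """Hypothetical operator-facing contract for an external push action."""

    destination_known: bool
    approved_not_pushed: int
    creates_paused: bool = True   # safety mode: external entities start paused
    idempotent: bool = True       # a retry will not duplicate side effects

    @property
    def blocked(self) -> bool:
        # Missing configuration is an explicit blocked state, not a surprise.
        return not self.destination_known or self.approved_not_pushed == 0

    @property
    def label(self) -> str:
        if not self.destination_known:
            return "Meta destination missing"
        return f"Push to Facebook ({self.approved_not_pushed})"

    @property
    def next_step(self) -> Optional[str]:
        # Tell the operator where to look when the workflow is not ready.
        if not self.destination_known and self.approved_not_pushed > 0:
            return "Open diagnostics"
        return None
```

The point of the record is that every question on the list above maps to a named field or property, so the UI cannot claim readiness it does not have.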

2. The right fix often lives one layer earlier than people think

Another recent issue looked at first like vague CLI instability. It was not. The application rendered, but when launched without a real TTY on stdin it eventually crashed inside prompt_toolkit.

The durable fix was to reject the invalid execution context before the TUI stack came online:

def run(self):
    """Run the interactive CLI loop with persistent input at bottom."""
    if not sys.stdin.isatty():
        print(
            "Error: hermes chat requires an interactive terminal on stdin.\n"
            "Run `hermes chat` from a terminal, or use single-query mode for automation.",
            file=sys.stderr,
        )
        _run_cleanup()
        raise SystemExit(1)

That fixed the main crash path, but there was a second-order problem too: setup logic could relaunch the chat process while preserving the same bad stdin context. So the setup path needed its own guard:

def _offer_launch_chat():
    """Prompt the user to jump straight into chat after setup."""
    print()
    if not is_interactive_stdin():
        print_warning("Skipping chat launch because stdin is not an interactive terminal.")
        print_info("Run `hermes chat` from a terminal when setup finishes.")
        return

    if not prompt_yes_no("Launch hermes chat now?", True):
        return

    from hermes_cli.relaunch import relaunch
    relaunch(["chat"])

The regression test is the real tell that this was the correct fix:

import io
import sys

import pytest

class _NonTTY(io.StringIO):
    def isatty(self):
        return False

def test_run_rejects_non_tty_stdin_before_prompt_toolkit(monkeypatch, capsys):
    import cli as cli_mod

    shell = object.__new__(cli_mod.HermesCLI)
    monkeypatch.setattr(sys, "stdin", _NonTTY())
    monkeypatch.setattr(cli_mod, "_run_cleanup", lambda: None)

    with pytest.raises(SystemExit) as exc:
        shell.run()

    assert exc.value.code == 1
    err = capsys.readouterr().err
    assert "requires an interactive terminal on stdin" in err

This is the kind of problem that often gets misclassified as "AI reliability." But the underlying bug had nothing to do with the model. It was a missing precondition check.

That has become one of my stronger operating heuristics lately:

When an AI tool looks flaky, first ask whether the system validated its execution context early enough.
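A minimal version of that heuristic, assuming nothing about any particular CLI, is a precondition gate that runs before any heavyweight stack comes online (the function name here is hypothetical):

```python
import sys


def require_interactive_stdin(stream=None) -> None:
    """Fail fast, before a TUI stack initializes, if stdin is not a real TTY.

    Raising SystemExit at the process boundary turns a confusing crash deep
    inside a terminal library into an explicit, diagnosable blocked state.
    """
    stream = stream if stream is not None else sys.stdin
    if not stream.isatty():
        print(
            "Error: this command requires an interactive terminal on stdin.",
            file=sys.stderr,
        )
        raise SystemExit(1)
```

The design choice is the same as in the hermes fix: validate the execution context at entry, so every later layer can assume it holds.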

3. Good diagnostics separate layers instead of flattening them

Another major lesson has been around diagnostics.

Shallow health checks are still too common. People verify that a token exists, that an endpoint returns 200, or that a deploy completed, and they call the system ready. That is not enough once AI systems touch real workflows.

The more useful diagnostic design split the system into explicit layers:

  1. Frontend build-time configuration
  2. Runtime / edge-function configuration
  3. External-system relationship checks
  4. Worker liveness and storage dependencies

The frontend diagnostic client starts with local facts:

function getFrontendChecks(): DiagnosticCheck[] {
  return [
    {
      group: "Frontend",
      name: "Supabase URL",
      status: import.meta.env.VITE_SUPABASE_URL ? "ok" : "failed",
      detail: import.meta.env.VITE_SUPABASE_URL
        ? redactedUrl(import.meta.env.VITE_SUPABASE_URL)
        : "VITE_SUPABASE_URL is missing.",
    },
    {
      group: "Meta",
      name: "Visible ad account",
      status: import.meta.env.VITE_FACEBOOK_AD_ACCOUNT_ID ? "ok" : "warning",
      detail: import.meta.env.VITE_FACEBOOK_AD_ACCOUNT_ID
        ? `Target shown as ${import.meta.env.VITE_FACEBOOK_AD_ACCOUNT_ID}.`
        : "VITE_FACEBOOK_AD_ACCOUNT_ID is absent, so push confirmation must stay blocked until server diagnostics prove Meta is configured.",
    },
  ];
}

And the authenticated edge diagnostics check server-side dependencies without leaking secrets:

const checks: Check[] = [
  envCheck("Supabase", "Worker URL", "WORKER_URL"),
  envCheck("Webflow", "API key", "WEBFLOW_API_KEY"),
  envCheck("Meta", "Access token", "FACEBOOK_ACCESS_TOKEN"),
  envCheck("Meta", "Ad account", "FACEBOOK_AD_ACCOUNT_ID"),
  envCheck("Meta", "Page ID", "FACEBOOK_PAGE_ID"),
  localOnlyCheck(),
  ...(await storageChecks()),
  await workerHealthCheck(),
];

The envCheck helper itself is intentionally boring:

function envCheck(group: Check["group"], name: string, key: string): Check {
  const value = Deno.env.get(key);
  return {
    group,
    name,
    status: value ? "ok" : "failed",
    detail: value
      ? `${key} is configured. Value is redacted.`
      : `${key} is missing.`,
  };
}

That "boring" design decision matters. It lets you expose readiness without training operators to depend on secrets or raw credential surfaces.
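The same redaction-first shape translates to any runtime. A Python sketch of the check (names mirror the Deno helper above but are otherwise hypothetical), where the useful property is testable: the secret value never appears in the output at all.

```python
import os


def env_check(group: str, name: str, key: str) -> dict:
    """Report whether an env var is set without ever echoing its value."""
    value = os.environ.get(key)
    return {
        "group": group,
        "name": name,
        "status": "ok" if value else "failed",
        "detail": (
            f"{key} is configured. Value is redacted."
            if value
            else f"{key} is missing."
        ),
    }
```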

The other useful check was one that catches local-only configuration bleeding into the wrong environment:

function localOnlyCheck(): Check {
  const allowlist = Deno.env.get("PDF_PRIVATE_HOST_ALLOWLIST") ?? "";
  const hasKong = allowlist.split(",").map((v) => v.trim()).includes("kong");
  const env = Deno.env.get("APP_ENV") ?? "";
  return {
    group: "Worker",
    name: "Private host allowlist",
    status: hasKong && env === "production"
      ? "failed"
      : hasKong
      ? "warning"
      : "ok",
    detail: hasKong
      ? "PDF_PRIVATE_HOST_ALLOWLIST includes local-only host kong. This is acceptable locally only."
      : "No local-only private host allowlist detected.",
  };
}

That is exactly the sort of thing production systems need more of: not just "is it up," but "did a local assumption leak across an environment boundary?"
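Once every check carries an explicit group, aggregation can name the broken layer instead of flattening everything into one "failed". A sketch, assuming the check shape used above:

```python
def summarize_by_layer(checks: list) -> dict:
    """Collapse individual checks into one status per layer.

    "failed" beats "warning" beats "ok", so each layer's summary points at
    its worst problem rather than averaging it away.
    """
    severity = {"ok": 0, "warning": 1, "failed": 2}
    layers: dict = {}
    for check in checks:
        current = layers.setdefault(check["group"], "ok")
        if severity[check["status"]] > severity[current]:
            layers[check["group"]] = check["status"]
    return layers
```

With that summary, a diagnostics page can answer "which layer is broken" directly, which is the whole point of splitting the layers in the first place.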

4. The real product is the state machine, not the happy path

One of the reasons these interfaces got better is that the underlying workflow started treating job state as the core product surface.

On the review page, the useful numbers were not vanity counts. They were operationally meaningful buckets:

const { succeeded, failed, approved, pushed, approvedNotPushed } = useMemo(
  () => ({
    succeeded: jobs.filter((job) => job.status === "succeeded").length,
    failed: jobs.filter((job) => job.status === "failed").length,
    approved: jobs.filter((job) => job.review_status === "approved").length,
    pushed: jobs.filter((job) => !!job.facebook_ad_id).length,
    approvedNotPushed: jobs.filter(
      (job) =>
        job.status === "succeeded" &&
        job.review_status === "approved" &&
        !job.facebook_ad_id,
    ).length,
  }),
  [jobs],
);

Even more important was refusing to let users approve placeholder outputs before the worker had actually rendered them:

if (previous.status !== "succeeded") {
  setMutationError(
    "Wait for the render to finish before approving or rejecting.",
  );
  return;
}

That is another tiny rule with outsized impact. It prevents the state machine from drifting into nonsense.

I keep seeing this everywhere: most reliability gains do not come from making the model more magical. They come from reducing illegal states and making legal states obvious.
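Reducing illegal states can be as literal as a guard function. A Python sketch of the review rule above, with hypothetical job fields:

```python
def review(job: dict, decision: str) -> dict:
    """Apply an approve/reject decision, refusing illegal states.

    Mirrors the rule above: placeholder outputs cannot be approved or
    rejected before the worker has actually rendered them.
    """
    if decision not in {"approved", "rejected"}:
        raise ValueError(f"Unknown decision: {decision}")
    if job["status"] != "succeeded":
        raise ValueError("Wait for the render to finish before reviewing.")
    return {**job, "review_status": decision}
```

Returning a new dict rather than mutating in place keeps the transition auditable: the old state and the new state both exist, which is exactly what you want when a retry question comes up later.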

5. Policy hardening matters as much as model quality

I also spent time tightening agent-system policy around execution and permissions. The safest version of these tools is not the one that hopes for good behavior. It is the one that narrows the allowed surface by default.

A representative hardened config block looked like this:

{
  "permissionMode": "approve-reads",
  "nonInteractivePermissions": "fail",
  "bundledDiscovery": "allowlist",
  "exec": {
    "security": "allowlist",
    "ask": "on-miss",
    "strictInlineEval": true
  },
  "fs": {
    "workspaceOnly": true
  }
}

The interesting part here is not one particular setting. It is the posture:

  • reads are easier than writes
  • non-interactive contexts get stricter treatment
  • tool execution is allowlisted instead of assumed-safe
  • filesystem access is scoped to the workspace by default

That is a much healthier baseline for AI systems than "give the agent broad authority and hope the prompt is good."
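The posture is easy to state as code. A sketch of the exec decision only, with hypothetical names loosely mirroring the config keys above:

```python
def decide_exec(command: str, allowlist: set, interactive: bool,
                ask_on_miss: bool = True) -> str:
    """Decide whether an agent may run a command under an allowlist posture.

    Returns "allow", "ask", or "deny". Non-interactive contexts get the
    stricter treatment: an allowlist miss cannot fall back to asking a
    human who is not there.
    """
    if command in allowlist:
        return "allow"
    if ask_on_miss and interactive:
        return "ask"
    return "deny"
```

Three return values instead of a boolean matters here: "ask" is a legitimate outcome in an interactive session, and collapsing it into allow/deny is how agents end up with either too much or too little authority.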

6. What X is getting right, and where it still tends to blur

One of the more useful themes I keep seeing on X is the shift away from thinking of AI as "just chat" and toward thinking in terms of models, apps, harnesses, evals, and feedback loops. That is directionally correct.

But even that conversation sometimes remains too abstract. Once you are inside live systems, the questions get sharper:

  • Which exact environment variable is missing?
  • Which retry is idempotent and which one is dangerous?
  • Which state transition is illegal?
  • Which runtime inherited the wrong stdin?
  • Which config belongs to one toolchain versus another?
  • Which health check proves a service is merely reachable, versus actually ready?

Those are the questions that compound.
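The last question on that list has a concrete shape: liveness only proves a service answers, while readiness proves its dependencies hold. A sketch, with hypothetical dependency names:

```python
def liveness(ping) -> bool:
    """Merely reachable: the process answers at all."""
    return ping() == 200


def readiness(ping, deps: dict) -> dict:
    """Actually ready: reachable AND every dependency check passes.

    Each failing dependency is named instead of being flattened into a
    single boolean, so the report says which layer owns the fix.
    """
    result = {"reachable": ping() == 200}
    for name, check in deps.items():
        result[name] = bool(check())
    result["ready"] = all(result.values())
    return result
```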

7. My working thesis right now

AI leadership from the frontlines is not mostly about having the best macro take.

It is about getting close enough to the machine state that you can tighten the contracts for everyone else:

  • interface contracts between product and operator
  • execution contracts between runtime and tool
  • environment contracts between local, staging, and production
  • safety contracts between agent power and human review
  • retry contracts between failure and side effect

The people who win in this phase will not just have better models.

They will have systems that can clearly answer:

  • what happened
  • where it happened
  • why it happened
  • whether it is safe to retry
  • which layer owns the fix
  • what the operator should do next

That is what I mean by leading from the frontlines right now.

Not talking about AI from a distance.

Getting close enough to the interfaces that other people can actually trust the machine.


From The Front Lines

I am creating this super simple, almost retro blog because I miss the days when writing was easy and the stakes were lower. There is beauty in the simplicity of reading musings that are not manufactured for the purpose of getting views or producing content.

Some context: I have been in tech for over 10 years and have worked with some incredible companies in that time, from top Y Combinator startups to public companies. It's been a blast, and I love what I do. What is it that I do, you ask? Simply put, I run a company called OffDeck. I believe in building companies that last and have great fundamentals. We are not VC-backed, and I have no desire for that, at least not anytime soon. You can learn more about OffDeck separately, and I am sure I will be talking about it a lot here.

Some other things about myself: I live in Cincinnati, Ohio, and I remain deeply integrated into the mainstream tech ecosystem. I was lucky enough to live in both San Francisco and New York, which was a fantastic experience. I have a great passion for New York tech and love that ecosystem; the energy and vibe there are incredible.

Right now, my days are filled with product work and building OffDeck. We are very close to launching, and I am very excited to share more on that over the next several weeks. 

Why am I calling this blog From The Front Lines? 
I am an operator at heart who happens to love tech. I sit on the front lines helping founders and CEOs daily. The most privileged time I have is the time I spend with founders; it's an honor. That privilege affords me the ability to see the reality behind the growth, the vanity metrics, and the dozens of podcast appearances. I get to listen to founders when they are flying high, when the fuel levels get low, when the winds come, when an engine fails, and when they are just not sure they can fly the plane. That is why this blog is called From The Front Lines: I sit at the front lines with the founders I serve and the friends I have made.

The other sense of "front lines" is that I am on the field. I still work in tech every day and operate in the weeds of real systems. I am not directing from arm's length or teaching from the sidelines. That gives me real perspective and experience that others simply don't have.

Well, this is it for this post.