OpenClaw, Codex, and the Billing Trap *Again*

I thought I had already learned this lesson.

I had written it down once: do not put OPENAI_API_KEY in a global shell environment when using Codex over VS Code Remote SSH. ChatGPT billing and OpenAI API billing are separate systems. If Codex sees an API key, it may use API billing instead.

That mistake cost real money. Fortunately I always put hard caps on accounts that can spend money. In my first pass with OpenClaw, it hit my $20 monthly cap in two days. On my second attempt it hit the $20 cap in three days. This post is about that second attempt, and why I think the third will go better.

Here’s how I tried to clean up OpenClaw *again* and stepped into the same class of problem from a different direction.


The Starting Point

The goal was simple.

I wanted OpenClaw running on a headless Ubuntu machine (NATHAN) to handle basic personal automation:

  • Watch for important email
  • Send WhatsApp alerts
  • Eventually help with calendar and local workflows

I had tried setting this up two months ago, but costs exploded, I didn’t have time to debug, and I uninstalled OpenClaw. This time I thought I could control costs, because OpenClaw didn’t need a frontier model. It needed a cheap one:

gpt-4.1-mini

That instinct was correct. I also made a new OpenAI API project just for OpenClaw and gave it its own private key.


The Intended Architecture

The mail-alert system had already evolved into something much cleaner:

systemd timer
  -> direct Gmail IMAP search
  -> YAML rules
  -> deterministic prefilter
  -> optional classifier
  -> WhatsApp alert
  -> metadata-only state

This design was intentionally boring:

  • Only recent mail
  • Cheap string checks first
  • AI used sparingly
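The prefilter is what keeps the AI usage sparing: cheap string checks decide whether a message ever reaches a paid model call. A minimal sketch of that idea, assuming invented rule and message shapes (none of these identifiers come from the real system):

```python
# Hypothetical sketch of the deterministic prefilter: string checks
# against YAML-style rules decide whether a message even reaches the
# (paid) classifier.

def matches_rule(msg, rule):
    """Return True if any rule keyword appears in the sender or subject."""
    haystack = (msg["from"] + " " + msg["subject"]).lower()
    return any(kw in haystack for kw in rule["keywords"])

def prefilter(messages, rules):
    """Split messages into (alert_now, needs_classifier, ignore)."""
    alert, classify, ignore = [], [], []
    for msg in messages:
        hit = next((r for r in rules if matches_rule(msg, r)), None)
        if hit is None:
            ignore.append(msg)
        elif hit.get("certain", False):
            alert.append(msg)       # deterministic hit: no AI call needed
        else:
            classify.append(msg)    # ambiguous: spend one cheap model call
    return alert, classify, ignore
```

Only the middle bucket costs tokens, and only on gpt-4.1-mini.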

The old push-based system was retired for a reason:

Gmail push
  -> Pub/Sub
  -> Tailscale Funnel
  -> watcher
  -> OpenClaw gateway
  -> agent session

Too clever. Too expensive. Too opaque.


The New Cost Spike

Then the billing report showed something unexpected: $7 spent in a single day. I went to the API usage stats on OpenAI and looked at the usage costs per project key.

A single day of usage looked like:

  • 105 requests
  • ~8.8 million input tokens
  • ~42k output tokens
  • ~$4.34 total
  • Dominated by:
gpt-5.4

That was the problem.

OpenClaw was supposed to use:

gpt-4.1-mini

So where were the expensive calls coming from? I will own my part of this. Although I restricted OpenClaw itself to the gpt-4.1-mini model, I did NOT restrict the project to allow only that model, which I should have. The OpenAI API supports model restrictions at the project level, but not at the API-key level. I had looked for a way to lock the key down to gpt-4.1-mini, found that you can’t (yet?), and assumed the same was true at the project level. Cue the old ASS-U-ME joke.


The Investigation Got Weird

We checked everything:

  • Mail alert timers
  • Gateway logs
  • Old Gmail hook
  • Local session files

Nothing explained the spike.

The uncomfortable truth:

The API key showed who paid, but not who made the call.

That’s a bad boundary.


Why Splitting Projects Isn’t Enough

Creating a separate OpenAI project helps with billing isolation.

But it still leaves a gap:

You know the key was used
You don’t know which component used it

If everything shares one key, attribution is still guesswork.


The Real Fix: A Guardrail Proxy

The stronger solution was to create my own local proxy:

OpenClaw
  -> local OpenAI-compatible proxy
  -> OpenAI API

Now every request goes through a controlled choke point.

The proxy:

  • Enforces model allowlists
  • Logs metadata
  • Tracks which component made the call

Current components:

  • gateway
  • classifier
  • cli

Captured metadata includes:

  • requested model vs effective model
  • token counts
  • latency
  • status codes

No raw prompts are stored by default.
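The core of the proxy is a single guard function. Here is a minimal sketch of that logic; the component names and the `model_not_allowed` error string match what I described above, but the function names and metadata record shape are invented for illustration:

```python
# Hypothetical core of the guardrail proxy: enforce the model
# allowlist, forward the call, and record metadata only (no prompts).
import time

ALLOWED_MODELS = {"gpt-4.1-mini"}
KNOWN_COMPONENTS = {"gateway", "classifier", "cli"}

class ModelNotAllowed(Exception):
    pass

def guard_request(component, requested_model, call_upstream):
    """Check the allowlist, forward upstream, and build a metadata record."""
    if component not in KNOWN_COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    if requested_model not in ALLOWED_MODELS:
        # fail locally, before any money is spent upstream
        raise ModelNotAllowed("model_not_allowed")
    start = time.monotonic()
    response = call_upstream(requested_model)   # real proxy calls OpenAI here
    record = {
        "component": component,
        "requested_model": requested_model,
        "effective_model": response.get("model", requested_model),
        "input_tokens": response.get("usage", {}).get("prompt_tokens", 0),
        "output_tokens": response.get("usage", {}).get("completion_tokens", 0),
        "latency_ms": int((time.monotonic() - start) * 1000),
        "status": 200,
    }
    return response, record   # record goes to the metadata log
```

Each component authenticates to the proxy with its own local token, so the `component` field in the record is trustworthy, not guessed.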


The Security Upgrade

Before:

OpenClaw gateway → real OpenAI API key

After:

OpenClaw gateway → local token → proxy → real key

The real key now lives only in:

~/.config/openclaw/openai-guardrail.env

That’s a much cleaner boundary.


Model-Level Guardrails

The OpenAI project now enforces model limits, because I finally figured that out.

Allowed:

gpt-4.1-mini

Blocked:

gpt-5.4

If something tries to use a disallowed model, it fails locally:

model_not_allowed

before reaching OpenAI.

Protection is now layered:

  1. Local proxy allowlist
  2. OpenAI project restrictions
  3. Separate billing project

Then Codex Broke Again

While using Codex in VS Code to help investigate these billing issues and build the proxy, I suddenly hit an out-of-tokens error.

That shouldn’t happen in ChatGPT mode, which is how I use Codex in VS Code.

Checking:

~/.codex/auth.json

revealed the problem:

Codex had somehow rewritten the auth file, switched to API-key auth, picked up the new OpenClaw key, and was calling the expensive gpt-5.4 model from VS Code.

That explained everything.


What Likely Happened

During setup:

  • A new API key was created
  • It existed temporarily in files and shell environments
  • Codex tooling saw it
  • Codex silently switched to API mode

The key lesson:

If a tool can see an API key, assume it might use it.


The Fixes We Put In Place

1. No Global API Keys

Checked and removed from:

  • .bashrc
  • .profile
  • .zshrc

Only allowed location:

~/.config/openclaw/openai-guardrail.env

2. Codex Wrapper

Created:

~/bin/codex

It:

  • Unsets OPENAI_API_KEY
  • Verifies auth mode
  • Refuses to run if unsafe
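A wrapper like that can be very small. This is a Python sketch of the shape, not my actual script; in particular, the assumption that an `OPENAI_API_KEY` field in auth.json means API-key mode is mine:

```python
#!/usr/bin/env python3
# Hypothetical sketch of a ~/bin/codex wrapper: strip the API key from
# the environment and refuse to launch if Codex is in API-key auth mode.
import json
import os

AUTH_FILE = os.path.expanduser("~/.codex/auth.json")

def scrubbed_env(env):
    """Copy of the environment with OPENAI_API_KEY removed."""
    clean = dict(env)
    clean.pop("OPENAI_API_KEY", None)
    return clean

def auth_mode(auth_path=AUTH_FILE):
    """Guess the auth mode from auth.json (field name is an assumption)."""
    try:
        with open(auth_path) as f:
            data = json.load(f)
    except (FileNotFoundError, ValueError):
        return "unknown"
    return "apikey" if data.get("OPENAI_API_KEY") else "chatgpt"

def main():
    if auth_mode() != "chatgpt":
        raise SystemExit("refusing to run: codex is not in ChatGPT auth mode")
    # replace this process with the real codex binary, minus the secret
    os.execvpe("codex", ["codex"], scrubbed_env(os.environ))

# main() is only invoked when installed as the actual wrapper script
```

The point is the order of operations: check the auth mode first, scrub the environment second, and only then hand control to the real binary.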

3. Auth Check Tool

~/bin/codex-auth-check

Outputs:

{
  "auth_mode": "chatgpt",
  "has_OPENAI_API_KEY": false
}

4. Bash Warning

On login:

confirmed codex in chatgpt mode

Or loudly warns if not.


5. Systemd Watcher

Monitors:

~/.codex/auth.json

Logs changes and triggers checks.
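A systemd user path unit is one way to do this. A sketch under assumed unit names (mine are different, and the paired service just runs the auth-check tool):

```ini
# ~/.config/systemd/user/codex-auth-watch.path  (hypothetical name)
[Unit]
Description=Watch Codex auth.json for changes

[Path]
PathModified=%h/.codex/auth.json

[Install]
WantedBy=default.target

# ~/.config/systemd/user/codex-auth-watch.service
[Unit]
Description=Re-check Codex auth mode after auth.json changed

[Service]
Type=oneshot
ExecStart=%h/bin/codex-auth-check
```

By systemd convention, the `.path` unit activates the service of the same name whenever the watched file is modified.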


Current Architecture

OpenClaw

OpenClaw components
  -> local proxy
  -> OpenAI project
  -> gpt-4.1-mini only

Codex

Codex
  -> ChatGPT auth
  -> no API key
  -> no API billing

These are intentionally separate.


The Real Lesson

The problem wasn’t just cost.

It was identity.

Before:

Who made this request?
→ The API key

That’s not good enough.

After:

gateway made this request
classifier made this request
cli made this request

That’s observability.


What I’d Tell Myself Before Starting

  • Don’t export API keys globally
  • Don’t let editors inherit secrets
  • Don’t share keys across tools blindly
  • Don’t trust billing dashboards for debugging

Do:

  • Build a choke point
  • Log metadata
  • Separate auth modes
  • Check ~/.codex/auth.json regularly

And most importantly:

If a system can spend money automatically, “working” is not the same as “safe.”

