I thought I had already learned this lesson.
I had written it down once: do not put OPENAI_API_KEY in a global shell environment when using Codex over VS Code Remote SSH. ChatGPT billing and OpenAI API billing are separate systems. If Codex sees an API key, it may use API billing instead.
That mistake cost real money. Fortunately I always put hard caps on accounts that can spend money. In my first pass with OpenClaw, it hit my $20 monthly cap in 2 days. In my second attempt it hit a $20 cap in 3 days. This is about that, and why I think my third attempt will be better.
Here’s how I tried to clean up OpenClaw *again* and stepped into the same class of problem from a different direction.
The Starting Point
The goal was simple.
I wanted OpenClaw running on a headless Ubuntu machine (NATHAN) to handle basic personal automation:
- Watch for important email
- Send WhatsApp alerts
- Eventually help with calendar and local workflows
I had tried setting this up 2 months ago, but costs exploded and I didn't have time to debug, so I uninstalled OpenClaw. This time I thought I could control costs, because OpenClaw didn't need a frontier model. It needed a cheap one:
gpt-4.1-mini
That instinct was correct. I also made a new OpenAI API project just for OpenClaw and gave it its own private key.
The Intended Architecture
The mail-alert system had already evolved into something much cleaner:
systemd timer
-> direct Gmail IMAP search
-> YAML rules
-> deterministic prefilter
-> optional classifier
-> WhatsApp alert
-> metadata-only state
This design was intentionally boring:
- Only recent mail
- Cheap string checks first
- AI used sparingly
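The "cheap string checks first" idea can be sketched in a few lines. The rule fields below (`senders`, `subject_contains`) are my own invention, not the actual YAML schema from the real setup:

```python
# Sketch of a deterministic prefilter: cheap substring checks decide
# whether a message is even worth sending to the (paid) classifier.
def prefilter(msg: dict, rules: list[dict]) -> bool:
    """Return True if the message deserves a closer look."""
    sender = msg.get("from", "").lower()
    subject = msg.get("subject", "").lower()
    for rule in rules:
        if any(s in sender for s in rule.get("senders", [])):
            return True
        if any(kw in subject for kw in rule.get("subject_contains", [])):
            return True
    return False  # no match: skip the AI classifier entirely

rules = [{"senders": ["billing@"], "subject_contains": ["invoice", "alert"]}]
print(prefilter({"from": "billing@example.com", "subject": "hi"}, rules))      # True
print(prefilter({"from": "friend@example.com", "subject": "lunch?"}, rules))   # False
```

Only messages that survive this filter cost any tokens at all.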
The old push-based system was retired for a reason:
Gmail push
-> Pub/Sub
-> Tailscale Funnel
-> watcher
-> OpenClaw gateway
-> agent session
Too clever. Too expensive. Too opaque.
The New Cost Spike
Then the billing report showed something unexpected: $7 spent in a day. I went to the API usage stats on OpenAI and looked at the usage costs per project key.
A single day of usage looked like:
- 105 requests
- ~8.8 million input tokens
- ~42k output tokens
- ~$4.34 total
- Dominated by:
gpt-5.4
That was the problem.
OpenClaw was supposed to use:
gpt-4.1-mini
So where were the expensive calls coming from? I will own my part of this. Although I configured OpenClaw to use only the gpt-4.1-mini model, I did NOT restrict the project itself to only allow that model — which I should have. The OpenAI API supports project-level model restrictions, but not API-key-level restrictions. I went looking for a way to lock down the key to 4.1, found that you can't do that (yet?), and wrongly assumed you couldn't do it at the project level either. Cue the ASS-U-ME joke.
The Investigation Got Weird
We checked everything:
- Mail alert timers
- Gateway logs
- Old Gmail hook
- Local session files
Nothing explained the spike.
The uncomfortable truth:
The API key showed who paid, but not who made the call.
That’s a bad boundary.
Why Splitting Projects Isn’t Enough
Creating a separate OpenAI project helps with billing isolation.
But it still leaves a gap:
You know the key was used
You don’t know which component used it
If everything shares one key, attribution is still guesswork.
The Real Fix: A Guardrail Proxy
The stronger solution was to create my own local proxy:
OpenClaw
-> local OpenAI-compatible proxy
-> OpenAI API
Now every request goes through a controlled choke point.
The proxy:
- Enforces model allowlists
- Logs metadata
- Tracks which component made the call
Current components:
- gateway
- classifier
- cli
Captured metadata includes:
- requested model vs effective model
- token counts
- latency
- status codes
No raw prompts are stored by default.
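The core of the choke point is small: check the model against an allowlist and emit a metadata record. This is a minimal sketch of that logic; the names (`ALLOWED_MODELS`, `authorize`) are mine, and a real deployment would wrap this in an OpenAI-compatible HTTP server that forwards allowed requests upstream:

```python
import json
import time

# Models the proxy will forward; everything else fails locally.
ALLOWED_MODELS = {"gpt-4.1-mini"}

def authorize(component: str, requested_model: str) -> dict:
    """Gate a request and return the metadata record that gets logged.
    No raw prompts are stored -- metadata only."""
    allowed = requested_model in ALLOWED_MODELS
    record = {
        "ts": time.time(),
        "component": component,  # gateway | classifier | cli
        "requested_model": requested_model,
        "effective_model": requested_model if allowed else None,
        "status": "ok" if allowed else "model_not_allowed",
    }
    print(json.dumps(record))  # in practice, append to a metadata log
    return record

authorize("classifier", "gpt-4.1-mini")   # status: ok
authorize("cli", "gpt-5.4")               # status: model_not_allowed
```

Because each component identifies itself when it calls the proxy, the log answers "who made this call" instead of just "which key paid for it."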
The Security Upgrade
Before:
OpenClaw gateway → real OpenAI API key
After:
OpenClaw gateway → local token → proxy → real key
The real key now lives only in:
~/.config/openclaw/openai-guardrail.env
That’s a much cleaner boundary.
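For illustration, the env file is just a single variable that only the proxy process reads (the placeholder value here is mine):

```
# ~/.config/openclaw/openai-guardrail.env  (chmod 600)
# Only the proxy reads this; components authenticate with a local token instead.
OPENAI_API_KEY=sk-proj-PLACEHOLDER
```

Nothing else on the machine should ever export this variable.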
Model-Level Guardrails
The OpenAI project now enforces model limits, because I finally figured that out.
Allowed:
gpt-4.1-mini
Blocked:
gpt-5.4
If something tries to use a disallowed model, it fails locally:
model_not_allowed
before reaching OpenAI.
Protection is now layered:
- Local proxy allowlist
- OpenAI project restrictions
- Separate billing project
Then Codex Broke Again
While using Codex in VS Code to help investigate these billing issues and build the proxy, I suddenly hit an out-of-tokens error.
That shouldn't happen in ChatGPT mode, which is how I use Codex in VS Code.
Checking:
~/.codex/auth.json
revealed the problem:
Codex had somehow changed the auth file, switched to API-key auth, and was using the new OpenClaw key. In VS Code, it was calling the expensive 5.4 model.
That explained everything.
What Likely Happened
During setup:
- A new API key was created
- It existed temporarily in files and shell environments
- Codex tooling saw it
- Codex silently switched to API mode
The key lesson:
If a tool can see an API key, assume it might use it.
The Fixes We Put In Place
1. No Global API Keys
Checked and removed from:
- .bashrc
- .profile
- .zshrc
Only allowed location:
~/.config/openclaw/openai-guardrail.env
2. Codex Wrapper
Created:
~/bin/codex
It:
- Unsets OPENAI_API_KEY
- Verifies auth mode
- Refuses to run if unsafe
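The wrapper's logic can be sketched like this. The real binary path and the auth-file schema are assumptions (I'm treating a stored API key in `auth.json` as meaning API mode); a real wrapper would end by calling `main(sys.argv[1:])`:

```python
#!/usr/bin/env python3
# Sketch of a ~/bin/codex wrapper: scrub the environment, verify the
# auth mode, then exec the real binary only if it's safe to do so.
import json
import os
import sys

REAL_CODEX = "/usr/local/bin/codex"  # assumed install location
AUTH_FILE = os.path.expanduser("~/.codex/auth.json")

def safe_env(env: dict) -> dict:
    """Drop any API key so Codex can't silently switch billing modes."""
    return {k: v for k, v in env.items() if k != "OPENAI_API_KEY"}

def auth_mode(path: str = AUTH_FILE) -> str:
    """Assumed heuristic: a stored API key in auth.json means API mode."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return "unknown"
    return "api" if data.get("OPENAI_API_KEY") else "chatgpt"

def main(argv: list[str]) -> None:
    if auth_mode() != "chatgpt":
        sys.exit("refusing to run: codex is not in ChatGPT auth mode")
    os.execve(REAL_CODEX, [REAL_CODEX, *argv], safe_env(dict(os.environ)))
```

The point is the ordering: the environment is scrubbed and the auth mode verified before the real binary ever starts.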
3. Auth Check Tool
~/bin/codex-auth-check
Outputs:
{
"auth_mode": "chatgpt",
"has_OPENAI_API_KEY": false
}
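A check tool producing that JSON only needs two inputs: the parsed auth file and the current environment. This sketch uses the same assumed heuristic as above (a stored key in `auth.json` implies API mode); file reading is left out for brevity:

```python
import json

def codex_auth_status(auth: dict, env: dict) -> dict:
    """Build the status report; keys mirror the tool's output above.
    Assumption: a stored API key in auth.json means API-key auth."""
    return {
        "auth_mode": "api" if auth.get("OPENAI_API_KEY") else "chatgpt",
        "has_OPENAI_API_KEY": "OPENAI_API_KEY" in env,
    }

print(json.dumps(codex_auth_status(auth={}, env={}), indent=2))
```

A clean run prints `"auth_mode": "chatgpt"` and `"has_OPENAI_API_KEY": false`; anything else means a key has leaked somewhere.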
4. Bash Warning
On login:
confirmed codex in chatgpt mode
Or loudly warns if not.
5. Systemd Watcher
Monitors:
~/.codex/auth.json
Logs changes and triggers checks.
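A systemd path unit is enough for this kind of watcher. The unit names below are my own; the pattern is a `.path` unit that fires a matching oneshot `.service` whenever the file changes:

```ini
# ~/.config/systemd/user/codex-auth-watch.path  (unit names are hypothetical)
[Path]
PathModified=%h/.codex/auth.json

[Install]
WantedBy=default.target

# ~/.config/systemd/user/codex-auth-watch.service
[Service]
Type=oneshot
ExecStart=%h/bin/codex-auth-check
```

Enable it with `systemctl --user enable --now codex-auth-watch.path`, and every modification to `auth.json` triggers a check.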
Current Architecture
OpenClaw
OpenClaw components
-> local proxy
-> OpenAI project
-> gpt-4.1-mini only
Codex
Codex
-> ChatGPT auth
-> no API key
-> no API billing
These are intentionally separate.
The Real Lesson
The problem wasn’t just cost.
It was identity.
Before:
Who made this request?
→ The API key
That’s not good enough.
After:
gateway made this request
classifier made this request
cli made this request
That’s observability.
What I’d Tell Myself Before Starting
- Don’t export API keys globally
- Don’t let editors inherit secrets
- Don’t share keys across tools blindly
- Don’t trust billing dashboards for debugging
Do:
- Build a choke point
- Log metadata
- Separate auth modes
- Check ~/.codex/auth.json regularly
And most importantly:
If a system can spend money automatically, “working” is not the same as “safe.”
References
- Original Codex/OpenClaw mistake: https://fishdan.com/how-i-f-ed-up-the-openclaw-codex-install-again/
- IMAP mail alert architecture: https://fishdan.com/753-2/
