I thought I had already learned this lesson.
I had written it down once: do not put OPENAI_API_KEY in a global shell environment when using Codex over VS Code Remote SSH. ChatGPT billing and OpenAI API billing are separate systems. If Codex sees an API key, it may use API billing instead.
That mistake cost real money. Fortunately I always put hard caps on accounts that can spend money. In my first pass with OpenClaw, it hit my $20 monthly cap in 2 days. In my second attempt it hit a $20 cap in 3 days. This is about that, and why I think my third attempt will be better.
Here’s how I tried to clean up OpenClaw *again* and stepped into the same class of problem from a different direction.
The Starting Point
The goal was simple.
I wanted OpenClaw running on a headless Ubuntu machine (NATHAN) to handle basic personal automation:
- Watch for important email
- Send WhatsApp alerts
- Eventually help with calendar and local workflows
I had tried setting this up 2 months ago, but costs exploded and I didn't have time to debug, so I uninstalled OpenClaw. This time I thought I could control costs, because OpenClaw didn't need a frontier model. It needed a cheap one:
gpt-4.1-mini
That instinct was correct. I also made a new OpenAI API project just for OpenClaw and gave it its own private key.
The Intended Architecture
The mail-alert system had already evolved into something much cleaner:
systemd timer
-> direct Gmail IMAP search
-> YAML rules
-> deterministic prefilter
-> optional classifier
-> WhatsApp alert
-> metadata-only state
This design was intentionally boring:
- Only recent mail
- Cheap string checks first
- AI used sparingly
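The "cheap string checks first" idea can be sketched in a few lines. The rule fields below (`senders`, `subject_contains`) are my own invention, not the actual YAML schema from the real setup:

```python
# Sketch of a deterministic prefilter: cheap substring checks decide
# whether a message is even worth sending to the (paid) classifier.
def prefilter(msg: dict, rules: list[dict]) -> bool:
    """Return True if the message deserves a closer look."""
    sender = msg.get("from", "").lower()
    subject = msg.get("subject", "").lower()
    for rule in rules:
        if any(s in sender for s in rule.get("senders", [])):
            return True
        if any(kw in subject for kw in rule.get("subject_contains", [])):
            return True
    return False  # no match: skip the AI classifier entirely

rules = [{"senders": ["billing@"], "subject_contains": ["invoice", "alert"]}]
print(prefilter({"from": "billing@example.com", "subject": "hi"}, rules))      # True
print(prefilter({"from": "friend@example.com", "subject": "lunch?"}, rules))   # False
```

Only messages that survive this filter cost any tokens at all.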
The old push-based system was retired for a reason:
Gmail push
-> Pub/Sub
-> Tailscale Funnel
-> watcher
-> OpenClaw gateway
-> agent session
Too clever. Too expensive. Too opaque.
The New Cost Spike
Then the billing report showed something unexpected: $7 spent in a day. I went to the API usage stats on OpenAI and looked at the usage costs per project key.
A single day of usage looked like:
- 105 requests
- ~8.8 million input tokens
- ~42k output tokens
- ~$4.34 total
- Dominated by:
gpt-5.4
That was the problem.
OpenClaw was supposed to use:
gpt-4.1-mini
So where were the expensive calls coming from? I will own my part of this. Although I configured OpenClaw to use only the gpt-4.1-mini model, I did NOT restrict the project itself to only allow that model — which I should have. The OpenAI API supports project-level model restrictions, but not API-key-level restrictions. I went looking for a way to lock down the key to 4.1, found that you can't do that (yet?), and wrongly assumed you couldn't do it at the project level either. Cue the ASS-U-ME joke.
The Investigation Got Weird
We checked everything:
- Mail alert timers
- Gateway logs
- Old Gmail hook
- Local session files
Nothing explained the spike.
The uncomfortable truth:
The API key showed who paid, but not who made the call.
That’s a bad boundary.
Why Splitting Projects Isn’t Enough
Creating a separate OpenAI project helps with billing isolation.
But it still leaves a gap:
You know the key was used
You don’t know which component used it
If everything shares one key, attribution is still guesswork.
The Real Fix: A Guardrail Proxy
The stronger solution was to create my own local proxy:
OpenClaw
-> local OpenAI-compatible proxy
-> OpenAI API
Now every request goes through a controlled choke point.
The proxy:
- Enforces model allowlists
- Logs metadata
- Tracks which component made the call
Current components:
- gateway
- classifier
- cli
Captured metadata includes:
- requested model vs effective model
- token counts
- latency
- status codes
No raw prompts are stored by default.
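The core of the choke point is small: check the model against an allowlist and emit a metadata record. This is a minimal sketch of that logic; the names (`ALLOWED_MODELS`, `authorize`) are mine, and a real deployment would wrap this in an OpenAI-compatible HTTP server that forwards allowed requests upstream:

```python
import json
import time

# Models the proxy will forward; everything else fails locally.
ALLOWED_MODELS = {"gpt-4.1-mini"}

def authorize(component: str, requested_model: str) -> dict:
    """Gate a request and return the metadata record that gets logged.
    No raw prompts are stored -- metadata only."""
    allowed = requested_model in ALLOWED_MODELS
    record = {
        "ts": time.time(),
        "component": component,  # gateway | classifier | cli
        "requested_model": requested_model,
        "effective_model": requested_model if allowed else None,
        "status": "ok" if allowed else "model_not_allowed",
    }
    print(json.dumps(record))  # in practice, append to a metadata log
    return record

authorize("classifier", "gpt-4.1-mini")   # status: ok
authorize("cli", "gpt-5.4")               # status: model_not_allowed
```

Because each component identifies itself when it calls the proxy, the log answers "who made this call" instead of just "which key paid for it."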
The Security Upgrade
Before:
OpenClaw gateway → real OpenAI API key
After:
OpenClaw gateway → local token → proxy → real key
The real key now lives only in:
~/.config/openclaw/openai-guardrail.env
That’s a much cleaner boundary.
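For illustration, the env file is just a single variable that only the proxy process reads (the placeholder value here is mine):

```
# ~/.config/openclaw/openai-guardrail.env  (chmod 600)
# Only the proxy reads this; components authenticate with a local token instead.
OPENAI_API_KEY=sk-proj-PLACEHOLDER
```

Nothing else on the machine should ever export this variable.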
Model-Level Guardrails
The OpenAI project now enforces model limits, because I finally figured that out.
Allowed:
gpt-4.1-mini
Blocked:
gpt-5.4
If something tries to use a disallowed model, it fails locally:
model_not_allowed
before reaching OpenAI.
Protection is now layered:
- Local proxy allowlist
- OpenAI project restrictions
- Separate billing project
Then Codex Broke Again
While using Codex in VS Code to help investigate these billing issues and build the proxy, I suddenly hit an out-of-tokens error.
That shouldn't happen in ChatGPT mode, which is how I use Codex in VS Code.
Checking:
~/.codex/auth.json
revealed the problem:
Codex had somehow changed the auth file, switched to API-key auth, and was using the new OpenClaw key. In VS Code, it was calling the expensive 5.4 model.
That explained everything.
What Likely Happened
During setup:
- A new API key was created
- It existed temporarily in files and shell environments
- Codex tooling saw it
- Codex silently switched to API mode
The key lesson:
If a tool can see an API key, assume it might use it.
The Fixes We Put In Place
1. No Global API Keys
Checked and removed from:
- .bashrc
- .profile
- .zshrc
Only allowed location:
~/.config/openclaw/openai-guardrail.env
2. Codex Wrapper
Created:
~/bin/codex
It:
- Unsets OPENAI_API_KEY
- Verifies auth mode
- Refuses to run if unsafe
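The wrapper's logic can be sketched like this. The real binary path and the auth-file schema are assumptions (I'm treating a stored API key in `auth.json` as meaning API mode); a real wrapper would end by calling `main(sys.argv[1:])`:

```python
#!/usr/bin/env python3
# Sketch of a ~/bin/codex wrapper: scrub the environment, verify the
# auth mode, then exec the real binary only if it's safe to do so.
import json
import os
import sys

REAL_CODEX = "/usr/local/bin/codex"  # assumed install location
AUTH_FILE = os.path.expanduser("~/.codex/auth.json")

def safe_env(env: dict) -> dict:
    """Drop any API key so Codex can't silently switch billing modes."""
    return {k: v for k, v in env.items() if k != "OPENAI_API_KEY"}

def auth_mode(path: str = AUTH_FILE) -> str:
    """Assumed heuristic: a stored API key in auth.json means API mode."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return "unknown"
    return "api" if data.get("OPENAI_API_KEY") else "chatgpt"

def main(argv: list[str]) -> None:
    if auth_mode() != "chatgpt":
        sys.exit("refusing to run: codex is not in ChatGPT auth mode")
    os.execve(REAL_CODEX, [REAL_CODEX, *argv], safe_env(dict(os.environ)))
```

The point is the ordering: the environment is scrubbed and the auth mode verified before the real binary ever starts.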
3. Auth Check Tool
~/bin/codex-auth-check
Outputs:
{
"auth_mode": "chatgpt",
"has_OPENAI_API_KEY": false
}
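A check tool producing that JSON only needs two inputs: the parsed auth file and the current environment. This sketch uses the same assumed heuristic as above (a stored key in `auth.json` implies API mode); file reading is left out for brevity:

```python
import json

def codex_auth_status(auth: dict, env: dict) -> dict:
    """Build the status report; keys mirror the tool's output above.
    Assumption: a stored API key in auth.json means API-key auth."""
    return {
        "auth_mode": "api" if auth.get("OPENAI_API_KEY") else "chatgpt",
        "has_OPENAI_API_KEY": "OPENAI_API_KEY" in env,
    }

print(json.dumps(codex_auth_status(auth={}, env={}), indent=2))
```

A clean run prints `"auth_mode": "chatgpt"` and `"has_OPENAI_API_KEY": false`; anything else means a key has leaked somewhere.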
4. Bash Warning
On login:
confirmed codex in chatgpt mode
Or loudly warns if not.
5. Systemd Watcher
Monitors:
~/.codex/auth.json
Logs changes and triggers checks.
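A systemd path unit is enough for this kind of watcher. The unit names below are my own; the pattern is a `.path` unit that fires a matching oneshot `.service` whenever the file changes:

```ini
# ~/.config/systemd/user/codex-auth-watch.path  (unit names are hypothetical)
[Path]
PathModified=%h/.codex/auth.json

[Install]
WantedBy=default.target

# ~/.config/systemd/user/codex-auth-watch.service
[Service]
Type=oneshot
ExecStart=%h/bin/codex-auth-check
```

Enable it with `systemctl --user enable --now codex-auth-watch.path`, and every modification to `auth.json` triggers a check.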
Current Architecture
OpenClaw
OpenClaw components
-> local proxy
-> OpenAI project
-> gpt-4.1-mini only
Codex
Codex
-> ChatGPT auth
-> no API key
-> no API billing
These are intentionally separate.
The Real Lesson
The problem wasn’t just cost.
It was identity.
Before:
Who made this request?
→ The API key
That’s not good enough.
After:
gateway made this request
classifier made this request
cli made this request
That’s observability.
What I’d Tell Myself Before Starting
- Don’t export API keys globally
- Don’t let editors inherit secrets
- Don’t share keys across tools blindly
- Don’t trust billing dashboards for debugging
Do:
- Build a choke point
- Log metadata
- Separate auth modes
- Check ~/.codex/auth.json regularly
And most importantly:
If a system can spend money automatically, “working” is not the same as “safe.”
References
- Original Codex/OpenClaw mistake: https://fishdan.com/how-i-f-ed-up-the-openclaw-codex-install-again/
- IMAP mail alert architecture: https://fishdan.com/753-2/
