We open-sourced the architecture behind noticed's multi-tenant agent harness. Here's why and how it works.
Yesterday I ran a workshop on how we built the agent infrastructure behind noticed. This post is the written version. If you want to skip straight to the code, the repo is noticed-claw on GitHub — the name is a nod to OpenClaw, the single-user harness it started from. We kept the shape, tore out the assumptions that only work for one person, and rebuilt the tenant-aware pieces from scratch. Hence: noticed's take on OpenClaw. noticed-claw.
one model, one prompt, one agent — that only works for chatbots
Most agent harnesses are built for one user running one agent. Claude Code reads from local disk. OpenClaw stores state in a single session. Codex ties compaction to one thread. That's fine when there's one person.
noticed is a personal networking agent. Every person who signs up gets their own agent — one that knows their network, remembers their conversations, and acts on their behalf across GitHub, LinkedIn, email, and calendar. That means multi-tenancy. And multi-tenancy breaks every assumption baked into single-user harnesses.
what breaks
When you go from one user to many, every implicit component becomes a problem:
- Tenant isolation. User A's memories and conversations must be invisible to user B. No leaks.
- Session identity. The same person talks to their agent on iMessage, Telegram, and Slack. The agent needs to track all of it without losing context when the platform changes.
- Concurrent webhooks. Telegram sends duplicate webhooks. Slack retries on timeout. Without thread-level locking, the agent responds twice — or responds to the wrong user.
- Proactive behavior at scale. One user's heartbeat cron is a setInterval. A thousand users' heartbeats are a shared automation runner that respects each tenant's timezone and active hours.
None of this is exotic. But none of it is solved by default in the tools people reach for when building agents.
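The "shared automation runner" point is worth making concrete. Here's a minimal sketch of the tick-time decision — which tenants are inside their active hours right now. The `Tenant` shape, `activeHours` field, and function names are illustrative assumptions, not noticed-claw's actual types:

```typescript
// Hypothetical sketch: one shared runner instead of per-user setInterval.
// Tenant, activeHours, and dueForHeartbeat are assumed names for illustration.
interface Tenant {
  id: string;
  timezone: string; // IANA zone, e.g. "Europe/Berlin"
  activeHours: { start: number; end: number }; // local hours, [start, end)
}

function localHour(now: Date, timezone: string): number {
  // Intl resolves the tenant-local hour without a date library.
  return Number(
    new Intl.DateTimeFormat("en-US", {
      timeZone: timezone,
      hour: "numeric",
      hour12: false,
    }).format(now),
  );
}

function dueForHeartbeat(tenants: Tenant[], now: Date): string[] {
  return tenants
    .filter((t) => {
      const h = localHour(now, t.timezone);
      return h >= t.activeHours.start && h < t.activeHours.end;
    })
    .map((t) => t.id);
}
```

One scan over all tenants per tick, instead of a thousand timers — the quiet-hours check happens centrally, per tenant, at the moment the runner fires.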
what we mean by "harness"
The model generates text. The harness decides everything else: how context gets loaded, what survives compaction, how memory is stored and recalled, which tools are available and under what rules.
Claude Code, OpenClaw, Codex, Deep Agents — these are all harnesses. They just aren't multi-tenant ones.
You can swap models without rebuilding the harness. You can't swap tenancy models without rebuilding it from scratch. That's why the harness is the thing that matters.
eight subsystems
We broke our harness into eight subsystems. Each one solves a specific problem that surfaces when you go from one user to many. This is the architecture behind noticed-claw, a compact version of noticed's production agent-core.
Identity — prompt-builder.ts, brand-voice.ts, persona-catalog.ts. Defines who the agent is for each workspace. Persona, voice, behavioral constraints. Two people using noticed get agents that feel different because they are different.
Memory — memory-manager.ts, memory-extract.ts, memory-flush.ts. Extracts facts, preferences, and relationship signals from conversations. Stores them per tenant. Recall pulls from that person's memory only.
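The invariant in the memory subsystem is that recall is always keyed by tenant. A minimal in-memory sketch of that shape — `MemoryStore` and its method names are assumptions for illustration, not the actual memory-manager.ts API:

```typescript
// Hypothetical sketch of per-tenant memory. The real store is backed by
// Postgres with RLS; this in-memory version shows only the scoping invariant.
type Fact = { text: string; kind: "fact" | "preference" };

class MemoryStore {
  private byTenant = new Map<string, Fact[]>();

  write(tenantId: string, fact: Fact): void {
    const list = this.byTenant.get(tenantId) ?? [];
    list.push(fact);
    this.byTenant.set(tenantId, list);
  }

  // No code path reads across tenants: recall takes a tenant id or
  // returns nothing. That is the "no leaks" property in practice.
  recall(tenantId: string): Fact[] {
    return this.byTenant.get(tenantId) ?? [];
  }
}
```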
Context — workspace-files.ts, session-awareness.ts, mission-engine.ts. Loads the right workspace data before each turn. Mission, goals, recent activity, network state. Changes from person to person.
Compaction — compaction.ts, conversation-search.ts. Conversations get long. Context windows don't. Compaction summarizes and archives older turns so the agent stays coherent without losing what it learned three weeks ago.
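The core transform in compaction is simple to state: keep the most recent turns verbatim, replace the older prefix with a summary, and archive the originals for conversation-search. A sketch under those assumptions — in production the summarizer would be an LLM call; here it's a plain function so the shape is visible:

```typescript
// Hypothetical compaction sketch. Turn, compact, and the "[summary]" marker
// are illustrative; the real compaction.ts summarizes via a model call.
interface Turn { role: "user" | "agent"; text: string }

function compact(
  turns: Turn[],
  keepRecent: number,
  summarize: (archived: Turn[]) => string,
): Turn[] {
  if (turns.length <= keepRecent) return turns;
  const archived = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  // One synthetic turn replaces the archived prefix; the originals go to
  // long-term storage so conversation-search can still recover them.
  return [{ role: "agent", text: `[summary] ${summarize(archived)}` }, ...recent];
}
```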
Tools — tools/registry.ts, tools/capability-registry.ts, llm-runner.ts. The registry controls which tools exist. The policy layer controls who gets access to what. Not every workspace has the same integrations.
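The two-layer split — registry of everything that exists, policy that filters per workspace — can be sketched in a few lines. `Tool`, `requires`, and `toolsFor` are assumed names for illustration:

```typescript
// Hypothetical sketch of registry vs. policy. The real capability-registry
// is richer; this shows only the filtering step.
interface Tool { name: string; requires: string } // capability the tool needs

function toolsFor(allTools: Tool[], granted: Set<string>): Tool[] {
  // The model only ever sees the filtered list, so a workspace without the
  // "linkedin" capability cannot even attempt a LinkedIn call.
  return allTools.filter((t) => granted.has(t.requires));
}
```

Filtering before the model call, rather than rejecting at execution time, means a denied tool never appears in the prompt at all.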
Sessions — agent-router.ts, session-manager.ts, thread-queue.ts. Routes messages and manages threads across platforms. Keeps conversations coherent when someone moves from Telegram to Slack mid-thread.
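The cross-platform trick is that platform identities are bindings onto one session, not sessions themselves. A minimal sketch — `SessionManager`, the binding key shape, and method names are assumptions, not the actual session-manager.ts API:

```typescript
// Hypothetical sketch: one session per person, many platform bindings.
class SessionManager {
  private bindings = new Map<string, string>(); // "platform:chatId" -> sessionId

  bind(platform: string, chatId: string, sessionId: string): void {
    this.bindings.set(`${platform}:${chatId}`, sessionId);
  }

  // Telegram and Slack ids for the same person resolve to the same session,
  // so context survives a mid-thread platform switch.
  resolve(platform: string, chatId: string): string | undefined {
    return this.bindings.get(`${platform}:${chatId}`);
  }
}
```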
Automation — heartbeat.ts, cron.ts, tools/cron-tool.ts. Scheduled tasks, heartbeats, proactive behavior. Your agent should surface a warm intro or remind you about a follow-up without waiting for you to ask.
Evals — src/eval/. Measures whether agents are actually improving. Without structured evaluation you're shipping vibes. This is the subsystem most people skip and regret later.
orchestration ties it together
The orchestration layer in agent-turn.ts coordinates the full lifecycle of every agent turn: acquire a thread lock, resolve context, build tools, call the model, handle compaction, write to memory, release the lock.
Thread-level locking matters more than people expect. Two webhooks for the same conversation arrive within milliseconds. You need exactly one agent response. The orchestration layer handles that.
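To make the serialization property concrete, here's a minimal in-process sketch of a per-thread lock wrapping a turn. In production this is a Postgres row lock, not an in-memory promise chain; `ThreadLocks` and `withLock` are assumed names for illustration:

```typescript
// Hypothetical sketch: per-thread mutex via a promise chain. Turns on the
// same thread serialize; turns on different threads run concurrently.
class ThreadLocks {
  private tails = new Map<string, Promise<void>>();

  async withLock<T>(threadId: string, work: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(threadId) ?? Promise.resolve();
    let release!: () => void;
    // Queue ourselves as the new tail before waiting on the old one.
    this.tails.set(threadId, new Promise<void>((r) => (release = r)));
    await prev;
    try {
      return await work(); // resolve context, call model, write memory...
    } finally {
      release(); // always unblock the next turn, even if this one threw
    }
  }
}
```

Two webhooks for the same thread arriving within milliseconds both enter `withLock`; the second one parks on the first's promise and runs only after the first turn fully completes.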
The whole system uses Postgres Row Level Security for tenant-scoped data. OpenClaw didn't work for us because it can't run multiple isolated agents in the same deployment; noticed-claw is our interpretation of an OpenClaw-like harness that solves the multi-tenant problem.
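For readers unfamiliar with RLS, the setup looks roughly like this — table and column names here are illustrative, not noticed-claw's actual schema. The policy compares each row's tenant id against a per-connection setting the app pins at the start of every turn:

```typescript
// Hypothetical sketch of the RLS pattern. "memories" and "app.tenant_id"
// are assumed names; the mechanism (ENABLE ROW LEVEL SECURITY + a policy
// over current_setting) is standard Postgres.
const rlsMigration = `
  ALTER TABLE memories ENABLE ROW LEVEL SECURITY;
  CREATE POLICY tenant_isolation ON memories
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
`;

// Before any query in a turn, orchestration would pin the connection
// to the resolved tenant (transaction-local via the third argument):
// await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
```

Once the policy is in place, every query on the table is filtered in the database itself — application code that forgets a `WHERE tenant_id = ...` clause still can't see another tenant's rows.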
one turn, end to end
Maria is a noticed user. She's on Telegram talking to her agent; her Slack heartbeat cron is scheduled to fire in the same minute. Here's what actually happens:
- Telegram webhook arrives. agent-router.ts classifies the inbound message, resolves Maria's tenant from the Telegram chat id, and hands off to orchestration.
- Heartbeat fires on Slack. Two seconds later, the automation scheduler enqueues a heartbeat turn for Maria's Slack workspace. Now there are two pending turns for the same tenant on different platforms.
- Thread lock. Orchestration acquires a Postgres row-level lock on Maria's Telegram thread. The Slack heartbeat lives on a different thread, so it grabs a separate lock and runs concurrently.
- Context build (Telegram turn). Identity resolves Maria's persona and brand voice. Memory pulls her recent facts and preferences. Context loads workspace state — active missions, recent commits in her network, pending intros. Tools registers only the integrations Maria has connected.
- Model call. The LLM generates a response referencing a new person Maria mentioned. Compaction inspects the turn, decides it's short enough, and skips summarization.
- Memory writes. The extract pass pulls one new fact ("Maria introduced Tom to Lucy last week") and one preference ("prefers intros routed via email, not Slack"). Both are scoped to Maria's tenant via RLS before they ever land in the database.
- Lock release. Telegram turn finishes and releases the lock. The Slack heartbeat — which built its own lighter context and decided there was nothing worth surfacing — completes a moment later and releases its lock too.
- Duplicate webhook. Telegram retries the original webhook five seconds later, as it sometimes does. Orchestration sees the recent dedup key, returns HTTP 200, and drops the retry on the floor.
Eight steps. One user, two platforms, two concurrent turns, zero leaks into another tenant.
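The dedup step in the walkthrough can be sketched as a small TTL cache over delivery keys. `WebhookDedup`, the key shape, and the TTL are assumptions for illustration — the real implementation keys off platform-specific delivery ids:

```typescript
// Hypothetical dedup sketch: remember recently seen webhook keys for a
// short window and drop retries. The caller answers retries with HTTP 200
// and does no further work.
class WebhookDedup {
  private seen = new Map<string, number>(); // dedupKey -> expiry (ms epoch)

  constructor(private ttlMs: number) {}

  // True the first time a key appears inside the TTL window.
  firstDelivery(key: string, now: number): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) return false;
    this.seen.set(key, now + this.ttlMs);
    return true;
  }
}
```

Returning 200 on the duplicate matters: platforms like Telegram keep retrying until they see a success status, so rejecting the retry would just summon more retries.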
what's still hacky
Nothing here is finished. The places we're most uncomfortable:
- Cross-platform identity resolution. If Maria reaches out from a new Telegram account we haven't seen, we can't always tell it's her. We match on email, phone, and OIDC claims where we have them, but the fallback — asking the user to verify — is not free, and the matching itself is heuristic.
- Postgres RLS at scale. Row-level security is the right primitive for tenant isolation, but every query pays for the extra policy evaluation. We haven't hit the wall yet, but we know there's a wall and we haven't modelled where it is.
- Compaction quality. Summarization is lossy by definition. The subtle relationship signals ("Tom sounded hesitant when I mentioned Lucy") are exactly the ones that compress worst. Memory extraction rescues some of this, but we're still losing information.
- Heartbeats vs. quiet hours. Automation respects each tenant's active hours, but we don't yet have a clean way for the agent to decide, in the moment, that a particular heartbeat would be unwelcome. Right now that's a config decision, not an inference.
- Mission verification for agent-created goals. Developer-defined missions have hard checkpoints we verify against real data. Goals the agent creates on its own are unverified — fine for lightweight prompts, shakier once the agent starts acting on them.
None of this blocks the harness from working today. All of it is on the list.
why we open-sourced it
I ran this workshop because when we started building noticed, we wanted a reference for how a real multi-tenant agent system works in production. It didn't exist. So we wrote one.
The repo includes the full workshop walkthrough, the subsystem breakdown, and a local demo you can run to test workspace customization, memory persistence, persona switching, cron jobs, and cross-session awareness.
This is a reference architecture for how we think about the multi-tenant agent harness problem. Take what's useful, ignore what's not, and if you see something we got wrong, open an issue.
If you want to see what this architecture powers in practice, join the waitlist at noticed.so.
