v3 · enterprise observability | SOC 2 in progress

Operational intelligence for AI infrastructure.

CostLynx is the cost observability and governance layer for teams running LLMs in production — spend, tokens, attribution, anomalies, and savings, normalized across every provider you ship on.

API-first ingestion · No prompts or responses stored · 14-day evaluation, no card required

Spend tracked / 24h
$2,487,512
+3.8% vs 7-day avg
Inference events / sec
14,204
live · 6 providers
Avg savings detected
31.4%
across customer cohort
Anomalies / week
128 caught
$412k flagged · paused early
01 · The problem

AI spend is the fastest-growing
line item nobody can explain.

Token bills arrive monthly. Models, providers, and prompts change weekly. The result: finance can’t attribute, engineering can’t optimize, and leadership flies blind. CostLynx closes that loop.

01 / Visibility
87%

of engineering orgs running LLMs in production cannot attribute spend to a single feature, model, or team within the current billing cycle.

Internal benchmark · n=312 platform leads · Q1 2026
02 / Drift
3.4×

median AI bill overshoot vs forecast in the quarter after launching a new agent or retrieval pipeline. Without anomaly detection, drift is a monthly invoice surprise.

CostLynx customer cohort · trailing 180 days
03 / Opportunity
31%

average modeled savings available from capability-aware model optimization and pricing-provenance corrections — left on the table when teams lack a control plane.

Savings engine simulations · 9-month window
02 · The intelligence layer

One operational layer
between your apps and
your AI providers.

CostLynx sits beside your inference path — not in it — and normalizes every event into a single usage schema. From that schema we run attribution, anomaly detection, budget enforcement, and capability-aware savings recommendations. Your prompts and responses never leave your stack.
Ingest
Capture every inference event.
API-first ingestion or SDK instrumentation. Idempotent dedupe. Provider sync for OpenAI billing.
POST /v1/events 14.2k / s
sdk · python, ts, go 3 langs
openai · billing sync hourly
Normalize
One schema across every provider.
Tokens, costs, latencies, model versions, attribution tags — reconciled into a single columnar store you can query.
schema · usage.v3 42 fields
pricing provenance 4 tiers
org / project / env hierarchy
Operate
Govern, attribute, save.
Dashboards, budgets, anomaly rules, attribution, and capability-aware savings recommendations — all reviewed before any production change.
attribution & showback live
budgets · slack webhook sla 5m
savings engine v3
03 · Capabilities

What teams use day to day.

A control plane your engineers will actually open every morning — and your CFO will reference at the next board review.

Spend timeline
Every token, every model, every workload.
A single, normalized timeline of AI spend across every provider you ship on. Slice by team, environment, feature, model, or prompt template.
Anomaly engine
Catch spend drift before the invoice arrives.
Z-score detection against your own historical baseline. Alerts are delivered to Slack via webhook.
Attribution
Attribution that holds up in board review.
Org → project → env hierarchy with consistent tagging fields.
Savings engine
Capability-aware recommendations.
Same-provider and cross-provider alternatives, with pricing provenance you can audit. Never auto-applied.
Budgets
Burn-down by team, project, or env.
Hard caps, soft thresholds, and forecasted exhaustion dates — all delivered to the right owner.
Multi-provider
OpenAI · Anthropic · Google · Azure · Bedrock · Mistral.
Apples-to-apples comparison on the same operational dataset.
Audit log
Every threshold change, every override.
SOC 2-aligned event log for procurement and security.
Exports
API-first data access.
Query spend, usage, and attribution data programmatically via the v1 API.
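
The z-score detection named under the anomaly engine above can be sketched roughly as follows. This is an illustration only, not CostLynx's implementation: the function name `zscore_anomaly`, the threshold of 3.0, and the sample spend figures are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def zscore_anomaly(history, current, threshold=3.0):
    """Compare `current` spend against a historical baseline.

    Returns (is_anomaly, z). The threshold of 3.0 is an assumed
    default for this sketch, not a documented CostLynx setting.
    """
    if len(history) < 2:
        # Not enough data to estimate a spread.
        return False, 0.0
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Flat baseline: any deviation at all is infinitely unusual.
        z = 0.0 if current == mu else math.copysign(math.inf, current - mu)
    else:
        z = (current - mu) / sigma
    return abs(z) >= threshold, z


# Hypothetical daily spend (USD) for one workload over the last week.
baseline = [100, 102, 98, 101, 99, 100, 100]
flagged, z = zscore_anomaly(baseline, 140)
print(flagged, round(z, 2))
```

Because the baseline is the workload's own history, a spend level that is normal for one team can still be flagged for another.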
04 · Savings engine

Cut spend without
compromising performance.

CostLynx models capability-aware alternatives against your real traffic — then shows you what you would have spent, what you would have saved, and where evaluation says quality holds. Nothing changes in production unless you ship it.

01
Replay against your actual traffic.
No synthetic prompts. We simulate alternative allocations on a representative slice of your last 30 days of usage.
02
Capability-aware, not blindly cheap.
Recommendations are gated by capability checks: tool-use, structured output, context length, latency p95.
03
Pricing provenance, every time.
Estimates resolve from organization override → billing import → public list → unavailable, in that order.
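
The resolution order in step 03 can be sketched as a simple fallback chain. The tier identifiers, the `resolve_price` function, the `PriceQuote` type, and the prices below are hypothetical, chosen only to illustrate the precedence described above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed tier identifiers, listed in the precedence order the page describes.
TIERS = ("org_override", "billing_import", "public_list")


@dataclass(frozen=True)
class PriceQuote:
    usd_per_1k_tokens: Optional[float]
    provenance: str  # one of TIERS, or "unavailable"


def resolve_price(model, sources):
    """Return the first price found for `model`, walking tiers in order."""
    for tier in TIERS:
        price = sources.get(tier, {}).get(model)
        if price is not None:
            return PriceQuote(price, tier)
    # No tier knows this model: report that explicitly rather than guess.
    return PriceQuote(None, "unavailable")


# Illustrative prices only; not real list prices.
sources = {
    "org_override": {"model-a": 0.10},
    "public_list": {"model-a": 0.12, "model-b": 0.02},
}
for name in ("model-a", "model-b", "model-c"):
    print(name, resolve_price(name, sources))
```

Returning the provenance alongside the price is what makes an estimate auditable: a reader can see whether a number came from a negotiated rate or a public list.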
summarize-doc · production · last 30d
simulated · n=82,440 events
Current model mix
$28,140 / mo
gpt-4o 100% · $0.118 / 1k
median latency 812 ms
quality (judge) 0.92
Recommended
$17,420 / mo
gpt-4o 38% · long-context
claude-haiku-3.5 62% · short docs
quality delta −0.4% · within band
Estimated saving
−38.1%
05 · Multi-provider observability

Every provider, one source of truth.

No more reconciling six dashboards into a spreadsheet at month-end. Normalized cost, usage, latency, and pricing — across every model you ship on.

Provider            | Ingestion             | Pricing source         | Models tracked | P95 latency
OpenAI              | API + billing sync    | org override → billing | 38             | 612 ms
Anthropic           | API ingestion         | org override → public  | 12             | 704 ms
Google · Vertex     | API ingestion         | org override → public  | 22             | 588 ms
Azure OpenAI        | API + deployment sync | deployment rates       | 16             | 642 ms
AWS Bedrock         | API ingestion         | CUR reconcile · daily  | 19             | 724 ms
Mistral · self-host | SDK metering          | unit-cost formula      | 7              | 428 ms
06 · Observability map

Watch your AI spend flow
across every provider — in real time.

Every inference event in your platform — from the application that fired it to the provider that served it — is captured by CostLynx's observability mesh and normalized into spend, attribution, and anomaly signals. What you see below is the same data your on-call team watches at 3am.
Application layer
app · checkout-agent 12.4k/s
app · summarize-doc anomaly
app · search-rag 8.1k/s
app · ticket-router 3.6k/s
app · onboard-llm 920/s
CostLynx layer
OBSERVE · cost analytics
spend · attribution · forecasts
DETECT · anomaly engine
z-score · slack · webhook
GOVERN · budget controls
hard caps · burn-down alerts
Providers
OpenAI · gpt-4o 46%
Anthropic · claude-3.5 29%
Google · gemini-1.5 15%
Azure · gpt-4o-mini 7%
Mistral · self-host 3%
Live events / sec
25,084
▲ 1.8% · last 1h
Tracked workloads
38 / 42
4 near budget threshold
p95 ingest latency
38 ms
▼ 6% · 7d
Saved · trailing 30d
$214.8K
across 14 active models
07 · Built for both sides of the table

Engineering ships. Finance forecasts.
One operational source.

For engineering & platform
Cost as a first-class signal next to latency and error rate.

Stop instrumenting bespoke spend metrics into Datadog. Stop building monthly “why did our OpenAI bill triple” postmortems. CostLynx gives you the same SRE-grade workflow for spend.

  • Token-level attribution by feature
  • Per-workload p95 + $ / 1k tokens
  • Anomaly rules via Slack webhook
  • Capability-aware savings simulations
  • Idempotent SDK ingestion
  • API-first — no UI lock-in
For finance & FinOps
Track AI spend with the same rigor as cloud.

Move AI off the “miscellaneous SaaS” line and into an attributable, auditable category. Showback and attribution, board-ready cost-per-feature.

  • Cost attribution by org / project / env
  • Burn-down vs monthly budget
  • Monthly burn-rate forecast
  • Savings opportunity tracking
  • API export
  • Procurement-ready audit log
08 · Governance & trust

Procurement-ready on day one.

Designed alongside the platform, security, and finance teams that have to sign off on you, not bolted on at Series B.

Security posture
Metadata-only by design.
Tokens, costs, model versions, attribution tags. No prompts or responses cross the boundary unless you opt in.
TLS 1.3 | AES-256 at rest | EU residency
Identity & access
SSO, SAML, MFA, RBAC.
Granular roles for engineering, finance, and security. Audit log of every threshold change, override, and key rotation.
SAML 2.0 | RBAC
Compliance
SOC 2 Type II in progress.
Sub-processor list, DPA, vendor security questionnaire, and pen-test summary on request.
SOC 2 II · IP | GDPR
Operations
99.95% target uptime.
Hosted on Vercel's global edge network with a status page updated within 5 minutes of any incident.
SLA · Enterprise | Audit log
09 · Pricing

Priced for the team you have today —
built for the one you’ll have next year.

Per-workspace pricing. Usage scales with events ingested, not seats. No charge for read-only stakeholders. Growth includes a 14-day free trial.

Starter
$79 / month
For teams getting their first LLM workload into production.
  • 3 projects · 2 environments
  • 500k events / month included
  • Overview & usage dashboards
  • Slack webhook anomaly alerts
  • Community support · 1 business day
Enterprise
Custom
For platform organizations with custom governance, SLAs, and data-residency needs.
  • SSO (SAML), MFA, audit log
  • Regional data residency · EU / US
  • Prompt-sampling controls & strict mode
  • Dedicated success manager & SLA
  • Custom data export & ingestion lanes
  • Security review · DPA on request
10 · FAQ

Questions we get from
platform and finance teams.

Does CostLynx sit in my inference path?
No. CostLynx is a metering and analytics layer, not a proxy. Your application calls providers directly; CostLynx ingests usage events asynchronously. Latency to your end users is unchanged.
Do you store our prompts or responses?
By default, no. We ingest metadata only: tokens, costs, model versions, attribution tags, and timing. Optional prompt sampling is available on the Enterprise plan, with strict-mode redaction and per-project opt-in.
How accurate are the savings estimates?
Savings are modeled against a representative slice of your last 30 days of traffic and gated by capability checks. We surface a confidence band and the pricing provenance for every estimate. Nothing ships to production without your review.
How does ingestion scale?
The ingestion API is idempotent and dedup-keyed. We process at sustained rates above 50k events/sec per workspace, with burst tolerance on top. SDKs handle local batching and back-pressure automatically.
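
To make "idempotent and dedup-keyed" concrete, here is a minimal sketch of the idea. It is not the CostLynx SDK or wire format: the event fields, the `dedup_key` helper, and the in-memory `EventStore` are hypothetical stand-ins chosen for illustration.

```python
import hashlib
import json


def dedup_key(event):
    """Derive a deterministic key from the fields that identify one call.

    The choice of identifying fields is an assumption for this sketch.
    """
    identity = {k: event[k] for k in ("request_id", "provider", "model")}
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EventStore:
    """In-memory stand-in for an idempotent ingestion endpoint."""

    def __init__(self):
        self._events = {}

    def ingest(self, event):
        """Store the event once; return False if it was already seen."""
        key = dedup_key(event)
        if key in self._events:
            return False
        self._events[key] = event
        return True

    def __len__(self):
        return len(self._events)


# Metadata only: no prompt or response text appears in the event.
event = {
    "request_id": "req-001",
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 1200,
    "output_tokens": 300,
    "project": "summarize-doc",
    "env": "production",
}
store = EventStore()
print(store.ingest(event), store.ingest(dict(event)), len(store))
```

With a deterministic key, a client that retries after a timeout can resend the same batch without double-counting spend.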
What does "capability-aware" actually mean?
A savings recommendation is only surfaced when the candidate model can plausibly serve the workload — tool-use, structured output, context window, and observed latency p95 are all checked against the current workload's requirements.
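
As a rough sketch of that gate, assuming hypothetical `WorkloadNeeds` and `ModelProfile` records (the field names, model names, and numbers below are illustrative, not CostLynx's schema):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadNeeds:
    tool_use: bool
    structured_output: bool
    max_context_tokens: int   # largest context the workload actually sends
    latency_p95_ms: float     # latency budget the workload must stay within


@dataclass(frozen=True)
class ModelProfile:
    name: str
    tool_use: bool
    structured_output: bool
    context_window: int
    observed_latency_p95_ms: float


def failed_checks(needs, model):
    """List the capability checks the candidate model does not pass."""
    failures = []
    if needs.tool_use and not model.tool_use:
        failures.append("tool_use")
    if needs.structured_output and not model.structured_output:
        failures.append("structured_output")
    if model.context_window < needs.max_context_tokens:
        failures.append("context_window")
    if model.observed_latency_p95_ms > needs.latency_p95_ms:
        failures.append("latency_p95")
    return failures


needs = WorkloadNeeds(tool_use=False, structured_output=True,
                      max_context_tokens=16_000, latency_p95_ms=900)
small = ModelProfile("model-small", True, True, 8_000, 400)
large = ModelProfile("model-large", True, True, 128_000, 700)

for candidate in (small, large):
    print(candidate.name, failed_checks(needs, candidate) or "eligible")
```

Returning the list of failed checks, rather than a bare yes or no, lets a reviewer see why a cheaper model was ruled out.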
Can we self-host?
Self-hosted CostLynx is available on the Enterprise plan via private deployment in your cloud (AWS, GCP, Azure). The control plane and data plane both run in your VPC; we provide upgrade tooling and a managed support tier.

See your AI spend
by tomorrow morning. Literally.

A working dashboard in under an hour. No rip-and-replace. No proxy. No prompts stored.