Why Your AWS Bedrock Bill Makes No Sense (And How to Fix It)

Pratik Kulkarni
FinOps, AWS
30 Mar, 2026
03 Mins read

When a startup says “our AWS bill is too high,” the conversation almost always starts at the aggregate level — total monthly spend, a few large services, maybe a spike someone noticed. That’s not where the problem lives.

For AI workloads, the cost problem lives at the unit level: what does it cost to serve one user? One feature call? One workflow completion?

Without that number, you’re optimizing in the dark.

Why Aggregate Bills Are Useless for AI Products

A traditional SaaS product running on EC2 or RDS has relatively predictable infrastructure costs. You provision servers, they run, you pay. Scaling cost is roughly linear.

AI products don’t work this way. Your cost structure changes:

Per-inference costs vary widely by model (the gap between Claude Haiku and Opus can be an order of magnitude or more, depending on the model versions)
Token consumption varies by user behavior, not just user count
Cached vs. uncached requests change the economics entirely
Batch inference can cut costs by roughly 50% for workloads that can tolerate asynchronous results
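To make these factors concrete, here is a minimal sketch of how they combine into the cost of a single call. The function name, the prices, and the default cache discount are illustrative assumptions, not published Bedrock rates; substitute the numbers from the current pricing page for the models you use.

```python
# Hypothetical prices in USD per 1,000 tokens for a small and a large model.
SMALL = {"input_price_per_1k": 0.001, "output_price_per_1k": 0.005}
LARGE = {"input_price_per_1k": 0.015, "output_price_per_1k": 0.075}


def request_cost(input_tokens, output_tokens, *, input_price_per_1k,
                 output_price_per_1k, cached_input_tokens=0,
                 cache_read_discount=0.9, batch_discount=0.0):
    """Estimate the cost in USD of one inference call.

    cached_input_tokens: part of input_tokens served from a prompt cache.
    cache_read_discount: assumed fraction taken off the input price for
        cached tokens (0.9 means cached tokens cost 10% of the normal rate).
    batch_discount: assumed fraction taken off the whole call when it runs
        as part of an asynchronous batch job.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    # Cached tokens are billed at a reduced rate, so count them as a fraction.
    billable_input = uncached_tokens + cached_input_tokens * (1 - cache_read_discount)

    input_cost = billable_input * input_price_per_1k / 1000
    output_cost = output_tokens * output_price_per_1k / 1000
    return (input_cost + output_cost) * (1 - batch_discount)
```

With these made-up prices, a call with 2,000 input tokens and 500 output tokens costs about $0.0045 on the small model (`request_cost(2000, 500, **SMALL)`) and about $0.0675 on the large one. Passing `cached_input_tokens=1500` or `batch_discount=0.5` to the large-model call shows how much caching and batching move that number.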

Here’s what makes this harder to track: model inference costs don’t all live in the same place in your AWS bill. If you’re using third-party models through Bedrock (Anthropic’s Claude, Meta’s Llama, Mistral), those inference charges flow through AWS Marketplace, not the Bedrock service line. In CUR, you’ll see them as Marketplace line items named after the model listing, not under the AmazonBedrock product code.

What does appear under the Bedrock service line: Knowledge Bases features (the backing vector store, such as OpenSearch Serverless, is typically billed under its own service), Agents orchestration, Guardrails evaluations, custom model training, and Provisioned Throughput reservations. AWS-native models like Titan also bill through Bedrock directly.

The practical consequence: a Cost Explorer view filtered to “Amazon Bedrock” will miss the bulk of your model spend, because third-party models appear as their own service line items, separate from the Bedrock line entirely. The filtered view tells you nothing about where to optimize, and it gives you false confidence that costs are low.

Building Unit Cost Visibility from AWS CUR

AWS Cost and Usage Reports (CUR) are the foundation. The key is knowing where to look and what to do with the data after you have it.

For third-party model inference (Claude, Llama, Mistral, etc.), filter CUR by:

bill/BillingEntity = AWS Marketplace
product/ProductName = [your model provider's listing name]

For Bedrock-native costs (Knowledge Bases, Agents, Guardrails, Provisioned Throughput):

lineItem/ProductCode = AmazonBedrock

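Both record types include lineItem/UsageType, pricing/unit, and resource tags. CUR itself does not carry per-request fields such as input_tokens, output_tokens, or model_id: token volume appears as lineItem/UsageAmount, with the usage type typically distinguishing input from output tokens for each model. Per-invocation detail has to come from elsewhere, such as Bedrock model invocation logging or the token usage returned in each API response, and you still have to query both CUR record types to get the complete cost picture.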
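A minimal sketch of combining the two record types, assuming the CUR rows have already been loaded as dictionaries (for example from a CSV export or an Athena query result). The column-name constants are assumptions about your export's schema, and the sample rows and listing names are made up for illustration; adjust both to match your own data.

```python
from collections import defaultdict

# Column names are assumptions about the CUR export; adjust to your schema.
BILLING_ENTITY = "bill/BillingEntity"
PRODUCT_CODE = "lineItem/ProductCode"
PRODUCT_NAME = "product/ProductName"
COST = "lineItem/UnblendedCost"


def bedrock_spend(rows, model_listing_keywords):
    """Sum Bedrock-related cost across Marketplace and Bedrock-native rows.

    rows: iterable of dicts, one per CUR line item.
    model_listing_keywords: substrings identifying your model providers'
        Marketplace listings, so unrelated Marketplace purchases are skipped.
    """
    by_listing = defaultdict(float)
    native_total = 0.0
    keywords = [k.lower() for k in model_listing_keywords]

    for row in rows:
        cost = float(row.get(COST) or 0)
        name = row.get(PRODUCT_NAME) or ""
        if row.get(BILLING_ENTITY) == "AWS Marketplace":
            # Third-party model inference: keep only matching listings.
            if any(k in name.lower() for k in keywords):
                by_listing[name] += cost
        elif row.get(PRODUCT_CODE) == "AmazonBedrock":
            # Bedrock-native charges (Guardrails, Provisioned Throughput, ...).
            native_total += cost

    return {"marketplace_models": dict(by_listing), "bedrock_native": native_total}


# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {BILLING_ENTITY: "AWS Marketplace", PRODUCT_CODE: "examplecode1",
     PRODUCT_NAME: "Claude 3.5 Sonnet (Amazon Bedrock Edition)", COST: "120.50"},
    {BILLING_ENTITY: "AWS", PRODUCT_CODE: "AmazonBedrock",
     PRODUCT_NAME: "Amazon Bedrock", COST: "14.25"},
    {BILLING_ENTITY: "AWS", PRODUCT_CODE: "AmazonEC2",
     PRODUCT_NAME: "Amazon Elastic Compute Cloud", COST: "300.00"},
    {BILLING_ENTITY: "AWS Marketplace", PRODUCT_CODE: "examplecode2",
     PRODUCT_NAME: "Some Unrelated SaaS Tool", COST: "50.00"},
]
```

Calling `bedrock_spend(SAMPLE_ROWS, ["claude", "llama", "mistral"])` returns the Marketplace model spend broken out by listing alongside a single Bedrock-native total. The keyword filter is a deliberately crude design choice; a list of exact listing names is safer once you know what appears in your bill.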

The data you need to extract per invocation:

model_invocations
  ├── model_id (which model was called)
  ├── input_tokens
  ├── output_tokens
  ├── timestamp
  └── [your tags: feature, user_segment, environment]

Tagging your Bedrock calls by feature and user segment is the unlock. Without tags, you have costs but no attribution. With tags, you can answer: “Feature X costs $0.04 per completion, Feature Y costs $0.18.”
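As a rough illustration of that attribution step, the sketch below averages cost per completion by tag. It assumes you already have per-invocation records shaped like the structure above, however the tags get attached (for example through tagged inference profiles or request metadata captured in invocation logs). The model IDs and prices are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical USD prices per 1,000 tokens, keyed by model_id.
PRICES = {
    "small-model": {"input": 0.001, "output": 0.005},
    "large-model": {"input": 0.015, "output": 0.075},
}


def cost_per_completion(invocations, tag="feature"):
    """Return the average cost per invocation, grouped by the given tag.

    Invocations without that tag are grouped under "untagged", which makes
    gaps in tagging coverage visible instead of silently dropping them.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)

    for inv in invocations:
        price = PRICES[inv["model_id"]]
        cost = (inv["input_tokens"] * price["input"]
                + inv["output_tokens"] * price["output"]) / 1000
        key = inv.get("tags", {}).get(tag, "untagged")
        totals[key] += cost
        counts[key] += 1

    return {key: totals[key] / counts[key] for key in totals}


# Made-up invocation records, for illustration only.
SAMPLE_INVOCATIONS = [
    {"model_id": "small-model", "input_tokens": 1000, "output_tokens": 200,
     "tags": {"feature": "summarize", "user_segment": "free"}},
    {"model_id": "small-model", "input_tokens": 3000, "output_tokens": 200,
     "tags": {"feature": "summarize", "user_segment": "pro"}},
    {"model_id": "large-model", "input_tokens": 2000, "output_tokens": 1000,
     "tags": {"feature": "draft_reply", "user_segment": "pro"}},
    {"model_id": "large-model", "input_tokens": 1000, "output_tokens": 100},
]
```

Running `cost_per_completion(SAMPLE_INVOCATIONS)` groups by feature; passing `tag="user_segment"` gives the same view by segment. A large "untagged" bucket is usually the first thing worth fixing.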

The Right-Sizing Question

Once you have unit costs, the obvious question is: does this feature need this model?

Most teams default to the most capable model for everything. That’s expensive and usually unnecessary. A summarization task that doesn’t require nuanced reasoning doesn’t need a frontier model — it needs a model that’s accurate enough and fast enough at a fraction of the cost.

The framework:

Identify what the feature actually needs to do
Test the cheapest model that meets the bar
Measure quality — not just cost — against the baseline
Ship the cheaper model if quality holds
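The selection step in this framework can be sketched as a simple rule: pick the cheapest candidate whose measured quality stays within a tolerance of the current model. The `Candidate` type, the quality scale, and the default tolerance are assumptions for illustration; the quality score should come from your own evaluation set for the feature in question.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    model_id: str
    cost_per_call: float  # measured unit cost in USD
    quality: float        # score on your own eval set, 0.0 to 1.0


def pick_model(candidates: Sequence[Candidate], baseline: Candidate,
               max_quality_drop: float = 0.02) -> Candidate:
    """Return the cheapest model that meets the quality bar.

    The bar is the baseline's quality minus max_quality_drop. The baseline
    is always eligible, so it is returned when no cheaper candidate passes.
    """
    bar = baseline.quality - max_quality_drop
    passing = [c for c in candidates if c.quality >= bar]
    passing.append(baseline)
    return min(passing, key=lambda c: c.cost_per_call)


# Made-up measurements, for illustration only.
BASELINE = Candidate("large-model", cost_per_call=0.18, quality=0.92)
CANDIDATES = [
    Candidate("small-model", cost_per_call=0.02, quality=0.85),
    Candidate("medium-model", cost_per_call=0.05, quality=0.91),
]
```

With these sample numbers, `pick_model(CANDIDATES, BASELINE)` selects the medium model: the small one is cheaper but falls below the bar. Setting `max_quality_drop=0.0` keeps the baseline, which is the right outcome when the feature cannot afford any regression.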

This alone typically reduces inference costs 30–50% on the features where it applies.

What Good Looks Like

A mature AI FinOps setup for a mid-size product gives you:

Cost per feature, updated daily
Token volume by user segment
Model spend breakdown with trend lines
Budget alerts before you’ve already overspent
A clear path from “we have a cost problem” to “this is the specific thing driving it”

This requires some instrumentation on top of AWS CUR data you already have, with tagging discipline applied to your Bedrock calls.
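For the budget-alert item, one simple approach is to alert on projected month-end spend rather than spend to date. The sketch below uses a straight-line projection from the average daily run rate, which is an assumption on my part; it ignores weekly seasonality and growth, and the function names and threshold are hypothetical.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Project month-end spend from the average daily run rate so far.

    spend_to_date should cover the 1st of the month through the end of
    `today`.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month


def should_alert(spend_to_date: float, monthly_budget: float, today: date,
                 threshold: float = 1.0) -> bool:
    """Alert when the projection exceeds threshold * monthly_budget."""
    projection = projected_month_end_spend(spend_to_date, today)
    return projection > monthly_budget * threshold
```

For example, $400 spent by 10 March projects to $1,240 for the month, so `should_alert(400, 1000, date(2026, 3, 10))` fires even though actual spend is still well under the $1,000 budget. Applying the same check per feature or per model, using the tagged unit costs, points the alert at the specific driver rather than at the total.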

Conclusion

Getting clarity on Bedrock costs is fundamentally a visibility problem. Once you’ve wired up tagging and built unit-level metrics on top of CUR, the right-sizing decisions become straightforward. You’re no longer guessing which model or feature is driving cost; the data tells you. The tools are already there in your AWS account. The work is connecting them.

If your Bedrock bill is a mystery, that’s solvable. Book a call and we’ll walk through what visibility would look like for your specific setup.