Why Your AWS Bedrock Bill Makes No Sense (And How to Fix It)

When a startup says “our AWS bill is too high,” the conversation almost always starts at the aggregate level — total monthly spend, a few large services, maybe a spike someone noticed. That’s not where the problem lives.

For AI workloads, the cost problem lives at the unit level: what does it cost to serve one user? One feature call? One workflow completion?

Without that number, you’re optimizing in the dark.

Why Aggregate Bills Are Useless for AI Products

A traditional SaaS product running on EC2 or RDS has relatively predictable infrastructure costs. You provision servers, they run, you pay. Scaling cost is roughly linear.

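AI products don’t work this way. Cost depends on what each request does, not just on what you have provisioned:

  • Per-inference costs vary widely by model (the price gap between Claude Haiku and Opus can be on the order of 20x, depending on the model versions compared)
  • Token consumption varies by user behavior, not just user count
  • Prompt caching changes the economics: input tokens served from cache are billed at a discount
  • Batch inference can cut costs by about 50% on supported models, with no product impact for workloads that don’t need real-time responses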
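
To make the per-request arithmetic concrete, here is a minimal sketch. The model names, prices, and token counts are illustrative placeholders, not current Bedrock rates; substitute the figures from the pricing page for the models you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per 1,000 tokens in USD. The values used below are hypothetical."""
    input_per_1k: float
    output_per_1k: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 discount: float = 0.0) -> float:
    """Cost of one request. `discount` is a fraction, e.g. 0.5 for a 50% batch discount."""
    base = (input_tokens / 1000) * price.input_per_1k \
        + (output_tokens / 1000) * price.output_per_1k
    return base * (1.0 - discount)


# Hypothetical prices: the large model costs 20x the small one per token.
SMALL = ModelPrice(input_per_1k=0.001, output_per_1k=0.005)
LARGE = ModelPrice(input_per_1k=0.020, output_per_1k=0.100)

# Same request shape (2,000 tokens in, 500 out), very different unit cost.
small_cost = request_cost(SMALL, input_tokens=2000, output_tokens=500)
large_cost = request_cost(LARGE, input_tokens=2000, output_tokens=500)
batched_large = request_cost(LARGE, 2000, 500, discount=0.5)

print(f"small:         ${small_cost:.4f}")
print(f"large:         ${large_cost:.4f}")
print(f"large batched: ${batched_large:.4f}")
```

The point of the sketch is that the same feature call can differ in cost by more than an order of magnitude depending on the model and the invocation mode, which an aggregate bill cannot show.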

Here’s what makes this harder to track: model inference costs don’t all live in the same place in your AWS bill. If you’re using third-party models through Bedrock (Anthropic’s Claude, Meta’s Llama, Mistral), those inference charges flow through AWS Marketplace, not the Bedrock service line. In CUR, you’ll see them as Marketplace line items, not under the AmazonBedrock product code.

What does appear under the Bedrock service line: Knowledge Base storage (OpenSearch Serverless), Agents orchestration, Guardrails evaluations, custom model training, and Provisioned Throughput reservations. AWS-native models like Titan also bill through Bedrock directly.

The practical consequence: a Cost Explorer view filtered to “Amazon Bedrock” will miss the bulk of your model spend, because third-party models appear as their own service line items, separate from the Bedrock line entirely. A Bedrock-only view tells you nothing about where to optimize, and it gives you false confidence that costs are low.

Building Unit Cost Visibility from AWS CUR

AWS Cost and Usage Reports (CUR) are the foundation. The key is knowing where to look and what to do with the data after you have it.

For third-party model inference (Claude, Llama, Mistral, etc.), filter CUR by:

bill/BillingEntity = AWS Marketplace
product/ProductName = [your model provider's listing name]

For Bedrock-native costs (Knowledge Bases, Agents, Guardrails, Provisioned Throughput):

product/ProductCode = AmazonBedrock

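Both record types include lineItem/UsageType, lineItem/UsageAmount, pricing/unit, and any cost allocation tags you have activated. Note that CUR line items are aggregated usage records, not per-request logs: token volume shows up as a usage amount, with the model and token direction (input or output) typically identifiable from the product name and usage type. You have to query across both record types to get the complete picture, and per-invocation detail such as input_tokens, output_tokens, and model_id has to come from request-level data, for example Bedrock model invocation logging or your own application telemetry.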
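
As a rough illustration of querying across both record types, the sketch below buckets CUR-style rows into Bedrock-native spend, Marketplace model spend, and everything else. The sample rows, the product code "abc123example", and the product names are made up. Identifying Marketplace rows by bill/BillingEntity plus a "Bedrock" substring in the product name is an assumption; adjust the predicate to match what your own export actually contains.

```python
from collections import defaultdict

# Hypothetical, simplified CUR rows. Column names follow the CUR
# "category/Column" convention; every value here is invented for illustration.
SAMPLE_ROWS = [
    {"bill/BillingEntity": "AWS Marketplace",
     "product/ProductCode": "abc123example",
     "product/ProductName": "Example Model (Amazon Bedrock Edition)",
     "lineItem/UnblendedCost": "412.50"},
    {"bill/BillingEntity": "AWS",
     "product/ProductCode": "AmazonBedrock",
     "product/ProductName": "Amazon Bedrock",
     "lineItem/UnblendedCost": "38.20"},
    {"bill/BillingEntity": "AWS",
     "product/ProductCode": "AmazonEC2",
     "product/ProductName": "Amazon Elastic Compute Cloud",
     "lineItem/UnblendedCost": "120.00"},
]


def classify(row):
    """Bucket a row as bedrock_native, marketplace_model, or other."""
    if row["product/ProductCode"] == "AmazonBedrock":
        return "bedrock_native"
    # Assumption: third-party Bedrock listings are billed by AWS Marketplace
    # and carry "Bedrock" somewhere in the product name.
    if (row["bill/BillingEntity"] == "AWS Marketplace"
            and "Bedrock" in row["product/ProductName"]):
        return "marketplace_model"
    return "other"


def spend_by_bucket(rows):
    """Sum unblended cost per bucket."""
    totals = defaultdict(float)
    for row in rows:
        totals[classify(row)] += float(row["lineItem/UnblendedCost"])
    return dict(totals)


totals = spend_by_bucket(SAMPLE_ROWS)
model_spend = totals.get("bedrock_native", 0.0) + totals.get("marketplace_model", 0.0)

print(totals)
print(f"Total Bedrock-related spend: ${model_spend:.2f}")
```

In practice the same grouping would run as a query over the full export rather than an in-memory list; the list here only keeps the example self-contained.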

The data you need to extract per invocation:

model_invocations
  ├── model_id (which model was called)
  ├── input_tokens
  ├── output_tokens
  ├── timestamp
  └── [your tags: feature, user_segment, environment]

Tagging your Bedrock calls by feature and user segment is the unlock. Without tags, you have costs but no attribution. With tags, you can answer: “Feature X costs $0.04 per completion, Feature Y costs $0.18.”
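
A minimal sketch of that attribution step, assuming you already have invocation records in the shape listed earlier. The model IDs, prices, feature names, and token counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict

# Hypothetical per-1K-token prices in USD, keyed by a made-up model_id.
PRICES = {
    "small-model": {"input": 0.001, "output": 0.005},
    "large-model": {"input": 0.020, "output": 0.100},
}

# Hypothetical invocation records carrying a "feature" tag.
INVOCATIONS = [
    {"model_id": "small-model", "input_tokens": 3000, "output_tokens": 400,
     "feature": "summarize"},
    {"model_id": "small-model", "input_tokens": 5000, "output_tokens": 600,
     "feature": "summarize"},
    {"model_id": "large-model", "input_tokens": 4000, "output_tokens": 1000,
     "feature": "draft_reply"},
]


def cost_per_completion(invocations, prices):
    """Return {feature: average USD cost per invocation}."""
    total = defaultdict(float)
    count = defaultdict(int)
    for inv in invocations:
        p = prices[inv["model_id"]]
        cost = (inv["input_tokens"] / 1000 * p["input"]
                + inv["output_tokens"] / 1000 * p["output"])
        total[inv["feature"]] += cost
        count[inv["feature"]] += 1
    return {feature: total[feature] / count[feature] for feature in total}


unit_costs = cost_per_completion(INVOCATIONS, PRICES)
for feature, cost in sorted(unit_costs.items()):
    print(f"{feature}: ${cost:.4f} per completion")
```

Swapping the "feature" key for a user-segment tag gives the same breakdown by segment without changing the function.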

The Right-Sizing Question

Once you have unit costs, the obvious question is: does this feature need this model?

Most teams default to the most capable model for everything. That’s expensive and usually unnecessary. A summarization task that doesn’t require nuanced reasoning doesn’t need a frontier model — it needs a model that’s accurate enough and fast enough at a fraction of the cost.

The framework:

  1. Identify what the feature actually needs to do
  2. Test the cheapest model that meets the bar
  3. Measure quality — not just cost — against the baseline
  4. Ship the cheaper model if quality holds

This alone typically reduces inference costs 30–50% on the features where it applies.
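
The selection step in that framework can be sketched as follows. The candidate names, costs, and quality scores are hypothetical; the quality score stands in for whatever evaluation you run against your own baseline, and the tolerance is an assumption you would set per feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_call: float  # USD, from your unit cost data
    quality: float        # score on your own evaluation set, 0.0 to 1.0


def pick_model(candidates, baseline_quality, tolerance=0.02):
    """Return the cheapest candidate whose quality is within `tolerance`
    of the baseline, or None if no candidate meets the bar."""
    bar = baseline_quality - tolerance
    passing = [c for c in candidates if c.quality >= bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_call)


# Hypothetical evaluation results for one feature.
CANDIDATES = [
    Candidate("small-model", cost_per_call=0.0045, quality=0.81),
    Candidate("mid-model", cost_per_call=0.0200, quality=0.90),
    Candidate("large-model", cost_per_call=0.0900, quality=0.91),
]

# Baseline is the quality of the model currently in production.
choice = pick_model(CANDIDATES, baseline_quality=0.91)
print(choice.name if choice else "no model meets the bar")
```

Returning None when nothing passes is deliberate: the framework only ships the cheaper model if quality holds, so the fallback is to keep the current one.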

What Good Looks Like

A mature AI FinOps setup for a mid-size product gives you:

  • Cost per feature, updated daily
  • Token volume by user segment
  • Model spend breakdown with trend lines
  • Budget alerts before you’ve already overspent
  • A clear path from “we have a cost problem” to “this is the specific thing driving it”

This requires some instrumentation on top of AWS CUR data you already have, with tagging discipline applied to your Bedrock calls.
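
One simple way to alert before the overspend happens is to project month-end spend from the run rate so far. The sketch below uses a plain linear projection, which is an assumption rather than a recommendation; the function names, budget figures, and dates are hypothetical, and spiky workloads would need something less naive.

```python
import calendar
from datetime import date


def projected_month_end(spend_to_date: float, today: date) -> float:
    """Linear run-rate projection: average daily spend so far multiplied by
    the number of days in the month. Assumes spend_to_date covers day 1
    through `today` inclusive."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def should_alert(spend_to_date: float, today: date, budget: float,
                 threshold: float = 1.0) -> bool:
    """True when projected spend exceeds `threshold` times the budget."""
    return projected_month_end(spend_to_date, today) > budget * threshold


# Hypothetical numbers: $1,500 spent by June 10 against a $4,000 budget.
today = date(2025, 6, 10)
projection = projected_month_end(1500.0, today)
alert = should_alert(1500.0, today, budget=4000.0)

print(f"Projected month-end spend: ${projection:.2f}")
print("ALERT" if alert else "on track")
```

Running the same check per feature, using the unit cost breakdown, points the alert at the specific driver instead of the total.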

Conclusion

Getting clarity on Bedrock costs is fundamentally a visibility problem. Once you’ve wired up CUR tagging and built unit-level metrics, the right-sizing decisions become straightforward: you’re no longer guessing at which model or feature is driving cost, because the data tells you. The tools are already there in your AWS account. The work is connecting them.


If your Bedrock bill is a mystery, that’s solvable. Book a call and we’ll walk through what visibility would look like for your specific setup.
