A neutral, build-grade walkthrough of image generation on Amazon Bedrock: the three model families (Amazon Nova Canvas, Titan Image Generator, Stability AI Stable Image) compared on quality, control and price; the prompting techniques that actually move output (structure, style, negative prompts, seeds, classifier-free guidance); the editing toolkit (inpainting, outpainting, variations, background removal); the production patterns for batch generation and S3 + CloudFront delivery; real per-image economics; and the watermarking, safety and licensing facts you need before a single image ships.
You can run image models a dozen ways — a hosted API from a model lab, a GPU instance running an open checkpoint, a third-party SaaS. On AWS the centre of gravity is Amazon Bedrock, because it turns "which model" into a runtime parameter instead of an infrastructure project.
Bedrock exposes image models through the same InvokeModel surface it uses for text: you pass a JSON body, you get bytes (base64-encoded PNGs) back. There are no GPUs to provision, no checkpoints to download, no container to keep warm. The image model is a serverless endpoint behind your AWS account's IAM, billed per image generated rather than per GPU-hour rented. For most teams that is the decisive difference — the alternative, standing up your own diffusion serving stack on EC2 or SageMaker, only pays off at very high, very steady volume.
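As a minimal sketch of that request/response shape, the snippet below builds a JSON body, calls InvokeModel, and decodes the base64 images that come back. It assumes the Nova Canvas text-to-image schema (`taskType`, `textToImageParams`, `imageGenerationConfig`) and a response carrying an `images` list; the helper names `build_text_to_image_body`, `decode_images` and `generate` are illustrative, not part of any SDK.

```python
import base64
import json


def build_text_to_image_body(prompt, width=1024, height=1024, seed=0):
    """Build a JSON request body (assumed Nova Canvas text-to-image schema)."""
    return json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
            "seed": seed,
        },
    })


def decode_images(response_body):
    """Turn the JSON response payload into a list of raw image bytes."""
    payload = json.loads(response_body)
    return [base64.b64decode(img) for img in payload.get("images", [])]


def generate(client, prompt, model_id="amazon.nova-canvas-v1:0"):
    """Call InvokeModel; `client` is expected to be a boto3 'bedrock-runtime' client."""
    response = client.invoke_model(modelId=model_id, body=build_text_to_image_body(prompt))
    return decode_images(response["body"].read())
```

With boto3 installed and model access granted, a call would look like `generate(boto3.client("bedrock-runtime"), "a matte-white ceramic coffee mug")`. Passing the client in, rather than creating it inside the function, keeps the helpers easy to test without AWS credentials.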
The trade you accept is the standard managed-service trade. You get the models AWS and its partners choose to host, at the versions they choose to expose, with the parameters they surface. You give up the ability to load an arbitrary community fine-tune or LoRA the day it drops on a model hub. For the overwhelming majority of commercial image work — product photography, marketing creative, thumbnails, synthetic data — the hosted models are more than sufficient, and the operational savings are large. Teams that genuinely need bespoke checkpoints or LoRA stacks self-host on SageMaker or raw EC2 with Inferentia/GPU; everyone else uses Bedrock.
The second reason AWS is a coherent home for image generation is everything downstream of the model. The images have to land somewhere (S3), be served fast and cheap to users (CloudFront), be moderated (Rekognition or Bedrock Guardrails), and sometimes be tracked for provenance (invisible watermarks). All of that is native AWS plumbing that sits naturally beside a Bedrock call. The model is one box in the diagram; the pipeline is the product.
A note on framing for the rest of this guide: "image generation" here means text-to-image and image-to-image inference for production use — generating, editing, and delivering pictures at scale — not training image models from scratch, which is a different (and far more expensive) discipline.
Bedrock's image catalogue is three families, each with a clear centre of gravity. Picking well is mostly about matching the family to the job rather than chasing a single "best" model.
The honest summary: Nova Canvas is the default for most new builds because it pairs strong prompt adherence with built-in editing and conditioning; Titan is the cheap, dependable workhorse with the broadest track record and a built-in invisible watermark; Stable Image is the one you reach for when you want photographic realism or a specific artistic register that the Amazon models render more conservatively.
What it is: Amazon Nova Canvas, Amazon's current flagship image model, launched alongside the rest of the Nova family in late 2024 and the default recommendation for new image work on Bedrock in 2026. Model ID: amazon.nova-canvas-v1:0.
Strengths: high prompt adherence (it follows multi-clause prompts and counts objects more reliably than older models), built-in editing operations (inpainting, outpainting, variation, background removal) exposed through the same API, and conditioning — you can guide generation with a reference image using canny-edge or segmentation conditioning to control layout while letting the model fill in style.
Controls: resolution up to 2048×2048, an explicit negativeText field, cfgScale, seed, and up to 5 images per request. Outputs carry an invisible watermark and pass through integrated content moderation.
Use it for: ecommerce product imagery, marketing creative with specific composition requirements, and any workflow that needs generate-then-edit in the same model rather than bolting an editor on afterwards.
What it is: Titan Image Generator, Amazon's earlier first-party image model, now at G1 v2 (amazon.titan-image-generator-v2:0, with v1 still available). It is the reliable, inexpensive option with the longest production track record on Bedrock.
Strengths: low and predictable per-image cost, solid general-purpose quality, and a feature set that covers text-to-image, image variation, inpainting, outpainting, and (in v2) image conditioning and background removal. Every Titan image ships with an invisible watermark by default — useful when provenance is a compliance requirement.
Controls: standard and premium quality tiers, resolutions up to 1408×1408, negativeText, cfgScale, seed, and up to 5 images per request. The two-tier quality model is the main cost lever: standard for drafts and high volume, premium when the output is the final asset.
Use it for: high-volume generation where unit cost dominates, internal tooling, synthetic-data generation, and any pipeline that wants a built-in watermark without extra work.
What it is: Stability AI's models on Bedrock — Stable Image Ultra (stability.stable-image-ultra-v1:0), Stable Image Core (stability.stable-image-core-v1:0), and Stable Diffusion 3.5 Large. Ultra is the top-quality tier, Core is the speed/price tier, SD3.5 sits in between with strong typography and prompt understanding.
Strengths: photographic realism, fine-grained stylistic range, and notably better in-image text rendering than older diffusion models. This is the family teams pick when "it has to look like a real photo" or "it has to nail a specific art direction" is the brief.
Controls: negative_prompt, seed, an aspect_ratio selector (1:1, 16:9, 3:2, and more) rather than raw pixel dimensions, and an output-format choice. Note the snake_case field names — the Stability request schema differs from the Amazon models' camelCase, which matters when you write the integration.
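To make the schema difference concrete, here is a hedged sketch of the two request shapes side by side. The field names follow the controls listed above (camelCase with pixel dimensions for the Amazon models, flat snake_case with `aspect_ratio` for Stability); the functions `amazon_body`, `stability_body` and `body_for` are hypothetical helpers, and routing on the `stability.` model ID prefix is an assumption made for illustration.

```python
import json


def amazon_body(prompt, negative="", seed=0, width=1024, height=1024):
    """Amazon-style body: nested camelCase fields and explicit pixel dimensions."""
    params = {"text": prompt}
    if negative:  # omit the field entirely rather than send an empty string
        params["negativeText"] = negative
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": params,
        "imageGenerationConfig": {"width": width, "height": height, "seed": seed},
    }


def stability_body(prompt, negative="", seed=0, aspect_ratio="1:1", output_format="png"):
    """Stability-style body: flat snake_case fields and an aspect ratio, not pixels."""
    body = {
        "prompt": prompt,
        "seed": seed,
        "aspect_ratio": aspect_ratio,
        "output_format": output_format,
    }
    if negative:
        body["negative_prompt"] = negative
    return body


def body_for(model_id, prompt, negative="", seed=0):
    """Pick the schema from the model ID prefix so callers stay model-agnostic."""
    if model_id.startswith("stability."):
        return json.dumps(stability_body(prompt, negative, seed))
    return json.dumps(amazon_body(prompt, negative, seed))
```

Keeping the two builders behind one `body_for` entry point is one way to preserve the "model as a runtime parameter" property while the schemas differ underneath.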
Use it for: hero marketing imagery, editorial and lifestyle visuals, stylised brand creative, and anywhere photographic fidelity or art direction is the whole point.
Quality is subjective and version-dependent; the durable differences are in control surface, watermarking, editing support and price. Those are the axes that should drive the choice.
| Family | Best at | Max resolution | Editing built in | Invisible watermark | Indicative cost / image |
|---|---|---|---|---|---|
| Nova Canvas | Prompt adherence + edit-in-place + conditioning | 2048×2048 | Inpaint, outpaint, variation, bg-removal | Yes (default) | $0.04–$0.08 |
| Titan G1 v2 | Cheap, dependable, high volume | 1408×1408 | Inpaint, outpaint, variation, bg-removal | Yes (default) | $0.008–$0.04 |
| Stable Image Ultra | Photographic realism, top quality | ~1MP (aspect-ratio based) | Via separate edit endpoints | No (by default) | $0.08–$0.12 |
| Stable Image Core | Fast, lower-cost Stability output | ~1MP (aspect-ratio based) | Via separate edit endpoints | No (by default) | $0.03–$0.04 |
| SD 3.5 Large | Typography + balanced quality/price | ~1MP (aspect-ratio based) | Via separate edit endpoints | No (by default) | $0.04–$0.07 |
There are no magic words. There is structure. A prompt is a specification, and the models reward specifications that name the right things in a sensible order, then subtract the unwanted with a negative prompt.
The reliable structure is subject → composition → lighting → medium/style → quality bar. "A ceramic coffee mug" is a subject. "A matte-white ceramic coffee mug, centred, three-quarter view, on a light oak table, soft diffused window light from the left, shallow depth of field, product photography, sharp focus, high detail" is a specification. The second prompt does not produce a better image because it is longer; it produces a better image because every clause removes a degree of freedom the model would otherwise resolve at random.
Order matters less than coverage, but front-loading the subject helps — the models weight earlier tokens slightly more. Keep clauses concrete and visual ("soft diffused window light") rather than abstract or emotional ("beautiful", "amazing"), which the models cannot render. Camera and lens language ("35mm", "macro", "wide angle") is genuinely effective for photographic output because it appears in the captioned training data. Medium words ("oil painting", "watercolour", "3D render", "vector illustration", "product photo") are the single biggest lever on overall look.
The negative prompt is the other half of the specification and the most under-used control. Rather than fighting an artefact by adding more positive words, name the artefact in the negative field: "blurry, low quality, distorted hands, extra fingers, watermark, text, jpeg artefacts, oversaturated". On the Amazon models this is the negativeText field; on Stability it is negative_prompt. A good negative prompt is reusable across a whole product line — build one, tune it once, apply it everywhere.
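One way to enforce that structure in code is to treat the prompt as a small record rather than a free-form string. The sketch below is illustrative only: `PromptSpec` and `BASE_NEGATIVE` are hypothetical names, and the clause order simply mirrors the subject → composition → lighting → medium/style → quality sequence described above.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    subject: str
    composition: str = ""
    lighting: str = ""
    medium: str = ""
    quality: str = ""

    def render(self):
        """Join the non-empty clauses in subject-first order."""
        parts = [self.subject, self.composition, self.lighting, self.medium, self.quality]
        return ", ".join(p.strip() for p in parts if p.strip())


# A reusable negative prompt for a whole product line (terms taken from the text above).
BASE_NEGATIVE = "blurry, low quality, distorted hands, extra fingers, watermark, text"

spec = PromptSpec(
    subject="a matte-white ceramic coffee mug",
    composition="centred, three-quarter view, on a light oak table",
    lighting="soft diffused window light from the left",
    medium="product photography",
    quality="sharp focus, high detail",
)
print(spec.render())
```

Because each clause is a named field, a catalogue job can vary only `subject` per SKU while composition, lighting and the negative prompt stay fixed.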
Every generation starts from a random noise field selected by a seed. Leave the seed unset and each call gives a different image; fix the seed and the same prompt + seed + parameters reproduces the same image. This is the foundation of an iterative workflow: when you get a near-miss you like, lock its seed, then change one word at a time and watch the effect in isolation. Without a fixed seed you are changing the prompt and the noise simultaneously and cannot tell which moved the result.
Seeds are also how you generate controlled variation for A/B creative: hold the prompt constant and sweep the seed across a range to get a grid of distinct-but-on-brief options. Store the winning seed alongside the prompt in your asset metadata so any image can be regenerated exactly later.
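A seed sweep is easy to express as data. The following sketch assumes the Amazon-style body with `seed` under `imageGenerationConfig`; `seed_sweep` and the `winner` record are hypothetical and only show the idea of holding everything constant except the seed, then saving the seed you liked.

```python
import copy


def seed_sweep(base_body, seeds):
    """Return one request body per seed; every other parameter is held constant."""
    bodies = []
    for seed in seeds:
        body = copy.deepcopy(base_body)  # never mutate the shared template
        body["imageGenerationConfig"]["seed"] = seed
        bodies.append(body)
    return bodies


base = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "a matte-white ceramic coffee mug, product photography"},
    "imageGenerationConfig": {"width": 1024, "height": 1024, "cfgScale": 7.0},
}
variants = seed_sweep(base, range(100, 104))

# Metadata to keep next to the chosen image so it can be regenerated later.
winner = {
    "prompt": base["textToImageParams"]["text"],
    "seed": variants[2]["imageGenerationConfig"]["seed"],
}
```

Each entry in `variants` can be sent as its own request, and the `winner` record is the minimum you would store alongside the asset.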
The cfgScale parameter controls how strongly the model is pushed toward the prompt versus allowed to wander. Low values (roughly 3–6) give more creative, more varied, sometimes more natural images that may drift from the prompt. High values (roughly 9 and above) hew tightly to the prompt but can look over-cooked, contrasty, or stiff at the extreme; check each model's accepted range before tuning, since the Amazon models cap cfgScale at 10. The default (roughly 6.5–8 on the Amazon models) is a sensible middle.
The practical rule: if the model is ignoring part of your prompt, raise cfgScale before you rewrite the prompt; if the output looks artificial or saturated, lower it. Treat it as a dial you tune per use case once, not per image. Photographic work usually wants it slightly lower; graphic, iconographic or text-heavy work usually wants it slightly higher.
1) Prompt structure (subject → composition → lighting → medium → quality) explains most of the result.
2) Negative prompt removes recurring artefacts and is reusable across a product line.
3) Seed makes any good result reproducible and lets you iterate one variable at a time.
4) cfgScale trades prompt-faithfulness against diversity.
Master these four and you rarely need anything else.
Generation is only half of production image work; the other half is editing what you generated (or what a photographer shot). The Amazon models expose these as first-class operations; Stability offers them through dedicated edit endpoints.
Treat the editing operations as a small, composable vocabulary. Most real workflows chain two or three of them — generate a base, remove the background, then outpaint a new scene around the subject — rather than relying on a single perfect generation. Building the pipeline around edits is what separates a demo from a system.
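The chaining idea above can be sketched as a fold over request builders. The bodies assume the Amazon-style edit schema (`BACKGROUND_REMOVAL`, `INPAINTING` and `OUTPAINTING` task types with a `maskPrompt` text mask); `chain` and the `invoke(body) -> bytes` callable are hypothetical stand-ins for whatever wraps your actual model call.

```python
import base64


def b64(image_bytes):
    """Encode raw image bytes the way the request body expects them."""
    return base64.b64encode(image_bytes).decode("ascii")


def background_removal_body(image_bytes):
    """Cut the subject out of its background."""
    return {"taskType": "BACKGROUND_REMOVAL",
            "backgroundRemovalParams": {"image": b64(image_bytes)}}


def inpaint_body(image_bytes, mask_prompt, text):
    """Repaint only the region the text mask describes."""
    return {"taskType": "INPAINTING",
            "inPaintingParams": {"image": b64(image_bytes),
                                 "maskPrompt": mask_prompt, "text": text}}


def outpaint_body(image_bytes, keep, scene):
    """Keep the region named by `keep` and paint `scene` around it."""
    return {"taskType": "OUTPAINTING",
            "outPaintingParams": {"image": b64(image_bytes),
                                  "maskPrompt": keep, "text": scene}}


def chain(invoke, image_bytes, steps):
    """Feed the output of each edit into the next one."""
    for build_body in steps:
        image_bytes = invoke(build_body(image_bytes))
    return image_bytes
```

A two-step run might be `chain(invoke, base_image, [background_removal_body, lambda img: outpaint_body(img, "the mug", "a sunlit kitchen counter")])`. Whether a transparent cut-out is accepted as input to the following step depends on the model's input rules, so verify that before relying on a particular order.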
On the Amazon models, each edit is a taskType on the same model; masks can be a painted image or, on Nova, a text mask ("the mug") that the model resolves itself.
A single InvokeModel call is a toy. A production image system is a pipeline: request handling, batch generation, durable storage, fast delivery, and moderation. Here is the shape that works on AWS.
The canonical architecture is small and boring, which is the point. A request arrives at API Gateway and a Lambda function; the Lambda calls Bedrock InvokeModel for the chosen model; the returned base64 image is decoded and written to an S3 bucket with a content-addressable key; the object is served to end users through a CloudFront distribution sitting in front of that bucket. Metadata — prompt, negative prompt, seed, model ID, parameters — is written to DynamoDB so any image can be traced and regenerated. That is the entire backbone.
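The storage half of that backbone might look like the sketch below. It assumes a boto3 S3 client and a boto3 DynamoDB `Table` resource are passed in; the `generated/` prefix, the `image_key` attribute name and the helper names are hypothetical choices for illustration.

```python
import hashlib


def content_key(image_bytes, prefix="generated/", ext="png"):
    """Content-addressable key: identical bytes always map to the same object key."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{prefix}{digest}.{ext}"


def store_image(s3, table, bucket, image_bytes, params):
    """Write the image to S3 and its generation parameters to DynamoDB.

    `params` holds prompt, negative prompt, seed, model ID and similar settings.
    Note: the DynamoDB resource API rejects Python floats, so pass values such
    as cfgScale as Decimal or str.
    """
    key = content_key(image_bytes)
    s3.put_object(Bucket=bucket, Key=key, Body=image_bytes, ContentType="image/png")
    table.put_item(Item={"image_key": key, **params})
    return key
```

Hashing the bytes gives deduplication for free: regenerating an identical image overwrites the same object instead of creating a second copy.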
For anything beyond interactive single-image generation, decouple the request from the work with a queue. The user (or upstream job) drops a generation request onto SQS; a fleet of Lambda or Fargate workers pulls from the queue, calls Bedrock, and writes results to S3. This absorbs spikes, gives you natural retry and dead-letter handling for the occasional throttle or content-filter rejection, and lets you respect Bedrock's per-model rate limits with a simple concurrency cap on the consumer. Generating a thousand product images is a queue-drain, not a thousand synchronous calls.
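A queue consumer along those lines can be very small. This sketch assumes a boto3 SQS client and JSON message bodies; `drain_once` and the `handle` callback (which would call Bedrock and write to S3) are hypothetical names. Messages that fail are deliberately not deleted, so SQS redelivers them and eventually routes them to a dead-letter queue if one is configured.

```python
import json


def drain_once(sqs, queue_url, handle, max_messages=5):
    """Pull one batch from SQS, process each request, delete only the successes.

    `max_messages` doubles as a simple per-worker concurrency cap.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=max_messages, WaitTimeSeconds=10
    )
    done = 0
    for message in response.get("Messages", []):
        try:
            handle(json.loads(message["Body"]))
        except Exception:
            continue  # leave the message on the queue for retry or the DLQ
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```

A worker would call `drain_once` in a loop; the number of workers multiplied by `max_messages` is the ceiling on in-flight Bedrock requests.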
Delivery is where cost and latency are won or lost. S3 is durable but is object storage, not a CDN — serving images straight from S3 to a global audience is slow and pricey on egress. Put CloudFront in front: it caches images at edge locations close to users, collapses repeat requests, and makes S3-to-CloudFront data transfer free, so you pay the cheaper CDN egress rate instead of raw S3 egress. For user-generated or sensitive imagery, serve through signed CloudFront URLs or origin access control so the S3 bucket itself stays private. Set long cache TTLs (generated images are immutable — the same key never changes) and you serve most traffic from cache for pennies.
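Long TTLs are usually set at upload time through the object's Cache-Control header, which the CDN can honour subject to the distribution's own TTL settings. The sketch below only builds the arguments for an S3 `put_object` call; `immutable_put_args` is a hypothetical helper and the one-year `max-age` is an assumed value, not a recommendation from AWS.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def immutable_put_args(bucket, key, image_bytes, content_type="image/png"):
    """Arguments for s3.put_object that mark a generated image as cacheable long-term."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": image_bytes,
        "ContentType": content_type,
        # Safe only because keys are never reused for different content.
        "CacheControl": f"max-age={ONE_YEAR_SECONDS}, immutable",
    }
```

Usage would be `s3.put_object(**immutable_put_args("my-bucket", key, image_bytes))`, where `s3` is a boto3 S3 client and the bucket name is a placeholder.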
Three rules keep batch jobs cheap and reliable. First, generate at the lowest quality tier and resolution that passes review for drafts, and only re-run the winners at premium/high-res — quality tier is the single biggest cost lever on the Amazon models. Second, store every parameter set with its output so a re-run is deterministic and you never pay twice to recreate something you already made. Third, cap consumer concurrency to live within the model's requests-per-minute quota and let the queue hold the backlog; chasing higher throughput by hammering the endpoint just earns throttling exceptions.
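When throttling does happen, backing off is cheaper than retrying immediately. The sketch below is a generic exponential backoff with jitter; `with_backoff` and `is_bedrock_throttle` are hypothetical helpers, and the check assumes the error arrives as a botocore `ClientError` whose `response["Error"]["Code"]` is `ThrottlingException`.

```python
import random
import time


def is_bedrock_throttle(exc):
    """botocore's ClientError carries the service error code in `exc.response`."""
    code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
    return code == "ThrottlingException"


def with_backoff(call, is_throttle, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying throttles with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Anything that is not a throttle, or the final attempt, propagates.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrapping a model call would look like `with_backoff(lambda: generate(client, prompt), is_bedrock_throttle)`, where `generate` stands for your own invoke function. Content-filter rejections are not throttles, so they surface immediately instead of being retried.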
For large recurring jobs, request a service quota increase for the specific image model in the specific region rather than assuming default limits will hold at scale — image-model quotas are per-model and per-region, and the default is sized for development, not for a catalogue-wide regeneration.
Image generation is priced per image, not per token, which makes the unit economics unusually easy to reason about. The model call is usually the largest line item, but storage and delivery are not zero at scale.
The model cost is a function of three things: which model, which resolution, and (on the Amazon models) which quality tier. As a rough 2026 map: Titan standard at 512×512 is the cheapest meaningful option (around $0.008/image); Titan premium and Nova Canvas at standard resolutions sit in the $0.04–$0.06 band; Stable Image Core is roughly $0.03–$0.04; SD 3.5 around $0.04–$0.07; and Stable Image Ultra at the top is roughly $0.08–$0.12. Higher resolution and premium tiers push toward the top of each range. Always confirm against the live Bedrock pricing page, but the ordering — Titan cheapest, Ultra dearest — is stable.
The pipeline costs are small per image but real in aggregate. S3 storage is a few cents per GB-month; a generated PNG is typically around a megabyte or less, so a million stored images is at most about a terabyte, which is on the order of tens of dollars per month of storage. The cost that surprises teams is egress: serving images directly from S3 to users runs at S3 egress rates, whereas serving through CloudFront both caches (so most requests never hit S3) and bills at lower CDN egress rates with free S3-to-CloudFront transfer. At any real traffic level the CDN pays for itself many times over.
The honest end-to-end framing: for a typical marketing or ecommerce workload generating tens of thousands of images a month and serving them behind a CDN, the all-in AWS cost is dominated by the per-image model fee, with storage and delivery as a modest tail. The lever with the most leverage is generating drafts cheap and reserving premium/high-res for finals — that one discipline routinely halves the model bill.
Per-image model fee (the big one — minimise it by drafting at standard tier/low res and finalising only winners) + S3 storage (cents/GB-month; PNGs are small) + CloudFront delivery (cheaper than raw S3 egress and mostly served from cache). For most teams the model fee is 80%+ of the bill; the rest is rounding — until you forget the CDN and pay full S3 egress, which is the classic avoidable mistake.
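The three-part sum above is simple enough to put in a function. This is a rough estimator, not a pricing tool: the default storage rate, CDN rate, image size and views per image are all assumed placeholder values, the storage line covers only the images generated that month, and `monthly_cost` is a hypothetical helper. Swap in the current figures from the AWS pricing pages before using it for a budget.

```python
def monthly_cost(images, fee_per_image, mb_per_image=1.0, views_per_image=20,
                 s3_per_gb_month=0.023, cdn_per_gb=0.085):
    """Estimate one month's bill for newly generated images (all rates are assumptions)."""
    stored_gb = images * mb_per_image / 1024
    served_gb = stored_gb * views_per_image  # pessimistic: ignores browser caching
    costs = {
        "model": images * fee_per_image,
        "storage": stored_gb * s3_per_gb_month,
        "delivery": served_gb * cdn_per_gb,
    }
    costs["total"] = sum(costs.values())
    costs["model_share"] = costs["model"] / costs["total"] if costs["total"] else 0.0
    return costs


estimate = monthly_cost(images=50_000, fee_per_image=0.04, mb_per_image=0.5)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed inputs the model fee is by far the largest component, which matches the framing above; changing `fee_per_image` is the quickest way to see how much drafting at a cheaper tier moves the total.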
The model and the prompt get the attention; watermarking, content safety and usage rights are what actually gate a launch. These are not optional polish — they are the difference between a pilot and something a brand or a regulated business can put its name on.
Watermarking and provenance: images from Amazon Nova Canvas and Titan carry an invisible, machine-detectable watermark by default, and AWS provides a way to check whether an image was produced by these models. This is genuinely useful — it lets you (and downstream parties) verify provenance and is increasingly relevant as AI-content disclosure expectations harden. Stability's models on Bedrock do not embed an invisible watermark by default, so if provenance tracking matters for your use case you either choose an Amazon model or add your own provenance metadata (for example, C2PA-style content credentials) in your pipeline.
Content safety: every image-generation path needs a moderation layer, both inbound (block prompts that request disallowed content) and outbound (catch outputs that slipped through). The Amazon models apply built-in moderation and will refuse certain requests; do not rely on that alone. Add your own controls — Amazon Bedrock Guardrails for prompt-level policy, and Amazon Rekognition's moderation labels on generated images to flag unsafe content before it is stored or served. For user-generated workflows this is mandatory, not optional, and it is far cheaper to run a moderation pass than to clean up after a bad image ships.
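The outbound half of that moderation layer can be sketched with Rekognition's moderation-label call. The code assumes a boto3 Rekognition client is passed in; the function names, the 60% confidence threshold and the block-on-any-label default are illustrative assumptions that a real policy would tune.

```python
def moderation_labels(rekognition, image_bytes, min_confidence=60.0):
    """Return the names of moderation labels Rekognition assigns to an image."""
    response = rekognition.detect_moderation_labels(
        Image={"Bytes": image_bytes}, MinConfidence=min_confidence
    )
    return [label["Name"] for label in response.get("ModerationLabels", [])]


def safe_to_store(rekognition, image_bytes, blocked=None, min_confidence=60.0):
    """Gate an output before it is written to S3.

    With `blocked=None` any label rejects the image; otherwise only labels in
    the `blocked` set do.
    """
    labels = moderation_labels(rekognition, image_bytes, min_confidence)
    if blocked is None:
        return not labels
    return not any(name in blocked for name in labels)
```

Calling `safe_to_store(boto3.client("rekognition"), image_bytes)` between the Bedrock call and the S3 write keeps flagged outputs from ever being stored or served.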
Licensing and usage rights: the commercial-use terms come from the model provider, surfaced through Bedrock's model access flow, and they are not uniform across the three families — Amazon's models and Stability's models carry their own terms, and some require accepting an end-user licence agreement before access is granted. Before you build a business on generated imagery, read the specific terms for the specific model you will ship, confirm commercial use is permitted for your scenario, and keep records of which model produced which asset (that DynamoDB metadata again). "It came out of an AI" is not a licence; the model provider's terms are.
The abstractions land when you map them to real jobs. Two domains drive most commercial image-generation volume on AWS, and each uses a recognisable subset of the toolkit.
Ecommerce is the highest-volume use case and the one where the editing toolkit matters most. The pattern: shoot or generate a product, run background removal to a transparent cut-out, then outpaint or composite the subject into many lifestyle scenes and many aspect ratios without a new shoot. Image variation produces on-brand alternates for testing; inpainting swaps colourways or fixes flaws; conditioning keeps a consistent packshot layout across an entire catalogue. The economic story is stark — replacing a fraction of a studio shoot with generated and edited imagery removes a per-SKU cost that scales with catalogue size, and the per-image AWS cost is cents.
Marketing creative leans more on generation quality and reformatting. The pattern: generate hero imagery (often Stable Image Ultra or Nova Canvas for the look), then outpaint each approved hero into the full set of channel formats (square, story, banner, wide) so one creative idea ships everywhere consistently. Seeds give controlled A/B variants from a single concept; negative prompts enforce brand consistency (no text, no clutter, a specific palette) across a campaign. The win is speed and volume — a campaign's worth of on-brief, correctly-sized creative in an afternoon instead of a week.
Two more that recur: thumbnail and content imagery at scale (blogs, listings, courses) where Titan's low unit cost dominates and quality requirements are moderate; and synthetic data generation, where image-to-image and controlled seeds produce labelled training variations for downstream computer-vision models. Both are volume plays where the per-image economics and the batch pipeline — not the single-image quality ceiling — decide whether the project is viable.
Skip the "best model" debate; match the family to the job. This is the same logic to apply per workflow rather than per company — different jobs in the same product often want different models.
| If the job is… | Reach for | Why | Key control |
|---|---|---|---|
| Ecommerce cut-outs at volume | Titan v2 or Nova Canvas | Native background removal + low unit cost | Background removal + standard tier |
| Hero / lifestyle marketing imagery | Stable Image Ultra | Top photographic realism + art direction | Negative prompt + aspect_ratio |
| Composition-locked product shots | Nova Canvas | Conditioning keeps layout, restyles inside | Image conditioning (canny/segmentation) |
| High-volume thumbnails / content | Titan standard tier | Cheapest meaningful option | Standard tier + 512–768px |
| In-image text / typography | SD 3.5 Large | Best text rendering of the families | cfgScale slightly higher |
| Edit-in-place workflows | Nova Canvas | Inpaint/outpaint/variation in one model | taskType + text masks |
| Provenance / watermark required | Nova Canvas or Titan | Invisible watermark by default | (automatic) |
Situation: Per-SKU studio photography had become the bottleneck and the cost line that scaled worst with catalogue growth. The team wanted generated + edited product imagery — clean transparent cut-outs, lifestyle composites, and every creative in five aspect ratios — behind a fast CDN, with content moderation and provenance, but had no in-house Bedrock or image-pipeline experience and limited budget to fund a build.
What CloudRoute did: Routed within 20 hours to a UK AWS partner with Bedrock + media-pipeline track record. The partner scoped a Bedrock image pipeline (Nova Canvas for conditioning + editing, Titan standard tier for high-volume drafts), an SQS-fed batch worker on Fargate writing to S3, CloudFront delivery via signed URLs, Rekognition moderation on outputs, and DynamoDB metadata (prompt, seed, model ID) per asset. The work was structured against AWS Activate credits plus a Bedrock POC funding track so the customer paid $0.
Outcome: Pipeline in production within 7 weeks: background-removed cut-outs + multi-aspect lifestyle creative generated in batch at cents per image, served from cache behind CloudFront. Studio-shoot spend on covered SKU categories fell sharply; per-image AWS cost was credit-funded. CloudRoute's commission was paid by the partner from AWS's engagement funding — the customer paid $0.
engagement window: 7 weeks · founder time: ~12 hours · models used: Nova Canvas + Titan · cost to customer: $0