# VirtueLogic AI Capabilities Reference
# 136 annotated capability patterns
# https://www.virtuelogic.app/ai-engineering/capabilities
# Generated at build time
#
# See also:
# - /ai-engineering/architectures/llm.txt — 42 system architecture patterns
# - /ai-engineering/augmentation/llm.txt — 113 augmentation techniques
# - /ai-engineering/infrastructure/llm.txt — 65 infrastructure & ops patterns

## Marketing & Ad Copy Generation

ID: generation-text-marketing-copy
Category: Generation > Text Generation
Complexity: low | Phase: inference-time
Modalities: text

When to use: You need to produce brand-consistent promotional text at volume — product descriptions, ad variants, email sequences, or social posts — where tone consistency and A/B diversity matter more than deep originality.

When NOT to use: The output will be published without human review; regulatory or legal accuracy is required (pharma, financial services claims); or the task is long-form editorial content that requires journalistic voice and fact-checking.

Key tools: OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Jasper AI, Copy.ai, Writer.com

Cost: Low per-call cost; scales linearly with volume. Token costs dominate for bulk generation; fine-tuning on brand voice adds one-off train-time cost.

---

## Document & Report Drafting

ID: generation-text-document-drafting
Category: Generation > Text Generation
Complexity: medium | Phase: inference-time
Modalities: text

When to use: You need to convert structured inputs (bullet notes, transcripts, data tables, outlines) into polished prose documents — business reports, proposals, SOPs, contract clauses — where the primary value is prose structuring, not source synthesis.

When NOT to use: The task is condensing content from a long document into a shorter summary (route to Extraction > Summarization); legal accuracy is non-negotiable without expert review; or the document requires live data lookup the model cannot perform.
Key tools: Anthropic Claude 3.5 Sonnet, OpenAI GPT-4o, Notion AI, Microsoft Copilot for Word

Cost: Medium. Long-context inputs (transcripts, data dumps) incur higher input token costs. Iterative revision loops multiply cost — a single-pass outline-to-draft pattern is most efficient.

---

## Translation & Localisation

ID: generation-text-translation-localisation
Category: Generation > Text Generation
Complexity: low | Phase: inference-time
Modalities: text

When to use: You need high-throughput translation of prose content with cultural adaptation — multilingual publishing pipelines, product UI strings, marketing copy in 10+ locales — especially where human post-editing capacity is limited.

When NOT to use: Legal, medical, or certified document translation where regulatory accuracy is required; code translation between programming languages (code generation domain); or real-time conversational translation where latency is critical and specialized ASR+MT pipelines are more appropriate.

Key tools: DeepL API, OpenAI GPT-4o, Google Cloud Translation (Advanced), ModernMT, Anthropic Claude 3.5 Sonnet

Cost: Low to medium. Per-character or per-token pricing; dedicated MT APIs (DeepL) significantly cheaper than general LLMs for pure translation volume. LLMs add cost but gain quality on culturally nuanced and marketing content.

---

## Conversational Dialogue Generation

ID: generation-text-conversational-dialogue
Category: Generation > Text Generation
Complexity: low | Phase: inference-time
Modalities: text

When to use: You need to generate scripted or semi-scripted dialogue — chatbot response templates, FAQ answer banks, training simulation scripts, persona-consistent replies — where the exchange is bounded in scope and does not require autonomous multi-step reasoning.
When NOT to use: The use case requires live multi-turn task completion with tool use (route to Agents); the channel is voice (route to Audio Generation > TTS); or real-time customer routing logic is needed (route to Classification).

Key tools: Anthropic Claude 3.5 Sonnet, OpenAI GPT-4o, Rasa, Intercom Fin

Cost: Low for individual responses. Conversation history accumulation in multi-turn sessions multiplies input token cost — implement context window management (summarization, pruning) early.

---

## Code Completion & Scaffolding

ID: generation-code-completion-scaffolding
Category: Generation > Code Generation
Complexity: medium | Phase: inference-time
Modalities: text

When to use: You need to accelerate developer throughput via inline autocomplete, boilerplate generation from schema or spec, or infrastructure-as-code generation from architecture descriptions — especially for well-typed, convention-heavy codebases.

When NOT to use: The task requires multi-step iterative coding with test-run feedback loops (route to Agents > Coding Agent); you are generating SQL from natural language questions (NL-to-query); or security review of generated code is not possible before deployment.

Key tools: GitHub Copilot, Cursor, Amazon CodeWhisperer, Tabnine, Anthropic Claude 3.5 Sonnet

Cost: Low per-completion via IDE plugin subscription. API-based scaffolding (Terraform/IaC from spec) costs more per call but replaces hours of manual authoring. ROI is high for boilerplate-heavy stacks.

---

## Natural Language to Query Generation

ID: generation-code-nl-to-query
Category: Generation > Code Generation
Complexity: medium | Phase: inference-time
Modalities: text, tabular

When to use: You need a natural language interface to structured data stores — BI dashboards, internal data portals, or developer tooling — where non-technical users need to query databases or APIs without knowing SQL or GraphQL syntax.
When NOT to use: The query target is unstructured text (route to Search/Retrieval); the generated query will execute without a review/preview step against production data; or the schema changes frequently and cannot be injected into the prompt reliably.

Key tools: OpenAI GPT-4o with function calling, Anthropic Claude 3.5 Sonnet, LangChain SQL Agent, Vanna.AI, DuckDuckGo/DuckDB NL interface

Cost: Low to medium per query. Schema injection (full DDL or selected table descriptions) adds input tokens proportional to schema size. Iterative clarification loops multiply cost.

---

## Automated Test Generation

ID: generation-code-test-generation
Category: Generation > Code Generation
Complexity: medium | Phase: inference-time
Modalities: text

When to use: You need to accelerate test coverage creation for existing source code — generating unit tests from function signatures, integration test skeletons, or regression tests from bug reports — where the primary goal is coverage improvement, not test execution.

When NOT to use: Test execution and result analysis is needed (route to Agents > Coding Agent); you are generating non-test source code; or the codebase lacks sufficient inline documentation for the model to infer expected behavior.

Key tools: GitHub Copilot, Cursor, CodiumAI (Qodo), EvoSuite (Java), Pynguin (Python)

Cost: Low per file. Most value comes from generating test scaffolding; marginal cost per test case is minimal. Highest ROI on codebases with zero test coverage.

---

## Code Refactoring & Migration

ID: generation-code-refactoring-migration
Category: Generation > Code Generation
Complexity: high | Phase: inference-time
Modalities: text

When to use: You need single-pass, deterministic code transformation — language migration (Python 2→3, Java→Kotlin), framework upgrade codemods, API modernisation, or style normalisation — where the transformation rules are well-defined and the output can be diffed and reviewed.
When NOT to use: The refactoring requires multi-step test-run-fix cycles (route to Agents > Coding Agent); the transformation is purely semantic (detecting bugs without remediation); or human code review of the diff is not feasible at the volume being processed.

Key tools: Anthropic Claude 3.5 Sonnet, OpenAI GPT-4o, GitHub Copilot, Sourcery (Python), OpenRewrite (Java)

Cost: Medium to high. Long source files require large context windows; chunking strategies add complexity. OpenRewrite and Sourcery offer rule-based codemods that are cheaper for well-known patterns; LLMs add value for nuanced semantic refactoring.

---

## Product & Commercial Visual Generation

ID: generation-image-product-visual
Category: Generation > Image & Visual Generation
Complexity: medium | Phase: inference-time
Modalities: image

When to use: You need to produce e-commerce or advertising imagery at scale — product backgrounds, lifestyle composites, packaging mockups, catalogue variants — where photography costs or logistics make traditional shoots impractical for every SKU or locale variant.

When NOT to use: Brand identity design (logo, icon systems) is needed (route to Graphic Design); video is required; the asset must meet photorealistic quality bars for high-budget broadcast advertising without extensive post-production review.

Key tools: Midjourney, Adobe Firefly (Generative Fill), Stability AI Stable Diffusion XL, DALL-E 3 via OpenAI API, Bria AI

Cost: Low to medium per image. Bulk catalogue generation scales cost linearly. Fine-tuning on brand product style (LoRA/DreamBooth) adds one-off train-time cost but dramatically improves consistency.
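The cost note above (linear per-image scaling plus a one-off fine-tuning cost) reduces to simple arithmetic. A minimal sketch; all prices and volumes below are hypothetical, not vendor quotes:

```python
def catalogue_image_cost(skus, variants_per_sku, per_image_cost, finetune_cost=0.0):
    """Estimate total and per-image spend for a bulk catalogue run:
    linear per-image cost plus an optional one-off fine-tuning cost
    (e.g. a brand-style LoRA) amortised across the whole run."""
    images = skus * variants_per_sku
    total = images * per_image_cost + finetune_cost
    return {"images": images, "total": round(total, 2),
            "per_image": round(total / images, 4)}

# Illustrative numbers only: 2,000 SKUs x 4 locale variants at $0.04/image,
# with a $300 one-off LoRA fine-tune.
print(catalogue_image_cost(2000, 4, 0.04, finetune_cost=300.0))
```

At this illustrative volume the one-off fine-tune nearly doubles per-image cost; at ten times the volume it adds under 10%, which is the amortisation argument behind fine-tuning for consistency.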
---

## Graphic Design & Brand Asset Generation

ID: generation-image-graphic-design
Category: Generation > Image & Visual Generation
Complexity: medium | Phase: inference-time
Modalities: image

When to use: You need to generate brand concept assets — logo ideas, icon sets, UI illustrations, infographic elements, or social media templates — particularly in early-stage ideation where rapid visual iteration is more valuable than pixel-perfect output.

When NOT to use: Final production-ready brand identity work requiring vector format, precise color profiles, and legal trademark clearance; product photography generation; or animation assets (video generation domain).

Key tools: Midjourney, Adobe Firefly, DALL-E 3, Canva AI (Magic Media), Stable Diffusion XL

Cost: Low per image for concept iteration. Professional brand identity projects still require a human designer for vectorization, trademark search, and refinement — AI accelerates concepting, not final delivery.

---

## Image Editing & Inpainting

ID: generation-image-image-editing-inpainting
Category: Generation > Image & Visual Generation
Complexity: medium | Phase: inference-time
Modalities: image

When to use: You have an existing image that needs targeted modification — removing/replacing backgrounds, inpainting objects, extending the frame (outpainting), upscaling resolution, or applying style transfer — where generating from scratch would discard valuable source material.

When NOT to use: The task requires full image generation with no source image; video frame editing (route to Video Generation); or watermark removal, which may violate the copyright of the original image owner.

Key tools: Adobe Firefly (Generative Fill), Stable Diffusion with ControlNet, DALL-E 3 Inpainting, Runway ML, Topaz Photo AI

Cost: Low to medium. Adobe Firefly and similar tools are priced per generation credit. Topaz for upscaling is a one-time purchase. Self-hosted Stable Diffusion adds GPU cost but eliminates per-call fees for high volume.
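The self-hosted-vs-per-credit trade-off in the cost note reduces to a break-even volume. A minimal sketch, assuming a fixed monthly GPU cost and a flat per-generation fee (both figures hypothetical):

```python
def breakeven_images(gpu_monthly_cost, per_image_fee):
    """Monthly image volume above which a self-hosted diffusion model
    beats paying a per-generation fee (ignores engineering and storage
    costs, which favour the managed service at low volume)."""
    return gpu_monthly_cost / per_image_fee

# Illustrative: a ~$600/mo cloud GPU vs. a $0.08 hosted generation fee.
print(breakeven_images(600.0, 0.08))
```

Below the break-even volume, per-credit pricing wins; well above it, the fixed GPU cost amortises toward zero marginal cost per image.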
---

## Synthetic Visual Training Data Generation

ID: generation-image-synthetic-training-data-visual
Category: Generation > Image & Visual Generation
Complexity: high | Phase: train-time
Modalities: image

When to use: Your computer vision model needs more labelled training data than you can acquire through real-world collection — especially for rare events, dangerous scenarios, or domain-randomized environments — and you need auto-labelled synthetic images.

When NOT to use: The generated images are for human consumption or marketing (route to Product Visual or Graphic Design); you need text/tabular synthetic data (route to Synthetic Data > Tabular); or the distribution shift between synthetic and production data has not been evaluated.

Key tools: NVIDIA Omniverse / Isaac Sim, BlenderProc, Stability AI Stable Diffusion (domain randomization), Synthetic Data Vault (for non-visual), Unity Perception

Cost: High upfront. 3D pipeline setup (Omniverse, BlenderProc) requires significant engineering investment. Once the pipeline is established, marginal cost per image is low. Diffusion-based augmentation is cheaper but lower fidelity for precise label generation.

---

## Text-to-Speech & Voiceover Generation

ID: generation-audio-tts-voiceover
Category: Generation > Audio & Speech Generation
Complexity: low | Phase: inference-time
Modalities: audio, text

When to use: You need high-quality speech synthesis at scale — product UI narration, audiobook production, IVR prompts, e-learning voiceovers, or multilingual TTS — where consistency, naturalness, and production speed matter more than live human performance nuance.

When NOT to use: Real-time conversational speech is required (conversational agent domain); music or non-speech audio is needed; or the use case is transcription (route to Extraction > Media Parsing).

Key tools: ElevenLabs, OpenAI TTS API, Google Cloud Text-to-Speech, Azure Neural TTS, Resemble AI

Cost: Low per character/word.
ElevenLabs and OpenAI TTS are priced per 1K characters. Voice cloning adds one-off setup. Streaming TTS for real-time applications requires latency-optimized endpoints at higher per-call cost.

---

## Music & Soundtrack Generation

ID: generation-audio-music-generation
Category: Generation > Audio & Speech Generation
Complexity: low | Phase: inference-time
Modalities: audio

When to use: You need royalty-free background music, adaptive game soundtracks, or short-form jingles generated from mood/style prompts — particularly where licensing costs for stock music are high or the volume of unique tracks needed exceeds what libraries can provide.

When NOT to use: The output is intended as a standalone commercial musical work attributed to an artist; the use case is sound effects or Foley (route to Sound Effects); or music transcription is needed (route to Extraction).

Key tools: Suno AI, Udio, Meta MusicGen (open source), Google MusicLM, Soundraw

Cost: Low per track for consumer tools (Suno/Udio subscription). Open-source MusicGen requires GPU compute but eliminates licensing concerns. Adaptive game audio at runtime may need low-latency optimized serving.

---

## Sound Effects & Foley Generation

ID: generation-audio-sound-effects
Category: Generation > Audio & Speech Generation
Complexity: low | Phase: inference-time
Modalities: audio

When to use: You need to generate procedural sound effects, UI sounds, environmental ambience, or Foley assets from text descriptions — particularly when stock libraries lack the specific effect needed, or when unique per-instance sound variation is required for games.

When NOT to use: Music is needed (route to Music Generation); speech synthesis is required (route to TTS); or the task is audio enhancement or noise removal of existing recordings (not a generation task).

Key tools: ElevenLabs Sound Effects, Stability AI Stable Audio, AudioCraft (Meta), Adobe Podcast AI (enhancement only), Freesound + generative augmentation

Cost: Low.
Most tools are priced per generation or via subscription. Open-source Stable Audio and AudioCraft can run locally for high-volume game audio pipelines. Marginal cost per effect is minimal.

---

## Voice Conversion & Style Transfer

ID: generation-audio-voice-conversion
Category: Generation > Audio & Speech Generation
Complexity: high | Phase: inference-time
Modalities: audio

When to use: You have existing speech audio that needs speaker style transformation — accent modification, age/gender characteristic transfer, emotion injection, prosody editing, or dubbing voice replacement to match lip-sync — while preserving the spoken content.

When NOT to use: You need to generate speech from text (route to TTS); transcribe speech to text (route to Extraction > Media Parsing); or classify speaker identity (route to Classification > Audio).

Key tools: RVC (Retrieval-based Voice Conversion), Kits.AI, ElevenLabs Voice Conversion, Resemble AI, So-VITS-SVC

Cost: Medium. Real-time voice conversion requires low-latency GPU inference. Dubbing pipelines that combine ASR + translation + TTS + voice conversion have compounding costs. Self-hosted RVC is cheapest at scale for non-real-time use.

---

## Text-to-Video & Prompt-Driven Generation

ID: generation-video-text-to-video
Category: Generation > Video & Animation Generation
Complexity: high | Phase: inference-time
Modalities: video, text

When to use: You need to generate short video clips from text prompts — social media content, concept visualizations, storyboard animatics, or automated ad creative from product briefs — where production speed outweighs the need for precise creative control.

When NOT to use: Precise camera control, character consistency, or scenes longer than ~10 seconds are required; image-to-video animation from a specific source image is needed (route to Image-to-Video); or editing existing footage is the task (route to Video Editing).
Key tools: OpenAI Sora (via ChatGPT Plus/Pro, $20–$200/mo), Runway Gen-3 Alpha (from $12/mo; per-second credit billing), Kling AI (global.klingai.com; free tier + credit packs from ~$0.014/credit), Pika Labs (pika.art; free tier; paid from $8/mo), Luma Dream Machine (luma.ai; free tier + subscription plans), Stability AI Stable Video Diffusion (open-source; self-host or API)

Cost: High per clip relative to image generation. Runway Gen-3 starts at $12/mo with credit-based billing (~$0.05–$0.15 per second of video). Pika Labs has a free tier; paid plans from $8/mo. Kling AI offers credit packs (~$0.014–$0.056 per credit, billed per second of video). Sora is available in ChatGPT Plus ($20/mo) and Pro ($200/mo). Generation is GPU-intensive; self-hosting is impractical at current model sizes.

---

## Avatar & Talking Head Video Generation

ID: generation-video-avatar-talking-head
Category: Generation > Video & Animation Generation
Complexity: medium | Phase: inference-time
Modalities: video, audio, text

When to use: You need to generate presenter/spokesperson video at scale — e-learning instructor avatars, personalised video messages, digital twin spokespeople, or AI presenter content — where real-world video shoots are impractical per-recipient or per-topic.

When NOT to use: Full scene video generation is required (route to Text-to-Video); deepfake detection rather than generation is the task (route to Classification); or audio only is sufficient (route to TTS).

Key tools: HeyGen, Synthesia, D-ID, Runway ML, ElevenLabs + Wav2Lip

Cost: Medium. Enterprise platforms (Synthesia, HeyGen) are priced per video minute at a significant per-unit premium over TTS alone. Personalised video at scale (1:1 messages) requires API integration with per-video pricing.
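The 1:1 personalised-video pattern splits into two steps: expand a script template per recipient, then submit each script for rendering. A minimal sketch; the template text, field names, and `submit_stub` below are hypothetical stand-ins, since real platforms (HeyGen, Synthesia, D-ID) each expose their own submission APIs and per-video pricing:

```python
from string import Template

# Hypothetical script template; $name and $product are per-recipient fields.
SCRIPT = Template("Hi $name, thanks for trying $product! Here are three tips...")

def personalised_scripts(recipients):
    """Expand the script template once per recipient record."""
    return [SCRIPT.substitute(r) for r in recipients]

def submit_stub(script):
    """Placeholder for a per-video render submission; returns a fake job id.
    A real pipeline would call the avatar platform's API here."""
    return {"job": hash(script) % 10_000, "script": script}

rows = [{"name": "Priya", "product": "Atlas"}, {"name": "Ken", "product": "Atlas"}]
jobs = [submit_stub(s) for s in personalised_scripts(rows)]
print(jobs[0]["script"])  # → Hi Priya, thanks for trying Atlas! Here are three tips...
```

Because each render is billed per video, the templating step is effectively free while the submission step dominates cost, so deduplicating identical scripts before submission is worthwhile.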
---

## Image-to-Video & Animation Generation

ID: generation-video-image-to-video-animation
Category: Generation > Video & Animation Generation
Complexity: medium | Phase: inference-time
Modalities: video, image

When to use: You have a static image (product photo, illustration, design mockup, character artwork) that needs to be animated into a short video clip — product ads, animated social content, 2D character motion, or synthetic video training data generation.

When NOT to use: No source image is available and generation from text only is needed (route to Text-to-Video); 3D rendering pipelines from geometric models are required; or editing existing video footage is the task (route to Video Editing).

Key tools: Runway Gen-3 (Image-to-Video), Stability AI Stable Video Diffusion, Kling AI, Pika Labs, AnimateDiff (open source)

Cost: Medium. Per-generation pricing is similar to text-to-video but typically lower given the constrained motion from the source image. Open-source AnimateDiff requires a GPU but eliminates per-call cost for high-volume pipelines.

---

## AI Video Editing & Enhancement

ID: generation-video-video-editing-enhancement
Category: Generation > Video & Animation Generation
Complexity: medium | Phase: inference-time
Modalities: video, text, audio

When to use: You have existing video footage that needs AI-assisted post-production — automated editing from a transcript, scene cut detection, colour grading, upscaling, frame rate interpolation, subtitle generation, or highlight reel extraction from long recordings.

When NOT to use: Generating video from scratch (route to Text-to-Video or Image-to-Video); avatar/presenter generation (route to Talking Head); or audio-only post-production tasks.

Key tools: Descript, Adobe Premiere Pro (Sensei AI), Topaz Video AI, RunwayML (Inpainting/Cut), Opus Clip

Cost: Medium. Descript and Opus Clip are subscription-based. Topaz Video AI is a one-time purchase for upscaling.
Professional NLE integrations (Premiere Sensei) are included in Creative Cloud. GPU upscaling at high resolution can be expensive in cloud compute.

---

## Tabular Data Augmentation & Oversampling

ID: generation-synthetic-data-tabular-augmentation
Category: Generation > Synthetic Tabular & Structured Data Generation
Complexity: medium | Phase: train-time
Modalities: tabular

When to use: Your ML model training dataset has class imbalance or insufficient data volume, and you need to generate statistically realistic synthetic rows to oversample minority classes or augment under-represented segments — where the primary goal is model performance improvement.

When NOT to use: Privacy compliance is the primary driver (route to Privacy-Safe Synthetic Data); time series data is needed (route to Prediction domain); you need image or audio training data; or you are generating text data for LLM fine-tuning.

Key tools: CTGAN (SDV library), TVAE (SDV library), Synthpop (R), YData Synthetic, Gretel.ai

Cost: Low to medium. The open-source SDV library is free; a GPU accelerates training for large tables. Gretel.ai and YData are managed services with per-row or subscription pricing. Cost is front-loaded at training time; inference-time generation is cheap.

---

## Privacy-Safe Synthetic Data Generation

ID: generation-synthetic-data-privacy-anonymisation
Category: Generation > Synthetic Tabular & Structured Data Generation
Complexity: high | Phase: train-time
Modalities: tabular

When to use: You need to share, publish, or use sensitive data (PII, health records, financial transactions) in a context where the original cannot be used — regulatory compliance (GDPR, HIPAA), dev/test environment population, or external data sharing — and differential privacy guarantees or re-identification risk measurement are required.
When NOT to use: Data masking or tokenisation (not full generation) is sufficient for the use case; augmenting data for model performance without privacy constraints (route to Tabular Augmentation); or de-identification without replacement data generation.

Key tools: Gretel.ai (DP synthetic), Mostly AI, DataSynthesizer, Google DP Library, ARX Data Anonymization Tool

Cost: Medium to high. Differential privacy adds compute overhead and quality trade-offs. Managed platforms (Gretel, Mostly AI) simplify compliance documentation but at per-row or subscription cost. Legal review of the synthetic data output for regulatory compliance is an additional non-compute cost.

---

## LLM Training & Fine-Tuning Data Generation

ID: generation-synthetic-data-llm-training-data
Category: Generation > Synthetic Tabular & Structured Data Generation
Complexity: high | Phase: train-time
Modalities: text

When to use: You need to create instruction-following datasets, QA pairs, preference pairs, or synthetic dialogue corpora for fine-tuning or RLHF — particularly where real annotated data is scarce, expensive, or unavailable for the target domain.

When NOT to use: Tabular or structured data generation is needed (route to Tabular Augmentation); image/audio training data (route to Synthetic Visual Training Data); or data annotation of real examples (a classification/labelling task).

Key tools: Anthropic Claude (teacher model), OpenAI GPT-4o (teacher model), Stanford Alpaca pipeline, LLaMA Factory, Argilla (human-in-the-loop curation)

Cost: Medium. Teacher model API costs dominate — generating 50K instruction pairs at ~500 tokens each is non-trivial. Self-hosted teacher models reduce cost at the expense of quality. Filtering and curation pipelines (dedup, quality scoring) add compute cost.
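The dedup and filtering step named in the cost note can be sketched with the standard library alone. A minimal first pass, assuming pairs arrive as dicts with `instruction` and `response` keys (field names and length thresholds are illustrative); real pipelines add embedding-based near-dedup and quality scoring on top:

```python
import hashlib

def curate(pairs, min_len=8, max_len=2000):
    """First-pass curation for synthetic instruction data: length
    filtering plus exact dedup on a whitespace-normalised hash key."""
    seen, kept = set(), []
    for pair in pairs:
        text = (pair["instruction"] + "\n" + pair["response"]).strip()
        if not min_len <= len(text) <= max_len:
            continue  # drop degenerate or runaway generations
        key = hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()
        if key in seen:
            continue  # exact duplicate after normalisation
        seen.add(key)
        kept.append(pair)
    return kept

pairs = [
    {"instruction": "Summarise this ticket", "response": "User cannot log in."},
    {"instruction": "Summarise this ticket", "response": "User cannot  log in."},
    {"instruction": "Hi", "response": ""},
]
print(len(curate(pairs)))  # → 1 (one duplicate, one too short)
```

Running this before paying for more teacher-model calls is cheap insurance: duplicates and degenerate pairs waste fine-tuning compute and can skew the resulting model.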
---

## Simulation & Scenario Data Generation

ID: generation-synthetic-data-simulation-scenario
Category: Generation > Synthetic Tabular & Structured Data Generation
Complexity: high | Phase: train-time
Modalities: tabular, text

When to use: You need synthetic event sequences, sensor streams, or edge-case scenarios for system testing or model robustness training — fraud transaction sequences, adversarial attack scenarios, simulated IoT sensor faults, or rare operational events that cannot be collected from production.

When NOT to use: Privacy anonymisation is the primary concern (route to Privacy-Safe Synthetic Data); real-time simulation platform execution is the goal (not data generation); or physical simulation without AI generation is sufficient.

Key tools: Gretel.ai (time series + event), NVIDIA DALI (for vision pipelines), SimPy (discrete event simulation), Faker (Python), SDV Sequential synthesizers

Cost: Medium to high depending on scenario complexity. Simple event log generation with Faker or SimPy is near-zero cost. AI-driven adversarial scenario generation using LLMs incurs API costs proportional to scenario count. Complex simulation environments (NVIDIA, Unity) require significant engineering investment.

---

## Speech-to-Text Transcription

ID: extraction-media-parsing-speech-to-text
Category: Extraction & Structuring > Media Parsing & Transcription
Complexity: low | Phase: inference-time
Modalities: audio

When to use: When you have audio recordings (calls, meetings, lectures, podcasts) that need to be converted to searchable or processable text. Especially valuable when downstream NLP tasks (summarisation, entity extraction, search) require a text representation of spoken content.

When NOT to use: When audio quality is poor and accuracy requirements are high (accents, heavy background noise, overlapping speakers). When real-time latency <200 ms is required with strict accuracy guarantees. When the content is music or non-speech audio.
Key tools: Whisper (OpenAI), AssemblyAI, Google Cloud Speech-to-Text, Azure AI Speech, Amazon Transcribe

Cost: Low per-minute cost at scale (Azure/AWS ~$0.01–$0.02/min). Self-hosted Whisper has near-zero marginal cost but requires a GPU for real-time use.

---

## Document OCR & Layout Parsing

ID: extraction-media-parsing-document-ocr
Category: Extraction & Structuring > Media Parsing & Transcription
Complexity: low | Phase: inference-time
Modalities: image

When to use: When you have scanned PDFs, image-based documents, or photographed forms that need machine-readable text or structured data. Mandatory before any downstream NLP can operate on digitised paper documents.

When NOT to use: When the PDF is already digitally generated (text layer present) — use a plain PDF parser instead. When document quality is extremely low (crumpled, heavily handwritten cursive) and the cost of errors exceeds the cost of manual entry.

Key tools: Google Document AI, Azure AI Document Intelligence, AWS Textract, Tesseract (open-source)

Cost: API services ~$1–$10 per 1000 pages depending on feature set. Tesseract is free but lacks the table extraction quality of cloud services.

---

## Video Frame & Scene Content Parsing

ID: extraction-media-parsing-video-frame-analysis
Category: Extraction & Structuring > Media Parsing & Transcription
Complexity: medium | Phase: inference-time
Modalities: video

When to use: When you need to make video content searchable, index video by visual scene, extract on-screen text, or build content-based navigation (chapters, highlights). Common for media asset management, compliance monitoring, and e-learning content indexing.

When NOT to use: When audio content is the primary signal (route to speech-to-text). When video duration is very long (>2 h) and only a small portion is relevant — pre-filter with scene detection before sending to a vision model. When high temporal precision (<1 s) is required for action recognition.
Key tools: Google Video Intelligence API, AWS Rekognition Video, Azure Video Indexer, GPT-4o (video frames via API)

Cost: Cloud APIs typically $0.05–$0.15 per video minute. Costs multiply with frame sample rate — sampling every 1 s vs. every 5 s can raise cost 5× with marginal accuracy gain for many tasks.

---

## Image Content & Metadata Extraction

ID: extraction-media-parsing-image-content-extraction
Category: Extraction & Structuring > Media Parsing & Transcription
Complexity: low | Phase: inference-time
Modalities: image

When to use: When you need to inventorise the contents of image collections (e-commerce products, medical images, satellite imagery), extract text embedded in photos (signs, labels, screenshots), or digitise charts/graphs as a pre-processing step before data analysis.

When NOT to use: When the primary need is to classify images into categories (route to visual classification). When EXIF metadata alone suffices (no vision model needed). When chart digitisation needs semantic interpretation of data values — that semantic step routes to entity/attribute extraction.

Key tools: GPT-4o Vision, Google Cloud Vision API, AWS Rekognition, Azure AI Vision

Cost: Cloud vision APIs: ~$0.0015–$0.005 per image for standard label detection. GPT-4o vision costs more (~$0.005–$0.02/image) but handles complex reasoning over image content.

---

## Named Entity Recognition (NER)

ID: extraction-entity-named-entity-recognition
Category: Extraction & Structuring > Entity & Attribute Extraction
Complexity: low | Phase: inference-time
Modalities: text

When to use: When you need to tag, count, or filter documents by the presence of specific named entity types (people, organisations, locations, dates). Foundation step for many downstream tasks including relationship extraction, KG population, and search faceting.

When NOT to use: When entity resolution across records is needed (route to entity resolution).
When the task is to extract relationships between already-identified entities. When the domain is highly specialised and a general NER model will have poor recall — budget for fine-tuning or use a domain-specific model.

Key tools: spaCy (open-source), Hugging Face Transformers (BERT-NER), AWS Comprehend, Azure AI Language

Cost: spaCy/Hugging Face: near-zero at self-hosted scale. AWS Comprehend: ~$0.0001 per unit (100 chars). Cost is low but can accumulate on large corpora.

---

## Structured Field Extraction from Documents

ID: extraction-entity-document-field-extraction
Category: Extraction & Structuring > Entity & Attribute Extraction
Complexity: medium | Phase: inference-time
Modalities: text, image

When to use: When you have a defined target schema and need to pull specific fields from semi-structured or unstructured documents at scale (invoices, contracts, medical records, insurance claims). The schema is known in advance.

When NOT to use: When the document is already machine-readable structured data — use a standard parser. When the set of fields to extract is open-ended and not known upfront (route to attribute extraction). When the primary task is OCR/digitisation without semantic parsing.

Key tools: Azure AI Document Intelligence (custom models), Google Document AI (custom extractor), AWS Textract + Comprehend, LLM-based extraction (GPT-4o with JSON schema)

Cost: Custom model services: $0.01–$0.05 per page after training costs. LLM-based extraction (GPT-4o + JSON mode): $0.005–$0.05 per document. Training custom Document AI models requires 50–500 labelled examples.
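Schema-first LLM extraction works best when every model reply is validated before it enters the pipeline. A minimal sketch with the model call stubbed out; the `SCHEMA` fields are illustrative, and a production system would use a JSON-mode / structured-output API call plus a full JSON Schema validator rather than these hand-rolled type checks:

```python
import json

# Hypothetical target schema for invoice extraction.
SCHEMA = {"invoice_number": str, "total_amount": float, "currency": str}

def validate(raw_json, schema=SCHEMA):
    """Parse a model reply and enforce presence and type of every field."""
    data = json.loads(raw_json)
    for field, typ in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise ValueError(f"bad type for {field}: {type(data[field]).__name__}")
    return data

# In production this string would come from the model; here it is hard-coded.
reply = '{"invoice_number": "INV-1042", "total_amount": 118.5, "currency": "EUR"}'
print(validate(reply)["total_amount"])  # → 118.5
```

Rejecting malformed replies at this boundary keeps extraction errors out of downstream tables and gives a natural hook for retry-with-feedback loops.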
---

## Attribute & Property Extraction

ID: extraction-entity-attribute-property-extraction
Category: Extraction & Structuring > Entity & Attribute Extraction
Complexity: medium | Phase: inference-time
Modalities: text

When to use: When the set of attributes to extract is not predefined — you want the model to discover and enumerate properties from free text (product specs, job postings, clinical notes). Useful for schema discovery before formalising a field extraction pipeline.

When NOT to use: When you have a fixed schema and want specific fields — route to document field extraction. When the attributes are proper names/entities (people, orgs, dates) — route to NER. When the output needs to map to a standard controlled vocabulary — route to classification.

Key tools: GPT-4o (prompt engineering), Claude 3.5 Sonnet, spaCy with custom pipelines, Hugging Face information extraction models

Cost: Primarily LLM-based; cost scales with document volume and length. GPT-4o: ~$0.01–$0.05 per document for medium-length texts. Batch processing APIs reduce cost by ~50%.

---

## Entity Resolution & Deduplication

ID: extraction-entity-entity-resolution-dedup
Category: Extraction & Structuring > Entity & Attribute Extraction
Complexity: high | Phase: inference-time
Modalities: tabular

When to use: When you have multiple data sources with overlapping entity records and need to determine which records refer to the same real-world entity (person, company, product). Essential in MDM, CRM consolidation, and data warehouse integration.

When NOT to use: When records are already uniquely keyed and no cross-source matching is needed. When the task is to extract entity mentions from text (route to NER). When exact-match deduplication is sufficient — AI is overkill for deterministic matching.

Key tools: Splink (open-source probabilistic matching), AWS Entity Resolution, Dedupe.io, GPT-4o (pairwise comparison at small scale)

Cost: Splink: free, runs on Spark/DuckDB.
AWS Entity Resolution: $0.001 per record comparison. LLM pairwise comparison becomes cost-prohibitive above 100k records due to O(n²) comparison growth.

---

## Meeting & Call Summarisation
ID: extraction-summarization-meeting-call-summary
Category: Extraction & Structuring > Summarization & Condensation
Complexity: low | Phase: inference-time
Modalities: audio, text
When to use: Meeting transcripts, sales call recordings, or support interactions need to be condensed into action items, decisions, or CRM-ready summaries at scale. High ROI when the organisation has >50 meetings/week generating transcripts.
When NOT to use: A full verbatim transcript is required for compliance or legal discovery (route to speech-to-text). The meeting content needs deep semantic extraction of entities for a structured database (route to document field extraction after transcription). Audio quality makes transcription unreliable.
Key tools: Otter.ai, Fireflies.ai, Gong.io (sales calls), GPT-4o (custom pipeline on Whisper transcripts)
Cost: Dedicated tools: $10–$30/user/month. Custom GPT-4o pipeline: ~$0.02–$0.10 per meeting including transcription. Most cost comes from transcript tokens for long meetings (a 2 h meeting ≈ 30k tokens).

---

## Document Abstractive Summarisation
ID: extraction-summarization-document-abstractive
Category: Extraction & Structuring > Summarization & Condensation
Complexity: low | Phase: inference-time
Modalities: text
When to use: Stakeholders need a condensed, paraphrased overview of long documents (research papers, reports, legal briefs, email threads) without reading the full text. The model synthesises meaning rather than extracting verbatim passages.
When NOT to use: Verbatim source attribution is required (e.g. legal quotation). The document is the input to a specific question (route to RAG/semantic search).
Faithfulness to source is paramount and the domain is high-stakes (medical, legal) — abstractive summarisation can introduce subtle distortions.
Key tools: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro (long-context), LangChain MapReduce summarisation chain
Cost: Cost scales with input token count. Long documents (a 100-page report ≈ 75k tokens) with GPT-4o: ~$0.60–$1.50 per document. Chunking + MapReduce reduces cost but degrades cross-chunk synthesis.

---

## Key Point & Insight Extraction
ID: extraction-summarization-key-point-extraction
Category: Extraction & Structuring > Summarization & Condensation
Complexity: low | Phase: inference-time
Modalities: text
When to use: You need a scannable bullet-point distillation of a document, or want to surface the most important sentences verbatim (extractive) rather than paraphrase them. Good for analyst workflows where source traceability matters.
When NOT to use: The output needs to read as coherent prose (route to abstractive summarisation). Comprehensive argument mapping is needed (legal/academic debate analysis) — key point extraction is lossy by design. You need structured data output rather than prose bullets.
Key tools: GPT-4o (prompt-based), Claude 3.5 Sonnet, Hugging Face extractive summarisation models (BERT-Extractive)
Cost: Primarily LLM-based at inference cost. Cheaper than full abstractive summarisation since output is short. Extractive models (BERT-based) are near-zero cost at self-hosted scale.

---

## Automated Digest & Newsletter Generation
ID: extraction-summarization-digest-newsletter
Category: Extraction & Structuring > Summarization & Condensation
Complexity: medium | Phase: inference-time
Modalities: text
When to use: You need to automate regular (daily/weekly) content aggregation across multiple sources into a branded, readable digest. High value for market intelligence, research monitoring, and internal knowledge-sharing workflows.
When NOT to use: A single document needs summarising (route to document abstractive summarisation). Content curation requires significant editorial judgment that automated scoring cannot replicate. The audience has high expectations for original voice/writing quality.
Key tools: GPT-4o (generation), Feedly + API (content aggregation), RSS feed parsers, LangChain (pipeline orchestration)
Cost: Primarily engineering cost for pipeline setup. Ongoing LLM costs are low per digest (typically <$0.10/issue at moderate volume). Content aggregation services (Feedly Pro) add $50–$100/month.

---

## Relationship & Event Extraction
ID: extraction-knowledge-relationship-extraction
Category: Extraction & Structuring > Knowledge & Relationship Extraction
Complexity: medium | Phase: inference-time
Modalities: text
When to use: You need to extract structured facts about how entities relate (X works for Y, X acquired Y, X causes Y) from unstructured text, as input to knowledge graphs, competitive intelligence, or causal analysis pipelines.
When NOT to use: Entity identification alone is the goal (route to NER). The output needs to be stored in a graph database and the primary deliverable is the graph (route to knowledge graph population). Relations are already encoded in structured data.
Key tools: spaCy (dependency parsing + custom components), Hugging Face REBEL (relation extraction), GPT-4o (prompt-based triple extraction), Stanford OpenIE
Cost: Open-source models (REBEL, Stanford OpenIE): near-zero at self-hosted scale. LLM-based extraction: $0.01–$0.05 per document. The cost-quality tradeoff is real — open-source models have lower precision on complex relations.
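A minimal sketch of the prompt-based triple extraction route: ask the model for one fact per line in a delimited format, then parse the reply into tuples. The `subject | relation | object` line convention and the sample reply are assumptions, not a fixed standard:

```python
import re

# A hypothetical prompt asks the model for one fact per line, formatted as
# "subject | relation | object"; this parses that reply into triples.
TRIPLE_LINE = re.compile(r"^\s*(.+?)\s*\|\s*(.+?)\s*\|\s*(.+?)\s*$")

def parse_triples(llm_reply: str) -> list:
    """Return (subject, relation, object) tuples, skipping malformed lines."""
    triples = []
    for line in llm_reply.splitlines():
        m = TRIPLE_LINE.match(line)
        if m:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

# Stubbed model reply with two extracted facts.
facts = parse_triples(
    "Acme Corp | acquired | Widget Ltd\nJane Doe | works_for | Acme Corp"
)
```

Skipping malformed lines instead of raising keeps one bad model line from discarding the whole document's extractions.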
---

## Knowledge Graph Population
ID: extraction-knowledge-kg-construction
Category: Extraction & Structuring > Knowledge & Relationship Extraction
Complexity: high | Phase: design-time
Modalities: text
When to use: The primary deliverable is a populated, queryable knowledge graph — not just a list of extracted triples. Use when you need entity-relationship traversal, graph analytics, or to support downstream reasoning over a connected fact base.
When NOT to use: You only need a flat list of extracted relationships without graph storage (route to relationship extraction). The knowledge base is small enough that a structured database suffices. Maintaining ontology alignment is too costly relative to the benefit.
Key tools: Neo4j (graph database), Amazon Neptune, LlamaIndex (KG construction pipeline), Diffbot (automated web KG)
Cost: Neo4j Community: free. Neo4j Enterprise / Neptune: significant licensing or usage costs at scale. LlamaIndex-based pipeline: primarily LLM inference costs plus graph database hosting. Total cost of ownership is high due to ontology design and curation overhead.

---

## Claim & Fact Extraction
ID: extraction-knowledge-claim-fact-extraction
Category: Extraction & Structuring > Knowledge & Relationship Extraction
Complexity: medium | Phase: inference-time
Modalities: text
When to use: You need to identify discrete, verifiable factual claims from text — separately from verifying them. Common in fact-checking pipelines, regulatory compliance review, and financial analysis where claims must be surfaced before they can be checked against authoritative sources.
When NOT to use: Claim verification or fact-checking is the goal (that is a classification task — true/false/misleading). The text is already structured and facts are directly readable. All statements in the text are subjective opinions without verifiable factual content.
Key tools: GPT-4o (prompt-based claim extraction), Claude 3.5 Sonnet, ClaimBuster (open-source claim detection), Hugging Face claim detection models
Cost: LLM-based extraction: $0.01–$0.05 per document. The ClaimBuster API has a free tier for research. Costs are comparable to other LLM extraction tasks.

---

## Taxonomy & Ontology Induction
ID: extraction-knowledge-taxonomy-ontology-induction
Category: Extraction & Structuring > Knowledge & Relationship Extraction
Complexity: high | Phase: design-time
Modalities: text
When to use: You need to discover a category hierarchy or concept structure from a corpus rather than design it manually — e.g. building a product taxonomy from a catalogue, inducing a domain glossary, or seeding an ontology from scientific literature.
When NOT to use: An authoritative ontology already exists for the domain (OWL, SNOMED, WordNet) — map to it rather than inducing a new one. The corpus is too small (<1000 documents) for reliable statistical induction. Ontology governance and formal logic constraints are required (use a knowledge engineer).
Key tools: GPT-4o (concept clustering + labelling), BERTopic (topic modelling), word2vec / fastText (distributional similarity), Protégé (ontology authoring + validation)
Cost: BERTopic and embedding models: near-zero at self-hosted scale. LLM-based hierarchical labelling: moderate cost depending on corpus size. The primary cost is human curation of the induced structure (always required for production use).

---

## Schema Mapping & Data Normalisation
ID: extraction-transformation-schema-mapping
Category: Extraction & Structuring > Data Transformation & Structuring
Complexity: medium | Phase: design-time
Modalities: tabular
When to use: You are integrating systems with different data models and need to automate the field-to-field mapping (ETL pipelines, API integration, data warehouse ingestion).
AI assistance is particularly valuable when the source schema is poorly documented or inconsistently named.
When NOT to use: Schemas are already well-documented and a simple lookup table suffices — deterministic mapping code is more reliable than AI mapping. The transformation involves complex business logic that must be explicitly audited. Source data is unstructured text (route to unstructured-to-structured conversion).
Key tools: Airbyte (data integration with AI mapping assist), dbt (data transformation), GPT-4o (schema mapping via JSON schema), Matillion
Cost: dbt: open-source, hosting cost only. Airbyte Cloud: usage-based pricing. LLM-assisted schema suggestion: a one-time design cost, not an ongoing inference cost. The primary ongoing cost is data pipeline infrastructure.

---

## Unstructured-to-Structured Conversion
ID: extraction-transformation-unstructured-to-structured
Category: Extraction & Structuring > Data Transformation & Structuring
Complexity: medium | Phase: inference-time
Modalities: text
When to use: You have free-text content (clinical notes, emails, survey responses, log messages) that needs to be converted to a structured format (JSON, table, EHR fields) for downstream analytics, storage, or system integration.
When NOT to use: The source is already structured or semi-structured and a parser suffices. The target schema is not yet defined — define the schema first, or route to attribute extraction for schema discovery. Accuracy requirements exceed 98% for high-stakes fields (medication doses, financial figures) without human review in the loop.
Key tools: GPT-4o with JSON mode / structured outputs, Claude 3.5 Sonnet, LangChain extraction chains, Azure AI Document Intelligence
Cost: Primarily LLM inference cost. GPT-4o with structured outputs: $0.005–$0.05 per document. The Batch API reduces cost by ~50%. For clinical NLP at scale, domain-specific models (AWS Comprehend Medical) may be more cost-effective.
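The human-review caveat for high-stakes fields can be enforced mechanically: a sketch of a post-extraction guardrail that routes records with missing or out-of-range critical fields to a review queue instead of straight into storage. The field name and allowed range are hypothetical:

```python
# Post-extraction guardrail: records whose high-stakes numeric fields fail a
# range check go to a human-review queue rather than straight to storage.
# The field name and allowed range below are hypothetical examples.
HIGH_STAKES_FIELDS = {"medication_dose_mg": (0.0, 5000.0)}

def route_record(record: dict) -> str:
    """Return 'auto' when every high-stakes field passes its check, else 'review'."""
    for field, (lo, hi) in HIGH_STAKES_FIELDS.items():
        value = record.get(field)
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            return "review"
    return "auto"
```

Treating a missing field the same as an out-of-range one errs on the side of review, which is the right default when the downstream consumer is an EHR or financial system.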
---

## AI-Driven Data Enrichment
ID: extraction-transformation-data-enrichment
Category: Extraction & Structuring > Data Transformation & Structuring
Complexity: medium | Phase: inference-time
Modalities: tabular, text
When to use: Existing records lack attributes that can be inferred from available text or external signals — e.g. inferring company industry from a description, imputing missing demographic fields, or appending AI-generated quality scores to product records.
When NOT to use: Enrichment attributes require real-time freshness (stock prices, live inventory) — AI inference is not a substitute for live data feeds. The attribute being inferred has legal compliance implications (protected characteristics, credit scoring). Deterministic lookup is possible (append industry from a known company database rather than infer it).
Key tools: GPT-4o (inference-based enrichment), Clearbit / Apollo.io (firmographic enrichment APIs), AWS Comprehend (sentiment + keyphrase appending), scikit-learn (imputation models)
Cost: LLM inference: $0.005–$0.05 per record depending on text length. Firmographic APIs: $0.01–$0.10 per record lookup. scikit-learn imputation: near-zero. Cost-benefit depends heavily on the value of the enriched attribute.

---

## Intelligent Format & Encoding Conversion
ID: extraction-transformation-format-conversion
Category: Extraction & Structuring > Data Transformation & Structuring
Complexity: high | Phase: inference-time
Modalities: text
When to use: Converting between formats requires semantic interpretation that rules-based parsers cannot handle — legacy EDI files with non-standard delimiters, domain-specific notations (chemical SMILES, musical MusicXML), or non-standard CSV layouts with merged header rows.
When NOT to use: The conversion is fully deterministic and a library handles it (e.g. JSON to XML with a 1:1 schema, standard CSV to Parquet).
The format is proprietary and no training data exists for the LLM to learn from. Accuracy of the conversion must be formally verifiable.
Key tools: GPT-4o (interpretation + transformation), Claude 3.5 Sonnet, RDKit (chemistry format conversion), music21 (musical notation conversion)
Cost: LLM-based conversion: $0.01–$0.10 per document. Domain-specific libraries (RDKit, music21): free. The primary cost is prompt engineering and validation tooling for the conversion pipeline.

---

## Sentiment & Opinion Analysis
ID: classification-text-sentiment-opinion
Category: Classification & Detection > Text Classification
Complexity: low-medium | Phase: production
Modalities: text
When to use: Customer feedback volumes exceed manual review capacity; need to track brand/product sentiment over time; NPS verbatim tagging at scale; aspect-level breakdown required (e.g. 'delivery was slow but product was great').
When NOT to use: Texts are highly domain-specific with coded language (e.g. legal filings, clinical notes) where 'positive/negative' is not meaningful; sample size is small enough for manual tagging; you need to extract *what* is mentioned, not how it is felt.
Key tools: OpenAI GPT-4o (zero-shot or few-shot via API), HuggingFace Transformers — cardiffnlp/twitter-roberta-base-sentiment-latest, AWS Comprehend (Sentiment + KeyPhrases), Google Natural Language API (Sentiment), Azure AI Language (Sentiment + Opinion Mining)
Cost: Managed APIs (Comprehend, Google NL): ~$0.0001–0.001 per record. Fine-tuned transformer hosted on a GPU instance: $0.50–2/hr inference. LLM zero-shot: $0.002–0.01 per 1K tokens (GPT-4o).
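Aspect-level analysis usually ends as an aggregation problem once a classifier or LLM has labelled each mention; a small sketch, assuming per-record (aspect, label) pairs as the classifier's output format:

```python
from collections import Counter, defaultdict

# Aggregate per-mention (aspect, label) pairs, as a sentiment classifier or
# LLM might emit them, into per-aspect counts for trend tracking.
def aggregate_aspects(records):
    by_aspect = defaultdict(Counter)
    for rec in records:
        for aspect, label in rec:
            by_aspect[aspect][label] += 1
    return {aspect: dict(counts) for aspect, counts in by_aspect.items()}

# Three pieces of feedback with illustrative aspect-level labels.
summary = aggregate_aspects([
    [("delivery", "negative"), ("product", "positive")],
    [("delivery", "negative")],
    [("product", "positive")],
])
```

Counting per aspect rather than per document is what surfaces the 'delivery was slow but product was great' split that a single document-level label would average away.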
---

## Intent & Topic Classification
ID: classification-text-intent-topic
Category: Classification & Detection > Text Classification
Complexity: low-medium | Phase: production
Modalities: text
When to use: Routing inbound messages (support tickets, emails, chat) to correct queue or handler; detecting user intent in a chatbot; tagging articles/documents for downstream filtering; classifying search queries before retrieval.
When NOT to use: You need free-form slot filling or parameter extraction (use NER/entity extraction); classes are not known in advance (use topic modelling, not classification); conversation requires multi-turn context to resolve intent (use dialogue management).
Key tools: OpenAI GPT-4o or GPT-4o-mini (zero-shot with label descriptions), HuggingFace text-classification — facebook/bart-large-mnli (zero-shot NLI), Rasa NLU (for chatbot pipelines), AWS Comprehend Custom Classification, Dialogflow CX (intent detection for voice/chat bots)
Cost: Zero-shot LLM (GPT-4o-mini): ~$0.0002–0.001 per classification. Fine-tuned small model (BERT-base): near-zero marginal cost once hosted. Dialogflow CX: $0.007 per text request.

---

## Content Safety & Moderation
ID: classification-text-content-moderation
Category: Classification & Detection > Text Classification
Complexity: medium | Phase: production
Modalities: text, image, video
When to use: User-generated content platform needs automated first-pass screening; real-time pre-moderation before post goes live; bulk retroactive audits of historical content; detecting spam or phishing in email/messaging pipelines.
When NOT to use: Content is internal business communication where false-positive removal has high cost; legal/regulatory compliance review requires auditable reasoning (a classifier score is not an audit trail); single-language model is applied to a multilingual platform without language-aware routing.
Key tools: OpenAI Moderation API (free, multi-category), AWS Rekognition (image/video moderation) + AWS Comprehend (text), Perspective API by Jigsaw/Google (toxicity scoring), Llama Guard 3 (Meta — open-weight safety classifier, self-hostable), Microsoft Azure AI Content Safety
Cost: OpenAI Moderation API: free. Perspective API: free up to quota. Azure Content Safety: ~$1 per 1K text records. Llama Guard self-hosted: GPU inference cost only (~$0.50–1/hr on A10G).

---

## Document Type & Compliance Classification
ID: classification-text-document-type-compliance
Category: Classification & Detection > Text Classification
Complexity: medium | Phase: production
Modalities: text, document (PDF/image of document)
When to use: Ingesting large volumes of unstructured legal, financial, or medical documents that need routing to different processing pipelines; automating pre-classification before extraction (classify first, then extract fields); compliance tagging at document ingestion for records management.
When NOT to use: Documents are already structured and machine-readable with clear type metadata; you need to extract specific fields or clauses (that is entity extraction, not classification); classes are defined by the business and change frequently (recurring retraining overhead).
Key tools: Azure Document Intelligence (Form Recognizer) — prebuilt document type models, AWS Textract + Comprehend Custom Classification pipeline, OpenAI GPT-4o with document text (zero-shot or few-shot for new document types), Llama 3.1 fine-tuned on domain corpus (self-hosted for sensitive documents)
Cost: Azure Document Intelligence: ~$0.001–0.01 per page. AWS Textract: $0.0015 per page (text detection). GPT-4o for long documents: $0.01–0.05 per document depending on length. Self-hosted fine-tuned model: GPU inference cost.
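First-pass screening typically reduces to thresholding the per-category scores these services return; a sketch with illustrative, untuned category names and thresholds (any real platform would calibrate these against its own false-positive costs):

```python
# First-pass routing on per-category moderation scores (in the 0-1 shape
# returned by services like the OpenAI Moderation API or Azure AI Content
# Safety). Category names and thresholds are illustrative, not official.
BLOCK_AT = {"violence": 0.9, "hate": 0.85, "sexual": 0.9}
REVIEW_AT = {"violence": 0.5, "hate": 0.4, "sexual": 0.5}

def route_content(scores: dict) -> str:
    """Return 'block', 'review', or 'allow' for one piece of content."""
    if any(scores.get(cat, 0.0) >= t for cat, t in BLOCK_AT.items()):
        return "block"
    if any(scores.get(cat, 0.0) >= t for cat, t in REVIEW_AT.items()):
        return "review"
    return "allow"
```

The two-tier threshold is what keeps the classifier a first pass: the middle band goes to human moderators rather than being silently removed.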
---

## Object Detection & Localisation
ID: classification-visual-object-detection
Category: Classification & Detection > Image & Video Classification
Complexity: medium-high | Phase: production
Modalities: image, video
When to use: Need to locate *where* objects are in an image, not just whether they are present; counting instances (inventory, occupancy); triggering actions based on presence (safety alerts, retail out-of-stock detection); feeding downstream tracking or measurement pipelines.
When NOT to use: Only whole-image label is needed with no location (use image classification); extremely high-resolution medical images with small anomalies where segmentation is needed; real-time video stream at >30fps with limited edge hardware (latency constraint may preclude cloud inference).
Key tools: YOLOv8 / YOLOv10 (Ultralytics) — fastest for edge/real-time, RT-DETR (Baidu/HuggingFace) — transformer-based, high accuracy, AWS Rekognition (managed, general object detection), Google Cloud Vision API (object localization), Roboflow (dataset management + training platform for custom models)
Cost: Cloud API (Rekognition, Google Vision): $0.001–0.004 per image. YOLOv8 self-hosted on GPU: ~$0.50–1/hr on T4; near-zero on CPU for low throughput. Roboflow Hosted Inference: ~$0.001 per image.

---

## Scene & Image Classification
ID: classification-visual-scene-image-classification
Category: Classification & Detection > Image & Video Classification
Complexity: low-medium | Phase: production
Modalities: image
When to use: Need a single category label for an entire image; medical imaging triage (classify scan type or finding present/absent); e-commerce product image tagging; content moderation of image posts; satellite/aerial imagery land-use classification.
When NOT to use: Multiple distinct objects need to be located within the image (use object detection); pixel-level precision is needed (use segmentation); images contain text that should be read rather than visually classified (use OCR + text classification).
Key tools: Google Cloud Vision API (label detection + safe search), AWS Rekognition (image labeling), OpenAI GPT-4o Vision (zero-shot classification via prompt), HuggingFace Image Classification models — google/vit-base-patch16-224, microsoft/resnet-50, Apple Create ML / CoreML (on-device iOS/macOS deployment)
Cost: Google Vision / AWS Rekognition: $0.001–0.002 per image. GPT-4o Vision: $0.002–0.01 per image (depending on resolution tokens). Self-hosted ViT-base on CPU: feasible at low throughput.

---

## Image Segmentation
ID: classification-visual-segmentation
Category: Classification & Detection > Image & Video Classification
Complexity: high | Phase: production
Modalities: image, video
When to use: Need pixel-level understanding — organ/tumour boundary delineation in medical imaging; autonomous driving scene parsing; removing/replacing backgrounds; counting cells in microscopy; measuring area/volume of defects.
When NOT to use: A bounding box is sufficient for downstream use (segmentation is 3–10x more expensive to annotate and train); real-time low-latency applications where segmentation FPS is too low for the hardware budget; you only need a single image-level label.
Key tools: Segment Anything Model 2 (SAM 2 by Meta — promptable, zero-shot segmentation), Ultralytics YOLOv8-seg (instance segmentation, real-time capable), MONAI (medical image segmentation framework, PyTorch-based), HuggingFace Segmentation models — nvidia/segformer-b0-finetuned-ade-512-512, Roboflow (labeling + training for custom segmentation)
Cost: SAM 2 (self-hosted): GPU required; A10G ~$1.50/hr. Roboflow hosted inference: ~$0.003–0.005 per image. Medical segmentation models (MONAI): significant GPU compute for 3D volumes (CT/MRI).
Annotation cost is 5–10x higher than bounding boxes.

---

## Video Action & Event Recognition
ID: classification-visual-video-action-recognition
Category: Classification & Detection > Image & Video Classification
Complexity: high | Phase: production
Modalities: video
When to use: Classifying what is happening over time in a video clip: workout rep counting, sports highlight tagging, retail dwell behaviour analysis, workplace safety compliance monitoring, gesture-based UI control, security event detection.
When NOT to use: A single frame is sufficient for the classification (use image classification; temporal context adds cost without benefit); video is too low resolution or frame rate for action cues to be legible; real-time latency requirements (<100ms) on edge hardware are incompatible with video encoder inference.
Key tools: Google Video Intelligence API (action recognition, content moderation), AWS Rekognition Video (activity detection, person tracking), OpenAI GPT-4o Vision on sampled frames (zero-shot action description), VideoMAE / TimeSformer (HuggingFace, fine-tuneable transformers for video), MediaPipe (Google — pose + gesture classification, runs on-device)
Cost: Google Video Intelligence: ~$0.10 per minute of video. AWS Rekognition Video: $0.10 per minute (stored video). Fine-tuned VideoMAE self-hosted: significant GPU required (A100 for training; A10G for inference). MediaPipe: free, CPU-capable for pose/gesture.

---

## Transaction & Record Categorisation
ID: classification-structured-transaction-categorisation
Category: Classification & Detection > Structured Data Classification
Complexity: low-medium | Phase: production
Modalities: structured data (tabular), text (transaction descriptions)
When to use: Bank/fintech open banking pipelines needing merchant category enrichment; accounting automation (GL coding from transaction descriptions); expense management platforms; purchase order routing; insurance claim type triage.
When NOT to use: Transaction volume is low enough for rules-based regex mapping; categories are unstable and change frequently (high retraining cost); you need to detect *anomalous* transactions rather than categorise normal ones (use anomaly detection).
Key tools: OpenAI GPT-4o-mini (few-shot classification of transaction descriptions), Mastercard/Visa MCC enrichment APIs (merchant category codes via network data), Plaid Transactions API (pre-categorised transaction enrichment for US banking), AWS Comprehend Custom Classification (for proprietary category taxonomies), XGBoost / LightGBM on engineered features (merchant name, amount, day-of-week) — strong baseline for high-volume batch classification
Cost: Plaid enrichment: included in Plaid subscription. GPT-4o-mini: ~$0.00015 per classification. LightGBM: near-zero marginal inference cost once trained. MCC APIs: typically per-call or volume pricing through card networks.

---

## Lead & Customer Segmentation
ID: classification-structured-lead-customer-scoring-segmentation
Category: Classification & Detection > Structured Data Classification
Complexity: medium | Phase: production
Modalities: structured data (CRM attributes, firmographics, engagement signals)
When to use: CRM has sufficient historical data (>1000 labelled customers per tier) to train a segmentation model; ICP scoring needs to go beyond simple rule-based firmographics; customer lifecycle labelling feeds different nurture tracks; account health tiers drive CSM prioritisation.
When NOT to use: Sales cycle is too short or data too sparse for ML (<500 labelled examples per segment — use rules + manual review); segments change faster than retraining cadence; the primary output needed is a numeric score rather than a tier label (use prediction/propensity scoring).
Key tools: Salesforce Einstein (built-in lead scoring and segmentation within CRM), HubSpot Predictive Lead Scoring (native ML scoring in HubSpot CRM), Segment + dbt + BigML/Python (data warehouse → ML pipeline), scikit-learn (LogisticRegression, RandomForest for custom tier classifiers), Madkudu (B2B ICP scoring SaaS)
Cost: Native CRM ML (Salesforce/HubSpot): included in enterprise tiers. Madkudu: $500–2000/mo depending on volume. Custom scikit-learn: engineering cost to build + maintain; near-zero inference cost.

---

## Medical & Clinical Coding
ID: classification-structured-medical-clinical-coding
Category: Classification & Detection > Structured Data Classification
Complexity: high | Phase: production
Modalities: text (clinical notes), structured data (EHR fields)
When to use: Automating ICD-10/CPT code suggestions from clinical documentation to reduce coder workload; pre-populating codes for coder review in a HIM workflow; clinical trial eligibility screening from structured/unstructured EHR data; accelerating prior authorization workflows.
When NOT to use: Jurisdiction lacks ICD-10 adoption (use local code system); documentation quality is too poor for reliable extraction (garbage in, garbage out — fix documentation quality first); regulatory environment requires fully auditable human decision trail with no ML influence.
Key tools: AWS HealthLake + Comprehend Medical (HIPAA-eligible NLP for clinical text), Google Cloud Healthcare API (HL7 FHIR, clinical NLP, de-identification), Nuance DAX / PowerScribe (AI-assisted clinical documentation and coding), Aidé Health / Optum NLP (specialized clinical coding vendors), BioBERT / ClinicalBERT (HuggingFace — fine-tuneable clinical NLP base models)
Cost: AWS Comprehend Medical: ~$0.01 per 100 characters. Google Healthcare API: ~$0.01 per FHIR operation or NLP request. Nuance DAX: enterprise SaaS pricing (per-provider/year). ClinicalBERT self-hosted: GPU inference cost.
---

## Product & SKU Attribute Tagging
ID: classification-structured-product-attribute-tagging
Category: Classification & Detection > Structured Data Classification
Complexity: medium | Phase: production
Modalities: structured data (product attributes, descriptions), text
When to use: E-commerce catalogues with millions of SKUs needing consistent taxonomy classification; supplier onboarding where product data arrives in inconsistent formats; customs/logistics pipelines requiring HS code classification; product search and faceted navigation requiring normalized attributes.
When NOT to use: Catalogue is small (<10K SKUs) and manually maintained; attribute taxonomy changes so frequently that model retraining can't keep pace; primary input is a product image rather than structured text (use visual classification pipeline).
Key tools: OpenAI GPT-4o (few-shot product classification with taxonomy prompts), Google Product Taxonomy (standard reference taxonomy for retail), Amazon Product Classifier API (within Amazon Selling Partner API ecosystem), structured product classification fine-tuned on DeBERTa / RoBERTa (HuggingFace), Akeneo PIM with AI enrichment plugins (product information management + classification)
Cost: GPT-4o-mini few-shot: ~$0.0002–0.001 per SKU. Fine-tuned transformer: near-zero marginal inference once hosted. Akeneo: enterprise SaaS pricing. HS code classification vendors: typically volume-based API pricing ($0.01–0.05 per classification).
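Mapping inconsistent supplier category strings onto a canonical taxonomy can be prototyped with plain fuzzy string matching before reaching for an LLM or fine-tuned model; a sketch using the standard library's difflib, with an illustrative three-entry taxonomy (not the actual Google Product Taxonomy):

```python
from difflib import get_close_matches

# Fuzzy-match inconsistent supplier category strings onto a canonical
# taxonomy, leaving misses as UNMAPPED for a human pass. The taxonomy
# entries here are illustrative.
TAXONOMY = [
    "Apparel > Shoes > Sneakers",
    "Apparel > Shoes > Boots",
    "Electronics > Audio > Headphones",
]

def map_category(supplier_label: str, cutoff: float = 0.5) -> str:
    """Return the closest canonical category, or UNMAPPED below the cutoff."""
    matches = get_close_matches(supplier_label, TAXONOMY, n=1, cutoff=cutoff)
    return matches[0] if matches else "UNMAPPED"
```

The explicit `UNMAPPED` fallback matters at supplier-onboarding scale: low-confidence rows queue for review instead of silently landing in the wrong facet.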
---

## Real-Time Transaction Fraud Detection
ID: classification-anomaly-fraud-transaction-detection
Category: Classification & Detection > Anomaly & Fraud Detection
Complexity: high | Phase: production
Modalities: structured data (transaction records, behavioral signals)
When to use: Payment card or digital wallet transactions need real-time fraud scoring at authorization time (<100ms latency); new account and account-takeover fraud detection on login or high-risk actions; e-commerce order fraud screening before fulfillment; insurance claim triage for fraud indicators.
When NOT to use: Transaction volume is too low for ML to outperform rules (a fraud team with <1000 labeled fraud cases should start with rules + review queues); latency SLA is incompatible with model inference (pure rules are faster); you need to explain every decision to a regulator (black-box models require an explainability layer).
Key tools: Stripe Radar (built-in ML fraud scoring for Stripe transactions), AWS Fraud Detector (managed service, custom ML fraud models), Featurespace ARIC (behavioral analytics + adaptive ML for financial services), XGBoost / LightGBM with feature engineering on transaction history (strong open-source baseline), Databricks Feature Store + MLflow (enterprise ML platform for real-time feature serving)
Cost: Stripe Radar: included in Stripe fees + $0.02/transaction for custom rules tier. AWS Fraud Detector: ~$0.0075 per prediction. Featurespace ARIC: enterprise pricing. Self-built XGBoost: engineering + infrastructure cost; near-zero marginal inference.
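The XGBoost/LightGBM baseline above depends mostly on feature engineering; a sketch of the trailing-window velocity features such a model might consume. Feature names, window size, and the toy history are illustrative:

```python
from statistics import mean, pstdev

# Trailing-window velocity features of the kind an XGBoost/LightGBM fraud
# model might consume. Timestamps are epoch seconds; the window size and
# toy transaction history below are illustrative.
def velocity_features(history, candidate, window_s=3600):
    """history: [(timestamp_s, amount), ...]; candidate: (timestamp_s, amount)."""
    t, amount = candidate
    recent = [a for ts, a in history if t - window_s <= ts < t]
    amounts = [a for _, a in history] or [amount]
    sd = pstdev(amounts) if len(amounts) > 1 else 0.0
    zscore = (amount - mean(amounts)) / sd if sd else 0.0
    return {
        "txn_count_1h": len(recent),       # velocity: charges in last hour
        "txn_sum_1h": sum(recent),         # velocity: spend in last hour
        "amount_zscore": zscore,           # deviation from the card's own history
    }

# A card with three prior small charges, then a large candidate charge.
feats = velocity_features([(0, 10.0), (1800, 20.0), (3000, 30.0)], (3600, 500.0))
```

Per-card features like these are why real-time serving needs a feature store: the model's inputs are aggregates over each entity's recent history, not just the incoming row.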
---

## Operational & System Anomaly Detection
ID: classification-anomaly-operational-anomaly-detection
Category: Classification & Detection > Anomaly & Fraud Detection
Complexity: medium-high | Phase: production
Modalities: structured data (metrics, logs, time series)
When to use: IT monitoring where normal behavior is defined but anomalies are rare/unknown; manufacturing sensor streams where defect signatures aren't fully catalogued; cloud cost spikes need alerting without manual threshold setting; network intrusion detection where attack patterns evolve faster than rule updates.
When NOT to use: Failure modes are well-catalogued and labeled (use supervised classification with known classes); log volume exceeds feasible processing budget for online ML inference; you need root cause analysis, not just anomaly flagging (anomaly detection identifies *that* something is wrong, not *why*).
Key tools: Elastic (ELK Stack) with ML anomaly detection jobs, Datadog Watchdog (automated anomaly detection on metrics/logs), AWS CloudWatch Anomaly Detection (metric anomaly detection with ML baselines), Isolation Forest / LSTM Autoencoder (scikit-learn / PyTorch — open-source baselines), Splunk IT Service Intelligence (ITSI) with adaptive thresholding
Cost: Datadog Watchdog: included in Datadog Infrastructure subscription. AWS CloudWatch Anomaly Detection: ~$0.10 per evaluated metric per month. Elastic ML jobs: included in Platinum+ license. Self-built Isolation Forest: near-zero inference cost.
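The "alerting without manual threshold setting" idea is easiest to see in its simplest self-hosted form: a rolling z-score where the recent window supplies the baseline. Window length, warm-up length, and the sensitivity `k` are illustrative tuning knobs (managed tools like Watchdog or CloudWatch learn richer baselines):

```python
from collections import deque
from statistics import mean, pstdev

# Rolling z-score baseline: the window defines "normal", and points more
# than k standard deviations from the window mean are flagged. Window,
# warm-up, and k values below are illustrative.
class RollingZScoreDetector:
    def __init__(self, window=60, k=3.0, warmup=10):
        self.values = deque(maxlen=window)
        self.k = k
        self.warmup = warmup

    def observe(self, x):
        """Record x; return True if it is anomalous vs. the current window."""
        anomalous = False
        if len(self.values) >= self.warmup:
            mu, sd = mean(self.values), pstdev(self.values)
            anomalous = sd > 0 and abs(x - mu) > self.k * sd
        self.values.append(x)
        return anomalous

detector = RollingZScoreDetector(window=30, k=3.0)
# A steady metric around 10, then a spike.
baseline_flags = [detector.observe(10.1 if i % 2 else 9.9) for i in range(20)]
spike_flag = detector.observe(100.0)
```

Because the baseline rolls forward, slow drift in the metric re-centres the threshold automatically, which is exactly what fixed manual thresholds fail to do.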
--- ## Quality & Defect Detection ID: classification-anomaly-quality-defect-detection Category: Classification & Detection > Anomaly & Fraud Detection Complexity: high | Phase: production Modalities: image, structured data (tabular, time series), text When to use: Manufacturing visual inspection where defect classes are not fully catalogued upfront; detecting data quality issues in pipelines without labeled 'bad data' examples; code quality anomalies in CI pipelines based on learned normal commit patterns; content quality flagging when 'bad' is hard to define but 'normal' is abundant. When NOT to use: Defect classes are fully labeled with sufficient training examples per class (supervised detection with object detection/segmentation will outperform anomaly detection); the definition of 'normal' is too heterogeneous (e.g., highly varied product lines) for a single baseline model; real-time latency with high throughput exceeds anomaly model inference budget. Key tools: PatchCore / PaDiM (state-of-the-art unsupervised visual anomaly detection — MVTec benchmark leaders), MVTec Anomaly Detection Dataset + benchmark (reference dataset for industrial inspection), PyOD (Python Outlier Detection library — 40+ algorithms for tabular anomaly detection), Autoencoder / VAE (PyTorch — learn normal distribution, flag high reconstruction error), AWS Lookout for Vision (managed visual anomaly detection for manufacturing) Cost: AWS Lookout for Vision: ~$0.018 per image inference. PatchCore self-hosted: GPU required for embedding extraction; inference is fast. PyOD: near-zero marginal cost once trained. Annotation cost advantage: only need 'normal' examples (no defect labeling required for training). 
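The autoencoder pattern in the tools list — learn the normal distribution, flag high reconstruction error — can be illustrated with PCA as a linear stand-in for an autoencoder. Training uses only 'normal' samples, which is the annotation-cost advantage the entry notes; all features here are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Hypothetical 16-dim feature vectors from a fixed inspection pipeline,
# generated to lie near a 3-dim 'normal' manifold plus small noise
latent = rng.normal(size=(400, 3))
mixing = rng.normal(size=(3, 16))
normal = latent @ mixing + rng.normal(scale=0.05, size=(400, 16))

pca = PCA(n_components=3).fit(normal)  # learn the normal manifold

def recon_error(x):
    # Distance between a sample and its reconstruction from the manifold
    return np.linalg.norm(x - pca.inverse_transform(pca.transform(x)), axis=1)

threshold = np.percentile(recon_error(normal), 99)
defect = rng.normal(scale=2.0, size=(1, 16))  # off-manifold sample
needs_inspection = recon_error(defect)[0] > threshold
```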
--- ## Healthcare & Clinical Outlier Detection ID: classification-anomaly-healthcare-clinical-outlier Category: Classification & Detection > Anomaly & Fraud Detection Complexity: high | Phase: production Modalities: structured data (EHR, vitals, lab results), text (clinical notes) When to use: ICU/ward monitoring for early warning system (EWS) triggers on abnormal vitals patterns; pharmacovigilance surveillance for unusual medication order patterns; clinical trial site monitoring for protocol deviations; population health surveillance for rare disease cluster detection; medical billing audit for upcoding patterns. When NOT to use: Patient risk *prediction* is the goal (use supervised prediction models — readmission, sepsis risk); anomaly is already well-defined by a clinical rule (e.g., a lab value outside reference range is a rule, not ML); regulatory environment requires fully auditable decision logic (ML anomaly scores are harder to defend than rule-based alerts). Key tools: AWS HealthLake (FHIR data store with ML integration, HIPAA-eligible), Google Cloud Healthcare API (FHIR + DICOMweb + clinical NLP), MIMIC-III/IV (benchmark clinical dataset for model development — requires PhysioNet access), Isolation Forest / HBOS on EHR feature vectors (scikit-learn — tabular clinical outlier detection), Epic Deterioration Index / Sepsis Prediction (built-in EHR alerting — not a standalone tool but the incumbent in clinical settings) Cost: AWS HealthLake: ~$0.023/GB/month + query costs. Google Healthcare API: ~$0.01 per FHIR operation. MIMIC access: free (academic). Epic built-in models: included in Epic license. Custom Isolation Forest: near-zero inference cost once feature pipeline is built. 
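A toy version of the HBOS approach named in the tools list — per-feature histogram densities, summed as negative log-densities into an outlier score — on synthetic vitals-style features. The variables and thresholds are illustrative, not clinically validated.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical EHR-derived feature vectors: [heart_rate, systolic_bp, lactate]
cohort = rng.normal(loc=[75.0, 120.0, 1.0], scale=[8.0, 10.0, 0.3],
                    size=(1000, 3))

def hbos_scores(train, x, bins=20):
    """Histogram-Based Outlier Score: sum over features of -log(density)."""
    score = np.zeros(len(x))
    for j in range(train.shape[1]):
        hist, edges = np.histogram(train[:, j], bins=bins, density=True)
        idx = np.clip(np.digitize(x[:, j], edges) - 1, 0, bins - 1)
        score += -np.log(hist[idx] + 1e-9)
    return score

baseline = hbos_scores(cohort, cohort)
# A patient who is tachycardic and hypotensive with elevated lactate:
patient = np.array([[150.0, 70.0, 6.0]])
flag_for_review = hbos_scores(cohort, patient)[0] > np.percentile(baseline, 99)
```

Routing the flag to human review, rather than acting on it automatically, matches the auditable-logic caveat above.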
--- ## Speaker Identification & Verification ID: classification-audio-speaker-identification Category: Classification & Detection > Audio & Speech Classification Complexity: medium-high | Phase: production Modalities: audio When to use: Call centre analytics requiring automatic agent/customer diarisation; voice biometric authentication for phone banking or IVR; multi-party meeting transcription with speaker attribution; podcast/media production requiring speaker-tagged transcripts. When NOT to use: You need to know *what* was said, not who said it (use speech-to-text/ASR); speaker identity is not meaningful for the downstream task; audio quality is too poor (high noise, heavy compression) for speaker embedding reliability; biometric voice authentication in high-security contexts where voice spoofing risk is unacceptable without liveness detection. Key tools: pyannote.audio (open-source speaker diarisation — state of the art on AMI/VoxConverse benchmarks), AWS Transcribe (speaker diarisation via the ShowSpeakerLabels and MaxSpeakerLabels settings), Google Cloud Speech-to-Text (diarization_config with min/max speaker counts), AssemblyAI (transcription + diarisation as a service), SpeechBrain (HuggingFace — speaker verification, x-vectors, ECAPA-TDNN) Cost: AWS Transcribe: ~$0.024 per minute. Google Cloud STT: ~$0.016 per minute. AssemblyAI: ~$0.012 per minute. pyannote.audio self-hosted: CPU feasible for offline; GPU recommended for real-time. --- ## Sound Event & Environmental Classification ID: classification-audio-sound-event-classification Category: Classification & Detection > Audio & Speech Classification Complexity: medium | Phase: production Modalities: audio When to use: Smart building/home device monitoring for security events (glass break, smoke alarm); industrial machinery health monitoring via acoustic signature; wildlife biodiversity monitoring (passive acoustic monitoring for species identification); retail ambient analytics; elder care monitoring for fall/distress sounds.
When NOT to use: Audio contains speech that should be understood (use ASR + text classification); you need to identify who is speaking (use speaker identification); music is the input (use music classification); real-time edge deployment where a <50ms latency budget is tighter than the acoustic model's inference time. Key tools: YAMNet (Google — AudioSet pretrained, 521 sound event classes, TensorFlow/TFLite), PANNs (Pretrained Audio Neural Networks — CNN14, ResNet38, trained on AudioSet), BirdNET (Cornell Lab of Ornithology — bird species identification from audio), Google AudioSet (large-scale dataset + weakly supervised labels for pretraining), Sound event detection pipelines built on librosa + torchaudio (custom model training) Cost: YAMNet / PANNs self-hosted: CPU-feasible for most use cases; TFLite version runs on-device. BirdNET: free API for non-commercial use. No major managed cloud API specifically for generic sound event classification (Google/AWS route through their general audio APIs). --- ## Speech Emotion & Intent Classification ID: classification-audio-speech-emotion-intent Category: Classification & Detection > Audio & Speech Classification Complexity: medium-high | Phase: production Modalities: audio, text (when combined with transcription) When to use: Contact centre quality analytics where detecting caller frustration/distress drives supervisor escalation; voice UI / IVR where spoken tone modifies response strategy; mental health app monitoring for emotional state changes over sessions; compliance monitoring for agent tone/engagement. When NOT to use: Text transcript is available and sufficient (text-based sentiment is cheaper and more accurate for most use cases); audio quality is too degraded for prosodic feature extraction (heavy compression, background noise); emotion labels are culturally specific and the model was trained on a different demographic.
Key tools: OpenAI Whisper + GPT-4o (transcribe then classify — hybrid acoustic+linguistic approach), AWS Contact Lens (call centre emotion and sentiment analysis, integrated with Connect), Google CCAI Insights (Contact Center AI — sentiment + entity on call transcripts), SpeechBrain / wav2vec2-based emotion classifiers (HuggingFace — fine-tuneable), Hume AI (multimodal emotion API including prosodic features) Cost: AWS Contact Lens: ~$0.011 per minute of analyzed speech. Google CCAI Insights: ~$0.006 per minute. Hume AI: API pricing based on volume. Whisper + GPT-4o hybrid: ~$0.006/min (Whisper) + LLM cost. --- ## Music & Audio Content Classification ID: classification-audio-music-classification Category: Classification & Detection > Audio & Speech Classification Complexity: medium | Phase: production Modalities: audio When to use: Streaming platform playlist curation needing mood/energy classification; music licensing systems requiring genre and instrument tagging; copyright identification and fingerprint matching; DJ tooling needing BPM/key detection; content ingestion pipelines for broadcast needing speech/music/silence segmentation. When NOT to use: Primary task is music *generation* (a completely different capability); audio contains speech that needs to be understood (use ASR); you need to identify a specific recording (use acoustic fingerprinting, not classification). Key tools: AcoustID + Chromaprint (open-source audio fingerprinting for copyright/duplicate detection), Essentia (Music Information Retrieval library by MTG Barcelona — BPM, key, mood, genre), librosa (Python MIR library — spectral features, beat tracking, onset detection), Spotify Audio Analysis API (BPM, key, energy, valence — via track analysis endpoint), ACRCloud (music fingerprinting and recognition API) Cost: Essentia / librosa: open-source, CPU-feasible, near-zero inference cost. AcoustID / Chromaprint: free for non-commercial. ACRCloud: API pricing ~$0.001–0.003 per recognition.
Spotify Audio Analysis: included in API quota (for indexed tracks only — not arbitrary audio). --- ## Demand & Inventory Forecasting ID: prediction-time-series-demand-inventory-forecast Category: Prediction & Forecasting > Time Series Forecasting Complexity: medium | Phase: production Modalities: tabular time series, structured transactional data When to use: You have historical sales or order data at SKU or location level, need to drive automated replenishment or staffing decisions, and can tolerate a forecast horizon of days to weeks. Data is available at regular intervals with at least 1-2 years of history. When NOT to use: Demand is driven primarily by marketing events or promotions not captured in history. SKU count is very small (<20 items) or history is <6 months. Real-time inventory is not integrated — forecasts won't be actioned. Financial price forecasting or behavioural propensity is the real need. Key tools: Amazon Forecast (managed AWS service, handles cold start, probabilistic forecasts), NeuralForecast / StatsForecast (Nixtla, open-source, NBEATS/NHITS/TFT), Prophet (Meta, open-source, interpretable seasonal decomposition), LightGBM with lag features (tabular baseline, often competitive with deep learning), Azure Machine Learning AutoML for Forecasting (managed Azure pipeline) Cost: Medium compute for training; inference is cheap per SKU. Amazon Forecast charges per generated forecast (~$0.60 per 1k forecasts) and per data record ingested — significant at millions of SKUs. Open-source Nixtla stack is free; requires MLOps infrastructure investment. 
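The lag-feature gradient-boosting baseline named in the tools list can be sketched on a synthetic two-year daily series with trend and weekly seasonality; scikit-learn's GradientBoostingRegressor stands in for LightGBM, and the series itself is invented.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
# Synthetic daily demand for one SKU: level + trend + weekly seasonality
t = np.arange(730)  # two years of history
demand = 100 + 0.05 * t + 15 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 3, 730)

# Lag features (t-1, t-7, t-14) plus day-of-week
lags = [1, 7, 14]
X = np.column_stack([np.roll(demand, l) for l in lags] + [t % 7])[max(lags):]
y = demand[max(lags):]

split = len(y) - 28  # hold out the last four weeks as a forecast horizon
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
mae = np.abs(model.predict(X[split:]) - y[split:]).mean()
```

The same lag-feature frame generalises to per-SKU panels by adding SKU and location identifiers as categorical features.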
--- ## Financial & Market Time Series Forecasting ID: prediction-time-series-financial-market Category: Prediction & Forecasting > Time Series Forecasting Complexity: high | Phase: production Modalities: tabular time series, structured financial data When to use: You need probabilistic forecasts of macro indicators, revenue, or credit portfolio losses over a defined horizon for planning, stress-testing, or regulatory reporting. Input features are structured financial time series with reasonable stationarity or known regime structure. When NOT to use: Goal is high-frequency trading alpha generation (requires specialist quant infra, not standard ML pipelines). Data is non-financial behavioural or physical sensor data. Regulatory context is unclear — many jurisdictions restrict model-driven credit decisions without explainability requirements. Key tools: Darts (Unit8, Python, supports ARIMA/TCN/TFT/NBEATS for financial series), GluonTS (Amazon, deep learning time series, probabilistic forecasting), Temporal Fusion Transformer (TFT) via PyTorch Forecasting library, Bloomberg / Refinitiv Eikon data APIs (market data sourcing, not modelling) Cost: High complexity cost — financial data licensing is often the largest cost (Bloomberg terminal ~$24k-$27k/year; Refinitiv Eikon ~$22k+/year). Model training is moderate compute. Regulatory MRM overhead adds significant hidden cost. --- ## Energy & Utilities Load Forecasting ID: prediction-time-series-energy-utilities Category: Prediction & Forecasting > Time Series Forecasting Complexity: medium | Phase: production Modalities: tabular time series, sensor/IoT data, weather covariate data When to use: You have metered consumption or generation data at regular intervals (hourly/15-min), need to plan grid dispatch, capacity, or resource scaling decisions, and weather/seasonal drivers are available as covariates. 
When NOT to use: Data granularity is daily or coarser and planning horizon is >1 year (moves into scenario planning, not ML forecasting). No weather covariate data is available and demand is heavily weather-dependent. The real need is anomaly detection on sensor readings, not trajectory forecasting. Key tools: NeuralForecast (Nixtla, NBEATS/NHITS/TFT — strong performers on energy data), Prophet with custom seasonalities (interpretable, suited for utility reporting), GluonTS / DeepAR (probabilistic forecasting, AWS-native for cloud deployments), TensorFlow Extended (TFX) for production ML pipeline orchestration at grid scale Cost: Moderate. Smart meter data volumes can be large (millions of meters x 15-min intervals). Cloud storage and compute costs meaningful at grid scale. NeuralForecast/GluonTS are open-source; AWS SageMaker/Azure ML add managed compute costs. --- ## Predictive Maintenance & Equipment Health ID: prediction-time-series-predictive-maintenance Category: Prediction & Forecasting > Time Series Forecasting Complexity: high | Phase: production Modalities: sensor/IoT time series, structured tabular (equipment metadata), event logs When to use: Equipment has continuous sensor instrumentation (vibration, temperature, pressure), historical failure events are labeled, and the cost of unplanned downtime or false-negative misses significantly exceeds planned maintenance cost. When NOT to use: Sensor data is sparse (<1 reading/hour) or failure events are extremely rare (<5 per asset type in history) — insufficient signal to train. Equipment has fixed-interval regulatory maintenance requirements that cannot be modified regardless of predicted health. Real-time anomaly flagging without a time-to-failure horizon is the actual need. 
Key tools: tsfresh (Python, automated feature extraction from sensor/time series streams), tslearn (Python, ML toolkit for time series — DTW, clustering, classification), Azure AI predictive maintenance sample templates (end-to-end Azure ML template on GitHub), LSTM / Transformer models via PyTorch (custom RUL regression), AWS IoT SiteWise (sensor data ingestion + built-in anomaly detection) Cost: High total cost — sensor infrastructure and data historian (OSIsoft PI, Ignition) setup often dominates ML cost. Model training is moderate. Edge inference deployments add hardware cost. AWS IoT SiteWise charges per data point ingested. --- ## Churn & Retention Propensity Modelling ID: prediction-propensity-churn-retention Category: Prediction & Forecasting > Propensity & Behavioral Modeling Complexity: medium | Phase: production Modalities: tabular (behavioral features), event sequences When to use: You have longitudinal customer/subscriber records with behavioral signals (logins, feature usage, support contacts, payment history), can identify churn events in history, and have a retention intervention you can act on within the model's scoring window. When NOT to use: You have no actionable retention lever — scoring without intervention capability is vanity analytics. Churn rate is very low (<2%) and the population is small — insufficient positive examples. The real problem is product-market fit (no ML model fixes that). B2B accounts with bespoke contracts require qualitative account management, not propensity scores. Key tools: XGBoost / LightGBM (industry standard gradient boosting for tabular churn data), Scikit-learn (preprocessing pipelines, logistic regression baseline), PyMC-Marketing (probabilistic BG/NBD and Pareto/NBD models for non-contractual churn), Feast (open-source feature store for productionising recurring scoring pipelines), Tecton (managed feature platform for real-time feature serving) Cost: Low-to-medium. 
Gradient boosting models train quickly on standard tabular data. Feast is open-source; Tecton is enterprise-priced (~$50k+/year). Main cost is feature engineering infrastructure and the intervention system consuming scores. --- ## Conversion & Purchase Propensity Modelling ID: prediction-propensity-conversion-purchase Category: Prediction & Forecasting > Propensity & Behavioral Modeling Complexity: low-to-medium | Phase: production Modalities: tabular (session/behavioral features), event sequences, CRM data When to use: You have a defined conversion funnel with observable behavioral signals (page views, clicks, session depth, prior purchases), sufficient conversion events to train (typically >1000 positives), and can personalise outreach or ranking in response to scores. When NOT to use: Funnel events are too sparse (new product, small user base). Conversion is entirely driven by price — a propensity model won't improve on a discount strategy. The real need is product recommendation ranking (collaborative filtering), not purchase intent scoring. Key tools: XGBoost / LightGBM (tabular conversion prediction standard), Google Analytics 4 Predictive Audiences (built-in purchase propensity, no-code), Salesforce Einstein Lead Scoring (CRM-native B2B conversion scoring), Meta Conversion API + Advantage+ (platform-native propensity for paid media) Cost: Low for platform-native solutions (GA4 Predictive Audiences included in GA4 360 ~$50k/year; Salesforce Einstein included in Sales Cloud tiers). Custom models require MLOps investment. Inference is cheap — scores computed on daily batch or on-demand. 
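As an illustration of the score-then-act loop behind both churn and conversion propensity, a minimal model on synthetic funnel features: scikit-learn logistic regression as the simple baseline, with the top decile routed to outreach and checked for lift over the base rate. Feature names and coefficients are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 5000
# Hypothetical session features: page_views, session_depth, prior_purchases
X = rng.poisson(lam=[5.0, 3.0, 1.0], size=(n, 3)).astype(float)
logit = 0.3 * X[:, 0] + 0.5 * X[:, 2] - 3.5
converted = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, converted)
scores = clf.predict_proba(X)[:, 1]

top_decile = np.argsort(scores)[-n // 10:]  # highest-propensity users
lift = converted[top_decile].mean() / converted.mean()  # lift vs base rate
```

Observed lift on the targeted decile, not raw AUC, is usually the metric that justifies the intervention budget.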
--- ## Clinical & Health Risk Propensity Modelling ID: prediction-propensity-health-clinical-risk Category: Prediction & Forecasting > Propensity & Behavioral Modeling Complexity: high | Phase: production — requires clinical validation and governance Modalities: tabular (EHR structured data), claims data, lab results time series When to use: You have EHR or claims data with sufficient labeled outcomes (readmissions, disease onset events), a clinical workflow exists to act on risk scores (care management outreach, screening pathway activation), and governance/IRB approval is in place. When NOT to use: No downstream clinical action is defined — risk score without care pathway is unethical and wasteful. Model will be used to deny care or ration resources without human review (high-risk automated decision context). Objective is diagnostic classification of an existing condition (route to Classification). Insurance underwriting pricing is the primary use (route to Credit & Underwriting Risk Scoring). Key tools: Scikit-learn / XGBoost (tabular EHR feature modelling), Google Health AI / Vertex AI Medical Imaging (for imaging-adjacent risk scoring), MIMIC-III/IV benchmark baselines (community reference implementations in Python), HL7 FHIR R4 standard (data extraction from EHR — Epic, Cerner, Oracle Health) Cost: High governance cost — IRB approval, data use agreements, clinical validation studies. Compute for training is moderate. Epic/Cerner EHR integration requires specialist implementation. MIMIC data is free with credentialed PhysioNet access. 
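Because clinical risk scores gate care-management outreach, calibration matters as much as discrimination. A sketch on synthetic EHR-style features, using scikit-learn's calibration_curve to measure the gap between predicted and observed risk; nothing here is clinically validated, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
n = 4000
# Hypothetical standardised EHR features: age, prior admits, comorbidity index
X = rng.normal(size=(n, 3))
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.6 * X[:, 1] - 1.0)))
readmitted = (rng.random(n) < p_true).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, readmitted, random_state=0)
risk = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Per-bin gap between mean predicted risk and observed outcome rate
frac_pos, mean_pred = calibration_curve(y_te, risk, n_bins=5,
                                        strategy="quantile")
calibration_gap = np.abs(frac_pos - mean_pred).max()
```

A large gap means the score cannot be read as a probability when prioritising outreach, even if ranking quality is good.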
--- ## Engagement & Response Propensity Modelling ID: prediction-propensity-engagement-response Category: Prediction & Forecasting > Propensity & Behavioral Modeling Complexity: low-to-medium | Phase: production Modalities: tabular (engagement history, recency/frequency features), email metadata When to use: You have send/open/click history for a communication channel, want to personalise send timing or content to improve response rates, and have sufficient volume to train per-user or per-segment models (typically >10k recipients with response history). When NOT to use: List is too small or too new to have meaningful response history. The underlying content is so poor that no targeting will improve response — fix the content first. Real need is A/B testing to improve baseline creative, not scoring. Key tools: Braze Predictive Suite (send-time optimisation, engagement likelihood — platform native), Klaviyo predictive analytics (built-in churn risk and CLV, e-commerce focused), XGBoost / LightGBM on extracted email engagement features (custom modelling), EconML (Microsoft, uplift/causal modelling for measuring incremental response), CausalML (Uber, uplift modelling toolkit) Cost: Low for platform-native tools (included in Braze/Klaviyo subscription pricing). Braze enterprise starts ~$60k/year. Custom models add MLOps overhead. Main cost is experiment infrastructure to validate incremental lift vs baseline. --- ## Credit & Underwriting Risk Scoring ID: prediction-risk-credit-underwriting Category: Prediction & Forecasting > Risk Scoring & Assessment Complexity: high | Phase: production — regulated Modalities: tabular (credit bureau data, transactional history, alternative data) When to use: You are assessing creditworthiness or insurability at origination or renewal, have historical loan/policy performance data with default/claim outcomes, and operate in a regulated lending or insurance context requiring documented model governance. 
When NOT to use: You need real-time transaction fraud scoring (route to Anomaly Detection). Customer base lacks sufficient default history (<500 observed defaults). Regulatory jurisdiction prohibits use of alternative data sources you plan to use (e.g., social media data is prohibited in EU credit scoring). Key tools: scorecardpy (Python WOE/IV scorecard library, open-source), XGBoost with SHAP explanations (gradient boosting + explainability for MRM compliance), H2O AutoML (automated feature selection and ensemble for credit modelling), Zest AI (specialist credit ML platform with regulatory workflow support), Provenir (cloud-native credit decisioning platform) Cost: High compliance cost — FCRA, ECOA (US); PRA/FCA model risk requirements (UK). Credit bureau data (Experian, Equifax, TransUnion) costs per-inquiry. Zest AI enterprise pricing ~$100k-$500k/year. Open-source scorecardpy is free but requires MRM infrastructure. --- ## Operational & Compliance Risk Scoring ID: prediction-risk-operational-compliance-risk Category: Prediction & Forecasting > Risk Scoring & Assessment Complexity: high | Phase: production — regulated Modalities: tabular (transaction data, entity attributes), graph/network data, document text (NLP for contract risk) When to use: You need to score entities (transactions, vendors, documents, customers) for regulatory or operational risk as part of a compliance workflow — AML, KYC, third-party due diligence, contract risk review — and can route high-risk flags to human review. When NOT to use: Real-time payment fraud scoring at millisecond latency (route to Anomaly Detection — different latency and feature requirements). Risk scores will be used for automated decisions without human-in-the-loop review in a regulated context (high regulatory risk). The scope is pure financial market risk (route to Financial Market Time Series). 
Key tools: Quantexa (entity resolution + network risk scoring for AML — enterprise platform), ComplyAdvantage (real-time sanctions/PEP/adverse media screening — API-based), LSEG World-Check / Refinitiv (sanctions and PEP screening data provider), Microsoft Purview (data classification and compliance risk, formerly Azure Purview), XGBoost + graph features (network centrality + entity linkage for custom AML models) Cost: High. Enterprise AML platforms (Quantexa, NICE Actimize) are $500k+/year. ComplyAdvantage API pricing starts ~$500/month for SMB tiers. LSEG World-Check enterprise licensing is $50k+/year. False positive management drives significant operational cost. --- ## Project & Delivery Risk Scoring ID: prediction-risk-project-delivery-risk Category: Prediction & Forecasting > Risk Scoring & Assessment Complexity: medium | Phase: emerging — most orgs use heuristics Modalities: tabular (project metadata, milestone history, team metrics), time series (burn rate, velocity trends) When to use: You have a portfolio of projects with historical delivery data (planned vs. actual schedule, cost, team composition, dependency complexity), need to prioritise management attention or resource allocation based on predicted delivery risk, and projects are sufficiently similar to pool training data. When NOT to use: Portfolio is too small (<50 historical projects) to build a credible model — use expert elicitation instead. Each project is so unique (bespoke enterprise transformation) that feature generalization breaks down. Real need is portfolio reporting and RAG status tracking, not predictive scoring. Key tools: Scikit-learn / XGBoost (tabular project feature modelling with historical delivery data), Planview Predictive Services (PPM platform-native risk scoring — enterprise), LinearB (software delivery metrics + ML-driven engineering insights), DORA metrics pipelines (deployment frequency, change fail rate as predictive features) Cost: Low-to-medium. 
Model training on internal PM tool data is cheap. Planview is enterprise-priced ($50k+/year). LinearB pricing starts ~$25/user/month. Platform-native risk scoring is included in existing PPM subscriptions (Planview, Clarity). --- ## Safety & Environmental Risk Scoring ID: prediction-risk-safety-environmental-risk Category: Prediction & Forecasting > Risk Scoring & Assessment Complexity: medium-to-high | Phase: emerging Modalities: tabular (incident history, site attributes), geospatial/satellite data, weather/climate data When to use: You have structured incident history, environmental monitoring data, or property/location attributes, need to proactively identify high-risk sites or conditions before incidents occur, and have an intervention or inspection resource to direct based on scores. When NOT to use: Incident history is too sparse to train a data-driven model — use domain expert checklists and fault tree analysis. Regulatory context requires deterministic risk assessments (many environmental permits require approved quantitative models, not ML). Real need is real-time sensor anomaly alerting (route to Anomaly Detection). Key tools: Scikit-learn / XGBoost (tabular incident/inspection/sensor data modelling), Jupiter Intelligence (climate risk analytics platform for assets and portfolios), Cervest / Terrascope (climate physical risk scoring platform), EPA ECHO database + open geospatial APIs (US environmental compliance open data), QGIS / GeoPandas (geospatial feature engineering for flood, climate, hazard risk) Cost: Moderate. Open geospatial and climate data (ERA5, NOAA, EPA ECHO) are free. Commercial climate risk platforms (Jupiter Intelligence) cost $50k-$500k/year depending on portfolio size. ESG reporting integration adds consulting overhead. 
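One way to turn incident history into an inspection priority list is a Poisson rate model over site attributes, sketched below with scikit-learn's PoissonRegressor on synthetic sites. Attribute names and effect sizes are invented.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(7)
n = 300
# Hypothetical site attributes: asset age (years), months since last
# inspection, flood-zone flag
X = np.column_stack([
    rng.uniform(0, 30, n),
    rng.uniform(0, 24, n),
    rng.integers(0, 2, n).astype(float),
])
true_rate = np.exp(0.05 * X[:, 0] + 0.06 * X[:, 1] + 0.5 * X[:, 2] - 3.0)
incidents = rng.poisson(true_rate)  # observed incident counts per site

model = PoissonRegressor(alpha=1e-4, max_iter=300).fit(X, incidents)
predicted_rate = model.predict(X)
priority = np.argsort(predicted_rate)[::-1][:10]  # top-10 sites to inspect
```

A rate model like this directs a fixed inspection budget; it does not replace the deterministic assessments some permits require.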
--- ## Price & Valuation Estimation ID: prediction-regression-price-valuation Category: Prediction & Forecasting > Regression & Quantitative Estimation Complexity: medium | Phase: production Modalities: tabular (asset attributes, comparable transactions), geospatial data, image data (property condition) When to use: You need to estimate a fair market value or cost for a large volume of assets where individual expert appraisal is too slow or expensive (real estate, auto, insurance claims), and comparable transaction data is available at sufficient volume. When NOT to use: Asset market is too thin — fewer than a few hundred comparable transactions in the relevant geography/period. Valuation context is litigation or regulatory use where a signed professional appraisal is legally required. Dynamic real-time pricing optimisation is the goal (route to Optimisation). Key tools: XGBoost / LightGBM (tabular AVM — industry standard for real estate and auto valuation), CoreLogic AVM (off-the-shelf automated valuation for US real estate — commercial data + model), Hedonic regression via scikit-learn (interpretable baseline, required for regulatory contexts), GeoPandas + OpenStreetMap features (geospatial feature engineering for property models), Zillow Zestimate (US residential estimates — methodology is public, limited API access) Cost: Medium. Training data (MLS comparables, deed records) has licensing costs (CoreLogic data: $10k-$100k+/year). Compute for training is moderate. CoreLogic/Zillow commercial APIs charge per-query (~$0.05-$2/query at scale). 
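The hedonic-regression baseline the entry recommends for regulated contexts can be sketched as a log-linear model over attributes of synthetic comparables; the coefficients and the subject property are invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 500
# Hypothetical comparables: floor area (sqm), bedrooms, distance to centre (km)
X = np.column_stack([
    rng.uniform(40, 200, n),
    rng.integers(1, 6, n).astype(float),
    rng.uniform(0, 20, n),
])
# Hedonic form: log price roughly linear in attributes (invented coefficients)
log_price = (11.0 + 0.006 * X[:, 0] + 0.05 * X[:, 1] - 0.02 * X[:, 2]
             + rng.normal(0, 0.08, n))

model = LinearRegression().fit(X, log_price)
subject = np.array([[95.0, 3.0, 5.0]])
estimate = np.exp(model.predict(subject))[0]  # back to currency units
```

Fitting in log space keeps each coefficient interpretable as an approximate percentage effect on value, which is why this form survives regulatory review.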
--- ## Customer Lifetime Value & Revenue Estimation ID: prediction-regression-clv-revenue-estimation Category: Prediction & Forecasting > Regression & Quantitative Estimation Complexity: medium | Phase: production Modalities: tabular (transaction history, subscription data, RFM features) When to use: You want to segment customers by long-run economic value for acquisition budget allocation, pricing decisions, or retention prioritisation, and have at least 12-24 months of transaction or subscription history to calibrate lifetime estimates. When NOT to use: Business is pre-product-market-fit with fewer than 6 months of customer history — insufficient data for lifetime estimation. The goal is short-horizon revenue forecasting at aggregate level (route to Financial Time Series Forecasting). Real-time personalisation based on next purchase probability is the need (route to Conversion Propensity). Key tools: PyMC-Marketing (BG/NBD + Gamma-Gamma CLV — probabilistic, best for non-contractual), Lifetimes library (Python, BG/NBD and related models — lightweight alternative to PyMC-Marketing), XGBoost regression (direct CLV regression for contractual SaaS contexts), dbt + SQL (feature pipelines for RFM and cohort-based CLV at data warehouse layer) Cost: Low. BG/NBD and regression models train quickly on standard tabular transaction data. PyMC-Marketing and Lifetimes are open-source. Main investment is cohort tracking infrastructure and data pipeline. 
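For contractual subscriptions, the simplest lifetime-value estimate is the standard discounted geometric-retention formula — margin times retention over (1 + discount − retention). The sketch below applies it to per-segment retention estimates a churn model might produce; all figures are illustrative.

```python
# Contractual-subscription CLV with constant per-period retention r,
# margin m and discount rate d: the expected discounted value of a
# geometric lifetime is m * r / (1 + d - r).
def clv(monthly_margin, retention, monthly_discount):
    return monthly_margin * retention / (1 + monthly_discount - retention)

# Hypothetical retention estimates per segment from a churn model
segments = {"high": 0.97, "mid": 0.92, "low": 0.80}
values = {name: clv(30.0, r, 0.01) for name, r in segments.items()}
```

For non-contractual purchasing, where churn is unobserved, the BG/NBD family listed above replaces this closed form.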
--- ## Resource & Cost Estimation ID: prediction-regression-resource-cost-estimation Category: Prediction & Forecasting > Regression & Quantitative Estimation Complexity: medium | Phase: production — but often combined with expert judgment Modalities: tabular (project parameters, historical actuals), structured documents (scope descriptions for NLP feature extraction) When to use: You have historical project or operational data with actual effort/cost outcomes, need to estimate cost at project initiation or for bid pricing, and have enough historical comparable projects (typically 100+ for ML; 30+ for regression baselines) to train from. When NOT to use: Project or task is highly novel with no historical comparables. The goal is budget variance monitoring on in-flight projects (route to time series anomaly detection). Legal/contractual fixed-price commitment requires a defensible expert estimate, not a model output. Key tools: Scikit-learn linear/polynomial regression (transparent baseline, preferred for explainability), XGBoost / Random Forest regression (handles non-linear relationships in project features), COCOMO II (parametric software cost estimation model, widely used baseline), Monte Carlo simulation via Python scipy/numpy or @RISK (Lumivero) for uncertainty quantification Cost: Low-to-medium. Training on internal historical project data. Main cost is data collection (historical actuals are often inconsistently recorded). @RISK (Lumivero) commercial license ~$1,500-$3,000/year; Python scipy is free. 
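The Monte Carlo approach named in the tools list, sketched with numpy: three-point (low/mode/high) effort estimates per work package, sampled from triangular distributions and summed to derive a percentile-based contingency. The package figures are invented.

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical work packages with three-point (low, mode, high) effort
# estimates in person-days, elicited from estimators
packages = [(10, 15, 25), (20, 30, 55), (5, 8, 12), (40, 60, 100)]

n = 100_000
total = sum(rng.triangular(lo, mode, hi, size=n) for lo, mode, hi in packages)

p50, p80 = np.percentile(total, [50, 80])
contingency = p80 - p50  # buffer needed to quote at 80% rather than 50%
```

Quoting at p80 rather than the sum of modes makes the padding explicit and auditable, which single-point expert estimates cannot do.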
--- ## Performance & Outcome Prediction ID: prediction-regression-performance-outcome-prediction Category: Prediction & Forecasting > Regression & Quantitative Estimation Complexity: medium | Phase: production — with ethical review Modalities: tabular (historical performance features), time series (longitudinal academic/athletic records) When to use: You have historical outcome data with measurable numeric targets (test scores, treatment effect sizes, athletic metrics), features observed prior to the outcome, and a decision or intervention that the prediction will inform (resource allocation, eligibility, treatment selection). When NOT to use: Outcome is binary pass/fail (route to Classification or Propensity). The real need is measuring the causal effect of an intervention (route to causal inference / A/B testing framework). Outcome is highly subjective and rater-dependent — model will encode rater biases as ground truth. Key tools: XGBoost / LightGBM regression (tabular outcome prediction — performance, grades, KPIs), Scikit-learn (preprocessing pipelines, model selection, Fairlearn for fairness evaluation), Stan / PyMC (Bayesian hierarchical models for student/team outcomes with group structure), EconML (Microsoft, causal/treatment effect estimation when intervention effect is needed), DoWhy (Microsoft, causal reasoning and effect identification library) Cost: Low-to-medium. Training data collection and labeling is often the main cost (ground truth outcome labels require time to accumulate). Compute is cheap. Fairness auditing (Fairlearn) and stakeholder communication add overhead. --- ## Enterprise Knowledge & Document Search ID: search-semantic-enterprise-knowledge-search Category: Search & Retrieval > Semantic Search Complexity: medium | Phase: retrieval Modalities: text When to use: When employees waste time hunting for policies, procedures, or internal documentation across multiple systems (Confluence, SharePoint, Notion, Google Drive). 
Especially valuable when onboarding volume is high, documentation is scattered, or teams duplicate effort because they can't find existing work. When NOT to use: When the corpus is small enough that a well-maintained wiki with good navigation suffices. Avoid if you need generated answers from the documents — that's RAG (internal knowledge assistant). Don't use if the primary problem is that documentation is out of date or missing rather than unfindable. Key tools: Elasticsearch / OpenSearch with dense vector (kNN, GA in ES 8.x+, bbq_hnsw default in 9.x), Azure AI Search with semantic ranker (native SharePoint/M365 connector, 1,000 free semantic queries/month), Glean (enterprise search SaaS: Search, Assistant, Agents, 100+ connectors, enterprise-pricing), Coveo (enterprise search with ML relevance tuning, cloud-SaaS), Weaviate / Qdrant (self-hosted vector store with HNSW) Cost: Azure AI Search semantic ranker: free tier 1,000 queries/month, then pay-as-you-go per 1,000 requests. Glean: enterprise contract (demo-gated pricing). Coveo: enterprise SaaS subscription. Elasticsearch self-hosted: infra + engineering; Elastic Cloud managed: usage-based. Main ongoing cost driver is re-embedding on document changes. --- ## Product & Catalogue Semantic Search ID: search-semantic-product-catalogue-search Category: Search & Retrieval > Semantic Search Complexity: medium | Phase: retrieval Modalities: text, text+image (extended: see visual search node) When to use: When customers describe what they want in natural language but the product catalogue uses technical SKU names, category codes, or attribute jargon that keyword search can't bridge. High-value for fashion ('something casual for a summer wedding'), B2B parts ('hex bolt that fits M8 thread for outdoor use'), and marketplace listing search where seller descriptions vary widely. 
When NOT to use: When shoppers know exactly what they want and search by SKU, brand+model, or exact product name — keyword search is faster and more predictable. Avoid pure semantic if your product attributes are sparse or inconsistent; the embedding will amplify noise. Don't conflate with personalised recommendation — if user history matters more than the current query, use a recommendation system. Key tools: Algolia AI Search / NeuralSearch (hybrid keyword + semantic; NeuralSearch on Elevate/annual plan only), Elasticsearch with ELSER v2 (learned sparse retrieval, GA, approx 90% throughput improvement over v1), Coveo Commerce (ML relevance for e-commerce, enterprise SaaS), OpenSearch with k-NN plugin (open source, AWS-managed option), Vespa.ai (query-time ML re-ranking, open source, active) Cost: Algolia Grow plan: $0.50/1k searches + $0.40/1k records after free tier. Grow Plus: $1.75/1k searches. NeuralSearch requires Elevate (annual contract, custom pricing). Elasticsearch ELSER v2 self-hosted is infra cost; Elastic Cloud is usage-based. Vespa.ai is fully open source, infra cost only. --- ## Code & Repository Semantic Search ID: search-semantic-code-repository-search Category: Search & Retrieval > Semantic Search Complexity: high | Phase: retrieval Modalities: code, text→code cross-modal When to use: When developers spend significant time finding existing utilities, understanding how a library is used across a large codebase, or matching a bug report to the likely code location. Especially valuable in large mono-repos (millions of LOC) where keyword grep misses intent ('find the function that validates email addresses' vs. grepping for 'email'). When NOT to use: When the codebase is small (< 100k LOC) and developers have good IDE navigation. Not appropriate if the goal is to generate new code (code generation node) or answer questions about code with synthesised explanations (RAG). 
Avoid for security-sensitive codebases where exposing code to external embedding APIs is a compliance risk. Key tools: Sourcegraph Enterprise Search (semantic + structural + Deep Search agentic NL search; $49/user/month cloud), GitHub Copilot Workspace / GitHub semantic search (cloud, GitHub-hosted repos, per-seat), Codeium / Continue.dev with local embedding index (IDE-native, free tier available), tree-sitter + custom embedding pipeline (DIY, AST-aware chunking at function/class boundaries), Voyage AI voyage-code-3 (code embedding model, 32k token context, $0.18/million tokens) Cost: Sourcegraph Enterprise: $49/user/month (single-tenant cloud). GitHub Copilot: per-seat subscription. Codeium: free tier for individuals. Voyage AI code-3 embeddings: $0.18/million tokens (200M free tokens/month). DIY tree-sitter pipeline: engineering time upfront + vector store infra cost. --- ## Legal & Regulatory Corpus Search ID: search-semantic-legal-regulatory-search Category: Search & Retrieval > Semantic Search Complexity: high | Phase: retrieval Modalities: text When to use: When legal or compliance teams need to find relevant precedents, regulatory obligations, or contract clauses across large corpora — faster than manual review, with intent-aware matching that catches semantic equivalents with different terminology. Highest ROI in due diligence, compliance gap analysis, and patent searches. When NOT to use: When authoritative exact-match is required (statute number lookup, specific citation retrieval) — BM25 keyword is more reliable and auditable. Avoid autonomous use without attorney review for consequential decisions; semantic search surfaces candidates, not conclusions. Not appropriate where jurisdiction-specific legal training is needed and the embedding model wasn't trained on that corpus. 
Key tools: Westlaw Edge / LexisNexis AI (managed legal search, jurisdiction-aware, enterprise subscription), Relativity aiR (e-discovery with AI-assisted review: aiR for Review, Privilege, Case Strategy, Data Breach), Qdrant or Weaviate with legal-domain embeddings (self-hosted, Voyage AI voyage-law-2 model), LlamaIndex with metadata filtering for jurisdiction/date (DIY RAG pipeline), CaseMine (AI-powered legal research, case law similarity; a successor option to the discontinued ROSS Intelligence) Cost: Westlaw Edge and LexisNexis AI: enterprise subscription (typically $10k-$100k+/year, firm-size dependent, no public pricing). Relativity aiR: per-seat/usage pricing within RelativityOne. Voyage AI voyage-law-2: $0.12/million tokens. DIY Qdrant self-hosted: infra cost only; Qdrant Cloud: usage-based. --- ## Customer Support & FAQ Answer Generation ID: search-rag-customer-support-qa Category: Search & Retrieval > Retrieval-Augmented Generation (RAG) Complexity: low-medium | Phase: retrieval + generation Modalities: text When to use: When tier-1 support volume is high, questions are repetitive and answerable from existing documentation, and the cost of a human agent handling each query exceeds the cost of RAG infrastructure. Sweet spot: SaaS products, consumer electronics, insurance, telecoms — anywhere with a large, structured knowledge base and high FAQ volume. When NOT to use: When the query requires account-specific actions (refunds, account changes) — that's an agent with tool use, not RAG. Avoid for emotionally charged support contexts (healthcare emergencies, financial distress) where empathetic human handling is required. Don't deploy if the knowledge base is sparse, poorly maintained, or contains contradictory information — RAG will confidently answer from bad sources. 
Key tools: Fin by Intercom (fin.ai; RAG-native AI agent, $0.99/resolved outcome; works standalone with Zendesk/Salesforce/HubSpot), Zendesk AI (Answer Bot + generative AI tier, enterprise pricing), LangChain + OpenAI with knowledge base ingestion (DIY; current RAG tutorial at docs.langchain.com), Salesforce Einstein Bots with generative AI (Salesforce-native), Freshdesk Freddy AI (Freshworks customer support AI) Cost: Fin (Intercom): $0.99/resolved outcome (50 outcome minimum/month) + $29/seat/month with Intercom Suite; 14-day free trial. Zendesk AI: enterprise SaaS, pricing varies by plan tier. LangChain DIY: LLM API cost per query ($0.002-$0.06/query depending on model). Freshdesk Freddy: included in higher-tier Freshdesk plans. --- ## Internal Knowledge Assistant ID: search-rag-internal-knowledge-assistant Category: Search & Retrieval > Retrieval-Augmented Generation (RAG) Complexity: medium | Phase: retrieval + generation Modalities: text When to use: When employees repeatedly ask the same questions to HR, IT, legal, or finance — consuming expensive specialist time on questions that are answerable from existing policy documents. Particularly high ROI during onboarding, when policy documentation is extensive but inconsistently accessed, and in distributed/remote orgs where people can't walk over to ask a colleague. When NOT to use: When policies are in flux (frequent rewrites mean the knowledge base goes stale faster than you can re-embed). Avoid if the primary friction is that documentation doesn't exist or is genuinely ambiguous — RAG will confidently generate wrong answers. Don't use for queries that require real-time system access (payslip lookup, leave balance) without integrating tool use. 
Key tools: Microsoft 365 Copilot (SharePoint/Teams/Outlook RAG; approx $18/user/month annual for business, requires M365 base plan), Glean Chat (RAG over all connected enterprise sources, 100+ connectors, enterprise pricing), Guru (internal knowledge base with AI Q&A; from $25/seat/month annual), LlamaIndex + Azure OpenAI (DIY, integrates with Azure AD for ACL enforcement), Notion AI Q&A (RAG within Notion workspace, add-on to Notion plans) Cost: Microsoft 365 Copilot: approx $18/user/month (annual, business tier up to 300 users); requires separate M365 base subscription. Glean: enterprise contract, demo-gated. Guru: from $25/seat/month (annual). Notion AI: add-on per seat. LlamaIndex DIY: LLM API cost + vector store infra. --- ## Research & Literature Synthesis Assistant ID: search-rag-research-synthesis-assistant Category: Search & Retrieval > Retrieval-Augmented Generation (RAG) Complexity: medium-high | Phase: retrieval + generation Modalities: text, text+tables (structured documents) When to use: When analysts, researchers, or strategists need to synthesise findings across large document corpora — earnings reports, academic literature, regulatory filings, competitor reports — faster than manual reading but with traceable sourcing. High value in financial services, pharma/biotech R&D, strategy consulting, and academic research. When NOT to use: When the task requires forming original research conclusions or novel insights (the LLM will synthesise what's already in the corpus, not generate new hypotheses). Avoid when the corpus is not curated — web-scraped or unfiltered documents degrade synthesis quality significantly. Not appropriate for real-time news synthesis where recency is critical and documents aren't pre-indexed. 
Key tools: Elicit (AI research assistant for academic papers; subscription-based, elicit.com), Perplexity Pro / Perplexity for Teams (RAG over web and custom document sets), LangChain + LlamaIndex research pipeline (DIY, multi-hop retrieval support), Anthropic Claude with large context window (direct document synthesis for corpora up to approx 200k tokens without a vector store), SEC EDGAR full-text search + GPT-4o / Claude for financial filings synthesis (DIY) Cost: Elicit: subscription (check elicit.com/pricing for current tiers). Claude Opus 4 / GPT-4o synthesis with 50k+ token contexts: $0.50-$5 per synthesis query at current API rates. Perplexity for Teams: per-seat subscription. LlamaIndex DIY: LLM API cost per query. Cost scales with synthesis context window size, not corpus size. --- ## Conversational Product & Sales Assistant ID: search-rag-conversational-product-assistant Category: Search & Retrieval > Retrieval-Augmented Generation (RAG) Complexity: medium | Phase: retrieval + generation Modalities: text When to use: When the pre-sales discovery and product configuration phase is document-heavy and slows down the sales cycle — complex B2B products with technical specs, large case study libraries, or CPQ processes where sales reps spend hours manually retrieving the right examples. High ROI in enterprise software sales, professional services, manufacturing capital equipment. When NOT to use: When the product catalogue is simple enough for a structured configurator or product page (no RAG needed). Avoid if sales conversations require sensitive negotiation or trust-building that an AI assistant would undermine. Not a replacement for a fully autonomous sales agent — this node covers grounded Q&A during human-led sales, not autonomous outreach. 
Key tools: Seismic with Aura AI (sales enablement + generative AI; Aigenius rebranded to Aura AI + Summer AI conversational agent), Highspot (agentic sales enablement platform, AI-powered content delivery and coaching), Salesforce Einstein Copilot (Salesforce-native RAG over deal room and CRM), LangChain + Pinecone + Anthropic Claude (DIY sales assistant, flexible corpus ingestion), Showpad Coach (sales readiness + product Q&A, integrated learning) Cost: Seismic and Highspot: enterprise SaaS, significant annual contracts (six figures typical). Salesforce Einstein Copilot: bundled with Salesforce enterprise plans. DIY (LangChain + Claude): LLM API cost per conversation turn, vector store hosting; lower TCO at high volume. Showpad: enterprise SaaS. --- ## Content & Media Personalisation ID: search-recommendation-content-personalisation Category: Search & Retrieval > Recommendation Systems Complexity: medium-high | Phase: ranking / personalisation Modalities: text metadata, text+image (thumbnail signals), behavioural event streams When to use: When a content platform has enough user behaviour history (implicit signals: clicks, watch time, completions) and enough content volume that manual curation or simple recency-sorting leaves engagement on the table. Most valuable when content consumption is habitual and repeated (daily news, streaming, podcast platforms) and users haven't expressed explicit preferences. When NOT to use: When content volume is too small for collaborative filtering to find meaningful patterns (< ~1000 items). Avoid when the user base is new (cold start problem is severe). Don't use pure recommendation if editorial integrity or serendipity matters — recommendation systems optimise for engagement, which can create filter bubbles or surface sensational content. 
Key tools: AWS Personalize (managed collaborative filtering + content-based; free tier: 20GB data, 100h training, 180k requests/month), Vertex AI Search for Commerce (formerly Google Recommendations AI, now unified under Vertex AI Search for commerce), Recombee (recommendation SaaS, flexible event tracking, 30-day free trial), TensorFlow Recommenders (open source, DIY deep learning recommendations), Bloomreach Discovery (AI merchandising + content recommendation for digital commerce) Cost: AWS Personalize: no minimum fees; free tier 2 months then PAYG (per data processed, training hours, API calls). Vertex AI Search for Commerce: per-request pricing. Recombee: free trial then usage-based (contact for pricing). TensorFlow Recommenders: open source, infra cost only. --- ## E-Commerce Product Recommendation ID: search-recommendation-ecommerce-product Category: Search & Retrieval > Recommendation Systems Complexity: medium | Phase: ranking / personalisation Modalities: behavioural event streams (purchase, view, add-to-cart), product metadata When to use: When returning visitors have enough purchase/browse history to produce meaningful personalised recommendations, and the catalogue has enough items to make non-obvious suggestions. Highest ROI on homepage merchandising for returning users, cart cross-sell, and post-purchase follow-up email with next purchase prediction. When NOT to use: When average order frequency is very low (< 1 purchase/year) — not enough signal for collaborative filtering. Avoid if the product line is too small (< a few hundred SKUs) for meaningful related-item patterns. Don't conflate with semantic search re-ranking driven by the current query — if query intent matters more than user history, that's a different system. 
Key tools: Mastercard Dynamic Yield (enterprise personalisation + recommendation, owned by Mastercard), Bloomreach Commerce (AI merchandising + recommendation, active product), Constructor (e-commerce search + recommendation, AI-first; domain is constructor.com), AWS Personalize (managed, purchase event data, free tier available), Barilliance (e-commerce personalisation SaaS, smaller enterprise focus) Cost: Dynamic Yield (Mastercard) and Bloomreach: enterprise-priced (typically $100k+/year). Constructor: enterprise custom pricing. AWS Personalize: free tier then PAYG. ROI benchmark: well-tuned recommendations typically drive 2-8% uplift in average order value. --- ## Talent & Job Matching ID: search-recommendation-talent-job-matching Category: Search & Retrieval > Recommendation Systems Complexity: high | Phase: ranking / matching Modalities: text (CV, job description), structured skills ontology When to use: When a talent marketplace, job board, ATS, or internal mobility platform needs to reduce recruiter screening time or improve candidate-to-job fit beyond keyword matching. High value in high-volume hiring (thousands of applications per role), internal talent marketplaces (large enterprises with frequent role changes), and gig platforms with shift-to-worker matching. When NOT to use: When the matching criteria are fully structured and rule-based (must have security clearance, must be in specific location) — a filter suffices, not ML. Avoid autonomous matching-to-hire workflows without human review gates; bias auditing is required before any consequential use. Don't use if historical hiring data encodes systematic bias — the model will amplify it. 
Key tools: Eightfold AI (talent intelligence platform, skills-based matching across 1.6B+ career profiles), Beamery (talent CRM + AI matching; Agentic AI Ray for skills-based recommendations, integrates with Workday/SAP), LinkedIn Recruiter with AI recommendations (managed, per-seat), Textkernel (semantic CV parsing + matching, B2B; includes LLM Parser with OpenAI integration), Phenom (HXP platform with AI-driven job matching and internal mobility) Cost: Eightfold AI and Beamery: enterprise-licensed (typically 6-figure annual, demo-gated). LinkedIn Recruiter: per-seat subscription (varies by tier). Textkernel: B2B licensing, contact for pricing. DIY skills matching with ESCO/O*NET: engineering overhead + model training cost. --- ## B2B Account & Opportunity Recommendation ID: search-recommendation-b2b-account-recommendation Category: Search & Retrieval > Recommendation Systems Complexity: high | Phase: ranking / prioritisation Modalities: structured firmographic data, intent signals (web behaviour), CRM history When to use: When a B2B sales team needs to prioritise which accounts to pursue next — too many prospects and not enough signal to manually prioritise. High value in territory planning, partner co-selling (which partner to recommend to which prospect), institutional investor deal sourcing, and next-product cross-sell within an existing account base. When NOT to use: When the sales motion is heavily relationship-driven and account selection is owned by senior relationship managers who won't act on algorithmic recommendations. Avoid if the firmographic data quality is poor (accounts without complete industry/size/revenue metadata are hard to embed). Not appropriate for consumer-facing B2C recommendation, even if the consumer is a business (see description boundary). 
Key tools: 6sense (account-based marketing + AI intent signals, enterprise; actively maintained), Demandbase (ABM + AI account recommendation, Pipeline AI, intent detection), Bombora (B2B intent data from 18,000+ publishers, feeds account scoring models), Salesforce Einstein Account Scoring (CRM-native lead and account scoring), HG Insights (technology install data + AI revenue growth intelligence for targeting) Cost: 6sense and Demandbase: enterprise-priced (typically $60k-$200k+/year depending on contact volume and features). Bombora intent data: separately licensed. HG Insights: enterprise contract. ROI requires high-velocity sales motion; low-volume enterprise sales may not justify analytics overhead. --- ## Visual & Image Search ID: search-multimodal-image-search Category: Search & Retrieval > Multimodal & Cross-Modal Retrieval Complexity: medium-high | Phase: retrieval Modalities: image, image→text cross-modal When to use: When users have an image and want to find visually similar items — e-commerce 'shop the look', digital asset management retrieval, brand logo detection, fashion visual search. Also valuable in industrial applications: finding similar defects in quality inspection image libraries, or similar engineering components from a photo. When NOT to use: When the primary search signal is text (most enterprise search is text-driven). Avoid if your image corpus lacks enough volume and quality for embeddings to produce meaningful similarity clusters (< a few thousand labelled images). Not appropriate for facial recognition applications (significant legal and ethical constraints in most jurisdictions). 
Key tools: Google Vision API + Vertex AI Vector Search (managed, scalable, per-image-call pricing), AWS Rekognition + OpenSearch with k-NN (AWS-native stack, stored video and streaming options), CLIP (OpenAI open source, foundation for most modern visual search; 13.5k GitHub stars, actively maintained), Weaviate with multi2vec-clip module (self-hosted multimodal vector store, docs at docs.weaviate.io), Pinecone + CLIP embeddings (managed vector store, per-query pricing) Cost: Google Vision API: per-image-call (see cloud.google.com/vision/pricing). CLIP inference: GPU compute required, self-hosted GPU infra or cloud GPU instances. Pinecone/Weaviate Cloud: per-query + storage. For product catalogues with millions of SKUs, embedding generation is a one-time batch cost; serving is the ongoing cost. --- ## Text + Image Joint Retrieval ID: search-multimodal-text-image-joint-retrieval Category: Search & Retrieval > Multimodal & Cross-Modal Retrieval Complexity: high | Phase: retrieval Modalities: text, image, combined text+image query When to use: When queries naturally combine visual and textual information — 'find marketing materials that look like this logo but with blue colour scheme', product search with 'a bag like this but in leather', or multimodal RAG where documents contain both figures and text that must be retrieved together for a complete answer. When NOT to use: When queries are consistently either text-only or image-only — adding the complexity of joint retrieval without multimodal query patterns adds engineering cost without benefit. Avoid if your document corpus is text-only (no meaningful visual content). Not appropriate for video-temporal retrieval (see video retrieval node). 
Key tools: CLIP (joint embedding space for text and image, OpenAI, open source foundation model), OpenCLIP (open source CLIP variants, community fine-tunes; 13.5k stars, SigLIP2 models added, actively maintained), Weaviate with multi2vec-clip module (self-hosted, docs at docs.weaviate.io), GPT-4o / Claude 3.5+ Vision (multimodal RAG where the LLM processes image chunks directly), Marqo (multimodal vector search, purpose-built for text+image; marqo.ai) Cost: CLIP/OpenCLIP: open source, GPU compute cost only. GPT-4o / Claude Vision: per-image-token (5-20x more expensive than text-only RAG). Marqo: managed cloud service, contact for pricing. Weaviate self-hosted: infra cost. Multimodal RAG with vision LLMs is typically 5-20x more expensive per query than text-only RAG. --- ## Video & Audio Content Retrieval ID: search-multimodal-video-retrieval Category: Search & Retrieval > Multimodal & Cross-Modal Retrieval Complexity: high | Phase: retrieval Modalities: video, audio (speech), text-in-video (OCR), scene description When to use: When users need to find specific moments, clips, or videos from a large library based on what was said or what happened — media asset management, e-learning platform content search, video surveillance clip retrieval, podcast search, or broadcast media archive search. High value when the library has thousands of hours of content not searchable by transcript alone. When NOT to use: When the video library is small enough that manual tagging suffices, or when queries are primarily metadata-driven (date, speaker, programme title). Avoid if video content is too noisy or low-quality for scene understanding (CCTV footage with poor resolution). Not appropriate for real-time video stream search — this node covers library retrieval, not live stream analysis. 
Key tools: Twelve Labs (video understanding API: scene, speech, text-in-video, embedding; twelvelabs.io), Google Video Intelligence API / Vertex AI Video Intelligence (cloud.google.com/video-intelligence), AWS Rekognition Video (stored video analysis + streaming events, S3-based), OpenAI Whisper (speech-to-text preprocessing; v20250625 latest release, 96k GitHub stars, MIT license), FAISS / Qdrant with frame-level CLIP embeddings (DIY, frame-level retrieval) Cost: Twelve Labs: per minute of video indexed (check twelvelabs.io/pricing). Google Video Intelligence: per minute processed. AWS Rekognition Video: per minute analyzed. Whisper: open source, GPU compute cost only. Video retrieval infrastructure typically 10-50x more expensive per unit than text retrieval. --- ## Document Layout-Aware Retrieval ID: search-multimodal-document-layout-retrieval Category: Search & Retrieval > Multimodal & Cross-Modal Retrieval Complexity: high | Phase: retrieval Modalities: document image (page-as-image), structured table data, extracted figures When to use: When documents contain information encoded in layout (tables, figures, charts, diagrams, page structure) that would be lost by extracting only text — technical manuals with schematics, scientific papers with data tables, financial reports with charts, patent documents with drawings, regulatory filings with structured forms. The retrieval query may be about information that only exists visually. When NOT to use: When documents are plain-text with no meaningful layout (narrative reports, policy text, most emails). Avoid if you have a well-structured database behind the documents — retrieve from the database directly rather than from PDF representations of that data. Not appropriate when document processing costs are prohibitive and the incremental improvement over text-only retrieval is small. 
Key tools: ColPali (Visual RAG: embed document pages as images, no OCR needed; ICLR 2025 paper, open source), Unstructured.io (document parsing with layout preservation; 15k pages free, then $0.03/page PAYG), LlamaParse (document parser for RAG, table and layout aware; tiers: Cost Effective, Agentic, Agentic Plus, Fast), Azure AI Document Intelligence (Form Recognizer: structure extraction from PDFs; 500 pages free/month, then PAYG), Docling (IBM open source, multi-format parsing with layout; v2.80.0 released March 2026, 56k GitHub stars) Cost: Unstructured.io: 15,000 pages free (no expiry), then $0.03/page PAYG. LlamaParse: credit-based tiers (see developers.llamaindex.ai for pricing). Azure AI Document Intelligence: 500 pages free/month (F0 tier), then per-page PAYG with commitment tier discounts. ColPali: open source, GPU inference cost. Docling: open source, compute cost only. --- ## Workforce & Shift Scheduling ID: optimization-resource-workforce-scheduling Category: Optimization & Planning > Resource Allocation & Scheduling Complexity: high | Phase: operational Modalities: tabular, structured_constraints When to use: Use when you need to assign employees or contractors to shifts, time slots, or tasks subject to availability, skills, labour law, SLA targets, and fairness constraints — and the constraint space makes manual or spreadsheet-based scheduling infeasible at scale. When NOT to use: Avoid when headcount is very small (< 15 staff) and constraints are simple — spreadsheets or manual scheduling are faster to operate. Also avoid when the primary bottleneck is physical routing rather than time-slot assignment (use field service dispatch instead). 
Key tools: AIMMS / IBM ILOG CPLEX (MIP solvers for constraint-heavy rostering), Google OR-Tools (open-source CP/MIP — strong for shift scheduling), Gurobi (commercial MIP solver, used in workforce optimisation vendors), Deputy / Humanforce / NICE IEX WFM (commercial WFM platforms with embedded optimisation) Cost: Solver licences range from free (OR-Tools) to $50k+/yr (Gurobi enterprise). Commercial WFM platforms typically $10–30/user/month. Build vs buy decision hinges on whether constraint customisation is needed. --- ## Capacity & Production Planning ID: optimization-resource-capacity-production-planning Category: Optimization & Planning > Resource Allocation & Scheduling Complexity: high | Phase: tactical Modalities: tabular, structured_constraints When to use: Use when you need to allocate scarce machine time, theatre slots, compute capacity, or facility throughput across competing job orders or demand, subject to changeover costs, capacity ceilings, and delivery deadlines. When NOT to use: Avoid when production is fully unconstrained (infinite capacity) or when the problem is purely a sequencing/scheduling problem without resource competition. Also avoid for workforce time allocation (use workforce scheduling) or for deciding what to produce vs. what to purchase (that is S&OP, not pure optimisation). Key tools: Google OR-Tools (CP-SAT solver — strong for job shop and capacity planning), IBM ILOG CPLEX / Gurobi (MIP solvers for large-scale production allocation), SAP IBP / Kinaxis RapidResponse (commercial S&OP/capacity planning platforms), Simul8 / AnyLogic (discrete-event simulation for capacity stress-testing) Cost: Solver costs as per workforce scheduling. Commercial S&OP platforms (SAP IBP, Kinaxis) are enterprise-tier ($200k–$1M+ implementation). OR-Tools is free and handles medium-scale instances well. 
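The allocation problem this entry describes can be stated concretely: assign each job to one machine, respect each machine's capacity ceiling, and minimise the makespan. A stdlib brute-force sketch of that formulation on an invented toy instance follows; at real scale the same model would be handed to OR-Tools CP-SAT or a MIP solver rather than enumerated.

```python
from itertools import product

# Toy instance (invented): job processing hours and machine capacity ceilings.
JOBS = {"J1": 6, "J2": 4, "J3": 7, "J4": 3, "J5": 5}
MACHINES = {"M1": 14, "M2": 12}

def best_assignment(jobs: dict, machines: dict):
    """Exhaustively assign every job to one machine, minimising the makespan
    (max machine load) while respecting each machine's capacity ceiling."""
    names = list(jobs)
    best, best_makespan = None, None
    for choice in product(machines, repeat=len(names)):
        load = {m: 0 for m in machines}
        for job, machine in zip(names, choice):
            load[machine] += jobs[job]
        if any(load[m] > machines[m] for m in machines):
            continue  # violates a capacity ceiling
        makespan = max(load.values())
        if best_makespan is None or makespan < best_makespan:
            best, best_makespan = dict(zip(names, choice)), makespan
    return best, best_makespan
```

The exhaustive loop is exponential in job count (machines^jobs assignments), which is exactly why CP-SAT and MIP solvers are the key tools listed above.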
--- ## Project & Portfolio Resource Allocation ID: optimization-resource-project-resource-allocation Category: Optimization & Planning > Resource Allocation & Scheduling Complexity: medium | Phase: tactical Modalities: tabular, structured_constraints When to use: Use when you have a portfolio of projects competing for shared specialists, and you need to make allocation decisions that respect skill requirements, deadlines, priorities, and resource levelling constraints across the portfolio — not just within a single project. When NOT to use: Avoid when projects are fully independent (no shared resources). Avoid for purely financial portfolio allocation decisions (use financial portfolio optimisation). Avoid when the problem reduces to a single-project critical path — that is project scheduling, not optimisation. Key tools: Microsoft Project / Primavera P6 (portfolio resource levelling — mature, widely deployed), Planview Portfolios / Tempus Resource (enterprise PPM with optimisation), Google OR-Tools (custom MIP/CP formulations for portfolio allocation), Python PuLP or OR-Tools (for bespoke allocation solvers embedded in toolchains) Cost: MS Project/Primavera are mid-range ($30–100/user/month). Enterprise PPM platforms (Planview) are $100k+/yr. Custom solver builds require 2–6 weeks of analyst/engineer time but give full flexibility. --- ## Financial Portfolio & Budget Optimisation ID: optimization-resource-financial-portfolio-optimisation Category: Optimization & Planning > Resource Allocation & Scheduling Complexity: high | Phase: strategic Modalities: tabular, time_series When to use: Use when you need to allocate a fixed budget or capital pool across competing assets, channels, or instruments to maximise a return objective subject to risk, regulatory, or strategic constraints — and the decision space has enough dimensionality that intuition-based allocation leaves meaningful value on the table. 
When NOT to use: Avoid when the allocation space is small (fewer than ~5 buckets) and constraints are simple — an LP or a spreadsheet suffices. Avoid for pricing/bid decisions (route to dynamic or B2B pricing). Avoid for project staffing (use project resource allocation). Key tools: cvxpy (Python convex optimisation library — open source, standard for portfolio optimisation), Riskfolio-Lib (Python portfolio optimisation library; HRP, CVaR, Black-Litterman — open source), Google Meridian (open-source Bayesian MMM; github.com/google/meridian — released 2024), Meta Robyn (open-source MMM; github.com/facebookexperimental/Robyn), Bloomberg PORT / MSCI Barra (institutional investment portfolio optimisation platforms) Cost: cvxpy and Riskfolio-Lib are open source. Meridian/Robyn are open source but require significant analyst time. Institutional platforms (Bloomberg PORT, Barra) are $50k–$500k+/yr. MMM projects typically cost $50k–$200k including data preparation. --- ## Last-Mile Delivery Route Optimisation ID: optimization-routing-last-mile-delivery Category: Optimization & Planning > Route & Logistics Optimization Complexity: medium | Phase: operational Modalities: geospatial, tabular When to use: Use when you have a fleet of vehicles making multi-stop deliveries (or pickups) within a local area, time windows are tight, and the number of stops makes manual route assignment infeasible — typically 20+ stops/vehicle per day. When NOT to use: Avoid when deliveries are point-to-point without multi-stop complexity (use simple dispatch). Avoid for long-haul intercity freight (use supply chain network optimisation). Avoid when the primary constraint is technician skill matching rather than route efficiency (use field service dispatch).
Key tools: Google OR-Tools (VRP solver — widely used, open source, strong community), Routific / OptimoRoute / Circuit (commercial last-mile route optimisation SaaS), HERE Routing API / Google Routes API (underlying map/travel-time data), VROOM (open-source VRP solver with REST API, production-grade) Cost: Commercial SaaS (Routific, OptimoRoute) typically $40–$150/driver/month. OR-Tools and VROOM are open source. Google Routes API usage costs apply for travel time matrices (~$5/1000 requests). In-house builds require 4–12 weeks. --- ## Field Service & Technician Dispatch ID: optimization-routing-field-service-dispatch Category: Optimization & Planning > Route & Logistics Optimization Complexity: high | Phase: operational Modalities: geospatial, tabular, structured_constraints When to use: Use when dispatching technicians or engineers who require specific skills or certifications to customer sites, with geographic territory constraints, SLA time windows, and tool/parts availability requirements — the combination of skill matching plus routing distinguishes this from pure VRP. When NOT to use: Avoid when no physical routing is involved (use workforce scheduling). Avoid when skill matching is trivial (all technicians are equivalent) and it reduces to a pure delivery routing problem. Key tools: Salesforce Field Service (formerly ClickSoftware) — dominant commercial FSM platform, ServiceMax / Microsoft Dynamics 365 Field Service — enterprise FSM with scheduling optimisation, Google OR-Tools (custom skill-constrained VRP formulations), Skedulo / Jobber (SME-focused FSM with scheduling optimisation) Cost: Enterprise FSM platforms (Salesforce FS, ServiceMax) are $150–$300/user/month. SME platforms (Skedulo, Jobber) are $50–$100/user/month. Custom OR-Tools builds require 6–16 weeks of engineering. 
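What separates field service dispatch from plain VRP is the skill filter in front of the routing decision. A minimal stdlib sketch (technician and job data are invented; a real deployment would sit on OR-Tools or an FSM platform): jobs are taken in SLA order, and each gets the nearest still-free technician who holds every required skill.

```python
import math

def dispatch(jobs, technicians):
    """Greedy skill-constrained dispatch: jobs in SLA (due) order each
    take the nearest free technician whose skill set covers the job's
    requirements. Jobs with no feasible technician stay unassigned."""
    assigned, free = {}, dict(technicians)
    for job_id, job in sorted(jobs.items(), key=lambda kv: kv[1]["due"]):
        feasible = [(math.dist(job["loc"], tech["loc"]), name)
                    for name, tech in free.items()
                    if job["skills"] <= tech["skills"]]  # subset test
        if feasible:
            _, name = min(feasible)
            assigned[job_id] = name
            del free[name]  # one job per technician in this toy version
    return assigned

technicians = {"amy": {"skills": {"hvac", "elec"}, "loc": (0, 0)},
               "bob": {"skills": {"elec"}, "loc": (5, 0)}}
jobs = {"j1": {"skills": {"hvac"}, "due": 1, "loc": (1, 0)},
        "j2": {"skills": {"elec"}, "due": 2, "loc": (4, 0)}}
print(dispatch(jobs, technicians))  # -> {'j1': 'amy', 'j2': 'bob'}
```

A solver-based formulation would co-optimise route order and assignment; the greedy version shows why skill coverage has to gate the distance decision, not follow it.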
--- ## Supply Chain Network & Freight Optimisation ID: optimization-routing-supply-chain-network Category: Optimization & Planning > Route & Logistics Optimization Complexity: high | Phase: strategic Modalities: geospatial, tabular, structured_constraints When to use: Use when optimising flows of goods through a fixed or near-fixed network of warehouses, DCs, ports, and carriers — deciding which lanes, modes, and carrier contracts to use to minimise cost or transit time at a strategic or tactical planning horizon. When NOT to use: Avoid when the question is where to locate facilities (use network infrastructure design). Avoid for last-mile delivery optimisation. Avoid for intra-warehouse movement (warehouse slotting). Key tools: IBM Decision Optimization / CPLEX (LP/MIP for network flow problems), LLamasoft Supply Chain Guru / Coupa Supply Chain Design (commercial network design/optimisation), o9 Solutions / Blue Yonder (integrated supply chain planning with network optimisation), Python NetworkX + PuLP / OR-Tools (custom min-cost flow and multi-commodity flow) Cost: Commercial platforms (LLamasoft, o9) are $200k–$1M+/yr for enterprise. CPLEX/Gurobi solver licences $50k–$200k/yr. Open-source Python stacks are cost-effective for custom builds with significant modelling investment. --- ## Public Transit & Traffic Flow Optimisation ID: optimization-routing-public-transit-traffic Category: Optimization & Planning > Route & Logistics Optimization Complexity: high | Phase: operational Modalities: geospatial, time_series, structured_constraints When to use: Use when optimising the scheduling, routing, or real-time control of public transport systems or urban traffic infrastructure — where the objective involves network-level throughput or service coverage, not individual vehicle routing. When NOT to use: Avoid for private fleet delivery optimisation (use last-mile delivery). Avoid for freight/supply chain routing. 
Avoid when the problem is individual driver dispatch without network effects. Key tools: PTV Optima / HASTUS (commercial transit scheduling and optimisation platforms), SUMO (Simulation of Urban MObility — open-source traffic simulation), Google OR-Tools (transit scheduling and assignment formulations), Aimsun / Vissim (microsimulation for traffic signal optimisation) Cost: HASTUS/PTV are enterprise platforms ($500k+ implementation). SUMO is open source. Aimsun/Vissim licences are $20k–$100k/yr. Ride-hailing matching systems are custom builds at scale. --- ## Dynamic Retail & E-Commerce Pricing ID: optimization-pricing-dynamic-retail-pricing Category: Optimization & Planning > Pricing & Revenue Optimization Complexity: high | Phase: operational Modalities: tabular, time_series When to use: Use when you have a large catalogue of SKUs with elastic demand responses, inventory constraints, competitive price signals, and the computational capacity to update prices at high frequency — typically e-commerce or omnichannel retail with 1000+ SKUs. When NOT to use: Avoid when brand/channel conflict considerations dominate (pricing decisions are political, not optimisation problems). Avoid for B2B contract pricing (use B2B deal pricing). Avoid for travel/hospitality yield management (different capacity structure). Key tools: Revionics / Aptos Pricing (enterprise retail pricing AI platforms), PROS Pricing / Pricefx (pricing optimisation for omnichannel retail), Python + scikit-learn / XGBoost (custom demand elasticity modelling for markdown optimisation), Competera / Prisync (competitor price monitoring + dynamic pricing SaaS) Cost: Enterprise platforms (Revionics, Pricefx) are $200k–$1M+/yr. Mid-market SaaS (Competera, Prisync) are $1k–$10k/month. Custom builds require significant data science investment (3–12 months).
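The demand-elasticity core of these systems fits in a few lines. A hedged sketch (all numbers invented; production systems estimate elasticity per SKU from sales data with scikit-learn or XGBoost): assume constant-elasticity demand q(p) = q0 * (p/p0)^e and grid-search the profit-maximising price.

```python
def optimal_price(p0, q0, elasticity, unit_cost, lo=6.0, hi=15.0, step=0.5):
    """Grid-search the profit-maximising price under constant-elasticity
    demand q(p) = q0 * (p / p0) ** elasticity (requires elasticity < -1)."""
    best_price, best_profit = lo, float("-inf")
    price = lo
    while price <= hi:
        demand = q0 * (price / p0) ** elasticity
        profit = (price - unit_cost) * demand
        if profit > best_profit:
            best_price, best_profit = price, profit
        price += step
    return best_price

# baseline: 100 units/day at $10; elasticity -3; unit cost $6
print(optimal_price(p0=10, q0=100, elasticity=-3.0, unit_cost=6.0))  # -> 9.0
```

For constant elasticity the closed form is p* = c * e / (e + 1), here 6 * (-3) / (-2) = 9, but the grid version extends directly to inventory or competitor-price constraints by filtering the candidate prices.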
--- ## Travel & Hospitality Yield Management ID: optimization-pricing-travel-hospitality-yield Category: Optimization & Planning > Pricing & Revenue Optimization Complexity: high | Phase: operational Modalities: time_series, tabular When to use: Use when managing perishable inventory (seats, rooms, cabins) across a booking horizon where demand is time-dependent, capacity is fixed, and revenue per unit varies with purchase timing and customer segment — the classic yield management setup. When NOT to use: Avoid for physical goods with replenishable inventory (use dynamic retail pricing). Avoid for B2B contract pricing. Avoid when capacity is effectively unlimited (SaaS licensing). Key tools: IDeaS Revenue Solutions / Duetto (hotel revenue management systems), PROS / Sabre AirVision (airline revenue management platforms), Python + scipy.optimize (custom EM/LP models for smaller operators), Cloudbeds / Mews (PMS with embedded revenue management for independent hotels) Cost: Enterprise airline/hotel RM platforms are $500k–$5M+/yr. Mid-market hotel RM SaaS (IDeaS G3 for smaller properties) is $1k–$5k/month. Custom builds are realistic for mid-sized operators with strong data science teams. --- ## B2B Deal & Contract Pricing Optimisation ID: optimization-pricing-b2b-deal-pricing Category: Optimization & Planning > Pricing & Revenue Optimization Complexity: high | Phase: operational Modalities: tabular When to use: Use when sales reps are making discretionary discount decisions across a large number of deals, price realisation is measurably below list, and there is sufficient historical deal data (win/loss, discount levels, deal size, customer attributes) to train a price elasticity or propensity model. When NOT to use: Avoid when deal volume is too low for statistical modelling (fewer than ~500 historical deals per segment). Avoid when pricing is purely cost-plus with no discretion. Avoid for consumer/retail pricing (use dynamic retail pricing). 
Key tools: PROS AI / Zilliant (leading commercial B2B price optimisation platforms), Pricefx (mid-market CPQ and deal pricing AI), Salesforce Revenue Cloud / CPQ (with custom pricing rules and approval workflows), Python + XGBoost / LightGBM (custom win-probability and price elasticity models) Cost: PROS/Zilliant enterprise deployments are $500k–$2M+/yr. Pricefx mid-market tier is $100k–$500k/yr. Custom Python models embedded in Salesforce CPQ are achievable for $50k–$200k build cost. --- ## Subscription & Monetisation Optimisation ID: optimization-pricing-subscription-monetisation Category: Optimization & Planning > Pricing & Revenue Optimization Complexity: medium | Phase: strategic Modalities: tabular, time_series When to use: Use when you have a multi-tier or freemium subscription product, sufficient conversion and retention data, and want to use modelling to inform tier structure, price points, upgrade prompts, or retention offers rather than relying on intuition or copying competitors. When NOT to use: Avoid when the product is too early-stage to have meaningful retention/conversion data (A/B testing is premature when n is small). Avoid for B2B enterprise deal pricing (use B2B deal pricing). Avoid when pricing is regulated. Key tools: Stripe Sigma / Paddle Analytics (subscription revenue analytics with cohort modelling), Reforge frameworks + custom Python (LTV modelling, cohort-based price sensitivity analysis), ProfitWell / Baremetrics (subscription metrics and pricing experiments), Statsig / LaunchDarkly (feature flag + experimentation platforms for paywall testing) Cost: ProfitWell/Baremetrics are $200–$1000/month. Statsig/LaunchDarkly are $500–$5k/month. Custom modelling is typically an internal data science project. Dedicated monetisation consultants charge $20k–$100k for pricing strategy engagements. 
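Much of the modelling here reduces to cohort LTV arithmetic. A minimal sketch (figures illustrative; real analyses segment churn by cohort and tier rather than assuming one constant rate):

```python
def cohort_ltv(arpu, gross_margin, monthly_churn,
               horizon_months=36, discount=0.0):
    """Expected discounted gross margin from one subscriber, assuming a
    constant monthly churn hazard. With no discounting and a long horizon
    this converges to the textbook LTV = arpu * margin / churn."""
    survival, total = 1.0, 0.0
    for month in range(horizon_months):
        total += survival * arpu * gross_margin / (1 + discount) ** month
        survival *= 1 - monthly_churn
    return total

# $30 ARPU, 80% gross margin, 5% monthly churn: finite-horizon value,
# strictly below the $480 infinite-horizon limit
print(round(cohort_ltv(30, 0.80, 0.05), 2))
```

Comparing this number across proposed price points, with churn sensitivity estimated from experiments, is the quantitative backbone of tier and paywall decisions.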
--- ## A/B & Multivariate Testing Automation ID: optimization-experiment-ab-multivariate-testing Category: Optimization & Planning > Experiment Design & Optimization Complexity: medium | Phase: operational Modalities: tabular, time_series When to use: Use when you have sufficient live traffic to detect meaningful effect sizes with statistical power, and you want to compare discrete variants (UI changes, algorithm versions, feature flags) with a clean causal interpretation. Typically requires thousands of users/day. When NOT to use: Avoid when traffic is insufficient for statistical power (use qualitative methods or Bayesian priors instead). Avoid when you cannot randomise assignment (use causal inference methods). Avoid for ML hyperparameter search (use Bayesian optimisation). Key tools: Statsig / Eppo (modern experimentation platforms with Bayesian and frequentist modes), Optimizely / VWO (mature A/B testing platforms for web/app experimentation), Netflix Experimentation Platform / internal tooling (benchmark for multi-armed bandit), Python statsmodels / scipy.stats (custom sequential testing, power calculation) Cost: Statsig/Eppo are $500–$5k/month for growth-stage. Optimizely enterprise is $50k–$200k/yr. Python-based custom frameworks are free but require significant engineering investment to productionise. --- ## Bayesian Optimisation & Hyperparameter Tuning ID: optimization-experiment-bayesian-hyperparameter-tuning Category: Optimization & Planning > Experiment Design & Optimization Complexity: medium | Phase: development Modalities: tabular, structured_constraints When to use: Use when evaluating each candidate configuration is expensive (training a neural network, running a lab experiment, or a simulation), the objective function is black-box (no gradient), and you want to find good solutions in far fewer evaluations than grid or random search. When NOT to use: Avoid when the configuration space is small enough for grid search (fewer than ~50 combinations). 
Avoid when evaluation is cheap and parallelisable (random search or grid search may be faster wall-clock). Avoid for live A/B traffic experiments (use MAB or A/B testing). Key tools: Optuna (Python — leading open-source hyperparameter optimisation framework, TPE/CMA-ES), Ray Tune (distributed hyperparameter search, integrates with Optuna/Hyperband/ASHA), Weights & Biases Sweeps (managed hyperparameter search with experiment tracking), Ax / BoTorch (Meta's Bayesian optimisation platform — research-grade, strong for scientific experiments) Cost: Optuna and Ray Tune are open source. W&B Sweeps are included in W&B plans ($50–$150/user/month for teams). Ax/BoTorch are open source. Compute cost is the dominant expense — GPU training runs can cost $1–$100 per evaluation. --- ## Simulation-Based Scenario Optimisation ID: optimization-experiment-simulation-digital-twin Category: Optimization & Planning > Experiment Design & Optimization Complexity: high | Phase: strategic Modalities: structured_constraints, time_series When to use: Use when you need to evaluate strategic or design changes without disrupting live operations, the system is too complex for analytical models, and you can build a simulation that captures the key dynamics (demand variability, failure modes, feedback loops). When NOT to use: Avoid when a simpler analytical model (queueing theory, LP) suffices — simulations are expensive to build and validate. Avoid when you need causal inference from real-world interventions (use causal inference methods). Avoid for live A/B testing decisions. Key tools: AnyLogic (multi-method simulation — agent-based, DES, system dynamics — market leader), Simul8 / Arena (discrete event simulation for process and operations analysis), Python SimPy (open-source DES library for custom simulations), Azure Digital Twins / AWS IoT TwinMaker (cloud-native digital twin platforms for physical asset twins) Cost: AnyLogic Professional is $8k–$20k/yr per licence. Simul8/Arena are similar. 
SimPy is open source. Azure/AWS digital twin services are consumption-priced ($0.10–$1/1000 messages). End-to-end digital twin projects are often $500k–$5M+ for large industrial deployments. --- ## Causal Inference & Lift Measurement ID: optimization-experiment-causal-inference-measurement Category: Optimization & Planning > Experiment Design & Optimization Complexity: high | Phase: analytical Modalities: tabular, time_series When to use: Use when you need to estimate the causal effect of an intervention (marketing campaign, product feature, policy change) and a randomised experiment is infeasible — either because randomisation is impossible, unethical, or would contaminate the control group. When NOT to use: Avoid when a proper RCT is feasible and you have sufficient sample size — RCTs are cleaner and easier to explain. Avoid for prediction tasks where causal interpretation is not required. Avoid when data quality is too poor to support the identification assumptions required. Key tools: Python CausalML (Uber's open-source causal ML library — uplift, DR, metalearners), DoWhy / EconML (Microsoft open-source causal inference and heterogeneous treatment effects), R CausalImpact (Google's Bayesian structural time series for causal impact of interventions), Meta Robyn / Google Meridian (MMM with causal attribution for marketing) Cost: All major tools are open source. The cost is predominantly analyst/data scientist time — causal inference projects typically require 4–16 weeks of senior-level effort. Geo-based holdout experiments have media spend costs. 
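The simplest member of this family, two-period difference-in-differences, shows the shape of the identification argument. Stdlib sketch (data invented; DoWhy/EconML add covariate adjustment, robustness checks, and standard errors):

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-period DiD lift estimate: (treated change) - (control change).
    Valid only under the parallel-trends assumption: absent treatment,
    both groups would have moved by the same amount."""
    mean = lambda xs: sum(xs) / len(xs)
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))

# weekly sales in two regions; only the first ran the campaign
print(diff_in_diff([10, 12], [16, 18], [9, 11], [11, 13]))  # -> 4.0
```

The control group's change (+2) absorbs seasonality and market-wide drift, so the remaining +4 is attributed to the intervention, conditional on parallel trends holding.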
--- ## Product Configuration & CPQ ID: optimization-constraint-product-configuration Category: Optimization & Planning > Constraint Satisfaction & Configuration Complexity: high | Phase: operational Modalities: structured_constraints When to use: Use when products have complex compatibility rules between components (engineering-to-order, configurable manufactured goods, enterprise software licensing), and the combinatorial space of valid configurations is too large for human experts to verify manually. When NOT to use: Avoid for simple product selection without compatibility constraints (use recommendation systems). Avoid when pricing is the primary decision and configuration is trivial. Avoid for logistics/warehouse assignment problems. Key tools: Tacton CPQ / Configit (specialist product configuration platforms with CSP solvers), Salesforce CPQ / SAP CPQ (commercial CPQ platforms with rule-based configuration), MiniZinc / IBM CP Optimizer (constraint programming solvers for custom configurators), Python python-constraint / OR-Tools CP-SAT (lightweight CSP solving for custom builds) Cost: Tacton/Configit enterprise deployments are $500k–$2M+. Salesforce CPQ is $75–$150/user/month. Custom CP-SAT builds are feasible for $50k–$200k engineering investment for well-defined constraint sets. --- ## Warehouse Layout & Slotting Optimisation ID: optimization-constraint-warehouse-slotting Category: Optimization & Planning > Constraint Satisfaction & Configuration Complexity: medium | Phase: tactical Modalities: tabular, structured_constraints When to use: Use when a warehouse has enough SKU velocity diversity that smart slotting (placing fast movers near dispatch, co-locating frequently paired items) would materially reduce pick path length and labour cost — typically relevant for DCs with 500+ active SKUs. When NOT to use: Avoid when the warehouse is small, SKU count is low, or pick frequency is roughly uniform across SKUs. 
Avoid for logistics routing outside the warehouse (use last-mile delivery or supply chain routing). Avoid for product configuration problems. Key tools: Manhattan Associates WMS / Blue Yonder WMS (enterprise WMS with slotting optimisation modules), Optricity (specialist warehouse slotting optimisation software), Python + OR-Tools / scipy (custom assignment formulations for slotting optimisation), Swisslog / Dematic (automated storage and retrieval systems with embedded slotting logic) Cost: Specialist slotting software (Optricity) is $50k–$200k/yr. Enterprise WMS slotting modules are bundled into WMS contracts ($500k–$2M+ full implementation). Custom Python builds are feasible for well-resourced operations teams. --- ## Regulatory & Compliance-Constrained Planning ID: optimization-constraint-regulatory-compliance-planning Category: Optimization & Planning > Constraint Satisfaction & Configuration Complexity: high | Phase: tactical Modalities: structured_constraints, text When to use: Use when planning decisions (scheduling, sequencing, architecture, contract execution) must respect a dense set of regulatory constraints, and the constraint space is complex enough that human compliance review is slow, inconsistent, or a bottleneck — particularly in aviation, pharma, food safety, financial services. When NOT to use: Avoid for pure compliance risk scoring or classification (that is a prediction problem, not optimisation). Avoid when the regulatory environment is simple enough that a checklist suffices. Avoid for product configuration constraints (technical product rules, not regulatory).
Key tools: Quantexa / Relativity (regulatory document analysis and compliance planning for financial/legal), Veeva Vault (pharma/biotech regulatory content and GMP compliance planning), IBM OpenPages / MetricStream (enterprise GRC platforms with constraint-driven planning), Custom constraint programming (OR-Tools, MiniZinc) for bespoke regulatory constraint encoding Cost: Enterprise GRC platforms (IBM OpenPages, MetricStream) are $200k–$1M+/yr. Veeva Vault is $500k–$2M+/yr for pharma enterprise. Custom CP builds require legal/regulatory SME involvement alongside engineering, typically $200k–$1M projects. --- ## Network & Infrastructure Design Optimisation ID: optimization-constraint-network-infrastructure-design Category: Optimization & Planning > Constraint Satisfaction & Configuration Complexity: high | Phase: strategic Modalities: geospatial, structured_constraints When to use: Use when making structural decisions about where to locate facilities, towers, data centres, or retail stores — placement decisions that define the network topology, not decisions about how to route flows through an existing network. When NOT to use: Avoid when the network structure is fixed and the question is routing/flow optimisation (use supply chain network optimisation). Avoid for real-time traffic or transit routing. Avoid for workforce shift scheduling or warehouse slotting. Key tools: LLamasoft Supply Chain Guru / Coupa Network Design (facility location optimisation — market leader), IBM CPLEX / Gurobi (MIP solvers for facility location and p-median problems), Python PuLP / OR-Tools (custom facility location formulations — capacitated/uncapacitated variants), ESRI ArcGIS Network Analyst (GIS-based network design with spatial analysis) Cost: LLamasoft/Coupa is $200k–$1M+ for enterprise network design projects. ESRI licences are $5k–$50k/yr. Gurobi/CPLEX solver licences are $50k–$200k/yr. Custom Python builds are achievable for $50k–$200k engineering investment. 
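The canonical formulation behind facility location is the p-median problem. A brute-force stdlib sketch (coordinates invented; real instances go to CPLEX/Gurobi or a PuLP/OR-Tools MIP because the combinatorics explode): open p of the candidate sites so that total distance from demand points to their nearest open facility is minimised.

```python
from itertools import combinations

def p_median(candidate_sites, demand_points, p):
    """Exhaustive p-median: open the p candidate sites minimising the
    sum, over demand points, of Manhattan distance to the nearest open
    site. Exponential in len(candidate_sites); a solver is needed
    beyond toy instances."""
    def total_cost(open_sites):
        return sum(min(abs(dx - sx) + abs(dy - sy) for sx, sy in open_sites)
                   for dx, dy in demand_points)
    return min(combinations(candidate_sites, p), key=total_cost)

sites = [(0, 0), (10, 0), (5, 5)]
demand = [(0, 1), (1, 0), (9, 0), (10, 1)]
print(p_median(sites, demand, p=2))  # -> ((0, 0), (10, 0))
```

The MIP version replaces enumeration with binary open/assign variables, which is how the commercial network-design platforms handle thousands of candidates with capacity and fixed-cost terms.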
--- ## Customer Support Automation Agent ID: agents-conversational-customer-support-bot Category: Autonomous Agents & Orchestration > Conversational Agents Complexity: medium | Phase: production Modalities: text, voice (IVR) When to use: High-volume tier-1 support with repetitive, bounded resolution paths (returns, refunds, order status, account queries). Works well when resolution can be verified against structured systems (CRM, OMS) and escalation paths to humans are well-defined. When NOT to use: Complex or emotionally charged complaints requiring empathy and judgement. Regulated contexts (financial advice, medical) where automated resolution creates liability. Products with too much variance for scripted resolution flows. Do not use when CSAT risk from a bad bot interaction outweighs deflection savings. Key tools: Zendesk / Freshdesk (CRM integration via API), Intercom / Tidio (native bot + handoff infrastructure), LangChain / LlamaIndex (RAG over knowledge base), Twilio (voice + SMS channel integration), Stripe / Shopify APIs (transactional resolution actions) Cost: Low inference cost per resolution. Setup cost is moderate (CRM integration, knowledge base ingestion, escalation logic). ROI is driven by deflection rate — typically needs >40% deflection to justify vs. live agents. --- ## Internal Employee Assistant Agent ID: agents-conversational-internal-employee-assistant Category: Autonomous Agents & Orchestration > Conversational Agents Complexity: medium-high | Phase: production Modalities: text (chat), voice (experimental) When to use: Organisations with high HR/IT ticket volume from repetitive policy queries and self-service actions. Valuable when employees are distributed (remote/hybrid) and async access to HR/IT systems is needed. Best fit when enterprise systems (HRIS, ITSM) have stable APIs. When NOT to use: Sensitive HR matters (performance management, disciplinary processes, grievances) — these require human judgement and legal care. 
Early-stage orgs where policies change too frequently to maintain an accurate knowledge base. Contexts where employees distrust AI-recorded interactions. Key tools: Microsoft Copilot Studio / Teams integration (enterprise deployment channel), ServiceNow or Jira Service Management (IT ticketing backend), Workday / BambooHR APIs (HRIS integration for leave, benefits), LlamaIndex (enterprise document RAG over policy PDFs), Slack Bolt SDK (Slack-native deployment) Cost: Moderate setup cost (enterprise SSO, system integrations, policy ingestion pipeline). Inference cost is low per query. Value accrues from IT/HR ticket deflection — typically measured in FTE-hours saved. --- ## Sales & Lead Qualification Agent ID: agents-conversational-sales-qualification-agent Category: Autonomous Agents & Orchestration > Conversational Agents Complexity: medium | Phase: production Modalities: text (chat, email), voice When to use: High-volume inbound leads where SDR capacity is the bottleneck. Works well for SaaS/B2B with well-defined ICP criteria that can be operationalised as qualification questions. Also useful for async qualification of leads from content downloads or demo requests where speed-to-response is a competitive factor. When NOT to use: Enterprise sales with long consultative cycles where relationship is the product — a bot as first touch can damage deal quality. Complex technical sales where qualification requires deep product knowledge the agent can't reliably demonstrate. Outbound cold prospecting where unannounced AI contact may violate regulations (TCPA, GDPR). Key tools: HubSpot / Salesforce CRM (lead enrichment and CRM write-back), Clearbit / Clay (lead enrichment API for firmographic scoring), Calendly API (demo booking integration), Twilio / Vonage (SMS/voice outreach channel), LangChain (conversation state and tool orchestration) Cost: Low inference cost per lead. ROI is driven by pipeline conversion rate improvement and SDR time saved. 
Key cost is integration setup (CRM, enrichment APIs, calendar). Ongoing cost includes enrichment API calls per lead. --- ## Tutoring & Coaching Agent ID: agents-conversational-tutoring-coaching-agent Category: Autonomous Agents & Orchestration > Conversational Agents Complexity: medium-high | Phase: production Modalities: text, voice, multimodal (image for worked examples) When to use: Learning contexts where adaptive, personalised dialogue improves outcomes more than static content — interview prep, language acquisition, professional skill development, customer onboarding with knowledge transfer goals. Best when learner progress can be assessed conversationally and the domain has well-defined mastery criteria. When NOT to use: Compliance training that requires attestation and audit trails — pure conversational agents don't provide the audit infrastructure. Domains requiring physical demonstration or hands-on practice. Contexts where learner safety is at risk from AI misinformation (medical procedures, safety-critical skills). Key tools: OpenAI Assistants API or Anthropic API (stateful conversation + system prompt), LangChain / LlamaIndex (curriculum document RAG), Whisper API (voice-to-text for spoken practice), LMS APIs — Canvas, Moodle (progress tracking integration), Elo rating libraries (adaptive difficulty scoring for practice problems) Cost: Moderate inference cost (multi-turn, often with large context). Setup cost dominated by curriculum design and knowledge base construction. Voice adds Whisper transcription cost. For consumer apps, cost-per-session must be balanced against subscription pricing. 
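The "Elo rating libraries" entry above refers to adaptive difficulty: treat each learner and each practice problem as a rated player and update both after every attempt. A self-contained sketch (the K-factor and 1200 baseline are conventional chess-style values, an assumption rather than a spec from any LMS):

```python
def elo_update(learner_rating, problem_rating, solved, k=32):
    """After one attempt, shift the learner's rating up and the
    problem's rating down by the same amount (or vice versa on failure),
    scaled by how surprising the outcome was.
    solved: 1.0 for success, 0.0 for failure."""
    expected = 1 / (1 + 10 ** ((problem_rating - learner_rating) / 400))
    delta = k * (solved - expected)
    return learner_rating + delta, problem_rating - delta

# evenly matched learner and problem: a success moves both by half of K
print(elo_update(1200, 1200, solved=1.0))  # -> (1216.0, 1184.0)
```

A common heuristic is to serve the next problem rated near (or slightly above) the learner's current rating, keeping expected success moderate rather than trivially high or demoralising.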
--- ## Data Pipeline & ETL Automation Agent ID: agents-task-automation-data-pipeline-agent Category: Autonomous Agents & Orchestration > Task Automation Agents Complexity: high | Phase: production (select use cases) Modalities: structured data, text (for unstructured source parsing) When to use: Data pipelines with high variability in source data quality, structure, or error patterns that make static ETL brittle. Useful when remediation logic requires reasoning (not just retry rules) — e.g. inferring schema changes, imputing missing values with domain knowledge, or writing fix queries. Also valuable for unstructured source data (emails, PDFs, semi-structured logs). When NOT to use: High-throughput, low-latency streaming pipelines where inference latency is unacceptable. Well-defined, stable pipelines that work fine with deterministic ETL (Airflow, dbt, Fivetran) — adding agent overhead is waste. Regulated financial pipelines where every transformation must be auditable to a deterministic rule. Key tools: Apache Airflow / Prefect (pipeline orchestration with agent as a task), dbt (transformation layer the agent writes/modifies), Great Expectations / Soda (data quality assertions the agent responds to), LangChain Tools / function calling (SQL execution, API calls, file operations), Snowflake / BigQuery (data warehouse target with SQL generation) Cost: Inference cost scales with pipeline frequency and error rate. Batch nightly pipelines are cost-manageable; high-frequency pipelines require careful cost modelling. Primary value is in reducing manual intervention cost for data quality incidents. 
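The architectural point, deterministic checks first and agent reasoning only on failures, fits in a few lines. A sketch with an invented type-check schema (real pipelines would use Great Expectations or Soda assertions as the gate):

```python
def triage_rows(rows, schema):
    """Split rows into those passing deterministic type checks (they
    flow straight through the pipeline) and those queued, with reasons,
    for agent or human remediation.
    schema maps column name -> expected Python type."""
    clean, needs_remediation = [], []
    for row in rows:
        problems = [col for col, expected in schema.items()
                    if not isinstance(row.get(col), expected)]
        if problems:
            needs_remediation.append({"row": row, "problems": problems})
        else:
            clean.append(row)
    return clean, needs_remediation

schema = {"order_id": int, "amount": float}
rows = [{"order_id": 1, "amount": 19.99},
        {"order_id": "2", "amount": 19.99}]  # string id -> remediation
clean, queue = triage_rows(rows, schema)
print(len(clean), queue[0]["problems"])  # -> 1 ['order_id']
```

Only the remediation queue ever reaches the LLM, which keeps inference cost proportional to the error rate rather than to total pipeline volume.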
--- ## Document Processing & Workflow Agent ID: agents-task-automation-document-processing-agent Category: Autonomous Agents & Orchestration > Task Automation Agents Complexity: medium-high | Phase: production Modalities: document (PDF, image), structured data (JSON output) When to use: High-volume document workflows with variable structure where human review is the bottleneck — invoice processing, contract review, insurance claims, mortgage documents. Best when extraction can be validated against business rules (amounts reconcile, dates are plausible, required fields are present) and routing decisions follow defined logic. When NOT to use: Fully structured documents where deterministic OCR + template matching suffices (don't add agent overhead). Documents requiring legal sign-off as the primary output — the agent can assist but cannot be the sole decision-maker in regulated contexts. Low-volume workflows where setup cost exceeds 12-month labour savings. Key tools: AWS Textract / Google Document AI (OCR and structured extraction), LangChain document loaders + extraction chains (LLM-based field extraction), Apache Airflow / Temporal (workflow orchestration for multi-step approval flows), SAP / Oracle ERP APIs (downstream posting after validation), Docusign / Adobe Sign APIs (signature workflow integration) Cost: OCR cost is low and predictable. LLM extraction cost scales with document length and volume. For high-volume workflows (10k+ docs/month), cost modelling is essential. ROI is driven by human review time eliminated. --- ## Web & Browser Automation Agent ID: agents-task-automation-web-browser-scraping-agent Category: Autonomous Agents & Orchestration > Task Automation Agents Complexity: medium | Phase: production Modalities: web (HTML, JavaScript-rendered), structured data (JSON output) When to use: Monitoring tasks requiring adaptive parsing of web content that changes structure over time (competitor pricing, regulatory filings, job postings). 
Also valuable for form-filling automation on portals without APIs (government filing systems, legacy vendor portals). Best when the data source has no API and the information is publicly accessible. When NOT to use: Sites where scraping violates ToS and creates legal/reputational risk. Sites with robust anti-bot measures (Cloudflare, CAPTCHA) where reliability will be <80%. Internal systems that have an API — always prefer API over scraping. Real-time data needs (financial prices, live inventory) where scraping latency is unacceptable. Key tools: Playwright / Selenium (browser automation with JavaScript rendering), Firecrawl / Jina Reader (LLM-friendly web content extraction), Bright Data / Oxylabs (residential proxy infrastructure for scale), BeautifulSoup / lxml (lightweight HTML parsing for stable structures), Apify (managed scraping infrastructure with scheduling) Cost: Infrastructure cost (proxies, browser instances) dominates over inference cost. Bright Data residential proxies run $8-15/GB. Apify actor runs are pay-per-use. LLM cost for adaptive parsing is marginal. Scale-up cost is linear with target site count. --- ## Integration & API Orchestration Agent ID: agents-task-automation-integration-api-agent Category: Autonomous Agents & Orchestration > Task Automation Agents Complexity: high | Phase: production Modalities: structured data (JSON/XML APIs), text (for natural language condition evaluation) When to use: Multi-system integrations where the orchestration logic requires reasoning, conditional branching, or error handling that is too complex for rule-based iPaaS tools (Zapier, Make). Particularly valuable when integrating systems with inconsistent APIs, when business logic changes frequently, or when errors require intelligent remediation rather than simple retry. When NOT to use: Simple point-to-point integrations where a Zapier/Make workflow suffices — don't add AI overhead to solved problems. 
Real-time, high-throughput event streaming where inference latency breaks SLAs. Integrations with regulatory auditability requirements where every decision must trace to a deterministic rule. Key tools: LangChain / LlamaIndex (agent orchestration + tool use), Temporal / Apache Airflow (durable workflow execution with retry semantics), REST / GraphQL API clients (target system connectors), Zapier / Make (for simpler sub-workflows the agent delegates to), Datadog / OpenTelemetry (observability for multi-system traces) Cost: Inference cost is low per orchestration step. Primary cost is engineering time for integration development and testing. Operational cost includes API rate limits and per-call fees from target systems. ROI measured against iPaaS subscription and manual integration maintenance cost. --- ## Competitive Intelligence Research Agent ID: agents-research-competitive-intelligence Category: Autonomous Agents & Orchestration > Research & Analysis Agents Complexity: medium | Phase: production Modalities: web (search, crawl), structured data (filings, APIs), text (report generation) When to use: Strategy, product, and sales teams needing continuous competitor monitoring without dedicated analyst headcount. High-signal use cases: tracking competitor pricing changes, product update announcements, hiring signals (job postings as leading indicators), regulatory filings, and patent activity. Best when the competitive landscape changes faster than quarterly manual reviews can capture. When NOT to use: One-off competitive research where a single human analyst session is cheaper. Highly regulated industries where automated intelligence gathering raises compliance concerns. Markets with very few, very opaque competitors where public signal is insufficient. 
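Minimal sketch of this pattern's core loop — cheap change detection that decides which signals warrant LLM synthesis at all. All URLs and page contents are invented placeholders; the real pipeline sits behind the search and crawl APIs below.

```python
import hashlib

def snapshot(pages: dict) -> dict:
    """Hash each monitored page so successive runs can be compared cheaply."""
    return {url: hashlib.sha256(text.encode()).hexdigest() for url, text in pages.items()}

def diff_signals(previous: dict, current: dict) -> list:
    """Return URLs whose content changed or appeared since the last run."""
    return sorted(url for url, digest in current.items() if previous.get(url) != digest)

# Two monitoring runs over a (hypothetical) competitor's pricing and careers pages.
run1 = snapshot({"https://rival.example/pricing": "Pro: $49", "https://rival.example/jobs": "3 roles"})
run2 = snapshot({"https://rival.example/pricing": "Pro: $59", "https://rival.example/jobs": "3 roles"})

changed = diff_signals(run1, run2)  # only the pricing page changed
```

Only the changed URLs are then passed to the LLM for digest writing, which is why synthesis cost stays low relative to search API cost.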
Key tools: Perplexity API / Tavily (LLM-optimised search for agent workflows), Firecrawl (structured web content extraction for competitor sites), SerpAPI / DataForSEO (SERP monitoring for competitor content), Crunchbase / PitchBook API (funding and firmographic signals), Slack / Email webhooks (delivery channel for intelligence digests) Cost: Search API costs (Perplexity, Tavily, SerpAPI) are the primary variable cost — typically $0.01-0.05 per search. LLM synthesis cost is low. For daily monitoring of 10-20 competitors, total monthly cost is typically $50-300 in API fees. Primary investment is in signal taxonomy design and output workflow integration. --- ## Scientific Literature & Research Agent ID: agents-research-scientific-literature-agent Category: Autonomous Agents & Orchestration > Research & Analysis Agents Complexity: high | Phase: production (select use cases) Modalities: text (academic papers, abstracts), structured data (citation graphs) When to use: Research teams needing to synthesise large literature corpora faster than manual review — systematic reviews, hypothesis generation, patent landscape analysis. High-value when the domain has well-indexed sources (PubMed, arXiv, Semantic Scholar) and the research question is specific enough to be operationalised as search queries. When NOT to use: Research requiring expert-level interpretation of complex technical content (e.g. clinical trial statistical analysis, safety interpretation of drug interactions) — the agent can retrieve and summarise but cannot replace domain expert review. Fields with poor digital indexing where most literature is not in accessible databases. 
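The chunking strategy this pattern depends on is essentially map-reduce: summarise batches, then merge batch summaries. A minimal sketch — `synthesise` is a stand-in for the real LLM call, and the batch size of 5 is an arbitrary illustration of a context budget:

```python
def chunk(papers: list, max_per_batch: int = 5) -> list:
    """Split the paper list into batches small enough for one synthesis call."""
    return [papers[i:i + max_per_batch] for i in range(0, len(papers), max_per_batch)]

def synthesise(batch: list) -> str:
    """Stand-in for an LLM call that condenses one batch of abstracts."""
    return f"summary-of-{len(batch)}-papers"

def map_reduce_review(papers: list) -> str:
    """Map: summarise each batch. Reduce: merge batch summaries into one review."""
    batch_summaries = [synthesise(b) for b in chunk(papers)]
    return " | ".join(batch_summaries)

review = map_reduce_review([f"paper-{i}" for i in range(12)])
```

The reduce step is itself an LLM call in practice; keeping batches small bounds per-call context cost at the price of an extra merge pass.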
Key tools: Semantic Scholar API (paper discovery, citation graphs, author networks), PubMed / Europe PMC APIs (biomedical literature search), arXiv API (CS, physics, maths preprint access), Elicit (AI research assistant with structured literature review workflow), LangChain document loaders + summarisation chains (PDF ingestion and synthesis) Cost: Literature API access is low cost (Semantic Scholar has a free tier; PubMed is free). PDF download and processing costs are low. LLM synthesis cost scales with paper volume and context length — a 100-paper systematic review requires a careful chunking strategy to stay within cost bounds. --- ## Due Diligence & Investigation Agent ID: agents-research-due-diligence-agent Category: Autonomous Agents & Orchestration > Research & Analysis Agents Complexity: high | Phase: production (with human review gate) Modalities: web (search, filings), structured data (registries, sanctions lists), text (report generation) When to use: M&A, investment, and legal teams with high deal flow who need to accelerate early-stage due diligence screening. Particularly valuable for KYC/AML workflows requiring synthesis across multiple public registries, sanctions lists, and news sources. Also useful for vendor risk reviews at scale where manual research is the bottleneck. When NOT to use: Final-stage DD on high-stakes transactions — agent output is a research accelerator, not a substitute for accountant/lawyer sign-off. KYC decisions that carry regulatory accountability — an automated adverse media check does not constitute a compliant AML procedure without human review. Private companies with minimal public footprint where public signal is insufficient.
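A sketch of the aggregation-and-gate shape this pattern takes — not a compliant AML procedure. Entity names, sources, and the severity scale are invented; real hits come from the screening APIs below, and even auto-cleared entities need periodic human audit:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str    # e.g. "sanctions", "adverse_media", "registry"
    severity: int  # 1 (informational) to 3 (blocking) — hypothetical scale

def screen(entity: str, findings: dict) -> dict:
    """Aggregate hits across sources; anything severe is routed to human review."""
    hits = findings.get(entity, [])
    needs_review = any(h.severity >= 2 for h in hits)
    return {"entity": entity, "hits": len(hits),
            "route": "human_review" if needs_review else "auto_clear"}

findings = {"Acme Holdings": [Hit("adverse_media", 2)], "Beta Ltd": []}
acme = screen("Acme Holdings", findings)
beta = screen("Beta Ltd", findings)
```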
Key tools: Perplexity API / Tavily (web research with citations), EDGAR / Companies House APIs (regulatory filing retrieval), Refinitiv World-Check / OFAC API (sanctions and adverse media screening), LangChain research agent pattern (multi-source synthesis with tool use), DocuSign / SharePoint (due diligence room integration for document collection) Cost: Web research API costs are modest ($50-200/month for typical DD volume). Sanctions screening APIs (Refinitiv World-Check) carry significant per-entity fees — verify pricing before scaling. Primary cost is engineering and compliance review of the workflow design. --- ## Analyst Report & Insight Generation Agent ID: agents-research-analyst-report-agent Category: Autonomous Agents & Orchestration > Research & Analysis Agents Complexity: medium-high | Phase: production Modalities: structured data (financial, metrics), text (narrative generation), web (current events) When to use: Recurring structured reports where the primary bottleneck is data gathering and first-draft writing — equity research briefings, management dashboards, sector updates, portfolio reviews. Best when the report structure is stable and data sources are accessible via API. High-value when the analyst's scarce time should be spent on insight interpretation, not data assembly. When NOT to use: One-off bespoke reports where setup cost exceeds writing time. Reports requiring original field research (primary interviews, proprietary surveys) — the agent can only synthesise secondary sources. Investment reports with fiduciary sign-off requirements where automated generation without review creates compliance risk. 
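The "stable report structure" requirement above reduces to a fixed template the agent fills from gathered sections — the analyst edits the draft, not the data plumbing. A minimal sketch with an invented ticker and section names:

```python
TEMPLATE = """# {ticker} weekly briefing
## Price action
{price_section}
## News
{news_section}"""

def build_report(ticker: str, sections: dict) -> str:
    """Assemble the draft from a stable skeleton plus agent-gathered sections."""
    return TEMPLATE.format(ticker=ticker, **sections)

draft = build_report("ACME", {"price_section": "Flat on the week.",
                              "news_section": "No filings."})
```

In practice each section value is produced by a data-collection sub-task (financial API pull, news search, LLM narrative pass); the template is what keeps output stable across runs.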
Key tools: Alpha Vantage / Polygon.io (financial data APIs for equity reports), Perplexity API / Tavily (current news and web research), LangChain research agent (multi-source data collection orchestration), Anthropic API / OpenAI API (structured narrative generation from data), Notion / Confluence API (report delivery and storage) Cost: Financial data API costs vary significantly — Alpha Vantage free tier is limited; Bloomberg API is enterprise-priced. LLM generation cost for a 10-page report is $0.50-3.00 depending on model. Primary investment is in report template design and data pipeline reliability. --- ## Iterative Code Generation & Debugging Agent ID: agents-coding-iterative-code-generation Category: Autonomous Agents & Orchestration > Code & Development Agents Complexity: medium | Phase: production Modalities: code, text (requirements, error messages) When to use: Tasks where correctness can be verified by execution — code that runs tests, produces checkable outputs, or has a clear pass/fail criterion. Ideal for TDD workflows (write tests first, implement to pass), bug reproduction and patching, and any coding task where the feedback loop can be automated. Best when the scope is bounded to a single function, class, or small module. When NOT to use: Architectural decisions that require design judgement before implementation — the agent will implement a wrong architecture iteratively and produce confident-looking wrong code. Long-running test suites (>5 min per run) where the iteration loop becomes too slow and expensive. Production hotfixes where the fix-and-test loop needs to happen in a safe environment, not against live systems. 
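The generate-execute-retry loop above can be sketched as follows. `propose` stands in for the model (scripted here to fail once, then fix itself) and `run_tests` stands in for sandboxed test execution; the bounded round count is what caps cost:

```python
def run_tests(code: str) -> tuple:
    """Stand-in for executing the suite in a sandbox (e.g. pytest under Docker)."""
    try:
        namespace = {}
        exec(code, namespace)
        assert namespace["add"](2, 3) == 5
        return True, ""
    except Exception as exc:
        return False, repr(exc)

def propose(attempt: int) -> str:
    """Stand-in for the model: first attempt has a bug, second fixes it."""
    return "def add(a, b):\n    return a - b" if attempt == 0 else "def add(a, b):\n    return a + b"

def fix_loop(max_rounds: int = 5) -> tuple:
    """Generate, execute, retry — bounded rounds cap the inference spend."""
    for round_no in range(max_rounds):
        code = propose(round_no)
        ok, error = run_tests(code)  # real loops feed `error` into the next prompt
        if ok:
            return True, round_no + 1
    return False, max_rounds

passed, rounds = fix_loop()
```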
Key tools: Claude Code / GitHub Copilot Workspace (native iterative coding environments), Docker (sandboxed execution environment for agent-generated code), pytest / Jest (test frameworks the agent targets), E2B / Modal (cloud sandboxed code execution for agent loops), Git (version control for agent-generated diffs) Cost: Inference cost scales with iteration count — a 10-round debugging loop at GPT-4 prices costs $0.50-5.00 depending on context length. Sandbox execution cost (E2B, Modal) is $0.10-1.00/hour. Primary productivity gain is measured in developer hours saved vs. cost. --- ## Repository-Level Refactoring Agent ID: agents-coding-repo-level-refactoring-agent Category: Autonomous Agents & Orchestration > Code & Development Agents Complexity: high | Phase: production (with human PR review) Modalities: code (multi-file), structured data (AST) When to use: Large-scale codebase transformations where manual execution is error-prone due to volume — dependency upgrades with downstream breakage, API migration across a full repo, large-scale rename or restructure, dead code elimination with safety checking. Best when the transformation has a clear definition and correctness can be verified by test suite passage. When NOT to use: Refactoring without a comprehensive test suite — the agent will make changes that appear to compile but break untested behaviour. Architectural refactoring where the transformation requires design judgement (microservices extraction, schema redesign) — define the target architecture first, then use the agent to execute. Repos with insufficient CI/CD infrastructure to validate agent changes safely. 
Key tools: Claude Code (multi-file editing with codebase context), OpenRewrite (Java/Kotlin automated refactoring recipes), ast-grep / Comby (AST-level search-and-replace for safe structural transformations), GitHub Actions / GitLab CI (automated test validation of agent PRs), Semgrep (custom static analysis rules to verify refactoring postconditions) Cost: Inference cost is high due to large codebase context windows — a 100k-token repo refactoring run can cost $5-50 in API fees. Primary value is measured against engineer-days for manual refactoring. Requires investment in test infrastructure to validate safely. --- ## DevOps & CI/CD Automation Agent ID: agents-coding-devops-cicd-agent Category: Autonomous Agents & Orchestration > Code & Development Agents Complexity: high | Phase: production (diagnosis and recommendation) / experimental (autonomous remediation) Modalities: structured data (logs, metrics), code (IaC, scripts), text (incident summaries) When to use: DevOps teams with alert fatigue and manual triage bottlenecks. Most mature use cases: CI/CD failure triage that synthesises logs and test output to identify root cause; incident response that correlates monitoring alerts with deployment events; IaC generation for standardised infrastructure patterns. Best when runbooks are well-documented and remediations are well-bounded. When NOT to use: Autonomous production changes without a human approval gate — an incident response agent with write access to production infrastructure that misdiagnoses a failure can cause cascading outages. Cloud cost optimisation agents that right-size or terminate instances without capacity planning review. Early-stage teams without established runbooks to encode into the agent. 
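CI failure triage is typically layered: cheap deterministic rules first, LLM analysis only for unmatched logs. A sketch with invented log lines and rule set; the `needs_llm_analysis` branch is where the model call would go:

```python
# Rule table: log marker -> diagnosis. Order matters; first match wins.
RULES = {
    "OOMKilled": "infrastructure: runner out of memory",
    "ConnectionError": "flaky: network dependency",
    "AssertionError": "test failure: likely a real regression",
}

def triage(log: str) -> str:
    """Cheap rule match first; only unmatched logs go to an LLM (stubbed here)."""
    for marker, diagnosis in RULES.items():
        if marker in log:
            return diagnosis
    return "needs_llm_analysis"

a = triage("worker pod OOMKilled at step 3")
b = triage("E   AssertionError: expected 200 got 500")
c = triage("unfamiliar stack trace")
```

The rule table encodes the runbooks the entry calls a prerequisite — teams without them have nothing to put in `RULES` and end up routing everything to the expensive path.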
Key tools: PagerDuty / OpsGenie API (alert intake and incident management), Datadog / Grafana (monitoring and log data source), Terraform / Pulumi (IaC target for infrastructure provisioning agent), GitHub Actions / GitLab CI API (CI pipeline integration and failure retrieval), LangChain Tools (multi-tool orchestration for diagnosis workflows) Cost: Inference cost is modest relative to incident impact avoided. Primary investment is in log ingestion pipelines, runbook encoding, and testing the agent against historical incidents. Autonomous remediation requires additional investment in change management and rollback infrastructure. --- ## Code Review & Security Analysis Agent ID: agents-coding-code-review-security-agent Category: Autonomous Agents & Orchestration > Code & Development Agents Complexity: medium | Phase: production Modalities: code (diff, full file), structured data (CVE data, metrics) When to use: Engineering teams with PR review bottlenecks or security expertise gaps. Highest-value use cases: security vulnerability scanning on every PR without requiring a security engineer in every review; dependency audit with CVE correlation; code quality gate enforcement for teams with inconsistent review standards. Best deployed as a CI integration, not a standalone tool. When NOT to use: Replacing human code review entirely — the agent misses architecture-level concerns, business logic errors, and design trade-offs that require domain context. Security-critical code paths (auth, encryption, payment processing) where automated review is a complement to, not a replacement for, expert security review. Teams without CI/CD where the agent has no integration point. 
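Deployed as a CI gate, the pattern reduces to a severity policy over findings: only blocking findings stop the merge, everything else becomes review comments. A sketch with invented rule names and a hypothetical three-level severity scale:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    rule: str
    severity: str  # "info" | "warning" | "blocking"

def gate(findings: list, block_on: str = "blocking") -> dict:
    """CI gate: blocking findings stop the merge; the rest surface as PR comments."""
    blockers = [f.rule for f in findings if f.severity == block_on]
    return {"merge_allowed": not blockers, "blockers": blockers}

verdict = gate([Finding("hardcoded-secret", "blocking"),
                Finding("long-function", "info")])
```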
Key tools: GitHub Copilot code review / CodeRabbit (native PR review agents), Semgrep (SAST with custom rule libraries for security scanning), Snyk / Dependabot (dependency vulnerability scanning), SonarQube / CodeClimate (code quality metrics and gate enforcement), OWASP dependency-check (open-source dependency security scanning) Cost: SaaS tools (CodeRabbit, Snyk) run $10-50/developer/month. Self-hosted Semgrep is free for open-source rules. LLM review cost for a 200-line diff is <$0.10. Total cost is modest relative to a single security incident avoided. --- ## Parallel Specialised Sub-Agent Orchestration ID: agents-multi-parallel-specialised-subagents Category: Autonomous Agents & Orchestration > Agent Orchestration & Coordination Complexity: high | Phase: production (select use cases) Modalities: varies by sub-agent domain When to use: Tasks that decompose into independent sub-tasks that can execute simultaneously without shared state — research tasks with multiple sub-domains, document processing at scale, competitive analysis across multiple competitors, report generation with parallel data collection sections. Best when latency reduction from parallelism justifies orchestration overhead and sub-tasks have well-defined interfaces. When NOT to use: Tasks with tight sequential dependencies where step N requires step N-1's output — parallelism adds complexity without latency benefit. Tasks where the orchestrator would spend more tokens coordinating than the sub-agents spend executing. Simple tasks that fit in a single agent context — don't add orchestration overhead unnecessarily. 
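The fan-out/fan-in shape can be sketched with a thread pool: one sub-agent per independent target, results gathered in input order. `research` is a stand-in for a real sub-agent call (LLM + tools), and the competitor names are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def research(competitor: str) -> dict:
    """Stand-in for one sub-agent researching a single competitor."""
    return {"competitor": competitor, "finding": f"profile of {competitor}"}

def fan_out(targets: list) -> list:
    """Fan out one sub-agent per target; pool.map fans results back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(research, targets))

results = fan_out(["alpha", "beta", "gamma"])
```

Because sub-tasks share no state, wall-clock time is bounded by the slowest call rather than the sum — the latency property the entry describes.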
Key tools: LangGraph (stateful multi-agent orchestration with fan-out/fan-in patterns), Claude Code with SendMessage / Agent tool (native CC parallel agent dispatch), Temporal (durable workflow execution for parallel sub-agent coordination), Redis / message queues (sub-agent result aggregation), Pydantic (structured output schema enforcement across sub-agents) Cost: Inference cost scales linearly with sub-agent count but latency is bounded by the slowest sub-agent (not additive). Orchestration overhead (coordinator token cost) is typically 10-20% of total cost. For time-sensitive tasks, parallelism can reduce wall-clock time by 3-10x at equivalent cost. --- ## Multi-Agent Debate & Consensus ID: agents-multi-debate-consensus-agent Category: Autonomous Agents & Orchestration > Agent Orchestration & Coordination Complexity: high | Phase: production (QA use cases) / experimental (autonomous decision-making) Modalities: text (argumentation), structured data (findings, verdicts) When to use: High-stakes decisions where overconfidence or blind spots in a single agent are unacceptable — code security review, factual claim verification, medical or legal reasoning tasks, LLM evaluation at scale. Also valuable for quality assurance pipelines where a single-agent reviewer is too easy to game or too inconsistent. Best when the debate structure maps clearly to opposing perspectives (attack/defend, pro/con, multiple independent evaluators). When NOT to use: Routine tasks where a single confident agent suffices — debate adds latency and cost (2-3x minimum) without benefit for low-stakes outputs. Tasks without a verifiable ground truth or clear quality criteria — debate without criteria produces the most confident argument, not the correct one. Real-time latency-sensitive applications where multi-round debate is too slow. 
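The bounded attacker/defender loop can be sketched as below. All three roles are scripted stand-ins for persona-isolated model calls; the fixed `rounds` parameter is the cost cap the entry recommends:

```python
def attacker(claim: str, round_no: int) -> str:
    """Stand-in for the critic persona: tries to falsify the claim."""
    return f"round {round_no}: challenge to '{claim}'"

def defender(challenge: str) -> str:
    """Stand-in for the defender persona: rebuts or concedes."""
    return f"rebuttal to [{challenge}]"

def judge(transcript: list) -> str:
    """Stand-in for an independent judge model reading the full transcript."""
    return "upheld" if all("rebuttal" in reply for _, reply in transcript) else "rejected"

def debate(claim: str, rounds: int = 3) -> dict:
    """Bounded adversarial loop; the transcript, not any single turn, is judged."""
    transcript = []
    for r in range(1, rounds + 1):
        challenge = attacker(claim, r)
        transcript.append((challenge, defender(challenge)))
    return {"claim": claim, "turns": len(transcript), "verdict": judge(transcript)}

outcome = debate("the patch is memory-safe", rounds=3)
```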
Key tools: LangGraph (stateful adversarial agent loop orchestration), Claude Code agents (native forked context for isolated debate participants), Anthropic API with multiple system prompts (attacker/defender persona isolation), Redis / shared filesystem (structured artifact passing between debaters), Pydantic (schema-enforced structured debate turn format) Cost: 2-5x cost of single-agent equivalent (multiple agents, multiple rounds). For N rounds with 2 debaters, cost is O(N * context_growth). Long debates with large contexts become expensive quickly — budget 3 rounds maximum for most use cases. Justified by quality lift on high-stakes outputs. --- ## Human-in-the-Loop Escalation Orchestration ID: agents-multi-human-in-loop-escalation Category: Autonomous Agents & Orchestration > Agent Orchestration & Coordination Complexity: medium-high | Phase: production Modalities: structured data (approval requests), text (explanation/context for reviewers) When to use: Regulated workflows where human sign-off is legally required (financial approvals, medical decisions, compliance attestations). Also valuable for any AI workflow where confidence varies — a HITL gate that only triggers on low-confidence outputs gets the best of both worlds (automation at scale, human review where it matters). Best when human review can happen async without blocking the pipeline. When NOT to use: Fully automatable tasks where adding human gates creates more cost than value — review latency can kill workflows that require fast turnaround. Organisations where human reviewers are already the bottleneck — adding more AI output for the same humans to review doesn't help. Use cases where the human reviewer lacks the expertise to add value over the AI (rare but real). 
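The confidence-gated routing at the heart of this pattern is a one-threshold split: auto-approve above, escalate below. A sketch with invented item names and scores — in practice the threshold comes from calibration, not a guess:

```python
def route(items: list, threshold: float = 0.9) -> dict:
    """Auto-approve confident outputs; escalate everything below the threshold."""
    auto = [name for name, conf in items if conf >= threshold]
    human = [name for name, conf in items if conf < threshold]
    return {"auto_approved": auto, "human_review": human}

batch = [("claim-1", 0.97), ("claim-2", 0.62), ("claim-3", 0.91)]
routed = route(batch)
```

Raising the threshold moves items from `auto_approved` to `human_review`, which is the cost/coverage trade-off the Cost note below quantifies.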
Key tools: Scale AI / Labelbox (managed human review with quality infrastructure), Slack / email webhooks (lightweight async human approval channel), Temporal (durable workflow with human task wait states), LangGraph (agent graph with explicit human node integration), AWS Step Functions Human Approval Task (managed approval gate pattern) Cost: Human review cost dominates — $0.10-5.00 per review item depending on task complexity and whether managed (Scale AI) or internal (staff time). Threshold tuning is the primary lever: raising the confidence threshold from 70% to 90% escalates more items to human review, so tune it against the model's observed confidence distribution, and calibrate thresholds against human accuracy on the same task to ensure humans add value. --- ## Long-Horizon Planning & Goal Decomposition ID: agents-multi-long-horizon-planning-agent Category: Autonomous Agents & Orchestration > Agent Orchestration & Coordination Complexity: very high | Phase: experimental (most use cases) / production (specific bounded workflows) Modalities: varies by domain (text, code, structured data) When to use: Complex multi-day or multi-week tasks that require planning, state persistence across sessions, and replanning on failure or new information — autonomous research projects, software development tasks spanning multiple work sessions, strategic planning workflows, complex procurement or project management workflows. Best when the task decomposition is too complex for a human to coordinate manually at speed. When NOT to use: Tasks completable in a single session — long-horizon architecture adds overhead without benefit. Tasks with highly dynamic environments where the plan becomes obsolete faster than it can be executed. High-stakes irreversible actions where an autonomous planning agent making mistakes over multiple days can cause compounding damage before detection.
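State persistence across sessions reduces to checkpoint-and-resume over a plan. A sketch using JSON serialisation (real deployments back this with Redis or PostgreSQL, per the tools below); the goal and plan steps are invented:

```python
import json

def checkpoint(state: dict) -> str:
    """Serialise plan + progress so a later session can resume."""
    return json.dumps(state)

def resume(blob: str) -> dict:
    """Reload state and identify the next incomplete step to execute."""
    state = json.loads(blob)
    pending = [s for s in state["plan"] if s not in state["done"]]
    state["next"] = pending[0] if pending else None
    return state

# Day 1 ends mid-plan; day 2 picks up exactly where it left off.
day1 = {"goal": "vendor shortlist", "plan": ["gather", "score", "report"], "done": ["gather"]}
day2 = resume(checkpoint(day1))
```

Replanning on new information means rewriting `plan` before the next resume — which is also where budget caps and human checkpoints belong for irreversible actions.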
Key tools: LangGraph (stateful agent graph with persistence across sessions), Temporal (durable workflow execution with long-running timers and state), Redis / PostgreSQL (persistent agent memory and task state store), Anthropic API with extended context (large context for plan + history), Claude Code with background agents (native long-running agent infrastructure) Cost: Cost is dominated by context accumulation over time — a 10-day task with daily replanning passes large contexts repeatedly. Budget $10-100+ for complex long-horizon runs. Primary engineering cost is in state management, checkpointing, and recovery infrastructure. Requires robust cost monitoring with budget caps. ---