Amazon Bedrock Knowledge Bases
Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) solution that handles data ingestion, embedding generation, vector storage, retrieval with reranking, source attribution, and session context management.
Overview
What It Does
Amazon Bedrock Knowledge Bases provides:
- Data Ingestion: Automatically process documents from S3, web, Confluence, SharePoint, Salesforce
- Embedding Generation: Convert text to vectors using foundation models
- Vector Storage: Store embeddings in multiple vector database options
- Retrieval: Semantic and hybrid search with metadata filtering
- Generation: RAG workflows with source attribution
- Session Management: Multi-turn conversations with context
- Chunking Strategies: Fixed, semantic, hierarchical, and custom chunking
When to Use This Skill
Use this skill when you need to:
- Build RAG applications for document Q&A
- Implement semantic search over enterprise knowledge
- Create chatbots with knowledge bases
- Integrate retrieval with Bedrock Agents
- Configure optimal chunking strategies
- Query documents with source attribution
- Manage multi-turn conversations with context
- Optimize RAG performance and cost
Key Capabilities
- Multiple Vector Store Options: OpenSearch, S3 Vectors, Neptune, Pinecone, MongoDB, Redis
- Flexible Data Sources: S3, web crawlers, Confluence, SharePoint, Salesforce
- Advanced Chunking: Fixed-size, semantic, hierarchical, custom Lambda
- Hybrid Search: Combine semantic (vector) and keyword search
- Session Management: Built-in conversation context tracking
- GraphRAG: Relationship-aware retrieval with Neptune Analytics
- Cost Optimization: S3 Vectors for up to 90% storage savings
Quick Start
Basic RAG Workflow
```python
import boto3
import json

# Initialize clients
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 1. Create Knowledge Base
kb_response = bedrock_agent.create_knowledge_base(
    name='enterprise-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
            'vectorIndexName': 'bedrock-knowledge-base-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)
knowledge_base_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {knowledge_base_id}")

# 2. Add S3 Data Source
ds_response = bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    description='Company documents from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['documents/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)
data_source_id = ds_response['dataSource']['dataSourceId']

# 3. Start Ingestion
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Initial document ingestion'
)
print(f"Ingestion Job ID: {ingestion_response['ingestionJob']['ingestionJobId']}")

# 4. Query with Retrieve and Generate
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': 'What is our vacation policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            }
        }
    }
)
print(f"Answer: {response['output']['text']}")
print("\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']['s3Location']['uri']}")
```
Vector Store Options
1. Amazon OpenSearch Serverless
Best for: Production RAG applications with auto-scaling requirements
Benefits:
- Fully managed, serverless operation
- Auto-scaling compute and storage
- High availability with multi-AZ deployment
- Fast query performance
Configuration:
```python
storageConfiguration={
    'type': 'OPENSEARCH_SERVERLESS',
    'opensearchServerlessConfiguration': {
        'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
        'vectorIndexName': 'bedrock-knowledge-base-index',
        'fieldMapping': {
            'vectorField': 'bedrock-knowledge-base-default-vector',
            'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
            'metadataField': 'AMAZON_BEDROCK_METADATA'
        }
    }
}
```
2. Amazon S3 Vectors (Preview)
Best for: Cost-optimized, large-scale RAG applications
Benefits:
- Up to 90% cost reduction for vector storage
- Built-in vector support in S3
- Subsecond query performance
- Massive scale and durability
Ideal Use Cases:
- Large document collections (millions of chunks)
- Cost-sensitive applications
- Archival knowledge bases
- Low-to-medium QPS workloads
Configuration:
```python
storageConfiguration={
    'type': 'S3_VECTORS',
    's3VectorsConfiguration': {
        'bucketArn': 'arn:aws:s3:::my-vector-bucket',
        'prefix': 'vectors/'
    }
}
```
Limitations:
- Still in preview (no CloudFormation/CDK support yet)
- Not suitable for high QPS, millisecond-latency requirements
- Best for cost optimization over ultra-low latency
3. Amazon Neptune Analytics (GraphRAG)
Best for: Interconnected knowledge domains requiring relationship-aware retrieval
Benefits:
- Automatic graph creation linking related content
- Improved retrieval accuracy through relationships
- Comprehensive responses leveraging knowledge graph
- Explainable results with relationship context
Use Cases:
- Legal document analysis with case precedents
- Scientific research with paper citations
- Product catalogs with dependencies
- Organizational knowledge with team relationships
Configuration:
```python
storageConfiguration={
    'type': 'NEPTUNE_ANALYTICS',
    'neptuneAnalyticsConfiguration': {
        'graphArn': 'arn:aws:neptune-graph:us-east-1:123456789012:graph/g-12345678',
        'vectorSearchConfiguration': {
            'vectorField': 'embedding'
        }
    }
}
```
4. Amazon OpenSearch Service Managed Cluster
Best for: Existing OpenSearch infrastructure, advanced customization
Configuration:
```python
storageConfiguration={
    'type': 'OPENSEARCH_SERVICE',
    'opensearchServiceConfiguration': {
        'clusterArn': 'arn:aws:es:us-east-1:123456789012:domain/my-domain',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}
```
5. Third-Party Vector Databases
Pinecone:
```python
storageConfiguration={
    'type': 'PINECONE',
    'pineconeConfiguration': {
        'connectionString': 'https://my-index-abc123.svc.us-west1-gcp.pinecone.io',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pinecone-api-key',
        'namespace': 'bedrock-kb',
        'fieldMapping': {
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}
```
MongoDB Atlas:
```python
storageConfiguration={
    'type': 'MONGODB_ATLAS',
    'mongoDbAtlasConfiguration': {
        'endpoint': 'https://cluster0.mongodb.net',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mongodb-creds',
        'databaseName': 'bedrock_kb',
        'collectionName': 'vectors',
        'vectorIndexName': 'vector_index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}
```
Redis Enterprise Cloud:
```python
storageConfiguration={
    'type': 'REDIS_ENTERPRISE_CLOUD',
    'redisEnterpriseCloudConfiguration': {
        'endpoint': 'redis-12345.c1.us-east-1-2.ec2.cloud.redislabs.com:12345',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:redis-creds',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}
```
Data Source Configuration
1. Amazon S3
Supported File Types: PDF, TXT, MD, HTML, DOC, DOCX, CSV, XLS, XLSX
```python
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-technical-docs',
    description='Technical documentation from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['docs/technical/', 'docs/manuals/'],
            'exclusionPrefixes': ['docs/archive/']
        }
    }
)
```
2. Web Crawler
Automatic website scraping and indexing:
```python
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='company-website',
    description='Public company website content',
    dataSourceConfiguration={
        'type': 'WEB',
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {'url': 'https://www.example.com/docs'},
                        {'url': 'https://www.example.com/blog'}
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 300  # Pages per minute
                }
            }
        }
    }
)
```
3. Confluence
```python
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='confluence-wiki',
    description='Company Confluence knowledge base',
    dataSourceConfiguration={
        'type': 'CONFLUENCE',
        'confluenceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.atlassian.net/wiki',
                'hostType': 'SAAS',
                'authType': 'BASIC',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Space',
                                'inclusionFilters': ['Engineering', 'Product'],
                                'exclusionFilters': ['Archive']
                            }
                        ]
                    }
                }
            }
        }
    }
)
```
4. SharePoint
```python
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='sharepoint-docs',
    description='SharePoint document library',
    dataSourceConfiguration={
        'type': 'SHAREPOINT',
        'sharePointConfiguration': {
            'sourceConfiguration': {
                'siteUrls': [
                    'https://company.sharepoint.com/sites/Engineering',
                    'https://company.sharepoint.com/sites/Product'
                ],
                'tenantId': 'tenant-id',
                'domain': 'company',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:sharepoint-creds'
            }
        }
    }
)
```
5. Salesforce
```python
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='salesforce-knowledge',
    description='Salesforce knowledge articles',
    dataSourceConfiguration={
        'type': 'SALESFORCE',
        'salesforceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.my.salesforce.com',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Knowledge',
                                'inclusionFilters': ['Product_Documentation', 'Support_Articles']
                            }
                        ]
                    }
                }
            }
        }
    }
)
```
Chunking Strategies
1. Fixed-Size Chunking
Best for: Simple documents with uniform structure
How it works: Splits text into chunks of fixed token size with overlap
Parameters:
- maxTokens: 200-8192 tokens (typically 512-1024)
- overlapPercentage: 10-50% (typically 20%)
Configuration:
```python
vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'FIXED_SIZE',
        'fixedSizeChunkingConfiguration': {
            'maxTokens': 512,
            'overlapPercentage': 20
        }
    }
}
```
Use Cases:
- Blog posts and articles
- Technical documentation with consistent formatting
- FAQs and Q&A content
- Simple text files
Pros:
- Fast and predictable
- No additional costs
- Easy to tune
Cons:
- May split semantic units awkwardly
- Doesn't respect document structure
- Can break context mid-sentence
2. Semantic Chunking
Best for: Documents without clear boundaries (legal, technical, academic)
How it works: Uses sentence similarity to group related content
Parameters:
- maxTokens: 20-8192 tokens (typically 300-500)
- bufferSize: Number of neighboring sentences (default: 1)
- breakpointPercentileThreshold: Similarity threshold (recommended: 95%)
Configuration:
```python
vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 300,
            'bufferSize': 1,
            'breakpointPercentileThreshold': 95
        }
    }
}
```
Use Cases:
- Legal documents and contracts
- Academic papers
- Technical specifications
- Medical records
- Research reports
Pros:
- Preserves semantic meaning
- Better context preservation
- Improved retrieval accuracy
Cons:
- Additional cost (foundation model usage)
- Slower ingestion
- Less predictable chunk sizes
Cost Consideration: Semantic chunking uses foundation models for similarity analysis, incurring additional costs beyond storage and retrieval.
3. Hierarchical Chunking
Best for: Complex documents with nested structure
How it works: Creates parent and child chunks; retrieves child, returns parent for context
Parameters:
- levelConfigurations: Array of chunk sizes (parent → child)
- overlapTokens: Overlap between chunks
Configuration:
```python
vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'HIERARCHICAL',
        'hierarchicalChunkingConfiguration': {
            'levelConfigurations': [
                {'maxTokens': 1500},  # Parent chunk (comprehensive context)
                {'maxTokens': 300}    # Child chunk (focused retrieval)
            ],
            'overlapTokens': 60
        }
    }
}
```
Use Cases:
- Technical manuals with sections and subsections
- Academic papers with abstract, sections, and subsections
- Legal documents with articles and clauses
- Product documentation with categories and details
How Retrieval Works:
- Query matches against child chunks (fast, focused)
- Returns parent chunks (comprehensive context)
- Best of both: precision retrieval + complete context
Pros:
- Optimal balance of precision and context
- Excellent for nested documents
- Better accuracy for complex queries
Cons:
- More complex configuration
- Larger storage footprint
- Requires understanding of document structure
4. Custom Chunking (Lambda)
Best for: Specialized domain logic, custom parsing requirements
How it works: Invoke Lambda function for custom chunking logic
Configuration:
```python
vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'NONE'  # Chunking handled by the custom Lambda
    },
    'customTransformationConfiguration': {
        'intermediateStorage': {
            's3Location': {
                'uri': 's3://my-kb-bucket/intermediate/'
            }
        },
        'transformations': [
            {
                'stepToApply': 'POST_CHUNKING',
                'transformationFunction': {
                    'transformationLambdaConfiguration': {
                        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:custom-chunker'
                    }
                }
            }
        ]
    }
}
```
Example Lambda Handler:
```python
# Lambda function for custom chunking
import json

def lambda_handler(event, context):
    """
    Custom chunking logic for specialized documents.

    Input: event contains document content and metadata
    Output: array of chunks with text and metadata
    """
    # Extract document content
    document = event['document']
    content = document['content']
    metadata = document.get('metadata', {})

    # Custom chunking logic (example: split by custom delimiter)
    chunks = []
    sections = content.split('---SECTION---')

    for idx, section in enumerate(sections):
        if section.strip():
            chunks.append({
                'text': section.strip(),
                'metadata': {
                    **metadata,
                    'chunk_id': f'section_{idx}',
                    'chunk_type': 'custom_section'
                }
            })

    return {'chunks': chunks}
```
Use Cases:
- Medical records with structured sections (SOAP notes)
- Financial documents with tables and calculations
- Code documentation with code blocks and explanations
- Domain-specific formats (HL7, FHIR, etc.)
Pros:
- Complete control over chunking logic
- Can handle any document format
- Integrate domain expertise
Cons:
- Requires Lambda development and maintenance
- Additional operational complexity
- Harder to debug and iterate
Chunking Strategy Selection Guide
| Document Type | Recommended Strategy | Rationale |
|---|---|---|
| Blog posts, articles | Fixed-size | Simple, uniform structure |
| Legal documents | Semantic | Preserve legal reasoning flow |
| Technical manuals | Hierarchical | Nested sections and subsections |
| Academic papers | Hierarchical | Abstract, sections, subsections |
| FAQs | Fixed-size | Independent Q&A pairs |
| Medical records | Custom Lambda | Structured sections (SOAP, HL7) |
| Code documentation | Custom Lambda | Code blocks + explanations |
| Product catalogs | Fixed-size | Uniform product descriptions |
| Research reports | Semantic | Preserve research narrative |
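The table above maps directly to configuration. As a convenience, a sketch like the one below can encode those presets so each data source picks a strategy by document type; the preset names, categories, and helper function are illustrative, not part of the Bedrock API:

```python
from typing import Dict

# Hypothetical presets mirroring the selection table above; tune for your corpus
CHUNKING_PRESETS: Dict[str, Dict] = {
    'article': {  # blog posts, FAQs, product catalogs
        'chunkingStrategy': 'FIXED_SIZE',
        'fixedSizeChunkingConfiguration': {'maxTokens': 512, 'overlapPercentage': 20},
    },
    'legal': {  # legal documents, research reports
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 300, 'bufferSize': 1, 'breakpointPercentileThreshold': 95,
        },
    },
    'manual': {  # technical manuals, academic papers
        'chunkingStrategy': 'HIERARCHICAL',
        'hierarchicalChunkingConfiguration': {
            'levelConfigurations': [{'maxTokens': 1500}, {'maxTokens': 300}],
            'overlapTokens': 60,
        },
    },
}

def chunking_config_for(document_type: str) -> Dict:
    """Return a vectorIngestionConfiguration value for a known document type."""
    preset = CHUNKING_PRESETS.get(document_type)
    if preset is None:
        raise ValueError(f'No chunking preset for document type: {document_type}')
    return {'chunkingConfiguration': preset}
```

The returned dict can be passed as the vectorIngestionConfiguration argument of create_data_source, as in the earlier examples.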
Retrieval Operations
1. Retrieve API (Retrieval Only)
Returns raw retrieved chunks without generation.
Use Cases:
- Custom generation logic
- Debugging retrieval quality
- Building custom RAG pipelines
- Integrating with non-Bedrock models
```python
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'What are the benefits of hierarchical chunking?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID',  # SEMANTIC or HYBRID
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'technical_guide'
                        }
                    },
                    {
                        'greaterThan': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    }
                ]
            }
        }
    }
)

# Process retrieved chunks
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text']}")
    print(f"Location: {result['location']}")
    print(f"Metadata: {result.get('metadata', {})}")
    print("---")
```
2. Retrieve and Generate API (RAG)
Returns generated response with source attribution.
Use Cases:
- Complete RAG workflows
- Question answering
- Document summarization
- Chatbots with knowledge bases
```python
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'Explain semantic chunking benefits and when to use it'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.7,
                        'maxTokens': 2048,
                        'topP': 0.9
                    }
                },
                'promptTemplate': {
                    'textPromptTemplate': '''You are a helpful assistant. Answer the user's question based on the provided context.

Context: $search_results$

Question: $query$

Answer:'''
                }
            }
        }
    }
)

print(f"Generated Response: {response['output']['text']}")
print("\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']}")
        print(f"    Relevance Score: {reference.get('score', 'N/A')}")
```
3. Multi-Turn Conversations with Session Management
Bedrock automatically manages conversation context across turns.
```python
# First turn - creates a session automatically
response1 = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': 'What is Amazon Bedrock Knowledge Bases?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)
session_id = response1['sessionId']
print(f"Session ID: {session_id}")
print(f"Response: {response1['output']['text']}\n")

# Follow-up turn - reuse the session for context
response2 = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': 'What chunking strategies does it support?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id  # Continue conversation with context
)
print(f"Follow-up Response: {response2['output']['text']}")

# Third turn
response3 = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': 'Which strategy would you recommend for legal documents?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id
)
print(f"Third Response: {response3['output']['text']}")
```
4. Advanced Metadata Filtering
Filter retrieval by metadata attributes for precision.
```python
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'Security best practices for production deployments'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 10,
            'overrideSearchType': 'HYBRID',
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'security_guide'
                        }
                    },
                    {
                        'greaterThanOrEquals': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    },
                    {
                        'in': {
                            'key': 'category',
                            'value': ['production', 'security', 'compliance']
                        }
                    }
                ]
            }
        }
    }
)
```
Supported Filter Operators:
- equals: Exact match
- notEquals: Not equal
- greaterThan, greaterThanOrEquals: Numeric comparison
- lessThan, lessThanOrEquals: Numeric comparison
- in: Match any value in an array
- notIn: Match no value in an array
- startsWith: String prefix match
- andAll: Combine filters with AND
- orAll: Combine filters with OR

These operators can be nested, as shown in the example below.
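A minimal nesting sketch, reusing the bedrock_agent_runtime client from the earlier examples; the metadata keys and values are illustrative. It matches security or compliance guides published in or after 2024:

```python
# orAll nested inside andAll: (year >= 2024) AND (security OR compliance guide)
metadata_filter = {
    'andAll': [
        {'greaterThanOrEquals': {'key': 'publish_year', 'value': 2024}},
        {
            'orAll': [
                {'equals': {'key': 'document_type', 'value': 'security_guide'}},
                {'equals': {'key': 'document_type', 'value': 'compliance_guide'}},
            ]
        },
    ]
}

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={'text': 'encryption requirements'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'filter': metadata_filter,
        }
    },
)
```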
Ingestion Management
1. Start Ingestion Job
```python
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Monthly document sync',
    clientToken='unique-idempotency-token-123'
)
job_id = ingestion_response['ingestionJob']['ingestionJobId']
print(f"Ingestion Job ID: {job_id}")
```
2. Monitor Ingestion Job
```python
import time

# Get job status
job_status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    ingestionJobId=job_id
)
print(f"Status: {job_status['ingestionJob']['status']}")
print(f"Started: {job_status['ingestionJob']['startedAt']}")
print(f"Updated: {job_status['ingestionJob']['updatedAt']}")

if 'statistics' in job_status['ingestionJob']:
    stats = job_status['ingestionJob']['statistics']
    print(f"Documents Scanned: {stats['numberOfDocumentsScanned']}")
    print(f"Documents Indexed: {stats['numberOfDocumentsIndexed']}")
    print(f"Documents Failed: {stats['numberOfDocumentsFailed']}")

# Wait for completion
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        ingestionJobId=job_id
    )
    current_status = status['ingestionJob']['status']
    if current_status in ['COMPLETE', 'FAILED']:
        print(f"Ingestion job {current_status}")
        break
    print(f"Status: {current_status}, waiting...")
    time.sleep(30)
```
3. List Ingestion Jobs
```python
list_response = bedrock_agent.list_ingestion_jobs(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    maxResults=50
)

for job in list_response['ingestionJobSummaries']:
    print(f"Job ID: {job['ingestionJobId']}")
    print(f"Status: {job['status']}")
    print(f"Started: {job['startedAt']}")
    print(f"Updated: {job['updatedAt']}")
    print("---")
```
Integration with Bedrock Agents
1. Agent with Knowledge Base Action
```python
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Create agent with knowledge base
agent_response = bedrock_agent.create_agent(
    agentName='customer-support-agent',
    description='Customer support agent with knowledge base access',
    instruction='''You are a customer support agent. When answering questions:
1. Search the knowledge base for relevant information
2. Provide accurate answers based on retrieved context
3. Cite your sources
4. Admit when you don't know something''',
    foundationModel='anthropic.claude-3-sonnet-20240229-v1:0',
    agentResourceRoleArn='arn:aws:iam::123456789012:role/BedrockAgentRole'
)
agent_id = agent_response['agent']['agentId']

# Associate knowledge base with agent
kb_association = bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB123456',
    description='Company documentation knowledge base',
    knowledgeBaseState='ENABLED'
)

# Prepare agent and create alias
bedrock_agent.prepare_agent(agentId=agent_id)
alias_response = bedrock_agent.create_agent_alias(
    agentId=agent_id,
    agentAliasName='production',
    description='Production alias'
)
agent_alias_id = alias_response['agentAlias']['agentAliasId']

# Invoke agent (automatically queries the knowledge base)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
response = bedrock_agent_runtime.invoke_agent(
    agentId=agent_id,
    agentAliasId=agent_alias_id,
    sessionId='session-123',
    inputText='What is our return policy for defective products?'
)
for event in response['completion']:
    if 'chunk' in event:
        chunk = event['chunk']
        print(chunk['bytes'].decode())
```
2. Agent with Multiple Knowledge Bases
```python
# Associate multiple knowledge bases
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-PRODUCT-DOCS',
    description='Product documentation'
)
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-SUPPORT-ARTICLES',
    description='Support knowledge articles'
)
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-COMPANY-POLICIES',
    description='Company policies and procedures'
)

# The agent automatically searches all associated knowledge bases and combines results
```
Best Practices
1. Chunking Strategy Selection
Decision Framework:
1. Simple, uniform documents → Fixed-size chunking
   - Blog posts, articles, simple FAQs
   - Fast, predictable, cost-effective
2. Documents without clear boundaries → Semantic chunking
   - Legal documents, contracts, academic papers
   - Preserves semantic meaning, better accuracy
   - Consider the additional cost
3. Nested, hierarchical documents → Hierarchical chunking
   - Technical manuals, product docs, research papers
   - Best balance of precision and context
   - Optimal for complex structures
4. Specialized formats → Custom Lambda chunking
   - Medical records (HL7, FHIR), code docs, custom formats
   - Complete control, domain expertise
   - Higher operational complexity
Tuning Guidelines:
- Fixed-size: Start with 512 tokens, 20% overlap
- Semantic: Start with 300 tokens, bufferSize=1, threshold=95%
- Hierarchical: Parent 1500 tokens, child 300 tokens, overlap 60 tokens
- Custom: Test extensively with domain experts
2. Retrieval Optimization
Number of Results:
- Start with 5-10 results
- Increase if answers lack detail
- Decrease if too much noise
Search Type:
- SEMANTIC: Pure vector similarity (faster, good for conceptual queries)
- HYBRID: Vector + keyword (better recall, recommended for production)
Use Hybrid Search when:
- Queries contain specific terms or names
- Need to match exact keywords
- Domain has specialized vocabulary
Use Semantic Search when:
- Purely conceptual queries
- Prioritizing speed over perfect recall
- Domain vocabulary that general-purpose embeddings already capture well (a comparison sketch follows below)
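When in doubt, measure rather than guess. A small sketch, reusing the bedrock_agent_runtime client from the earlier examples, that runs the same query under both search types and prints the top scores side by side (the helper name and sample query are illustrative):

```python
def compare_search_types(kb_id: str, query: str, k: int = 5) -> None:
    """Run one query with SEMANTIC and HYBRID search and print top-k scores."""
    for search_type in ('SEMANTIC', 'HYBRID'):
        response = bedrock_agent_runtime.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={'text': query},
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'numberOfResults': k,
                    'overrideSearchType': search_type,
                }
            },
        )
        scores = [round(r['score'], 3) for r in response['retrievalResults']]
        print(f"{search_type}: top-{k} scores = {scores}")

compare_search_types('KB123456', 'rotate IAM access keys')
```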
Metadata Filters:
- Always use when applicable
- Dramatically improves precision
- Reduces retrieval latency
- Examples: document_type, publish_date, category, author
3. Cost Optimization
S3 Vectors:
- Use for large-scale knowledge bases (millions of chunks)
- Up to 90% cost savings vs. OpenSearch
- Ideal for cost-sensitive applications
- Trade-off: Slightly higher latency
Semantic Chunking:
- Incurs foundation model costs during ingestion
- Consider cost vs. accuracy benefit
- May not be worth it for simple documents
- Best for complex, high-value content
Ingestion Frequency:
- Schedule ingestion during off-peak hours
- Use incremental updates when possible
- Don't re-ingest unchanged documents (a skip-if-unchanged sketch follows this list)
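One way to implement skip-if-unchanged: compare the newest object under the S3 prefix against the last completed ingestion job and only re-ingest when something changed. Bucket and prefix are illustrative, and the single list_objects_v2 call is a simplification (it returns at most 1,000 keys per call; paginate for larger prefixes):

```python
import boto3

s3 = boto3.client('s3')
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

def ingest_if_changed(kb_id: str, ds_id: str, bucket: str, prefix: str) -> None:
    """Start an ingestion job only if S3 content changed since the last COMPLETE job."""
    jobs = bedrock_agent.list_ingestion_jobs(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, maxResults=10
    )['ingestionJobSummaries']
    completed = [j for j in jobs if j['status'] == 'COMPLETE']
    last_sync = max((j['updatedAt'] for j in completed), default=None)

    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', [])
    changed = last_sync is None or any(o['LastModified'] > last_sync for o in objects)

    if changed:
        bedrock_agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    else:
        print('No changes since last sync; skipping ingestion')
```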
Model Selection:
- Use smaller embedding models when accuracy permits
- Titan Embed Text v2 is cost-effective
- Consider Cohere Embed for multilingual
Token Usage:
- Monitor generation token usage
- Set appropriate maxTokens limits
- Use prompt templates to control verbosity
4. Session Management
Always Reuse Sessions:
- Pass sessionId for follow-up turns
- Bedrock handles context automatically
- No manual conversation history needed
Session Lifecycle:
- Sessions expire after inactivity (default: 60 minutes); see the recovery sketch after this subsection
- Create a new session for unrelated conversations
- Use a unique sessionId per user/conversation
Context Limits:
- Monitor conversation length
- Long sessions may hit context limits
- Consider summarization for very long conversations
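The recovery sketch mentioned above: one defensive pattern is to retry once without the sessionId when a call with a stale session fails, so the conversation restarts cleanly. This assumes an expired or unknown session surfaces as a botocore ClientError; the exact exception and error code are not guaranteed here, so verify against observed behavior:

```python
from botocore.exceptions import ClientError

def query_with_session(kb_id: str, model_arn: str, text: str, session_id=None):
    """Call retrieve_and_generate, retrying once with a fresh session on failure."""
    request = {
        'input': {'text': text},
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn,
            },
        },
    }
    if session_id:
        request['sessionId'] = session_id
    try:
        response = bedrock_agent_runtime.retrieve_and_generate(**request)
    except ClientError:
        # Session likely expired: drop it and retry once with a new session
        request.pop('sessionId', None)
        response = bedrock_agent_runtime.retrieve_and_generate(**request)
    return response['output']['text'], response['sessionId']
```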
5. GraphRAG with Neptune
When to Use:
- Interconnected knowledge domains
- Relationship-aware queries
- Need for explainability
- Complex knowledge graphs
Benefits:
- Automatic graph creation
- Improved accuracy through relationships
- Comprehensive answers
- Explainable results
Considerations:
- Higher setup complexity
- Neptune Analytics costs
- Best for domains with rich relationships
6. Data Source Management
S3 Best Practices:
- Organize with clear prefixes
- Use inclusion/exclusion filters
- Maintain consistent metadata
- Version documents when updating
Web Crawler:
- Set appropriate rate limits
- Use robots.txt for guidance
- Monitor for broken links
- Schedule regular re-crawls
Confluence/SharePoint:
- Filter by spaces/sites
- Exclude archived content
- Use fine-grained permissions
- Schedule incremental syncs
Metadata Enrichment:
- Add custom metadata to documents (see the sidecar-file sketch after this list)
- Include: document_type, publish_date, category, author, version
- Enables powerful filtering
- Improves retrieval precision
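For S3 data sources, metadata enrichment is typically done with a sidecar file named `<document-name>.metadata.json` uploaded next to each document; verify the exact sidecar format against the current AWS documentation. A minimal sketch with illustrative bucket, key, and attribute names:

```python
import json
import boto3

s3 = boto3.client('s3')

# Attributes become filterable metadata on every chunk from this document
metadata = {
    'metadataAttributes': {
        'document_type': 'security_guide',
        'publish_year': 2025,
        'category': 'production',
        'author': 'platform-team',
    }
}

# Sidecar file sits alongside docs/security-guide.pdf in the same bucket
s3.put_object(
    Bucket='my-docs-bucket',
    Key='docs/security-guide.pdf.metadata.json',
    Body=json.dumps(metadata).encode('utf-8'),
)
```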
7. Monitoring and Debugging
Enable CloudWatch Logs:
```python
# Monitor retrieval quality in CloudWatch Logs.
# Track: query latency, retrieval scores, generation quality.
# Set alarms for: high latency, low scores, high error rates.
```
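Beyond logs, you can publish your own metrics. A minimal sketch, assuming the bedrock_agent_runtime client from the earlier examples, that records per-query latency and the top retrieval score via CloudWatch put_metric_data (the namespace and metric names are illustrative):

```python
import time
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Time a retrieve call and capture its best score
start = time.time()
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={'text': 'test query'},
)
latency_ms = (time.time() - start) * 1000
top_score = max((r['score'] for r in response['retrievalResults']), default=0.0)

# Publish both values as custom metrics for dashboards and alarms
cloudwatch.put_metric_data(
    Namespace='RAG/KnowledgeBase',
    MetricData=[
        {'MetricName': 'RetrieveLatency', 'Value': latency_ms, 'Unit': 'Milliseconds'},
        {'MetricName': 'TopRetrievalScore', 'Value': top_score, 'Unit': 'None'},
    ],
)
```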
Test Retrieval Quality:
```python
# Use the retrieve API to debug
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={'text': 'test query'}
)

# Analyze retrieval scores
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content preview: {result['content']['text'][:200]}")
```
Common Issues:
1. Low Retrieval Scores
   - Check the chunking strategy
   - Verify the embedding model
   - Ensure documents are properly ingested
   - Consider semantic or hierarchical chunking
2. Irrelevant Results
   - Add metadata filters
   - Use hybrid search
   - Refine the chunking strategy
   - Increase numberOfResults
3. Missing Information
   - Verify the data source configuration
   - Check ingestion job status
   - Ensure documents are not excluded by filters
   - Increase numberOfResults
4. Slow Retrieval
   - Use metadata filters to narrow scope
   - Optimize the vector database configuration
   - Remember that S3 Vectors trades latency for cost
   - Reduce numberOfResults
8. Security Best Practices
IAM Permissions:
- Use least privilege for Knowledge Base role
- Separate roles for data sources, ingestion, retrieval
- Enable VPC endpoints for private connectivity
Data Encryption:
- All data encrypted at rest (AWS KMS)
- Data encrypted in transit (TLS)
- Use customer-managed KMS keys for compliance
Access Control:
- Use IAM policies to control who can query (see the policy sketch after this list)
- Implement fine-grained access control
- Monitor access with CloudTrail
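A least-privilege policy sketch for query-only access, as referenced above; the account ID and knowledge base ID are placeholders, and retrieve_and_generate callers additionally need bedrock:InvokeModel on the generation model:

```python
import json

# Grants Retrieve and RetrieveAndGenerate on a single knowledge base only;
# attach this to the principals that are allowed to query
query_only_policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Action': ['bedrock:Retrieve', 'bedrock:RetrieveAndGenerate'],
            'Resource': 'arn:aws:bedrock:us-east-1:123456789012:knowledge-base/KB123456',
        }
    ],
}
print(json.dumps(query_only_policy, indent=2))
```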
PII Handling:
- Use Bedrock Guardrails for PII redaction
- Implement data masking for sensitive fields
- Consider custom Lambda for advanced PII handling
Complete Production Example
End-to-End RAG Application
```python
import boto3
import json
import time
from typing import List, Dict, Optional


class BedrockKnowledgeBaseRAG:
    """Production RAG application with Amazon Bedrock Knowledge Bases"""

    def __init__(self, region_name: str = 'us-east-1'):
        self.bedrock_agent = boto3.client('bedrock-agent', region_name=region_name)
        self.bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region_name)

    def create_knowledge_base(
        self,
        name: str,
        description: str,
        role_arn: str,
        vector_store_config: Dict,
        embedding_model: str = 'amazon.titan-embed-text-v2:0'
    ) -> str:
        """Create knowledge base with vector store"""
        response = self.bedrock_agent.create_knowledge_base(
            name=name,
            description=description,
            roleArn=role_arn,
            knowledgeBaseConfiguration={
                'type': 'VECTOR',
                'vectorKnowledgeBaseConfiguration': {
                    'embeddingModelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{embedding_model}'
                }
            },
            storageConfiguration=vector_store_config
        )
        return response['knowledgeBase']['knowledgeBaseId']

    def add_s3_data_source(
        self,
        knowledge_base_id: str,
        name: str,
        bucket_arn: str,
        inclusion_prefixes: List[str],
        chunking_strategy: str = 'FIXED_SIZE',
        chunking_config: Optional[Dict] = None
    ) -> str:
        """Add S3 data source with chunking configuration"""
        if chunking_config is None:
            chunking_config = {
                'maxTokens': 512,
                'overlapPercentage': 20
            }

        vector_ingestion_config = {
            'chunkingConfiguration': {
                'chunkingStrategy': chunking_strategy
            }
        }
        if chunking_strategy == 'FIXED_SIZE':
            vector_ingestion_config['chunkingConfiguration']['fixedSizeChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'SEMANTIC':
            vector_ingestion_config['chunkingConfiguration']['semanticChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'HIERARCHICAL':
            vector_ingestion_config['chunkingConfiguration']['hierarchicalChunkingConfiguration'] = chunking_config

        response = self.bedrock_agent.create_data_source(
            knowledgeBaseId=knowledge_base_id,
            name=name,
            description=f'S3 data source: {name}',
            dataSourceConfiguration={
                'type': 'S3',
                's3Configuration': {
                    'bucketArn': bucket_arn,
                    'inclusionPrefixes': inclusion_prefixes
                }
            },
            vectorIngestionConfiguration=vector_ingestion_config
        )
        return response['dataSource']['dataSourceId']

    def ingest_data(self, knowledge_base_id: str, data_source_id: str) -> str:
        """Start ingestion job and wait for completion"""
        # Start ingestion
        response = self.bedrock_agent.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            description='Automated ingestion'
        )
        job_id = response['ingestionJob']['ingestionJobId']

        # Wait for completion
        while True:
            status_response = self.bedrock_agent.get_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id,
                ingestionJobId=job_id
            )
            status = status_response['ingestionJob']['status']
            if status == 'COMPLETE':
                print("Ingestion completed successfully")
                if 'statistics' in status_response['ingestionJob']:
                    stats = status_response['ingestionJob']['statistics']
                    print(f"Documents indexed: {stats.get('numberOfDocumentsIndexed', 0)}")
                break
            elif status == 'FAILED':
                print("Ingestion failed")
                break
            print(f"Ingestion status: {status}")
            time.sleep(30)
        return job_id

    def query(
        self,
        knowledge_base_id: str,
        query: str,
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
        num_results: int = 5,
        search_type: str = 'HYBRID',
        metadata_filter: Optional[Dict] = None,
        session_id: Optional[str] = None
    ) -> Dict:
        """Query knowledge base with retrieve and generate"""
        retrieval_config = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': num_results,
                        'overrideSearchType': search_type
                    }
                },
                'generationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'maxTokens': 2048
                        }
                    }
                }
            }
        }

        # Add metadata filter if provided
        if metadata_filter:
            retrieval_config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = metadata_filter

        # Build request
        request = {
            'input': {'text': query},
            'retrieveAndGenerateConfiguration': retrieval_config
        }

        # Add session if provided
        if session_id:
            request['sessionId'] = session_id

        response = self.bedrock_agent_runtime.retrieve_and_generate(**request)
        return {
            'answer': response['output']['text'],
            'citations': response.get('citations', []),
            'session_id': response['sessionId']
        }

    def multi_turn_conversation(
        self,
        knowledge_base_id: str,
        queries: List[str],
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
    ) -> List[Dict]:
        """Execute multi-turn conversation with context"""
        session_id = None
        conversation = []
        for query in queries:
            result = self.query(
                knowledge_base_id=knowledge_base_id,
                query=query,
                model_arn=model_arn,
                session_id=session_id
            )
            session_id = result['session_id']
            conversation.append({
                'query': query,
                'answer': result['answer'],
                'citations': result['citations']
            })
        return conversation


# Example Usage
if __name__ == '__main__':
    rag = BedrockKnowledgeBaseRAG(region_name='us-east-1')

    # Create knowledge base
    kb_id = rag.create_knowledge_base(
        name='production-docs-kb',
        description='Production documentation knowledge base',
        role_arn='arn:aws:iam::123456789012:role/BedrockKBRole',
        vector_store_config={
            'type': 'OPENSEARCH_SERVERLESS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
                'vectorIndexName': 'bedrock-kb-index',
                'fieldMapping': {
                    'vectorField': 'bedrock-knowledge-base-default-vector',
                    'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                    'metadataField': 'AMAZON_BEDROCK_METADATA'
                }
            }
        }
    )

    # Add data source
    ds_id = rag.add_s3_data_source(
        knowledge_base_id=kb_id,
        name='technical-docs',
        bucket_arn='arn:aws:s3:::my-docs-bucket',
        inclusion_prefixes=['docs/'],
        chunking_strategy='HIERARCHICAL',
        chunking_config={
            'levelConfigurations': [
                {'maxTokens': 1500},
                {'maxTokens': 300}
            ],
            'overlapTokens': 60
        }
    )

    # Ingest data
    rag.ingest_data(kb_id, ds_id)

    # Single query
    result = rag.query(
        knowledge_base_id=kb_id,
        query='What are the best practices for RAG applications?',
        metadata_filter={
            'equals': {
                'key': 'document_type',
                'value': 'best_practices'
            }
        }
    )
    print(f"Answer: {result['answer']}")
    print("\nSources:")
    for citation in result['citations']:
        for ref in citation['retrievedReferences']:
            print(f"  - {ref['location']}")

    # Multi-turn conversation
    conversation = rag.multi_turn_conversation(
        knowledge_base_id=kb_id,
        queries=[
            'What is hierarchical chunking?',
            'When should I use it?',
            'What are the configuration parameters?'
        ]
    )
    for turn in conversation:
        print(f"\nQ: {turn['query']}")
        print(f"A: {turn['answer']}")
```
Related Skills
Amazon Bedrock Core Skills
- bedrock-guardrails: Content safety, PII redaction, hallucination detection
- bedrock-agents: Agentic workflows with tool use and knowledge bases
- bedrock-flows: Visual workflow builder for generative AI
- bedrock-model-customization: Fine-tuning, reinforcement fine-tuning, distillation
- bedrock-prompt-management: Prompt versioning and deployment
AWS Infrastructure Skills
- opensearch-serverless: Vector database configuration and management
- neptune-analytics: GraphRAG configuration and queries
- s3-management: S3 bucket configuration for data sources and vectors
- iam-bedrock: IAM roles and policies for Knowledge Bases
Observability Skills
- cloudwatch-bedrock-monitoring: Monitor Knowledge Bases metrics and logs
- bedrock-cost-optimization: Track and optimize Knowledge Bases costs
Additional Resources
Official Documentation
- Amazon Bedrock Knowledge Bases
- Knowledge Bases User Guide
- Chunking Strategies
- Boto3 Knowledge Bases API
Best Practices
Research Document
- Section 2 (Complete Knowledge Bases research): /mnt/c/data/github/skrillz/AMAZON-BEDROCK-COMPREHENSIVE-RESEARCH-2025.md