Improve productivity when processing scanned PDFs using Amazon Q Business | Amazon Web Services – AWS Blog
Amazon Q Businessis a generative AI-powered assistant that can answer questions, provide summaries, generate content, and extract insights directly from the content in digital as well as scanned PDF documents in your enterprise data sources without needing to extract the text first.
Customers across industries such as finance, insurance, healthcare life sciences, and more need to derive insights from various document types, such as receipts, healthcare plans, or tax statements, which are frequently in scanned PDF format. These document types often have a semi-structured or unstructured format, which requires processing to extract text before indexing with Amazon Q Business.
The launch of scanned PDF document support with Amazon Q Business can help you seamlessly process a variety of multi-modal document types through the AWS Management Console and APIs, across all supported Amazon Q Business AWS Regions. You can ingest documents, including scanned PDFs, from your data sources using supported connectors, index them, and then use the documents to answer questions, provide summaries, and generate content securely and accurately from your enterprise systems. This feature eliminates the development effort required to extract text from scanned PDF documents outside of Amazon Q Business, and improves the document processing pipeline for building your generative artificial intelligence (AI) assistant with Amazon Q Business.
In this post, we show how to asynchronously index and run real-time queries with scanned PDF documents using Amazon Q Business.
You can use Amazon Q Business for scanned PDF documents from the console, AWS SDKs, or AWS Command Line Interface (AWS CLI).
Amazon Q Business provides a versatile suite of data connectors that can integrate with a wide range of enterprise data sources, empowering you to develop generative AI solutions with minimal setup and configuration. To learn more, visit Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.
After your Amazon Q Business application is ready to use, you can directly upload the scanned PDFs into an Amazon Q Business index using either the console or the APIs. Amazon Q Business offers multiple data source connectors that can integrate and synchronize data from multiple data repositories into single index. For this post, we demonstrate two scenarios to use documents: one with the direct document upload option, and another using the Amazon Simple Storage Service (Amazon S3) connector. If you need to ingest documents from other data sources, refer to Supported connectors for details on connecting additional data sources.
In this post, we use three scanned PDF documents as examples: an invoice, a health plan summary, and an employment verification form, along with some text documents.
The first step is to index these documents. Complete the following steps to index documents using the direct upload feature of Amazon Q Business. For this example, we upload the scanned PDFs.
You can monitor the uploaded files on the Data sources tab. The Upload status changes from Received to Processing to Indexed or Updated, as which point the file has been successfully indexed into the Amazon Q Business data store. The following screenshot shows the successfully indexed PDFs.
The following steps demonstrate how to integrate and synchronize documents using an Amazon S3 connector with Amazon Q Business. For this example, we index the text documents.
When the sync job is complete, your data source is ready to use. The following screenshot shows all five documents (scanned and digital PDFs, and text files) are successfully indexed.
The following screenshot shows a comprehensive view of the two data sources: the directly uploaded documents and the documents ingested through the Amazon S3 connector.
Now lets run some queries with Amazon Q Business on our data sources.
Your documents might be dense, unstructured, scanned PDF document types. Amazon Q Business can identify and extract the most salient information-dense text from it. In this example, we use the multi-page health plan summary PDF we indexed earlier. The following screenshot shows an example page.
This is an example of a health plan summary document.
In the Amazon Q Business web UI, we ask What is the annual total out-of-pocket maximum, mentioned in the health plan summary?
Amazon Q Business searches the indexed document, retrieves the relevant information, and generates an answer while citing the source for its information. The following screenshot shows the sample output.
Documents might also contain structured data elements in tabular format. Amazon Q Business can automatically identify, extract, and linearize structured data from scanned PDFs to accurately resolve any user queries. In the following example, we use the invoice PDF we indexed earlier. The following screenshot shows an example.
This is an example of an invoice.
In the Amazon Q Business web UI, we ask How much were the headphones charged in the invoice?
Amazon Q Business searches the indexed document and retrieves the answer with reference to the source document. The following screenshot shows that Amazon Q Business is able to extract bill information from the invoice.
Your documents might also contain semi-structured data elements in a form, such as key-value pairs. Amazon Q Business can accurately satisfy queries related to these data elements by extracting specific fields or attributes that are meaningful for the queries. In this example, we use the employment verification PDF. The following screenshot shows an example.
This is an example of an employment verification form.
In the Amazon Q Business web UI, we ask What is the applicants date of employment in the employment verification form? Amazon Q Business searches the indexed employment verification document and retrieves the answer with reference to the source document.
In this section, we show you how to use the AWS CLI to ingest structured and unstructured documents stored in an S3 bucket into an Amazon Q Business index. You can quickly retrieve detailed information about your documents, including their statuses and any errors occurred during indexing. If youre an existing Amazon Q Business user and have indexed documents in various formats, such as scanned PDFs and other supported types, and you now want to reindex the scanned documents, complete the following steps:
"errorMessage": "Document cannot be indexed since it contains no text to index and search on. Document must contain some text."
If youre a new user and havent indexed any documents, you can skip this step.
The following is an example of using the ListDocuments API to filter documents with a specific status and their error messages:
The following screenshot shows the AWS CLI output with a list of failed documents with error messages.
Now you batch-process the documents. Amazon Q Business supports adding one or more documents to an Amazon Q Business index.
The following screenshot shows the AWS CLI output. You should see failed documents as an empty list.
The following screenshot shows that the documents are indexed in the data source.
If you created a new Amazon Q Business application and dont plan to use it further, unsubscribe and remove assigned users from the application and delete it so that your AWS account doesnt accumulate costs. Moreover, if you dont need to use the indexed data sources further, refer to Managing Amazon Q Business data sources for instructions to delete your indexed data sources.
This post demonstrated the support for scanned PDF document types with Amazon Q Business. We highlighted the steps to sync, index, and query supported document typesnow including scanned PDF documentsusing generative AI with Amazon Q Business. We also showed examples of queries on structured, unstructured, or semi-structured multi-modal scanned documents using the Amazon Q Business web UI and AWS CLI.
To learn more about this feature, refer toSupported document formats in Amazon Q Business. Give it a try on theAmazon Q Business consoletoday! For more information, visitAmazon Q Businessand theAmazon Q Business User Guide. You can send feedback toAWS re:Post for Amazon Qor through your usual AWS support contacts.
Sonali Sahu is leading the Generative AI Specialist Solutions Architecture team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.
Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS. She is passionate about applied mathematics and machine learning. She focuses on designing intelligent document processing and generative AI solutions for AWS customers. Outside of work, she enjoys salsa and bachata dancing.
Himesh Kumar is a seasoned Senior Software Engineer, currently working at Amazon Q Business in AWS. He is passionate about building distributed systems in the generative AI/ML space. His expertise extends to develop scalable and efficient systems, ensuring high availability, performance, and reliability. Beyond the technical skills, he is dedicated to continuous learning and staying at the forefront of technological advancements in AI and machine learning.
Qing Wei is a Senior Software Developer for Amazon Q Business team in AWS, and passionate about building modern applications using AWS technologies. He loves community-driven learning and sharing of technology especially for machine learning hosting and inference related topics. His main focus right now is on building serverless and event-driven architectures for RAG data ingestion.
- HS-SPME/GCMS and Machine Learning Enable Volatile Fingerprinting and Classification of Commercial Vinegars - Chromatography Online - April 12th, 2026 [April 12th, 2026]
- Role of Artificial Intelligence and Machine Learning in Diagnosing Knee Lesions: Where Are We Now? - Cureus - April 12th, 2026 [April 12th, 2026]
- CMML2AML: machine-learning discovery of co-mutations and specific single mutations predictive of blast transformation in chronic myelomonocytic... - April 12th, 2026 [April 12th, 2026]
- Machine-learning-based reconstruction of Ming-dynasty defensive corridors in Yuxian - Nature - April 12th, 2026 [April 12th, 2026]
- Have you published a disruptive paper? New machine-learning tool helps you check - Physics World - April 12th, 2026 [April 12th, 2026]
- Microsoft is automatically updating Windows 11 24H2 to 25H2 using machine learning - TweakTown - April 5th, 2026 [April 5th, 2026]
- Inside the Magic of Machine Learning That Powers Enemy AI in Arc Raiders - 80 Level - April 3rd, 2026 [April 3rd, 2026]
- We analyzed Philly street scenes and identified signs of gentrification using machine learning trained on longtime residents observations - The... - April 3rd, 2026 [April 3rd, 2026]
- Boston University To Apply Machine Learning To Alzheimers Biomarker And Cognitive Data - Quantum Zeitgeist - April 3rd, 2026 [April 3rd, 2026]
- Sony buys machine-learning company to help "enhance gameplay visuals, improve rendering techniques, and unlock new levels of visual... - April 3rd, 2026 [April 3rd, 2026]
- The Machine Learning Stack Is Being Rebuilt From Scratch Here's What Developers Need to Know in 2026 - HackerNoon - April 3rd, 2026 [April 3rd, 2026]
- Closing the Revenue Gap: Leveraging Machine Learning to Solve the $260 Billion Denial Crisis - vocal.media - April 3rd, 2026 [April 3rd, 2026]
- Machine Learning for Pharmaceuticals Set to Witness Rapid - openPR.com - April 3rd, 2026 [April 3rd, 2026]
- You Must Address These 4 Concerns To Deploy Predictive AI - Machine Learning Week US - March 30th, 2026 [March 30th, 2026]
- Google and the rise of space-based machine learning - Latitude Media - March 30th, 2026 [March 30th, 2026]
- Researchers use machine learning and social network theory to identify formation patterns in digital forums - techxplore.com - March 30th, 2026 [March 30th, 2026]
- Mayo Clinic Study Uses Wearables and Machine Learning to Predict COPD Rehab Participation - HIT Consultant - March 30th, 2026 [March 30th, 2026]
- Machine learning at the edge in retail: constraints and gains - IoT News - March 26th, 2026 [March 26th, 2026]
- AI agents are flashy, but machine learning still pays the bills - TechRadar - March 26th, 2026 [March 26th, 2026]
- Single-cell imaging and machine learning reveal hidden coordination in algae's response to light stress - Phys.org - March 26th, 2026 [March 26th, 2026]
- Machine learning analysis of CT scans - National Institutes of Health (.gov) - March 22nd, 2026 [March 22nd, 2026]
- TransUnion Machine Learning Fraud Tools Tested Against Weak Share Price Momentum - simplywall.st - March 22nd, 2026 [March 22nd, 2026]
- Machine learning could help predict how people with depression respond to treatment - Medical Xpress - March 22nd, 2026 [March 22nd, 2026]
- KR approves machine learning-based fuel reduction methodology - Smart Maritime Network - March 22nd, 2026 [March 22nd, 2026]
- Available solar energy in Andalusia will increase through the end of the century, machine learning model finds - Tech Xplore - March 22nd, 2026 [March 22nd, 2026]
- How Machine Learning Is Reshaping Environmental Policy and Water Governance - Devdiscourse - March 22nd, 2026 [March 22nd, 2026]
- Chemistry student uses machine learning to transform gene therapy production - The University of North Carolina at Chapel Hill - March 13th, 2026 [March 13th, 2026]
- AI and Machine Learning - City of Brownsville to build smart city safety solution - Smart Cities World - March 13th, 2026 [March 13th, 2026]
- AI and Machine Learning - London borough overhauls public safety infrastructure - Smart Cities World - March 13th, 2026 [March 13th, 2026]
- Titan Technology Corp. Responds to Alberta Innovates RFP AI, Machine Learning and Automation Services - TradingView - March 13th, 2026 [March 13th, 2026]
- Vietnam FPT's AI automation solution secures new machine learning patent on overseas market - VnExpress International - March 13th, 2026 [March 13th, 2026]
- AI Healthcare Technology: The Power of Machine Learning Diagnosis in Modern Medicine - Tech Times - March 13th, 2026 [March 13th, 2026]
- Future Perspectives: Key Trends Shaping the Machine Learning Market in Financial Services Until 2030 - openPR.com - March 13th, 2026 [March 13th, 2026]
- How to Build an Autonomous Machine Learning Research Loop in Google Colab Using Andrej Karpathys AutoResearch Framework for Hyperparameter Discovery... - March 13th, 2026 [March 13th, 2026]
- The Arc in Arc Raiders have multiple "brains," and they all love pursuing you because Embark gives them "rewards" in real-time via... - March 13th, 2026 [March 13th, 2026]
- OnPoint AI to Present its Augmented Reality and Machine Learning Surgical Platform at the 2026 Canaccord Genuity Musculoskeletal Conference - Yahoo... - February 27th, 2026 [February 27th, 2026]
- TD Bank continues to develop AI, machine learning tools - Auto Finance News - February 27th, 2026 [February 27th, 2026]
- AI and Machine Learning - Tech companies team to scale private 5G and physical AI - Smart Cities World - February 27th, 2026 [February 27th, 2026]
- AI and Machine Learning in Dating Apps: Smarter Matchmaking Algorithms - Programming Insider - February 27th, 2026 [February 27th, 2026]
- Machine-Learning App Helps Anesthesiologists Navigate Critical Surgical Equipment in Real Time - Carle Illinois College of Medicine - February 24th, 2026 [February 24th, 2026]
- Fractal Launches PiEvolve, an Evolutionary Agentic Engine for Autonomous Machine Learning and Scientific Discovery - Yahoo Finance - February 24th, 2026 [February 24th, 2026]
- How Brain Data and Machine Learning Could Transform the Aging Industry - gritdaily.com - February 24th, 2026 [February 24th, 2026]
- AI and machine learning trends for Arizona leaders to watch in healthcare delivery and traveler services - AZ Big Media - February 24th, 2026 [February 24th, 2026]
- AI and machine learning are the future of Wi-Fi management: WBA report - Telecompetitor - February 22nd, 2026 [February 22nd, 2026]
- Machine learning streamlines the complexities of making better proteins - Science News - February 20th, 2026 [February 20th, 2026]
- WBA Publishes Guidance on Artificial Intelligence and Machine Learning for Intelligent Wi-Fi - ARC Advisory Group - February 20th, 2026 [February 20th, 2026]
- Machine learning-predicted insulin resistance is a risk factor for 12 types of cancer - Nature - February 20th, 2026 [February 20th, 2026]
- Exploring Machine Learning at the DOF - University of the Philippines Diliman - February 20th, 2026 [February 20th, 2026]
- AI and Machine Learning - Where US agencies are finding measurable value from AI - Smart Cities World - February 20th, 2026 [February 20th, 2026]
- Modeling visual perception of Chinese classical private gardens with image parsing and interpretable machine learning - Nature - February 16th, 2026 [February 16th, 2026]
- Analysis of Market Segments and Major Growth Areas in the Machine Learning (ML) Feature Lineage Tools Market - openPR.com - February 16th, 2026 [February 16th, 2026]
- Apple Makes One Of Its Largest Ever Acquisitions, Buys The Israeli Machine Learning Firm, Q.ai - Wccftech - February 1st, 2026 [February 1st, 2026]
- Keysights Machine Learning Toolkit to Speed Device Modeling and PDK Dev - All About Circuits - February 1st, 2026 [February 1st, 2026]
- University of Missouri Study: AI/Machine Learning Improves Cardiac Risk Prediction Accuracy - Quantum Zeitgeist - February 1st, 2026 [February 1st, 2026]
- How AI and Machine Learning Are Transforming Mobile Banking Apps - vocal.media - February 1st, 2026 [February 1st, 2026]
- Machine Learning in Production? What This Really Means - Towards Data Science - January 28th, 2026 [January 28th, 2026]
- Best Machine Learning Stocks of 2026 and How to Invest in Them - The Motley Fool - January 28th, 2026 [January 28th, 2026]
- Machine learning-based prediction of mortality risk from air pollution-induced acute coronary syndrome in the Western Pacific region - Nature - January 28th, 2026 [January 28th, 2026]
- Machine Learning Predicts the Strength of Carbonated Recycled Concrete - AZoBuild - January 28th, 2026 [January 28th, 2026]
- Vertiv Next Predict is a new AI-powered, managed service that combines field expertise and advanced machine learning algorithms to anticipate issues... - January 28th, 2026 [January 28th, 2026]
- Machine Learning in Network Security: The 2026 Firewall Shift - openPR.com - January 28th, 2026 [January 28th, 2026]
- Why IBMs New Machine-Learning Model Is a Big Deal for Next-Generation Chips - TipRanks - January 24th, 2026 [January 24th, 2026]
- A no-compromise amplifier solution: Synergy teams up with Wampler and Friedman to launch its machine-learning power amp and promises to change the... - January 24th, 2026 [January 24th, 2026]
- Our amplifier learns your cabinets impedance through controlled sweeps and continues to monitor it in real-time: Synergys Power Amp Machine-Learning... - January 24th, 2026 [January 24th, 2026]
- Machine Learning Studied to Predict Response to Advanced Overactive Bladder Therapies - Sandip Vasavada - UroToday - January 24th, 2026 [January 24th, 2026]
- Blending Education, Machine Learning to Detect IV Fluid Contaminated CBCs, With Carly Maucione, MD - HCPLive - January 24th, 2026 [January 24th, 2026]
- Why its critical to move beyond overly aggregated machine-learning metrics - MIT News - January 24th, 2026 [January 24th, 2026]
- Machine Learning Lends a Helping Hand to Prosthetics - AIP Publishing LLC - January 24th, 2026 [January 24th, 2026]
- Hassan Taher Explains the Fundamentals of Machine Learning and Its Relationship to AI - mitechnews.com - January 24th, 2026 [January 24th, 2026]
- Keysight targets faster PDK development with machine learning toolkit - eeNews Europe - January 24th, 2026 [January 24th, 2026]
- Training and external validation of machine learning supervised prognostic models of upper tract urothelial cancer (UTUC) after nephroureterectomy -... - January 24th, 2026 [January 24th, 2026]
- Age matters: a narrative review and machine learning analysis on shared and separate multidimensional risk domains for early and late onset suicidal... - January 24th, 2026 [January 24th, 2026]
- Uncovering Hidden IV Fluid Contamination Through Machine Learning, With Carly Maucione, MD - HCPLive - January 24th, 2026 [January 24th, 2026]
- Machine learning identifies factors that may determine the age of onset of Huntington's disease - Medical Xpress - January 24th, 2026 [January 24th, 2026]
- AI and Machine Learning - WEF expands Fourth Industrial Revolution Network - Smart Cities World - January 24th, 2026 [January 24th, 2026]
- Machine-learning analysis reclassifies armed conflicts into three new archetypes - The Brighter Side of News - January 24th, 2026 [January 24th, 2026]
- Machine learning and AI the future of drought monitoring in Canada - sasktoday.ca - January 24th, 2026 [January 24th, 2026]
- Machine learning revolutionises the development of nanocomposite membranes for CO capture - European Coatings - January 24th, 2026 [January 24th, 2026]
- AI and Machine Learning - Leading data infrastructure is helping power better lives in Sunderland - Smart Cities World - January 24th, 2026 [January 24th, 2026]
- How banks are responsibly embedding machine learning and GenAI into AML surveillance - Compliance Week - January 20th, 2026 [January 20th, 2026]