
Three Lessons Learned from Working on Document AI

Franz Bender / 19 Jun 2024

Many companies are already exploring what AI can do for them. We observe a rise in GenAI use cases that automate the extraction of information from documents, known as document intelligence. When it comes to document intelligence, companies often turn to methods such as Retrieval Augmented Generation (RAG) to pull insights from documents. Discussions of RAG, however, often overshadow other aspects of the process. Recently, we had the chance to push the boundaries of Document AI to automate a labor-intensive manual process. Here is what we learned.

We partnered with a leader in the insurance industry that processes numerous technical insurance documents on a regular basis. These documents are highly diverse, filled with tables, formulas, text, and explanations in domain-specific terminology. Currently, a large team of experts painstakingly “parses” these documents and integrates them into their system. This process requires significant domain knowledge, time, and financial resources. While there is clear potential for automation, the diversity of these documents has made it extremely challenging until now.

Over the last three months, we developed a system capable of extracting critical information from these documents. We set it all up on AWS using Bedrock, Lambda, EventBridge, and a third-party OCR service. Here are the three lessons that stood out.
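To give a rough idea of the wiring, here is a minimal sketch (not our production code) of a Lambda function that is triggered once OCR output is available and calls a Bedrock model via the Converse API. The region, model ID, event field, and prompt are illustrative assumptions.

```python
import json
import boto3

# Bedrock runtime client; region and model ID are placeholders, not our production values.
bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # example model

def handler(event, context):
    """Lambda handler, e.g. triggered via an EventBridge rule once OCR has finished."""
    document_text = event["ocr_text"]  # hypothetical event field with the linearized OCR text

    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{
            "role": "user",
            "content": [{"text": f"Extract the policy number from this document:\n\n{document_text}"}],
        }],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"extraction": answer})}
```

Routing “OCR finished” events through EventBridge keeps the extraction step decoupled from the OCR step, so each can be retried and scaled independently.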

1. Perfecting OCR is Crucial

The documents we deal with are both complex and varied, so achieving accurate OCR (Optical Character Recognition) is essential. Unfortunately, no single service out there is perfect. So we adopted a fusion approach: we combined one of the most respected OCR systems (which gets us 90% of the way there) with additional OCR tools and computer vision methods to capture the remaining 9%. Many OCR systems struggle with elements like checkboxes and large formulas, so we combine the strengths of different approaches. An additional and often overlooked aspect is that we have to linearize the documents into text form to pass them to a language model. Detecting the reading order in complex layouts is another tough challenge we had to overcome. So if you are thinking about getting into Document AI, keep in mind how you will get good source material.
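To make the linearization point concrete, here is a simplified sketch of turning OCR blocks into reading order. It assumes the OCR output is a list of text blocks with bounding-box coordinates and uses a naive top-to-bottom, left-to-right heuristic; multi-column or nested layouts need real layout analysis on top of this.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box

def linearize(blocks: list[Block], line_tolerance: float = 10.0) -> str:
    """Naive reading-order heuristic: group blocks into visual lines by their
    vertical position, then read each line left to right."""
    ordered = sorted(blocks, key=lambda b: (round(b.y / line_tolerance), b.x))
    return " ".join(b.text for b in ordered)

# Two blocks on the same visual line plus one further down the page
print(linearize([Block("World", 80, 12), Block("Hello", 10, 10), Block("Next line", 10, 40)]))
# -> "Hello World Next line"
```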

2. Maximize Accuracy with Iterative Prompting

Despite the recent support for larger context windows, we found better control over model output and consistency by implementing a workflow of chained prompts. By decomposing complex extraction tasks into smaller, manageable steps, we could guide the language model more effectively. Each prompt in the workflow builds on the previous one, informing and improving subsequent steps. Implementing this requires accommodating a more sophisticated prompting style and iterative refinement, but the improved accuracy is worth the effort.
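As a minimal sketch of such a chain (the prompts and the call_llm helper are illustrative, not our actual prompts), each step consumes the previous step's output:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call (in our case, Bedrock)."""
    raise NotImplementedError

def extract_coverage_terms(document_text: str) -> str:
    # Step 1: narrow the document down to the relevant sections.
    sections = call_llm(
        "List the sections of this document that define coverage terms, "
        f"quoting their headings verbatim:\n\n{document_text}"
    )
    # Step 2: pull the raw facts out of those sections only.
    facts = call_llm(
        "From the following sections, extract every coverage limit and deductible "
        f"as bullet points:\n\n{sections}"
    )
    # Step 3: normalize the facts into the target schema.
    return call_llm(
        "Convert these bullet points into a JSON object with the keys "
        f"'coverage_limits' and 'deductibles':\n\n{facts}"
    )
```

Because each step has a narrow job, failures are easier to localize and individual prompts are easier to evaluate and refine.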

3. Prioritize Accurate Citations to Combat Hallucinations

In business contexts, having explainable AI is non-negotiable. This presents a significant challenge. Our solution involves asking the language model to first quote relevant sentences before answering any questions. Coupled with intelligent output parsing and a fuzzy search against the provided document, we achieve high citation accuracy. This approach not only combats hallucinations but also pinpoints where the AI sources its information. Moreover, it allows us to measure accuracy by verifying whether the AI extracts information from the correct places.
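A minimal sketch of that verification step, using Python's standard difflib for the fuzzy match (our actual pipeline is more involved):

```python
import difflib

def verify_citation(quote: str, document_sentences: list[str], threshold: float = 0.85):
    """Fuzzy-match a model-provided quote against the source document.
    Returns the best-matching sentence if it is similar enough, else None."""
    best_sentence, best_score = None, 0.0
    for sentence in document_sentences:
        score = difflib.SequenceMatcher(None, quote.lower(), sentence.lower()).ratio()
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence if best_score >= threshold else None

# The model's quote has minor OCR noise but still matches the source sentence.
doc = ["The annual deductible is EUR 500.", "Coverage excludes flood damage."]
print(verify_citation("The annual deductible is EUR 500", doc))
# -> "The annual deductible is EUR 500."
```

If no sentence clears the threshold, the extraction can be flagged for human review instead of being trusted.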

About the Author

Franz is an Associate Manager at Netlight, working at the intersection of AI, data, and cloud, with a focus on real business impact and a soft spot for sales and strategy.
