Quarkus Docling
This is a Quarkus extension for the Docling project. Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
Currently, this extension is a set of wrappers around the Docling Serve project, which exposes Docling as a REST API. It also provides a Dev Service and a Dev UI integration.
The eventual goal is to unify the DoclingDocument format with LangChain4j’s Document abstraction so that Docling can be used in a LangChain4j RAG pipeline for ingesting data.
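To make the "wrapper around a REST API" idea concrete, here is a minimal sketch of the kind of HTTP request such a wrapper might send to Docling Serve. It uses only the JDK's `java.net.http` classes and does not use the extension's own API. The base URL (`http://localhost:5001`), the endpoint path (`/v1/convert/source`), and the JSON body shape are assumptions for illustration and should be checked against the Docling Serve version you run. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class DoclingServeRequestSketch {

    // Assumption: where a local Docling Serve instance listens; adjust to your setup.
    static final String BASE_URL = "http://localhost:5001";

    // Assumption: the conversion endpoint path; verify it for your Docling Serve version.
    static final String CONVERT_PATH = "/v1/convert/source";

    // Builds a JSON body asking the server to fetch and convert a document by URL.
    // The field names ("sources", "kind", "url") are assumed, not taken from this README.
    static String buildBody(String documentUrl) {
        // Escape backslashes and quotes so the URL is a valid JSON string value.
        String escaped = documentUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"sources\":[{\"kind\":\"http\",\"url\":\"" + escaped + "\"}]}";
    }

    // Builds (but does not send) the POST request for the conversion endpoint.
    static HttpRequest buildRequest(String documentUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + CONVERT_PATH))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(documentUrl)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("https://example.com/sample.pdf");
        // Print the request line and body instead of sending, so no server is needed.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody("https://example.com/sample.pdf"));
    }
}
```

Sending the request would be a matter of passing it to `HttpClient.newHttpClient().send(...)`. In a Quarkus application, the extension's wrappers and Dev Service are meant to take care of this plumbing for you.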
Docling Features
- 🗂️ Parsing of multiple document formats incl. PDF, DOCX, XLSX, HTML, images, and more
- 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
- 🧬 Unified, expressive DoclingDocument representation format
- ↪️ Various export formats and options, including Markdown, HTML, and lossless JSON
- 🔒 Local execution capabilities for sensitive data and air-gapped environments
- 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
- 🔍 Extensive OCR support for scanned PDFs and images
- 🥚 Support for several Visual Language Models (SmolDocling)
- 💻 Simple and convenient CLI
