March 2025

Going Beyond OCR+LLM: Building Apps with Agentic Document Extraction API (Part 2)

Mar 31, 2025 | Tutorial

Modern Large Language Models (LLMs) have revolutionized text analysis, until they encounter the complexities of PDFs. PDFs often feature intricate layouts, visual elements, flowcharts, images, and tables with interdependent contexts and relationships. This...

VisionAgent: An Agentic Approach for Complex Visual Reasoning

Mar 26, 2025 | Product

Vision Language Models (VLMs) such as GPT-4o and Claude-3.5 perform well on textual tasks and continue to improve, but they still struggle with visual tasks. For example, let’s ask these VLMs to count the number of missing soda cans in this image: The Soda Can Puzzle...

Going Beyond OCR+LLM: Introducing Agentic Document Extraction

Mar 21, 2025 | Tutorial

If you’ve ever tried to extract meaningful data from PDFs, especially documents with complex layouts like tables, charts, or forms, you’ve likely run into OCR’s limitations. OCR is great for raw text, but it ignores structural relationships critical for...

Performance Benchmark: LandingLens Vision Model Improves Retinopathy Classification

Mar 7, 2025 | Product

This article shares the results of a benchmarking study conducted with an open access fundus image dataset. Using the same starting images, same partitions, and same ground truth labels, LandingLens produced a multi-class classification model with an F1 score of 92.1%...
