Introducing VisionAgent: Your Visual AI Pilot


LandingAI

The LandingAI team has developed VisionAgent, a generative Visual AI application builder that accelerates the development and deployment of vision-enabled applications.

VisionAgent acts as your Visual AI pilot when it comes to building vision-enabled applications. Going beyond just code writing assistance, VisionAgent creates multiple plans when prompted with a vision task, selects the best-performing one and provides all the necessary code, tools and models for a deployment-ready solution. Developers can iterate on vision tasks in minutes rather than weeks, getting to production faster.

The VisionAgent Ecosystem

The VisionAgent ecosystem consists of three distinct components: the VisionAgent web app, the VisionAgent library, and the VisionAgent Tools library.

Understanding what each component does and how the components interact helps you decide where to start. This post describes each component of the VisionAgent ecosystem in turn.

The VisionAgent Web App

For those looking for a quick and easy way to get started with VisionAgent, the hosted VisionAgent web app is the perfect solution. This web app allows you to prototype, iterate and deploy computer vision code without the need for extensive setup or configuration.

The web app provides an intuitive interface that guides you through the process of uploading data, generating code, testing code, and visualizing results. You can edit the generated code if you want finer control. You can deploy the code as a cloud endpoint or ask the agent to generate and deploy a Streamlit app (hosted in our cloud) to test or share with others.

The web app is an excellent option for users who want to quickly test ideas and see results without diving into the complexities of local development environments. It serves as a UI around the VisionAgent library, which we explore in detail in the following section.

The VisionAgent Library

The VisionAgent library is designed to help you leverage agent frameworks to create a code solution for your computer vision tasks. The framework can use a set of tools to solve a vision task. These tools might range from a simple Python math function to a sophisticated computer vision model. The framework comes with a selection of built-in tools for common computer vision tasks, and also supports the creation of custom tools built by users.

The VisionAgent library contains the core functionalities of the VisionAgent framework, including agent planning, tool selection, plan execution, code generation, evaluation, etc. It provides Python programming interfaces for configuring, running and interacting with the agent locally. Additionally, there is a Streamlit app available for those who just need a simple chat interface. This library also provides a set of built-in tools for VisionAgent to use, which can also be used independently with Python.
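To make the planning and evaluation steps more concrete, here is a minimal sketch of the "generate several plans, keep the best-performing one" idea. It is an illustration only, not VisionAgent's actual implementation: the `Plan` class, the `select_best_plan` function, and the toy pixel-counting task are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Plan:
    """A candidate plan: a name plus the callable that carries it out (hypothetical)."""
    name: str
    run: Callable[[Sequence[int]], int]


def select_best_plan(plans: List[Plan], sample: Sequence[int], expected: int) -> Plan:
    """Run every plan on a sample input and keep the one with the smallest error."""
    def error(plan: Plan) -> int:
        return abs(plan.run(sample) - expected)

    return min(plans, key=error)


# Toy "vision task": count bright pixels (> 128) in a flattened grayscale image.
pixels = [10, 200, 250, 90, 130, 30]

plans = [
    Plan("count_all_pixels", lambda px: len(px)),
    Plan("count_above_threshold", lambda px: sum(1 for p in px if p > 128)),
    Plan("count_nonzero", lambda px: sum(1 for p in px if p > 0)),
]

best = select_best_plan(plans, pixels, expected=3)
print(best.name)  # count_above_threshold
```

In a real agent, the candidate plans would be generated code built from tools, and the score would come from evaluating each plan on the user's own data rather than from a known expected value.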

Many built-in tools are backed by a computer vision model and thus require a GPU for inference. To make it easy to run VisionAgent locally, we host all necessary models in the LandingAI cloud and provide an HTTP endpoint for each tool. We call these endpoints tool services. The VisionAgent tool services are stateless, accessible to all VisionAgent users, and configured with auto-scaling so they can serve many users simultaneously.

As a result, you will notice that many tool implementations are simply HTTP clients that send requests to the tool service, which performs the actual inference on a GPU.
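The sketch below shows what such a thin HTTP client might look like, using only the Python standard library. Everything specific in it is an assumption made for illustration: the endpoint URL is a placeholder, the JSON payload shape (`image`, `prompt`) is invented, and authentication is omitted. The real tool services may differ on all of these points.

```python
import base64
import json
import urllib.request

# Placeholder endpoint; the real tool-service URLs are not given in this post.
TOOL_SERVICE_URL = "https://tool-service.invalid/object-detection"


def build_tool_request(
    image_bytes: bytes, prompt: str, url: str = TOOL_SERVICE_URL
) -> urllib.request.Request:
    """Package an image and a prompt as a JSON POST request (payload shape is assumed)."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_objects(image_bytes: bytes, prompt: str) -> dict:
    """Thin client: no model runs locally; the remote service does the GPU inference."""
    request = build_tool_request(image_bytes, prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access, so it can be inspected locally.
req = build_tool_request(b"\x89PNG fake image bytes", "car")
print(req.get_method(), req.full_url)
```

Because the service is stateless, each request carries everything needed for inference (here, the encoded image and the prompt), which is what lets the backend scale out across many users.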

The VisionAgent Tools Library

The VisionAgent Tools library complements the VisionAgent repository by offering a suite of tool implementations designed to work with the VisionAgent framework. Most of these are the built-in tools that the VisionAgent framework uses by default.

In the context of this library, a tool is a Python abstraction that wraps one or more models to accomplish a specific task, such as object detection, image classification, QR code reading, or counting items. Each tool accepts either an image or a video as input and is designed to work with different models via a dynamic model registry, which allows users to switch between models. The VisionAgent Tools repo doesn’t include any web service code or deployment code.
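The following sketch illustrates the general pattern of a tool that looks up its model in a registry at call time, so the model can be swapped without changing the tool. It is not the VisionAgent Tools API: `MODEL_REGISTRY`, `register_model`, `ClassificationTool`, and the two stand-in "models" are hypothetical names made up for this example.

```python
from typing import Callable, Dict, List

# A "model" here is any callable that maps a grayscale image (nested lists) to labels.
Image = List[List[int]]
Model = Callable[[Image], List[str]]

# Hypothetical registry mapping model names to model callables.
MODEL_REGISTRY: Dict[str, Model] = {}


def register_model(name: str, model: Model) -> None:
    """Add a model to the registry under the given name."""
    MODEL_REGISTRY[name] = model


class ClassificationTool:
    """A tool wraps a task; the model behind it is resolved by name on every call."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def __call__(self, image: Image) -> List[str]:
        model = MODEL_REGISTRY[self.model_name]
        return model(image)


def mean_brightness_model(image: Image) -> List[str]:
    """Stand-in model: label by average pixel value."""
    values = [v for row in image for v in row]
    return ["bright" if sum(values) / len(values) >= 128 else "dark"]


def max_brightness_model(image: Image) -> List[str]:
    """Stand-in model: label by the brightest pixel."""
    return ["bright" if max(v for row in image for v in row) >= 200 else "dark"]


register_model("mean_brightness", mean_brightness_model)
register_model("max_brightness", max_brightness_model)

image = [[0, 50], [60, 255]]
tool = ClassificationTool("mean_brightness")
print(tool(image))  # ['dark']

tool.model_name = "max_brightness"  # switch models without touching the tool code
print(tool(image))  # ['bright']
```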

If there’s a tool you want to use that’s not currently in the repo, you can register it as a custom tool in the VisionAgent repository and use it locally. See an example here.
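As a rough picture of what registering a custom tool involves, the sketch below uses a small decorator that records a function together with its docstring so that an agent could discover it later. The `register_tool` decorator and the `CUSTOM_TOOLS` dictionary are hypothetical stand-ins written for this example. Consult the VisionAgent repository for the actual registration interface.

```python
from typing import Callable, Dict, List

# Hypothetical store of user-defined tools, keyed by function name.
CUSTOM_TOOLS: Dict[str, Callable] = {}


def register_tool(func: Callable) -> Callable:
    """Record a function as a custom tool; a docstring is required as its description."""
    if not func.__doc__:
        raise ValueError(f"Tool '{func.__name__}' needs a docstring describing what it does")
    CUSTOM_TOOLS[func.__name__] = func
    return func


@register_tool
def count_bright_pixels(image: List[List[int]], threshold: int = 128) -> int:
    """Count pixels brighter than `threshold` in a grayscale image given as nested lists."""
    return sum(1 for row in image for value in row if value > threshold)


print(sorted(CUSTOM_TOOLS))  # ['count_bright_pixels']
print(CUSTOM_TOOLS["count_bright_pixels"]([[0, 200], [130, 90]]))  # 2
```

Requiring a docstring reflects a common design choice in agent frameworks: the description is what lets a planner decide when a tool is relevant to a task.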

If you think none of the existing tools or models can solve your problem, we encourage you to open an issue in the VisionAgent Tools repo, contribute your solution, or reach out to us on Discord.

The field of AI is advancing rapidly, so we regularly evaluate new models and add the better ones to the repo. The following screenshot shows the list of models we support as of Oct 10, 2024.

Next Steps

Try the VisionAgent web app if you want to:

understand VisionAgent’s capabilities
prototype and test a new idea with VisionAgent
upgrade your existing vision solution with VisionAgent
deploy a web endpoint or Streamlit app for your VisionAgent-generated code

Go to the VisionAgent repo if you want to:

see more examples or tutorials
run or customize VisionAgent locally
learn how the VisionAgent framework works internally
contribute to the repo

Go to the VisionAgent Tools repo if you want to:

explore existing tools and the computer vision models used by VisionAgent
run and test an individual tool or model locally
contribute to the repo