What is GIANT?¶
GIANT (Gigapixel Image Agent for Navigating Tissue) is an agentic system that uses large language models (LLMs) to autonomously navigate whole-slide images (WSIs) for pathology analysis.
The Problem¶
Whole-slide images are massive - often 100,000+ pixels on each side, resulting in gigapixel-scale images. A typical WSI can be 50,000 x 80,000 pixels or larger.
This creates fundamental challenges:
- Too large for direct analysis: LLMs have input size limits (~1-2K pixels typically)
- Information overload: Most of the slide is background or irrelevant tissue
- Multi-scale features: Diagnosis requires both architectural patterns (low magnification) and cellular details (high magnification)
The Solution¶
GIANT treats WSI analysis as a navigation problem. Instead of trying to analyze the entire slide at once, an LLM-powered agent:
- Starts with a thumbnail - A low-resolution overview with coordinate axis guides
- Iteratively zooms in - Selects regions of interest based on the question
- Accumulates evidence - Remembers what it has seen across navigation steps
- Provides an answer - When sufficient evidence is gathered
This mimics how pathologists work: scan at low power, identify regions of interest, zoom in for cellular detail.
Key Innovations¶
Axis Guides¶
The thumbnail is overlaid with coordinate markers showing Level-0 pixel positions. This allows the LLM to specify exact crop coordinates using natural language reasoning:
"I can see a suspicious region around coordinates (45000, 32000). Let me zoom in there..."
Multi-turn Context¶
The agent maintains conversation history, remembering: - Previously examined regions - Observations and reasoning at each step - The original question being answered
Structured Actions¶
The LLM outputs structured JSON actions:
{
"reasoning": "The thumbnail shows a dark region that may be tumor...",
"action": {
"action_type": "crop",
"x": 45000,
"y": 32000,
"width": 10000,
"height": 10000
}
}
Or when ready to answer:
{
"reasoning": "Based on the cellular morphology observed...",
"action": {
"action_type": "answer",
"answer_text": "This is adenocarcinoma, moderately differentiated."
}
}
Supported Tasks¶
GIANT can answer various pathology questions:
| Task Type | Example Question |
|---|---|
| Classification | "What organ is this tissue from?" |
| Diagnosis | "What type of cancer is present?" |
| Grading | "What is the Gleason grade?" |
| VQA | "Are there mitotic figures visible?" |
Architecture Overview¶
┌─────────────────────────────────────────────────────┐
│ GIANTAgent │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │WSIReader │───▶│CropEngine│───▶│OverlayGen│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └──────────────┴───────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ContextManager│ │
│ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ LLMProvider │◀──▶ OpenAI/Anthropic │
│ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Trajectory │───▶ Evaluation │
│ └──────────────┘ │
└─────────────────────────────────────────────────────┘
Research Origin¶
GIANT is based on the paper:
GIANT: Gigapixel Image Agent for Navigating Tissue arXiv:2511.19652
This implementation reproduces and extends the paper's methodology, achieving competitive results on the MultiPathQA benchmark.
Next Steps¶
- Architecture - Detailed system design
- Algorithm - The core navigation algorithm
- Quickstart - Try it yourself