LLM Integration¶
GIANT uses large multimodal models (LMMs) to analyze images and decide navigation actions. This page explains how LLM providers are integrated.
Provider Architecture¶
GIANT abstracts LLM interactions behind a protocol interface:
```python
from typing import Protocol

class LLMProvider(Protocol):
    async def generate_response(self, messages: list[Message]) -> LLMResponse:
        """Generate a response from the LLM."""
        ...

    def get_model_name(self) -> str:
        """Get the model identifier."""
        ...

    def get_target_size(self) -> int:
        """Get the optimal image size for this provider."""
        ...
```
This allows swapping providers without changing agent code.
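As a sketch of the structural typing this relies on (the stub class is hypothetical, and the protocol is trimmed to its two synchronous methods for brevity):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMProvider(Protocol):
    def get_model_name(self) -> str: ...
    def get_target_size(self) -> int: ...

# Hypothetical stub: satisfies the protocol structurally, with no inheritance,
# so agent code can accept any conforming provider object.
class StubProvider:
    def get_model_name(self) -> str:
        return "stub-model"

    def get_target_size(self) -> int:
        return 500

provider: LLMProvider = StubProvider()
print(isinstance(provider, LLMProvider))  # True
```

Note that `runtime_checkable` only verifies method presence at `isinstance` time; full signature conformance is checked statically by the type checker.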
Supported Providers¶
OpenAI¶
Uses the OpenAI Responses API with structured outputs.
Features:
- Native JSON schema enforcement via the `text.format` parameter
- Image handling via base64 data URLs
- Token and cost tracking from response metadata
Target Size: 1000px (higher resolution for detail)
Anthropic¶
Uses the Anthropic Messages API with tool use.
Features:
- Tool use for structured output (submit_step tool)
- Image handling via base64 content blocks
- Token and cost tracking from response metadata
Target Size: 500px (cost-optimized)
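The target size presumably bounds the longest image side before encoding; a sketch of that computation (the longest-side convention is an assumption, not confirmed by this page):

```python
def scaled_dimensions(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale so the longest side equals `target`, preserving aspect ratio.

    Images already within the target are left unscaled (assumed behavior).
    """
    longest = max(width, height)
    if longest <= target:
        return width, height
    scale = target / longest
    return round(width * scale), round(height * scale)

print(scaled_dimensions(2000, 1500, 1000))  # (1000, 750)
print(scaled_dimensions(1200, 400, 500))    # (500, 167)
```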
Message Format¶
Internally, GIANT uses a unified message format:
```python
from typing import Literal

from pydantic import BaseModel

class MessageContent(BaseModel):
    type: Literal["text", "image"]
    text: str | None = None
    image_base64: str | None = None
    media_type: str = "image/jpeg"

class Message(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: list[MessageContent]
```
Converters translate to provider-specific formats:
```python
# OpenAI format
{
    "role": "user",
    "content": [
        {"type": "input_text", "text": "Analyze this image..."},
        {"type": "input_image", "image_url": "data:image/jpeg;base64,..."},
    ],
}

# Anthropic format
{
    "role": "user",
    "content": [
        {"type": "text", "text": "Analyze this image..."},
        {"type": "image", "source": {"type": "base64", "data": "...", "media_type": "image/jpeg"}},
    ],
}
```
Structured Output¶
GIANT requires structured JSON responses:
```python
from typing import Literal

from pydantic import BaseModel

class BoundingBoxAction(BaseModel):
    action_type: Literal["crop"] = "crop"
    x: int
    y: int
    width: int
    height: int

class FinalAnswerAction(BaseModel):
    action_type: Literal["answer"] = "answer"
    answer_text: str

class StepResponse(BaseModel):
    reasoning: str
    action: BoundingBoxAction | FinalAnswerAction
```
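The `action_type` literals serve as the union's discriminator; pydantic can dispatch on it directly, but the same dispatch can be sketched by hand on raw JSON (the helper is hypothetical):

```python
import json

def parse_action(payload: str) -> tuple[str, object]:
    """Dispatch on the action_type discriminator of a raw JSON step response."""
    action = json.loads(payload)["action"]
    kind = action["action_type"]
    if kind == "crop":
        return kind, (action["x"], action["y"], action["width"], action["height"])
    if kind == "answer":
        return kind, action["answer_text"]
    raise ValueError(f"unknown action_type: {kind!r}")

raw = '{"reasoning": "zoom in", "action": {"action_type": "crop", "x": 10, "y": 20, "width": 100, "height": 80}}'
print(parse_action(raw))  # ('crop', (10, 20, 100, 80))
```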
OpenAI: JSON Schema¶
```python
response = client.responses.create(
    model="gpt-5.2",
    input=messages,
    text={
        "format": {
            "type": "json_schema",
            "name": "step_response",
            "schema": StepResponse.model_json_schema(),
        }
    },
)
```
Anthropic: Tool Use¶
```python
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=2048,  # required by the Messages API
    messages=messages,
    tools=[{
        "name": "submit_step",
        "description": "Provide your response",
        "input_schema": StepResponse.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "submit_step"},
)
```
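On the Anthropic side, the structured payload comes back as a `tool_use` content block; extracting it can be sketched over dict-shaped blocks (the helper name is hypothetical):

```python
def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the input payload of the first matching tool_use block."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block for {tool_name!r}")

# Dict-shaped stand-in for response.content from the SDK
blocks = [{
    "type": "tool_use",
    "name": "submit_step",
    "input": {"reasoning": "done",
              "action": {"action_type": "answer", "answer_text": "A cat"}},
}]
print(extract_tool_input(blocks, "submit_step")["action"]["answer_text"])  # A cat
```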
Model Registry¶
Only approved models are allowed:
| Provider | Model ID | Status |
|---|---|---|
| OpenAI | `gpt-5.2` | Default |
| Anthropic | `claude-sonnet-4-5-20250929` | Supported |
| Google | `gemini-3-pro-preview` | Reserved (provider not yet implemented) |
Models are validated at runtime:
```python
from giant.llm.model_registry import validate_model_id

validate_model_id("gpt-5.2")  # OK
validate_model_id("gpt-4o")   # Raises ValueError
```
See Model Registry for details.
Cost Tracking¶
Each response includes usage and cost:
```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost_usd: float
```
Costs are calculated using pricing tables:
```python
# Example pricing (per 1M tokens)
PRICING = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
    "gemini-3-pro-preview": {"input": 2.00, "output": 12.00},
}
```
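Given that table, cost is a weighted token sum: for example, 10,000 prompt tokens and 1,000 completion tokens on `gpt-5.2` cost (10,000 × 1.75 + 1,000 × 14.00) / 1,000,000 = $0.0315. A sketch of the calculation (the helper name is hypothetical):

```python
PRICING = {  # per 1M tokens, from the example table above
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
    "gemini-3-pro-preview": {"input": 2.00, "output": 12.00},
}

def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Weighted token sum against the per-1M-token rates."""
    rates = PRICING[model]
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000

print(cost_usd("gpt-5.2", 10_000, 1_000))  # 0.0315
```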
Error Handling¶
LLMError¶
Base exception for API failures:
```python
class LLMError(Exception):
    """Raised when API calls fail after retries."""
    provider: str | None
    model: str | None
    cause: Exception | None
```
LLMParseError¶
Raised when the model's output can't be parsed:

```python
class LLMParseError(LLMError):
    """Raised when output doesn't match the expected schema."""
    raw_output: str | None
```
Circuit Breaker¶
Protects against cascading failures:
```python
class CircuitBreakerOpenError(LLMError):
    """Raised when too many consecutive failures occur."""
    cooldown_remaining_seconds: float
```
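The breaker's core logic amounts to a consecutive-failure counter with a cooldown window; a minimal sketch (threshold and timing values are illustrative, not GIANT's actual configuration):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; permit a trial after cooldown."""

    def __init__(self, threshold: int = 3, cooldown_seconds: float = 30.0):
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: half-open, permit a trial request.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0
```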
Configuration¶
Environment Variables¶
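GIANT-specific variable names aren't listed here, but the official OpenAI and Anthropic SDKs read their API keys from standard environment variables, so a minimal setup looks like:

```shell
# Standard SDK key variables (SDK defaults; any GIANT-specific
# overrides would be documented in the setup guide)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
```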
Provider-Specific Settings¶
| Provider | Target Size | Notes |
|---|---|---|
| OpenAI | 1000px | Higher resolution, higher cost |
| Anthropic | 500px | Cost-optimized, still effective |
Adding New Providers¶
To add a new LLM provider:
- Create a client class implementing the `LLMProvider` protocol
- Add converter functions for the message format
- Add the provider to the `create_provider()` factory
- Add the model to the registry with pricing
- Add tests
Example skeleton:
```python
class NewProvider:
    def __init__(self, model: str):
        validate_model_id(model, provider="newprovider")
        self.model = model
        self.client = NewProviderClient()

    async def generate_response(self, messages: list[Message]) -> LLMResponse:
        # Convert messages
        # Call API
        # Parse response
        # Return LLMResponse
        ...

    def get_model_name(self) -> str:
        return self.model

    def get_target_size(self) -> int:
        return 1000
```
Next Steps¶
- Prompt Design - Navigation prompts
- Model Registry - Approved models
- Configuring Providers - Setup guide