# chandra-FP8-Latest
chandra-FP8-Latest is an FP8-compressed evolution built on top of datalab-to/chandra. This variant leverages BF16 · FP8 (F8_E4M3) precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture. The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.
This model applies FP8 (8-bit floating point) quantization to both weights and activations (W8A8), using a dynamic FP8 recipe that is hardware-accelerated on supported GPUs.
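To make the W8A8 FP8-dynamic idea concrete, here is a minimal numeric sketch of dynamic per-tensor scaling into the E4M3 range. It is illustrative only (the function name and the plain-list input are my own; real FP8 kernels additionally round each scaled value to the nearest E4M3-representable number):

```python
# Illustrative sketch of dynamic per-tensor FP8 (E4M3) scaling.
# 448.0 is the largest finite value representable in F8_E4M3.
F8_E4M3_MAX = 448.0

def fp8_dynamic_quantize(values):
    """Scale values into the E4M3 range and back (scaling/clipping only)."""
    amax = max(abs(v) for v in values)
    scale = amax / F8_E4M3_MAX if amax > 0 else 1.0
    quantized = [max(-F8_E4M3_MAX, min(F8_E4M3_MAX, v / scale)) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale

q, dq, s = fp8_dynamic_quantize([0.5, -2.0, 896.0])
print(s)   # 2.0: chosen so the largest magnitude maps exactly to 448
print(q)   # [0.25, -1.0, 448.0]
print(dq)  # [0.5, -2.0, 896.0]
```

The "dynamic" part is that the scale is recomputed from each tensor's observed maximum at runtime, rather than calibrated offline.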
## About the Base Model
Chandra from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.
It excels at:
- Handwriting Recognition across diverse styles
- Table Structure Preservation, including merged and nested cells
- Mathematical Equation Rendering into clean LaTeX
- Form Reconstruction with checkboxes and radio buttons
- Multi-Column Layout Parsing
- 40+ Language Support
- Precise Bounding Box Extraction for every text block, table, and image
Chandra outputs structured Markdown, HTML, or JSON with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
It handles challenging real-world inputs such as:
- Doctor notes
- Financial filings
- Invoices
- Textbooks
- Government forms
- Low-quality or messy scanned documents
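As an illustration of consuming the layout-aware JSON output mentioned above, the snippet below parses a page description and sorts blocks into reading order. The schema (field names like `blocks`, `bbox`, `text`) is a hypothetical example for demonstration, not Chandra's documented output format:

```python
import json

# Hypothetical layout-aware OCR output: each block carries its text,
# a type label, and a bounding box as [x0, y0, x1, y1] pixel coordinates.
page_json = """
{
  "blocks": [
    {"type": "table",   "text": "Item | Qty | Price", "bbox": [40, 120, 560, 480]},
    {"type": "heading", "text": "Invoice #1042",      "bbox": [40, 32, 410, 70]}
  ]
}
"""

page = json.loads(page_json)

# Sort blocks top-to-bottom by the y-coordinate of their bounding box,
# a common first step when reassembling reading order.
ordered = sorted(page["blocks"], key=lambda b: b["bbox"][1])
for block in ordered:
    print(f'{block["type"]:>8}: {block["text"]}')
```

Because every block carries coordinates, downstream pipelines can re-anchor recognized text onto the source page image.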
## What FP8 Adds
The chandra-FP8-Latest variant introduces:
- BF16 · FP8 (F8_E4M3) Compression: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
- Higher Throughput: Faster document parsing at scale.
- Lower Memory Footprint: Improved deployment feasibility on Hopper-class and compatible GPUs.
- Production Optimization: Ideal for high-volume PDF ingestion and enterprise document processing.
## Deployment Support
Chandra supports:
- Hugging Face Transformers for local inference
- vLLM server deployment for high-throughput production environments
- Layout-aware prompts such as `"ocr_layout"`
- Configurable `max_output_tokens` up to 8192 per page
- CLI workflows with environment-based configuration
- Page-range processing for PDFs
This makes it well-suited for enterprise-scale document AI systems.
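For high-throughput serving, an OpenAI-compatible vLLM endpoint can be started with a one-line command. This is a deployment sketch: the flags shown are standard vLLM options, but the appropriate context length and port depend on your hardware and workload:

```shell
# Start an OpenAI-compatible vLLM server for the FP8 model
# (adjust --max-model-len and --port to your environment).
vllm serve prithivMLmods/chandra-FP8-Latest \
  --max-model-len 16384 \
  --port 8000
```

Clients can then send chat-completion requests with image inputs to `http://localhost:8000/v1`.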
## Quick Start with Transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# Generate, then strip the prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
```
## Intended Use
- High-precision OCR pipelines
- Financial and legal document processing
- Academic and textbook digitization
- Automated form parsing
- Enterprise document intelligence systems
- AI data ingestion pipelines
## License
Licensed under a modified OpenRAIL-M framework:
- Apache 2.0 for code
- Commercial restrictions for competitors exceeding $2M revenue
Please review the base model license terms before commercial deployment.
## Limitations & Considerations
- FP8 requires compatible GPU hardware for optimal acceleration.
- Extremely low-resolution or heavily degraded scans may still impact recognition quality.
- Users are responsible for ensuring lawful and compliant deployment in regulated environments.
## Model Tree for prithivMLmods/chandra-FP8-Latest
- Base model: datalab-to/chandra