
chandra-FP8-Latest

chandra-FP8-Latest is an FP8-compressed evolution of datalab-to/chandra. This variant stores weights in BF16 and FP8 (F8_E4M3) precision formats to reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning of the original architecture. The result is an efficient document intelligence vision-language model suited to complex parsing, structured output generation, and production-scale deployment.

This variant applies FP8 (8-bit floating point) quantization to both weights and activations (W8A8) with a dynamic scaling recipe, enabling hardware-accelerated FP8 inference on supported GPUs.
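
To make the F8_E4M3 format concrete, here is a minimal numerical sketch of dynamic FP8 quantization. It ignores subnormals, NaN encodings, and the exact rounding mode the hardware uses; `quantize_e4m3` and the per-tensor scaling shown are illustrative, not the actual Transformer Engine implementation:

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value (simplified: normals only)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)      # saturate at the format's max
    exp = max(math.floor(math.log2(mag)), -6)  # clamp to min normal exponent
    step = 2.0 ** (exp - 3)          # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# Dynamic (per-tensor) scaling: map the tensor's max magnitude onto E4M3_MAX,
# quantize in the scaled domain, then dequantize back.
vals = [0.03, -1.7, 6.2, 0.0]
scale = max(abs(v) for v in vals) / E4M3_MAX
deq = [quantize_e4m3(v / scale) * scale for v in vals]
print(deq)  # close to the originals, with small rounding error
```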

About the Base Model

Chandra from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

  • Handwriting Recognition across diverse styles
  • Table Structure Preservation, including merged and nested cells
  • Mathematical Equation Rendering into clean LaTeX
  • Form Reconstruction with checkboxes and radio buttons
  • Multi-Column Layout Parsing
  • 40+ Language Support
  • Precise Bounding Box Extraction for every text block, table, and image

Chandra outputs structured Markdown, HTML, or JSON with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
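
As an illustration of what layout-aware structured output can look like, here is a hypothetical JSON result for one page. The field names (`blocks`, `block_type`, `bbox`, etc.) are invented for this sketch and are not chandra's actual output schema:

```python
import json

# Hypothetical layout-aware OCR result for a single page.
# All field names here are illustrative, not chandra's real schema.
page = {
    "page": 1,
    "blocks": [
        {
            "block_type": "heading",
            "bbox": [72, 54, 540, 88],   # [x0, y0, x1, y1] in pixels
            "text": "Quarterly Report",
        },
        {
            "block_type": "table",
            "bbox": [72, 120, 540, 360],
            "html": "<table><tr><td>Revenue</td><td>$1.2M</td></tr></table>",
        },
    ],
}

print(json.dumps(page, indent=2))
```

Coordinates like `bbox` are what make downstream tasks such as region cropping or reading-order reconstruction possible.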

It handles challenging real-world inputs such as:

  • Doctor notes
  • Financial filings
  • Invoices
  • Textbooks
  • Government forms
  • Low-quality or messy scanned documents

What FP8 Adds

The chandra-FP8-Latest variant introduces:

  • BF16 · FP8 (F8_E4M3) Compression: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
  • Higher Throughput: Faster document parsing at scale.
  • Lower Memory Footprint: Improved deployment feasibility on Hopper-class and compatible GPUs.
  • Production Optimization: Ideal for high-volume PDF ingestion and enterprise document processing.
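
As a back-of-envelope illustration of the memory saving (weights only; this ignores activations, KV cache, and runtime overhead), a 9B-parameter model needs roughly half the VRAM at one byte per parameter versus two:

```python
# Rough weight-memory estimate: parameters x bytes-per-parameter.
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1024**3

params = 9e9                              # ~9B parameters
bf16 = weight_memory_gib(params, 2)       # BF16: 2 bytes/param
fp8 = weight_memory_gib(params, 1)        # FP8 (F8_E4M3): 1 byte/param

print(f"BF16: {bf16:.1f} GiB, FP8: {fp8:.1f} GiB")
```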

Deployment Support

Chandra supports:

  • Hugging Face Transformers for local inference
  • vLLM server deployment for high-throughput production environments
  • Layout-aware prompts such as "ocr_layout"
  • Configurable max_output_tokens up to 8192 per page
  • CLI workflows with environment-based configuration
  • Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.
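
For vLLM serving, a launch command along these lines should work; the exact flags available depend on your vLLM version and hardware, and `--max-model-len 8192` and the port are illustrative choices, not values mandated by this card:

```shell
# Launch an OpenAI-compatible vLLM server for the model.
# Flags are illustrative; adjust for your vLLM version and GPU.
vllm serve prithivMLmods/chandra-FP8-Latest \
  --max-model-len 8192 \
  --port 8000
```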

Quick Start with Transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)

Intended Use

  • High-precision OCR pipelines
  • Financial and legal document processing
  • Academic and textbook digitization
  • Automated form parsing
  • Enterprise document intelligence systems
  • AI data ingestion pipelines

License

Licensed under a modified OpenRAIL-M framework:

  • Apache 2.0 for code
  • Commercial restrictions for competitors exceeding $2M revenue

Please review the base model license terms before commercial deployment.

Limitations & Considerations

  • FP8 requires compatible GPU hardware for optimal acceleration.
  • Extremely low-resolution or heavily degraded scans may still impact recognition quality.
  • Users are responsible for ensuring lawful and compliant deployment in regulated environments.
Model Details

  • Model size: 9B params
  • Tensor types: BF16 · F8_E4M3
  • Format: Safetensors
  • Base model: datalab-to/chandra