
chandra-FP8-Latest

chandra-FP8-Latest is an FP8-compressed evolution of datalab-to/chandra. This variant stores weights in BF16 and FP8 (F8_E4M3) precision formats to reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning of the original architecture. The result is an efficient document intelligence vision-language model suited to complex parsing, structured output generation, and production-scale deployment.

This variant applies FP8 (8-bit floating point) quantization to both weights and activations (W8A8) with a dynamic scaling recipe, enabling hardware-accelerated FP8 inference on supported GPUs.
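
To make the F8_E4M3 format concrete, here is a minimal numerical sketch of dynamic FP8 quantization. It ignores subnormals, NaN encodings, and the exact rounding mode the hardware uses; `quantize_e4m3` and the per-tensor scaling shown are illustrative, not the actual Transformer Engine implementation:

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value (simplified: normals only)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)      # saturate at the format's max
    exp = max(math.floor(math.log2(mag)), -6)  # clamp to min normal exponent
    step = 2.0 ** (exp - 3)          # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# Dynamic (per-tensor) scaling: map the tensor's max magnitude onto E4M3_MAX,
# quantize in the scaled domain, then dequantize back.
vals = [0.03, -1.7, 6.2, 0.0]
scale = max(abs(v) for v in vals) / E4M3_MAX
deq = [quantize_e4m3(v / scale) * scale for v in vals]
print(deq)  # close to the originals, with small rounding error
```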

About the Base Model

Chandra from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

  • Handwriting Recognition across diverse styles
  • Table Structure Preservation, including merged and nested cells
  • Mathematical Equation Rendering into clean LaTeX
  • Form Reconstruction with checkboxes and radio buttons
  • Multi-Column Layout Parsing
  • 40+ Language Support
  • Precise Bounding Box Extraction for every text block, table, and image

Chandra outputs structured Markdown, HTML, or JSON with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
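
As an illustration of what layout-aware structured output can look like, here is a hypothetical JSON result for one page. The field names (`blocks`, `block_type`, `bbox`, etc.) are invented for this sketch and are not chandra's actual output schema:

```python
import json

# Hypothetical layout-aware OCR result for a single page.
# All field names here are illustrative, not chandra's real schema.
page = {
    "page": 1,
    "blocks": [
        {
            "block_type": "heading",
            "bbox": [72, 54, 540, 88],   # [x0, y0, x1, y1] in pixels
            "text": "Quarterly Report",
        },
        {
            "block_type": "table",
            "bbox": [72, 120, 540, 360],
            "html": "<table><tr><td>Revenue</td><td>$1.2M</td></tr></table>",
        },
    ],
}

print(json.dumps(page, indent=2))
```

Coordinates like `bbox` are what make downstream tasks such as region cropping or reading-order reconstruction possible.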

It handles challenging real-world inputs such as:

  • Doctor notes
  • Financial filings
  • Invoices
  • Textbooks
  • Government forms
  • Low-quality or messy scanned documents

What FP8 Adds

The chandra-FP8-Latest variant introduces:

  • BF16 · FP8 (F8_E4M3) Compression: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
  • Higher Throughput: Faster document parsing at scale.
  • Lower Memory Footprint: Improved deployment feasibility on Hopper-class and compatible GPUs.
  • Production Optimization: Ideal for high-volume PDF ingestion and enterprise document processing.
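
As a back-of-envelope illustration of the memory saving (weights only; this ignores activations, KV cache, and runtime overhead), a 9B-parameter model needs roughly half the VRAM at one byte per parameter versus two:

```python
# Rough weight-memory estimate: parameters x bytes-per-parameter.
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1024**3

params = 9e9                              # ~9B parameters
bf16 = weight_memory_gib(params, 2)       # BF16: 2 bytes/param
fp8 = weight_memory_gib(params, 1)        # FP8 (F8_E4M3): 1 byte/param

print(f"BF16: {bf16:.1f} GiB, FP8: {fp8:.1f} GiB")
```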

Deployment Support

Chandra supports:

  • Hugging Face Transformers for local inference
  • vLLM server deployment for high-throughput production environments
  • Layout-aware prompts such as "ocr_layout"
  • Configurable max_output_tokens up to 8192 per page
  • CLI workflows with environment-based configuration
  • Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.
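
For vLLM serving, a launch command along these lines should work; the exact flags available depend on your vLLM version and hardware, and `--max-model-len 8192` and the port are illustrative choices, not values mandated by this card:

```shell
# Launch an OpenAI-compatible vLLM server for the model.
# Flags are illustrative; adjust for your vLLM version and GPU.
vllm serve prithivMLmods/chandra-FP8-Latest \
  --max-model-len 8192 \
  --port 8000
```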

Quick Start with Transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)

Intended Use

  • High-precision OCR pipelines
  • Financial and legal document processing
  • Academic and textbook digitization
  • Automated form parsing
  • Enterprise document intelligence systems
  • AI data ingestion pipelines

License

Licensed under a modified OpenRAIL-M framework:

  • Apache 2.0 for code
  • Commercial restrictions for competitors exceeding $2M revenue

Please review the base model license terms before commercial deployment.

Limitations & Considerations

  • FP8 requires compatible GPU hardware for optimal acceleration.
  • Extremely low-resolution or heavily degraded scans may still impact recognition quality.
  • Users are responsible for ensuring lawful and compliant deployment in regulated environments.
Model Details

  • Model size: 9B params
  • Tensor types: BF16 · F8_E4M3
  • Format: Safetensors
  • Base model: datalab-to/chandra