MODEL_NAME

This repository contains layoutlm-camembertv2 weights exported to safetensors format.

Source

These weights are derived from pretrained models:

Layout encoder (LayoutLM): microsoft/layoutlm-base-uncased, pretrained on IIT-CDIP with masked visual-language modeling (see the LayoutLM paper)
Text encoder: almanach/camembertv2-base, a French language model (RoBERTa-like architecture)

Methodology

This checkpoint was produced by weight merging, not end-to-end training.

Load the pretrained layout encoder weights (LayoutLM for this checkpoint) and keep them intact
Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model
Update the tokenizer and vocabulary configuration to match the French model
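
The steps above can be pictured as a key-wise merge of two state dicts. The sketch below is not the script used to build this checkpoint: `merge_state_dicts` and `LAYOUT_ONLY_MARKERS` are hypothetical names, plain strings stand in for tensors, and it assumes both models use BERT-style parameter names, with LayoutLM's 2D position embeddings treated as the layout-specific part.

```python
# Hypothetical illustration of key-wise weight merging (not the author's script).
# Assumption: parameters whose names contain these markers are layout-specific.
LAYOUT_ONLY_MARKERS = (
    "x_position_embeddings",
    "y_position_embeddings",
    "h_position_embeddings",
    "w_position_embeddings",
)


def merge_state_dicts(layout_sd, text_sd):
    """Return a new dict: layout-specific entries kept, shared entries taken from text_sd."""
    merged = {}
    for key, value in layout_sd.items():
        is_layout_only = any(marker in key for marker in LAYOUT_ONLY_MARKERS)
        if not is_layout_only and key in text_sd:
            merged[key] = text_sd[key]  # text-side weight replaces the original
        else:
            merged[key] = value  # layout weight (or unmatched key) is kept intact
    return merged


# Toy example: strings stand in for weight tensors.
layout_sd = {
    "embeddings.word_embeddings.weight": "layoutlm_words",
    "embeddings.x_position_embeddings.weight": "layoutlm_x",
    "encoder.layer.0.attention.self.query.weight": "layoutlm_q",
}
text_sd = {
    "embeddings.word_embeddings.weight": "camembert_words",
    "encoder.layer.0.attention.self.query.weight": "camembert_q",
}
merged = merge_state_dicts(layout_sd, text_sd)
print(merged)
```

In a real merge the tensor shapes are likely to differ between the two models (vocabulary size, for example), which is why the tokenizer and vocabulary configuration are updated in the last step.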

No training or fine-tuning was performed at this stage.
This checkpoint is intended as a starting point for downstream fine-tuning on French document understanding tasks (NER, token classification, extractive QA…).
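
For token classification fine-tuning, word-level boxes and labels usually have to be expanded to subword tokens. The sketch below is one common way to do that, not something prescribed by this repository: `align_to_tokens` is a hypothetical helper, `word_ids` is assumed to look like the output of a fast tokenizer's `word_ids()` method (`None` for special tokens), and giving every special token a zero box is a simplification.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_to_tokens(word_ids, word_boxes, word_labels):
    """Expand word-level boxes and labels to token level.

    Only the first subword of each word keeps its label; special tokens and
    continuation subwords get IGNORE_INDEX so the loss skips them.
    """
    token_boxes, token_labels = [], []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: zero box (a simplification) and ignored label.
            token_boxes.append([0, 0, 0, 0])
            token_labels.append(IGNORE_INDEX)
        else:
            token_boxes.append(word_boxes[word_id])
            is_first_subword = word_id != previous
            token_labels.append(word_labels[word_id] if is_first_subword else IGNORE_INDEX)
        previous = word_id
    return token_boxes, token_labels


# Toy example: two words, the first one split into two subwords.
word_ids = [None, 0, 0, 1, None]
word_boxes = [[10, 10, 200, 40], [210, 10, 400, 40]]
word_labels = [3, 0]
token_boxes, token_labels = align_to_tokens(word_ids, word_boxes, word_labels)
print(token_boxes)
print(token_labels)
```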

Files

File	Description
`model.safetensors`	Model weights (safetensors format)
`pytorch_model.bin`	Model weights (PyTorch format)
`config.json`	Model configuration
`tokenizer_config.json`	Tokenizer configuration
`README.md`	This model card

Usage

from transformers import AutoModel, AutoTokenizer

# Replace "USERNAME/MODEL_NAME" with this repository's ID on the Hub.
tokenizer = AutoTokenizer.from_pretrained("USERNAME/MODEL_NAME")
model = AutoModel.from_pretrained("USERNAME/MODEL_NAME")
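
LayoutLM-style models also take a bounding box per token, with coordinates scaled to the 0 to 1000 range. Assuming this checkpoint loads as a LayoutLM model, boxes in pixel coordinates would need a conversion along these lines; `normalize_box` is a hypothetical helper and the page size is made up for the example.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Illustrative values: an 850 x 1100 pixel page with one word box.
normalized = normalize_box([85, 110, 425, 220], page_width=850, page_height=1100)
print(normalized)  # [100, 100, 500, 200]
```

The resulting boxes would be passed to the model as the `bbox` input alongside the tokenizer output.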

Limitations

This model has not been fine-tuned on any French document dataset
Performance on downstream tasks is not guaranteed without task-specific fine-tuning
Intended for research and experimentation purposes

License

Weights are derived from models released under the MIT and Apache-2.0 licenses.
Please refer to the original repositories for full license terms.

Acknowledgements

LayoutLM: Pre-training of Text and Layout for Document Image Understanding — Xu et al., 2020
microsoft/layoutlm-base-uncased

Note: This is not an official release from any of the above organizations.

Safetensors: model size 0.1B params, tensor type F32

Paper for RomDev2/layoutlm-camembertv2: LayoutLM: Pre-training of Text and Layout for Document Image Understanding (arXiv:1912.13318, published Dec 31, 2019)