Ethan Villarosa ecbb5ee194 standardized BARThez model card (#39701)

* standardized barthez model card according to template

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* suggested changes to barthez model card

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

2025-07-30 08:33:13 -07:00

BARThez

BARThez is a BART model designed for French language tasks. Unlike existing French BERT models, which are encoder-only, BARThez has a pretrained encoder-decoder, so it can generate text as well as encode it. A multilingual variant, mBARThez, is also available; it was obtained by continuing the pretraining of multilingual BART on a French corpus.

You can find all of the original BARThez checkpoints under the BARThez collection.

Tip

This model was contributed by moussakam. Refer to the BART docs for more usage examples.

The examples below demonstrate how to predict the <mask> token with [Pipeline], [AutoModel], and from the command line.

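import torch
from transformers import pipeline

# Name the pipeline object `fill_mask` so it does not shadow the imported `pipeline` function.
fill_mask = pipeline(
    task="fill-mask",
    model="moussaKam/barthez",
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device=0,  # first CUDA GPU
)
fill_mask("Les plantes produisent <mask> grâce à un processus appelé photosynthèse.")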
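For an input with a single <mask>, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one possible way to keep only the more confident candidates. The helper name `filter_candidates`, the `min_score` threshold, and the sample entries are hypothetical; the sample values are placeholders, not real BARThez predictions.

```python
def filter_candidates(results, min_score=0.05):
    """Return (token, score) pairs for candidates scoring at least `min_score`."""
    return [
        (entry["token_str"].strip(), entry["score"])
        for entry in results
        if entry["score"] >= min_score
    ]


# Placeholder entries shaped like fill-mask output. The tokens, ids, and scores
# are invented for illustration and are NOT real BARThez predictions.
example_results = [
    {"score": 0.60, "token": 1, "token_str": "exemple_a", "sequence": "phrase avec exemple_a"},
    {"score": 0.30, "token": 2, "token_str": "exemple_b", "sequence": "phrase avec exemple_b"},
    {"score": 0.01, "token": 3, "token_str": "exemple_c", "sequence": "phrase avec exemple_c"},
]

# The pipeline already returns candidates in descending score order,
# so filtering preserves the ranking.
for token, score in filter_candidates(example_results):
    print(f"{token}: {score:.2f}")
```

In practice, the list returned by the pipeline call above would be passed in place of `example_results`.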

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "moussaKam/barthez",
)
model = AutoModelForMaskedLM.from_pretrained(
    "moussaKam/barthez",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Move the inputs to whichever device `device_map="auto"` selected for the model.
inputs = tokenizer("Les plantes produisent <mask> grâce à un processus appelé photosynthèse.", return_tensors="pt").to(model.device)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the <mask> position and take the highest-scoring vocabulary id there.
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
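The argmax above yields only the single most likely token. As a rough sketch, the logits at the mask position can also be turned into probabilities with a softmax and ranked to show several candidates. The helper `top_k_from_logits` below is hypothetical and deliberately dependency-free, working on a plain list of floats; `torch.softmax` followed by `torch.topk` would be the tensor-native alternative.

```python
import math


def top_k_from_logits(logits, k=5):
    """Return the k most likely (index, probability) pairs from a list of logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    # Rank vocabulary indices by logit, highest first, and keep the top k.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(index, exps[index] / total) for index in ranked]


# Assumed usage with the variables from the example above:
#   row = predictions[0, masked_index[0]].float().tolist()
#   for token_id, prob in top_k_from_logits(row, k=5):
#       print(tokenizer.decode([token_id]), round(prob, 3))
```

Converting to a Python list keeps the sketch easy to follow, at the cost of speed on a large vocabulary.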

echo -e "Les plantes produisent <mask> grâce à un processus appelé photosynthèse." | transformers run --task fill-mask --model moussaKam/barthez --device 0
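Because BARThez is an encoder-decoder model, it can also be used for generation tasks such as summarization. The following is a minimal sketch, not the authors' reference code. It assumes a fine-tuned summarization checkpoint named `moussaKam/barthez-orangesum-abstract` is available on the Hub (check the BARThez collection and substitute another sequence-to-sequence checkpoint if needed). The helper name `summarize_french` and the generation settings are illustrative choices.

```python
def summarize_french(text, checkpoint="moussaKam/barthez-orangesum-abstract", max_new_tokens=60):
    """Generate a summary of French `text` with a fine-tuned BARThez checkpoint.

    The default checkpoint name is an assumption; replace it with any
    sequence-to-sequence BARThez checkpoint you have access to.
    """
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    model.eval()

    # Truncate long inputs so they fit within the encoder's context window.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Beam search is a common choice for summarization; the settings are illustrative.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)

    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `summarize_french("...")` with a French article downloads the checkpoint on first use and returns the generated summary as a string.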

BarthezTokenizer

autodoc BarthezTokenizer

BarthezTokenizerFast

autodoc BarthezTokenizerFast
