Add Watermarking LogitsProcessor and WatermarkDetector (#29676)

* add watermarking processor * remove the other hashing (context width=1 always) * make style * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * update watermarking process * add detector * update tests to use detector * fix failing tests * rename `input_seq` * make style * doc for processor * minor fixes * docs * make quality * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add PR suggestions * let's use lru_cache's default max size (128) * import processor if torch available * maybe like this * lets move the config to torch independet file * add docs * tiny docs fix to make the test happy * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * PR suggestions * add docs * fix test * fix docs * address pr comments * style * Revert "style" This reverts commit 7f33cc34ff08b414f8e7f90060889877606b43b2. * correct style * make doctest green --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-05-14 13:31:39 +05:00
parent 65ea1904ff
commit 5ad960f1f4
12 changed files with 738 additions and 4 deletions
--- a/docs/source/en/generation_strategies.md
+++ b/docs/source/en/generation_strategies.md
@@ -173,6 +173,55 @@ your screen, one word at a time:
 An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
 ```

+
+## Watermarking
+
+The `generate()` supports watermarking the generated text by randomly marking a portion of tokens as "green". 
+When generating the "green" will have a small 'bias' value added to their logits, thus having a higher chance to be generated.
+The watermarked text can be detected by calculating the proportion of "green" tokens in the text and estimating how likely it is
+statistically to obtain that amount of "green" tokens for human-generated text. This watermarking strategy was proposed in the paper 
+["On the Reliability of Watermarks for Large Language Models"](https://arxiv.org/abs/2306.04634). For more information on 
+the inner functioning of watermarking, it is recommended to refer to the paper.
+
+The watermarking can be used with any generative model in `tranformers` and does not require an extra classification model
+to detect watermarked text. To trigger watermarking, pass in a [`WatermarkingConfig`] with needed arguments directly to the
+`.generate()` method or add it to the [`GenerationConfig`]. Watermarked text can be later detected with a [`WatermarkDetector`].
+
+
+<Tip warning={true}>
+
+The WatermarkDetector internally relies on the proportion of "green" tokens, and whether generated text follows the coloring pattern.
+That is why it is recommended to strip off the prompt text, if it is much longer than the generated text.
+This also can have an effect when one sequence in the batch is a lot longer causing other rows to be padded.
+Additionally, the detector **must** be initiated with identical watermark configuration arguments used when generating.
+
+</Tip>
+
+Let's generate some text with watermarking. In the below code snippet, we set the bias to 2.5 which is a value that
+will be added to "green" tokens' logits. After generating watermarked text, we can pass it directly to the `WatermarkDetector`
+to check if the text is machine-generated (outputs `True` for machine-generated and `False` otherwise).
+
+```python
+>>> from transformers import AutoTokenizer, AutoModelForCausalLM, WatermarkDetector, WatermarkingConfig
+
+>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+>>> tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
+>>> tok.pad_token_id = tok.eos_token_id
+>>> tok.padding_side = "left"
+
+>>> inputs = tok(["This is the beginning of a long story", "Alice and Bob are"], padding=True, return_tensors="pt")
+>>> input_len = inputs["input_ids"].shape[-1]
+
+>>> watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
+>>> out = model.generate(**inputs, watermarking_config=watermarking_config, do_sample=False, max_length=20)
+
+>>> detector = WatermarkDetector(model_config=model.config, device="cpu", watermarking_config=watermarking_config)
+>>> detection_out = detector(out, return_dict=True)
+>>> detection_out.prediction
+array([True, True])
+```
+
+
 ## Decoding strategies

 Certain combinations of the `generate()` parameters, and ultimately `generation_config`, can be used to enable specific
--- a/docs/source/en/internal/generation_utils.md
+++ b/docs/source/en/internal/generation_utils.md
@@ -209,6 +209,10 @@ generation.
 [[autodoc]] WhisperTimeStampLogitsProcessor
    - __call__

+[[autodoc]] WatermarkLogitsProcessor
+    - __call__
+
+
 ### TensorFlow

 [[autodoc]] TFForcedBOSTokenLogitsProcessor
@@ -372,3 +376,10 @@ A [`Constraint`] can be used to force the generation to include specific tokens
    - update
    - get_seq_length
    - reorder_cache
+
+
+## Watermark Utils
+
+[[autodoc]] WatermarkDetector
+    - __call__
+
--- a/docs/source/en/main_classes/text_generation.md
+++ b/docs/source/en/main_classes/text_generation.md
@@ -41,6 +41,8 @@ like token streaming.
 	- validate
 	- get_generation_mode

+[[autodoc]] generation.WatermarkingConfig
+
 ## GenerationMixin

 [[autodoc]] generation.GenerationMixin
--- a/src/transformers/init.py
+++ b/src/transformers/init.py
@@ -117,7 +117,12 @@ _import_structure = {
    "feature_extraction_sequence_utils": ["SequenceFeatureExtractor"],
    "feature_extraction_utils": ["BatchFeature", "FeatureExtractionMixin"],
    "file_utils": [],
-    "generation": ["GenerationConfig", "TextIteratorStreamer", "TextStreamer"],
+    "generation": [
+        "GenerationConfig",
+        "TextIteratorStreamer",
+        "TextStreamer",
+        "WatermarkingConfig",
+    ],
    "hf_argparser": ["HfArgumentParser"],
    "hyperparameter_search": [],
    "image_transforms": [],
@@ -1232,6 +1237,8 @@ else:
            "TopPLogitsWarper",
            "TypicalLogitsWarper",
            "UnbatchedClassifierFreeGuidanceLogitsProcessor",
+            "WatermarkDetector",
+            "WatermarkLogitsProcessor",
            "WhisperTimeStampLogitsProcessor",
        ]
    )
@@ -4617,7 +4624,7 @@ if TYPE_CHECKING:
    from .feature_extraction_utils import BatchFeature, FeatureExtractionMixin

    # Generation
-    from .generation import GenerationConfig, TextIteratorStreamer, TextStreamer
+    from .generation import GenerationConfig, TextIteratorStreamer, TextStreamer, WatermarkingConfig
    from .hf_argparser import HfArgumentParser

    # Integrations
@@ -5797,6 +5804,8 @@ if TYPE_CHECKING:
            TopPLogitsWarper,
            TypicalLogitsWarper,
            UnbatchedClassifierFreeGuidanceLogitsProcessor,
+            WatermarkDetector,
+            WatermarkLogitsProcessor,
            WhisperTimeStampLogitsProcessor,
        )
        from .modeling_utils import PreTrainedModel
--- a/src/transformers/generation/init.py
+++ b/src/transformers/generation/init.py
@@ -18,7 +18,7 @@ from ..utils import OptionalDependencyNotAvailable, _LazyModule, is_flax_availab


 _import_structure = {
-    "configuration_utils": ["GenerationConfig", "GenerationMode"],
+    "configuration_utils": ["GenerationConfig", "GenerationMode", "WatermarkingConfig"],
    "streamers": ["TextIteratorStreamer", "TextStreamer"],
 }

@@ -78,6 +78,7 @@ else:
        "TypicalLogitsWarper",
        "UnbatchedClassifierFreeGuidanceLogitsProcessor",
        "WhisperTimeStampLogitsProcessor",
+        "WatermarkLogitsProcessor",
    ]
    _import_structure["stopping_criteria"] = [
        "MaxNewTokensCriteria",
@@ -106,6 +107,10 @@ else:
        "GenerateDecoderOnlyOutput",
        "GenerateEncoderDecoderOutput",
    ]
+    _import_structure["watermarking"] = [
+        "WatermarkDetector",
+        "WatermarkDetectorOutput",
+    ]

 try:
    if not is_tf_available():
@@ -174,7 +179,7 @@ else:
    ]

 if TYPE_CHECKING:
-    from .configuration_utils import GenerationConfig, GenerationMode
+    from .configuration_utils import GenerationConfig, GenerationMode, WatermarkingConfig
    from .streamers import TextIteratorStreamer, TextStreamer

    try:
@@ -218,6 +223,7 @@ if TYPE_CHECKING:
            TopPLogitsWarper,
            TypicalLogitsWarper,
            UnbatchedClassifierFreeGuidanceLogitsProcessor,
+            WatermarkLogitsProcessor,
            WhisperTimeStampLogitsProcessor,
        )
        from .stopping_criteria import (
@@ -247,6 +253,10 @@ if TYPE_CHECKING:
            SampleDecoderOnlyOutput,
            SampleEncoderDecoderOutput,
        )
+        from .watermarking import (
+            WatermarkDetector,
+            WatermarkDetectorOutput,
+        )

    try:
        if not is_tf_available():
--- a/src/transformers/generation/configuration_utils.py
+++ b/src/transformers/generation/configuration_utils.py
@@ -18,6 +18,7 @@ import copy
 import json
 import os
 import warnings
+from dataclasses import dataclass, is_dataclass
 from typing import TYPE_CHECKING, Any, Dict, Optional, Union

 from .. import __version__
@@ -221,6 +222,23 @@ class GenerationConfig(PushToHubMixin):
        low_memory (`bool`, *optional*):
            Switch to sequential beam search and sequential topk for contrastive search to reduce peak memory.
            Used with beam search and contrastive search.
+        watermarking_config (Union[`WatermarkingConfig`, `dict`], *optional*):
+            Arguments used to watermark the model outputs by adding a small bias to randomly selected set of "green" tokens.
+            If passed as `Dict`, it will be converted to a `WatermarkingConfig` internally.
+            See [this paper](https://arxiv.org/abs/2306.04634) for more details. Accepts the following keys:
+            - greenlist_ratio (`float`):
+                Used for watermarking. The ratio of "green" tokens used to the vocabulary size. Defaults to 0.25.
+            - bias (`float`):
+                Used with watermarking. The bias added to the selected "green" tokens' logits. Defaults to 2.0.
+            - hashing_key (`int`):
+                Hahsing key used for watermarking. Defaults to 15485863 (the millionth prime).
+            - seeding_scheme (`str`):
+                Algorithm to use for watermarking. Accepts values:
+                    - "lefthash" (default): "green" tokens selection depend on the last token (Algorithm 2 from the paper)
+                    - "selfhash": "green" tokens selection depends on the current token itself (Algorithm 3 from the paper)
+                        The downside of this scheme is that it considers all possible next tokens and can be slower than "lefthash".
+            - context_width(`int`):
+                The context length of previous tokens to use in seeding. Higher context length makes watermarking more robust.

        > Parameters that define the output variables of generate

@@ -333,6 +351,13 @@ class GenerationConfig(PushToHubMixin):
        self.sequence_bias = kwargs.pop("sequence_bias", None)
        self.guidance_scale = kwargs.pop("guidance_scale", None)
        self.low_memory = kwargs.pop("low_memory", None)
+        watermarking_config = kwargs.pop("watermarking_config", None)
+        if watermarking_config is None:
+            self.watermarking_config = None
+        elif isinstance(watermarking_config, WatermarkingConfig):
+            self.watermarking_config = watermarking_config
+        else:
+            self.watermarking_config = WatermarkingConfig.from_dict(watermarking_config)

        # Parameters that define the output variables of `generate`
        self.num_return_sequences = kwargs.pop("num_return_sequences", 1)
@@ -613,6 +638,12 @@ class GenerationConfig(PushToHubMixin):
                    f"({self.num_beams})."
                )

+        # check watermarking arguments
+        if self.watermarking_config is not None:
+            if not isinstance(self.watermarking_config, WatermarkingConfig):
+                self.watermarking_config = WatermarkingConfig.from_dict(self.watermarking_config)
+            self.watermarking_config.validate()
+
        # 5. check common issue: passing `generate` arguments inside the generation config
        generate_arguments = (
            "logits_processor",
@@ -1021,7 +1052,16 @@ class GenerationConfig(PushToHubMixin):
            else:
                return obj

+        def convert_dataclass_to_dict(obj):
+            if isinstance(obj, dict):
+                return {key: convert_dataclass_to_dict(value) for key, value in obj.items()}
+            elif is_dataclass(obj):
+                return obj.to_dict()
+            else:
+                return obj
+
        config_dict = convert_keys_to_string(config_dict)
+        config_dict = convert_dataclass_to_dict(config_dict)

        return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"

@@ -1093,3 +1133,142 @@ class GenerationConfig(PushToHubMixin):
        # Remove all the attributes that were updated, without modifying the input dict
        unused_kwargs = {key: value for key, value in kwargs.items() if key not in to_remove}
        return unused_kwargs
+
+
+@dataclass
+class WatermarkingConfig:
+    """
+    Class that holds arguments for watermark generation and should be passed into `GenerationConfig` during `generate`.
+    See [this paper](https://arxiv.org/abs/2306.04634) for more details on the arguments.
+
+    Accepts the following keys:
+        - greenlist_ratio (`float`):
+            Used for watermarking. The ratio of "green" tokens used to the vocabulary size. Defaults to 0.25.
+        - bias (`float`):
+            Used with watermarking. The bias added to the selected "green" tokens' logits. Defaults to 2.0.
+        - hashing_key (`int`):
+            Hashing key used for watermarking. Defaults to 15485863 (the millionth prime).
+        - seeding_scheme (`str`):
+            Algorithm to use for watermarking. Accepts values:
+                - "lefthash" (default): "green" tokens selection depend on the last token (Algorithm 2 from the paper)
+                - "selfhash": "green" tokens selection depends on the current token itself (Algorithm 3 from the paper)
+                    The downside of this scheme is that it considers all possible next tokens and can be slower than "lefthash".
+        - context_width(`int`):
+            The context length of previous tokens to use in seeding. Higher context length makes watermarking more robust.
+    """
+
+    def __init__(
+        self,
+        greenlist_ratio: Optional[float] = 0.25,
+        bias: Optional[float] = 2.0,
+        hashing_key: Optional[int] = 15485863,
+        seeding_scheme: Optional[str] = "lefthash",
+        context_width: Optional[int] = 1,
+    ):
+        self.greenlist_ratio = greenlist_ratio
+        self.bias = bias
+        self.hashing_key = hashing_key
+        self.seeding_scheme = seeding_scheme
+        self.context_width = context_width
+
+    @classmethod
+    def from_dict(cls, config_dict, **kwargs):
+        """
+        Constructs a WatermarkingConfig instance from a dictionary of parameters.
+
+        Args:
+            config_dict (Dict[str, Any]): Dictionary containing configuration parameters.
+            **kwargs: Additional keyword arguments to override dictionary values.
+
+        Returns:
+            WatermarkingConfig: Instance of WatermarkingConfig constructed from the dictionary.
+        """
+        config = cls(**config_dict)
+        to_remove = []
+        for key, value in kwargs.items():
+            if hasattr(config, key):
+                setattr(config, key, value)
+                to_remove.append(key)
+        for key in to_remove:
+            kwargs.pop(key, None)
+        return config
+
+    def to_json_file(self, json_file_path: Union[str, os.PathLike]):
+        """
+        Save this instance to a JSON file.
+
+        Args:
+            json_file_path (Union[str, os.PathLike]): Path to the JSON file in which this configuration instance's parameters will be saved.
+        """
+        with open(json_file_path, "w", encoding="utf-8") as writer:
+            config_dict = self.to_dict()
+            json_string = json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
+
+            writer.write(json_string)
+
+    def to_dict(self) -> Dict[str, Any]:
+        """
+        Serializes this instance to a Python dictionary.
+
+        Returns:
+            Dict[str, Any]: Dictionary of all the attributes that make up this configuration instance.
+        """
+        output = copy.deepcopy(self.__dict__)
+        return output
+
+    def __iter__(self):
+        for attr, value in copy.deepcopy(self.__dict__).items():
+            yield attr, value
+
+    def __repr__(self):
+        return f"{self.__class__.__name__} {self.to_json_string()}"
+
+    def to_json_string(self):
+        """
+        Serializes this instance to a JSON formatted string.
+
+        Returns:
+            str: JSON formatted string representing the configuration instance.
+        """
+        return json.dumps(self.__dict__, indent=2) + "\n"
+
+    def update(self, **kwargs):
+        """
+        Update the configuration attributes with new values.
+
+        Args:
+            **kwargs: Keyword arguments representing configuration attributes and their new values.
+        """
+        for key, value in kwargs.items():
+            if hasattr(self, key):
+                setattr(self, key, value)
+
+    def validate(self):
+        watermark_missing_arg_msg = (
+            "Some of the keys in `watermarking_config` are defined incorrectly. `{key}` should be {correct_value}` "
+            "but found {found_value}"
+        )
+        if self.seeding_scheme not in ["selfhash", "lefthash"]:
+            raise ValueError(
+                watermark_missing_arg_msg.format(
+                    key="seeding_scheme",
+                    correct_value="[`selfhash`, `lefthash`]",
+                    found_value=self.seeding_scheme,
+                ),
+            )
+        if not 0.0 <= self.greenlist_ratio <= 1.0:
+            raise ValueError(
+                watermark_missing_arg_msg.format(
+                    key="greenlist_ratio",
+                    correct_value="in range between 0.0 and 1.0",
+                    found_value=self.seeding_scheme,
+                ),
+            )
+        if not self.context_width >= 1:
+            raise ValueError(
+                watermark_missing_arg_msg.format(
+                    key="context_width",
+                    correct_value="a positive integer",
+                    found_value=self.context_width,
+                ),
+            )
--- a/src/transformers/generation/logits_process.py
+++ b/src/transformers/generation/logits_process.py
@@ -2321,3 +2321,144 @@ class BarkEosPrioritizerLogitsProcessor(LogitsProcessor):
            scores_processed = torch.where(do_early_stop, early_stop_scores, scores)

        return scores_processed
+
+
+class WatermarkLogitsProcessor(LogitsProcessor):
+    r"""
+    Logits processor for watermarking generated text. The processor modifies model output scores by adding a small bias to
+    randomized set of "green" tokens before generating the next token. "Green" tokens selection process depends on the
+    `seeding_scheme` used. The code was based on the [original repo](https://github.com/jwkirchenbauer/lm-watermarking/tree/main).
+
+    The text generated by this `LogitsProcessor` can be detected using `WatermarkDetector`. See [`~WatermarkDetector.__call__`] for details,
+
+    See [the paper](https://arxiv.org/abs/2306.04634) for more information.
+
+    Args:
+        vocab_size (`int`):
+            The model tokenizer's vocab_size. Used to calculate "green" tokens ratio.
+        device (`str`):
+            The device where model is allocated.
+        greenlist_ratio (`float`, optional, *optional*, defaults to 0.25):
+            The ratio of "green" tokens used to the vocabulary size. Defaults to 0.25.
+        bias (`float`, optional, *optional*, defaults to 2.0):
+            The bias added to the selected "green" tokens' logits. Consider lowering the
+            `bias` if the text generation quality degrades. Recommended values are in the
+            range of [0.5, 2.0]. Defaults to 2.0.
+        hashing_key (`int`, optional, *optional*, defaults to 15485863):
+            Key used for hashing. If you deploy this watermark, we advise using another private key.
+            Defaults to 15485863 (the millionth prime).
+        seeding_scheme (`str`, optional, *optional*, defaults to `"lefthash"`):
+            The seeding scheme used for selecting "green" tokens. Accepts values:
+                - "lefthash" (default): "green" tokens selection depend on the last token (Algorithm 2 from paper)
+                - "selfhash": "green" tokens selection depends on the current token itself (Algorithm 3 from paper)
+                    The downside of this scheme is that it considers all possible next tokens and can be slower than "lefthash".
+            The context length of previous tokens to use in seeding. Higher context length makes watermarking more robust.
+        context_width (`int`, *optional*, defaults to 1):
+            The number of previous tokens to use when setting the seed.
+
+    Examples:
+
+    ```python
+    >>> from transformers import AutoTokenizer, AutoModelForCausalLM, WatermarkingConfig
+
+    >>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
+    >>> tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
+    >>> inputs = tokenizer(["Alice and Bob are"], return_tensors="pt")
+
+    >>> # normal generation
+    >>> out = model.generate(inputs["input_ids"], max_length=20, do_sample=False)
+    >>> tokenizer.batch_decode(out, skip_special_tokens=True)[0]
+    'Alice and Bob are both in the same room.\n\n"I\'m not sure if you\'re'
+
+    >>> # watermarked generation
+    >>> watermarking_config = WatermarkingConfig(bias=2.5, context_width=2, seeding_scheme="selfhash")
+    >>> out = model.generate(inputs["input_ids"], watermarking_config=watermarking_config, max_length=20, do_sample=False)
+    >>> tokenizer.batch_decode(out, skip_special_tokens=True)[0]
+    'Alice and Bob are both still alive and well and the story is pretty much a one-hour adventure'
+
+    >>> # to detect watermarked text use the WatermarkDetector class
+    >>> from transformers import WatermarkDetector
+    >>> detector = WatermarkDetector(model_config=model.config, device="cpu", watermarking_config= watermarking_config)
+    >>> detection_preds = detector(out)
+    >>> detection_preds
+    array([ True])
+    ```
+    """
+
+    def __init__(
+        self,
+        vocab_size,
+        device,
+        greenlist_ratio: float = 0.25,
+        bias: float = 2.0,
+        hashing_key: int = 15485863,
+        seeding_scheme: str = "lefthash",
+        context_width: int = 1,
+    ):
+        if seeding_scheme not in ["selfhash", "lefthash"]:
+            raise ValueError(f"seeding_scheme has to be one of [`selfhash`, `lefthash`], but found {seeding_scheme}")
+        if greenlist_ratio >= 1.0 or greenlist_ratio <= 0.0:
+            raise ValueError(
+                f"greenlist_ratio has be in range between 0.0 and 1.0, exclusively. but found {greenlist_ratio}"
+            )
+
+        self.vocab_size = vocab_size
+        self.greenlist_size = int(self.vocab_size * greenlist_ratio)
+        self.bias = bias
+        self.seeding_scheme = seeding_scheme
+        self.rng = torch.Generator(device=device)
+        self.hash_key = hashing_key
+        self.context_width = context_width
+
+        self.rng.manual_seed(hashing_key)
+        self.table_size = 1_000_003
+        self.fixed_table = torch.randperm(self.table_size, generator=self.rng, device=device)
+
+    def set_seed(self, input_seq: torch.LongTensor):
+        input_seq = input_seq[-self.context_width :]
+        if self.seeding_scheme == "selfhash":
+            a = self.fixed_table[input_seq % self.table_size] + 1
+            b = self.fixed_table[input_seq[-1] % self.table_size] + 1
+            seed = (self.hash_key * a * b).min().item()
+        else:
+            seed = self.hash_key * input_seq[-1].item()
+        self.rng.manual_seed(seed % (2**64 - 1))
+
+    def _get_greenlist_ids(self, input_seq: torch.LongTensor) -> torch.LongTensor:
+        self.set_seed(input_seq)
+        vocab_permutation = torch.randperm(self.vocab_size, device=input_seq.device, generator=self.rng)
+        greenlist_ids = vocab_permutation[: self.greenlist_size]
+        return greenlist_ids
+
+    def _score_rejection_sampling(self, input_seq: torch.LongTensor, scores: torch.FloatTensor) -> torch.LongTensor:
+        """
+        Generate greenlist based on current candidate next token. Reject and move on if necessary.
+        Runs for a fixed number of steps only for efficiency, since the methods is not batched.
+        """
+        final_greenlist = []
+        _, greedy_predictions = scores.sort(dim=-1, descending=True)
+
+        # 40 is an arbitrary number chosen to save compute and not run for long (taken from orig repo)
+        for i in range(40):
+            greenlist_ids = self._get_greenlist_ids(torch.cat([input_seq, greedy_predictions[i, None]], dim=-1))
+            if greedy_predictions[i] in greenlist_ids:
+                final_greenlist.append(greedy_predictions[i])
+        return torch.tensor(final_greenlist, device=input_seq.device)
+
+    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
+        if input_ids.shape[-1] < self.context_width:
+            logger.warning(
+                f"`input_ids` should have at least `{self.context_width}` tokens but has {input_ids.shape[-1]}. "
+                "The seeding will be skipped for this generation step!"
+            )
+            return scores
+
+        scores_processed = scores.clone()
+        for b_idx, input_seq in enumerate(input_ids):
+            if self.seeding_scheme == "selfhash":
+                greenlist_ids = self._score_rejection_sampling(input_seq, scores[b_idx])
+            else:
+                greenlist_ids = self._get_greenlist_ids(input_seq)
+            scores_processed[b_idx, greenlist_ids] = scores_processed[b_idx, greenlist_ids] + self.bias
+
+        return scores_processed
--- a/src/transformers/generation/utils.py
+++ b/src/transformers/generation/utils.py
@@ -74,6 +74,7 @@ from .logits_process import (
    TopPLogitsWarper,
    TypicalLogitsWarper,
    UnbatchedClassifierFreeGuidanceLogitsProcessor,
+    WatermarkLogitsProcessor,
 )
 from .stopping_criteria import (
    EosTokenCriteria,
@@ -763,6 +764,7 @@ class GenerationMixin:
        encoder_input_ids: torch.LongTensor,
        prefix_allowed_tokens_fn: Callable[[int, torch.Tensor], List[int]],
        logits_processor: Optional[LogitsProcessorList],
+        device: str = None,
        model_kwargs: Optional[Dict[str, Any]] = None,
        negative_prompt_ids: Optional[torch.Tensor] = None,
        negative_prompt_attention_mask: Optional[torch.Tensor] = None,
@@ -879,6 +881,18 @@ class GenerationMixin:
                FutureWarning,
            )
            processors.append(ForceTokensLogitsProcessor(generation_config.forced_decoder_ids, _has_warned=True))
+        if generation_config.watermarking_config is not None:
+            processors.append(
+                WatermarkLogitsProcessor(
+                    vocab_size=self.config.vocab_size,
+                    device=device,
+                    greenlist_ratio=generation_config.watermarking_config.greenlist_ratio,
+                    bias=generation_config.watermarking_config.bias,
+                    hashing_key=generation_config.watermarking_config.hashing_key,
+                    seeding_scheme=generation_config.watermarking_config.seeding_scheme,
+                    context_width=generation_config.watermarking_config.context_width,
+                )
+            )
        processors = self._merge_criteria_processor_list(processors, logits_processor)
        # `LogitNormalization` should always be the last logit processor, when present
        if generation_config.renormalize_logits is True:
@@ -1632,6 +1646,7 @@ class GenerationMixin:
            encoder_input_ids=inputs_tensor,
            prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
            logits_processor=logits_processor,
+            device=inputs_tensor.device,
            model_kwargs=model_kwargs,
            negative_prompt_ids=negative_prompt_ids,
            negative_prompt_attention_mask=negative_prompt_attention_mask,
--- a/src/transformers/generation/watermarking.py
+++ b/src/transformers/generation/watermarking.py
@@ -0,0 +1,240 @@
+# coding=utf-8
+# Copyright 2024 The HuggingFace Inc. team
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import collections
+from dataclasses import dataclass
+from functools import lru_cache
+from typing import Dict, Optional, Union
+
+import numpy as np
+
+from ..configuration_utils import PretrainedConfig
+from ..utils import is_torch_available, logging
+from .configuration_utils import WatermarkingConfig
+
+
+if is_torch_available():
+    import torch
+
+    from .logits_process import WatermarkLogitsProcessor
+
+
+logger = logging.get_logger(__name__)
+
+
+@dataclass
+class WatermarkDetectorOutput:
+    """
+    Outputs of a watermark detector.
+
+    Args:
+        num_tokens_scored (np.array of shape (batch_size)):
+            Array containing the number of tokens scored for each element in the batch.
+        num_green_tokens (np.array of shape (batch_size)):
+            Array containing the number of green tokens for each element in the batch.
+        green_fraction (np.array of shape (batch_size)):
+            Array containing the fraction of green tokens for each element in the batch.
+        z_score (np.array of shape (batch_size)):
+            Array containing the z-score for each element in the batch. Z-score here shows
+            how many standard deviations away is the green token count in the input text
+            from the expected green token count for machine-generated text.
+        p_value (np.array of shape (batch_size)):
+            Array containing the p-value for each batch obtained from z-scores.
+        prediction (np.array of shape (batch_size)), *optional*:
+            Array containing boolean predictions whether a text is machine-generated for each element in the batch.
+        confidence (np.array of shape (batch_size)), *optional*:
+            Array containing confidence scores of a text being machine-generated for each element in the batch.
+    """
+
+    num_tokens_scored: np.array = None
+    num_green_tokens: np.array = None
+    green_fraction: np.array = None
+    z_score: np.array = None
+    p_value: np.array = None
+    prediction: Optional[np.array] = None
+    confidence: Optional[np.array] = None
+
+
+class WatermarkDetector:
+
+    r"""
+    Detector for detection of watermark generated text. The detector needs to be given the exact same settings that were
+    given during text generation to replicate the watermark greenlist generation and so detect the watermark. This includes
+    the correct device that was used during text generation, the correct watermarking arguments and the correct tokenizer vocab size.
+    The code was based on the [original repo](https://github.com/jwkirchenbauer/lm-watermarking/tree/main).
+
+    See [the paper](https://arxiv.org/abs/2306.04634) for more information.
+
+    Args:
+        model_config (`PretrainedConfig`):
+            The model config that will be used to get model specific arguments used when generating.
+        device (`str`):
+            The device which was used during watermarked text generation.
+        watermarking_config (Union[`WatermarkingConfig`, `Dict`]):
+            The exact same watermarking config and arguments used when generating text.
+        ignore_repeated_ngrams (`bool`, *optional*, defaults to `False`):
+            Whether to count every unique ngram only once or not.
+        max_cache_size (`int`, *optional*, defaults to 128):
+            The max size to be used for LRU caching of seeding/sampling algorithms called for every token.
+
+    Examples:
+
+    ```python
+    >>> from transformers import AutoTokenizer, AutoModelForCausalLM, WatermarkDetector, WatermarkingConfig
+
+    >>> model_id = "openai-community/gpt2"
+    >>> model = AutoModelForCausalLM.from_pretrained(model_id)
+    >>> tok = AutoTokenizer.from_pretrained(model_id)
+    >>> tok.pad_token_id = tok.eos_token_id
+    >>> tok.padding_side = "left"
+
+    >>> inputs = tok(["This is the beginning of a long story", "Alice and Bob are"], padding=True, return_tensors="pt")
+    >>> input_len = inputs["input_ids"].shape[-1]
+
+    >>> # first generate text with watermark and without
+    >>> watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
+    >>> out_watermarked = model.generate(**inputs, watermarking_config=watermarking_config, do_sample=False, max_length=20)
+    >>> out = model.generate(**inputs, do_sample=False, max_length=20)
+
+    >>> # now we can instantiate the detector and check the generated text
+    >>> detector = WatermarkDetector(model_config=model.config, device="cpu", watermarking_config=watermarking_config)
+    >>> detection_out_watermarked = detector(out_watermarked, return_dict=True)
+    >>> detection_out = detector(out, return_dict=True)
+    >>> detection_out_watermarked.prediction
+    array([ True,  True])
+
+    >>> detection_out.prediction
+    array([False,  False])
+    ```
+    """
+
+    def __init__(
+        self,
+        model_config: PretrainedConfig,
+        device: str,
+        watermarking_config: Union[WatermarkingConfig, Dict],
+        ignore_repeated_ngrams: bool = False,
+        max_cache_size: int = 128,
+    ):
+        if isinstance(watermarking_config, WatermarkingConfig):
+            watermarking_config = watermarking_config.to_dict()
+
+        self.bos_token_id = (
+            model_config.bos_token_id if not model_config.is_encoder_decoder else model_config.decoder_start_token_id
+        )
+        self.greenlist_ratio = watermarking_config["greenlist_ratio"]
+        self.ignore_repeated_ngrams = ignore_repeated_ngrams
+        self.processor = WatermarkLogitsProcessor(
+            vocab_size=model_config.vocab_size, device=device, **watermarking_config
+        )
+
+        # Expensive re-seeding and sampling is cached.
+        self._get_ngram_score_cached = lru_cache(maxsize=max_cache_size)(self._get_ngram_score)
+
+    def _get_ngram_score(self, prefix: torch.LongTensor, target: int):
+        greenlist_ids = self.processor._get_greenlist_ids(prefix)
+        return target in greenlist_ids
+
+    def _score_ngrams_in_passage(self, input_ids: torch.LongTensor):
+        batch_size, seq_length = input_ids.shape
+        selfhash = int(self.processor.seeding_scheme == "selfhash")
+        n = self.processor.context_width + 1 - selfhash
+        indices = torch.arange(n).unsqueeze(0) + torch.arange(seq_length - n + 1).unsqueeze(1)
+        ngram_tensors = input_ids[:, indices]
+
+        num_tokens_scored_batch = np.zeros(batch_size)
+        green_token_count_batch = np.zeros(batch_size)
+        for batch_idx in range(ngram_tensors.shape[0]):
+            frequencies_table = collections.Counter(ngram_tensors[batch_idx])
+            ngram_to_watermark_lookup = {}
+            for ngram_example in frequencies_table.keys():
+                prefix = ngram_example if selfhash else ngram_example[:-1]
+                target = ngram_example[-1]
+                ngram_to_watermark_lookup[ngram_example] = self._get_ngram_score_cached(prefix, target)
+
+            if self.ignore_repeated_ngrams:
+                # counts a green/red hit once per unique ngram.
+                # num total tokens scored becomes the number unique ngrams.
+                num_tokens_scored_batch[batch_idx] = len(frequencies_table.keys())
+                green_token_count_batch[batch_idx] = sum(ngram_to_watermark_lookup.values())
+            else:
+                num_tokens_scored_batch[batch_idx] = sum(frequencies_table.values())
+                green_token_count_batch[batch_idx] = sum(
+                    freq * outcome
+                    for freq, outcome in zip(frequencies_table.values(), ngram_to_watermark_lookup.values())
+                )
+        return num_tokens_scored_batch, green_token_count_batch
+
+    def _compute_z_score(self, green_token_count: np.array, total_num_tokens: np.array) -> np.array:
+        expected_count = self.greenlist_ratio
+        numer = green_token_count - expected_count * total_num_tokens
+        denom = np.sqrt(total_num_tokens * expected_count * (1 - expected_count))
+        z = numer / denom
+        return z
+
+    def _compute_pval(self, x, loc=0, scale=1):
+        z = (x - loc) / scale
+        return 1 - (0.5 * (1 + np.sign(z) * (1 - np.exp(-2 * z**2 / np.pi))))
+
+    def __call__(
+        self,
+        input_ids: torch.LongTensor,
+        z_threshold: float = 3.0,
+        return_dict: bool = False,
+    ) -> Union[WatermarkDetectorOutput, np.array]:
+        """
+                Args:
+                input_ids (`torch.LongTensor`):
+                    The watermark generated text. It is advised to remove the prompt, which can affect the detection.
+                z_threshold (`Dict`, *optional*, defaults to `3.0`):
+                    Changing this threshold will change the sensitivity of the detector. Higher z threshold gives less
+                    sensitivity and vice versa for lower z threshold.
+                return_dict (`bool`,  *optional*, defaults to `False`):
+                    Whether to return `~generation.WatermarkDetectorOutput` or not. If not it will return boolean predictions,
+        ma
+                Return:
+                    [`~generation.WatermarkDetectorOutput`] or `np.array`: A [`~generation.WatermarkDetectorOutput`]
+                    if `return_dict=True` otherwise a `np.array`.
+
+        """
+
+        # Let's assume that if one batch start with `bos`, all batched also do
+        if input_ids[0, 0] == self.bos_token_id:
+            input_ids = input_ids[:, 1:]
+
+        if input_ids.shape[-1] - self.processor.context_width < 1:
+            raise ValueError(
+                f"Must have at least `1` token to score after the first "
+                f"min_prefix_len={self.processor.context_width} tokens required by the seeding scheme."
+            )
+
+        num_tokens_scored, green_token_count = self._score_ngrams_in_passage(input_ids)
+        z_score = self._compute_z_score(green_token_count, num_tokens_scored)
+        prediction = z_score > z_threshold
+
+        if return_dict:
+            p_value = self._compute_pval(z_score)
+            confidence = 1 - p_value
+
+            return WatermarkDetectorOutput(
+                num_tokens_scored=num_tokens_scored,
+                num_green_tokens=green_token_count,
+                green_fraction=green_token_count / num_tokens_scored,
+                z_score=z_score,
+                p_value=p_value,
+                prediction=prediction,
+                confidence=confidence,
+            )
+        return prediction
--- a/src/transformers/utils/dummy_pt_objects.py
+++ b/src/transformers/utils/dummy_pt_objects.py
@@ -422,6 +422,20 @@ class UnbatchedClassifierFreeGuidanceLogitsProcessor(metaclass=DummyObject):
        requires_backends(self, ["torch"])


+class WatermarkDetector(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class WatermarkLogitsProcessor(metaclass=DummyObject):
+    _backends = ["torch"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
 class WhisperTimeStampLogitsProcessor(metaclass=DummyObject):
    _backends = ["torch"]

--- a/tests/generation/test_logits_process.py
+++ b/tests/generation/test_logits_process.py
@@ -53,6 +53,7 @@ if is_torch_available():
        TopPLogitsWarper,
        TypicalLogitsWarper,
        UnbatchedClassifierFreeGuidanceLogitsProcessor,
+        WatermarkLogitsProcessor,
    )
    from transformers.generation.logits_process import BarkEosPrioritizerLogitsProcessor

@@ -949,3 +950,26 @@ class LogitsProcessorTest(unittest.TestCase):
            [float("-inf"), float("-inf"), scores[0][0], scores[0][0]],
        ]
        self.assertListEqual(actual_scores.tolist(), expected_scores_list)
+
+    def test_watermarking_processor(self):
+        batch_size = 3
+        vocab_size = 20
+
+        input_ids = ids_tensor((batch_size, 5), vocab_size=20)
+        scores = self._get_uniform_logits(batch_size, vocab_size)
+
+        # raise error if incorrect seeding_scheme is passed
+        with self.assertRaises(ValueError):
+            WatermarkLogitsProcessor(vocab_size=vocab_size, device="cpu", seeding_scheme="hash")
+
+        # raise error if the greenlist_ratio in not in range (0.0, 1.0)
+        with self.assertRaises(ValueError):
+            WatermarkLogitsProcessor(vocab_size=vocab_size, device="cpu", greenlist_ratio=1.2)
+
+        watermark = WatermarkLogitsProcessor(vocab_size=vocab_size, device=input_ids.device)
+
+        # use fixed id for last token, needed for reprodicibility and tests
+        input_ids[:, -1] = 10
+        scores_wo_bias = scores[:, -1].clone()
+        out = watermark(input_ids=input_ids, scores=scores)
+        self.assertTrue((out[:, 1] == scores_wo_bias + watermark.bias).all())
--- a/tests/generation/test_utils.py
+++ b/tests/generation/test_utils.py
@@ -75,6 +75,8 @@ if is_torch_available():
        SampleEncoderDecoderOutput,
        StoppingCriteria,
        StoppingCriteriaList,
+        WatermarkDetector,
+        WatermarkingConfig,
    )
    from transformers.generation.utils import _speculative_sampling

@@ -2098,6 +2100,44 @@ class GenerationIntegrationTests(unittest.TestCase, GenerationIntegrationTestsMi
        )
        self.assertListEqual(low_output.tolist(), high_output.tolist())

+    @slow
+    def test_watermark_generation(self):
+        tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
+        model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2").to(torch_device)
+        tokenizer.pad_token_id = tokenizer.eos_token_id
+        model_inputs = tokenizer("I will be", return_tensors="pt").to(torch_device)
+        input_len = model_inputs["input_ids"].shape[-1]
+
+        # generation should work with both input types: WatermarkingConfig or Dict, so let's check it here :)
+        watermark_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
+        _ = model.generate(**model_inputs, watermarking_config=watermark_config, do_sample=False, max_length=15)
+
+        args = {
+            "bias": 2.0,
+            "context_width": 1,
+            "seeding_scheme": "selfhash",
+            "greenlist_ratio": 0.25,
+            "hashing_key": 15485863,
+        }
+        output = model.generate(**model_inputs, do_sample=False, max_length=15)
+        output_selfhash = model.generate(**model_inputs, watermarking_config=args, do_sample=False, max_length=15)
+
+        # check that the watermarked text is generating what is should
+        self.assertListEqual(
+            output.tolist(), [[40, 481, 307, 262, 717, 284, 9159, 326, 314, 716, 407, 257, 4336, 286, 262]]
+        )
+        self.assertListEqual(
+            output_selfhash.tolist(), [[40, 481, 307, 2263, 616, 640, 284, 651, 616, 1621, 503, 612, 553, 531, 367]]
+        )
+
+        detector = WatermarkDetector(model_config=model.config, device=torch_device, watermarking_config=args)
+        detection_out_watermarked = detector(output_selfhash[:, input_len:], return_dict=True)
+        detection_out = detector(output[:, input_len:], return_dict=True)
+
+        # check that the detector is detecting watermarked text
+        self.assertListEqual(detection_out_watermarked.prediction.tolist(), [True])
+        self.assertListEqual(detection_out.prediction.tolist(), [False])
+
    @slow
    def test_beam_search_example_integration(self):
        # PT-only test: TF doesn't have a BeamSearchScorer