Add SynthID (watermarking by Google DeepMind) (#34350)

* Add SynthIDTextWatermarkLogitsProcessor

* Resolving comments.

* Resolving comments.

* Resolving commits,

* Improving SynthIDWatermark tests.

* switch to PT version

* detector as pretrained model + style

* update training + style

* rebase

* Update logits_process.py

* Improving SynthIDWatermark tests.

* Shift detector training to wikitext negatives and stabilize with lower learning rate.

* Clean up.

* in for 7B

* cleanup

* Support python 3.8.

* README and final cleanup.

* HF Hub upload and initialize.

* Update requirements for synthid_text.

* Adding SynthIDTextWatermarkDetector.

* Detector testing.

* Documentation changes.

* Copyrights fix.

* Fix detector api.

* ironing out errors

* ironing out errors

* training checks

* make fixup and make fix-copies

* docstrings and add to docs

* copyright

* BC

* test docstrings

* move import

* protect type hints

* top level imports

* watermarking example

* direct imports

* tpr fpr meaning

* process_kwargs

* SynthIDTextWatermarkingConfig docstring

* assert -> exception

* example updates

* no immutable dict (can't be serialized)

* pack fn

* einsum equivalent

* import order

* fix test on gpu

* add detector example

---------

Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Joao Gante
2024-10-23 21:18:52 +01:00
committed by GitHub
parent e50bf61dec
commit b0f0c61899
15 changed files with 2238 additions and 80 deletions


@@ -185,6 +185,9 @@ generation.
[[autodoc]] SuppressTokensLogitsProcessor
- __call__
[[autodoc]] SynthIDTextWatermarkLogitsProcessor
- __call__
[[autodoc]] TemperatureLogitsWarper
- __call__
@@ -418,5 +421,20 @@ A [`Constraint`] can be used to force the generation to include specific tokens
## Watermark Utils
[[autodoc]] WatermarkingConfig
- __call__
[[autodoc]] WatermarkDetector
- __call__
[[autodoc]] BayesianDetectorConfig
- __call__
[[autodoc]] BayesianDetectorModel
- __call__
[[autodoc]] SynthIDTextWatermarkingConfig
- __call__
[[autodoc]] SynthIDTextWatermarkDetector
- __call__


@@ -41,8 +41,6 @@ like token streaming.
- validate
- get_generation_mode
[[autodoc]] generation.WatermarkingConfig
## GenerationMixin
[[autodoc]] GenerationMixin