HuggingFace_transformer/tests
Nicolas Patry d8fc26e919 NerPipeline (TokenClassification) now outputs offsets of words (#8781)
* NerPipeline (TokenClassification) now outputs offsets of words

- Currently the offsets are missing, which forces the user to pattern
match the "word" against their input, and that is not always feasible.
For instance, if a sentence contains the same word twice, there
is no way to know which is which.
- This PR proposes to fix that by adding 2 new keys to the pipeline's
outputs, "start" and "end", which correspond to the character
offsets of the word in the input string. That means the following
invariant should always hold:

```python
input[entity["start"]:entity["end"]] == entity["word"]
# "entity_group" (grouped) or "entity" (not grouped) holds the label, not the matched text
```
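A minimal sketch of why the offsets matter, assuming the matched text is stored under a "word" key next to "start" and "end". The entity dicts below are hand-written example data shaped like grouped pipeline output, not real model predictions, and the helper names `naive_start` and `offsets_consistent` are hypothetical.

```python
from typing import Dict, List

# Hand-written example entities (hypothetical values, not model output),
# shaped like grouped output with "entity_group", "word", "start", "end".
text = "Paris Hilton flew to Paris."
entities: List[Dict] = [
    {"entity_group": "PER", "word": "Paris Hilton", "start": 0, "end": 12},
    {"entity_group": "LOC", "word": "Paris", "start": 21, "end": 26},
]


def naive_start(text: str, word: str) -> int:
    """Recover an offset by searching for the word: only ever finds the FIRST match."""
    return text.find(word)


def offsets_consistent(text: str, entities: List[Dict]) -> bool:
    """Check the invariant text[start:end] == word for every entity."""
    return all(text[e["start"]:e["end"]] == e["word"] for e in entities)


print(offsets_consistent(text, entities))  # True
print(naive_start(text, "Paris"))          # 0, not the city's offset of 21
```

Searching for "Paris" lands on the person's name, so pattern matching cannot tell the two occurrences apart; the explicit "start"/"end" keys remove that ambiguity.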

* Fixing doc style
2020-11-30 14:05:08 -05:00