added wordpiece - updated readme

2018-10-30 23:09:09 +01:00
parent ccce66be27
commit 12e013dbac
4 changed files with 794 additions and 68 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,23 @@
 # pytorch-pretrained-BERT
-A PyTorch version of Google's pretrained BERT model
+A PyTorch version of Google's pretrained BERT model as described in
+
+No bells and whistles, just:
+- [one class](bert_model.py) with a clean, commented version of Google's BERT model that can load the weights pre-trained by Google's authors,
+- [another class](data_processor.py) with all you need to pre- and post-process text data for the model (tokenize and encode),
+- and [a script](download_weigths.sh) to download Google's pre-trained weights.
+
+Here is how to use these:
+
+```python
+from bert_model import BERT  # absolute import: relative imports fail when run as a script
+from data_processor import DataProcessor
+
+bert_model = BERT(bert_model_path='.')
+data_processor = DataProcessor(bert_vocab_path='.')
+
+input_sentence = "We are playing with the BERT model."
+
+tensor_input = data_processor.encode(input_sentence)
+tensor_output = bert_model(tensor_input)
+output_sentence = data_processor.decode(tensor_output)
+```