Update README.md

bfsujason
2022-04-17 00:19:57 +08:00
committed by GitHub
parent 60521ee4d0
commit e5a160c109

@@ -6,25 +6,25 @@ Bertalign is designed to facilitate the construction of multilingual parallel co
---
##### Approach
#### Approach
Bertalign uses [sentence-transformers](https://github.com/UKPLab/sentence-transformers) to embed source and target sentences so that semantically similar sentences in different languages are mapped to nearby points in a shared vector space. A two-step algorithm based on dynamic programming is then applied: 1) Step 1 finds 1-1 alignments that serve as approximate anchor points; 2) Step 2 restricts the search path to the neighborhood of those anchor points and extracts all valid alignments, including 1-many, many-1 and many-many relations between the source and target sentences.
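
The idea behind Step 1 can be illustrated with a small, self-contained sketch. This is not Bertalign's actual implementation: the function names `cosine` and `anchor_alignments` are hypothetical, the toy vectors stand in for real sentence embeddings, and the dynamic program shown here is a plain monotonic 1-1 matcher that maximizes total cosine similarity. It only shows how dynamic programming can pick order-preserving anchor points from a similarity matrix.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def anchor_alignments(src_vecs, tgt_vecs):
    """Return monotonic 1-1 (src_index, tgt_index) pairs with maximal total similarity.

    Hypothetical helper for illustration only; sentences may be left unaligned.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    # score[i][j]: best total similarity using the first i source
    # and the first j target sentences.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j],                          # skip a source sentence
                score[i][j - 1],                          # skip a target sentence
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align the pair 1-1
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    anchors = []
    i, j = n, m
    while i > 0 and j > 0:
        pair_sim = sim[i - 1][j - 1]
        if pair_sim > 0 and score[i][j] == score[i - 1][j - 1] + pair_sim:
            anchors.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    anchors.reverse()
    return anchors


# Toy "embeddings": three source sentences, two target sentences.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0]]
print(anchor_alignments(src, tgt))
```

In this toy input the second source sentence has no close counterpart, so it is left unaligned. A second pass, as described in Step 2, would then search only near the anchors for 1-many, many-1 and many-many groupings.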
##### Performance
#### Performance
According to our experiments, Bertalign achieves more accurate results on [Text+Berg](./text+berg), a publicly available German-French parallel corpus, than the traditional length-, dictionary-, or MT-based alignment methods as reported in [Thompson & Koehn (2019)](https://aclanthology.org/D19-1136/).
##### Languages Supported
#### Languages Supported
Alignment between 25 languages: Catalan (ca), Chinese (zh), Czech (cs), Danish (da), Dutch (nl), English (en), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Icelandic (is), Italian (it), Lithuanian (lt), Latvian (lv), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), and Turkish (tr).
---
##### Installation
#### Installation
Please see [requirements.txt](./requirements.txt) for installation information. If you are running Bertalign on *GPU-enabled Linux* such as Google Colaboratory, please install *faiss-gpu* for faster processing.
##### Basic example
#### Basic example
Import *Bertalign* and initialize it with the source and target texts; the initializer detects the source and target languages automatically and splits both texts into sentences. Then call *align_sents()* to align the sentences and *print_sents()* to print the result.
@@ -130,7 +130,7 @@ aligner.print_sents()
---
##### Example with more options
#### Example with more options
The following example shows how to use Bertalign to align the Text+Berg corpus, and evaluate its performance with gold standard alignments. The evaluation script [eval.py](./bertalign/eval.py) is based on [Vecalign](https://github.com/thompsonb/vecalign).
@@ -236,13 +236,13 @@ log_final_scores(scores)
---
##### Licence
#### Licence
Bertalign is released under the [GNU General Public License v3.0](./LICENCE).
##### Credits
#### Credits
###### Main Libraries
##### Main Libraries
* [sentence-transformers](https://github.com/UKPLab/sentence-transformers)
@@ -250,7 +250,7 @@ Bertalign is released under the [GNU General Public License v3.0](./LICENCE)
* [sentence-splitter](https://github.com/mediacloud/sentence-splitter)
###### Other Sentence Aligners
##### Other Sentence Aligners
* [Hunalign](http://mokk.bme.hu/en/resources/hunalign/)
@@ -258,7 +258,7 @@ Bertalign is released under the [GNU General Public License v3.0](./LICENCE)
* [Vecalign](https://github.com/thompsonb/vecalign)
##### Todo List
#### Todo List
- Try the [CNN model](https://tfhub.dev/google/universal-sentence-encoder-multilingual/3) for sentence embeddings
* Develop a GUI for Windows users