From a775fce71920324b8e17da5db5c6258b9bf4ae0d Mon Sep 17 00:00:00 2001
From: nlpfun
Date: Sun, 28 Nov 2021 14:24:10 +0800
Subject: [PATCH] Bertalign 0.03 README

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c8cf500..fc93ece 100644
--- a/README.md
+++ b/README.md
@@ -46,7 +46,7 @@ In order to compare the sentence-based alignments returned by various aligners w
 All the experiments are conducted on [Google Colab](https://colab.research.google.com/).
 
 ### Sentence Embeddings
-We use the Python script [embed_sents.py](./bin/embed_sents.py) to create the sentence embedddings for the MAC-Dev and the Bible corpus:
+We use the Python script [embed_sents.py](./bin/embed_sents.py) to create the sentence embeddings for the MAC-Dev and the Bible corpus. This script is based on [Vecalign](https://github.com/thompsonb), developed by Brian Thompson.
 ```
 # Embedding MAC-Dev Chinese
 !python bin/embed_sents.py \
@@ -76,7 +76,7 @@ We use the Python script [embed_sents.py](./bin/embed_sents.py) to create the se
   -m data/bible/meta_data.tsv \
   -n 5
 ```
-The parameter *-n* indicates the maximum number of overlapping sentences allowed on the source and target side, which is similar to word *n*-grams applied to sentences. After running the script, the overlapping sentences in the source and target texts and their embeddings are saved in the file *overlap* and *overlap.emb* respectively.
+The parameter *-n* indicates the maximum number of overlapping sentences allowed on the source and target side, which is similar to word *n*-grams applied to sentences. After running the script, the overlapping sentences and their embeddings for the source and target texts are saved as *overlap* and *overlap.emb* respectively in the directories [mac/dev/zh](./data/mac/dev/zh), [mac/dev/en](./data/mac/dev/en), [bible/zh](./data/bible/zh), and [bible/en](./data/bible/en).
 
 ### Evaluation on MAC-Dev
 
@@ -110,7 +110,7 @@ The parameter *-n* indicates the maximum number of overlapping sentences allowed
 Please refer to [Sennrich & Volk (2010)](https://aclanthology.org/people/r/rico-sennrich/) for the difference between Strict and Lax evaluation method.
 
 We can see that the F1 score is 0.91 when aligning MAC-Dev using Bertalign.
-Please note that aligning literary texts is not an easy task, since they contain more interpretive and free translations than non-literary works ([Xu et al. (2015)](https://aclanthology.org/2015.lilt-12.6/)). Let's see how the other systems perform on MAC-Dev:
+Please note that aligning literary texts is not an easy task, since they contain more interpretive and free translations than non-literary works. You can refer to [Xu et al. (2015)](https://aclanthology.org/2015.lilt-12.6/) for more details about sentence alignment of literary texts. Let's see how the other systems perform on MAC-Dev:
 
 #### Baseline Approaches