Post-processing script

2021-11-30 20:39:03 +08:00
parent 6c5f824686
commit 2ef398bce5
26 changed files with 6224 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -368,6 +368,30 @@ It takes 4.676 seconds to align all the sentences.
 ---------------------------------
 ```

+### Post-processing
+
+Post-processing means manually correcting the misalignments produced by automatic aligners. The human-validated sentence pairs can then be loaded into translation memory software (e.g. [OmegaT](https://omegat.org/)) or a bilingual concordancer (e.g. [Paraconc](https://paraconc.com/)), so that translators can search for corresponding translation units and improve translation quality, and researchers can carry out corpus-based translation studies.
+
+Bertalign can export its automatic alignments in two formats for manual correction: TSV for [LF Aligner](https://sourceforge.net/projects/aligner/) and XML for [Intertext](https://wanthalf.saga.cz/intertext). For example, running the following commands saves the converted output in [tsv](./data/mac/dev/auto/tsv) or [intertext](./data/mac/dev/data/intertext), which can then be opened and edited in LF Aligner or Intertext, respectively.
+
+```
+# Convert automatic alignments to TSV for LF Aligner
+python -p mac-dev \
+    -s data/mac/dev/zh zh \
+    -t data/mac/dev/en en \
+    -a data/mac/dev/auto \
+    -f tsv
+
+# Convert automatic alignments to XML for Intertext
+python -p mac-dev \
+    -s data/mac/dev/zh zh \
+    -t data/mac/dev/en en \
+    -a data/mac/dev/auto \
+    -f intertext
+```
+
+
+
 ## TODO List

 Evaluate Bertalign on datasets containing language pairs other than Chinese and English.