Post-processing script
24
README.md
@@ -368,6 +368,30 @@ It takes 4.676 seconds to align all the sentences.
---------------------------------
```

### Post-processing
Post-processing means manually correcting the misalignments produced by automatic aligners. The human-validated sentence pairs can then be loaded into translation memory software (e.g. [OmegaT](https://omegat.org/)) or a bilingual concordancer (e.g. [Paraconc](https://paraconc.com/)), where translators can search for corresponding translation units to improve translation quality and researchers can carry out corpus-based translation studies.

Bertalign can export its automatic alignments in two formats for manual correction: TSV for [LF Aligner](https://sourceforge.net/projects/aligner/) and XML for [Intertext](https://wanthalf.saga.cz/intertext). For example, running the following commands saves the converted output in [tsv](./data/mac/dev/auto/tsv) or [intertext](./data/mac/dev/data/intertext), where it can be opened and edited in LF Aligner or Intertext.
```
# NOTE: the script path that should follow `python` is missing from both commands.

# Convert automatic alignments to TSV for LF Aligner
python -p mac-dev \
  -s data/mac/dev/zh zh \
  -t data/mac/dev/en en \
  -a data/mac/dev/auto \
  -f tsv

# Convert automatic alignments to XML for Intertext
python -p mac-dev \
  -s data/mac/dev/zh zh \
  -t data/mac/dev/en en \
  -a data/mac/dev/auto \
  -f intertext
```
## TODO List
Evaluate Bertalign on datasets containing language pairs other than Chinese and English.