Document alignment software and web crawler
Free trial
Write to sales@terminotix.com to ask for your free trial version of AlignFactory Desktop.
AlignFactory Desktop is a document alignment solution that saves significant time when feeding a computer-assisted translation (CAT) tool, a neural machine translation engine or a bilingual concordancer.
AlignFactory offers two distinct methods for pairing source and target documents: by user-defined language markers in the file names or by language detection and electronic fingerprinting algorithms that analyze file type and content. When a valid file pair is detected, AlignFactory performs the alignment and creates a bitext in the chosen format.
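To make the fingerprinting idea more concrete, here is a minimal Python sketch of one possible approach. It is not AlignFactory's actual algorithm, which is not described here. It assumes the files have already been converted to plain text and separated into source and target languages (the language-detection step is omitted), and it fingerprints each file by its extension plus the sequence of digit runs it contains, since numbers usually survive translation unchanged. The function names and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher
from pathlib import PurePath

DIGITS_RE = re.compile(r"\d+")


def fingerprint(name, text):
    """Return (file extension, list of digit runs) as a rough content signature.
    Digit runs survive translation mostly unchanged: "1,500" and "1 500"
    both yield ["1", "500"]."""
    return PurePath(name).suffix.lower(), DIGITS_RE.findall(text)


def similarity(a, b):
    """Score two fingerprints from 0.0 to 1.0; different file types never match."""
    if a[0] != b[0] or not a[1] or not b[1]:
        return 0.0  # other file type, or no digits to compare
    return SequenceMatcher(None, a[1], b[1]).ratio()


def pair_by_fingerprint(sources, targets, threshold=0.8):
    """Greedily pair each source file with its most similar unused target file.
    sources and targets map file names to their extracted plain text."""
    target_prints = {name: fingerprint(name, text) for name, text in targets.items()}
    pairs, used = [], set()
    for src_name, src_text in sources.items():
        src_print = fingerprint(src_name, src_text)
        best_name, best_score = None, threshold
        for tgt_name, tgt_print in target_prints.items():
            score = similarity(src_print, tgt_print)
            if tgt_name not in used and score >= best_score:
                best_name, best_score = tgt_name, score
        if best_name is not None:
            used.add(best_name)
            pairs.append((src_name, best_name))
    return pairs
```

A greedy match keeps the sketch short; a real tool would likely combine several signals (structure, length, detected language) rather than digits alone.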
AlignFactory provides many configuration options that help the software correctly identify file pairs for alignment, even when the paired files have differently structured names. File matching criteria include:
- Language markers
- Filtering by file extension
- Exclusion strings for removing files with names containing certain character strings
- Ignore strings for ignoring certain character strings in file names
- Ignore characters after last marker
- Match files in different folders
- Allow one file name with no language marker
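The marker-based criteria above can be approximated in a few lines of Python. This is a hypothetical sketch, not AlignFactory's implementation: the parameter names (`exclude`, `ignore`, `ignore_after_marker`, `allow_unmarked_target`) are invented for illustration, and it assumes language markers such as `_EN` and `_FR` appear in the file stem.

```python
from pathlib import PurePath


def match_key(name, marker, ignore=(), ignore_after_marker=False, allow_unmarked=False):
    """Return the key used to match files: the file stem without its language
    marker and ignored strings, or None if the marker is missing."""
    stem = PurePath(name).stem
    pos = stem.rfind(marker)
    if pos != -1:
        # Remove the marker, and optionally every character after it.
        tail = "" if ignore_after_marker else stem[pos + len(marker):]
        stem = stem[:pos] + tail
    elif not allow_unmarked:
        return None
    for s in ignore:
        stem = stem.replace(s, "")  # ignore strings
    return stem.lower()


def pair_by_markers(files, src_marker, tgt_marker, extensions=(".docx",), exclude=(),
                    ignore=(), ignore_after_marker=False, allow_unmarked_target=False):
    """Pair source and target files whose names match once markers are removed.
    Only base names are compared, so paired files may sit in different folders."""
    sources, targets = {}, {}
    for f in files:
        path = PurePath(f)
        if path.suffix.lower() not in extensions:
            continue  # filtering by file extension
        if any(x in path.name for x in exclude):
            continue  # exclusion strings
        key = match_key(f, src_marker, ignore, ignore_after_marker)
        if key is not None:
            sources[key] = f
            continue
        key = match_key(f, tgt_marker, ignore, ignore_after_marker, allow_unmarked_target)
        if key is not None:
            targets[key] = f
    return [(sources[k], targets[k]) for k in sorted(sources) if k in targets]
```

For example, with `exclude=("draft",)`, `ignore_after_marker=True` and `allow_unmarked_target=True`, the files `en/budget_EN_v2.docx` and `fr/budget_FR_final.docx` pair up, as do `en/memo_EN.docx` and the unmarked `fr/memo.docx`, even though they sit in different folders.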
LogiTerm bitexts
AlignFactory can create LogiTerm bitexts in XML or HTML format. If you use LogiTerm, we recommend creating XML bitexts, as they are more visually appealing and do not display segment language codes when viewed in a web browser. What's more, XML bitexts contain source document metadata and are slightly faster to index than HTML bitexts.
HTML bitexts, however, can be opened in any application with no display issues. They can also be easily indexed by any full-text search engine.
TMX files
You can also create TMX files to import into any translation memory. The file creation options are as follows:
- Create one TMX file for each pair, or merge into a single file
- Insert source document name into each segment
- Automatically add attributes (project, client, domain, etc.) to segments
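For readers unfamiliar with TMX, the Python sketch below shows what these options amount to in a TMX 1.4 file, using only the standard library. It is an illustration, not AlignFactory's actual output: the tool name in the header and the `x-document` and `x-client` property names are assumptions (TMX expects user-defined property types to start with `x-`).

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(documents, src_lang, tgt_lang, attributes=None):
    """Build a TMX 1.4 tree.

    documents: list of (document_name, [(source_segment, target_segment), ...]).
    Pass several documents to merge them into one file, or call once per
    document to get one TMX file per pair. attributes (e.g. {"client": "ACME"})
    become <prop> elements on every translation unit.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for doc_name, segments in documents:
        # Source document name plus user attributes; these prop names are assumptions.
        props = {"x-document": doc_name}
        props.update({f"x-{k}": v for k, v in (attributes or {}).items()})
        for src, tgt in segments:
            tu = ET.SubElement(body, "tu")
            for prop_type, value in props.items():
                ET.SubElement(tu, "prop", type=prop_type).text = value
            for lang, text in ((src_lang, src), (tgt_lang, tgt)):
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

To save the result, call `tree.write("alignment.tmx", encoding="utf-8", xml_declaration=True)` on the returned tree.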
Alignment Editor
AlignFactory has an integrated alignment editing tool that lets you make changes to LogiTerm bitexts and view and edit TMX files before importing them into your translation memory.
Segment filtering
More than 18 filtering options let you automatically delete unwanted segments from your alignments. The primary filters are as follows:
- Reject if both sides are the same
- Reject if segment contains no letters
- Reject duplicate segments
- Reject if too few words in segment
- Reject if one side is significantly longer
- Reject if too many sentences in segment
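The primary filters can be expressed as simple tests on each source/target pair. The Python sketch below is hypothetical: the thresholds (at least 3 words, a length ratio of at most 2.0, at most 2 sentences) and the naive punctuation-based sentence count are assumptions, not AlignFactory's defaults.

```python
import re

LETTER_RE = re.compile(r"[^\W\d_]")              # any Unicode letter
SENTENCE_END_RE = re.compile(r"[.!?]+(?:\s|$)")  # naive sentence boundary


def reject_reason(src, tgt, seen, min_words=3, max_ratio=2.0, max_sentences=2):
    """Return why a (source, target) pair should be rejected, or None to keep it.
    `seen` holds the pairs kept so far, for duplicate detection."""
    if src.strip() == tgt.strip():
        return "same on both sides"
    if not LETTER_RE.search(src) or not LETTER_RE.search(tgt):
        return "no letters"
    if (src, tgt) in seen:
        return "duplicate"
    if min(len(src.split()), len(tgt.split())) < min_words:
        return "too few words"
    if max(len(src), len(tgt)) > max_ratio * min(len(src), len(tgt)):
        return "one side much longer"
    if max(len(SENTENCE_END_RE.findall(src)), len(SENTENCE_END_RE.findall(tgt))) > max_sentences:
        return "too many sentences"
    seen.add((src, tgt))
    return None


def filter_segments(pairs, **thresholds):
    """Keep only the pairs that pass every filter, in their original order."""
    seen = set()
    return [(s, t) for s, t in pairs if reject_reason(s, t, seen, **thresholds) is None]
```

Returning a reason rather than a plain boolean makes it easy to log why each segment was dropped.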
Web Crawler
The Web Crawler tool lets you automatically download pages or files from a website. Once the download is complete, simply create and launch an alignment project to automatically align all the downloaded content. Then, import the alignments into a computer-assisted translation tool.
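Conceptually, a site crawler is a breadth-first traversal of the links reachable from a start page. The Python sketch below illustrates that idea with the standard library only; it is not the Web Crawler's implementation. The `fetch` callback is a hypothetical hook (in practice it might wrap `urllib.request.urlopen` and return None on errors), and a production crawler would also need to respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href and src attribute values from every start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of pages reachable from start_url on the same host.
    fetch(url) returns the page's HTML, or None if it cannot be downloaded.
    Returns a dict mapping each downloaded URL to its HTML."""
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal logic testable offline, for instance with a dictionary of fake pages and `site.get` as the fetcher.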
The Web Crawler contains the following domain filters:
- Ignore top-level domain (TLD)
- Allow two-letter domain
- Do not truncate URL prefix
You can also filter downloaded files by extension or file name character string.
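The sketch below shows how such filters could be applied to each discovered URL before downloading it. It is hypothetical: in particular, reading "Ignore top-level domain" as "treat www.example.com and www.example.ca as the same site" is an assumption, and the two-letter-domain and URL-prefix options are not covered.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def site_key(url, ignore_tld=False):
    """Host name used to decide whether a URL belongs to the crawled site.
    ignore_tld=True (an assumed reading of "Ignore top-level domain") drops the
    last label, so www.example.com and www.example.ca count as the same site."""
    host = (urlparse(url).hostname or "").lower()
    if ignore_tld and "." in host:
        host = host.rsplit(".", 1)[0]
    return host


def keep_download(url, start_url, extensions=(), name_contains=(), ignore_tld=False):
    """Return True if url is on the crawled site and passes the file filters.
    Empty extensions or name_contains tuples mean "no filter"."""
    if site_key(url, ignore_tld) != site_key(start_url, ignore_tld):
        return False
    name = PurePosixPath(urlparse(url).path).name
    if extensions and PurePosixPath(name).suffix.lower() not in extensions:
        return False  # filter by extension
    if name_contains and not any(s in name for s in name_contains):
        return False  # filter by file name character string
    return True
```

For example, `keep_download("https://www.example.ca/docs/report_FR.pdf", "https://www.example.com/", extensions=(".pdf",), ignore_tld=True)` accepts the French PDF even though it is served from a different top-level domain.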