Automated document alignment software with web crawler
With AlignFactory, you can save a great deal of time by automating the entire document alignment process that feeds your computer-assisted translation tool. AlignFactory automatically scans your archives to locate source-target document pairs based on their file names. For the software to identify the languages in an alignment pair, the file names must contain a preset language marker. Once a valid file pair has been found, AlignFactory aligns the files and creates a LogiTerm bitext (XML or HTML) or a TMX file, depending on your preferences. AlignFactory can be synchronized with LogiTerm and configured to scan only the folders linked to your LogiTerm modules.
AlignFactory’s many configuration options help the software correctly identify file pairs for alignment, even when their file names are structured differently. File matching criteria include:
- Language markers,
- Filtering by file extension,
- Exclusion strings – for removing files with names containing certain character strings,
- Ignore strings – for ignoring certain character strings in file names,
- Ignore characters after last marker,
- Match files in different folders,
- Allow one file name with no language marker.
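To make the pairing idea concrete, here is a minimal Python sketch of how files might be matched by language marker. It is not AlignFactory’s implementation: the marker strings `_EN` and `_FR`, the functions `find_pairs` and `pair_key`, and their option names are hypothetical. The sketch applies an extension filter and exclusion strings, strips the marker and any ignore strings from each file name, and pairs source and target files whose remaining names and extensions match. Because only the file name (not the folder) is compared, files in different folders can still be paired.

```python
from pathlib import PurePath


def pair_key(name, marker, ignore_strings=()):
    """Return a matching key for `name`, or None if `marker` is absent.

    Hypothetical rule: the key is the file name without its folder,
    language marker and ignore strings, plus the lower-cased extension.
    """
    path = PurePath(name)
    stem = path.stem
    idx = stem.rfind(marker)  # use the last occurrence of the marker
    if idx == -1:
        return None
    base = stem[:idx] + stem[idx + len(marker):]
    for s in ignore_strings:
        base = base.replace(s, "")  # ignore strings do not affect matching
    return base.lower(), path.suffix.lower()


def find_pairs(names, src_marker="_EN", tgt_marker="_FR",
               extensions=(".docx", ".pdf"), exclusions=(), ignore_strings=()):
    """Pair source and target files whose keys match (illustrative sketch)."""
    sources, targets = {}, {}
    for name in names:
        file_name = PurePath(name).name
        if PurePath(file_name).suffix.lower() not in extensions:
            continue  # extension filter
        if any(s in file_name for s in exclusions):
            continue  # exclusion strings drop the file entirely
        key = pair_key(name, src_marker, ignore_strings)
        if key is not None:
            sources[key] = name
            continue
        key = pair_key(name, tgt_marker, ignore_strings)
        if key is not None:
            targets[key] = name
    common = sources.keys() & targets.keys()
    return sorted((sources[k], targets[k]) for k in common)
```

For example, `find_pairs(["Report_v2_EN.docx", "Report_FR.docx"], ignore_strings=("_v2",))` pairs the two files, because removing `_v2` and the markers leaves the same base name. The sketch does not model the “ignore characters after last marker” or “one file name with no language marker” options.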
LogiTerm bitexts
AlignFactory can create LogiTerm bitexts in XML or HTML format. If you use LogiTerm, we recommend creating XML bitexts: they are more visually appealing and do not display segment language codes when viewed in a web browser. They also contain source document metadata and are slightly faster to index than HTML bitexts. HTML bitexts, however, can be opened with any application without display issues, and they can easily be indexed by any full-text search engine.
TMX files
AlignFactory can also generate TMX files for import into any translation memory. TMX file creation options are as follows:
- Create one TMX file for each pair, or merge into a single file,
- Insert name of source document in each segment,
- Automatically add attributes (project, client, domain, etc.) to segments.
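The TMX options above can be illustrated with a minimal Python sketch that writes a TMX 1.4 document using only the standard library. The function `build_tmx`, the placeholder `creationtool` value and the `x-document` and `x-client` property types are assumptions for illustration; by TMX convention, user-defined property types start with `x-`, but the names AlignFactory actually writes are not stated here. Each translation unit carries its source document name and any extra attributes as `<prop>` elements.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, src_lang, tgt_lang, attributes=None):
    """Build a TMX 1.4 document from (source, target, document_name) tuples.

    `attributes` maps hypothetical property types (e.g. "x-client") to
    values that are added to every translation unit.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt, doc in units:
        tu = ET.SubElement(body, "tu")
        props = dict(attributes or {})
        if doc:
            props["x-document"] = doc  # hypothetical property for the source file name
        for ptype, value in props.items():
            ET.SubElement(tu, "prop", type=ptype).text = value
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Passing the units from several document pairs to one call yields a single merged file; calling the function once per pair yields one file per pair. Because the document name is stored per unit, it survives merging.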
Alignment Editor
AlignFactory features an alignment editor tool that lets you make changes to LogiTerm bitexts. It also lets you view and edit TMX files before importing them into a translation memory.
Segment filtering
AlignFactory offers more than 18 filtering options that automatically remove unwanted segments from your alignments. Filters include:
- Reject if both sides are the same,
- Reject if segment contains no letters,
- Reject duplicate segments,
- Reject if too few words in segment,
- Reject if one side is significantly longer,
- Reject if too many sentences in segment.
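Each of the filters listed above is a simple rule applied to an aligned segment pair. Below is a minimal Python sketch of the six listed rules; the thresholds (`min_words`, `max_ratio`, `max_sentences`), the naive punctuation-based sentence counter, the reading of “no letters” as “neither side contains a letter” and the function names are all assumptions, not AlignFactory’s actual settings or logic.

```python
import re

# Naive sentence counter (assumption): a run of . ! or ? followed by
# whitespace or the end of the text ends a sentence.
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def sentence_count(text):
    return max(1, len(SENTENCE_END.findall(text)))


def reject_reason(src, tgt, seen, min_words=2, max_ratio=2.0, max_sentences=3):
    """Return why a segment pair is rejected, or None to keep it.

    All thresholds are illustrative defaults, not AlignFactory settings.
    """
    s, t = src.strip(), tgt.strip()
    if s == t:
        return "both sides are the same"
    if not any(ch.isalpha() for ch in s + t):
        return "no letters"  # interpreted here as: neither side has a letter
    if (s, t) in seen:
        return "duplicate"
    if min(len(s.split()), len(t.split())) < min_words:
        return "too few words"
    shorter, longer = sorted((len(s), len(t)))
    if longer > max_ratio * shorter:  # compares character lengths
        return "one side much longer"
    if max(sentence_count(s), sentence_count(t)) > max_sentences:
        return "too many sentences"
    seen.add((s, t))  # remember kept pairs to catch later duplicates
    return None


def filter_segments(pairs, **options):
    """Keep only the pairs that pass every rule."""
    seen = set()
    return [(s, t) for s, t in pairs
            if reject_reason(s, t, seen, **options) is None]
```

Returning a reason string rather than a plain boolean makes it easy to report why each segment was dropped.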
Web Crawler
The Web Crawler tool can be used with the AlignFactory alignment engine to import an entire multilingual website into your translation memory. The Web Crawler automatically downloads pages and files from your chosen website. Once the download is complete, simply create an alignment project to automatically align all the downloaded pages and files. Then all that’s left to do is import the alignments into your computer-assisted translation tool.
The Web Crawler includes filters to help you select the types of pages and files to download. These include the following domain filters:
- Ignore top-level domain (TLD),
- Allow two-letter domain,
- Do not truncate URL prefix.
You can also filter downloaded files by extension or file name character string. When the download is complete.