patgen - generate patterns for TeX hyphenation


patgen dictionary_file pattern_file patout_file translate_file


This manual page is not meant to be exhaustive. See also the Info file or manual "Web2C: A TeX implementation".

The patgen program reads the dictionary_file containing a list of hyphenated words and the pattern_file containing previously-generated patterns (if any) for a particular language (not a complete TeX source file; see below), and produces the patout_file with (previously- plus newly-generated) hyphenation patterns for that language. The translate_file defines language-specific values for the parameters left_hyphen_min and right_hyphen_min used by TeX's hyphenation algorithm, and the external representation of the lower and upper case version(s) of all `letters' of that language. Further details of the pattern generation process, such as hyphenation levels and pattern lengths, are requested interactively from the user's terminal. Optionally, patgen creates a new dictionary file pattmp.n showing the good and bad hyphens found by the generated patterns, where n is the highest hyphenation level.

The patterns generated by patgen can be read by initex for use in hyphenating words. For a real-life example of patgen's output, see $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns TeX uses for English by default. At some sites, patterns for (many) other languages may be available, and the local tex programs may have them preloaded.

All filenames must be complete; no adding of default extensions or path searching is done.




When initex digests hyphenation patterns, TeX first expands macros, and the result must consist entirely of digits (hyphenation levels), dots (`.', edge of a word), and letters. In pattern files for non-English languages, letters are often represented by macros or other expandable constructs. For the purposes of patgen these are just character sequences, subject to the condition that no such sequence is a prefix of another one.
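The prefix condition above can be checked mechanically before a run. The following is a minimal sketch, not part of patgen; the function name find_prefix_conflicts and the sample representations are hypothetical.

```python
def find_prefix_conflicts(representations):
    """Return (shorter, longer) pairs where one representation is a
    proper prefix of another. Hypothetical helper, not part of patgen."""
    # In sorted order, every string that starts with a given prefix sits
    # in one contiguous run directly after that prefix.
    ordered = sorted(set(representations))
    conflicts = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.startswith(first):
                conflicts.append((first, second))
            else:
                break  # the run of strings sharing this prefix has ended
    return conflicts


if __name__ == "__main__":
    # Sample letter representations; "\\s" is a prefix of "\\ss".
    sample = ["a", "b", '\\"a', "\\ss", "\\s"]
    for shorter, longer in find_prefix_conflicts(sample):
        print(f"{shorter!r} is a prefix of {longer!r}")
```

An empty result means the set of representations satisfies the condition.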

Dictionary file

    A dictionary file contains a weighted list of hyphenated words, one word per line starting in column 1. A digit in column 1 indicates a global word weight (initially 1) applicable to all following words up to the next global word weight. A digit at some intercharacter position indicates a weight for that position only. The hyphens in a word are indicated by `-', `*', or `.' (or their replacements as defined in the translate file) for hyphens yet to be found, `good' hyphens (correctly found by the patterns), and `bad' hyphens (erroneously found by the patterns) respectively; when reading a dictionary file, `*' is treated like `-' and `.' is ignored.
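To illustrate the dictionary rules above, here is a rough sketch of a reader for such lines. It is not patgen's own code. It assumes the default hyphen characters, single-character letters, and that a word may follow the global weight digit on the same line; the name parse_dictionary is hypothetical.

```python
DIGITS = "0123456789"


def parse_dictionary(lines):
    """Yield (letters, hyphen_positions, global_weight, position_weights).

    A position p means "between letters[p-1] and letters[p]".
    Hypothetical helper that follows the rules described above.
    """
    global_weight = 1  # initial global word weight
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line[0] in DIGITS:          # digit in column 1: new global weight
            global_weight = int(line[0])
            line = line[1:]
        letters, hyphens, position_weights = [], set(), {}
        for ch in line:
            pos = len(letters)         # current intercharacter position
            if ch in DIGITS:
                position_weights[pos] = int(ch)
            elif ch in "-*":           # `*' is treated like `-' on input
                hyphens.add(pos)
            elif ch == ".":            # `bad' hyphens are ignored on input
                continue
            else:
                letters.append(ch)
        if letters:
            yield "".join(letters), sorted(hyphens), global_weight, position_weights


if __name__ == "__main__":
    for entry in parse_dictionary(["hy-phen-a-tion", "3com*put.er", "ta2-ble"]):
        print(entry)
```

Note that the global weight set on the second sample line still applies to the third.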

Pattern file

    A pattern file contains only patterns in the format above, e.g., from a previous run of patgen. It may not contain any TeX comments or control sequences. For instance, this is not a valid pattern file:

% this is a pattern file read by TeX.
\patterns{...}

It can only contain the actual patterns, i.e., the `...'.

Translate file

    A translate file starts with a line containing the values of left_hyphen_min in columns 1-2, right_hyphen_min in columns 3-4, and either a blank or the replacement for one of the "hyphen" characters `-', `*', and `.' in columns 5, 6, and 7. (Input lines are padded with blanks, as for many TeX-related programs.) Each following line defines one `letter': an arbitrary delimiter character in column 1, followed by one or more external representations of that character (first the `lower' case one used for output), each one terminated by the delimiter and the whole sequence terminated by another delimiter. If the translate file is empty, the values left_hyphen_min=2, right_hyphen_min=3, and the 26 lower case letters a ... z with their upper case representations A ... Z are assumed.
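The layout described above can be read with a few lines of code. The sketch below is an illustration only, not patgen's implementation. The name parse_translate is hypothetical, and the handling of a header line whose numeric columns are entirely blank is left out.

```python
import string


def parse_translate(lines):
    """Return (left_hyphen_min, right_hyphen_min, hyphen_chars, letters).

    Hypothetical helper following the column layout described above.
    """
    defaults = "-*."
    if not lines:  # empty translate file: documented defaults apply
        letters = [[c, c.upper()] for c in string.ascii_lowercase]
        return 2, 3, {d: d for d in defaults}, letters

    header = lines[0].rstrip("\n").ljust(7)  # lines are padded with blanks
    left = int(header[0:2])                  # columns 1-2
    right = int(header[2:4])                 # columns 3-4
    hyphen_chars = {}
    for i, default in enumerate(defaults):   # columns 5, 6, 7
        ch = header[4 + i]
        hyphen_chars[default] = default if ch == " " else ch

    letters = []
    for raw in lines[1:]:
        line = raw.rstrip("\n")
        if not line:
            continue
        delimiter = line[0]                  # arbitrary delimiter in column 1
        representations = []
        for part in line[1:].split(delimiter):
            if part == "":                   # doubled delimiter ends the list
                break
            representations.append(part)
        if representations:
            letters.append(representations)  # first entry is the lower case form
    return left, right, hyphen_chars, letters


if __name__ == "__main__":
    sample = [" 2 3", " a A  ", '/\\"a/\\"A//']
    print(parse_translate(sample))
```

The two sample letter lines use a blank and a slash as delimiters to show that the delimiter is chosen per line.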

Terminal input

    After reading the translate_file and any previously-generated patterns from pattern_file, patgen requests input from the user's terminal. First, the integer values of hyph_start and hyph_finish, the lowest and highest hyphenation levels for which patterns are to be generated. The value of hyph_start should be larger than any hyphenation level already present in pattern_file. Then, for each hyphenation level, the integer values of pat_start and pat_finish, the smallest and largest pattern lengths to be analyzed, as well as good weight, bad weight, and threshold: the weights for good and bad hyphens and a weight threshold for useful patterns. Finally, the decision (`y' or `Y' vs. anything else) whether or not to produce a hyphenated word list.
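For unattended runs, the answers listed above can be prepared in advance. The sketch below only assembles the text in the order described. The split into lines is an assumption, the parameter values are made up for illustration, and the name build_terminal_input is hypothetical.

```python
def build_terminal_input(hyph_start, levels, make_word_list):
    """Assemble the answers patgen asks for, in the documented order.

    levels holds one (pat_start, pat_finish, good_weight, bad_weight,
    threshold) tuple per hyphenation level, starting at hyph_start.
    Hypothetical helper; the line layout is an assumption.
    """
    if not levels:
        raise ValueError("at least one hyphenation level is required")
    hyph_finish = hyph_start + len(levels) - 1
    answers = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        if pat_start > pat_finish:
            raise ValueError("pat_start must not exceed pat_finish")
        answers.append(f"{pat_start} {pat_finish}")
        answers.append(f"{good} {bad} {threshold}")
    # `y' or `Y' requests the hyphenated word list; anything else declines.
    answers.append("y" if make_word_list else "n")
    return "\n".join(answers) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not recommended settings.
    print(build_terminal_input(1, [(2, 2, 1, 2, 20), (2, 2, 2, 1, 8)], True), end="")
```

The resulting text could then be supplied on standard input, for example through the input argument of Python's subprocess.run, together with the four file names from the synopsis.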


  • $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
        The original hyphenation patterns for English, by Donald Knuth and Frank Liang.
  • $TEXMFMAIN/tex/generic/hyphen/ushyphmax.tex
        Maximal hyphenation patterns for English, extended by Gerard Kuiken.
  • http://www.ctan.org/tex-archive/language/
        Patterns and support for many other languages.


Frank Liang and Peter Breitenlohner, patgen.web.

Frank Liang, "Word hy-phen-a-tion by com-puter", STAN-CS-83-977, Stanford University Ph.D. thesis, 1983, http://tug.org/docs/liang.

Donald E. Knuth, "The TeXbook", Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.


Frank Liang wrote the first version of this program. Peter Breitenlohner made a substantial revision in 1991 for TeX 3. The first version was published as the appendix to the TeXware technical report. Howard Trickey originally ported it to Unix.