detex - a filter to strip TeX commands from a .tex file.


detex [ -clnstw ] [ -e environment-list ] [ filename[.tex] ... ]


Detex (Version 2.6) reads each file in sequence, removes all comments and TeX control sequences, and writes the remainder on the standard output. All text in math mode and display mode is removed. By default, detex follows \input commands. If a file cannot be opened, a warning message is printed and the command is ignored. If the -n option is used, no \input or \include commands will be processed. This allows single file processing. If no input file is given on the command line, detex reads from standard input.

If the magic sequence ``\begin{document}'' appears in the text, detex assumes it is dealing with LaTeX source and recognizes additional constructs used in LaTeX. These include the \include and \includeonly commands. The -l option can be used to force LaTeX mode and the -t option can be used to force TeX mode regardless of input content.

Text in various environment modes of LaTeX is ignored. The default modes are array, eqnarray, equation, figure, mathmatica, picture, table and verbatim. The -e option can be used to specify a comma separated environment-list of environments to ignore. The list replaces the defaults so specifying an empty list effectively causes no environments to be ignored.
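The replacement rule for -e can be sketched in Python. This is only an illustration of the behavior described above, not detex's own code; the function name and the trimming of whitespace around names are assumptions.

```python
# Default environments, as listed in the description above.
DEFAULT_ENVIRONMENTS = ["array", "eqnarray", "equation", "figure",
                        "mathmatica", "picture", "table", "verbatim"]

def ignored_environments(e_option=None):
    """Return the environments whose text is ignored.

    e_option is the raw string given to -e, or None when -e is absent.
    (Hypothetical helper; whitespace trimming is an assumption.)
    """
    if e_option is None:
        return list(DEFAULT_ENVIRONMENTS)
    # The list replaces the defaults, so an empty string ignores nothing.
    return [name.strip() for name in e_option.split(",") if name.strip()]
```

With this sketch, ignored_environments("figure,table") yields only those two names, and ignored_environments("") yields an empty list.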


The -c option can be used in LaTeX mode to have detex echo the arguments to \cite, \ref, and \pageref macros. This can be useful when sending the output to a style checker.

Detex assumes the standard character classes are being used for TeX. Detex allows white space between control sequences and magic characters like `{' when recognizing things like LaTeX environments.

If the -w flag is given, the output is a word list, one `word' (string of two or more letters and apostrophes beginning with a letter) per line, and all other characters ignored. Without -w the output follows the original, with the deletions mentioned above. Newline characters are preserved where possible so that the lines of output match the input as closely as possible.
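The definition of a `word' can be expressed as a regular expression. The Python sketch below is a reading of that definition, not detex's actual scanner; restricting letters to ASCII is an assumption, and the function name is hypothetical.

```python
import re

# A letter followed by one or more letters or apostrophes,
# i.e. at least two characters, beginning with a letter.
# ASCII-only letters are an assumption of this sketch.
WORD = re.compile(r"[A-Za-z][A-Za-z']+")

def word_list(text):
    """Return the words found in text, in order of appearance."""
    return WORD.findall(text)

if __name__ == "__main__":
    # Print one word per line, as the -w output is described.
    for word in word_list("Don't stop a TeX run"):
        print(word)
```

The single letter `a' is dropped because it is shorter than two characters.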

The TEXINPUTS environment variable is used to find \input and \include files. Like TeX, detex interprets a leading or trailing `:' as the default TEXINPUTS. It does not support the `//' directory expansion magic sequence.
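The handling of a leading or trailing `:' can be sketched as follows. The compiled-in default path is not stated here, so DEFAULT_TEXINPUTS below is a placeholder assumption, and search_dirs is a hypothetical helper rather than anything detex exposes.

```python
# Placeholder default; the real compiled-in value is not given in this text.
DEFAULT_TEXINPUTS = "."

def search_dirs(texinputs, default=DEFAULT_TEXINPUTS):
    """Expand a leading or trailing ':' in TEXINPUTS into the default path."""
    if not texinputs:
        # Unset or empty: fall back to the default path.
        return default.split(":")
    if texinputs.startswith(":"):
        texinputs = default + texinputs
    if texinputs.endswith(":"):
        texinputs = texinputs + default
    # No '//' expansion is attempted, matching the limitation noted above.
    return [d for d in texinputs.split(":") if d]
```

For example, with a default of ".:/usr/tex", the value ":/home/me/tex" expands to the directories ".", "/usr/tex" and "/home/me/tex", searched in that order.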

Detex now handles the basic TeX ligatures as a special case, replacing the ligatures with acceptable character substitutes. This eliminates spelling errors introduced by merely removing them. The ligatures are \aa, \ae, \oe, \ss, \o, \l (and their upper-case equivalents). The special "dotless" characters \i and \j are also replaced with i and j respectively.
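A rough Python sketch of this substitution follows. The text names the control sequences but not the exact substitutes, so the table below is an assumption (only \i and \j are stated explicitly). The sketch also leaves braces and following white space untouched, which a real TeX scanner would treat differently.

```python
import re

# Assumed substitutes; illustrative only, apart from \i and \j.
SUBSTITUTES = {
    "aa": "a", "ae": "ae", "oe": "oe", "ss": "ss", "o": "o", "l": "l",
    "AA": "A", "AE": "AE", "OE": "OE", "O": "O", "L": "L",
    "i": "i", "j": "j",
}

# Try longer names first, and require that no letter follows, so that
# a longer control word such as \label is not mistaken for \l.
_NAMES = "|".join(sorted(SUBSTITUTES, key=len, reverse=True))
PATTERN = re.compile(r"\\(" + _NAMES + r")(?![A-Za-z])")

def replace_ligatures(text):
    """Replace the control sequences above with plain-letter substitutes."""
    return PATTERN.sub(lambda m: SUBSTITUTES[m.group(1)], text)
```

Replacing rather than deleting these sequences keeps a word such as stra{\ss}e recognizable to a spelling checker.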

Note that previous versions of detex would replace control sequences with a space character to prevent words from running together. However, this caused accents in the middle of words to break words, generating "spelling errors" that were not desirable. Therefore, the new version merely removes these accents. The old functionality can be essentially duplicated by using the -s option.




Nesting of \input is allowed but the number of opened files must not exceed the system's limit on the number of simultaneously opened files. Detex ignores unrecognized option characters after printing a warning message.


Daniel Trinkle, Computer Science Department, Purdue University


Detex is not a complete TeX interpreter, so it can be confused by some constructs. Most errors result in too much rather than too little output.

Running LaTeX source without a ``\begin{document}'' through detex may produce errors.

Suggestions for improvements are (mildly) encouraged.