
4. TeX: Typesetting

TeX is a typesetting system: it was especially designed to handle complex mathematics, as well as most ordinary text typesetting.

TeX is a batch language, like C or Pascal, and not an interactive "word processor": you compile a TeX input file into a corresponding device-independent (DVI) file (and then translate the DVI file to the commands for a particular output device). This approach has both considerable disadvantages and considerable advantages. For a complete description of the TeX language, see The TeXbook (see section B. References). Many other books on TeX, introductory and otherwise, are available.

4.1 tex invocation

TeX (usually invoked as tex) formats the given text and commands, and outputs a corresponding device-independent representation of the typeset document. This section merely describes the options available in the Web2c implementation. For a complete description of the TeX typesetting language, see The TeXbook (see section B. References).

TeX, Metafont, and MetaPost process the command line (described here) and determine their memory dump (fmt) file in the same way (see section 3.5.2 Memory dumps). Synopses:

tex [option]... [texname[.tex]] [tex-commands]
tex [option]... \first-line
tex [option]... &fmt args

TeX searches the usual places for the main input file texname (see section `Supported file formats' in Kpathsea), extending texname with `.tex' if necessary. To see all the relevant paths, set the environment variable KPATHSEA_DEBUG to `-1' before running the program.

After texname is read, TeX processes any remaining tex-commands on the command line as regular TeX input. Also, if the first non-option argument begins with a TeX escape character (usually \), TeX processes all non-option command-line arguments as a line of regular TeX input.

If no arguments or options are specified, TeX prompts for an input file name with `**'.
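As an illustration of the command-line forms described above, here is a minimal shell sketch. The file name `story.tex' and its contents are invented for the example, the debugging trace is assumed to go to standard error, and the guard merely skips the runs on a system without a `tex' binary.

```shell
# A one-line input file; the name story.tex is an assumption for this example.
printf '%s\n' 'Once upon a time.' > story.tex

if command -v tex >/dev/null 2>&1; then
  # Form 1: main input file (`.tex' is appended if necessary),
  # followed by further TeX commands read as regular input.
  tex story '\bye'

  # Form 2: the first non-option argument begins with `\', so all
  # non-option arguments are processed as one line of TeX input.
  tex '\input story \bye'

  # Form 3: name the format explicitly; `&' must be quoted from the shell.
  tex '&tex' story '\bye'

  # Show the paths that are searched; the trace is saved in paths.log here.
  KPATHSEA_DEBUG=-1 tex story '\bye' 2> paths.log
fi
```

The single quotes keep the shell from interpreting the backslashes and the `&'.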

TeX writes the main DVI output to the file `basetexname.dvi', where basetexname is the basename of texname, or `texput' if no input file was specified. A DVI file is a device-independent binary representation of your TeX document. The idea is that after running TeX, you translate the DVI file using a separate program to the commands for a particular output device, such as a PostScript printer (see section `Introduction' in Dvips) or an X Window System display (see xdvi(1)).

TeX also reads TFM files for any fonts you load in your document with the \font primitive. By default, it runs an external program named `mktextfm' to create any nonexistent TFM files. You can disable this at configure-time or runtime (see section `mktex configuration' in Kpathsea). This is enabled mostly for the sake of the EC fonts, which can be generated at any size.

TeX can write output files, via the \openout primitive; this opens a security hole vulnerable to Trojan horse attack: an unwitting user could run a TeX program that overwrites, say, `~/.rhosts'. (MetaPost has a write primitive with similar implications.) To alleviate this, there is a configuration variable openout_any, which selects one of three levels of security. When it is set to `a' (for "any"), no restrictions are imposed. When it is set to `r' (for "restricted"), filenames beginning with `.' are disallowed (except `.tex', because LaTeX needs it). When it is set to `p' (for "paranoid"), additional restrictions are imposed: an absolute filename must refer to a file in (a subdirectory of) TEXMFOUTPUT, and any attempt to go up a directory level is forbidden (that is, paths may not contain a `..' component). The paranoid setting is the default. (For backwards compatibility, `y' and `1' are synonyms of `a', while `n' and `0' are synonyms of `r'.)

In any case, all \openout filenames are recorded in the log file, except those opened on the first line of input, which is processed when the log file has not yet been opened. (If you as a TeX administrator wish to implement more stringent rules on \openout, modifying the function openoutnameok in `web2c/lib/texmfmp.c' is intended to suffice.)
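To make the security levels concrete, the following sketch writes a TeX file that tries to open an output file one directory up, then runs it under the paranoid setting. Overriding openout_any through the environment, as done here, is an assumption based on the usual Kpathsea behavior of letting environment variables override configuration values; the file names are invented, and the run is skipped when no `tex' binary is present.

```shell
# escape.tex tries to write outside the current directory.
cat > escape.tex <<'EOF'
\nonstopmode              % do not stop to ask for another file name
\newwrite\out
\immediate\openout\out=../escape.txt
\immediate\write\out{hello}
\immediate\closeout\out
\bye
EOF

if command -v tex >/dev/null 2>&1; then
  # `p' (paranoid): the `..' component should cause the \openout to be
  # refused, so TeX is expected to give up on this job.
  openout_any=p tex escape || echo "tex exited with status $?"
fi
```

Because the \openout is not on the first line of input, the attempted file name should also be recorded in `escape.log'.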

The program accepts the following options, as well as the standard `-help' and `-version' (see section 3.2 Common options):

Options common to TeX, Metafont, and MetaPost are described in section 3.2 Common options.
`-ipc'
`-ipc-start'
With either option, TeX writes its DVI output to a socket as well as to the usual `.dvi' file. With `-ipc-start', TeX also opens a server program at the other end to read the output. See section 4.6 IPC and TeX. These options are available only if the `--enable-ipc' option was specified to configure during installation of Web2c.
`-mktex=filetype'
`-no-mktex=filetype'
Turn on or off the `mktex' script associated with filetype. The only values that make sense for filetype are `tex' and `tfm'.
`-mltex'
If INITEX (see section 3.5.1 Initial and virgin), enable MLTeX extensions such as \charsubdef. Implicitly set if the program name is mltex. See section 4.5.1 MLTeX: Multi-lingual TeX.
`-output-comment=string'
Use string as the DVI file comment. Ordinarily, this comment records the date and time of the TeX run, but if you are doing regression testing, you may not want the DVI file to have this spurious difference. This is also taken from the environment variable and config file value `output_comment'.
`-shell-escape'
Enable the `\write18{shell-command}' feature. This is also enabled if the environment variable or config file value `shell_escape' is set to `t'. (For backwards compatibility, `y' and `1' are accepted as synonyms of `t'.) It is disabled by default to avoid security problems. When enabled, the shell-command string (which first undergoes the usual TeX expansions, just as in `\special') is passed to the command shell (via the C library function `system'). The output of shell-command is not diverted anywhere, so it will not appear in the log file. The system call either happens at `\output' time or right away, according to the absence or presence of the `\immediate' prefix, as usual for \write. (If you as a TeX administrator wish to implement more stringent rules on what can be executed, you will need to modify `tex.ch'.)
`-src-specials'
`-src-specials=string'
This option makes TeX output specific source information using `\special' commands in the DVI file. These `\special' commands track the current file name and line number. Using the first form of this option, the `\special' commands are inserted automatically. In the second form of the option, string is a comma-separated list of the following values: `cr', `display', `hbox', `math', `par', `parend', `vbox'. You can use this list to specify where you want TeX to output such commands. For example, `-src-specials=cr,math' will output source information at every line and every math formula. These commands can be used with an appropriate DVI viewer and text editor to switch from the current position in the editor to the same position in the viewer, and back from the viewer to the editor.
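A short sketch of how some of these options might be used from the shell. It assumes the option spellings `-output-comment=string', `-src-specials=string', and `-shell-escape'; the file names are invented, and the guard skips all runs when no `tex' binary is present.

```shell
# Two small input files; both names are assumptions for this example.
printf '%s\n' 'A formula: $a^2+b^2=c^2$.' '\bye' > opts.tex
cat > stamp.tex <<'EOF'
\immediate\write18{date > stamp.txt}  % runs right away because of \immediate
\bye
EOF

if command -v tex >/dev/null 2>&1; then
  # Regression testing: a fixed DVI comment replaces the date and time,
  # so two runs of the same input can be compared byte for byte.
  tex -output-comment='regression run' opts && mv opts.dvi first.dvi
  tex -output-comment='regression run' opts && cmp first.dvi opts.dvi

  # Insert source specials at every line (cr) and every math formula.
  tex -src-specials=cr,math opts

  # \write18 is honored only when the feature has been enabled.
  tex -shell-escape stamp
fi
```

Since enabling `\write18' lets a document run arbitrary commands, it is best reserved for input files you trust.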

4.2 initex invocation

initex is the "initial" form of TeX, which does lengthy initializations avoided by the "virgin" (vir) form, so as to be capable of dumping `.fmt' files (see section 3.5.2 Memory dumps). For a detailed comparison of virgin and initial forms, see section 3.5.1 Initial and virgin.

For a list of options and other information, see section 4.1 tex invocation.

Unlike Metafont and MetaPost, many format files are commonly used with TeX. The standard one implementing the features described in the TeXbook is `plain.fmt', also known as `tex.fmt' (again, see section 3.5.2 Memory dumps). It is created by default during installation, but you can also do so by hand if necessary (e.g., if an update to `plain.tex' is issued):

initex '\input plain \dump'

(The quotes prevent interpretation of the backslashes by the shell.) Then install the resulting `plain.fmt' in `$(fmtdir)' (`/usr/local/share/texmf/web2c' by default), and link `tex.fmt' to it.

The necessary invocation for generating a format file differs for each format, so the instructions that come with each format should explain it. The top-level `web2c' Makefile has targets for making most common formats: plain latex amstex texinfo eplain. See section 4.4 Formats, for more details on TeX formats.

4.3 virtex invocation

virtex is the "virgin" form of TeX, which avoids the lengthy initializations done by the "initial" (ini) form, and is thus what is generally used for production work. For a detailed comparison of virgin and initial forms, see section 3.5.1 Initial and virgin.

For a list of options and other information, see section 4.1 tex invocation.

4.4 Formats

TeX formats are large collections of macros, possibly dumped into a `.fmt' file (see section 3.5.2 Memory dumps) by initex (see section 4.2 initex invocation). A number of formats are in reasonably widespread use, and the Web2c Makefile has targets to make the versions current at the time of release. You can change which formats are automatically built by setting the fmts Make variable; by default, only the `plain' and `latex' formats are made.

You can get the latest versions of most of these formats from the CTAN archives in subdirectories of `CTAN:/macros' (for CTAN info, see section `unixtex.ftp' in Kpathsea). The archive ftp://ftp.tug.org/tex/lib.tar.gz (also available from CTAN) contains most of these formats (although perhaps not the absolute latest version), among other things.

latex
The most widely used format. The current release is named `LaTeX 2e'; new versions are released approximately every six months, with patches issued as needed. The old release was called `LaTeX 2.09', and is no longer maintained or supported. LaTeX attempts to provide generic markup instructions, such as "emphasize", instead of specific typesetting instructions, such as "use the 10pt Computer Modern italic font".
amstex
The official typesetting system of the American Mathematical Society, used to produce nearly all of its publications, e.g., Mathematical Reviews. Like LaTeX, it encourages generic markup commands. The AMS also provides a LaTeX package for authors who prefer LaTeX (see the `amslatex' item below).
texinfo
The documentation system developed and maintained by the Free Software Foundation for their software manuals. It can be automatically converted into plain text, a machine-readable on-line format called `info', HTML, etc.
eplain
The "expanded plain" format provides various common features (e.g., symbolic cross-referencing, tables of contents, indexing, citations using BibTeX), for those authors who prefer to handle their own high-level formatting.
lamstex
Augments AMSTeX with LaTeX-like features.
amslatex
A LaTeX package (see the `latex' item above) that augments LaTeX with AMSTeX-like features.
slitex
An obsolete LaTeX 2.09 format for making slides. It is replaced by the `slides' document class.

4.5 Languages and hyphenation

TeX supports most natural languages. See also section 4.7 TeX extensions.

4.5.1 MLTeX: Multi-lingual TeX

Multi-lingual TeX (mltex) is an extension of TeX originally written by Michael Ferguson and now updated and maintained by Bernd Raichle. It allows the use of non-existing glyphs in a font by declaring glyph substitutions. These are restricted to substitutions of an accented character glyph, which need not be defined in the current font, by its appropriate \accent construction using a base and accent character glyph, which do have to exist in the current font. This substitution is automatically done behind the scenes, if necessary, and thus MLTeX additionally supports hyphenation of words containing an accented character glyph for fonts missing this glyph (e.g., Computer Modern). Standard TeX suppresses hyphenation in this case.

MLTeX works at `.fmt'-creation time: the basic idea is to specify the `-mltex' option to TeX when you \dump a format. Then, when you subsequently invoke TeX and read that .fmt file, the MLTeX features described below will be enabled.

Generally, you use special macro files to create an MLTeX .fmt file. See:


The sections below describe the two new primitives that MLTeX defines. Aside from these, MLTeX is completely compatible with standard TeX.

\charsubdef: Character substitutions

The most important primitive MLTeX adds is \charsubdef, used in a way reminiscent of \chardef:

\charsubdef composite [=] accent base

Each of composite, accent, and base is a font glyph number, expressed in the usual TeX syntax: `\e symbolically, '145 for octal, "65 for hex, 101 for decimal.

MLTeX's \charsubdef declares how to construct an accented character glyph (not necessarily existing in the current font) using two character glyphs (that do exist). Thus it defines whether a character glyph code, either typed as a single character or using the \char primitive, will be mapped to a font glyph or to an \accent glyph construction.

For example, if you assume glyph code 138 (decimal) for an e-circumflex and you are using the Computer Modern fonts, which have the circumflex accent in position 94 and lowercase `e' in the usual ASCII position 101 decimal, you would use \charsubdef as follows:

\charsubdef 138 = 94 101

For the plain TeX format to make use of this substitution, you have to redefine the circumflex accent macro \^ in such a way that if its argument is character `e' the expansion \char138 is used instead of \accent94 e. Similar \charsubdef declarations and macro redefinitions have to be done for all other accented characters.
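Here is a sketch of the same technique for a different character, showing the declaration and the macro redefinition together. It assumes glyph code 233 for e-acute (its ISO Latin-1 value) and relies on plain TeX's \' macro using \accent19. The file name `mlacute.tex' and the helper name \plainacute are invented for the example, and the `initex' run is skipped where that program is not installed.

```shell
# Macro file for a hypothetical MLTeX-enabled variant of the plain format.
cat > mlacute.tex <<'EOF'
\input plain
\charsubdef 233 = 19 101      % e-acute built from acute (19) and `e' (101)
\let\plainacute=\'            % keep plain TeX's original \' macro
\def\'#1{\ifx e#1\char233 \else\plainacute{#1}\fi}
\dump
EOF

if command -v initex >/dev/null 2>&1; then
  # -mltex has an effect only in INITEX; the dump goes to mlacute.fmt.
  initex -mltex mlacute
fi
```

With such a format loaded, \'e would expand to \char233, which MLTeX is then expected to build from the two existing glyphs; any other argument falls through to the original accent macro.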

To disable a previous \charsubdef c, redefine c as a pair of zeros. For example:

\charsubdef '321 = 0 0  % disable N tilde

(Octal '321 is the ISO Latin-1 value for the Spanish N tilde.)

\charsubdef commands should only be given once. Although in principle you can use \charsubdef at any time, the result is unspecified. If \charsubdef declarations are changed, usually either incorrect character dimensions will be used or MLTeX will output missing character warnings. (The substitution of a \charsubdef is used by TeX when appending the character node to the current horizontal list, to compute the width of a horizontal box when the box gets packed, and when building the \accent construction at \shipout-time. In summary, the substitution is accessed often, so changing it is not desirable, nor generally useful.)

\tracingcharsubdef: Substitution diagnostics

To help diagnose problems with `\charsubdef', MLTeX provides a new primitive parameter, \tracingcharsubdef. If positive, every use of \charsubdef will be reported. This can help track down when a character is redefined.

In addition, if the TeX parameter \tracinglostchars is 100 or more, the character substitutions actually performed at \shipout-time will be recorded.

4.5.2 TCX files: Character translations

TCX (TeX character translation) files help TeX support direct input of 8-bit international characters if fonts containing those characters are being used. Specifically, they map an input (keyboard) character code to the internal TeX character code (a superset of ASCII).

Of the various proposals for handling more than one input encoding, TCX files were chosen because they follow Knuth's original ideas for the use of the `xchr' and `xord' tables. He ventured that these would be changed in the WEB source in order to adjust the actual version to a given environment. It turned out, however, that recompiling the WEB sources is not as simple a task as Knuth predicted; therefore, TCX files, which make it possible to change the conversion tables on the fly, were implemented instead.

This approach limits the portability of TeX documents, as some implementations do not support it (or use a different method for input-internal reencoding). It may also be problematic to determine the encoding to use for a TeX document of unknown provenance; in the worst case, failure to do so correctly may result in subtle errors in the typeset output.

While TCX files can be used with any format, using them breaks the LaTeX `inputenc' package. This is why you should use either a TCX file or `inputenc' in LaTeX files, but never both.

This is entirely independent of the MLTeX extension (see section 4.5.1 MLTeX: Multi-lingual TeX): whereas a TCX file defines how an input keyboard character is mapped to TeX's internal code, MLTeX defines substitutions for a non-existing character glyph in a font with a \accent construction made out of two separate character glyphs. TCX files involve no new primitives; it is not possible to specify that an input (keyboard) character maps to more than one character.

Specifying TCX files:

The Web2c distribution comes with at least two TCX files, `il1-t1.tcx' and `il2-t1.tcx'. These support ISO Latin 1 and ISO Latin 2, respectively, with Cork-encoded fonts (a.k.a. the T1 encoding). TCX files for Czech, Polish, and Slovak are also provided.

Syntax of TCX files:

  1. Line-oriented. Blank lines are ignored.
  2. Whitespace is ignored except as a separator.
  3. Comments start with `%' and continue to the end of the line.
  4. Otherwise, a line consists of one or two character codes:
    src [dest]
  5. Each character code may be specified in octal with a leading `0', hexadecimal with a leading `0x', or decimal otherwise. Values must be between 0 and 255, inclusive (decimal).
  6. If the dest code is not specified, it is taken to be the same as src.
  7. If the same src code is specified more than once, it is the last definition that counts.

Finally, here's what happens: when TeX sees an input character with code src, it 1) changes src to dest; and 2) makes code dest "printable", i.e., printed as-is in diagnostics and the log file instead of in `^^' notation.

By default, no characters are translated, and character codes between 32 and 126 inclusive (decimal) are printable. It is not possible to make these (or any) characters unprintable.

Specifying translations for the printable ASCII characters (codes 32--126) will yield unpredictable results. Additionally, you shouldn't make the following characters printable: ^^I (TAB), ^^J (line feed), ^^M (carriage return), and ^^? (delete), since TeX uses them in various ways.

Thus, the idea is to specify the input (keyboard) character code for src, and the output (font) character code for dest.
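To illustrate the syntax rules and the translation behavior described above, here is a small, hypothetical TCX file created from the shell. The file name `sample.tcx' and the chosen codes are assumptions for the example: they are the ISO Latin-1 codes for a-grave, e-grave, and e-acute, which coincide with the positions of those glyphs in T1-encoded fonts. The `sed' command is only a reading aid that applies the rules about comments and blank lines.

```shell
# A hypothetical translation table, not one of the distributed files.
cat > sample.tcx <<'EOF'
% sample.tcx: one "src [dest]" entry per line

224 224     % decimal: a-grave, code unchanged but now printable
0350        % octal (232 decimal): e-grave, dest defaults to src
0xe9 0xe9   % hexadecimal (233 decimal): e-acute
EOF

# Strip comments and blank lines to see the entries that remain.
sed -e 's/%.*//' -e '/^[[:space:]]*$/d' sample.tcx
```

None of the entries touches the printable ASCII range or the TAB, line feed, carriage return, and delete characters, in line with the cautions above.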

4.5.3 Patgen: Creating hyphenation patterns

Patgen creates hyphenation patterns from dictionary files for use with TeX. Synopsis:

patgen dictionary patterns output translate

Each argument is a filename. No path searching is done. The output is written to the file output.

In addition, Patgen prompts interactively for other values.

For more information, see Word hy-phen-a-tion by com-puter by Frank Liang (see section B. References), and also the `patgen.web' source file.

The only options are `-help' and `-version' (see section 3.2 Common options).

4.6 IPC and TeX

(Sorry, but I'm not going to write this unless someone actually uses this feature. Let me know.)

This functionality is available only if the `--enable-ipc' option was specified to configure during installation of Web2c (see section 2. Installation).

If you define IPC_DEBUG before compilation (e.g., with `make XCFLAGS=-DIPC_DEBUG'), TeX will print messages to standard error about its socket operations. This may be helpful if you are, well, debugging.

4.7 TeX extensions

The base TeX program has been extended in many ways. Here's a partial list. Please send information on extensions not listed here to the address in section `Reporting bugs' in Kpathsea.

e-TeX
Adds many new primitives, including right-to-left typesetting. Available from http://www.vms.rhbnc.ac.uk/e-TeX/ and `CTAN:/systems/e-tex'.
Omega
Adds Unicode support, right-to-left typesetting, and more. Available from http://www.ens.fr/omega and `CTAN:/systems/omega'.
pdfTeX
A variant of TeX that produces PDF instead of DVI files. It also includes primitives for hypertext. Available from `CTAN:/systems/pdftex'.
TeX--XeT
Adds primitives and DVI opcodes for right-to-left typesetting (as used in Arabic, for example). An old version for TeX 3.1415 is available from `CTAN:/systems/knuth/tex--xet'. A newer version is included in e-TeX.
File-handling TeX
Adds primitives for creating multiple DVI files in a single run; and appending to output files as well as overwriting. Web2c implementation available in the distribution file `web2c/contrib/file-handling-tex'.
