This chapter describes the generic path searching mechanism Kpathsea provides. For information about searching for particular file types (e.g., TeX fonts), see the next chapter.
A search path is a colon-separated list of path elements, which are directory names with a few extra frills. A search path can come from (a combination of) many sources; see below. To look up a file `foo' along a path `.:/dir', Kpathsea checks each element of the path in turn: first `./foo', then `/dir/foo', returning the first match (or possibly all matches).
The "colon" and "slash" mentioned here aren't necessarily `:' and `/' on non-Unix systems. Kpathsea tries to adapt to other operating systems' conventions.
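As a rough illustration of the element-by-element lookup just described, the following Python sketch walks a path and returns the first match, or all matches on request. The name `search_path' and its parameters are hypothetical; the sketch ignores databases and every kind of expansion, and it splits on the platform's own separator in the spirit of the note above.

```python
import os

def search_path(path, name, all_matches=False):
    """Return the first match for name along path, or every match."""
    matches = []
    for element in path.split(os.pathsep):
        # Empty elements have a special meaning in Kpathsea (default
        # expansion); this sketch simply skips them.
        if not element:
            continue
        candidate = os.path.join(element, name)
        if os.path.exists(candidate):
            if not all_matches:
                return candidate
            matches.append(candidate)
    return matches if all_matches else None
```

On a Unix system, `search_path(".:/dir", "foo")' would try `./foo' and then `/dir/foo'.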
To check a particular path element e, Kpathsea first sees if a prebuilt database (see section 3.4 Filename database (`ls-R')) applies to e, i.e., if the database is in a directory that is a prefix of e. If so, the path specification is matched against the contents of the database.
If the database does not exist, or does not apply to this path element, or contains no matches, the filesystem is searched (if this was not forbidden by the specification with `!!' and if the file being searched for must exist). Kpathsea constructs the list of directories that correspond to this path element, and then checks in each for the file being searched for. (To help speed future lookups of files in the same directory, the directory in which a file is found is floated to the top of the directory list.)
The "file must exist" condition comes into play with VF files and input files read by the TeX `\openin' command. These files may not exist (consider `cmr10.vf'), and so it would be wrong to search the disk for them. Therefore, if you fail to update `ls-R' when you install a new VF file, it will never be found.
Each path element is checked in turn: first the database, then the disk. If a match is found, the search stops and the result is returned. This avoids possibly-expensive processing of path specifications that are never needed on a particular run. (Unless the search explicitly requested all matches.)
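The database-then-disk order for a single element can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout of the database, and the plain string-prefix test are all hypothetical simplifications, and the element is taken to be an already expanded directory name.

```python
import os

def check_element(element, name, database, must_exist=True):
    """Look for name in one (already expanded) path element.

    database maps a database directory to {filename: [directories]};
    this layout is an assumption of the sketch, not Kpathsea's own.
    """
    db_only = element.startswith("!!")
    directory = element[2:] if db_only else element

    # The database applies if it lives in a directory that is a prefix
    # of the element (a crude string-prefix test here).
    for db_dir, entries in database.items():
        if directory.startswith(db_dir):
            if directory in entries.get(name, []):
                return os.path.join(directory, name)

    # Fall back to the disk unless `!!' forbids it or the file being
    # searched for need not exist.
    if db_only or not must_exist:
        return None
    candidate = os.path.join(directory, name)
    return candidate if os.path.exists(candidate) else None
```

A caller would apply this to each element in turn and stop at the first non-None result, unless all matches were requested.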
Although the simplest and most common path element is a directory name, Kpathsea supports additional features in search paths: layered default values, environment variable names, config file values, users' home directories, and recursive subdirectory searching. Thus, we say that Kpathsea expands a path element, meaning that it transforms all the magic specifications into the basic directory name or names. This process is described in the sections below, and the expansions happen in the same order as the sections.
Exception to all of the above: If the filename being searched for is absolute or explicitly relative, i.e., starts with `/' or `./' or `../', Kpathsea simply checks if that file exists.
Ordinarily, if Kpathsea tries to access a file or directory that cannot be read, it gives a warning. This is so you will be alerted to directories or files that accidentally lack read permission (for example, a `lost+found'). If you prefer not to see these warnings, include the value `readable' in the TEX_HUSH environment variable or config file value.
This generic path searching algorithm is implemented in `kpathsea/pathsearch.c'. It is employed by a higher-level algorithm when searching for a file of a particular type (see section 4.2 File lookup, and section 4.3 Glyph lookup).
A search path can come from many sources. In the order in which Kpathsea uses them:
An environment variable, e.g., TEXINPUTS. Environment variables with an underscore and the program name appended override; for example, TEXINPUTS_latex overrides TEXINPUTS if the program being run is named `latex'.
You can see each of these values for a given search path by using the debugging options (see section 2.6.3 Debugging).
These sources may be combined via default expansion (see section 3.3.1 Default expansion).
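The program-specific environment variable override mentioned above can be sketched in a few lines. The function name and the explicit `environ' argument are hypothetical; the sketch covers only the environment, not the other path sources.

```python
def env_search_path(variable, progname, environ):
    """Return the environment value for a path variable.

    A variable named VARIABLE_progname, if set, overrides VARIABLE.
    """
    specific = f"{variable}_{progname}"
    if specific in environ:
        return environ[specific]
    return environ.get(variable)
```

For instance, with both TEXINPUTS and TEXINPUTS_latex set, a program named `latex' would get the latter and any other program the former.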
As mentioned above, Kpathsea reads runtime configuration files named `texmf.cnf' for search path and other definitions. The search path used to look for these configuration files is named TEXMFCNF, and is constructed in the usual way, as described above, except that configuration files cannot be used to define the path, naturally; also, an `ls-R' database is not used to search for them.
Kpathsea reads all `texmf.cnf' files in the search path, not just the first one found; definitions in earlier files override those in later files. Thus, if the search path is `.:$TEXMF', values from `./texmf.cnf' override those from `$TEXMF/texmf.cnf'.
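The "earlier files override later ones" rule amounts to a first-definition-wins merge. A minimal sketch, assuming each file has already been parsed into a dictionary (the function name is hypothetical):

```python
def merge_definitions(files_in_path_order):
    """Merge per-file {variable: value} dicts; the earliest file wins."""
    merged = {}
    for definitions in files_in_path_order:
        for variable, value in definitions.items():
            # setdefault keeps a value that an earlier file already set.
            merged.setdefault(variable, value)
    return merged
```

With the search path `.:$TEXMF', the dictionary for `./texmf.cnf' would come first in the list, so its values survive.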
While (or instead of) reading this description, you may find it helpful to look at the distributed `texmf.cnf', which uses or at least mentions most features. The format of `texmf.cnf' files follows:
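variable [. progname] [=] value
where the `=' and surrounding whitespace are optional.
If `.progname' is present, the definition applies only if the program that is running is named (i.e., the last component of argv[0] is) progname or `progname.exe'. This allows different flavors of TeX to have different search paths, for example.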
CFLAGS
.)
Here is a configuration file fragment illustrating most of these points:
% TeX input files -- i.e., anything to be found by \input or \openin ...
latex209_inputs = .:$TEXMF/tex/latex209//:$TEXMF/tex//
latex2e_inputs = .:$TEXMF/tex/latex//:$TEXMF/tex//
TEXINPUTS = .:$TEXMF/tex//
TEXINPUTS.latex209 = $latex209_inputs
TEXINPUTS.latex2e = $latex2e_inputs
TEXINPUTS.latex = $latex2e_inputs
This format has obvious similarities to Bourne shell scripts: change the comment character to #, disallow spaces around the =, and get rid of the .name convention, and it could be run through the shell. But there seemed little advantage to doing this, since all the information would have to be passed back to Kpathsea and parsed there anyway, because the sh process couldn't affect its parent's environment.
The implementation of all this is in `kpathsea/cnf.c'.
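To make the line format concrete, here is a small Python sketch of a parser for `variable [. progname] [=] value' lines. It is not a transcription of `kpathsea/cnf.c': the function name is hypothetical, and the treatment of blank lines, `%' comment lines, and the precedence of `.progname' definitions over plain ones are assumptions of the sketch.

```python
import re

# variable, optional `.progname', optional `=', then the value.
_LINE = re.compile(r"^\s*([^\s=.]+)\s*(?:\.\s*([^\s=]+))?\s*=?\s*(.*)$")

def parse_cnf(text, running):
    """Parse texmf.cnf-style text into {variable: value} for one program."""
    general, specific = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with % are skipped.
        if not line or line.startswith("%"):
            continue
        match = _LINE.match(line)
        if not match:
            continue
        variable, prog, value = match.groups()
        if prog is None:
            general.setdefault(variable, value)
        elif running in (prog, prog + ".exe"):
            specific.setdefault(variable, value)
    # Assumption: a matching `.progname' definition beats the plain one.
    return {**general, **specific}
```

Applied to the fragment above, a program named `latex' would see TEXINPUTS as `$latex2e_inputs', while a program named `tex' would see `.:$TEXMF/tex//'. The values are returned unexpanded.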
Kpathsea recognizes certain special characters and constructions in search paths, similar to those available in shells. As a general example: `~$USER/{foo,bar}//baz' expands to all subdirectories under directories `foo' and `bar' in $USER's home directory that contain a directory or file `baz'. These expansions are explained in the sections below.
If the highest-priority search path (see section 3.2 Path sources) contains an extra colon (i.e., leading, trailing, or doubled), Kpathsea inserts at that point the next-highest-priority search path that is defined. If that inserted path has an extra colon, the same happens with the next-highest. (An extra colon in the compile-time default value has unpredictable results, so installers beware.)
For example, given an environment variable setting
setenv TEXINPUTS /home/karl:
and a TEXINPUTS value from `texmf.cnf' of
.:$TEXMF//tex
then the final value used for searching will be:
/home/karl:.:$TEXMF//tex
Since Kpathsea looks for multiple configuration files, it would be natural to expect that (for example) an extra colon in `./texmf.cnf' would expand to the path in `$TEXMF/texmf.cnf'. Or, with Dvips' configuration files, that an extra colon in `config.$PRINTER' would expand to the path in `config.ps'. This doesn't happen. It's not clear this would be desirable in all cases, and trying to devise a way to specify the path to which the extra colon should expand seemed truly baroque.
Technicality: Since it would be useless to insert the default value in more than one place, Kpathsea changes only one extra `:' and leaves any others in place (they will eventually be ignored). Kpathsea checks first for a leading `:', then a trailing `:', then a doubled `:'.
You can trace this by debugging "paths" (see section 2.6.3 Debugging). Default expansion is implemented in the source file `kpathsea/kdefault.c'.
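The extra-colon rule, including the leading, trailing, doubled order of checks and the replacement of only one extra colon, can be sketched as below. The function name is hypothetical and the separator is fixed to `:' for simplicity.

```python
def expand_default(path, fallback):
    """Insert fallback at the first extra colon found in path.

    Checks a leading colon, then a trailing one, then a doubled one,
    and changes only that one occurrence.
    """
    if path.startswith(":"):
        return fallback + path
    if path.endswith(":"):
        return path + fallback
    doubled = path.find("::")
    if doubled != -1:
        # Keep both colons and put the fallback between them.
        return path[:doubled + 1] + fallback + path[doubled + 1:]
    return path
```

With the values from the example above, `expand_default("/home/karl:", ".:$TEXMF//tex")' gives `/home/karl:.:$TEXMF//tex'.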
`$foo' or `${foo}' in a path element is replaced by (1) the value of an environment variable `foo' (if defined); (2) the value of `foo' from `texmf.cnf' (if defined); (3) the empty string.
If the character after the `$' is alphanumeric or `_', the variable name consists of all consecutive such characters. If the character after the `$' is a `{', the variable name consists of everything up to the next `}' (braces may not be nested around variable names). Otherwise, Kpathsea gives a warning and ignores the `$' and its following character.
You must quote the $'s and braces as necessary for your shell. Shell variable values cannot be seen by Kpathsea, i.e., ones defined by set in C shells and without export in Bourne shells.
For example, given
setenv tex /home/texmf
setenv TEXINPUTS .:$tex:${tex}prev
the final TEXINPUTS path is the three directories:
.:/home/texmf:/home/texmfprev
The `.progname' suffix on variables and `_progname' on environment variable names are not implemented for general variable expansions. These are only recognized when search paths are initialized (see section 3.2 Path sources).
Variable expansion is implemented in the source file `kpathsea/variable.c'.
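The lookup order and the name-scanning rules above can be sketched as follows. The function name and the two dictionary arguments are hypothetical; the sketch does not expand the substituted values again, and its handling of an unterminated `${' (warn and skip two characters) is an assumption.

```python
import warnings

def _is_name_char(ch):
    return ch == "_" or (ch.isascii() and ch.isalnum())

def expand_vars(text, env, cnf):
    """Replace $foo and ${foo} using env first, then cnf, then ''."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "$":
            out.append(text[i])
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < n else ""
        end = text.find("}", i + 2) if nxt == "{" else -1
        if nxt and _is_name_char(nxt):
            # The name is the run of alphanumerics and underscores.
            j = i + 1
            while j < n and _is_name_char(text[j]):
                j += 1
            name = text[i + 1:j]
        elif end != -1:
            # The name is everything up to the next closing brace.
            name, j = text[i + 2:end], end + 1
        else:
            warnings.warn(f"unrecognized variable construct at {text[i:]!r}")
            i += 2  # ignore the `$' and the character after it
            continue
        if name in env:
            out.append(env[name])
        elif name in cnf:
            out.append(cnf[name])
        i = j
    return "".join(out)
```

With `env = {"tex": "/home/texmf"}', expanding `.:$tex:${tex}prev' reproduces the three-directory path shown above.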
A leading `~' in a path element is replaced by the value of the environment variable HOME, or `.' if HOME is not set.
A leading `~user' in a path element is replaced by user's home directory from the system `passwd' database.
For example,
setenv TEXINPUTS ~/mymacros:
will prepend a directory `mymacros' in your home directory to the default path.
As a special case, if a home directory ends in `/', the trailing slash is dropped, to avoid inadvertently creating a `//' construct in the path. For example, if the home directory of the user `root' is `/', the path element `~root/mymacros' expands to just `/mymacros', not `//mymacros'.
Tilde expansion is implemented in the source file `kpathsea/tilde.c'.
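A sketch of these tilde rules, including the trailing-slash special case, might look like this. The function name is hypothetical, and so is the `home_of' callback standing in for the `passwd' lookup; on Unix, something like `lambda user: pwd.getpwnam(user).pw_dir' could serve.

```python
def expand_tilde(element, env, home_of):
    """Expand a leading ~ or ~user in a path element."""
    if not element.startswith("~"):
        return element
    slash = element.find("/")
    if slash == -1:
        user, rest = element[1:], ""
    else:
        user, rest = element[1:slash], element[slash:]
    # Plain ~ uses HOME (or `.' if unset); ~user asks the callback.
    home = env.get("HOME", ".") if user == "" else home_of(user)
    # Drop a trailing slash so that no accidental `//' is created.
    if rest and home.endswith("/"):
        home = home[:-1]
    return home + rest
```

With a `home_of' that returns `/' for `root', the element `~root/mymacros' becomes `/mymacros', as in the example above.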
`x{a,b}y' expands to `xay:xby'. For example:
foo/{1,2}/baz
expands to `foo/1/baz:foo/2/baz'. `:' is the path separator on the current system; e.g., on a DOS system, it's `;'.
Braces can be nested; for example, `x{A,B{1,2}}y' expands to `xAy:xB1y:xB2y'.
Multiple non-nested braces are expanded from right to left; for example, `x{A,B}{1,2}y' expands to `x{A,B}1y:x{A,B}2y', which expands to `xA1y:xB1y:xA2y:xB2y'.
This feature can be used to implement multiple TeX hierarchies, by assigning a brace list to $TEXMF, as mentioned in `texmf.in'.
You can also use the path separator instead of the comma. The last example could have been written `x{A:B}{1:2}y'.
Brace expansion is implemented in the source file `kpathsea/expand.c'. It is a modification of the Bash sources, and is thus covered by the GNU General Public License, rather than the Library General Public License that covers the rest of Kpathsea.
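The right-to-left order and the nesting behavior described above can be reproduced with a short recursive sketch. This is not the Bash-derived code in `kpathsea/expand.c'; the function name is hypothetical, it treats both `,' and `:' as separators inside braces, and it leaves unbalanced input unexpanded.

```python
def expand_braces(text):
    """Expand brace lists in one path element, rightmost group first."""
    close = text.rfind("}")
    if close == -1:
        return [text]
    # Walk left from the last `}' to find its matching `{'.
    depth, opening = 0, -1
    for i in range(close, -1, -1):
        if text[i] == "}":
            depth += 1
        elif text[i] == "{":
            depth -= 1
            if depth == 0:
                opening = i
                break
    if opening == -1:
        return [text]  # unbalanced braces: leave the text alone
    # Split the group on separators that are not inside nested braces.
    body = text[opening + 1:close]
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch in ",:" and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    prefix, suffix = text[:opening], text[close + 1:]
    result = []
    for part in parts:
        result.extend(expand_braces(prefix + part + suffix))
    return result
```

Joining the result with the path separator gives the expansions quoted above; for example, `":".join(expand_braces("x{A,B}{1,2}y"))' yields `xA1y:xB1y:xA2y:xB2y'.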
KPSE_DOT expansion
When KPSE_DOT is defined in the environment, it names a directory that should be considered the current directory for the purpose of looking up files in the search paths. This feature is needed by the `mktex...' scripts (see section 2.2.9 `mktex' scripts), because these change the working directory. You should not ever define it yourself.
Two or more consecutive slashes in a path element following a directory d are replaced by all subdirectories of d: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified. (It's "directory order", and definitely not alphabetical.)
If you specify any filename components after the `//', only subdirectories which match those components are included. For example, `/a//b' would expand into directories `/a/1/b', `/a/2/b', `/a/1/1/b', and so on, but not `/a/b/c' or `/a/1'.
You can include multiple `//' constructs in the path.
`//' at the beginning of a path is ignored; you didn't really want to search every directory on the system, did you?
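The following sketch shows one way to read the rules above for an element with a single `//'. It is an illustration only: the function name is hypothetical, it follows the wording literally (level by level, subdirectories of d only), and it may differ from Kpathsea in edge cases such as whether d itself is a candidate. It also does nothing to guard against symbolic-link loops.

```python
import os
from collections import deque

def expand_subdirs(element):
    """Expand the first `//' in element into matching directories."""
    head, sep, tail = element.partition("//")
    if not sep:
        return [element]
    tail = tail.strip("/")
    found = []
    queue = deque([head])
    while queue:
        current = queue.popleft()
        try:
            names = os.listdir(current)  # "directory order", unsorted
        except OSError:
            continue  # unreadable or missing directory
        for name in names:
            sub = os.path.join(current, name)
            if not os.path.isdir(sub):
                continue
            queue.append(sub)
            # With trailing components, keep only subdirectories that
            # actually contain them.
            candidate = os.path.join(sub, tail) if tail else sub
            if os.path.isdir(candidate):
                found.append(candidate)
    return found
```

For a tree containing `/a/1/b', `/a/2/b', `/a/1/1/b' and `/a/b/c', expanding `/a//b' with this sketch returns the first three and neither `/a/b/c' nor `/a/1'.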
I should mention one related implementation trick, which I took from GNU find. Matthew Farwell suggested it, and David MacKenzie implemented it.
The trick is that in every real Unix implementation (as opposed to the POSIX specification), a directory which contains no subdirectories will have exactly two links (namely, one for `.' and one for `..'). That is to say, the st_nlink field in the `stat' structure will be two. Thus, we don't have to stat everything in the bottom-level (leaf) directories--we can just check st_nlink, notice it's two, and do no more work.
But if you have a directory that contains a single subdirectory and 500 regular files, st_nlink will be 3, and Kpathsea has to stat every one of those 501 entries. Therein lies slowness.
You can disable the trick by undefining UNIX_ST_LINK in `kpathsea/config.h'. (It is undefined by default except under Unix.)
Unfortunately, in some cases files in leaf directories are stat'd: if the path specification is, say, `$TEXMF/fonts//pk//', then files in a subdirectory `.../pk', even if it is a leaf, are checked. The reason cannot be explained without reference to the implementation, so read `kpathsea/elt-dirs.c' (search for `may descend') if you are curious. And if you can find a way to solve the problem, please let me know.
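The link-count shortcut can be illustrated in Python, whose `os.stat' exposes the same st_nlink field. The function names are hypothetical, and the shortcut is only as reliable as the filesystem's link counts; the sketch errs on the side of scanning whenever the count is not exactly two.

```python
import os

def may_contain_subdirs(directory):
    """Return False only when the link count marks a leaf directory."""
    # Per the text above, a count of exactly two means no subdirectories.
    return os.stat(directory).st_nlink != 2

def list_subdirs(directory):
    """List subdirectories, skipping the scan for leaf directories."""
    if not may_contain_subdirs(directory):
        return []
    # Otherwise every entry has to be examined.
    return sorted(e.path for e in os.scandir(directory) if e.is_dir())
```

A directory with one subdirectory and 500 regular files still takes the slow branch, which is exactly the cost described above.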
Subdirectory expansion is implemented in the source file `kpathsea/elt-dirs.c'.
Filename database (`ls-R')
Kpathsea goes to some lengths to minimize disk accesses for searches (see section 3.3.6 Subdirectory expansion). Nevertheless, at installations with enough directories, searching each possible directory for a given file can take an excessively long time (depending on the speed of the disk, whether it's NFS-mounted, how patient you are, etc.).
In practice, a font tree containing the standard PostScript and PCL fonts is large enough for searching to be noticeably slow on typical systems these days. Therefore, Kpathsea can use an externally-built "database" file named `ls-R' that maps files to directories, thus avoiding the need to exhaustively search the disk.
A second database file `aliases' allows you to give additional names to the files listed in `ls-R'. This can be helpful to adapt to "8.3" filename conventions in source files.
The `ls-R' and `aliases' features are implemented in the source file `kpathsea/db.c'.
As mentioned above, you must name the main filename database `ls-R'. You can put one at the root of each TeX installation hierarchy you wish to search ($TEXMF by default); most sites have only one hierarchy. Kpathsea looks for `ls-R' files along the TEXMFDBS path, so that should presumably match the list of hierarchies.
The recommended way to create and maintain `ls-R' is to run the mktexlsr script, which is installed in `$(bindir)' (`/usr/local/bin' by default). That script goes to some trouble to follow symbolic links as necessary, etc. It's also invoked by the distributed `mktex...' scripts.
At its simplest, though, you can build `ls-R' with the command
cd /your/texmf/root && ls -LAR ./ >ls-R
presuming your ls produces the right output format (see the section below). GNU ls, for example, outputs in this format. Also presuming your ls hasn't been aliased in a system file (e.g., `/etc/profile') to something problematic, e.g., `ls --color=tty'. In that case, you will have to disable the alias before generating `ls-R'. For the precise definition of the file format, see section 3.4.3 Database format.
Regardless of whether you use the supplied script or your own, you will almost certainly want to invoke it via cron, so when you make changes in the installed files (say if you install a new LaTeX package), `ls-R' will be automatically updated.
The `-A' option to ls includes files beginning with `.' (except for `.' and `..'), such as the file `.tex' included with the LaTeX tools package. (On the other hand, directories whose names begin with `.' are always ignored.)
If your system does not support symbolic links, omit the `-L'.
`ls -LAR /your/texmf/root' will also work. But using `./' avoids embedding absolute pathnames, so the hierarchy can be easily transported. It also avoids possible trouble with automounters or other network filesystem conventions.
Kpathsea warns you if it finds an `ls-R' file, but the file does not contain any usable entries. The usual culprit is running plain `ls -R' instead of `ls -LR ./' or `ls -R /your/texmf/root'. Another possibility is some system directory name starting with a `.' (perhaps if you are using AFS); Kpathsea ignores everything under such directories.
Because the database may be out-of-date for a particular run, if a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with `!!', however, only the database will be searched for that element, never the disk. If the database does not exist, nothing will be searched. Because this can surprise users ("I see the font `foo.tfm' when I do an ls; why can't Dvips find it?"), it is not in any of the default search paths.
In some circumstances, you may wish to find a file under several names. For example, suppose a TeX document was created using a DOS system and tries to read `longtabl.sty'. But now it's being run on a Unix system, and the file has its original name, `longtable.sty'. The file won't be found. You need to give the actual file `longtable.sty' an alias `longtabl.sty'.
You can handle this by creating a file `aliases' as a companion to the `ls-R' for the hierarchy containing the file in question. (You must have an `ls-R' for the alias feature to work.)
The format of `aliases' is simple: two whitespace-separated words per line; the first is the real name (`longtable.sty'), and the second is the alias (`longtabl.sty'). These must be base filenames, with no directory components. `longtable.sty' must be in the sibling `ls-R'.
Also, blank lines and lines starting with `%' or `#' are ignored in `aliases', to allow for comments.
If a real file `longtabl.sty' exists, it is used regardless of any aliases.
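The `aliases' rules above are simple enough to sketch directly. The function names are hypothetical, `known_files' stands in for the set of base filenames listed in the sibling `ls-R', and skipping lines that do not have exactly two words is an assumption of the sketch.

```python
def parse_aliases(text, known_files):
    """Return {alias: real_name} from the text of an `aliases' file."""
    aliases = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines starting with % or # are comments.
        if not stripped or stripped[0] in "%#":
            continue
        words = stripped.split()
        if len(words) != 2:
            continue  # assumption: malformed lines are skipped
        real, alias = words
        # The real name must be present in the sibling `ls-R'.
        if real in known_files:
            aliases[alias] = real
    return aliases

def resolve(name, aliases, known_files):
    """Map a requested name to a real file; real files win over aliases."""
    if name in known_files:
        return name
    return aliases.get(name)
```

A request for `longtabl.sty' would thus resolve to `longtable.sty' unless a real `longtabl.sty' is also present.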
The "database" read by Kpathsea is a line-oriented file of plain text. The format is that generated by GNU (and most other) ls programs given the `-R' option, as follows.
For example, here's the first few lines of `ls-R' (which totals about 30K bytes) on my system:
bibtex
dvips
fonts
ls-R
metafont
metapost
tex
web2c

./bibtex:
bib
bst
doc

./bibtex/bib:
asi.bib
btxdoc.bib
...
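A reader for this kind of listing can be sketched as below. The rules it applies are assumptions read off the sample rather than a statement of what `kpathsea/db.c' does: a line that starts with `/', `./' or `../' and ends with a colon names a directory, other non-blank lines name entries in the most recent directory, and names that appear before any directory line are skipped. The function name is hypothetical.

```python
def parse_ls_r(text):
    """Map each filename to the list of directories that contain it."""
    entries = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate the directory groups
        if line.endswith(":") and line.startswith(("/", "./", "../")):
            current = line[:-1]  # drop the trailing colon
        elif current is not None:
            entries.setdefault(line, []).append(current)
        # Names seen before any directory line are skipped here.
    return entries
```

For the sample above, looking up `asi.bib' in the result gives `./bibtex/bib'.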
kpsewhich: Standalone path searching
The Kpsewhich program exercises the path searching functionality independent of any particular application. This can also be useful as a sort of find program to locate files in your TeX hierarchies, perhaps in administrative scripts. It is used heavily in the distributed `mktex...' scripts.
Synopsis:
kpsewhich option... filename...
The options and filename(s) to look up can be intermixed. Options can start with either `-' or `--', and any unambiguous abbreviation is accepted.
Kpsewhich looks up each non-option argument on the command line as a filename, and returns the first file found. There is no option to return all the files with a particular name (you can run the Unix `find' utility for that, see section `Invoking find' in GNU find utilities).
Various options alter the path searching behavior:
gf: gf
pk: pk
bitmap font
afm: .afm
base: .base
bib: .bib
bst: .bst
cnf: .cnf
ls-R: ls-R
fmt: .fmt
map: .map
mem: .mem
mf: .mf
mfpool: .pool
mft: .mft
mp: .mp
mppool: .pool
MetaPost support
ocp: .ocp
ofm: .ofm .tfm
opl: .opl
otp: .otp
ovf: .ovf
ovp: .ovp
graphic/figure: .eps .epsi
tex: .tex
TeX system documentation
texpool: .pool
TeX system sources
PostScript header/font: .pro
Troff fonts
tfm: .tfm
type1 fonts: .pfa .pfb
vf: .vf
dvips config
ist: .ist
truetype fonts: .ttf .ttc
type42 fonts
web2c files
other text files
other binary files
This option and `--path' are mutually exclusive.
Kpsewhich provides some additional features not strictly related to path lookup:
Kpsewhich accepts the standard GNU options: