You can use the wrapper program makeindex4. It tries to do its very best to make xindy behave as makeindex does. But if you ever need some of the features of xindy, you should learn how to write an index style. It's easy!
Congratulations! You have made a good decision, indeed.
For a first impression of how a style file can be written from scratch, refer to the tutorial that comes with this distribution. It is written as a guided, step-by-step exercise, and you can learn the basic concepts quite easily.
Afterwards, the best starting point is to make a copy of the template file that contains all the commands needed to produce a makeindex-like index. You can find it in the subdirectory markup/tex of the module library. Starting from this template, you can remove or add commands as necessary.
Additionally, consult the library of predefined index style modules that comes with this distribution. Solutions for most of the typical problems can be found there, such as a module for case-insensitive sorting or a typical TeX-like markup. Most of the time it is enough to include some of these modules and add a few additional commands.
The examples that come with the test suite may also show how unusual index style files can be written.
Thus, there are many ways to learn how to write an index style file. It is not hard, and after some experience you can produce indexes your friends will be jealous of.
makeindex4?
Process your document as usual. Then run makeindex4 on the index file. It produces an index that should be identical to the one you would get with an ordinary makeindex run. As long as you are satisfied with the default behaviour of makeindex, makeindex4 will produce comparable results.
Some of the command-line options of makeindex are accepted by makeindex4, others aren't. This may change in future releases, but we recommend switching to plain xindy after a period of familiarization, since not all of its features can be used through makeindex4. You will be informed about unsupported command-line arguments when running makeindex4.
If you have written special style files for makeindex, they will no longer work with makeindex4. Go ahead and write a new style file for xindy.
makeindex works fine!
With the International MakeIndex project, Joachim Schrod and Gabor Herr [3, 6] have shown that adding extensions to makeindex is a difficult job. Thus we have decided to develop a new indexing tool from scratch. The new tool is based on a new requirements analysis and offers very interesting features for processing very complex indexing schemes. The resulting index model is described in [5].
makeindex wizard. What does makeindex doesn't?
Here are the most important differences between xindy and makeindex:
(sort-rule "ä" "ae")
defining that a word containing the umlaut-a will be sorted as if it contained the letters ae instead. This is one way the umlaut-a is sorted in German indexes. With an appropriate set of rules, one can express the complete sorting rules of a specific language.
makeindex is able to recognize and process arabic numbers, roman numerals and letter-based alphabets as specifiers for the indexed location. Simple composite structures of these can also be processed. This implicit recognition scheme has been dropped completely in favour of a well-defined and very powerful declaration scheme called location classes. Thus, xindy initially does not know any location class and must be instructed to accept certain location classes. A typical declaration might look like:
(define-location-class "page-numbers" ("arabic-numbers"))
This declares that page numbers consist of the enumeration of the arabic numbers. Such enumerations are referred to as alphabets. Users may use the predefined alphabets (arabic numbers, roman numerals, etc.) or define new alphabets as needed. See the tutorial that comes with this distribution for some examples.
With makeindex one can assign markup to each index entry using encapsulators (usually following the vertical bar in an index entry command). For example, in the specification
\index{xindy|bold}
the encapsulator is bold, which encapsulates the page numbers in the markup phase. An additional TeX macro must be supplied to apply this markup to the page number. This concept has been dropped completely in xindy in favour of a more powerful scheme called attributes. Attributes can be used (i) to define grouping and ordering rules for locations and (ii) to define markup tags for the document preparation system.
As a result of this design decision, the user is required to define the attributes in the style file and not in the document preparation system. The reason lies in xindy's more powerful markup scheme, which can no longer be specified in the document processor. In fact, the makeindex-like markup is only a small subset of xindy's features.
Cross-references were implemented in makeindex with the encapsulation mechanism, which otherwise served only markup purposes. These two concerns have been completely separated in xindy: here we distinguish cleanly between attributes and cross references. This makes it possible to implement checked cross references, i.e. cross references that can be checked to point to an existing index entry instead of somewhere ``behind-the-moon''.
xindy has dropped command-line options in favour of a well-defined index style description language. Thus, options that could be activated on the command line of makeindex must now be specified in the index style file. This sounds more restrictive than it is, because index style files can be composed from several modules, which makes it possible to write style files in just a few lines of code.
The parser, which is built into makeindex, has been completely separated from the core xindy system. xindy understands a well-defined specification language for the raw index that is completely different from the makeindex format but, in our opinion, more maintainable. This requires a separate filter that transforms arbitrary indexes into the xindy format. An example filter is the program tex2xindy that comes with this distribution.
In summary, some of the implicit assumptions made by makeindex have been removed, and the corresponding decisions are now left to the user. The reason is that many of makeindex's assumptions were no longer valid in multi-language environments or documents with arbitrary location structures. This also characterizes xindy as a framework rather than an end-user-tailored product. Note that writing an appropriate index style is an essential part of the document preparation process and should be done anew for each document.
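The checked cross references mentioned in the list of differences above can be illustrated with a minimal Python sketch. It is not xindy code: the function `unresolved_crossrefs` and the sample data are hypothetical, and index entries are identified simply by their keyword strings.

```python
def unresolved_crossrefs(entries, crossrefs):
    """Return the (source, target) pairs whose target is not an existing index entry."""
    known = set(entries)
    return [(src, tgt) for src, tgt in crossrefs if tgt not in known]


entries = ["foo-bar", "baz", "xindy"]
crossrefs = [("foo-bar", "baz"), ("xindy", "behind-the-moon")]
print(unresolved_crossrefs(entries, crossrefs))  # [('xindy', 'behind-the-moon')]
```

A check like this is only possible because cross references are kept apart from attributes; with makeindex-style encapsulators the target would be just another markup argument.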
tex2xindy?
tex2xindy is a filter that parses `.idx' or similar files and converts the \indexentry macros into a form readable by xindy.
The parser of makeindex can be configured to recognize different quoting characters, etc. (see the makeindex man page, section input style specifiers, for further details). We tried to extract the parser from makeindex, but due to several problems we finally rewrote it using lex. Scanners written with lex are usually fixed to the specific character set used in their regular expressions, so our parser, tex2xindy, is not configurable. If you use a different configuration of the makeindex input style specifiers, you can change the source (tex2xindy.l) to generate a completely new parser. In our experience we have rarely used more than two different parsers in practice, so we have written tex2xindy in an easily maintainable form: the input specifiers are stored symbolically in the source. The definition section looks like this:
KEYWORD \\indexentry
ENCAP \|
ACTUAL @
ESCAPE \\
LEVEL !
QUOTE \"
ROPEN \(
RCLOSE \)
ARGOPEN \{
ARGCLOSE \}
These definitions correspond to the input style specifiers described in the makeindex man page. Changing this section according to your needs and recompiling tex2xindy should be an easy task. We may include more predefined parsers in future releases if necessary.
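To show how these specifiers drive the parse of a single entry, here is a rough Python sketch. It is not the lex scanner in tex2xindy.l: the functions `read_arg`, `tokens` and `parse_indexentry` are hypothetical, only KEYWORD, ENCAP, ACTUAL, ESCAPE, LEVEL, QUOTE, ARGOPEN and ARGCLOSE are modelled (ROPEN/RCLOSE are simply left in the encapsulator text), and the escape/quote handling is a simplified reading of the makeindex conventions.

```python
# Input style specifiers, mirroring the definition section of tex2xindy.l.
KEYWORD = "\\indexentry"
ENCAP, ACTUAL, ESCAPE, LEVEL, QUOTE = "|", "@", "\\", "!", '"'
ARGOPEN, ARGCLOSE = "{", "}"


def read_arg(text, pos):
    """Return (argument, next_pos) for the brace-delimited argument at text[pos]."""
    if text[pos] != ARGOPEN:
        raise ValueError(f"expected {ARGOPEN!r} at position {pos}")
    depth, start = 0, pos
    while pos < len(text):
        ch = text[pos]
        if ch in (ESCAPE, QUOTE):
            pos += 2  # the next character is protected, even if it is a brace
            continue
        if ch == ARGOPEN:
            depth += 1
        elif ch == ARGCLOSE:
            depth -= 1
            if depth == 0:
                return text[start + 1:pos], pos + 1
        pos += 1
    raise ValueError("unbalanced braces")


def tokens(key):
    """Yield (is_specifier, text); only unquoted specifiers outside braces count."""
    depth, i = 0, 0
    while i < len(key):
        ch = key[i]
        if ch == QUOTE and i + 1 < len(key):
            yield False, key[i + 1]  # the quote is dropped, the next char is literal
            i += 2
            continue
        if ch == ESCAPE and i + 1 < len(key):
            yield False, key[i:i + 2]  # escape sequences such as \b are kept verbatim
            i += 2
            continue
        if ch == ARGOPEN:
            depth += 1
        elif ch == ARGCLOSE:
            depth -= 1
        yield depth == 0 and ch in (LEVEL, ACTUAL, ENCAP), ch
        i += 1


def parse_indexentry(line):
    """Split an \\indexentry{key}{location} line into (levels, encap, location)."""
    if not line.startswith(KEYWORD):
        raise ValueError("not an index entry")
    key, pos = read_arg(line, len(KEYWORD))
    location, _ = read_arg(line, pos)
    levels, encap = [[""]], None  # each level: [sort key] or [sort key, actual key]
    for special, text in tokens(key):
        if encap is not None:
            encap += text  # everything after ENCAP belongs to the encapsulator
        elif special and text == ENCAP:
            encap = ""
        elif special and text == LEVEL:
            levels.append([""])
        elif special and text == ACTUAL:
            levels[-1].append("")
        else:
            levels[-1][-1] += text
    return [tuple(level) for level in levels], encap, location


print(parse_indexentry(r"\indexentry{green!-eyed@\bf{-eyed}|textit}{12}"))
# ([('green',), ('-eyed', '\\bf{-eyed}')], 'textit', '12')
```

Changing a specifier here means editing one constant, which is the same maintainability argument made for the symbolic definitions in tex2xindy.l.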
Copy the file tex/makeidx.xdy from the library to your local directory. It is documented in a way that should make it easy to add new commands or to remove or modify others.
makeindex4?
This job is now done automatically by makeindex4. It calls tex2xindy to transform the raw index into the format suitable for xindy. tex2xindy writes some information about the attributes (aka. encapsulators in makeindex) and the usage of cross-references into a file with the extension `.sta'. The makeindex4 program, written in Perl, parses this statistics file and automatically generates the index style commands presented above, including the required declarations of all attributes used in the index and their markup.
Another problem is the automatic detection of cross-references. As noted above, makeindex handles cross-references with its encapsulation mechanism, a scheme that has been dropped in xindy and replaced by a more powerful mechanism. To implement a simple plug-in mechanism, we have extended the syntax of the tex2xindy filter to identify encapsulators of the form
\indexentry{...|encap{...}}{...}
as cross-references, whereas encapsulators of the form
\indexentry{...|encap}{...}
are treated as ordinary attributes. The first form is standard practice for defining cross-references in makeindex. Thus, unlike makeindex, tex2xindy distinguishes these two forms of encapsulators, and our plug-in makeindex4 also generates the appropriate definitions of the cross-reference classes.
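The distinction between the two encapsulator forms can be sketched in Python as follows. The function names are hypothetical, the regular expression assumes encapsulator names consist of letters only (with an optional leading backslash), and the generated lines merely mimic the (define-crossref-class "...") syntax; the actual output of makeindex4 may differ.

```python
import re

# "name{target}" is treated as a cross reference, a plain "name" as an attribute.
CROSSREF = re.compile(r"\\?([A-Za-z]+)\{(.*)\}")


def classify_encap(encap):
    """Return ('crossref', class, target) or ('attribute', name, None)."""
    m = CROSSREF.fullmatch(encap)
    if m:
        return "crossref", m.group(1), m.group(2)
    return "attribute", encap, None


def crossref_class_declarations(encaps):
    """Emit one define-crossref-class line per cross-reference class that occurs."""
    classes = sorted({c[1] for c in map(classify_encap, encaps) if c[0] == "crossref"})
    return [f'(define-crossref-class "{name}")' for name in classes]


print(classify_encap("see{baz}"))  # ('crossref', 'see', 'baz')
print(classify_encap("textbf"))    # ('attribute', 'textbf', None)
print(crossref_class_declarations(["see{baz}", "textbf", "seealso{foo}"]))
```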
makeindex with
The treatment of the actual key (usually denoted with @, the at-sign) has changed with xindy. In makeindex, index entries with specific markup are specified with the actual key. The makeindex-3 system and xindy offer merge and sort rules to transform a key into different representations, reducing the need to specify an actual key. For example, they support the style of writing
\index{\bf{VIP}}
which can be transformed with a rule like
(merge-rule "\bf{\(.*\)}" "\1" :again :bregexp)
which removes the macro for merging and sorting keywords but keeps the original form for markup purposes. Therefore we don't need an actual key for each keyword written in boldface.
The makeindex behaviour that the two keywords
\index{VIP}
\index{VIP@\bf{VIP}}
are seen as two distinct index entries can be simulated using the following definition:
(merge-rule "\bf{\(.*\)}" "\1~e" :again :bregexp)
This rule tells xindy to remove the boldface macro for merging and sorting purposes, but defines the replacement to include the special character ~e, which is the last character in the alphabet (ISO-Latin in our case). This makes xindy treat them as different keywords and positions the boldface keyword right after the one without any markup. Thus we get the following mapping:
Keyword       Merged and sorted as    Appears in the index as
VIP           VIP                     VIP
\bf{VIP}      VIP~e                   \bf{VIP}
With this new style of writing keywords and defining their markup, there is no longer any need to specify the print key (aka. actual key) explicitly, which convinced us to drop the makeindex way of defining keywords.
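The effect of the two merge rules on the VIP example can be modelled in Python. This is a sketch, not xindy: it assumes ~e can be represented by U+00FF (the last ISO-Latin-1 character), ignores the :again flag, and uses the merge key both to decide equality and to order entries because no sort rules are involved; all function names are hypothetical.

```python
import re

# Assumption: "~e" is modelled as U+00FF, the last ISO-Latin-1 character.
TILDE_E = "\u00ff"
BF = re.compile(r"\\bf\{(.*)\}")


def merge_strip(key):
    # Counterpart of the "\1" rule: drop the \bf macro entirely.
    return BF.sub(r"\1", key)


def merge_keep_distinct(key):
    # Counterpart of the "\1~e" rule: drop \bf but append ~e.
    return BF.sub(lambda m: m.group(1) + TILDE_E, key)


def index_entries(keywords, merge):
    """Group keywords with equal merge keys and order the groups by merge key."""
    groups = {}
    for kw in keywords:
        groups.setdefault(merge(kw), []).append(kw)
    return [groups[k] for k in sorted(groups)]


keywords = ["VIP", r"\bf{VIP}"]
print(index_entries(keywords, merge_strip))          # [['VIP', '\\bf{VIP}']]
print(index_entries(keywords, merge_keep_distinct))  # [['VIP'], ['\\bf{VIP}']]
```

With the first rule both keywords collapse into one entry; with the second they stay separate, and the boldface one sorts right after the plain one.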
What makes makeindex hardly usable in non-English-speaking countries is its lack of support for language-specific alphabets and sort orderings. For example, many Romance languages such as Italian, French, Portuguese or Spanish contain accented letters such as À, Á, ñ. Other languages from northern Europe have letters like Ä, Ø, æ or ß, which many index processors cannot even process, let alone sort correctly into an index.
Two problems must be solved when processing indexes in a new language:
The xindy system can be configured to process these alphabets by defining sort and merge rules that allow expressing language specific rules.
The keyword mappings are as follows: the merge key is generated from the main key with the so-called merge mapping, which can be specified with the command merge-rule. The sort key is derived from the merge key using the sort mapping, specified with the sort-rule command. The following scheme shows this mapping process:
The index style commands accomplishing this task are sort-rule and merge-rule. One example of such a rule would be
(sort-rule "ä" "ae")
defining that a word containing the umlaut-a will be sorted as if it contained the letters ae instead. This is one way the umlaut-a (ä) is sorted in German indexes. With an appropriate set of rules, one can express the complete sorting rules of a specific language.
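The two-stage mapping (main key to merge key, merge key to sort key) can be sketched in Python. The rule tables and helpers are hypothetical, rules are treated as plain substring replacements rather than xindy's rule engine, and comparison is by Unicode code point.

```python
# Hypothetical rule tables; each rule is a plain (substring, replacement) pair.
MERGE_RULES = []             # no merge rules are needed for this example
SORT_RULES = [("ä", "ae")]   # counterpart of (sort-rule "ä" "ae")


def apply_rules(key, rules):
    for pattern, replacement in rules:
        key = key.replace(pattern, replacement)
    return key


def keys(main_key):
    merge_key = apply_rules(main_key, MERGE_RULES)  # decides which entries are merged
    sort_key = apply_rules(merge_key, SORT_RULES)   # decides where an entry is placed
    return merge_key, sort_key


words = ["Bär", "Bad", "Baf"]
print(sorted(words))                            # ['Bad', 'Baf', 'Bär'] (code-point order)
print(sorted(words, key=lambda w: keys(w)[1]))  # ['Bad', 'Bär', 'Baf']
```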
An appropriate mapping for some of the Romance languages could look like this:
(sort-rule "à" "a")
(sort-rule "á" "a")
(sort-rule "ã" "a")
(sort-rule "è" "e")
(sort-rule "é" "e")
(sort-rule "ç" "c")
This sorts the accented letters as their unaccented counterparts, yielding the desired sort order.
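A small Python model of these six rules, applied character by character, shows the difference from plain code-point order. The dictionary and `sort_key` are illustrative only, not part of xindy.

```python
# Counterparts of the six sort rules above, applied character by character.
SORT_RULES = {"à": "a", "á": "a", "ã": "a", "è": "e", "é": "e", "ç": "c"}


def sort_key(word):
    return "".join(SORT_RULES.get(ch, ch) for ch in word)


words = ["élan", "eau", "fin", "ça", "cz"]
print(sorted(words))                # ['cz', 'eau', 'fin', 'ça', 'élan']
print(sorted(words, key=sort_key))  # ['ça', 'cz', 'eau', 'élan', 'fin']
```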
Sometimes it is necessary to specify keyword mappings that tell the system to put something behind something else. For instance, we'd like to place the character ö right after the letter o. This is no problem if you use the special characters ~b and ~e, which are called the beginning and ending characters. The former lexicographically precedes all other letters, whereas the latter comes after all others. Our mapping problem can now be specified as follows:
(sort-rule "ö" "o~e")
Now ö is positioned directly after o but before p.
See the manual for a detailed description of this feature. Note also that keyword mappings can be specified with regular expressions. Rules of the form
(merge-rule "[-$()]" "")
are possible. This one removes all characters of the given character class. Regular expression substitutions are possible as well; refer to the manual for an exact description.
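Both rules can be imitated in Python to see their effect. This sketch assumes ~e may be represented by U+00FF, the last ISO-Latin-1 character, with comparison by code point; `sort_key` and `merge_key` are hypothetical helpers, not xindy functions.

```python
import re

# Assumption: "~e" is modelled as U+00FF, the last ISO-Latin-1 character.
TILDE_E = "\u00ff"


def sort_key(word):
    return word.replace("ö", "o" + TILDE_E)  # counterpart of (sort-rule "ö" "o~e")


def merge_key(key):
    return re.sub(r"[-$()]", "", key)  # counterpart of (merge-rule "[-$()]" "")


print(sorted(["pa", "öl", "oz", "ob"], key=sort_key))  # ['ob', 'oz', 'öl', 'pa']
print(merge_key("f(x)"), merge_key("$fx$"))             # fx fx
```

The ö-word lands after every word starting with o and before p, and the two differently punctuated keys merge into the same entry.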
The default sort ordering sorts letters according to their ordinal number in the ISO Latin alphabet. As a consequence, all uppercase letters appear before all lowercase letters. To sort them case-insensitively, use the command
(require "lang/latin/caseisrt.xdy")
This module defines the appropriate sort rules for the letters `A-Z' in Latin-based alphabets. If your language has more letters, simply add the missing ones to your style file. Have a look at the module to see how the sort rules are defined.
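In plain code-point order, as in ISO Latin-1 and ASCII, the uppercase letters (codes 65 to 90) come before the lowercase letters (97 to 122). The following Python comparison, not xindy itself, shows the effect and a case-insensitive alternative; how ties such as apple/Apple are resolved here is Python's stable sort, not necessarily what caseisrt.xdy does.

```python
words = ["apple", "Zebra", "banana", "Apple"]

# Plain code-point order: uppercase 'A'-'Z' sort before lowercase 'a'-'z'.
print(sorted(words))                 # ['Apple', 'Zebra', 'apple', 'banana']

# Case-insensitive order; ties keep their input order (Python's sort is stable).
print(sorted(words, key=str.lower))  # ['apple', 'Apple', 'banana', 'Zebra']
```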
Letter groups for Latin-based alphabets can be defined with the command
(require "lang/latin/letgroup.xdy")
If your language needs additional letter groups, you can insert them among the previously defined letter groups with definitions of the following form:
(define-letter-group "ly" :after "l" :before "m")
(define-letter-group "ny" :after "n" :before "o")
This adds two more letter groups to the Latin alphabet: group ly is inserted between l and m, and ny between n and o. This is how two additional letters of the Hungarian alphabet can be inserted.
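One plausible way to assign words to such groups is longest-prefix matching against the ordered group list. The Python sketch below assumes this rule and a hand-written group list; it illustrates the idea and is not how xindy is implemented.

```python
# Letter groups in index order, with "ly" and "ny" inserted as in the two
# define-letter-group commands above. The list is an assumption for this sketch.
GROUPS = list("abcdefghijkl") + ["ly"] + list("mn") + ["ny"] + list("opqrstuvwxyz")


def letter_group(sort_key):
    """Return the longest group name that is a prefix of sort_key, or None."""
    matches = [g for g in GROUPS if sort_key.startswith(g)]
    return max(matches, key=len) if matches else None


print([letter_group(w) for w in ["lab", "lyuk", "macska", "nap", "nyak"]])
# ['l', 'ly', 'm', 'n', 'ny']
```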
Assume you have index entries containing arbitrary formatting information. For example, you write your index entries in TeX in the following form:
\index{\bf{In boldface please}}
Instead of specifying the print key separately for each index entry, as can be done with the following command
\index{In boldface please@\bf{In boldface please}}
you can define a rule that does this task for you:
(merge-rule "\\bf *{(.*)}" "\1" :eregexp :again)
This extended regular expression matches all strings that are enclosed in this formatting command, and in the merge phase the formatting command is simply stripped off. Thus, you no longer need to write an explicit print key.
If for some reason the same word appears more than once in the index, each time with a different markup tag, as in the following example
index
{\tt index}
be warned that a rule like
(merge-rule "{\\tt *(.*)}" "\1" :eregexp :again)
is probably not correct. In this case both strings above are mapped to the string index, thus joining their location references into one index entry. This happens because the result of the merge mapping is used as the equality criterion, which views both keywords as equal. To avoid this, you should specify instead
(merge-rule "{\\tt *(.*)}" "\1~e" :eregexp :again)
With the additional meta character ~e, the substitution of the second keyword is placed after the first one, making them different index entries. If the second keyword should appear first, use ~b instead.
Especially in hierarchical indexes, the result is sometimes not as expected because of special characters appearing in the keyword. In the following example the word `card' should appear before `-eyed', since the hyphen should not count as an ordinary character for sorting purposes.
green
-eyed 12
card 15
This is especially problematic if the list of words on the second level is very long. To have the hyphen simply ignored during sorting, specify the following command in the index style:
(sort-rule "-" "")
This makes `-eyed' be sorted as `eyed', so that it appears after `card' as desired.
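The effect of this rule can be reproduced with a Python sort key that deletes hyphens; `sort_key` is an illustrative stand-in for the sort rule, not an xindy function.

```python
words = ["-eyed", "card"]


def sort_key(word):
    return word.replace("-", "")  # counterpart of (sort-rule "-" "")


print(sorted(words))                # ['-eyed', 'card']  ('-' is U+002D, before letters)
print(sorted(words, key=sort_key))  # ['card', '-eyed']
```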
According to the Chicago Manual of Style there are two different schemes for sorting word lists. In word ordering a blank precedes any letter in the alphabet, whereas in letter ordering it does not count at all. The following example, borrowed from the makeindex man page, illustrates the difference:
Word Order:    Letter Order:
sea lion       seal
seal           sea lion
By default, xindy uses word ordering. To use letter ordering include the appropriate module with the following command:
(require "ord/letorder.xdy")
It actually defines the following command:
(sort-rule " " "")
This simply removes all blanks from the keyword resulting in the desired behaviour.
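The two orderings can be reproduced in Python: plain code-point comparison already gives word ordering (a blank, U+0020, sorts before every letter), and deleting blanks in the sort key gives letter ordering. The helper name is illustrative only.

```python
words = ["seal", "sea lion"]


def letter_order_key(word):
    return word.replace(" ", "")  # counterpart of (sort-rule " " "")


print(sorted(words))                        # word order:   ['sea lion', 'seal']
print(sorted(words, key=letter_order_key))  # letter order: ['seal', 'sea lion']
```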
The ability to deal with user-definable location structures is one of the most important new features of xindy. Many documents have a structure that is not based on page numbers. Typical examples are manuals or appendices with a chapter/page-per-chapter numbering scheme, URLs, Bible verses, etc. One can even imagine the Greek alphabet appearing in a location reference. In our analysis we found many interesting examples of location references that led us to develop the concept of location classes.
A location class is defined by a sequence of alphabets. An alphabet can be the set of arabic numbers (0, 1, 2, ...) or the roman numerals (i, ii, iii, ...). These are built-in alphabets in xindy. Additionally, one can define more alphabets in the index style with a command like
(define-alphabet "weekdays" ("mon" "tue" "wed" "thu" "fri" "sat" "sun"))
Based on alphabets one can now compose a location class as follows:
(define-location-class "weekday-hours" ("weekdays" :sep ":" "arabic-numbers"))
This class description indicates that all location references matching this template are viewed as correct instances of this class. Here :sep makes the colon serve as the separator between the alphabets. Example instances of this class are:
mon:23, thu:45, sun:17
For more detailed information, consult the description of the command define-location-class in the reference manual.
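As a rough illustration, recognizing and ordering instances of this class can be modelled in Python. The sketch assumes that a reference is an instance when each part belongs to its alphabet and that references are ordered by their positions within the alphabets; `parse_weekday_hours` is hypothetical, and xindy's own matching may differ in details.

```python
import re

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]  # the "weekdays" alphabet


def parse_weekday_hours(ref, sep=":"):
    """Return (weekday position, number) if ref is an instance of the class, else None."""
    day, found, number = ref.partition(sep)
    if not found or day not in WEEKDAYS or not re.fullmatch(r"[0-9]+", number):
        return None
    return WEEKDAYS.index(day), int(number)


refs = ["sun:17", "mon:23", "thu:45", "mon:7"]
print(sorted(refs, key=parse_weekday_hours))  # ['mon:7', 'mon:23', 'thu:45', 'sun:17']
print(parse_weekday_hours("xyz:1"))           # None
```

Ordering by alphabet position puts mon before thu and sun, which plain string comparison would not do.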
By default, xindy joins three or more successive location references into a range. Ranges are used as an abbreviation for a long sequence of location references. For example, the sequence
12, 13, 14, 15, 16
can be represented more compactly as
12-16
If you don't want ranges, simply define your location class in the form
(define-location-class ... :min-range-length none)
The argument :min-range-length none suppresses the formation of ranges. A number instead of none defines the minimum length of a sequence of location references needed to form a range. xindy's default value is 2.
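Range formation can be sketched in Python. Here the length of a run is assumed to be last - first, which fits the f./ff. convention described below (a range of length 1 covers two consecutive pages) and makes the default of 2 join three successive pages; `form_ranges` is hypothetical, and None plays the role of :min-range-length none.

```python
def form_ranges(pages, min_range_length=2):
    """Turn page numbers into single pages and (first, last) range tuples."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][-1] + 1:
            runs[-1].append(page)  # extend the current run of consecutive pages
        else:
            runs.append([page])    # start a new run
    result = []
    for run in runs:
        if min_range_length is not None and run[-1] - run[0] >= min_range_length:
            result.append((run[0], run[-1]))
        else:
            result.extend(run)     # too short (or ranges disabled): list pages singly
    return result


print(form_ranges([12, 13, 14, 15, 16]))        # [(12, 16)]
print(form_ranges([3, 4, 12, 13, 14]))          # [3, 4, (12, 14)]
print(form_ranges([12, 13, 14, 15, 16], None))  # [12, 13, 14, 15, 16]
```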
A common way of tagging ranges is as follows: a range of length 1 is printed with the starting page number and the suffix `f.', those of length 2 with suffix `ff.', and all others in the form `X--Y'.
Assuming we want to do this for the location class pagenums, we can specify the markup as follows:
(markup-range :class "pagenums" :close "f." :length 1 :ignore-end)
(markup-range :class "pagenums" :close "ff." :length 2 :ignore-end)
(markup-range :class "pagenums" :sep "--")
The first command indicates that a range (X,Y) of length 1 should be printed in the form Xf., the second that a range of length 2 should be printed as Xff., and the third that all others should be printed in the form X--Y. The switch :ignore-end suppresses the end-of-range location reference Y in the resulting output.
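The three markup-range commands amount to a small case distinction, modelled below in Python. `markup_range` is a hypothetical helper, and the range length is again taken as Y - X.

```python
def markup_range(first, last):
    """Render a range (first, last) following the three markup-range commands."""
    length = last - first
    if length == 1:
        return f"{first}f."     # :close "f." with :ignore-end
    if length == 2:
        return f"{first}ff."    # :close "ff." with :ignore-end
    return f"{first}--{last}"   # :sep "--"


print([markup_range(*r) for r in [(12, 13), (12, 14), (12, 16)]])
# ['12f.', '12ff.', '12--16']
```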
Sometimes it is necessary to hide some parts of the index. If you have a text formatter that allows comments or macros that expand to nothing, just define appropriate markup that makes things invisible to the formatter. For example, with TeX you can define a macro like this:
\def\ignore#1{}
If you additionally define markup like this
(markup-index :open "\ignore{" :close "}")
you can throw away the complete index if you like, which would be a real pity!
Cross references are references pointing to an item in the index itself. A typical example is:
foo-bar see baz
With makeindex, cross references could be specified with the encapsulation mechanism. This has been removed completely in xindy, where cross references are real first-class objects.
In xindy one can declare different cross reference classes, whose purpose is (a) to make all instances of a certain class appear next to each other, and (b) to specify appropriate markup with them.
tex2xindy recognises all index entries of the form
\index{...|\macro{where}}
as cross references. Here macro stands for an arbitrary macro name, and where is interpreted as the target keyword of the cross reference.
If you want to use these cross references with xindy, add the following line to your style file:
(define-crossref-class "macro")
Additionally, you can assign specific markup to cross references using the markup-crossref commands.