Converters from PC Textprocessors to LaTeX - Overview

Switch conversion direction: From LaTeX to PC

Author: Wilfried Hennings, Forschungszentrum (Research Center) Jülich GmbH
last update: Feb. 8, 2000
Although this page resides on the official WWW server of Forschungszentrum Jülich GmbH, it is NOT officially supported by Forschungszentrum Jülich but results from my personal work.
The URL of this page is http://www.fz-juelich.de/isr/1/texconv/pctotex.html
NOTE: The URL will change soon, but the new one has not yet been decided.

I maintain these pages because I need converters between LaTeX and PC word processors for my work, and I want to share the information with others who need it. The pages have grown significantly and, because I maintain them in my spare time, I can no longer maintain a text version and a German version in parallel, as I did for the previous version.

This list is only as good as the support it receives, and I need YOUR support to update and supplement it. Please send additions if you know of more or better converters. There are more converters on the CTAN sites, but the following seem to be the most promising for conversion to and from the current versions of word processors.

Neither correctness nor completeness is guaranteed.
All opinions mentioned (if any) are my own, not my employer's. Please send corrections, enhancements and supplements to the following address:
W.Hennings@fz-juelich.de

Note that this FAQ list contains information ONLY about converters between LaTeX and PC word processors. Converters to and from other formats may have their own FAQ lists, e.g. see the link for converters to and from HTML.


General Remarks

Before looking for a converter, stop and think about a fundamental question:

What do you want converted, and in what way?

Do you want to convert the document structure, i.e. a heading should remain a heading, a list should remain a list etc., no matter what it looks like in the target format?
Or do you want to convert the appearance, i.e. what it looks like, no matter how it is represented in the target format?
Or do you want a mixture of both?
To use SGML as an intermediate format, you would have to specify the translation rules yourself (as far as I understand). This makes sense, and it explains why different people have very different opinions about which converter best fits their needs: they simply have different demands and expectations regarding what should be converted and how.
So it is not just that, in practice, no converter is good for everyone and every purpose; such a converter is impossible in principle, because there are no well-defined requirements that a converter should meet.

So keep this in mind when looking through the following list of converters; try them yourself and decide what you need.

Principal problems of wordprocessor to LaTeX conversion

One advantage of LaTeX is that it forces you to structure a document, whereas wordprocessors like Word/WordPerfect allow unstructured documents. It is hardly possible to automatically structure a document where there was no structure before.

It is nevertheless possible to write a structured document with a wordprocessor by consistently using styles. Wordprocessor documents that use styles can therefore be converted to LaTeX, e.g. by a macro written for the specific wordprocessor.
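
The idea of a style-based conversion can be sketched in a few lines of Python. This is only an illustration, not the method of any macro listed on this page: the style names, the mapping table and the function names are assumptions, and the input is assumed to be already available as (style name, text) pairs.

```python
# Hypothetical style names and the LaTeX sectioning commands they map to.
STYLE_TO_COMMAND = {
    "Heading 1": "section",
    "Heading 2": "subsection",
    "Heading 3": "subsubsection",
}

# Characters that are special in LaTeX and must be escaped in body text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def paragraphs_to_latex(paragraphs):
    """Convert (style_name, text) pairs to LaTeX, one block per paragraph."""
    blocks = []
    for style, text in paragraphs:
        body = escape_latex(text)
        command = STYLE_TO_COMMAND.get(style)
        if command is not None:
            blocks.append("\\" + command + "{" + body + "}")
        else:
            # Styles without a mapping fall back to plain paragraph text.
            blocks.append(body)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(paragraphs_to_latex([("Heading 1", "Intro"), ("Normal", "Cost: $5")]))
```

A paragraph whose style has no entry in the table is emitted as plain text, which is exactly where an unstructured document loses information.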

There are several ways to convert

To illustrate these, let me restrict myself to the Microsoft Word case:

  1. Word binary format -> LaTeX
  2. RTF (Word ASCII format, use Word's own RTF export) -> LaTeX
  3. WordPerfect 5.1 format (use Word's own export) -> LaTeX
  4. HTML (use Word's internet assistant) -> LaTeX
  5. maybe other external format(s)

In the previous version I dared to recommend using HTML as an intermediate format. However, I received comments from people who had much better experiences with <insert your favorite converter here>.

Moreover, the <favorite converter> of someone else didn't work at all for me, and the other way around.

So I am sorry not to have ANY recommendation now. There is no converter that satisfies everyone's needs or works under all conditions. I am sorry, again, that you have to try for yourself.


Using a Word macro

Free:

winw2ltx: A set of macros for WinWord 2, now also available for WinWord 6 and 7 (95)

Commercial:

MathType: PC equation editor with export to LaTeX. MathType home page (USA)


Converting from Word binary format

Free:

LAOLA: LAOLA can read Word6/Word7(=95) documents under Unix and extract the text. LAOLA homepage (DE site)

word2x: Converts Word6/Word7(=95) documents to LaTeX or plain text. word2x homepage (UK site)

wvWare is a library that can read the Word6/Word7(=95), Word8(=97) and Word9(=2000) binary file format. See wvWare homepage (Ireland site). The homepage says that the wvWare library is used as the import library in the free (GPL) wordprocessor AbiWord, which runs on BeOS, several Unix variants and also Windows95/98/NT, and stores documents as XML. So if you had an XML to LaTeX converter, this would be an alternative way to convert.
Its predecessor MSWordView could only read Word8(=97) and convert Word documents into HTML, which can then be read with a browser. For more, see the old MSWordView homepage (Ireland site)

Shareware:

Word2TeX: This converter can save documents from Word6/Word7(=95) or later as LaTeX, including equation editor (!) objects and MathType objects. It works in principle (I tested it with German Winword95) but is under further development, so for now I do not give more details here. Visit its homepage.


Converting from RTF

To use an RTF converter, the wordprocessor document must first be "saved as" Rich Text Format. However, each new version of MS Word came with a new level of the RTF language, and most of the available converters cannot understand the current RTF version.
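
One rough way to judge whether a file might trouble an older converter is to list the RTF control words it uses and compare them with the ones the converter is known to handle. The Python sketch below is only an illustration under that assumption: the function names and the set of "known" control words are hypothetical, and none of the converters listed here is claimed to work this way.

```python
import re

# A control word is a backslash, letters, and an optional signed number.
# The second alternative consumes control symbols such as \\ or \{ so that
# an escaped backslash followed by letters is not mistaken for a control word.
RTF_TOKEN = re.compile(r"\\(?:([A-Za-z]+)(-?\d+)?|[^A-Za-z])")


def looks_like_rtf(data):
    """Return True if the text starts with the RTF header group."""
    return data.lstrip().startswith("{\\rtf")


def unknown_control_words(data, known):
    """Return the sorted control words in data that are not in known."""
    found = set()
    for match in RTF_TOKEN.finditer(data):
        word = match.group(1)
        if word is not None:
            found.add(word)
    return sorted(found - set(known))


if __name__ == "__main__":
    sample = "{\\rtf1\\ansi\\deff0 {\\fonttbl{\\f0 Times;}}\\pard Hello\\par}"
    # Hypothetical list of control words an older converter understands.
    old_converter = {"rtf", "ansi", "deff", "fonttbl", "f", "par"}
    print(looks_like_rtf(sample))
    print(unknown_control_words(sample, old_converter))
```

A long list of unknown control words suggests saving the document in an older format first, or trying a newer converter.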

Free:

rtflatex understands only older RTF levels

rtf2latex understands only older RTF levels. RTF utilities homepage (USA site)

w2latex understands only older RTF levels

rtf2latex2e: new (1999) version which can also read current RTF levels
rtf2latex2e homepage (USA site)

Commercial:

Scientific Word: Win95 based TeX/LaTeX system with graphical editor and rtf import capability including MS's equation editor equations. Supported rtf level:
WinWord7(95): generally yes, with a few exceptions (e.g. subscripts & superscripts);
WinWord97: not yet tested by me.
Scientific Word home page (USA)


Converting from WordPerfect format

General Remarks

Apart from Scientific Word/Workplace, which come with an equation-capable rtf-to-LaTeX converter, these are the only available converters that can handle equations. The problem for me is that Microsoft WinWord 7 (95) (I don't have other versions available) does a bad job converting equations to WordPerfect. In fact, only very simple constructs are (partially) converted, and more complex equations are not converted at all.

Free:

WP2LaTeX: converts WordPerfect 4.x / 5.x / 6.x, including equations, to LaTeX. homepage

TeXPerfect: WordPerfect 5.1 for DOS -> LaTeX Translator

Commercial:

Publishing Companion: Word/WordPerfect -> LaTeX converter, equation editor. KTALK's home page (USA)


HTML as intermediate format

There are free HTML converters for Word 6 and 7 for Windows available from Microsoft:
Download... IA for Word 6 / IA for Word 7 / IA for Word for Mac
Word 97 includes it by default but, in contrast to the previous versions, it only recognizes heading styles if they are first converted into the corresponding HTML styles.

WordPerfect 7 and up have an integrated InternetPublisher.
For WordPerfect 6.1 for Windows, the InternetPublisher is available separately:
Download... InternetPublisher for WPWin 6.1

There is also a tool for Unix which is intended to convert Word6, Word7(95) and Word8(97) binary files to HTML. See http://www.su.shuttle.de/turbo/word2html.c.gz

General Remarks

Because HTML is a structured format, the conversion between HTML and LaTeX is rather straightforward. However, the limitations of HTML compared to LaTeX remain: there are many elements in LaTeX which cannot (yet?) be represented in HTML.
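
To show why the structural mapping is straightforward, here is a minimal Python sketch built on the standard library's html.parser. It is not one of the converters listed on this page; the tag table is a deliberately small assumption, tags outside it are dropped, and whitespace between tags is passed through unchanged.

```python
from html.parser import HTMLParser

# LaTeX special characters that must be escaped in running text.
SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Hypothetical, deliberately small tag table: what to emit on open and close.
OPEN = {
    "h1": "\\section{", "h2": "\\subsection{", "h3": "\\subsubsection{",
    "ul": "\\begin{itemize}\n", "ol": "\\begin{enumerate}\n",
    "li": "\\item ", "em": "\\emph{", "strong": "\\textbf{",
}
CLOSE = {
    "h1": "}\n\n", "h2": "}\n\n", "h3": "}\n\n",
    "ul": "\\end{itemize}\n\n", "ol": "\\end{enumerate}\n\n",
    "li": "\n", "p": "\n\n", "em": "}", "strong": "}",
}


class HtmlToLatex(HTMLParser):
    """Collect LaTeX fragments while the HTML is parsed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(OPEN.get(tag, ""))  # unknown tags emit nothing

    def handle_endtag(self, tag):
        self.out.append(CLOSE.get(tag, ""))

    def handle_data(self, data):
        self.out.append("".join(SPECIALS.get(ch, ch) for ch in data))


def html_to_latex(html):
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(html_to_latex("<h1>Title</h1><p>50% <em>done</em></p>"))
```

Anything without a counterpart in the tag table (tables, images, links) is silently lost here, so a real converter needs a far larger table and explicit decisions for each such element.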

See www.w3.org for a list of converters between word processors and HTML, or see the list of converters between HTML and LaTeX (in German).

Some converters are available from CTAN ("Comprehensive TeX Archive Network"), e.g. in .../support/latex2html and .../support/html2latex.
(The ... stands for a host specific base directory, which often is either "/pub/tex" or "/tex-archive")

HTML to LaTeX

html2latex (local): Description of HTML-to-LaTeX converter
html2latex (USA site)

html2tex (NL site): Another HTML-to-LaTeX converter.


Other intermediate formats

There are ways to use SGML as an intermediate format, and others have used it successfully. Having had a quick look at it, I found it rather complicated; in particular, it seems that you have to define the translation rules yourself. So I did not put more effort into trying to use it. If anyone can give a ready-to-use cookbook solution, I will include it here.


Converting from FrameMaker

FrameMaker Utilities (UK site): Contains converters for both directions (LaTeX <-> FrameMaker) as well as templates which make conversion from FrameMaker to LaTeX easier


Converting from NotaBene

NB4LATEX: converts files from NotaBene4 (including ancient Greek and all the symbols of logic) to LaTeX2e format. homepage


Converting from Excel

Excel macro to convert Excel to LaTeX: http://www.jam-software.com/software.html

The generated LaTeX code uses the tabular environment: http://www.hsh.no/~ag/tabular/
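
For readers unfamiliar with the tabular environment, the following Python sketch shows the general shape of such output. It is not the output of the Excel macro above; the function names and the default column alignment are assumptions made for illustration.

```python
# LaTeX special characters that must be escaped inside table cells.
SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(SPECIALS.get(ch, ch) for ch in text)


def rows_to_tabular(rows, align="l"):
    """Render a list of rows (lists of cell values) as a LaTeX tabular."""
    if not rows:
        raise ValueError("at least one row is required")
    ncols = max(len(row) for row in rows)
    lines = ["\\begin{tabular}{" + align * ncols + "}"]
    for row in rows:
        cells = [escape_latex(str(cell)) for cell in row]
        cells += [""] * (ncols - len(cells))  # pad short rows
        lines.append(" & ".join(cells) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_tabular([["Item", "Price"], ["Tea", "$2"]]))
```

Cells are separated by & and rows end with a double backslash, which is why both characters must be escaped when they occur in the cell text itself.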


Related WWW pages:

Converters ported to Macintosh

General:

DANTE's LaTeX-PC converter list (in German)

The German CTAN server

The British CTAN server

The USA CTAN Server

German CTAN server, free converters

British CTAN server, free converters

USA CTAN Server, free converters


This HTML page is part of the texcnv site.
Copyright © 1998, 1999, 2000 Wilfried Hennings
You may copy and redistribute it under the following conditions: