antiword 0.37 original source

2020-11-12 16:52:36 -05:00
commit 71f6baaafa
141 changed files with 48223 additions and 0 deletions

Docs/COPYING

@@ -0,0 +1,342 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) 19yy <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) 19yy name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.

Docs/ChangeLog

@@ -0,0 +1,219 @@
****************************************************************************
* Changes in Antiword from versions 0.22 to 0.37 *
****************************************************************************
Changes 0.36 to 0.37
--------------------
Bug fixes:
- Bug reported by Suzanne Skinner <tril@igs.net> (and others) fixed
New features:
- XML/DocBook output now contains <footnote> tags
- Antiword is now based on DeskLib instead of RISC_OSLib (RISC OS only)
- Show page headers and footers (PostScript and PDF output only)
- Show text that was removed by the revisioning system
- Improved kantiword, based on information from Stefan Wiens <s.wi@gmx.net>
Changes 0.35 to 0.36
--------------------
Bug fixes:
- Bug reported by Michael Minn <mail@michaelminn.com> fixed
New features:
- The default mapping file is now based on the locale (Unix/Linux) or on
the active codepage (DOS)
- A Word document can now be saved as "formatted" text. That means that markers
like *bold* to show bold text, /italics/ to show italics and _underline_ to
show underlined text are added to the plain text. Based on patches sent by
Ofir Reichenberg <ofir@qlusters.com>
- Improved table parsing. Based on information supplied by Bastien Legras
<bastien.legras@nectech.fr> and Alex de Kruijff <freebsd@akruijff.dds.nl>
- A Word document can now be saved in PDF.
- First attempt to support PostScript output in the Cyrillic alphabet. Based
on work done by Alexander Belyaev <isle@free.kursknet.ru>
- Better support for the Cyrillic alphabet
Changes 0.34 to 0.35
--------------------
Bug fixes:
- Fixed the bug in the use of the environment variable ANTIWORDHOME
New features:
- The XML/DocBook output is slightly better.
- Scale view window is closed when the main window is closed. Thanks to Tony
Moore <old_coaster@yahoo.co.uk> (RISC OS only)
- More support for WinWord 1.x documents
Changes 0.33 to 0.34
--------------------
Bug fixes:
- Bug in UTF-8 tables fixed
- Bug reported by Stewart Goldwater <sg@janus.freeserve.co.uk> fixed
- Bug reported by Karl-Otto Linn <linn@informatik.fh-wiesbaden.de> fixed
- Fixed a bug that made DOS hang when Antiword processed a document > 8 MB.
New features:
- Better approximations for fancy characters in the output
- A Word document can now be saved as XML/DocBook.
- Linux Makefile is now closer to conventions.
- Support for Text Boxes
- An environment variable ANTIWORDHOME was added to create a more flexible
place for the fontnames file and the mapping files.
- Antiword is now Latin9 enabled. Thanks to Stefan Bellon
<sbellon@sbellon.de> (RISC OS only)
- Some support for MacWord 4 and 5 documents
- More support for Word-for-DOS documents
- Support for superscripts and subscripts
- Displays slightly more images.
- Improved lists, especially in documents from Word 97 or later.
Changes 0.32 to 0.33
--------------------
Bug fixes:
- Bug reported by Yannick PERRET <yperret@bat710.univ-lyon1.fr> fixed
Old features:
- The -X option is no longer supported. Replace "-X 2" by "-m 8859-2.txt"
New features:
- Slightly more accurate font translation
- Full support for WinWord 2.0 documents
- Some support for Word-for-DOS and WinWord 1.x documents
- Selective header numbering
- Implementation of stylesheets
- The system-wide directory for the mapping files was changed from
"/opt/antiword/share" to "/usr/share/antiword", in accordance with FHS,
the Filesystem Hierarchy Standard, as suggested by Anand Buddhdev
<arb@anand.org>.
- Antiword now turns white text into light gray text.
- Antiword is now closer to "64-bit clean". Based on information supplied by
Duncan Haldane <f.duncan.m.haldane@worldnet.att.net>.
Changes 0.31 to 0.32
--------------------
Bug fixes:
- Bug reported by Forrest J. Cavalier III <mibsoft@mibsoftware.com> fixed
- Bug reported by Jan ONDREJ (SAL) <ondrejj@salstar.sk> fixed
- Bug in dealing with RLE compressed bitmap images fixed
- Bug in image scaling fixed (RISC OS only)
New features:
- Improved leading (Unix only; PostScript version only)
- Antiword can now read from the standard input. This is based on an idea by
Matthew Miller <mattdm@mattdm.org>. (Unix only)
- A white background looks much better. (RISC OS only)
- A system-wide directory for the mapping files, as suggested by Sven Geggus
<sven@geggus.net> and many others. (Unix only)
- Antiword can now deal with documents larger than 7 MB.
Changes 0.30 to 0.31
--------------------
Bug fixes:
- Bug in the "Show hidden (by Word) text" feature fixed
- Bug reported by David Aspinwall <aspinwall@timesten.com> fixed
- Bug reported by Robert Steinmetz <rob@steinmetznet.com> fixed
Old features:
- The -g and -c options are no longer supported. The -c option was the default
and is now used automatically. (Unix only)
New features:
- Ability to display some of the images
- Ability to use landscape mode (Unix only; PostScript version only)
- Support for all ISO-8859 character sets plus KOI8 and some code pages
(Unix only; text version only)
- Antiword will now give a warning if the specified PostScript paper size is
unsupported. Thanks to Greg Robinson <Greg.Robinson@dsto.defence.gov.au>
- Changed from PostScript version 1 to version 2
- Antiword now returns 1 if no Word document is found among the files listed
on the command line, as suggested by Jens Schleusener
<Jens.Schleusener@dlr.de>.
- Takes the right margin into account
- The PostScript part now supports the AvantGarde, Bookman, Helvetica-Narrow,
NewCenturySchlbk and Palatino fonts (Unix only)
- More accurate fontnames translation table
- Initial scale factor is now configurable (RISC OS only)
Changes 0.29 to 0.30
--------------------
Bug fixes:
- Bug in the generated PostScript (nocurrentpoint) fixed
- Bug reported by Keith Bamford <kbamford@eurobell.co.uk> fixed
- Bug in the chapter numbering font fixed
New features:
- Improved handling of changes in the font size on a single line.
- Some support for long file names (RISC OS only)
- Thanks to David Kanareck <david@davidkanareck.demon.co.uk>, Antiword can
now deal with documents made by "Word for Asian languages", but only
when these documents are written in a European language.
- Character properties "Caps" and "SmallCaps" for accented characters
- More accurate fontnames translation table. (RISC OS only)
- PostScript part now supports the Times and Helvetica fonts. (Unix only)
Changes 0.28 to 0.29
--------------------
Bug fixes:
- Bug reported by Paul McCann <P.J.McCann@cfm1220.x400.icl.co.uk> fixed
- Character property "SmallCaps" works better now
- Bug reported by Richard Lambley <richard@wireless.demon.co.uk> fixed
- Fixed a bug in the linewidth computation (Unix only)
New features:
- A Word document can now be saved as PostScript (Unix only, Courier font only)
- Left, Center, Right and Justify alignment added for Word 97
- Supports the Macintosh character set
Changes 0.27 to 0.28
--------------------
Licence:
- Distributed under the GNU General Public License
Bug fixes:
- Bug reported by Richard Lambley <richard@wireless.demon.co.uk> has not
been fixed yet.
- Deals correctly with fancy quotes in files from a Macintosh
New features:
- Supports character properties "SmallCaps", "Caps" and "Hidden Text"
- The use of fonts and font sizes for "fast saved" documents is now supported.
- Separators between the text, the footnotes and the endnotes
- Footnotes are now numbered in Arabic numerals (1, 2, 3), endnotes are now
numbered in Roman numerals (i, ii, iii).
Changes 0.26 to 0.27
--------------------
Bug fixes:
- The main title now shows the first 12 characters of the file name.
New features:
- "Fast saved" documents are now supported for Word 97.
- All tables are now supported for Word 97.
- It is now possible to scale the text.
Changes 0.25 to 0.26
--------------------
Bug fixes:
- Fixed several problems with the Choices file
- Closed a small memory leak
New features:
- The use of fonts and font sizes for "full saved" documents is now supported.
- Most tables are now supported for Word 97.
- Header numbers are now supported for Word 97.
Changes 0.24 to 0.25
--------------------
Bug fixes:
- Improved handling of memory shortages
- Some special tables were messed up.
New features:
- "Fast saved" documents are now supported for Word 6 and 7.
- A new option to permit Antiword to change the filetype of Word documents to
MSWord (&ae6).
- A Wordfile can now be saved as a Drawfile.
- The look and feel has been changed from editor-like to browser-like
Changes 0.23 to 0.24
--------------------
Bug fixes:
- Empty paragraphs in numbered lists were not always numbered correctly.
- In very complex tables some text could get lost.
New features:
- F3 is now a shortcut to the "Save as" dialogue box.
- Left, Center, Right and Justify alignment added for Word 6 and 7
- [pic] marks the place where an image should have been.
- It is now possible to have a writeable Choices file, even when Antiword
itself is on a read-only medium.
Changes 0.22 to 0.23
--------------------
New features:
- Paragraph breaks are now an option.
- Bulleted single level lists for files from Word 6 and 7
- Numbered single level lists (some styles) for files from Word 6 and 7

Docs/Emacs

@@ -0,0 +1,134 @@
From: Alex Schroeder <alex@emacswiki.org>
Subject: Re: MS Word mode?
Date: Fri, 08 Nov 2002 00:40:15 +0100
Roger Mason <rmason@sparky2.esd.mun.ca> writes:
> There was a question about this recently on this forum. Look for
> undoc.el, I got it from the wiki (I think). It has worked very well for
> me to date, although I have not attempted to read complex documents.
Well, it makes things readable, but it is far from perfect -- it seems
to just delete any non-ascii characters, such that sometimes you will
see words such as "Alex8" where "8" is some garbage that just looked
like being part of a real word... In other words, interfacing to
something like catdoc, antiword, or wvText (included with AbiWord)
might be cool. Actually all you need is this:
(add-to-list 'auto-mode-alist '("\\.doc\\'" . no-word))

(defun no-word ()
  "Run antiword on the entire buffer."
  (shell-command-on-region (point-min) (point-max) "antiword - " t t))
Alex.
===============================================================================
From: Arnaldo Mandel <am@ime.usp.br>
Subject: Re: MS Word mode?
Date: Fri, 8 Nov 2002 11:52:33 -0200
Alex Schroeder wrote (on Nov 8, 2002):
> Actually all you need is this:
>
> (add-to-list 'auto-mode-alist '("\\.doc\\'" . no-word))
>
> (defun no-word ()
>   "Run antiword on the entire buffer."
>   (shell-command-on-region (point-min) (point-max) "antiword - " t t))
On my system there are lots of filenames ending in .doc whose files
are not Word files. So I modified your function thusly
(defun no-word ()
  "Run antiword on the entire buffer."
  (if (string-match "Microsoft "
                    (shell-command-to-string (concat "file " buffer-file-name)))
      (shell-command-on-region (point-min) (point-max) "antiword - " t t)))
Works in Solaris and Linux, and should work on other unixes as well.
am
===============================================================================
From: Alex Schroeder <alex@emacswiki.org>
Subject: Re: MS Word mode?
Date: Fri, 08 Nov 2002 18:24:07 +0100
Arnaldo Mandel <am@ime.usp.br> writes:
> (defun no-word ()
>   "Run antiword on the entire buffer."
>   (if (string-match "Microsoft "
>                     (shell-command-to-string (concat "file " buffer-file-name)))
>       (shell-command-on-region (point-min) (point-max) "antiword - " t t)))
Cool. I did not know about "file"... :)
My stuff is on the wiki, btw:
* http://www.emacswiki.org/cgi-bin/wiki.pl?AntiWord
Alex.
===============================================================================
From: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
Subject: Re: emacs rmail. How to convert .doc to plain text
Date: 24 Nov 2002 18:08:22 +0100
Hi,
Puff Addison <puff@theaddisons.demon.co.uk> writes:
> Yes, please post your Emacs integration code.
Ok, see below. I should note that it is probably also possible to
(ab-)use jka-compr for this, which would make my two functions
obsolete.
so long, benny
>>>>>>>
(defun benny-antiword-file-handler (operation &rest args)
  ;; First check for the specific operations
  ;; that we have special handling for.
  (cond ((eq operation 'insert-file-contents)
         (apply 'benny-antiword-insert-file args))
        ((eq operation 'file-writable-p)
         nil)
        ((eq operation 'write-region)
         (error "Word documents can't be written"))
        ;; Handle any operation we don't know about.
        (t (let ((inhibit-file-name-handlers
                  (cons 'benny-antiword-file-handler
                        (and (eq inhibit-file-name-operation operation)
                             inhibit-file-name-handlers)))
                 (inhibit-file-name-operation operation))
             (apply operation args)))))

(defun benny-antiword-insert-file (filename &optional visit beg end replace)
  (set-buffer-modified-p nil)
  (setq buffer-file-name (file-truename filename))
  (setq buffer-read-only t)
  (let ((start (point))
        (inhibit-read-only t))
    (if replace (delete-region (point-min) (point-max)))
    (save-excursion
      (let ((coding-system-for-read 'utf-8)
            (filename (encode-coding-string
                       buffer-file-name
                       (or file-name-coding-system
                           default-file-name-coding-system))))
        (call-process "antiword" nil t nil "-m" "UTF-8.txt"
                      filename))
      (list buffer-file-name (- (point) start)))))

(setq file-name-handler-alist
      (cons '("\\.doc\\'" . benny-antiword-file-handler)
            file-name-handler-alist))
<<<<<<<

Docs/Exmh

@@ -0,0 +1,14 @@
From: Glenn Burkhardt <glenn@vtecus.com>
Subject: It's great!
Date: Wed, 22 Aug 2001 12:02:54 -0400
Thank you for this program. Thank you very much! Thank you immensely!!
P.S. I find this entry helpful as an /etc/mailcap rule:
application/msword;/usr/local/bin/antiword -t %s | less; needsterminal; \
copiousoutput; print=antiword -p letter %s|lpr
It completely integrates the text mode with my mailer, exmh. You might
want to include it in your documentation.

Docs/FAQ

@@ -0,0 +1,113 @@
Frequently Asked Questions
==========================
These questions and answers are mainly Linux/Unix oriented. For other
Operating Systems you may want to read the documentation provided by the
people who ported Antiword.
Q1: How do I install Antiword?
A1: (a) Make a suitable directory such as '$HOME/src/antiword' and copy the
'antiword.tar.gz' file to this directory.
(b) decompress: 'gunzip antiword.tar.gz'
(c) unpack: 'tar xvf antiword.tar'
(d) compile: 'make all'
(e) install: 'make install'. This will install Antiword in the $HOME/bin
directory.
(f) copy the file 'fontnames' and one or more mapping files from the
Resources directory to the $HOME/.antiword directory (note the dot
before antiword!).
NOTE: you can skip point (f) if your system administrator already copied
these files to /usr/share/antiword.
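Step (f) of A1 can be sketched as a small shell function. This is only a sketch: the function name `install_antiword_resources` and its arguments are made up for illustration, and it assumes that the mapping files in the Resources directory all end in `.txt`, as the examples in this FAQ do.

```shell
# Hypothetical helper: copy 'fontnames' and the mapping files from an
# unpacked Resources directory into the per-user Antiword directory.
install_antiword_resources() {
    resources=$1                    # directory holding fontnames and *.txt
    target=${2:-$HOME/.antiword}    # note the dot before antiword
    mkdir -p "$target" || return 1
    cp "$resources/fontnames" "$target/" || return 1
    cp "$resources"/*.txt "$target/"
}
```

A call could look like `install_antiword_resources /path/to/Resources`, where the path is wherever the Resources directory ended up after unpacking.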
Q2: I get the message "I can't open your mapping file (xxxx-x.txt)"
A2: This means that the mapping file has not been installed. The installation
may have to be done manually. See above answer A1, point (f).
NOTE: Antiword assumes that a file that can't be opened for reading is a
file that doesn't exist.
Q3: How do I use Antiword?
A3: Type antiword -h and see.
Q4: I tried "antiword -m /some/directory/8859-1.txt word.doc", but this
doesn't work.
A4: The -m option is followed by the name of a mapping file only; a full
pathname won't work.
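If the mapping file lives in a non-standard directory, one possible workaround is to point ANTIWORDHOME at that directory (the manual page lists it as the first directory searched) and pass only the file name to -m. The wrapper below is a sketch; the name `antiword_with_mapping_path` is hypothetical and not part of Antiword.

```shell
# Hypothetical wrapper: split a full mapping-file path into a directory
# (handed over via ANTIWORDHOME) and a bare file name (handed to -m).
antiword_with_mapping_path() {
    mapping_path=$1
    shift
    ANTIWORDHOME=$(dirname "$mapping_path") \
        antiword -m "$(basename "$mapping_path")" "$@"
}
```

With this, `antiword_with_mapping_path /some/directory/8859-1.txt word.doc` would run `antiword -m 8859-1.txt word.doc` with ANTIWORDHOME set to `/some/directory`. Note that the fontnames file is then looked up in that directory first as well.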
Q5: How does Antiword deal with Word macro viruses?
A5: Antiword does not run any Word macros because it can't do so.
Therefore such a virus will not harm your computer system.
Q6: What is the purpose of the file 'fontnames' in the '/usr/share/antiword/'
or '$HOME/.antiword' directory?
A6: This file provides a translation table from the font names used in a Word
document to the font names used by a PostScript printer.
The file 'fontnames' can be edited to match the font collection used by
your PostScript printer.
Q7: What is 'Hidden Text'?
A7: Hidden Text is Microsoft speak for text that may or may not be shown
on the screen, subject to the user's preferences, but such text is never
printed.
Q8: Antiword claims to support all ISO-8859 character sets, but I can't see
any of this.
A8: There is support for all ISO-8859 character sets, but only in the text
output, not in the PostScript output.
The result can only be seen if your xterm, vtterm, kvt or similar
terminal emulation program uses a font compatible with that ISO-8859
character set.
Q9: Which mapping file (-m option) is correct in my situation?
A9: The correct mapping file depends on the character set you need for output
in a specific language.
For Western European languages (like English, French, German) this is
8859-1.txt. (OS/2: cp1252.txt) (DOS: cp850.txt)
For Eastern European languages (like Polish, Czech, Slovak, Croatian) this
is 8859-2.txt. (OS/2: cp1250.txt) (DOS: cp852.txt)
For Esperanto use 8859-3.txt.
For Russian use 8859-5.txt or koi8-r.txt. (OS/2: cp1251.txt)
(DOS: cp866.txt)
For Ukrainian use koi8-u.txt.
For Arabic use 8859-6.txt. (DOS: cp864.txt)
For Hebrew use 8859-8.txt. (DOS: cp862.txt)
For Thai use 8859-11.txt.
If your system supports it, you might also try UTF-8.txt.
NOTE: UTF-8 also enables Antiword to show text in languages like Chinese,
Japanese and Korean.
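The table in A9 can be written down as a shell lookup. This is a sketch for Linux/Unix only (the OS/2 and DOS code pages are left out). The two-letter language codes and the function name `mapping_for_language` are assumptions made for illustration, and falling back to UTF-8.txt for unlisted languages is a choice of this sketch, not something Antiword does.

```shell
# Hypothetical lookup: language code -> mapping file, following A9.
mapping_for_language() {
    case $1 in
        en|fr|de)    echo 8859-1.txt ;;   # Western European
        pl|cs|sk|hr) echo 8859-2.txt ;;   # Eastern European
        eo)          echo 8859-3.txt ;;   # Esperanto
        ru)          echo 8859-5.txt ;;   # Russian (koi8-r.txt also works)
        uk)          echo koi8-u.txt ;;   # Ukrainian
        ar)          echo 8859-6.txt ;;   # Arabic
        he)          echo 8859-8.txt ;;   # Hebrew
        th)          echo 8859-11.txt ;;  # Thai
        *)           echo UTF-8.txt ;;    # assumption: try UTF-8 otherwise
    esac
}
```

It could be used as `antiword -m "$(mapping_for_language pl)" word.doc`.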
Q10: I tried UTF-8, but some documents show more garbage than text. Why?
A10: UTF-8 will only work if the document was saved by a Unicode enabled
version of Word (or if Word used ISO-8859-1 as its internal encoding).
The following versions of Word are known to be Unicode enabled:
Word 6 and Word 7 for Asian languages, all versions of Word 97,
Word 98 (Mac), Word 2000, Word 2001 (Mac) and Word 2002 (aka Word XP).
Q11: Why can't Antiword read from stdin directly? Why use a temporary file?
A11: The information in a Word document is not stored sequentially. Therefore
the use of the "fseek" function can't be avoided. So Antiword must copy
stdin to a temporary file first and then process that file.
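The same idea can be shown in shell for any program that needs a seekable file. This sketch is not Antiword's own code; the function name `run_on_seekable_copy` is hypothetical and the `mktemp` utility is assumed to be available.

```shell
# Hypothetical helper: copy stdin to a temporary file, run the given
# command with that file as its last argument, then clean up.
run_on_seekable_copy() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"          # stdin is sequential; the copy can be seeked
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `some_producer | run_on_seekable_copy antiword` would have the same effect as letting Antiword read the document from standard input.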
Q12: Why does the XML output of Antiword sometimes contain such a strange
structure or practically no structure at all?
A12: Remember that Word is basically 'text plus appearance' and XML is
basically 'text plus structure'. If a Word document is written by a
competent person there will be a balance between appearance and structure,
but if a Word document is written by an inexperienced or incompetent
person the Word document can end up without a structure, or worse, with a
terrible structure.
Antiword can't create a structure when there is none.
Q13: Why is the PostScript output in Cyrillic in ISO-8859-5? Nobody uses that
character set.
A13: For Cyrillic the options are:
(a) koi8 does not cover all languages that use Cyrillic,
(b) cp866, cp1251 and Mac-Cyrillic are proprietary,
(c) Unicode and UTF-8 are not supported by PostScript yet and
(d) ISO-8859-5, the character set that nobody uses.
Q14: I have used "antiword -p a4 -m 8859-5.txt file.doc > file.ps", but I get
no Cyrillic characters.
A14: Programs like Ghostscript and Ghostview need Cyrillic enabled fonts in
order to show Cyrillic characters. A PostScript printer needs to be
Cyrillic enabled in order to show Cyrillic characters.

44
Docs/History Normal file

@@ -0,0 +1,44 @@
History of Antiword by (C) Adri van Os
------------------------------------
The Name
--------
The name comes from: "The antidote against people who send Microsoft(R) Word
files to everybody, because they believe that everybody runs Windows(R) and
therefore runs Word".
Version 0.37 (21 Oct 2005)
--------------------------
Beta release, for evaluation by the public.
Known Limitations
-----------------
1) The layout of Word documents is kept secret by Microsoft(R). Therefore
Antiword is based on information gathered from the Internet and on
guesswork.
2) Antiword doesn't show all the images included in a Word document.
3) Antiword doesn't do any hyphenation, because hyphenation is language
dependent.
4) Antiword places footnotes at the end of the text.
5) Antiword places box text after normal text and not in a box.
6) Antiword doesn't try to emulate any of Word's DTP abilities.
7) PostScript output will not work in combination with UTF-8. It only works in
combination with character sets ISO-8859-1, ISO-8859-2 and ISO-8859-5.
8) Antiword's error messages are not very helpful.
Known Bugs
----------
1) Antiword cannot handle encrypted documents.
2) Antiword assumes default tab stops.
3) Antiword doesn't handle frames.
4) Antiword ignores page headers and footers.
5) Antiword only handles lists in some of the styles.
6) Antiword cannot handle some types of multilevel lists.
7) Antiword assumes that all Word documents made on a Macintosh with Word
version 6 or older use the MacRoman character set.

88
Docs/Mozilla Normal file

@@ -0,0 +1,88 @@
Date: Mon, 11 Nov 2002 11:36:21 +0000
From: Cam <camilo@mesias.co.uk>
Subject: Re: antiword
Hi
I have updated the script for the latest Mozilla with plugger, as found
in RedHat 8. This makes the default action a very quick text view of a
document, much better IMHO than starting ooffice or abiword. If users
want to edit the file they can still save as.
Here is a slightly improved script for gnome users:
#!/bin/bash
tmpfile=/tmp/aw$$.txt
lastditch=`which vi`
editor=${EDITOR:-$lastditch}
if [ ! -x $editor ] ; then
editor=$lastditch
fi
tmpfile=/tmp/aw$$.txt
gtopts="-t antiword-helper --hide-menubar"
antiword "$1" > $tmpfile
chmod -w $tmpfile
gnome-terminal $gtopts -x $editor $tmpfile ; chmod +w $tmpfile ; rm $tmpfile
Here is the script for non-gnome users:
#!/bin/bash
tmpfile=/tmp/aw$$.txt
lastditch=`which vi`
editor=${EDITOR:-$lastditch}
if [ ! -x $editor ] ; then
editor=$lastditch
fi
antiword "$1" > $tmpfile
chmod -w $tmpfile
xterm -T "antiword-helper" -e $editor $tmpfile
chmod +w $tmpfile
rm $tmpfile
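Both scripts write to a predictable name in /tmp. A variation that lets `mktemp` pick the name and always removes the file afterwards might look like the sketch below. The function name `convert_and_view` is hypothetical, `mktemp` is assumed to be installed, and the viewer command is whatever is passed after the document name.

```shell
# Hypothetical variation: convert to a private temporary file, show it
# read-only with the given viewer command, and remove it afterwards.
convert_and_view() {
    doc=$1
    shift                           # the remaining words are the viewer
    tmpfile=$(mktemp) || return 1
    if antiword "$doc" > "$tmpfile"; then
        chmod a-w "$tmpfile"
        "$@" "$tmpfile"
        status=$?
    else
        status=1
    fi
    rm -f "$tmpfile"
    return "$status"
}
```

A call such as `convert_and_view "$1" xterm -T antiword-helper -e vi` would then replace the body of the non-gnome script. A viewer command must always be given.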
To use the scripts add an entry into your plugger config file
(pluggerrc, for locations check man plugger). Mine is in
/home/cxm/.netscape/pluggerrc:
The line to add is (it has a leading tab):
ignore_errors exits: antiword-helper "$file"
Here is my config file after I added the line
application/rtf: rtf: Rich Text Format
application/x-msword: doc, dot: Microsoft Word Document
application/msword: doc, dot: Microsoft Word Document
ignore_errors exits: antiword-helper "$file"
nokill exits: oowriter "$file"
repeat swallow(AbiWord) fill: AbiWord -nosplash -geometry
+9000+9000 "$file" >/dev/null 2>/dev/null
repeat swallow(PCFileViewer) fill: sdtpcv "$file"
repeat swallow(PCFileViewer) fill: /opt/SUNWdtpcv/bin/sdtpcv
"$file"
Then start Mozilla / Netscape and you should be able to quickly view
word docs from the browser and as email attachments.
Hope that helps,
-Cam

24
Docs/Mutt Normal file

@@ -0,0 +1,24 @@
From: Sven Geggus (sven@geggus.net)
Subject: Re: Word attachments in Mutt
Newsgroups: comp.mail.mutt
Date: 2001-05-16 01:21:11 PST
Bob Zimmerman <bobzim@no.spam.org> wrote:
> I receive MS Word attachments in Mutt regularly. Is there a way to
> read these via Mutt in a Linux/Solaris environment? (e.g. Lynx or
> some type of viewer)?
The best M$-word to ASCII converter has to be antiword!
Just put the following line into .mailcap:
application/msword; antiword %s; copiousoutput
Sven
--
"We just typed make"
(Stephen Lambrigh, Director of Server Product Marketing at Informix
about porting their Database to Linux)
/me is giggls@ircnet, http://geggus.net/sven/ on the Web

129
Docs/Netscape Normal file

@@ -0,0 +1,129 @@
From: "Craig D. Miller" <Craig.D.Miller@jpl.nasa.gov>
Hi,
Steps to integrate antiword into NetScape 4.73 (should also work with earlier
versions).
Programs that launch from netscape must start up an X window to display their
output (otherwise output ends up in the bit bucket on your system). I wrote the
following script to do this for antiword (and saved it as
"/usr/local/bin/xantiword"):
#!/bin/csh -f
setenv FILE $1
setenv NEWFILE ${FILE}.xantiword
/usr/local/bin/antiword $FILE >&$NEWFILE
/usr/bin/X11/xterm -title "$FILE (MS Word)" -e /usr/bsd/more $NEWFILE
rm -f $NEWFILE
The above script works, but may not be the best way to do it. If you come up
with a more elegant solution, then please let me know.
Next you'll have to tell netscape to execute the "/usr/local/bin/xantiword"
script when word documents are clicked on. The easiest way to do this is to
change the /usr/local/lib/netscape/mailcap netscape configuration file. For
SGI version of netscape the following two lines are changed. For other versions
of netscape, one should find similar lines or will need to add the new lines.
Old lines (try to run SoftWindows, which is not installed on my system):
application/x-dos_ms_word; /usr/local/lib/netscape/swinexec %s winword; \
description="Microsoft Word-for-Windows Document";
application/msword; /usr/local/lib/netscape/swinexec %s winword; \
description="Microsoft Word-for-Windows Document";
New lines (for antiword execution), which replace old lines on my system:
application/x-dos_ms_word; /usr/local/bin/xantiword %s; \
description="Microsoft Word-for-Windows Document";
application/msword; /usr/local/bin/xantiword %s; \
description="Microsoft Word-for-Windows Document";
These changes can also be made via the netscape preferences, under
Navigator/Applications, but then the changes would only be for the user that
changed them. The above change to the mailcap file affects all users, which is
what you'll usually want.
Note that the above file paths may be different for your system. On our linux
box, a quick search DID NOT show where the mailcap for netscape was stored, but
I did find one in /etc/mailcap. I don't have time to experiment to see if this
is the same one that netscape uses.
If you have questions then please E-mail me.
- Craig
===============================================================================
From: "Craig D. Miller" <Craig.D.Miller@jpl.nasa.gov>
Hi,
I just discovered a program called "xless". It would actually be easier to use
than my previous xterm/more solution. To use it change the
"/usr/local/bin/xantiword" script to:
#!/bin/csh -f
setenv FILE $1
/usr/local/bin/antiword $FILE | /usr/freeware/bin/xless \
-title "$FILE (MS Word)" -geometry 100x60
Note that one also needs to have xless installed. It can be found on the
SGI Freeware Feb 1999 (or later) CD-ROM.
- Craig
===============================================================================
From: Bruno Crochet <bruno.crochet@pse.unige.ch>
Hi!
Another way to integrate antiword into netscape is to copy the following
line in your .mailcap file :
application/msword; ns="%s"\; nf="${ns}".ps\; antiword -pa4 "${ns}" >
"${nf}"\; gv "${nf}"\; sleep 2 \; rm "${nf}"
Bruno.
===============================================================================
From: Andoni Zarate <azarate@saincotrafico.com>
In order to view the file into netscape you can write the xantiword file
like this:
#!/bin/csh -f
setenv FILE $1
setenv NEWFILE ${FILE}.xantiword
/usr/local/bin/antiword $FILE >&$NEWFILE
netscape -remote 'openFile('$NEWFILE')'
Andoni Zárate.
===============================================================================
From: Evelyne Pinter <epinter@ptcs.ch>
I include a script for netscape to see the document with ghostview.
#!/bin/csh -f
setenv FILE $1
setenv NEWFILE ${FILE}.xantiword
/usr/local/bin/antiword -pa4 $FILE >&$NEWFILE
/usr/X11R6/bin/gv $NEWFILE
rm -f $NEWFILE
In netscape the application must be called like that
"/usr/local/bin/xantiword %s"
This is just a small change(done by Roger Luechinger) to the xantiword
you included in the distribution 0.31
Thanks
SG E.M.S.P.
===============================================================================

59
Docs/QandA Normal file

@@ -0,0 +1,59 @@
Questions and Answers (RISC OS version)
=======================================
Q1: How do I install Antiword?
A1: Copy the application-directory and all the files within it to a
suitable directory.
Q2: How do I use Antiword?
A2: Double click on a Word document, filetype MSWord (&ae6). Or drag and drop
a file onto the Antiword icon on the iconbar.
Q3: How does Antiword deal with Word macro viruses?
A3: Antiword does not run any Word macros because it cannot do so.
Therefore your Archimedes will not be harmed by such a virus.
Q4: What does the 'Paragraph breaks' option do?
A4: This option controls the maximum number of characters per line in
paragraphs. If your screen is 640 pixels wide (like modes 20 and 27)
then 76 is probably best. If your screen is 800 or more pixels wide
(like mode 31) then numbers near 94 work best. You can switch this
option off if the (text only) output of Antiword will be the input to a
wordprocessor or a DTP program.
The pagebreak setting refers to the number of characters when you use
the system font. When you use an outline font only the width of that
number of characters in the system font is used.
Q5: What does the 'Auto filetype' option do?
A5: When auto filetype is allowed, Antiword will change the filetype of
Word documents to MSWord (&ae6).
Q6: When Antiword uses outline fonts it becomes terribly slow. What can I
do about this?
A6: When Antiword uses outline fonts it needs a large font cache. A small
font cache will make Antiword (very) slow. The larger the font cache the
better, but usually 160K or 256K will do.
Q7: What is the purpose of the file 'FontNames' in the Choices directory?
A7: This file provides a translation table from the font names found in a
Word document to the font names used by the RISC OS font-manager.
The file 'FontNames' can be edited to match your font collection.
Some examples are provided in the Resources directory.
Q8: What is 'Hidden Text'?
A8: Hidden Text is Microsoft speak for text that may or may not be shown
on the screen, subject to the user's preferences, but such text is never
printed.
Q9: After upgrading to a new version of Antiword, I found that Antiword does
not put a new _updated_ version of FontNames in !Choices. Why not?
A9: The user can change the file FontNames to reflect the fonts available
on a specific computer. Antiword cannot be permitted to overwrite changes
made by a user. So after upgrading you should remove or rename the old
FontNames file.
Q10: Why does Antiword freeze my computer while converting the Word document?
A10: This can happen when the Word document contains a very large image and
the image must be scaled to a much smaller size before displaying. The
delay occurs while RISC OS does the scaling, so there is not much
Antiword can do about it.

114
Docs/ReadMe Normal file

@@ -0,0 +1,114 @@
  ___        _   _                       _
 / _ \      | | (_)                     | |
| |_| |_ __ | |_ ___      _____  _ __ __| |
|  _  | '_ \| __| \ \ /\ / / _ \| '__/ _` |
| | | | | | | |_| |\ V  V / (_) | | | (_| |
|_| |_|_| |_|\__|_| \_/\_/ \___/|_|  \__,_|
Antiword
========
Version 0.37 (21 Oct 2005)
--------------------------
Introduction
------------
Antiword is an application for displaying Microsoft(R) Word documents.
License
-------
This program is distributed under the GNU General Public License - see the
accompanying COPYING file for more details.
Problems
--------
Any bugs found should be reported to the author with full details of how to
get the problem to occur, but don't *expect* support for a product that you
have not paid for!
Please include Antiword's version number and version date, otherwise you
make it impossible for the author to help.
Thanks To
---------
Victor B. Wagner <vitus@agropc.msk.su> creator of "catdoc"
Duncan Simpson <word2x@duncan.telstar.net> creator of "word2x"
Martin Schwartz <schwartz@cs.tu-berlin.de> creator of "laola" and "elser"
Caolan McNamara <Caolan.McNamara@ul.ie> creator of "mswordview"
Andrew Scriven <andy.scriven@research.natpower.co.uk> creator of "OLEdecode"
Craig Southeren <geoffw@extro.ucc.oz.au> creator of "nenscript"
Thomas Merz <tm@muc.de> creator of "jpeg2ps"
Ulrich von Zadow <uzadow@cs.tu-berlin.de> creator of "paintlib"
Contributors
------------
ISO-8859-2 support by: Pawel Turnau <uzturnau@cyf-kr.edu.pl>
Character set mapping by: Dmitry Chernyak
<Dmitry.Chernyak@p998.f983.n5030.z2.fidonet.org>
UTF-8 support by: Karl Koehler <koehler@or.uni-bonn.de> and
Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
PostScript Cyrillic by: Alexander Belyaev <isle@free.kursknet.ru>
Ports
-----
Antiword was ported to BeOS by Pete Goodeve <pete@jwgibbs.cchem.berkeley.edu>
Antiword was ported to OS/2 by Dave Yeo <dave_yeo@paralynx.com>
Antiword was ported to Mac OS X by Ronaldo Nascimento <ronaldo@ronaldo.com>
Antiword was ported to Amiga by Raffaele Pisapia <rafpis@libero.it>
Antiword was ported to VMS by Joseph Huber <huber@mppmu.mpg.de>
Antiword was ported to NetWare by Guenter Knauf <info@gknw.de>
Antiword was ported to EPOC by Max Tomin <tomin@samaramail.ru>
Antiword was ported to Zaurus PDA by Piotr Jachimczyk
<P.Jachimczyk@prioris.mini.pw.edu.pl>
Antiword was ported to DOS by myself ;-)
Yen-Ming Lee <leeym@freebsd.org> is the maintainer of the FreeBSD version of
Antiword.
Acknowledgements
----------------
Microsoft is a registered trademark and Windows is a trademark of Microsoft
Corporation.
UNIX is a registered trademark of the X/Open Company, Ltd.
Linux is a registered trademark of Linus Torvalds.
Postscript is a trademark of Adobe Systems Incorporated.
All other trademarks are acknowledged.
Future Versions
---------------
If you have any comments, bug reports or suggestions for future versions
don't hesitate to write to me.
New versions of the program will only be available if sufficient people
are using this program. So let me know!
Most recent version
-------------------
Most recent version of Antiword can be found on the author's website:
==>> http://www.winfield.demon.nl/index.html <<==
==>> http://antiword.cjb.net/ <<==
Author
------
The author can be reached by e-mail:
antiword@winfield.demon.nl
comments@antiword.cjb.net
But PLEASE read the FAQ before you write!!

158
Docs/antiword.1 Normal file

@@ -0,0 +1,158 @@
.TH ANTIWORD 1 "Oct 29, 2005" "Antiword 0.37" "Linux User's Manual"
.SH NAME
antiword - show the text and images of MS Word documents
.SH SYNOPSIS
.B antiword
[
.I options
]
.I wordfiles
.SH DESCRIPTION
.I Antiword
is an application that displays the text and the images of Microsoft Word
documents.
.br
A wordfile named - stands for a Word document read from the standard input.
.br
Only documents made by MS Word version 2 and version 6 or later are supported.
.SH OPTIONS
.TP
.BI "\-a " papersize
Output in Adobe PDF form. Printable on paper of the specified size: 10x14,
a3, a4, a5, b4, b5, executive, folio, legal, letter, note, quarto, statement
or tabloid.
.TP
.B \-f
Output in formatted text form. That means that bold text is printed like
*bold*, italics like /italics/ and underlined text as _underlined_.
.TP
.B \-h
Give a help message.
.TP
.BI "\-i " "image level"
The image level determines how images will be shown.
.RS
.TP 3
0:
Use non-standard extensions from Ghostscript. This output may not print on
any PostScript printer, but is useful in case no hard copy is needed. It is
also useful when Ghostscript is used as a filter to print a PostScript file to
a non-PostScript printer.
.TP 3
1:
Show no images.
.TP 3
2:
PostScript level 2 compatible. (default)
.TP 3
3:
PostScript level 3 compatible. (EXPERIMENTAL, Portable Network Graphics (PNG)
images are not printed correctly)
.RE
.TP
.BI "\-m " "mapping file"
This file is used to map Unicode characters to your local character set.
The default mapping file depends on the locale.
.TP
.BI "\-p " papersize
Output in PostScript form. Printable on paper of the specified size: 10x14,
a3, a4, a5, b4, b5, executive, folio, legal, letter, note, quarto, statement
or tabloid.
.TP
.B \-r
Include text removed by the revisioning system.
.TP
.B \-s
Include text with the so-called "hidden text" attribute.
.TP
.B \-t
Output in text form. (default)
.TP
.BI "\-w " width
In text mode this is the line width in characters. A value of zero puts an
entire paragraph on a line, useful when the text is to be used as input for
another wordprocessor. This value is ignored in PostScript mode.
.TP
.BI "\-x " "document type definition"
Output in XML form. Currently the only document type definition is db
(for DocBook).
.TP
.B \-L
In PostScript mode: use landscape mode.
.RE
.SH FILES
.TP
Mapping files like 8859-1.txt
.br
Antiword looks for its mapping files in three directories, in the order given:
.br
(1) The directory specified by $ANTIWORDHOME
.br
(2) The directory specified by $HOME/.antiword
.br
(3) Directory /usr/share/antiword
.TP
The fontnames file
.br
Antiword will look for its fontname file in the same directories as used for the
mapping files.
.br
The fontnames file contains the translation table from font names used by MS
Word to font names used by PostScript.
.TP
NOTE:
.br
Antiword cannot tell the difference between a file that does not exist and a
file that cannot be opened for reading.
.SH ENVIRONMENT
Antiword uses the environment variable ``ANTIWORDHOME'' as the first directory
to look for its files. Antiword uses the environment variable ``HOME'' to find
the user's home directory. When in text mode it uses the variable ``COLUMNS''
to set the width of the output (unless overridden by the -w option).
Antiword uses the environment variables ``LC_ALL'', ``LC_CTYPE'' and ``LANG''
(in that order) to get the current locale and uses this information to
select the default mapping file.
.SH BUGS
Antiword is far from complete. Many features are still missing. Many images are
not shown yet. Some of the images that are shown, are shown in the wrong place.
PostScript output is only available in ISO 8859-1 and ISO 8859-2.
.SH WEB SITES
The most recent released version of Antiword is always available from:
.br
http://www.winfield.demon.nl/index.html
.br
or try
.br
http://antiword.cjb.net/
.SH AUTHOR
Adri van Os <antiword@winfield.demon.nl>
.br
or try <comments@antiword.cjb.net>
.sp
R.F. Smith <rsmith@xs4all.nl> and
.br
Sindi Keesan <keesan@cyberspace.org>
.br
contributed to this manual page.
.SH LICENSE
Antiword is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
This program is distributed in the hope that it will be useful but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
.SH ACKNOWLEDGEMENTS
Linux is a registered trademark of Linus Torvalds.
.br
Adobe, PDF and PostScript are trademarks of Adobe Systems Incorporated.
.br
Microsoft is a registered trademark and Windows is a trademark of Microsoft
Corporation.
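The lookup order described under FILES can be sketched in shell. This is an illustration of the documented order, not Antiword's actual code; the function name `find_antiword_file` is hypothetical. Like Antiword, it treats an unreadable file the same as a missing one.

```shell
# Hypothetical helper: print the first readable copy of a mapping file
# (or of 'fontnames') in the three directories, in the documented order.
find_antiword_file() {
    name=$1
    for dir in "${ANTIWORDHOME:-}" "${HOME:-}/.antiword" /usr/share/antiword; do
        [ -n "$dir" ] || continue       # skip an unset ANTIWORDHOME
        if [ -r "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

For example, `find_antiword_file 8859-1.txt` prints the path of the mapping file that would be picked up, or returns a non-zero status if none is found.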

146
Docs/antiword.man Normal file

@@ -0,0 +1,146 @@
ANTIWORD(1) Linux User's Manual ANTIWORD(1)
NAME
antiword - show the text and images of MS Word documents
SYNOPSIS
antiword [ options ] wordfiles
DESCRIPTION
Antiword is an application that displays the text and the images of
Microsoft Word documents.
A wordfile named - stands for a Word document read from the standard
input.
Only documents made by MS Word version 2 and version 6 or later are
supported.
OPTIONS
-a papersize
Output in Adobe PDF form. Printable on paper of the specified
size: 10x14, a3, a4, a5, b4, b5, executive, folio, legal, letter,
note, quarto, statement or tabloid.
-f Output in formatted text form. That means that bold text is
printed like *bold*, italics like /italics/ and underlined text
as _underlined_.
-h Give a help message.
-i image level
The image level determines how images will be shown.
0: Use non-standard extensions from Ghostscript. This output may
not print on any PostScript printer, but is useful in case no
hard copy is needed. It is also useful when Ghostscript is
used as a filter to print a PostScript file to a
non-PostScript printer.
1: Show no images.
2: PostScript level 2 compatible. (default)
3: PostScript level 3 compatible. (EXPERIMENTAL, Portable Network
Graphics (PNG) images are not printed correctly)
-m mapping file
This file is used to map Unicode characters to your local character
set. The default mapping file depends on the locale.
-p papersize
Output in PostScript form. Printable on paper of the specified
size: 10x14, a3, a4, a5, b4, b5, executive, folio, legal, letter,
note, quarto, statement or tabloid.
-r Include text removed by the revisioning system.
-s Include text with the so-called "hidden text" attribute.
-t Output in text form. (default)
-w width
In text mode this is the line width in characters. A value of
zero puts an entire paragraph on a line, useful when the text is
to be used as input for another wordprocessor. This value is
ignored in PostScript mode.
-x document type definition
Output in XML form. Currently the only document type definition
is db (for DocBook).
-L In PostScript mode: use landscape mode.
FILES
Mapping files like 8859-1.txt
Antiword looks for its mapping files in three directories, in
the order given:
(1) The directory specified by $ANTIWORDHOME
(2) The directory specified by $HOME/.antiword
(3) Directory /usr/share/antiword
The fontnames file
Antiword will look for its fontname file in the same directories
as used for the mapping files.
The fontnames file contains the translation table from font
names used by MS Word to font names used by PostScript.
NOTE:
Antiword cannot tell the difference between a file that does not
exist and a file that cannot be opened for reading.
ENVIRONMENT
Antiword uses the environment variable ``ANTIWORDHOME'' as the first
directory to look for its files. Antiword uses the environment variable
``HOME'' to find the user's home directory. When in text mode it uses
the variable ``COLUMNS'' to set the width of the output (unless
overridden by the -w option).
Antiword uses the environment variables ``LC_ALL'', ``LC_CTYPE'' and
``LANG'' (in that order) to get the current locale and uses this
information to select the default mapping file.
BUGS
Antiword is far from complete. Many features are still missing. Many
images are not shown yet. Some of the images that are shown, are shown
in the wrong place. PostScript output is only available in ISO 8859-1
and ISO 8859-2.
WEB SITES
The most recent released version of Antiword is always available from:
http://www.winfield.demon.nl/index.html
or try
http://antiword.cjb.net/
AUTHOR
Adri van Os <antiword@winfield.demon.nl>
or try <comments@antiword.cjb.net>
R.F. Smith <rsmith@xs4all.nl> and
Sindi Keesan <keesan@cyberspace.org>
contributed to this manual page.
LICENSE
Antiword is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
ACKNOWLEDGEMENTS
Linux is a registered trademark of Linus Torvalds.
Adobe, PDF and PostScript are trademarks of Adobe Systems Incorporated.
Microsoft is a registered trademark and Windows is a trademark of
Microsoft Corporation.
Antiword 0.37 Oct 29, 2005 ANTIWORD(1)
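The order in which the locale variables from the ENVIRONMENT section are consulted can be sketched in shell. This is an illustration only; the function name `effective_locale` is hypothetical, and treating an empty variable like an unset one is an assumption of this sketch.

```shell
# Hypothetical helper: print the first non-empty value among LC_ALL,
# LC_CTYPE and LANG, in that order.
effective_locale() {
    for value in "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"; do
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done
    return 1                # none of the three variables is set
}
```

Running `effective_locale` before calling Antiword shows which locale would drive the choice of the default mapping file; the -m option overrides that choice.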

34
Docs/antiword.old.php Normal file

@@ -0,0 +1,34 @@
From: Paul Southworth <pauls@etext.org>
Subject: antiword PHP script
Date: Thu, 24 Oct 2002 14:01:05 -0700 (PDT)
Please find attached a trivial example of using a web form to process an
uploaded Word doc to text using antiword. Perhaps other antiword users
would find it useful.
--Paul
<?
/* antiword.php
A PHP script to convert uploaded MS Word docs to text using antiword.
This script is public domain, no copyright.
September 11, 2002
Paul Southworth
*/
function print_form() {
?>
<html><head><title>antiword</title></head><body>
<form method=post action=antiword.php enctype="multipart/form-data">
<input name=upload type=file>
<input type=submit name=submit value=convert>
</form>
</body></html>
<?
}
if ($_FILES['upload']) {
header ("Content-type: text/plain");
system("/usr/local/bin/antiword " . $_FILES['upload']['tmp_name']);
} else {
print_form();
}
?>

141
Docs/antiword.php Normal file

@@ -0,0 +1,141 @@
<?php
/*
(C) 2005 Vidar Løkken <vidarlo@vestdata.no>
V.3: I've added escapeshellcmd to all user input that shows up directly
in exec()
*/
switch ($_REQUEST['output']) {
case "PostScript":
$output=escapeshellcmd("-p $_REQUEST[paper]");
break;
case "PDF":
$output=escapeshellcmd("-a $_REQUEST[paper]");
$pdf=1;
break;
case "InLine":
$output="-t";
break;
}
if (isset($_FILES['userfile']['name'])) {
$uploaddir = '/tmp/';
$uploadfile = $uploaddir . basename($_FILES['userfile']['name']);
$userfile = $_FILES['userfile']['name'];
if (move_uploaded_file($_FILES['userfile']['tmp_name'],$uploadfile)) {
$delims=".";
if (strstr($output,"-p")) {
$psfile=strtok($userfile,$delims).".ps";
header("Content-Type: Application/PostScript");
header("Content-Disposition: attachment; filename=".$psfile);
$file=escapeshellcmd($uploadfile);
$command="antiword $output $file";
passthru($command);
unlink($uploadfile);
} elseif (strstr($output,"-a")) {
$psfile=strtok($userfile,$delims).".pdf";
header("Content-Type: Application/PDF");
// header("Content-Disposition: attachment; filename=".$psfile);
// $command="antiword $output $uploadfile";
$file=escapeshellcmd($uploadfile);
$command="antiword $output $file";
passthru($command);
unlink($uploadfile);
} else {
echo "<pre>";
$file=escapeshellcmd($uploadfile);
$command="antiword $output $file";
// echo $command;
// $command="antiword $output $uploadfile";
passthru($command);
unlink($uploadfile);
}
}
elseif (isset($_REQUEST['url'])) {
$url=$_REQUEST['url'];
// The last path component of the URL is only used to build the file
// name that is offered to the browser.
$docfile=basename((string)parse_url($url,PHP_URL_PATH));
// Download into a fresh temporary file, so that the URL never becomes
// part of a path, and quote every argument passed to the shell.
$tmpfile=tempnam("/tmp","aw");
exec("wget -O ".escapeshellarg($tmpfile)." ".escapeshellarg($url));
$command="antiword $output ".escapeshellarg($tmpfile);
if (strstr($output,"-p")) {
$psfile=strtok($docfile,".").".ps";
header("Content-Type: Application/PostScript");
header("Content-Disposition: attachment; filename=".$psfile);
passthru($command);
unlink($tmpfile);
} elseif (strstr($output,"-a")) {
$psfile=strtok($docfile,".").".pdf";
header("Content-Type: Application/PDF");
header("Content-Disposition: attachment; filename=".$psfile);
passthru($command);
unlink($tmpfile);
} else {
echo "<pre>";
passthru($command);
unlink($tmpfile);
}
}
}
if (!isset($_FILES['userfile']['name'])) {
?>
<p>
This script converts a Word file (most versions supported) into
plain ASCII, PDF or PostScript. Currently, only the PostScript
and PDF versions carry images, and those images may be distorted. It's
based on the nice program antiword. See <a
href="http://antiword.cjb.net">antiword.cjb.net</a> for more information
about antiword. Currently, the maximum upload size is 3 MiB. This
should be enough!
</p><p>Currently, I tend to end up with the ASCII version being 1/100th
the size of the Word document, and the PDF/PS versions being 1/10th of the size.
So if you're going to send me a Word document, rethink that. I won't read
it. I'll read ASCII, and probably PDF/PS too.</p>
<form enctype="multipart/form-data" action="antiword.php" method="post">
<input type="hidden" name="MAX_FILE_SIZE" value="3145728" />
URL:<br /><input type="text" name="url" size=50 /><br />
Send this file:<br /> <input name="userfile" type="file"/>
<br />Output: <br />
<SELECT name="output">
<OPTGROUP>
<OPTION name=txt>InLine</OPTION>
<OPTION name=ps>PostScript</OPTION>
<OPTION name=PDF>PDF</OPTION>
</OPTGROUP>
</SELECT>
Papersize: <SELECT name="paper">
<OPTGROUP>
<OPTION>a4</OPTION>
<OPTION>a3</OPTION>
<OPTION>a5</OPTION>
<OPTION>b4</OPTION>
<OPTION>b5</OPTION>
<OPTION>10x14</OPTION>
<OPTION>executive</OPTION>
<OPTION>folio</OPTION>
<OPTION>legal</OPTION>
<OPTION>letter</OPTION>
<OPTION>note</OPTION>
<OPTION>quarto</OPTION>
<OPTION>statement</OPTION>
<OPTION>tabloid</OPTION>
</OPTGROUP>
</SELECT>
<br />
<input type="submit" value="Send File" />
</form>
<p>This is running <a href="http://antiword.cjb.net">antiword</a> 0.36. <br>
Please drop me a note at antiword (at) bitsex.net if you have
comments on this.</p>
<hr>
<font size=-1>(C)Vidar L&oslash;kken 2005</font>
<!-- Version: 0.2 as of 19. oct. 2005 -->
<?php
}
?>

BIN
Docs/testdoc.doc Normal file

Binary file not shown.