1. NAME
pdftotext - Portable Document Format (PDF) to text converter (version 3.00)
2. SYNOPSIS
pdftotext [options] [PDF-file [text-file]]
3. DESCRIPTION
Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.
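The default naming rule can be sketched in a few lines of Python. This is a minimal illustration, not pdftotext's own code: the helper `default_text_file` is hypothetical, and the handling of names that do not end in `.pdf` or `.PDF` (here `.txt` is simply appended) is an assumption, since this page only documents the file.pdf case.

```python
# Hypothetical helper illustrating the documented default: when no text-file
# argument is given, file.pdf is converted to file.txt.
# Assumption: a trailing ".pdf" or ".PDF" is replaced by ".txt"; any other
# name just gets ".txt" appended (this page only documents the file.pdf case).


def default_text_file(pdf_file: str) -> str:
    """Return the output file name pdftotext is assumed to use by default."""
    for suffix in (".pdf", ".PDF"):
        if pdf_file.endswith(suffix):
            return pdf_file[: -len(suffix)] + ".txt"
    return pdf_file + ".txt"


print(default_text_file("report.pdf"))    # report.txt
print(default_text_file("dir/Scan.PDF"))  # dir/Scan.txt
print(default_text_file("notes"))         # notes.txt
```

Passing an explicit text-file (or '-') bypasses this rule entirely.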
4. OPTIONS
-f number
Specifies the first page to convert.
-l number
Specifies the last page to convert.
-r number
Specifies the resolution, in DPI. The default is 72 DPI.
-x number
Specifies the x-coordinate of the top-left corner of the crop area.
-y number
Specifies the y-coordinate of the top-left corner of the crop area.
-W number
Specifies the width of the crop area in pixels (default is 0).
-H number
Specifies the height of the crop area in pixels (default is 0).
-layout
Maintain (as well as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.
-raw
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.
-htmlmeta
Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers.
-enc encoding-name
Sets the encoding to use for text output. This defaults to "UTF-8".
-listenc
Lists the available encodings.
-eol unix | dos | mac
Sets the end-of-line convention to use for text output.
-nopgbrk
Don't insert page breaks (form feed characters) between pages.
-opw password
Specify the owner password for the PDF file. Providing this will bypass all security restrictions.
-upw password
Specify the user password for the PDF file.
-q
Don't print any messages or errors.
-v
Print copyright and version information.
-h
Print usage information. (-help and --help are equivalent.)
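As a sketch of how these options combine on a command line, the Python helper below assembles an argument vector for the standard `subprocess` module, placing options before the file names as in the synopsis. The names `build_args` and `extract_text` are hypothetical, only a subset of the options above is covered, and `extract_text` assumes a `pdftotext` executable is on the PATH. Passing '-' as text-file makes the text come back on stdout.

```python
import subprocess
from typing import List, Optional

EOL_VALUES = ("unix", "dos", "mac")  # the three conventions -eol accepts


def build_args(pdf_file: str, text_file: str = "-", *,
               first: Optional[int] = None, last: Optional[int] = None,
               layout: bool = False, enc: Optional[str] = None,
               eol: Optional[str] = None, nopgbrk: bool = False,
               upw: Optional[str] = None, quiet: bool = False) -> List[str]:
    """Assemble a pdftotext argument vector (hypothetical helper)."""
    args = ["pdftotext"]
    if first is not None:
        args += ["-f", str(first)]   # first page to convert
    if last is not None:
        args += ["-l", str(last)]    # last page to convert
    if layout:
        args.append("-layout")       # keep physical layout
    if enc is not None:
        args += ["-enc", enc]        # output encoding, e.g. "UTF-8"
    if eol is not None:
        if eol not in EOL_VALUES:
            raise ValueError(f"eol must be one of {EOL_VALUES}")
        args += ["-eol", eol]
    if nopgbrk:
        args.append("-nopgbrk")      # no form feeds between pages
    if upw is not None:
        args += ["-upw", upw]        # user password
    if quiet:
        args.append("-q")
    args += [pdf_file, text_file]    # file names come after the options
    return args


def extract_text(pdf_file: str, **options) -> bytes:
    """Run pdftotext with text-file '-' and return the raw stdout bytes.

    Raises subprocess.CalledProcessError on a nonzero exit status.
    """
    proc = subprocess.run(build_args(pdf_file, "-", **options),
                          capture_output=True, check=True)
    return proc.stdout


print(build_args("report.pdf", first=2, last=5, layout=True, eol="unix"))
# ['pdftotext', '-f', '2', '-l', '5', '-layout', '-eol', 'unix', 'report.pdf', '-']
```

`extract_text` returns bytes rather than a decoded string because the names accepted by -enc are pdftotext's own and do not necessarily match Python codec names.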
5. BUGS
Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files.
6. EXIT CODES
The Xpdf tools use the following exit codes:
0
No error.
1
Error opening a PDF file.
2
Error opening an output file.
3
Error related to PDF permissions.
99
Other error.
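A script that calls pdftotext can branch on these exit codes. The Python sketch below uses hypothetical names (`EXIT_CODES`, `describe_exit`, `convert`); the messages are copied from the list above, and any other status is reported as unrecognized. `convert` assumes a `pdftotext` executable is on the PATH (otherwise `subprocess.run` raises FileNotFoundError).

```python
import subprocess
import sys

# Exit codes as listed in this section; any other value is treated as unknown.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit(code: int) -> str:
    """Map a pdftotext exit status to the documented meaning."""
    return EXIT_CODES.get(code, f"Unrecognized exit code {code}.")


def convert(pdf_file: str, text_file: str) -> int:
    """Run pdftotext, report a failure on stderr, and return the exit status."""
    proc = subprocess.run(["pdftotext", pdf_file, text_file])
    if proc.returncode != 0:
        print(f"pdftotext: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode


print(describe_exit(3))  # Error related to PDF permissions.
```

A status of 3, for example, suggests retrying with -opw or -upw rather than treating the file as unreadable.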
7. AUTHOR
The pdftotext software and documentation are copyright 1996-2004 Glyph & Cog, LLC.
8. SEE ALSO
pdftops(1),
pdfinfo(1),
pdffonts(1),
pdftoppm(1),
pdfimages(1),