README.md: 4 additions & 8 deletions
@@ -35,7 +35,6 @@ This tool is mostly for examining/working with a PDF's data and logical structur
If you suspect you are dealing with a malicious PDF you can safely run `pdfalyze` on it; embedded JavaScript etc. will not be executed. If you want to actually look at the contents of a suspect PDF you can use [`dangerzone`](https://dangerzone.rocks/) to sanitize the contents with extreme prejudice before opening it.
@@ -81,20 +79,17 @@ If you provide none of the flags in the `ANALYSIS SELECTION` section of the `--h
The `--streams` output is the one used to hunt for patterns in the embedded bytes and can be _extremely_ verbose depending on the `--quote-char` options chosen (or not chosen) and the contents of the PDF. [The Yaralyzer](https://github.com/michelcrypt4d4mus/yaralyzer) handles this task; if you want to hunt for patterns in the bytes other than bytes surrounded by backticks/frontslashes/brackets/quotes/etc. you may want to use The Yaralyzer directly. As The Yaralyzer is a prerequisite for The Pdfalyzer you may already have the `yaralyze` command installed and available.

-### Setting Command Line Options Permanently With A `.pdfalyzer` File
+#### Setting Command Line Options Permanently With A `.pdfalyzer` File
When you run `pdfalyze` on a PDF the tool will check for a file called `.pdfalyzer` in these places, in this order:

1. the current directory
2. the user's home directory

If it finds a `.pdfalyzer` file in either place it will load configuration options from it. Documentation on the options that can be configured with these files lives in [`.pdfalyzer.example`](.pdfalyzer.example), which doubles as an example file you can copy into place and edit to your needs. This is handy if you find yourself typing the same command line options over and over again.
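
The lookup order described above can be sketched in Python roughly as follows. This is only an illustration, not The Pdfalyzer's actual implementation: the function name `find_config_file` and its parameters are hypothetical, and it assumes the first file found wins (the text above does not say what happens if both files exist).

```python
from pathlib import Path
from typing import Optional

CONFIG_FILE_NAME = ".pdfalyzer"


def find_config_file(
    current_dir: Optional[Path] = None,
    home_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first `.pdfalyzer` file found, or None.

    Hypothetical helper: checks the current directory first, then the
    user's home directory, mirroring the order listed above.
    """
    search_dirs = [current_dir or Path.cwd(), home_dir or Path.home()]

    for directory in search_dirs:
        candidate = directory / CONFIG_FILE_NAME
        if candidate.is_file():
            return candidate

    # Neither location contains a config file.
    return None
```

Calling `find_config_file()` with no arguments checks the real working and home directories; passing explicit directories makes the helper easy to exercise in a test.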
-### Environment Variables
+#### Environment Variables
Even if you don't configure your own `.pdfalyzer` file you may still glean some insight from reading the descriptions of the various variables in [`.pdfalyzer.example`](.pdfalyzer.example); there's a little more exposition there than in the output of `pdfalyze -h`.
-### Colors And Themes
-Run `pdfalyzer_show_color_theme` to see the color theme employed.
-
### Guarantees
Warnings will be printed if any PDF object ID between 1 and the `/Size` reported by the PDF itself could not be successfully placed in the tree. If you do not get any warnings then all[^2] of the inner PDF objects should be seen in the output.
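
The check described above amounts to a set difference. The sketch below is an illustration under stated assumptions, not The Pdfalyzer's code: `missing_object_ids` and `warn_about_missing` are hypothetical names, and the sketch assumes valid IDs run from 1 up to but not including `/Size` (in the PDF specification `/Size` is one greater than the highest object number).

```python
from typing import Iterable, List


def missing_object_ids(placed_ids: Iterable[int], size: int) -> List[int]:
    """Return object IDs in 1..size-1 that were not placed in the tree.

    Hypothetical helper. `size` stands in for the trailer's /Size value,
    assumed here to be one greater than the highest object number.
    """
    placed = set(placed_ids)
    return [obj_id for obj_id in range(1, size) if obj_id not in placed]


def warn_about_missing(placed_ids: Iterable[int], size: int) -> List[str]:
    """Build one warning message per object ID missing from the tree."""
    return [
        f"WARNING: object ID {obj_id} was not placed in the tree"
        for obj_id in missing_object_ids(placed_ids, size)
    ]
```

An empty list from `warn_about_missing` corresponds to the "no warnings" case, where every object ID in the assumed range was accounted for.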
@@ -108,6 +103,7 @@ The Pdfalyzer comes with a few command line tools for doing stuff with PDFs:
* `combine_pdfs` - Combines multiple PDFs into a single PDF. Run `combine_pdfs --help` for more info.
* `extract_pdf_pages` - Extracts page ranges (e.g. "10-25") from a PDF and writes them to a new PDF. Run `extract_pdf_pages --help` for more info.
* `extract_pdf_text` - Extracts text from a PDF, including applying OCR to all embedded images. Run `extract_pdf_text --help` for more info.
+* `pdfalyzer_show_color_theme` - Run to see the color theme employed in Pdfalyzer's output.

Running `extract_pdf_text` requires that you install The Pdfalyzer's optional dependencies:
For info about setting up a dev environment see [Contributing](#contributing) below.
At its core The Pdfalyzer is taking PDF internal objects gathered by [PyPDF](https://github.com/py-pdf/pypdf) and wrapping them in [AnyTree](https://github.com/c0fec0de/anytree)'s `NodeMixin` class. Given that things like searching the tree or accessing internal PDF properties will be done through those packages' code it may be helpful to review their documentation.