With PowerGREP, you can quickly search the files and folders on your computer for a piece of information, including documents stored in the Adobe Acrobat PDF format commonly used to distribute documents. Simply type a keyword or phrase in the Search box, then select the folder PowerGREP should search and the types of files you are interested in. When you click the Search button, PowerGREP presents a list of the PDF and other files in which the text you entered was found. The list shows one line of context for each match. You can instantly inspect the full context by double-clicking the match in the results.
When you do not know in advance exactly what you are looking for, PowerGREP's rich regular expression support lets you search for virtually anything: you describe the pattern of the text you want, and PowerGREP finds the actual text that matches that pattern.
With PowerGREP’s collect data feature, you can extract data from PDF and other files and automatically save the extracted data into one or more new text files. You can group identical matches together and count them, producing informative statistics.
How PowerGREP Handles PDF Files
A PDF file stores a complete document using a binary file format developed by Adobe. PowerGREP internally converts the PDF file to a textual representation, ignoring text formatting and images, and searches through that plain text representation. When you inspect search matches inside PowerGREP, the built-in file editor shows the plain text version that PowerGREP actually searched. You can always use the Edit button's drop-down menu to open the PDF file in Adobe Reader and view it in its original form.
PowerGREP can search through PDF files that show images of scanned pages in Adobe Reader if (and only if) Adobe Reader can search through those PDF files. This is the case when the scanning software that created the PDF used OCR (optical character recognition) to convert the scanned pages to text and stored that text in the PDF along with the image of each scanned page.
PowerGREP's plain text conversion can either mimic the page layout of the PDF file or present the text in reading order. You choose between the two by editing the file format configuration in PowerGREP. The benefit of mimicking the page layout is that the text you see in PowerGREP has the same layout as the text you see in Adobe Reader. The benefit of keeping the reading order is that phrases are easier to search for when they are not broken up by page formatting. Text laid out in columns, in particular, is easier to work with when converted in reading order.
PowerGREP cannot edit PDF files, because the PDF format is primarily intended for distributing final documents.
Searching through PDF files with PowerGREP
“If the only thing I ever did with PowerGREP was grep help PDFs and CHMs, I’d find my PowerGREP license worth the cost. I’ll put a folder full of hardlinks to every local resource I have and GREP it for answers—wonderful tool. PowerGREP makes me feel like I can do an SQL query on my personal, unorganized data, wherever it is, and actually find what I need.”
— Clay Cundick
31 December 2016, Utah, USA