Collect Page Numbers
PowerGREP deals with plain text files. Plain text files consist of unformatted text, so there's no real concept of a page. Still, plain text files can contain page breaks, represented by the form feed character (ASCII character 12 decimal, or 0C hexadecimal). Some text editors, such as EditPad Pro and PowerGREP's built-in editor, allow page breaks to be inserted by pressing Ctrl+Enter and show them as horizontal lines.
PowerGREP's built-in decoder that converts PDF files into plain text (so PowerGREP can search through them) also inserts page breaks that match the page transitions in the original PDF. You can make PowerGREP search for these page breaks to determine the page numbers. In this example we'll do this to get search results that indicate on which page each search match was found. We'll use the "file sectioning" feature to split the file into one section per page. The main search then processes the PDF one page at a time, with the section number being the page number.
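To illustrate the idea outside of PowerGREP, here is a minimal Python sketch of splitting text along page breaks so that the position of each section is its page number. This is not how PowerGREP is implemented; the function name split_pages and the sample text are made up for the illustration.

```python
FORM_FEED = "\x0c"  # ASCII character 12, the page break character


def split_pages(text: str) -> list[str]:
    """Split plain text into one section per page along form feed characters."""
    return text.split(FORM_FEED)


# Hypothetical sample: three pages separated by two page breaks.
sample = "first page\x0csecond page\x0cthird page"
pages = split_pages(sample)

# Counting sections from 1 gives the page number of each section.
for page_number, page in enumerate(pages, start=1):
    print(page_number, repr(page))
```

A text with N page breaks yields N+1 sections, which is why the section number can stand in for the page number.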
- Select the PDF files you want to search through in the File Selector.
- Start with a fresh action.
- Set the action type to "collect data".
- Set "file sectioning" to "split along delimiters".
- To use each page break as the delimiter to divide the file into sections (pages), we need to set the search term for the file sectioning to a page break. There are two ways to do this. Choose whichever way you find more comfortable.
  - Set the "search type" to "literal text". Click on the "section search" box and then press Ctrl+Enter. A horizontal line representing the page break appears.
  - Set the "search type" to "regular expression" and type the regex \x0C into the "section search" box. This regular expression matches ASCII character 12, which is the page break character.
- Specify your search term(s) in the main part of the action.
- In the collect box, use %SECTIONN% as the placeholder for the page number. E.g. "%MATCH% on page %SECTIONN%" collects "found me on page 7" when the main part of the action finds "found me" in the 7th section (page).
- Click the Search button to run the search.
You can find this action in the PowerGREP.pgl standard library as "Collect page numbers".
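
For readers who want to see the logic of this action spelled out, the following Python sketch mimics it: it splits the text on the regex \x0C, searches each page, and formats each match the way "%MATCH% on page %SECTIONN%" would. It is only an approximation under stated assumptions: the function name collect_with_page_numbers is hypothetical, the search term is treated as a Python regular expression rather than a PowerGREP one, and the input is assumed to be text that already contains page breaks.

```python
import re


def collect_with_page_numbers(text: str, pattern: str) -> list[str]:
    """Return one '<match> on page <n>' line for every match in the text."""
    results = []
    # Split along the page break character; sections are numbered from 1.
    for page_number, page in enumerate(re.split(r"\x0C", text), start=1):
        for match in re.finditer(pattern, page):
            results.append(f"{match.group(0)} on page {page_number}")
    return results


# Hypothetical sample: seven pages, with the search term on the last one.
sample = "\x0c".join(["nothing here"] * 6 + ["you found me here"])
for line in collect_with_page_numbers(sample, "found me"):
    print(line)
```

Because each page is searched separately, a match can never span a page break in this sketch.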