Compile Indices of Files

By using path placeholders in a collect data action, you can easily index files with PowerGREP. Suppose you have a large number of HTML files saved in a particular folder, and you want to compile a single index of those files.

  1. Select the files you want to index in the File Selector.
  2. Set the action type to “collect data”.
  3. Turn on “group results for all files” and “group identical matches”. Each file has only one TITLE tag, and the text to be collected includes the name of the file, so every collected text is different. This means “group identical matches” won’t actually group anything, but it does allow the matches to be sorted alphabetically.
  4. In the Search box, enter the regular expression <TITLE>(.*?)</TITLE> and make sure to leave “case sensitive search” off. This regex matches an HTML title tag and stores its contents in the first backreference.
  5. In the Collect box, enter <P><A HREF="%FILENAME%">\1</A></P>. The path placeholder %FILENAME% will be replaced with the name of the file in which the HTML title tag was found, and \1 will be replaced with the contents of the title tag.
  6. Choose to sort the collected matches alphabetically, and set the minimum number of occurrences to one.
  7. Select “save results into a single file” in the target file creation list.
  8. Click the ellipsis (...) button next to “target file location”, and select the name of the file you want to save your HTML index into.
  9. Leave “between collected text” set to “line break” so each index entry we collect appears on its own line.
  10. Turn on the “collect headers and footers” checkbox.
  11. In the list that appears, click on “target file header”. In the edit box next to that list, paste:
    <html><head><title>HTML Index</title></head>
    <body><h1>HTML Index</h1>
  12. Select “target file footer” in the list and type in </body></html>. Together, the header and footer make sure the collected output is a valid HTML file.
  13. Click the Collect button to run the action.

This action is available in the PowerGREP5.pgl library as “Indexing HTML files”.
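
For readers who want to see the logic of the steps above outside PowerGREP, here is a minimal Python sketch of the same idea. It is not how PowerGREP works internally. The function names and the in-memory `pages` mapping are assumptions made for illustration, and Python's case-sensitive `sorted` may order entries differently from PowerGREP's alphabetical sort.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Same pattern as step 4; IGNORECASE mirrors leaving "case sensitive search" off.
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.IGNORECASE)

# Header and footer from steps 11 and 12.
HEADER = "<html><head><title>HTML Index</title></head>\n<body><h1>HTML Index</h1>"
FOOTER = "</body></html>"


def index_entry(filename: str, html: str) -> Optional[str]:
    """Return one index line for a file, or None if no title tag is found."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Equivalent of the Collect text <P><A HREF="%FILENAME%">\1</A></P>
    return f'<P><A HREF="{filename}">{match.group(1)}</A></P>'


def build_index(pages: Dict[str, str]) -> str:
    """Build the whole index from a mapping of file name to file contents."""
    entries = [index_entry(name, html) for name, html in pages.items()]
    # Removing duplicates and sorting stands in for "group identical matches"
    # combined with the alphabetical sort of step 6.
    unique = sorted({entry for entry in entries if entry is not None})
    # One entry per line, like "between collected text" set to "line break".
    return "\n".join([HEADER, *unique, FOOTER])


def build_index_from_folder(folder: str) -> str:
    """Hypothetical helper: index every .html file directly inside a folder."""
    pages = {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).glob("*.html")
    }
    return build_index(pages)
```

Because every entry starts with the same `<P><A HREF="` prefix, sorting the collected text effectively orders the index by file name, which is also what sorting the collected matches does in the action above.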

How much information you include in the index is up to you. The example above is deliberately minimal to keep it easy to understand. If you also want to include the first paragraph of each HTML file, you could search for:

<TITLE>(.*?)</TITLE>.*?<P[^>]*>(.*?)</P>

and collect:

<P><A HREF="%FILENAME%">\1</A></P>
<BLOCKQUOTE>\2</BLOCKQUOTE>

This action is available in the PowerGREP5.pgl library as “Indexing HTML files with first paragraph”.
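
The two-group variant can be tried out in the same way. The Python sketch below is an illustration only, with a hypothetical `collect` function. It makes two assumptions worth stating: the pattern uses `<P[^>]*>` so that a bare `<P>` tag without attributes also matches, and `re.DOTALL` is switched on so that `.*?` can cross the line breaks that normally separate the title from the first paragraph.

```python
import re
from typing import Optional

# Title plus first paragraph. [^>]* is an assumption made here so that a
# <P> tag with no attributes is matched as well.
FIRST_PARA_RE = re.compile(
    r"<TITLE>(.*?)</TITLE>.*?<P[^>]*>(.*?)</P>",
    re.IGNORECASE | re.DOTALL,  # DOTALL lets .*? span line breaks
)


def collect(template: str, filename: str, html: str) -> Optional[str]:
    """Fill a Collect-style template containing %FILENAME%, \\1 and \\2."""
    match = FIRST_PARA_RE.search(html)
    if match is None:
        return None
    # Double any backslashes in the file name so expand() keeps them literal.
    safe_name = filename.replace("\\", "\\\\")
    # Match.expand() substitutes \1 and \2 with the captured groups.
    return match.expand(template.replace("%FILENAME%", safe_name))
```

Pass the Collect text as `template`. A file with no title tag, or with no paragraph after the title, yields `None` and would simply be left out of the index.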