Collect a List of Header and Item Pairs

This example illustrates how you can use file sectioning to extract items from sections. It also shows how named capturing groups carry over regex matches from the file sectioning to the main part of the action. This makes it easy to collect both part of the section (e.g. its header), and part of the item, for each item found in each section.

Windows applications often store their settings in .ini files. Such files consist of one or more headers, each followed by one or more name=value pairs.

[Header1]
Name1=Value1
Name2=Value2
[Header2]
Name3=Value3
Name4=Value4
Name5=Value5
; etc...

With PowerGREP, you can easily extract a list of header and item pairs from such a file. For example, let's produce the following list from the above:

Header1/Name1
Header1/Name2
Header2/Name3
Header2/Name4
Header2/Name5

To do this, we need two regular expressions. One to get the headers, and another to get the items for each header. This is impossible with most grep tools, since they only allow you to use one regular expression. PowerGREP's file sectioning feature makes this task very straightforward.

You can find this action in the PowerGREP5.pgl library as "Collect header/item pairs from .ini files".

  1. Select the files you want to search through in the File Selector.
  2. Start with a fresh action.
  3. Set the action type to "collect data".
  4. Select "search for sections" from the "file sectioning" list. Leave the section search type as "regular expression".
  5. In the Section Search box, enter the regular expression ^\s*\[(?'header'[^]\r\n]+)](?:\r\n\s*+[^[].*+)+ and make sure to leave "dot matches newlines" off. This regex matches a header with ^\s*\[(?'header'[^]\r\n]+)] and everything that follows it up to the next header with (?:\r\n\s*+[^[].*+)+. It contains one named capturing group 'header'.
  6. In the Search box in the main part of the action, enter the regular expression ^([^=;\r\n]+)=.*$ and make sure to leave "dot matches newlines" off. This regex matches a single name=value pair, and captures the name into the first backreference.
  7. In the Collect box, enter ${header}/\1 to collect the name of the header (named capturing group carried over from the file sectioning) and the name of the value (first backreference), delimited by a forward slash.
  8. Click the Preview button to see the results.
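
For readers who want to try the two regular expressions outside PowerGREP, the following is a minimal sketch in Python, not PowerGREP's own implementation. It rests on several assumptions: Python's re module writes the named group as (?P<header>...) rather than (?'header'...); a negative lookahead stands in for the possessive quantifiers, which older Python versions lack; \r?\n is used so that Unix line endings also work; and [^\r\n]* replaces the dot, because Python's dot would otherwise match a trailing \r. The names SECTION_RE, ITEM_RE, collect_pairs and SAMPLE are made up for this illustration.

```python
import re

# Section regex: a [header] line plus every following line that does not
# itself start a new header.
SECTION_RE = re.compile(
    r"^\s*\[(?P<header>[^\]\r\n]+)\](?:\r?\n(?!\s*\[)[^\r\n]*)+",
    re.MULTILINE,
)

# Item regex: one name=value pair; group 1 holds the name.
# Lines starting with ';' (comments) cannot match.
ITEM_RE = re.compile(r"^([^=;\r\n]+)=[^\r\n]*", re.MULTILINE)


def collect_pairs(text):
    """Return 'header/name' strings, searching items only inside sections."""
    pairs = []
    for section in SECTION_RE.finditer(text):
        header = section.group("header")
        # The item regex only "sees" the text of the current section.
        for item in ITEM_RE.finditer(section.group(0)):
            pairs.append(f"{header}/{item.group(1)}")
    return pairs


SAMPLE = (
    "[Header1]\n"
    "Name1=Value1\n"
    "Name2=Value2\n"
    "[Header2]\n"
    "Name3=Value3\n"
    "Name4=Value4\n"
    "Name5=Value5\n"
    "; etc...\n"
)

for pair in collect_pairs(SAMPLE):
    print(pair)
```

Run against the sample .ini text shown earlier, this prints the same five header/item pairs as the list above.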

When PowerGREP executes this action, the following happens for each file:

  1. The sectioning regex matches a section in the .ini file, e.g. [Header1]\r\nName1=Value1\r\nName2=Value2. The section's header Header1 is stored in the named group "header".
  2. The main action now searches through this section, and matches a name=value pair, e.g. Name1=Value1
  3. The main action substitutes backreferences in the text to be collected for this search match, e.g. Header1/Name1. The result is added to the results.
  4. The main action repeats steps 2 and 3 until all name=value pairs in the current section have been found.
  5. PowerGREP repeats steps 1 through 4 for all sections in the .ini file.
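
The sequence of steps above can be made visible with a small Python generator that reports, for every item, the section text the main search was given, the item match, and the collected text. This is only a sketch of the nested-search idea under the same assumptions as any port to Python's re module (named group written as (?P<header>...), a lookahead instead of possessive quantifiers, [^\r\n]* instead of the dot). The names trace and INI are hypothetical.

```python
import re

SECTION_RE = re.compile(
    r"^\s*\[(?P<header>[^\]\r\n]+)\](?:\r?\n(?!\s*\[)[^\r\n]*)+",
    re.MULTILINE,
)
ITEM_RE = re.compile(r"^([^=;\r\n]+)=[^\r\n]*", re.MULTILINE)


def trace(text):
    """Yield (section_text, item_text, collected) for every item match."""
    for section in SECTION_RE.finditer(text):            # step 1 (repeated: step 5)
        section_text = section.group(0)
        for item in ITEM_RE.finditer(section_text):      # step 2 (repeated: step 4)
            # Step 3: combine the carried-over group with the item's group.
            collected = f"{section.group('header')}/{item.group(1)}"
            yield section_text, item.group(0), collected


INI = (
    "[Header1]\r\nName1=Value1\r\nName2=Value2\r\n"
    "[Header2]\r\nName3=Value3\r\n"
)

steps = list(trace(INI))
for section_text, item_text, collected in steps:
    print(repr(section_text), "->", item_text, "->", collected)
```

The first reported section is [Header1]\r\nName1=Value1\r\nName2=Value2, matching the example given in step 1.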

You can easily adapt the techniques shown in this example for your own purposes.

  1. Create a regular expression that matches each section of the file that you're interested in.
  2. Add named capturing groups to the regex for each part of the section (headers, footers, etc.) you want to collect for all items.
  3. Create a second regular expression that matches each item in those sections. This regular expression will only "see" one section at a time. You don't need to worry about this regex matching any part of the file outside the sections matched by the first regex.
  4. Add named or numbered capturing groups to the second regex for each part of the item you want to collect.
  5. Compose the text to be collected using backreferences to the groups you added in steps 2 and 4.
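
As an illustration of this recipe outside PowerGREP, the five steps can be sketched as one generic Python helper: a section regex, an item regex, and a template that may refer to named groups from either. This is an assumption-laden sketch, not a PowerGREP feature. The helper collect_from_sections, the {name} template syntax (Python's str.format, instead of PowerGREP's ${name} and \1), and the sample data are all invented for this example.

```python
import re


def collect_from_sections(text, section_re, item_re, template):
    """Collect template text for each item found inside each section.

    Named groups of the section match are carried over to every item
    found in that section; item groups win if a name is used twice.
    """
    results = []
    for section in section_re.finditer(text):
        carried = section.groupdict()                    # groups from step 2
        for item in item_re.finditer(section.group(0)):
            fields = {**carried, **item.groupdict()}     # groups from step 4
            results.append(template.format(**fields))    # step 5
    return results


# Steps 1 and 2: match one section, capture its header.
section_re = re.compile(
    r"^\s*\[(?P<header>[^\]\r\n]+)\](?:\r?\n(?!\s*\[)[^\r\n]*)+",
    re.MULTILINE,
)
# Steps 3 and 4: match one item, capture both its name and its value.
item_re = re.compile(
    r"^(?P<name>[^=;\r\n]+)=(?P<value>[^\r\n]*)", re.MULTILINE
)

rows = collect_from_sections(
    "[Net]\nhost=example\nport=80\n[UI]\ntheme=dark\n",
    section_re,
    item_re,
    "{header}/{name} = {value}",
)
print(rows)
```

Here the item regex captures the value as well as the name, so the collected text can combine three parts instead of two.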