Action Parts and Named Capture

If you have some experience with regular expressions, you’ve certainly come across or even created regular expressions that use capturing groups and backreferences. The regular expression (one)(two)(three) matches the text onetwothree. If we pair the replacement text \3\2\1 with this regular expression, the actual replacement becomes threetwoone. A more useful example is the regular expression \b(\d\d)/(\d\d)/(\d\d\d\d)\b, which matches a date in dd/mm/yyyy or mm/dd/yyyy format, paired with the replacement text \2/\1/\3 to flip the day and month numbers.
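
The two examples above can be tried in any regex flavor that supports numbered backreferences. The sketch below uses Python's re module rather than PowerGREP itself, and the sample sentence is made up for illustration.

```python
import re

# Numbered groups: \1, \2 and \3 refer to the groups in the order they open.
reordered = re.sub(r"(one)(two)(three)", r"\3\2\1", "onetwothree")

# Flip the first two numeric fields of a date; the year stays in place.
date_regex = re.compile(r"\b(\d\d)/(\d\d)/(\d\d\d\d)\b")
flipped = date_regex.sub(r"\2/\1/\3", "Due on 25/12/2023.")

print(reordered)  # threetwoone
print(flipped)    # Due on 12/25/2023.
```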

Many modern regular expression flavors, including the one used in PowerGREP, also support named capturing groups. The only difference between a named capturing group and a traditional numbered one is that you reference the group by a name you choose, instead of by a number that requires you to count the groups in your regular expression. This makes your regular expression easier to read and to maintain. The date regex could be written as \b(?'day'\d\d)/(?'month'\d\d)/(?'year'\d\d\d\d)\b and the replacement text as ${month}/${day}/${year}.
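
For comparison, here is the same named-group date swap as a sketch in Python. Note that the syntax differs from the PowerGREP syntax shown above: Python's re module writes a named group as (?P<name>...) and references it in the replacement text as \g<name>. The sample date is made up for illustration.

```python
import re

# Same date regex, with names instead of group numbers (Python syntax).
date_regex = re.compile(
    r"\b(?P<day>\d\d)/(?P<month>\d\d)/(?P<year>\d\d\d\d)\b"
)

# The replacement text refers to the groups by name, not by position.
flipped = date_regex.sub(r"\g<month>/\g<day>/\g<year>", "25/12/2023")

# The captured text is also available by name on the match object.
parts = date_regex.search("25/12/2023").groupdict()

print(flipped)  # 12/25/2023
print(parts)    # {'day': '25', 'month': '12', 'year': '2023'}
```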

PowerGREP takes named capturing groups a step further. Normally, a capturing group can only be referenced within the regular expression that defines it and in that regular expression's replacement text. In PowerGREP, named capturing groups are shared between all the regular expressions on the Action panel. Text captured by a named capturing group is preserved until one of three things happens: PowerGREP attempts to match the same regular expression again, PowerGREP attempts to match another regular expression that defines a capturing group with the same name, or PowerGREP proceeds with the next file. As long as a capturing group is preserved, it can be referenced by backreferences in any other regular expression or replacement text.
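
The sharing rule can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not PowerGREP's implementation: the shared dictionary and the attempt and expand helpers are hypothetical names, and the sample text is made up.

```python
import re

shared = {}  # hypothetical store: group name -> preserved text

def attempt(regex, text, pos=0):
    """Attempt one match. The groups this regex defines are cleared first."""
    for name in regex.groupindex:
        shared.pop(name, None)
    match = regex.search(text, pos)
    if match:
        captured = match.groupdict()
        shared.update({k: v for k, v in captured.items() if v is not None})
    return match

def expand(template):
    """Expand ${name} placeholders from the shared store."""
    return re.sub(r"\$\{(\w+)\}", lambda m: shared.get(m.group(1), ""), template)

text = "Title: Annual Report\nbody line one\nbody line two\n"
title_regex = re.compile(r"^Title: (?P<title>.+)$", re.M)
line_regex = re.compile(r"^body (?P<rest>.+)$", re.M)

attempt(title_regex, text)              # captures "title" once
m = attempt(line_regex, text)           # captures "rest"; "title" is preserved
first = expand("${title} | ${rest}")
m = attempt(line_regex, text, m.end())  # clears and recaptures only "rest"
second = expand("${title} | ${rest}")

shared.clear()                          # proceeding to the next file clears everything
```

Here first is "Annual Report | line one" and second is "Annual Report | line two": the title captured by one regex stays available while the other regex keeps matching.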

PowerGREP uses the regular expressions from the various parts of the Action panel in this order:

1. The “filter files” regular expressions are attempted once to check if they can be matched or not.
2. PowerGREP finds the first match of the “file sectioning” regular expressions. If you don’t use file sectioning, the remainder of this list is executed only once using the whole file as a single section.
3. PowerGREP finds the first match of the regexes in the main part of the action restricting its search to the section found in step 2.
4. PowerGREP builds the replacement text or text to be collected for the search match found in step 3.
5. If “extra processing” is used, PowerGREP runs a search-and-replace through the replacement text from step 4.
6. If PowerGREP needs to collect context it does so by applying the context regular expressions as many times as needed, starting from the start of the file or where it last stopped looking for context.
7. If step 3 found a match before the end of the section, PowerGREP goes back to step 3 to search through the remainder of the section.
8. If step 2 found a section before the end of the file, PowerGREP goes back to step 2 to find the next section.
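
The order of the steps above can be sketched as two nested loops. The Python below is a simplified model, not PowerGREP's implementation: process_file and its parameters are hypothetical names, the sample text and regexes are made up, and “extra processing” and context collection (steps 5 and 6) are left out.

```python
import re

def process_file(text, filter_regex, section_regex, search_regex, build):
    """Collect one text per search match, visiting the regexes in list order."""
    # Step 1: the filter regex is attempted once per file.
    if filter_regex.search(text) is None:
        return []
    collected = []
    # Steps 2 and 8: find each section in turn.
    for section in section_regex.finditer(text):
        # Steps 3 and 7: find each match, restricted to the current section.
        for match in search_regex.finditer(text, section.start(), section.end()):
            # Step 4: build the text to collect for this match.
            collected.append(build(section, match))
    return collected

text = "[fruit]\napple=1\npear=2\n[veg]\nkale=3\n"
pairs = process_file(
    text,
    filter_regex=re.compile(r"\[\w+\]"),
    section_regex=re.compile(r"\[(?P<header>\w+)\][^\[]*"),
    search_regex=re.compile(r"(?P<key>\w+)=(?P<value>\d+)"),
    build=lambda s, m: f"{s['header']}.{m['key']}={m['value']}",
)
print(pairs)  # ['fruit.apple=1', 'fruit.pear=2', 'veg.kale=3']
```

Because the section match is still available while the inner loop runs, the text built in step 4 can combine the section's header group with the groups of the search match.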

If any of these action parts use a list of regular expressions, the “non-overlapping search” option comes into play. If this option is on, the list of regular expressions for that action part is treated as a single regular expression. Thus each match attempt of that part of the action clears all the named capturing groups defined in any of those regular expressions. If “non-overlapping search” is off then only one regular expression is attempted at a time. Each regex only clears its own named capturing groups. PowerGREP starts with the first regex in the list. It only proceeds with the next one after the previous one cannot find any more matches. This is why you need to turn off “non-overlapping search” when using the “filter files” feature to grab multiple parts of the file to be reused in the remainder of the action.
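
The effect of the option on a “filter files” list can be illustrated with a small Python model. This is a sketch under stated assumptions rather than PowerGREP's actual logic: the grab function is hypothetical, the “on” case is approximated by joining the list into one alternation, and each regex is attempted only once, as filters are.

```python
import re

def grab(regexes, text, non_overlapping):
    """Return the named captures still preserved after one pass over the list."""
    store = {}
    if non_overlapping:
        # Option on: the list acts as a single regex, so one attempt clears
        # every group in the list and only one alternative can match.
        combined = re.compile("|".join(f"(?:{r.pattern})" for r in regexes), re.M)
        match = combined.search(text)
        if match:
            store.update({k: v for k, v in match.groupdict().items() if v is not None})
    else:
        # Option off: one regex at a time; each clears only its own groups.
        for regex in regexes:
            for name in regex.groupindex:
                store.pop(name, None)
            match = regex.search(text)
            if match:
                store.update({k: v for k, v in match.groupdict().items() if v is not None})
    return store

text = "Author: Jane Doe\nDate: 2023-12-25\n"
regexes = [
    re.compile(r"^Author: (?P<author>.+)$", re.M),
    re.compile(r"^Date: (?P<date>.+)$", re.M),
]
on = grab(regexes, text, non_overlapping=True)
off = grab(regexes, text, non_overlapping=False)
print(on)   # {'author': 'Jane Doe'}
print(off)  # {'author': 'Jane Doe', 'date': '2023-12-25'}
```

With the option on, only the first alternative that matches leaves a capture behind. With it off, both the author and the date survive for use in the rest of the action.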

Examples: Insert proper HTML title tags, Rename files based on HTML title tags, Collect a list of header and item pairs and Make sections and their contents consistent