Replacing Named XML Entities

PowerGREP’s ability to search and replace using a delimited list of search terms makes it very easy to replace all reserved XML characters with their named XML entities. Simply set the search type to “delimited literal text”, set the item delimiter to a line break and the pair delimiter to an equals sign, and paste in the following search text:

&=&amp;
<=&lt;
>=&gt;
'=&apos;
"=&quot;

When extracting text from an XML file, you can easily turn things around to replace the named XML entities with the characters they represent:

&amp;=&
&lt;=<
&gt;=>
&apos;='
&quot;="

Collect XML Data with Entities Replaced

PowerGREP’s extra processing feature makes it very straightforward to collect text from an XML file with all named entities replaced by the characters they represent.

  1. Select the files you want to search through in the File Selector.
  2. Start with a fresh action.
  3. Set the action type to “collect data”. Leave the search type as “regular expression”.
  4. In the search box, enter the regular expression that matches the XML data you want to extract. E.g. <tag[^>]*>([^<>]+)</tag> matches any text (but no XML) between an opening <tag> (with or without attributes) and </tag>.
  5. Type \1 in the Collect box. This will collect just the text between the tags matched by our regular expression.
  6. Tick the extra processing checkbox. An additional set of controls for entering search terms appears.
  7. Set the extra processing search type to “delimited literal text”.
  8. Leave the “extra item delimiter” field set to “Line break”. Type a single equals sign in the “extra pair delimiter” field.
  9. Copy the second list of search-and-replace pairs from the first section of this help topic. Paste it into the “extra processing search” box in PowerGREP.
  10. Set the target and backup file options as you like them.
  11. Click the Preview button to run a test.
  12. If all looks well, click the Collect button to actually collect the text.
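The steps above can be sketched in Python to show what happens to each match: the regular expression finds the data, group 1 (\1) is collected, and the extra processing replacements are applied to that collected text only. This is an illustration, not PowerGREP’s implementation; the function collect_tag_text and the sample document are hypothetical. Note that the pattern uses [^>]*, which also lets a bare <tag> with no attributes match.

```python
import re

# Step 4: group 1 captures the text between <tag ...> and </tag>.
TAG_RE = re.compile(r"<tag[^>]*>([^<>]+)</tag>")

# Steps 7-9: the second list of pairs (named entity -> character).
ENTITY_PAIRS = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&apos;": "'",
    "&quot;": '"',
}
ENTITY_RE = re.compile("|".join(re.escape(k) for k in ENTITY_PAIRS))


def collect_tag_text(xml: str) -> list[str]:
    """Collect group 1 of each match with named entities replaced.

    Only the collected strings are changed; the input string itself is
    left alone, just as PowerGREP leaves the original XML files alone.
    """
    collected = []
    for match in TAG_RE.finditer(xml):
        text = match.group(1)  # step 5: collect \1
        # Extra processing: replace entities in the collected text only.
        collected.append(ENTITY_RE.sub(lambda m: ENTITY_PAIRS[m.group(0)], text))
    return collected


doc = '<root><tag id="1">Fish &amp; Chips</tag><tag>x &lt; y</tag></root>'
print(collect_tag_text(doc))  # ['Fish & Chips', 'x < y']
```

Text that contains nested markup, such as <tag>a<b/>c</tag>, is not matched at all, because [^<>]+ refuses to cross an XML tag.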

PowerGREP will now collect all the text between <tag> and </tag> tags in your XML files. If any of the text contains named entities, they will be replaced before the text is collected. The replacements are only made to the text being collected. They’re not made to the original XML files.

You can find this action in the PowerGREP5.pgl standard library as “XML: Collect search matches with named entities replaced”.

The example “replace reserved characters in XML files” in the PowerGREP library shows how you might use the “extra processing” feature for doing the opposite: replacing reserved characters with entities.