File Format Configuration

The file format configuration tells PowerGREP which files it should convert to plain text prior to searching through them. It can also tell PowerGREP to exclude files. If you select a configuration that excludes files, those files immediately lose their gray tick marks in the folders and files tree.

These options only affect those proprietary formats and compound formats that PowerGREP knows about. Files in proprietary formats that PowerGREP cannot convert to plain text are treated as raw binary files.

Examples: Search through Microsoft Word documents, Search and replace through Microsoft Word documents, Search through PDF files, Search through XPS and OXPS files, Search through OpenOffice Writer documents, Search through OpenDocument Format files, Search through spreadsheets, Search and Edit Audio File Meta Data and Search and Edit EXIF and IPTC Image Meta Data

Editing File Format Configurations

Click the (...) button next to the "file formats to convert to plain text" drop-down list on the File Selector panel to edit the file format configurations, or just to see their details.

The list on the left shows the available file format configurations. Select one to see its settings or edit it. You can edit all configurations. You can even delete all the configurations. If you delete them all and do not add your own, PowerGREP restores the configurations that were predefined when you first installed PowerGREP.

If you edit a configuration presently selected on the File Selector panel, those changes take effect immediately. But editing configurations does not change the behavior of previously saved file selections. When you save a file selection, it stores the full details of the selected configurations. When you load a file selection, it continues to use the configuration you saved it with. If you edited that configuration between the time you saved and loaded the file selection, then the configuration loaded with the file selection is marked with a number such as (2) to indicate that its details differ from the configuration with the same name in the Preferences. If you want the loaded file selection to use the edited configuration, then you need to select the edited configuration (without the number in parentheses) on the File Selector panel after loading the existing file selection. If you click the (...) button, both the edited configuration and the loaded configuration are shown in the dialog.

Each configuration has a name that identifies it on the File Selector panel. You can also add comments to explain in which situations you want to use this configuration.

The list on the right shows all the file formats that are configured in the selected configuration. File formats that PowerGREP has built-in support for cannot be moved or deleted. You can completely disable them though, as explained below. The predefined "(unused)" configuration disables them all.

You can add new file formats to the list. This is only useful if you have an external converter or an IFilter that can convert the new file format to plain text. New file formats should be given a name so you can easily identify their settings when editing the file format configuration. Adding a file format only adds it to the configuration you're editing.

To enable a file format, you need to specify one or more file masks that match the files in this format. You can use the full syntax for traditional file masks as explained in the help topic about the File Selector panel. File masks are applied to the full file name, not just the extension. You can even use backslashes in file masks if you want the file mask to be applied to the full path of the files instead of just their names.
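To get a feel for how masks such as *.do[ct] behave, the following Python sketch uses the standard fnmatch module as a rough stand-in. This is an assumption made for illustration only: PowerGREP's own mask syntax is described in the help topic about the File Selector panel and may differ from fnmatch in details. The helper name matches_any and the case-insensitive comparison are also assumptions.

```python
from fnmatch import fnmatchcase


def matches_any(file_name: str, masks: list[str]) -> bool:
    """Return True if file_name matches at least one mask.

    Both sides are lowercased, assuming case-insensitive matching
    as is usual for file names on Windows.
    """
    lowered = file_name.lower()
    return any(fnmatchcase(lowered, mask.lower()) for mask in masks)


# Masks taken from the DOC and DOCX columns of the predefined configurations.
word_masks = ["*.do[ct]", "*.do[ct][xm]"]

print(matches_any("Report.DOCX", word_masks))   # True
print(matches_any("template.dot", word_masks))  # True
print(matches_any("notes.txt", word_masks))     # False
```

The mask is tested against the whole file name, which mirrors the statement above that masks are not limited to the extension.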

PowerGREP can use external applications to convert files to plain text. These need to be command line applications so that no windows pop up when PowerGREP invokes them. Tick the checkbox to use an external application. In the edit box below, enter the full path to the application's executable. Enclose the path in double quotes if it contains spaces. You can use %APPPATH% as a placeholder for PowerGREP's installation folder if you put the external application in the same folder. Enter any command line arguments the application needs after the path to its executable. You can use "%INFILE%" as a placeholder for the full path of the file to be converted. You can use "%OUTFILE%" as a placeholder for the full path of the plain text file that the application should create. Include the quotes around "%INFILE%" and "%OUTFILE%" on the command line. The paths may contain spaces and most command line applications need quotes around paths with spaces.

If the application reads the file to be converted from standard input, tick the option "send original file to standard input". If it writes the plain text conversion to standard output, tick the option "receive converted file from standard output".

You will also need to specify the encoding that the application uses when writing its plain text conversion. This makes sure PowerGREP uses the correct encoding when reading the plain text conversion.

If you are developing this external converter yourself, you should make it terminate with a non-zero exit code to signal error conditions. Terminate with exit code 1 to indicate the file could not be opened. Terminate with exit code 2 to indicate the file is not in the file format that your converter supports. Terminate with exit code 3 if the file is protected with a password. Terminate with exit code 4 to indicate failure without a specific reason. PowerGREP then adds an error for the file that could not be converted to the search results. Without a non-zero exit code, PowerGREP cannot distinguish between a file that contains no text and a file that could not be converted. It will accept the empty conversion as a proper conversion.
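The exit code convention above can be sketched as a small converter script in Python. Everything about the file format here is hypothetical and exists only to make the example runnable: a made-up format that starts with the bytes DEMO, followed by one flag byte (1 meaning password-protected) and UTF-8 text. The script expects the input path and the output path as its two arguments, corresponding to the "%INFILE%" and "%OUTFILE%" placeholders, and writes UTF-8, which is then the encoding you would select for this converter.

```python
import sys

# Exit codes as described in the text above.
EXIT_OK = 0
EXIT_CANNOT_OPEN = 1
EXIT_WRONG_FORMAT = 2
EXIT_PASSWORD = 3
EXIT_FAILURE = 4

MAGIC = b"DEMO"  # hypothetical file signature, for illustration only


def convert(in_path: str, out_path: str) -> int:
    """Convert a hypothetical DEMO file to UTF-8 text; return an exit code."""
    try:
        with open(in_path, "rb") as source:
            data = source.read()
    except OSError:
        return EXIT_CANNOT_OPEN
    if not data.startswith(MAGIC):
        return EXIT_WRONG_FORMAT
    # Hypothetical flag byte right after the signature: 1 = password-protected.
    if data[len(MAGIC):len(MAGIC) + 1] == b"\x01":
        return EXIT_PASSWORD
    try:
        text = data[len(MAGIC) + 1:].decode("utf-8")
        with open(out_path, "w", encoding="utf-8", newline="") as target:
            target.write(text)
    except (UnicodeDecodeError, OSError):
        return EXIT_FAILURE
    return EXIT_OK


def main(args: list[str]) -> int:
    """Expect exactly two arguments: input path and output path."""
    if len(args) != 2:
        return EXIT_FAILURE
    return convert(args[0], args[1])


# Run only when arguments are given, so the file can also be imported or tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Returning a distinct code for each condition, instead of writing an empty output file, is what allows a failed conversion to be told apart from a file that simply contains no text.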

Example: Search through UOT files

The IFilter system is a DLL-based system used by Windows Search. Developers of software that uses proprietary file formats can include an IFilter DLL with their software to enable Windows Search to search through files in that format. Because Windows Search can only search, this system is read-only. Because Windows Search only displays a list of matching files, rather than a list of search matches with context like PowerGREP does, the plain text conversion produced by an IFilter has all text strung together rather than mimicking a page layout. You should only enable the IFilter option if you know you have a reliable IFilter installed for the selected file format. IFilter DLLs are registered by file extension, so the option will only work correctly for files with extensions for which an IFilter is actually registered.

PowerGREP's built-in converters are truly built-in. They do not require any software to be installed, other than PowerGREP itself. Most of these converters are read-only, allowing you to search through the files, but not make replacements. Some converters are read-write. These allow you to search and replace through the plain text conversions and have the replacements written back to the file in its original format.

Examples: Search through Microsoft Word documents, Search and replace through Microsoft Word documents, Search through PDF files, Search through XPS and OXPS files, Search through OpenOffice Writer documents, Search through spreadsheets and Search through mailboxes and email messages

Depending on the replacements you've made, it may not always be possible to correctly write them back to the original format. This mostly happens when you're doing something that's not sensible for the file format you're working with. For example, the plain text conversion of audio and image files consists of labels and values like "Subject: My First Photo". Replacements you make in the value, such as replacing "First" with "1st" in this example, will always be written back correctly. But you can't make arbitrary replacements in the labels. If you were to replace "Subject" with "Topic", PowerGREP doesn't know how to deal with that, because "Topic" is not one of the labels it uses for image metadata.

PowerGREP can deal with replacements it can't convert back into the original format in two ways. The default way is to not save any replacements to the file, adding an error message for that file to the results instead. The other way is to force the file to be saved, which you can choose by ticking the "force files to be saved" checkbox. PowerGREP then makes a best effort to save the replacements, ignoring any replacements that it can't save.

Examples: Search and Edit Audio File Meta Data and Search and Edit EXIF and IPTC Image Meta Data

For PDF files, there is an extra option. The plain text conversion of PDF files can mimic the page layout, or it can show the text in reading order. The difference is most obvious with PDF files that have text in columns. When mimicking the page layout of a two-column page, the plain text conversion also has the text in two columns. In reading order, the plain text conversion puts all the text of the first column before all the text of the second column. Reading order makes it easier to search for text that may span across multiple lines in one column.

Example: Search through PDF files

If the file contains (some) human-readable content even without conversion, then you can enable the option to search through the file's raw contents. If you turn on this option and turn off all others, then you're effectively telling PowerGREP to treat files in this format like plain text files or raw binary files that do not need conversion. This can be useful for file formats like HTML or RTF that PowerGREP can convert to plain text, but that are also searchable in their original form if you're familiar with their structure.

Examples: Search and Edit Audio File Meta Data and Search and Edit EXIF and IPTC Image Meta Data

You can turn on several of the above options at the same time. Any combination is permitted. PowerGREP will try the conversions from top to bottom and will use the first one that succeeds. The "raw contents" option always succeeds. If you don't select that one and all the other selected ones fail, PowerGREP skips the file and adds an error message to the results.
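The "first conversion that succeeds" rule can be illustrated with a short Python sketch. It does not show PowerGREP's actual implementation. The converter functions, their names, and the use of None to signal failure are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Optional

# A converter takes the file's bytes and returns text, or None on failure.
Converter = Callable[[bytes], Optional[str]]


def first_successful(data: bytes,
                     converters: list[tuple[str, Converter]]) -> tuple[str, str]:
    """Try converters from top to bottom; return (name, text) of the first success."""
    for name, convert in converters:
        text = convert(data)
        if text is not None:
            return name, text
    # No enabled option succeeded: the file is skipped and an error is reported.
    raise ValueError("all enabled conversions failed")


def failing_external(data: bytes) -> Optional[str]:
    return None  # stands in for an external application that reports an error


def utf8_builtin(data: bytes) -> Optional[str]:
    try:
        return data.decode("utf-8")  # hypothetical stand-in for a built-in converter
    except UnicodeDecodeError:
        return None


def raw_contents(data: bytes) -> Optional[str]:
    return data.decode("latin-1")  # every byte maps to a character, so this never fails


chain = [("external", failing_external),
         ("built-in", utf8_builtin),
         ("raw contents", raw_contents)]

print(first_successful(b"plain text", chain)[0])  # built-in
print(first_successful(b"\xff\xfe", chain)[0])    # raw contents
```

Leaving "raw contents" out of the chain makes the second call raise an error instead, which corresponds to PowerGREP skipping the file and adding an error message to the results.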

Some file formats can be treated as compound documents. This includes file formats that are technically ZIP files as well as file formats for email messages that may contain attachments. This option cannot be used in combination with the other four options. Compound documents are treated as files when using the settings on the File Selector panel to determine whether the file should be searched through or not. If the file needs to be searched through, PowerGREP searches through all its constituent files. For email formats, it searches through both the email body and all attachments. Compound documents can be expanded in the folders and files tree on the File Selector panel to show and open the constituent files.

Examples: Search through Microsoft Word documents, Search and replace through Microsoft Word documents, Search through OpenDocument Format files and Search through mailboxes and email messages

Finally, the option to always exclude files overrides any other settings on the File Selector panel that may try to include files in this format. Such files never get gray tick marks in the File Selector. Directly marking such a file with a green tick results in an error when attempting to execute the action. This is useful for file formats for which PowerGREP only has read-only converters in configurations that you intend to use with search-and-replace actions. This makes PowerGREP skip files that it can't make replacements in.

Predefined File Format Configurations

The table below shows the details of the file format configurations that are predefined in the default preferences when you first install PowerGREP. The "(unused)" configuration is not shown. This configuration has no file masks for any of the file formats. This means it does not recognize any files as being in any of the file formats that PowerGREP can convert. The "(unused)" configuration is useful for actions like "file and folder name search" that do not search through the contents of files. Since those actions do not convert files anyway, you can select "(unused)" to treat files equally, regardless of their file format. All the other configurations use the same file masks. They only differ in the chosen conversion options.

"Skip", with a red background, means that files matching these file masks are never searched through when using that file format configuration. This overrides any other settings on the File Selector panel that may try to include those files. Directly marking such a file with a green tick results in an error when attempting to execute the action.

"Compound" means that the files will be treated as a compound document. A green background means that you can search and replace through the files inside the compound document. A white background means that you can search but not replace.

"Built-in" means that the files will be converted to plain text using PowerGREP's built-in converter for their format. A green background means that you can search and replace through the plain text conversion and have the replacements written back to the file in its original file format. A white background means that you can only search through the plain text conversion.

"Raw", with a green background, means that the files will be searched through without conversion.

"IFilter", with a white background, means that the files will be converted to plain text using an IFilter. The IFilter system is read-only, so you can only search through the plain text conversion.

Actions that modify files, like search-and-replace, only work with file format configurations that use options with red or green backgrounds for all file formats. If you try to run a search-and-replace with a configuration that uses a read-only converter (indicated with a white background) for at least one file format, then you will get an error telling you to select a different file format configuration. You will get this error even if the search does not actually include any files in the format for which you enabled a read-only converter. The error is triggered before PowerGREP gathers the list of files you want to search-and-replace through.


| File Format | File Masks |
|---|---|
| PDF | *.pdf |
| DOC | *.do[ct] |
| DOCX | *.do[ct][xm] |
| ODT | *.odt, *.sxw |
| WRI | *.wri |
| WP | *.wp, *.wp[d456] |
| XPS | *.xps, *.oxps |
| XLS | *.xls |
| XLSX | *.xls[xm] |
| Quattro | *.wq[12], *.wb[123], *.qpw, *.wkq |
| Lotus | *.wk[s134] |
| Audio | *.mp[1234ag], *.m4[ab], *.flac, *.fl[ac], *.wav, *.wave, *.wv, *.alac, *.aif, *.aif[fc], *.afc, *.og[ag], *.ape, *.mpc, *.ofr, *.opus, *.dsf, *.rf64, *.bwf |
| WMA | *.wm[av], *.asf |
| Exif | *.jpg, *.jpeg, *.tif, *.tiff, *.psd |
| RTF | *.rtf |
| HTML | *.html, *.htm, *.shtml, *.hta |
| MHT | *.mht, *.mhtml |
| EML | *.eml |
| Outlook | *.msg, winmail.dat |
| Corel | *.cd[rt] |
| AceText | *.atc |
| Shell link | *.lnk |
| Zipped | *.epub, *.thmx, *.kmz, *.ae[as], *.hmxz, *.hmskin |
| CHM | *.chm, *.hxs |

| Configuration | PDF | DOC | DOCX | ODT | WRI | WP | XPS | XLS | XLSX | Quattro | Lotus | Audio | WMA | Exif | RTF | HTML | MHT | EML | Outlook | Corel | AceText | Shell link | Zipped | CHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Raw | Raw | Skip | Skip | Skip | Skip | Raw | Skip | Skip | Skip |
| Proprietary formats | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Raw | Raw | Built-in | Built-in | Built-in | Built-in | Raw | Built-in | Compound | Compound |
| Writable proprietary formats | Skip | Skip | Built-in | Built-in | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Built-in | Built-in | Built-in | Raw | Raw | Built-in | Built-in | Skip | Built-in | Raw | Built-in | Compound | Skip |
| All formats | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Compound | Compound |
| All writable formats | Skip | Skip | Built-in | Built-in | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Skip | Built-in | Built-in | Built-in | Compound | Skip |
| Attachments & proprietary formats | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Raw | Raw | Built-in | Compound | Compound | Built-in | Raw | Built-in | Compound | Compound |
| Attachments & writable proprietary formats | Skip | Skip | Built-in | Built-in | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Built-in | Built-in | Built-in | Raw | Raw | Built-in | Compound | Skip | Built-in | Raw | Built-in | Compound | Skip |
| Attachments & all formats | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Compound | Compound | Built-in | Built-in | Built-in | Compound | Compound |
| Attachments & all writable formats | Skip | Skip | Built-in | Built-in | Skip | Skip | Skip | Skip | Skip | Skip | Skip | Built-in | Built-in | Built-in | Built-in | Built-in | Built-in | Compound | Compound | Built-in | Built-in | Built-in | Compound | Skip |
| Compound documents | Skip | Skip | Compound | Compound | Skip | Skip | Compound | Skip | Compound | Skip | Skip | Skip | Skip | Skip | Raw | Raw | Compound | Compound | Compound | Compound | Raw | Skip | Compound | Compound |
| Writable compound documents | Skip | Skip | Compound | Compound | Skip | Skip | Compound | Skip | Compound | Skip | Skip | Skip | Skip | Skip | Raw | Raw | Compound | Compound | Skip | Compound | Raw | Skip | Compound | Skip |
| Compound documents & proprietary formats | Built-in | Built-in | Compound | Compound | Built-in | Built-in | Compound | Built-in | Compound | Built-in | Built-in | Built-in | Built-in | Built-in | Raw | Raw | Compound | Compound | Compound | Compound | Raw | Built-in | Compound | Compound |
| Writable compound documents & proprietary formats | Skip | Skip | Compound | Compound | Skip | Skip | Compound | Skip | Compound | Skip | Skip | Built-in | Built-in | Built-in | Raw | Raw | Compound | Compound | Skip | Compound | Raw | Built-in | Compound | Skip |