Search Through Mailboxes And Email Messages

To search through email with PowerGREP, you need to configure how PowerGREP handles individual email messages and how it handles email folders or mailboxes.

Search Through Email Messages

The configuration you select for “file formats to convert to plain text” on the File Selector panel determines how PowerGREP handles email messages. This applies both to email messages stored in separate files and to email messages stored in mailboxes. Within the configuration, which you can view or edit by clicking the (...) button, the relevant settings are those for the “Email message (MIME or UUencode)” and “Microsoft Outlook message” file formats. MIME is the most common format for storing email messages. It is used for emails saved separately in .eml files and for emails inside MBOX mailboxes and Outlook Express .dbx folders. The Outlook format is used for emails saved separately in .msg files and for emails in Outlook .pst folders.

If you select “Use PowerGREP’s built-in decoder to convert files to plain text” for these formats, then PowerGREP converts email messages to plain text. The plain text conversion consists of the basic email headers Subject, From, Date, and To followed by the body text of the email. PowerGREP converts the body text to plain text even for emails sent purely as HTML or RTF. Email messages appear as files in the folders and files tree on the File Selector panel. Attachments are not accessible and are not searched through. This option is recommended if you know that the text you’re searching for is in the body text of an email rather than in an attachment. Skipping attachments speeds up the search considerably. Default configurations that use this option are “proprietary formats” and “all formats”.
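To make the plain text conversion concrete, here is a minimal Python sketch using the standard library's `email` package. It is not PowerGREP's built-in decoder: the functions `email_to_plain_text` and `html_to_text` are hypothetical, the crude tag stripping only approximates how an HTML body might be turned into text, and RTF bodies are not handled at all. It emits the four basic headers, a blank line, and the body, and ignores attachments.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, discarding tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    # Crude conversion for illustration only: keeps text, drops markup.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def email_to_plain_text(raw: bytes) -> str:
    """Hypothetical sketch: basic headers, a blank line, then the body text."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    lines = [f"{name}: {msg.get(name, '')}" for name in ("Subject", "From", "Date", "To")]
    # Prefer a text/plain body; fall back to text/html with the tags stripped.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = ""
    if body_part is not None:
        content = body_part.get_content()
        if body_part.get_content_subtype() == "plain":
            body = content
        else:
            body = html_to_text(content)
    # Attachments are never visited, mirroring the option described above.
    return "\n".join(lines) + "\n\n" + body
```

Because `get_body` walks into multipart/alternative sections, an email sent purely as HTML still yields readable text, while attachment parts are skipped without being decoded.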

If you select “Search through the individual files inside the compound document” for these formats, then email messages appear as expandable nodes in the folders and files tree. The files inside that node depend on the format of the message and its contents. Default configurations that use this option are “attachments & proprietary formats” and “attachments & all formats”.

Inside single-part MIME messages you’ll see a file body.txt or body.html with the body text of the email and a file headers.txt with the basic email headers. Inside multi-part MIME messages, you’ll see numbered files with extensions corresponding to their content type. There will always be a file 0.txt with the basic headers of the email. The other files hold the various parts of the message, numbered in the order in which they appear in the message. For example, an email sent with the body in both plain text and HTML formats along with one attached image is shown as containing the files 1.txt, 2.html, and 3.png. If the attachment headers indicate file names, those file names are used instead of the numbers.
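The numbering scheme can be illustrated with a short Python sketch built on the standard `email` package. It only approximates the file names PowerGREP shows: the function `list_message_files` and the `EXTENSIONS` map are hypothetical, and a real implementation would need a much larger table of content types.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical content-type-to-extension map; unknown types fall back to ".bin".
EXTENSIONS = {"text/plain": ".txt", "text/html": ".html", "image/png": ".png"}


def list_message_files(raw: bytes) -> list[str]:
    """Sketch of the file names listed inside one MIME message node."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if not msg.is_multipart():
        # Single-part message: a headers file plus one body file.
        body = "body.html" if msg.get_content_type() == "text/html" else "body.txt"
        return ["headers.txt", body]
    names = ["0.txt"]  # the basic headers always come first
    # walk() is depth-first, so leaf parts come out in message order.
    leaves = [part for part in msg.walk() if not part.is_multipart()]
    for number, part in enumerate(leaves, start=1):
        filename = part.get_filename()
        ext = EXTENSIONS.get(part.get_content_type(), ".bin")
        # A file name from the part's headers replaces the number.
        names.append(filename if filename else f"{number}{ext}")
    return names
```

For a message with a plain text body, an HTML alternative, and an unnamed PNG attachment, this returns `["0.txt", "1.txt", "2.html", "3.png"]`, matching the example above; giving the image a file name replaces `"3.png"` with that name.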

For UUencode messages, you’ll see a file body.txt with the body text and a file headers.txt with the basic email headers. Attachments appear as additional files with their file names.

For Outlook messages, you’ll usually see two files, body.txt and body.rtf, with the body text of the email in the plain text and rich text formats saved by Outlook. Attachments appear as additional files with their file names. There will also be a file headers.txt with the basic email headers.

To decide whether “attachments & proprietary formats” or “attachments & all formats” is the better choice, check out the example about searching through RTF and HTML as plain text. You’ll likely prefer “attachments & all formats” when dealing with HTML email so the HTML tags don’t get in the way. Since RTF is only used by Outlook emails and Outlook normally saves its own plain text conversion, you may even want to edit this configuration and select “always exclude files of this type” for “Rich Text Format”. Then the body.rtf inside Outlook messages is always skipped, but body.txt is still searched through.

Search Through Mailboxes

The configuration you select for “archive formats to search inside” on the File Selector panel determines how PowerGREP handles mailboxes. Within the configuration, which you can view or edit by clicking the (...) button, the relevant settings are those for the “MBOX mailboxes”, “Outlook folders (PST and OST)”, and “Outlook Express folders (DBX)” archive formats. The MBOX format is used by most email software on UNIX/Linux and also by many Windows email clients. Microsoft Outlook and Outlook Express have their own mailbox formats. MBOX and DBX files store messages in MIME format. PST files store messages in Outlook’s own format. OST files are Outlook’s offline storage files; Outlook 2013 and later also use them for IMAP accounts.

Since the MBOX format has its roots in the UNIX world, MBOX files are often saved without an extension. PowerGREP’s default archive configurations treat files named “INBOX” or “Sent” without an extension as MBOX files. You may need to edit the file masks for the MBOX format to make sure PowerGREP correctly recognizes your mailboxes.
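File masks of this kind behave like shell wildcards. The following Python sketch uses `fnmatch` to show how extensionless names such as INBOX and Sent can be matched. The `MBOX_MASKS` tuple and the `is_mbox` function are hypothetical; only the INBOX and Sent entries come from the text above, and this is not PowerGREP's own mask syntax or matching code.

```python
from fnmatch import fnmatchcase

# Hypothetical mask list: "INBOX" and "Sent" come from the default
# configuration described above; "*.mbox" and "*.mbx" are assumptions.
MBOX_MASKS: tuple[str, ...] = ("*.mbox", "*.mbx", "INBOX", "Sent")


def is_mbox(filename: str, masks: tuple[str, ...] = MBOX_MASKS) -> bool:
    """Return True if the file name matches any MBOX file mask."""
    # Lower-case both sides because Windows file names are case-insensitive.
    name = filename.lower()
    return any(fnmatchcase(name, mask.lower()) for mask in masks)
```

A mailbox named Drafts, for example, is not recognized until a mask is added for it: `is_mbox("Drafts")` returns `False`, while `is_mbox("Drafts", MBOX_MASKS + ("Drafts",))` returns `True`.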

If you clear the checkbox “search through files inside archives of this format” then all files matching that format’s file masks are excluded from the action. If you tick the checkbox, those files appear as folders in the folders and files tree in the File Selector.

For MBOX and DBX files, those folders contain numbered files starting with 1.eml, where each file is one email message, numbered in the order in which the messages appear in the mailbox. Depending on the file format configuration described in the previous section, these .eml files either hold the plain text conversion of each email or act as compound documents containing the email body and attachments.
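Python's standard `mailbox` module can read MBOX files, which makes it easy to sketch how a mailbox maps onto numbered .eml entries. The function `list_mbox_messages` is hypothetical and only mirrors the naming described above; it does not reproduce how PowerGREP parses mailboxes.

```python
import mailbox


def list_mbox_messages(path: str) -> list[tuple[str, str]]:
    """Sketch: one numbered .eml entry per message, in mailbox order."""
    box = mailbox.mbox(path, create=False)
    try:
        entries = []
        # mbox keys are assigned in file order, so sorting them preserves
        # the order in which the messages appear in the mailbox.
        for number, key in enumerate(sorted(box.keys()), start=1):
            msg = box[key]
            entries.append((f"{number}.eml", msg.get("Subject", "")))
        return entries
    finally:
        box.close()
```

Pointing it at a mailbox file named INBOX that holds two messages with the subjects First and Second yields `[("1.eml", "First"), ("2.eml", "Second")]`.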

For PST and OST files, those folder nodes contain Outlook folders. Inside the Outlook folders you’ll see numbered files starting with 1.txt when the file format configuration is set to convert Outlook messages to plain text, but numbered folders starting with 1 when it is set to treat Outlook messages as compound documents. Messages are shown as folders rather than as .msg files because a PST file is an email database rather than a collection of separate MSG files.

Default configurations that search inside all mailbox formats are “mailboxes only”, “mailboxes and zip archives”, and “mailboxes and all archives”. If the file format configuration is set to search through email attachments, then the archive configuration determines whether PowerGREP searches inside attachments that are zip archives or other archives.
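When attachments are searched and the archive configuration includes zip archives, a zip attachment is opened like any other zip file. The Python sketch below shows the idea with the standard `email` and `zipfile` modules. `zip_attachment_members` is a hypothetical helper: it only looks at top-level attachments and lists the member names without searching them.

```python
import io
import zipfile
from email import policy
from email.parser import BytesParser


def zip_attachment_members(raw: bytes) -> dict[str, list[str]]:
    """Sketch: list the files inside attachments that are zip archives."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    found: dict[str, list[str]] = {}
    for number, part in enumerate(msg.iter_attachments(), start=1):
        data = part.get_content()
        # Text attachments decode to str; only raw bytes can be a zip archive.
        if not isinstance(data, bytes):
            continue
        buffer = io.BytesIO(data)
        if zipfile.is_zipfile(buffer):
            with zipfile.ZipFile(buffer) as archive:
                label = part.get_filename() or f"attachment-{number}"
                found[label] = archive.namelist()
    return found
```

Attachments that are not zip archives are simply left out of the result, much as an archive configuration without zip support would leave a zip attachment unopened.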

By default, Outlook saves its PST files in the AppData (or Application Data) folder in your Windows user profile. This folder is normally hidden, and PowerGREP does not show hidden folders by default. If you want to search through PST files inside the AppData folder or any other hidden folder, set “hide files and folders” on the File Selector panel to a configuration that does not hide hidden files and folders. Of the predefined configurations, you can select any configuration that does not have “hidden” in its name.

If you want to restrict the search to certain mailboxes based on their file names, you need to use the “include folders” and “exclude folders” boxes on the File Selector panel. PowerGREP treats mailbox files as folders when searching inside them.

File selections for searching through email are available in the PowerGREP5.pgl library as “Email: Search through email body text” and “Email: Search through email body text and attachments”.