With PowerGREP, you can quickly search for a piece of information through the files and folders on your computer, including Microsoft Word documents saved as DOC, DOT and DOCX files. Simply type a keyword or phrase into the Search box, select the folders PowerGREP should search, and choose the types of files you are interested in. When you click the Search button, PowerGREP presents a list of MS Word documents and other files in which the text you entered was found. The list shows one line of context for each match. You can instantly inspect the entire context by double-clicking the match in the results.
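To illustrate the idea of a keyword search that reports one line of context per match, here is a minimal Python sketch. It is not how PowerGREP works internally. The function name `search_folder` and its `extensions` parameter are hypothetical. The sketch only reads plain-text files, so it does not reproduce PowerGREP's support for Word documents.

```python
import os


def search_folder(root: str, keyword: str, extensions: tuple[str, ...] = (".txt",)):
    """Yield (path, line_number, line) for every line that contains keyword.

    Hypothetical helper: only plain-text files whose names end in one of
    `extensions` are read; binary formats such as DOC are not decoded here.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue  # skip file types we are not interested in
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if keyword in line:
                        # One line of context per match, like a results list.
                        yield path, number, line.rstrip("\n")
```

Each result carries the file path and line number, which is enough to jump back to the full context of the match.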
When you do not know in advance exactly what you are looking for, PowerGREP's rich regular expression support lets you search for virtually anything: you specify the form of the text you want, and PowerGREP finds the actual text that matches that form.
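As an illustration of "specifying the form" rather than the exact text, the Python sketch below describes what a date such as "21 February 2008" looks like and finds every piece of text with that shape. It uses Python's `re` module. PowerGREP has its own regular expression flavor, although simple patterns like this one are written the same way in most flavors. The names `DATE_FORM` and `find_dates` are hypothetical.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# The form of a date such as "21 February 2008": a one- or two-digit day,
# a month name, and a four-digit year, each separated by a single space.
DATE_FORM = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def find_dates(text: str) -> list[str]:
    """Return every substring of text that has the form of a date."""
    return DATE_FORM.findall(text)


print(find_dates("Sent 21 February 2008, revised 3 March 2009."))
# ['21 February 2008', '3 March 2009']
```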
With PowerGREP's collect data feature, you can extract data from MS Word documents and other files and automatically save the extracted data into one or more new text files. You can group identical matches together and count them, producing informative statistics.
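The grouping and counting step can be pictured with a short Python sketch. It extracts matches from several texts, groups identical matches, counts them, and saves the statistics to a new text file. The e-mail pattern is a deliberately simple, hypothetical example. The helpers `collect_matches` and `save_report` are also hypothetical and do not reflect PowerGREP's own output format.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple pattern for e-mail-like strings.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def collect_matches(texts, pattern=EMAIL) -> Counter:
    """Group identical matches across all texts and count each one."""
    counts = Counter()
    for text in texts:
        counts.update(pattern.findall(text))
    return counts


def save_report(counts: Counter, path: str) -> None:
    """Write one 'count<TAB>match' line per distinct match, most frequent first."""
    with open(path, "w", encoding="utf-8") as f:
        for match, count in counts.most_common():
            f.write(f"{count}\t{match}\n")
```

Counting with a `Counter` keeps only one entry per distinct match. That makes the saved file a compact summary rather than a raw list of every occurrence.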
How PowerGREP Handles DOC Files
Prior to Office 2007, Microsoft Word used a proprietary file format to store its documents. These files have a .doc or .dot extension. The file format is not officially documented. Because it is also very complex, most applications that can handle Word documents rely on Microsoft Word itself.
Since PowerGREP needs to search through a large number of files in a reasonable amount of time, running Microsoft Word in the background is not an option. Instead, PowerGREP opens the DOC file directly, without the help of MS Word, and extracts the text from it. Text formatting, images and other special content are ignored; the extracted text is what PowerGREP searches through. When you inspect search matches inside PowerGREP, the built-in file editor shows this plain text version. You can always use the Edit button's drop-down menu to open the file in Microsoft Word.
PowerGREP is not able to modify DOC files. You cannot search-and-replace through DOC files using PowerGREP. If you want to modify Word documents with PowerGREP, consider switching to the DOCX format.
How PowerGREP Handles DOCX Files
Starting with Office 2007, Microsoft Word uses a new file format called Office Open XML. These files have a .docx extension. This format is officially documented, and even standardized. Still, it is incredibly complex, simply because Word offers a tremendous number of features.
DOCX files are completely different from DOC files. Whereas DOC files are opaque binary files, DOCX files are technically ZIP archives that contain a collection of XML files along with support files such as images. PowerGREP treats ordinary ZIP archives as compressed folders, transparently unzipping and zipping the files inside them, but it treats DOCX files as single document files. In practice, PowerGREP handles DOCX files the way you would expect when you mark files to be searched through and when you tell it to copy or move files: the whole DOCX file is moved, rather than the XML files inside it.
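Because a DOCX file is an ordinary ZIP archive, any ZIP library can look inside it. The Python sketch below uses the standard `zipfile` module to list the parts of a DOCX file and read its main XML part. The helper names are hypothetical. The sketch assumes the conventional location `word/document.xml`; strictly speaking, the package's relationship file (`_rels/.rels`) records where the main document part lives.

```python
import zipfile


def list_docx_parts(path: str) -> list[str]:
    """Return the names of the files stored inside a DOCX (ZIP) container."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def read_main_document_xml(path: str) -> str:
    """Return the raw XML of the main document part.

    Assumes the conventional location word/document.xml rather than
    resolving it through the package relationships.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.read("word/document.xml").decode("utf-8")
```

Calling `list_docx_parts("report.docx")` on a real document typically shows entries such as `[Content_Types].xml` and `word/document.xml` alongside any embedded images.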
By default, PowerGREP uses the IFilter included with Office 2007 and later to extract the text from DOCX files, much as PowerGREP extracts the text from DOC files on its own. You'll see a plain text representation of your DOCX files in PowerGREP. When using the IFilter, PowerGREP cannot modify DOCX files, because Microsoft's IFilter system is read-only.
If you disable the IFilter, or if you don't have Office 2007 or later installed, PowerGREP searches through the raw XML content of DOCX files. In this mode, PowerGREP can also search-and-replace through DOCX files. Though all those XML tags may seem to get in your way at first, that extra bit of complexity actually opens up a whole world of possibilities. The XML tags represent your document's formatting. By searching for, removing and inserting XML tags, you can search for and alter the formatting of your document. To find out which tags to use, simply create a few test documents in Word using the formatting you want. Navigate inside the DOCX files in PowerGREP's File Selector, right-click the document.xml file, and select Edit. PowerGREP's editor will show you the raw XML code that PowerGREP searches through.
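As an example of searching for formatting through the raw XML, the Python sketch below finds text that Word has marked bold. In WordprocessingML, a run of text (`<w:r>`) can carry run properties (`<w:rPr>`), and a bare `<w:b/>` element there switches bold on. The pattern is a simplified assumption. It does not handle bold applied through styles, `<w:b w:val="0"/>` (which turns bold off), or text split across several runs. The name `find_bold_text` is hypothetical.

```python
import re

# Simplified, hypothetical pattern: a run whose properties contain a bare
# <w:b/> element, followed directly by the run's text in <w:t>.
BOLD_RUN = re.compile(
    r"<w:r(?:\s[^>]*)?>\s*"
    r"<w:rPr>(?:(?!</w:rPr>).)*?<w:b/>(?:(?!</w:rPr>).)*?</w:rPr>\s*"
    r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>",
    re.DOTALL,
)


def find_bold_text(document_xml: str) -> list[str]:
    """Return the text of every run whose properties include <w:b/>."""
    return BOLD_RUN.findall(document_xml)


sample = ('<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Important</w:t></w:r>'
          '<w:r><w:t xml:space="preserve"> plain </w:t></w:r>'
          '<w:r w:rsidR="00A1"><w:rPr><w:i/><w:b/></w:rPr><w:t>also bold</w:t></w:r></w:p>')
print(find_bold_text(sample))  # ['Important', 'also bold']
```

Removing or inserting the `<w:b/>` element in the same place is the corresponding way to change that formatting with a search-and-replace.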
(Screenshot: Searching through MS Word documents with PowerGREP)
"I just want to tell you that you have a great program there. It turned a job that could have taken hours into a few minutes. The Microsoft OneCare backup system creates ZIP files, hundreds of them. We use an external hard drive and OneCare backs up the files over our network. My daughter is a college student and her computer died and she's been on the National Honor list for the past five years because of the meticulous notes she takes and saves in Word files. She needed all her notes for one of the topics for a final exam and by using PowerGREP we were able to search the backup drive and get her files immediately to load on another computer for her to continue her intensive studies."
Paul Mayer, 21 February 2008, Illinois, USA