Search through Text Files to Find (and Replace) Text, Keywords or Phrases
Which Files Are Text Files
A text file is a computer file that is stored entirely as text in a human-readable form. A text document or .txt file created with a plain text editor such as Notepad or EditPad is stored that way. On the other hand, a document created with Microsoft Word is stored in a proprietary binary format. If you open an MS Word document in a plain text editor, you will see a whole bunch of gobbledygook interspersed with the text that you actually typed into Word.
Files such as HTML files (i.e. web pages), XML files and RTF files (Rich Text Format) are all plain text files. If you open an HTML file in Internet Explorer, you will see a nicely rendered web page, just like the page you are reading now. If you open the same HTML file in a plain text editor such as Notepad, you will see the text of the web page along with the HTML tags that provide the formatting.
Although a plain text editor cannot render an HTML page as Internet Explorer does, it can accurately represent the contents of the HTML file. All the formatting and special controls are shown as textual tags, rather than as a bunch of weird characters that you cannot make sense of. Though you may prefer a visual HTML editor such as FrontPage or Dreamweaver, it is perfectly possible to create a complete web site in a plain text editor. Incidentally, this web site was created entirely with EditPad Pro.
The key advantage of plain text files is that they can be easily handled by a wide range of software. The software does not need to contain any special logic to decode the file.
Screenshot: searching through text files with PowerGREP to find email addresses inside HTML anchors
Search (and Replace) through Text Files with PowerGREP
To find a piece of text inside a series of text files, simply type in the text you are looking for on the Search page in PowerGREP. You will quickly get a list of all occurrences. PowerGREP treats all the content in text files equally. If you search for the word "table" through the HTML files of your web site, PowerGREP will list both the word "table" wherever it appears in the text of your web site and the <TABLE> tags that you (or your visual HTML editor) used to place tables on your web pages.
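To see why a plain text search finds both the word and the tag, here is a minimal Python sketch of the same idea. It is not how PowerGREP works internally. The function name find_occurrences and the sample HTML are made up for illustration, and the sketch assumes a case-insensitive search.

```python
import re

def find_occurrences(text, needle):
    """Return (line_number, line) pairs for every line that contains needle.

    The match ignores case, which is an assumption made for this sketch.
    """
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]

# Hypothetical page content: one table tag pair and one sentence using the word.
sample_html = """<TABLE>
<TR><TD>Put the table in the kitchen.</TD></TR>
</TABLE>
<P>No match here.</P>"""

matches = find_occurrences(sample_html, "table")
for number, line in matches:
    print(number, line)
```

The search reports lines 1, 2 and 3: the opening tag, the sentence, and the closing tag. To the search, a tag is just more text.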
This means that with a search-and-replace in PowerGREP, you can modify not only the content of your web site, but also its formatting and structure. On a web page, <B>text</B> is bold text, and <I>text</I> is italic text. You can make all bold text italic by searching for the regular expression <B>(.*?)</B> and replacing it with <I>\1</I>. Don't worry if you do not have any experience with regular expressions. The documentation that comes with PowerGREP includes a detailed tutorial on regular expressions.
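If you would like to try the bold-to-italic replacement outside PowerGREP, the same regular expression and replacement text work with Python's re module. This is only a sketch on a made-up sample string. Note that in Python the dot does not match line breaks by default, so a bold run that spans several lines would need the re.DOTALL flag.

```python
import re

# Hypothetical snippet with two separate bold runs.
html = "<P>This is <B>bold</B> and this is <B>also bold</B>.</P>"

# Lazy quantifier: (.*?) stops at the first </B> it can reach.
lazy = re.compile(r"<B>(.*?)</B>")
converted = lazy.sub(r"<I>\1</I>", html)
print(converted)

# Greedy quantifier for comparison: (.*) runs on to the last </B> on the line.
greedy = re.compile(r"<B>(.*)</B>")
overreached = greedy.sub(r"<I>\1</I>", html)
print(overreached)
```

The lazy version converts each bold run separately. The greedy version treats everything from the first <B> to the last </B> as one match and leaves the inner tags untouched, which is why the expression uses .*? rather than .* here.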
Line Breaks and Code Pages
Not all text files are alike, though. Computers deal with numbers, not with characters. When you save a text file, each character is mapped to a number, and the numbers are stored on disk. Different character mappings or code pages are used for different languages and scripts. Since different computer manufacturers had different ideas about how to create character mappings, there's a wide variety of legacy character mappings.
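A short Python sketch makes the mapping concrete. It encodes the single character é with a few of the codecs that ship with Python and prints the resulting bytes in hexadecimal. The choice of character and code pages is only an illustration.

```python
text = "é"

# One character, four mappings: a Windows code page, an MS-DOS code page,
# a classic Mac code page, and Unicode as UTF-8.
code_pages = ("cp1252", "cp437", "mac_roman", "utf-8")

encoded = {name: text.encode(name).hex() for name in code_pages}
for name, hex_bytes in encoded.items():
    print(f"{name:10} {hex_bytes}")
```

The same character ends up as e9, 82, 8e or the two bytes c3 a9, depending on the mapping. A program that reads the file has to know which mapping was used when the file was saved.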
While most Windows grep and search tools only support text files saved with a Windows code page or Unicode, PowerGREP supports a variety of character sets, including Unicode (UTF-8, UTF-16 and UTF-32), all Windows code pages, all ISO-8859 character sets (used by Linux), most legacy MS-DOS, PC DOS, and classic Mac OS code pages, EBCDIC (used by IBM mainframes), KOI8 (popular in Russia and CIS countries), and a variety of other specialized code pages. PowerGREP can search through files using any of these encodings, and it can save search results in those encodings as well.
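The sketch below shows why encoding support matters for searching. It is a simplified Python illustration, not PowerGREP's implementation. The helper name contains_text is hypothetical, and cp037 is just one of several EBCDIC code pages, chosen here because Python includes it.

```python
def contains_text(data: bytes, needle: str, encoding: str) -> bool:
    """Decode raw file bytes with the given encoding, then search the text.

    Bytes that are invalid in the chosen encoding are replaced rather than
    raising an error, so a wrong guess simply fails to match.
    """
    return needle in data.decode(encoding, errors="replace")

# The word "grep" as an IBM mainframe would store it (EBCDIC, code page 037).
ebcdic_bytes = "grep".encode("cp037")

found_as_ebcdic = contains_text(ebcdic_bytes, "grep", "cp037")
found_as_windows = contains_text(ebcdic_bytes, "grep", "cp1252")

print(found_as_ebcdic)   # search with the encoding the file was saved in
print(found_as_windows)  # search with a Windows code page instead
```

With the correct code page the word is found. With the Windows code page the same bytes decode to unrelated symbols, so the search comes up empty even though the file contains the word.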
See PowerGREP in Action
There are four ways to see PowerGREP in action:
Read more about PowerGREP's features and benefits.