A plain text file is a computer file that is stored entirely as text in a human-readable form. A text document or .txt file created with a plain text editor such as Notepad or EditPad is stored in that way. On the other hand, a document created with Microsoft Word is stored in a binary format. If you open a DOCX file in a plain text editor, you will see only garbage.
Line Breaks and Code Pages
Not all text files are alike, though. Computers deal with numbers, not with characters. When you save a text file, each character is mapped to a number, and the numbers are stored on disk. Different character mappings or code pages are used for different languages and scripts. Since different computer manufacturers had different ideas about how to create character mappings, there's a wide variety of legacy character mappings.
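To make the idea of a character mapping concrete, here is a minimal Python sketch. It uses Python's built-in codecs as stand-ins for code pages and has nothing to do with how PowerGREP works internally. The helper names byte_values and readings are made up for this illustration.

```python
import unicodedata

# Illustration only: Python's built-in codecs stand in for "code pages".
CODE_PAGES = ("cp1252", "cp437", "mac_roman", "utf-8")

def byte_values(character: str) -> dict:
    """Return the hex bytes that each code page stores for one character."""
    return {name: character.encode(name).hex() for name in CODE_PAGES}

def readings(raw: bytes) -> dict:
    """Decode the same bytes under each single-byte code page and
    report the Unicode name of the resulting character."""
    return {name: unicodedata.name(raw.decode(name)) for name in CODE_PAGES[:3]}

# One character (e with acute accent), four different byte sequences on disk.
print(byte_values("\u00e9"))

# One byte on disk, three different characters depending on the code page.
print(readings(b"\xe9"))
```

The first call shows why a file must be read with the same mapping it was written with. The second shows what goes wrong otherwise: the byte is valid in every code page, so nothing fails, but the text silently changes.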
While most Windows grep and search tools only support text files saved with a Windows code page or Unicode, PowerGREP supports all character sets that have or ever had any importance, including Unicode (UTF-8, UTF-16 and UTF-32), all Windows code pages, all ISO-8859 character sets (often used on Linux in the past), most legacy MS-DOS, PC DOS, and classic Mac OS code pages, EBCDIC (used by IBM mainframes), KOI8 (popular in Russia and CIS countries), the many Vietnamese encodings, and a variety of other specialized code pages. PowerGREP can read and write all these encodings. So you can search through, make replacements in, collect matches from, and write results to files in any encoding that your other software may be using or expecting.
PowerGREP can automatically detect encodings in a variety of ways. This includes Unicode signatures or byte order marks, HTML meta tags, XML declarations, and even UTF-8 and UTF-16 byte patterns. If you work with many different encodings, you can use the setting "text encoding to read files with" on the File Selector panel to tell PowerGREP exactly which encodings it should use for exactly which files.
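As a rough idea of what signature-based detection involves, the following Python sketch checks for a byte order mark and then falls back on a UTF-8 validity test. It is a simplified illustration, not PowerGREP's detection logic. The function name sniff_encoding and the cp1252 fallback are assumptions chosen for the example, and the sketch ignores HTML meta tags, XML declarations, and UTF-16 byte patterns.

```python
import codecs

# Longer signatures come first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding from a byte order mark, then from UTF-8 validity.

    The fallback code page is an assumption; a real tool would let the
    user choose it per file.
    """
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    try:
        data.decode("utf-8")  # bytes that form valid UTF-8 are very likely UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return fallback

print(sniff_encoding(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
print(sniff_encoding(b"caf\xe9"))  # not valid UTF-8, so the fallback applies
```

The function only names the encoding. It does not strip the byte order mark from the data, so a caller that decodes the bytes afterwards has to skip the signature itself.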
Inconsistent line break handling is also a problem with many grep tools. Windows text files normally use a CRLF pair for line breaks. But UNIX and Linux use a single LF and classic Mac used a single CR to end lines. This causes many Windows applications to display text from Linux files all on one line. On top of that, Unicode introduced additional line break characters. With PowerGREP, you don't need to worry about line breaks. PowerGREP transparently handles all line break styles, even when mixed together in one file. PowerGREP's regex flavor is also smart about line breaks. Anchors that match at line breaks recognize all line breaks, and treat CRLF pairs as indivisible. Literal line breaks match a line break in any style. And matches can span across lines if you want them to.
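The idea of treating a CRLF pair as indivisible while still recognizing every other line break can be sketched with a Python regular expression. This uses Python's re module, not PowerGREP's regex flavor, and the set of Unicode line breaks shown (NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR) is an assumption about which characters matter for the example.

```python
import re

# CRLF is listed first so the pair is consumed as one line break
# instead of being split into a CR break followed by an LF break.
LINE_BREAK = re.compile(r"\r\n|[\r\n\u0085\u2028\u2029]")

def split_lines(text: str) -> list:
    """Split text on any line break style, even when styles are mixed."""
    return LINE_BREAK.split(text)

mixed = "windows\r\nunix\nclassic mac\runicode\u2028end"
print(split_lines(mixed))
# ['windows', 'unix', 'classic mac', 'unicode', 'end']
```

If the alternation listed the single characters before the pair, every Windows line break would produce an extra empty line, which is the kind of inconsistency described above.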
Convert Between Encodings and Line Break Styles
Other software that you use may not be as flexible as PowerGREP. If you have plain text files that use an encoding or line break style not supported by the application you want to use them with, use PowerGREP to convert or translate your plain text files from one encoding and/or line break style to another. To do so, simply run a search with "action type" set to "list files". You don't need to enter a search term if you want to convert all files you've included in the search. Then set "target file creation" to "convert matched files to text" or to "convert copies of matched files to text". Then choose the text encoding and/or line break style that the converted files should use.
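For readers who want to see what such a conversion amounts to at the byte level, here is a small Python sketch. It is only an illustration of the operation, not a description of what PowerGREP does internally. The function name convert_text and its parameters are hypothetical, and the sketch handles only the CRLF, CR, and LF styles.

```python
import re

def convert_text(data: bytes, source_encoding: str, target_encoding: str,
                 line_break: str = "\r\n") -> bytes:
    """Re-encode text and rewrite every line break in one target style."""
    text = data.decode(source_encoding)
    # Match CRLF before a lone CR or LF so a pair is replaced only once.
    text = re.sub(r"\r\n|\r|\n", lambda _match: line_break, text)
    return text.encode(target_encoding)

# Windows-1252 bytes with mixed line breaks become UTF-8 with CRLF throughout.
legacy = b"caf\xe9\nbar\r\nbaz\r"
print(convert_text(legacy, "cp1252", "utf-8", "\r\n"))
```

Decoding first and encoding last keeps the line break replacement independent of the encodings involved, which matters for encodings such as UTF-16 where a line break occupies more than one byte.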