Split Web Logs by Date

Software that generates log files often dumps everything into a single log file. As the log file grows in size it becomes difficult to work with. Using PowerGREP you can easily split the log into multiple files, such as one file per day.

For this example we’ll split an Apache web log which stores one log entry per line, with each entry storing the date formatted like 25/Apr/2010. The Merge Web Logs by Date example does the opposite.

  1. Select the log files you want to split in the File Selector.
  2. Start with a fresh action.
  3. Set the action type to “split files”.
  4. In the “file sectioning” list, select “line by line, including line breaks”. Each line in the file is one log entry.
  5. Turn on the option “split whole sections”. This makes sure lines will be extracted as a whole into the target files.
  6. In the Search box, enter the regular expression (?'day'[0-9]{1,2}+)/(?'month'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/(?'year'[0-9]{4}). Only lines that match this regex are written to a target file.
  7. In the Target File box, enter something along the lines of c:\logs\web logs ${day} ${month} ${year}.txt.bz2 to build a path using replacement text syntax that includes the date from the regex match. Lines with the same target file (after substituting backreferences) are written to the same file. Lines with different target files are written to different files. We’ve added a .bz2 extension to the target file name to make PowerGREP automatically compress the file.
  8. Set “between collected text” to “nothing”. Since we’re collecting whole lines including line breaks, there’s no need to add more delimiters.
  9. Set the backup file options as you like them.
  10. Click the Quick Split button to split the file. Use this button instead of Split Files; otherwise PowerGREP will waste a lot of time and memory displaying your entire log files on the Results panel.
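
For readers who want to see the logic of the steps above outside PowerGREP, here is a minimal Python sketch. It only illustrates the idea and is not how PowerGREP implements the action. The function names group_lines_by_date and split_log are hypothetical, the regex is rewritten in Python's (?P<name>...) named-group syntax without the possessive quantifier, and the target folder is passed as a parameter instead of being fixed to c:\logs.

```python
import bz2
import re
from collections import defaultdict
from pathlib import Path

# Same date pattern as in step 6, using Python's named-group syntax.
DATE_RE = re.compile(
    r"(?P<day>[0-9]{1,2})/"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/"
    r"(?P<year>[0-9]{4})"
)


def group_lines_by_date(lines):
    """Map each target file name to the whole lines that belong in it."""
    groups = defaultdict(list)
    for line in lines:
        match = DATE_RE.search(line)
        if match is None:
            continue  # lines without a date are not written to any target
        # Build the target name from the named groups, as in step 7.
        name = "web logs {day} {month} {year}.txt.bz2".format(**match.groupdict())
        groups[name].append(line)
    return dict(groups)


def split_log(source, target_dir):
    """Split one log file into bz2-compressed files, one per date (hypothetical helper)."""
    with open(source, encoding="utf-8", errors="replace") as handle:
        groups = group_lines_by_date(handle)
    # Targets are written only after the whole source has been read.
    for name, lines in groups.items():
        with bz2.open(Path(target_dir) / name, "wt", encoding="utf-8") as out:
            out.writelines(lines)  # lines keep their own line breaks
```

Calling split_log("access.log", "logs") would read a hypothetical access.log and create one compressed file per date in an existing logs folder. Lines are kept whole, including their line breaks, so nothing is added between them.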

Splitting files does not delete the original files. It may overwrite an original file if the Target File for one or more search matches is a file that is also being searched through. Because PowerGREP does not write the final target files until the action has completed, overwriting source files won’t alter the search matches.

This action is available in the PowerGREP5.pgl library as “Logs: Split Apache web logs”.

Recombining Log Files

The above example can also be used to recombine log files. Suppose your application writes log files to a certain size. It might write up to 100,000 entries in a single log file, and then start with a new file. Doing so keeps log file sizes manageable, but you’ll end up with entries from multiple days in the same file, and entries from the same day in multiple files.

To recombine the logs so you’ll have one file for the log entries of one day, simply mark all the files with your logs in the File Selector. Then execute the action described above. If log entries from different files result in the same target file, they’ll be merged into that target file, even though you’re executing a “split files” action. The key difference between “split files” and “merge files” actions is that “split files” calculates the target file for each search match, while “merge files” calculates the target file for each file searched through.

The “order of collected matches” drop-down list determines the order in which matches (log entries in this case) from different files are written when a “split files” action calculates the same target file path for matches from multiple files. This is important if you want the log entries in the combined file to have the same order as in the original files. If sorting the original log files alphabetically by name puts their log entries in the correct order, choose “sort files Alphabetically A..Z”. If the time stamp on the log files puts the files in the correct order (e.g. the time stamp on each log file is the time the last entry was written), choose “oldest file to newest file”.
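
The effect of the file order on the combined output can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not PowerGREP's implementation: order_files and recombine are hypothetical names, alphabetical sorting is assumed to be case-insensitive on the file name, and “oldest file to newest file” is assumed to mean sorting by last-modified time.

```python
import re
from collections import defaultdict
from pathlib import Path

DATE_RE = re.compile(
    r"(?P<day>[0-9]{1,2})/"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/"
    r"(?P<year>[0-9]{4})"
)


def order_files(paths, by="name"):
    """Return the files in the order their entries should be collected."""
    paths = [Path(p) for p in paths]
    if by == "name":
        # Assumption: alphabetical means case-insensitive by file name.
        return sorted(paths, key=lambda p: p.name.lower())
    if by == "mtime":
        # Assumption: oldest to newest by last-modified time stamp.
        return sorted(paths, key=lambda p: p.stat().st_mtime)
    raise ValueError("by must be 'name' or 'mtime'")


def recombine(paths, by="name"):
    """Collect entries from several files into one list of lines per date."""
    groups = defaultdict(list)
    for path in order_files(paths, by):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                match = DATE_RE.search(line)
                if match:
                    key = "{day} {month} {year}".format(**match.groupdict())
                    groups[key].append(line)
    return dict(groups)
```

Within one file the entries keep their original order; only the order in which the files are visited changes. If the file names and the time stamps disagree about which file comes first, the two settings produce differently ordered combined files.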