Merge Web Logs by Date

Software that generates log files is often configured to start a new log file at regular intervals, such as one log file per day. This keeps file sizes small, but results in a large number of log files. If the individual log files are small, it may be more convenient to combine them into a smaller set of files.

For this example we’ll merge Apache web logs, which store one log entry per line, with each entry containing a date formatted like 25/Apr/2010. The example assumes that each file holds the log entries for one day, and combines these files into one file per month. The Split Web Logs by Date example does the opposite.
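As a rough illustration of the grouping, outside PowerGREP, here is a minimal Python sketch. It assumes each daily file has already been reduced to one date string in the day/Mon/year format; the dates themselves are hypothetical. Every date with the same month and year lands in the same monthly bucket.

```python
from collections import defaultdict

# Hypothetical dates, one per daily log file, in Apache's day/Mon/year format.
daily_dates = ["29/Apr/2010", "30/Apr/2010", "01/May/2010", "02/May/2010"]

# One bucket per (month, year): each bucket corresponds to one monthly target file.
monthly = defaultdict(list)
for date in daily_dates:
    day, month, year = date.split("/")
    monthly[(month, year)].append(day)

print(dict(monthly))
# {('Apr', '2010'): ['29', '30'], ('May', '2010'): ['01', '02']}
```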

  1. Select the log files you want to merge in the File Selector.
  2. Start with a fresh action.
  3. Set the action type to “merge files”.
  4. Leave “file sectioning” set to “do not section files”. A “merge files” action always combines entire files, so there is no need to use file sectioning to process log entries separately. The first date found in each file determines its target file.
  5. In the Search box, enter the regular expression (?'day'[0-9]{1,2}+)/(?'month'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/(?'year'[0-9]{4}). Only files in which this regex finds a match are merged. PowerGREP looks only for the first regex match in each file; as soon as it finds one, the file is merged.
  6. In the “target file creation” drop-down list near the bottom of the Action panel, select “merge based on search matches”. This makes the Target File box visible.
  7. In the Target File box, enter something like c:\logs\web logs ${month} ${year}.txt.bz2. This uses replacement text syntax to build a path that includes the month and year from the regex match. Since only the first regex match is used, the first date found in a file determines which target file it is merged into. The .bz2 extension on the target file name makes PowerGREP compress the target file automatically.
  8. Set “between collected text” to “nothing”. Each Apache log file already ends with a line break.
  9. The “order of collected matches” drop-down list determines the order in which the log files are combined. This matters if you want the log entries in the combined file to be in the proper order. If sorting your original log files alphabetically by name puts the log entries in the correct order, choose “sort files alphabetically A..Z”. If the time stamps of the log files put them in the correct order (e.g. the time stamp of each log file is the time its last entry was written), choose “oldest file to newest file”.
  10. Set the backup file options as you like them.
  11. Click the Merge Files button to merge the files. A “merge files” action never lists anything but file names on the Results panel, so there’s no difference between Merge Files and Quick Merge.
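Steps 5, 7 and 9 together amount to a grouping rule: find the first date in each file, build a target path from it, and append files to their target in the chosen order. The Python sketch below approximates that rule as an illustration only. The function plan_merge, the sample file names, and the use of string.Template to mimic ${month} and ${year} are assumptions, not PowerGREP features. PowerGREP's (?'name'...) named groups are written (?P<name>...) in Python, and the possessive {1,2}+ is simplified to {1,2} because Python's re module only accepts possessive quantifiers from Python 3.11 on.

```python
import re
import string
import tempfile
from pathlib import Path

# Python spelling of the date regex from step 5 (named groups as (?P<name>...)).
DATE_RE = re.compile(
    r"(?P<day>[0-9]{1,2})/"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/"
    r"(?P<year>[0-9]{4})"
)


def plan_merge(paths, target_template, order="alphabetical"):
    """Hypothetical helper: decide which target each log file is merged into.

    target_template uses ${month} and ${year}, like the Target File box.
    order is "alphabetical" (file name A..Z) or "oldest" (modification time).
    Returns {target path: [source paths in merge order]}.
    """
    if order == "alphabetical":
        ordered = sorted(paths, key=lambda p: p.name)
    elif order == "oldest":
        ordered = sorted(paths, key=lambda p: p.stat().st_mtime)
    else:
        raise ValueError(f"unknown order: {order!r}")
    template = string.Template(target_template)
    plan = {}
    for path in ordered:
        # Only the first date in the file counts; files without one are skipped.
        match = DATE_RE.search(path.read_text(encoding="utf-8", errors="replace"))
        if match is None:
            continue
        target = template.substitute(month=match["month"], year=match["year"])
        plan.setdefault(target, []).append(path)
    return plan


# Usage with three hypothetical daily logs in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name, date in [("access-2010-04-25.log", "25/Apr/2010"),
                       ("access-2010-04-24.log", "24/Apr/2010"),
                       ("access-2010-05-01.log", "01/May/2010")]:
        line = f'127.0.0.1 - - [{date}:00:00:00 +0000] "GET / HTTP/1.1" 200 512\n'
        (folder / name).write_text(line)
    plan = plan_merge(list(folder.glob("*.log")),
                      str(folder / "web logs ${month} ${year}.txt.bz2"))
    summary = {Path(t).name: [p.name for p in files] for t, files in plan.items()}

print(summary)
```

This sketch only plans the merge; it does not write any target files. Sorting by name puts the April 24 log ahead of the April 25 log even though the files were created in the opposite order, which is why the naming scheme matters when you choose “sort files alphabetically A..Z”.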

Merging files does not delete the original files. It may overwrite original files if the Target File for one or more search matches is one of the files being searched. PowerGREP does not write the final target files until the action has completed, so overwriting source files won’t alter the search matches.
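The effect of delaying the writes can be illustrated with a small Python sketch. It is not PowerGREP's implementation: write_merged is a hypothetical helper that takes a plan mapping each target path to its source files in merge order, reads every source into memory before opening any target, and compresses targets whose names end in .bz2 with Python's bz2 module. Reading everything first is what keeps an overwritten source from changing the merged output.

```python
import bz2
import tempfile
from pathlib import Path


def write_merged(plan):
    """Concatenate each target's sources, writing only after everything is read.

    plan maps a target path to its source paths in merge order (a hypothetical
    structure for this sketch). Nothing is inserted between files, matching
    "between collected text" set to "nothing". Targets ending in .bz2 are
    compressed.
    """
    # Phase 1: read every source before any target is touched.
    collected = {
        Path(target): b"".join(Path(src).read_bytes() for src in sources)
        for target, sources in plan.items()
    }
    # Phase 2: write the targets. Overwriting a source here cannot change
    # what was collected in phase 1.
    for target, data in collected.items():
        if target.suffix == ".bz2":
            data = bz2.compress(data)
        target.write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.log").write_bytes(b"line A\n")
    (folder / "b.log").write_bytes(b"line B\n")
    write_merged({
        folder / "a.log": [folder / "a.log", folder / "b.log"],  # target is also a source
        folder / "all.txt.bz2": [folder / "a.log", folder / "b.log"],
    })
    overwritten = (folder / "a.log").read_bytes()
    compressed = bz2.decompress((folder / "all.txt.bz2").read_bytes())

print(overwritten)  # b'line A\nline B\n'
print(compressed)   # b'line A\nline B\n'
```

If each target were written as soon as it was ready, all.txt.bz2 would have read the already overwritten a.log and ended up with “line B” twice.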

A “merge files” action always merges files as a whole. If you want to merge multiple files but put different parts of each file into different target files, essentially splitting and merging files at the same time, use a “split files” action instead. The Split Web Logs by Date example does this.

This action is available in the PowerGREP5.pgl library as “Logs: Merge Apache web logs”.