Split Database Dumps

Database dumps often produce extremely large files that are difficult to deal with outside your database software. PowerGREP can easily split the dump for an entire server into separate dumps for each database, or the dump for an entire database into separate dumps for each table. This gives you smaller files that you can work with in applications like text editors.

  1. Mark the SQL files with your database dumps in the File Selector.
  2. Start with a fresh action.
  3. Set the action type to “split files”.
  4. Enter ^CREATE DATABASE[^‘'";]*[‘'"](\w+)[‘'"].*?(?=^CREATE DATABASE|\z) into the Search box.
  5. Enter a path like c:\\output\\$1.sql into the Target File box.
  6. Turn on “dot matches newlines” to allow the regex matches to span across lines.
  7. Click the Quick Split button to execute the action.

You want to use the Quick Split button rather than the Split Files button so that PowerGREP will not waste time and memory to try to display your entire database dump on the Results panel. When the action completes, you’ll have one file database_name.sql for each database that was in your dump files.

The regular expression needs to match the entire block of SQL for one database in your dump. The above regex assume the block begins with a CREATE DATABASE statement at the start of a line. This is matched with ^CREATE DATABASE[^‘'";]*[‘'"](\w+)[‘'"], which also captures the name of the database. The regex also assumes that the block ends just before the next CREATE DATABASE statement or at the end of the file. This position is matched with (?=^CREATE DATABASE|\z). Between that .*? matches anything.

If your dump contains the tables for a single database, you can use the regular expression ^CREATE TABLE[^‘'";]*[‘'"](\w+)[‘'"].*?(?=^CREATE TABLE|\z) to split it into separate files for each table.

If your dumps don’t start with CREATE DATABASE or CREATE TABLE, then you may need to use different regexes entirely. For example, mysqldump begins each database dump with a comment:

-- Current Database: ‘database_name‘

If you split these dumps with the regex from the steps above then the split files will have the comment for the next database in the log at the bottom of the file. To make sure you get the correct comment at the top of the file you can use this regex:

-- Current Database: ‘(\w+).*?(?=^--
-- Current Database: ‘|\z)

This regex contains two literal line breaks. In PowerGREP, when “search type” is set to “regular expression” (as opposed to “free-spacing regular expression”), a literal line break in a regex matches a line break of any style. So by using literal line breaks in this regex you can make sure that it will work correctly regardless of whether your database dump uses UNIX or Windows line breaks.

The $1 in the target file path is replaced with the contents of the first capturing group. In this case that’s the name of the database or table. If you’re splitting multiple dump files that may contain dumps for databases with the same name, then the above Target File setting will create only one file for each unique name that contains the dump for all the databases with the same name. If you want to separate those too you can use a target path like c:\\output\\%FILENAMENOEXT%_$1.sql to add the names of the original dump files to those of the split files.