10. Import data using WbImport

10.1. General parameters
10.2. Parameters for the type TEXT
10.3. Text Import Examples
10.4. Parameters for the type XML
10.5. Update mode

The WbImport command can be used to import data from text or XML files into a table of the database. WbImport can read the XML files generated by the WbExport command's XML format. It can also read text files created by the WbExport command that escape non-printable characters.

The WbImport command can either be used from within the GUI like any other SQL command (such as UPDATE or INSERT), or it can be used as part of a SQL script that is run in batch mode.

During the import of text files, empty lines (i.e. lines which only contain whitespace) will be silently ignored.

WbImport recognizes certain "literals" to identify the current date or time when converting values from text files to the appropriate data type of the DBMS. Thus, input values like now or current_timestamp for date or timestamp columns are converted correctly. For details on which "literals" are supported, please see the description about editing data.

The DataPumper can also be used to import text files into a database table, though it does not offer all the possibilities of the WbImport command.

Archives created with the WbExport command using the -compress=true parameter can be imported using the WbImport command. You simply need to specify the archive file created by WbExport, and WbImport will automatically detect the archive. For an example of creating and importing compressed exports, please refer to compressing export files.
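
A minimal sketch (file and table names are illustrative), assuming person.zip contains a text export created by WbExport with -compress=true:

WbImport -file=c:/temp/person.zip
         -type=text
         -table=person;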

10.1. General parameters

The WbImport command supports the following parameters:

-type

Possible values: xml, text

Defines the type of the input file

-file

Defines the full name of the input file. Alternatively you can also specify a directory (using -sourcedir) from which all files are imported.

-sourceDir

Defines a directory which contains import files. All files from that directory will be imported. If this switch is used with text files and no target table is specified, it is assumed that each filename (without the extension) defines the target table. If a target table is specified using the -table parameter, all files will be imported into the same table. The -deleteTarget parameter will be ignored if multiple files are imported into a single table.

-checkDependencies

When importing more than one file (using the -sourcedir switch) into tables with foreign key constraints, this switch can be used to import the files in the correct order (child tables first). When -checkDependencies=true is passed, SQL Workbench/J will check the foreign key dependencies for all tables.
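
For example, the following sketch (the directory name is illustrative) imports all text files from a directory in an order that respects the foreign key constraints:

WbImport -sourceDir=c:/data/backup
         -extension=txt
         -header=true
         -checkDependencies=true;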

-extension

When using the -sourcedir switch, the extension for the files can be defined. All files ending with the supplied value will be processed (e.g. -extension=csv). The extension given is case-sensitive (i.e. TXT is not the same as txt).

-commitEvery

A numeric value that defines the number of rows after which a COMMIT is sent to the DBMS. If this parameter is not passed (or a value of zero or lower), then SQL Workbench/J will commit when all rows have been imported. When using batch execution it is recommended to commit the batch using the -commitBatch parameter.
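
For example, to commit after every 1000 imported rows (file and table names are illustrative):

WbImport -file=c:/temp/contacts.txt
         -table=person
         -commitEvery=1000;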

-mode

Defines how the data should be sent to the database. Possible values are 'INSERT', 'UPDATE', 'INSERT,UPDATE' and 'UPDATE,INSERT'. For details please refer to the update mode explanation.

-continueOnError

Possible values: true, false

This parameter controls the behaviour when errors occur during the import. The default is true, meaning that the import will continue even if an error occurs during file parsing or updating the database. Set this parameter to false if you want to stop the import as soon as an error occurs.

The default value for this parameter can be controlled in the settings file and it will be displayed if you run WbImport without any parameters.

With PostgreSQL, -continueOnError will only work if the use of savepoints is enabled. This can be done by setting the property workbench.db.postgresql.import.usesavepoint=true in the configuration file workbench.settings. If this is enabled, each INSERT (or UPDATE) statement will be "wrapped" between savepoints so that a statement error can be recovered from.

-keyColumns

Defines the key columns for the target table. This parameter is only necessary if the import is running in UPDATE mode.

This parameter is ignored if files are imported using the -sourcedir parameter

-table

Defines the table into which the data should be imported

This parameter is ignored if the files are imported using the -sourcedir parameter.

-schema

Defines the schema into which the data should be imported. This is necessary for DBMS that support schemas, when you want to import the data into a schema other than the current one.
-encoding

Defines the encoding of the input file (and possible CLOB files).
-deleteTarget

Possible values: true, false

If this parameter is set to true, data from the target table will be deleted (using DELETE FROM ...) before the import is started. This parameter will only be used if -mode=insert is specified.

-truncateTable

Possible values: true, false

This is essentially the same as -deleteTarget, but will use the command TRUNCATE to delete the contents of the table. For those DBMS that support this command, deleting rows is usually faster compared to the DELETE command, but it cannot be rolled back. This parameter will only be used if -mode=insert is specified.

-batchSize

A numeric value that defines the size of the batch queue. Any value greater than 1 will enable batch mode. If the JDBC driver supports this, the INSERT (or UPDATE) performance can be increased drastically.

This parameter will be ignored if the driver does not support batch updates or if the mode is not UPDATE or INSERT (i.e. if -mode=update,insert or -mode=insert,update is used).

-commitBatch

Possible values: true, false

If using batch execution (by specifying a batch size using the -batchSize parameter), each batch will be committed when this parameter is set to true. This is slightly different from using -commitEvery with the value of the -batchSize parameter: the latter will add a COMMIT statement to the batch queue, rather than calling the JDBC commit() method. Some drivers do not allow different statements to be mixed in a batch queue, so if a frequent COMMIT is needed, this parameter should be used.

When you specify -commitBatch the parameter -commitEvery will be ignored. If no batch size is given, then -commitBatch will be ignored.
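
For example, the following sketch (file and table names are illustrative) sends the rows in batches of 100 and commits each batch:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -batchSize=100
         -commitBatch=true;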

-transactionControl

Possible values: true, false

Controls whether SQL Workbench/J handles the transaction for the import, or whether the import must be committed (or rolled back) manually. If -transactionControl=false is specified, SQL Workbench/J will send neither a COMMIT nor a ROLLBACK at the end. This can be used when multiple files need to be imported in a single transaction, and it can be combined with the cleanup and error scripts in batch mode.

-updateWhere

When using update mode, an additional WHERE clause can be specified to limit the rows that are updated. The value of the -updatewhere parameter will be added to the generated UPDATE statement. If the value starts with the keyword AND or OR, the value will be added without further changes; otherwise the value will be added as an AND clause enclosed in brackets. This parameter will be ignored if update mode is not active.

-startRow

A numeric value to define the first row to be imported. Any row before the specified row will be ignored. The header row is not counted when determining the row number: for a text file with a header row, the physical line 2 is row 1 (one) for this parameter.
-endRow

A numeric value to define the last row to be imported. The import will be stopped after this row has been imported. When you specify -startRow=10 and -endRow=20, 11 rows will be imported (i.e. rows 10 to 20). If this is a text file import with a header row, this corresponds to the physical lines 11 to 21 in the input file, as the header row is not counted.
-badFile

If -continueOnError=true is used, you can specify a file to which rejected rows are written. If the provided filename denotes a directory, a file with the name of the import table will be created in that directory. When doing multi-table inserts you have to specify a directory name.

If a file with that name exists, it will be deleted when the import for the table is started. The file will not be created unless at least one record is rejected during the import. The file will be created with the same encoding as indicated for the input file(s).

-maxLength

With the parameter -maxLength you can truncate data for character columns (VARCHAR, CHAR) during the import. This can be used to import data into columns that are not big enough (e.g. VARCHAR columns) to hold all values from the input file, and to ensure the import can finish without errors.

The parameter defines the maximum length for certain columns using the following format: -maxLength='firstname=30,lastname=20', where firstname and lastname are columns from the target table. The above example will limit the values for the column firstname to 30 characters and the values for the column lastname to 20 characters. If a non-character column is specified, this is ignored. Note that you have to quote the parameter's value in order to be able to use the "embedded" equals sign.

-booleanToNumber

Possible values: true, false

When exporting data from a DBMS that supports the BOOLEAN datatype, the export file will contain the literals "true" or "false" for the value of the boolean columns. When importing this file into a DBMS that does not support the BOOLEAN datatype, the import would fail.

In case you are importing the boolean column into a numeric column in the target DBMS, SQL Workbench/J will automatically convert the literal true to the numeric value 1 (one) and the literal false to the numeric value 0 (zero). If you do not want this automatic conversion, you have to specify -booleanToNumber=false for the import. The default values for the true/false literals can be overwritten with the -literalsFalse and -literalsTrue switches.

-literalsFalse -literalsTrue

When dealing with boolean values in the input file, these two switches define the literals that represent the value false and the value true when parsing the input data.

The value for these switches is a comma separated list of literals that should be treated as the specified value, e.g.: -literalsFalse='false,0' -literalsTrue='true,1' will define the most commonly used values for true/false (see the example after the notes below).

Please note:

  • The definition of the literals is case sensitive!
  • You always have to specify both switches, otherwise the definition will be ignored
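
As an example, the following sketch (file, table and literal values are illustrative) imports a file that stores boolean values as Y and N:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -literalsTrue='Y'
         -literalsFalse='N';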

-constantValues

With this parameter you can supply constant values for one or more columns that will be used when inserting new rows into the database.

The constant values will not be used when updating columns!

The format of the values is -constantValues="column1=value1,column2=value2". The values will be converted by the same rules as the input values from the input file. If the value for a character column is enclosed in single quotes, these will be removed from the value before sending it to the database. To include single quotes at the start or end of the input value you need to use two single quotes, e.g. -constantValues="name=''Quoted'',title='with space'". For the field name the value 'Quoted' will be sent to the database; for the field title the value with space will be sent to the database.

To specify a function call to be executed, enclose the function call in ${...}, e.g. ${mysequence.nextval} or ${myfunc()}. The supplied function will be put into the VALUES part of the INSERT statement without further checking (after removing the ${ and } characters, of course). So make sure that the syntax is valid for your DBMS. If you do need to store a literal like ${some.value} into the database, you need to quote it: -constantValues="varname='${some.value}'".
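
For example, the following sketch populates the id column from a sequence for each inserted row (the sequence name seq_person is illustrative and has to exist in the target database):

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=lastname,firstname,birthday
         -constantValues="id=${seq_person.nextval}"
         -dateformat="yyyy-MM-dd";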

-preTableStatement -postTableStatement

This parameter defines a SQL statement that should be executed before the import process starts inserting data into the target table. The name of the current table (when e.g. importing a whole directory) can be referenced using ${table.name}.

To define a statement that should be executed after all rows have been inserted and have been committed, you can use the -postTableStatement parameter.

These parameters can e.g. be used to enable identity insert for MS SQL Server:

-preTableStatement="set identity_insert ${table.name} on"
-postTableStatement="set identity_insert ${table.name} off"
Errors resulting from executing these statements will be ignored. If you want to abort the import in that case you can specify -ignorePrePostErrors=false and -continueOnError=false.

-ignorePrePostErrors

Possible values: true, false

Controls the handling of errors for the -preTableStatement and -postTableStatement parameters. If this is set to true (the default), errors resulting from executing the supplied statements are ignored. If set to false, the error handling depends on the setting for -continueOnError.

-showProgress

Valid values: true, false, <numeric value>

Controls the update frequency of the progress display in the status bar (when running in GUI mode). By default, every 10th row is reported. To disable the progress display, specify a value of 0 (zero) or the value false. The value true will set the progress interval to 1 (one).

10.2. Parameters for the type TEXT

The following parameters are valid for the import type text:

-fileColumns

A comma separated list of the table columns in the import file. Each column from the file should be listed with the appropriate column name from the target table. This parameter also defines the order in which those columns appear in the file. If the file does not contain a header line, or the header line does not contain the names of the columns in the database (or has different names), this parameter has to be supplied. If a column from the input file has no match in the target table, it should be specified with the name $wb_skip$. You can also specify the $wb_skip$ flag for columns which are present but that you want to exclude from the import.

This parameter is ignored when the -sourceDir parameter is used.

-importColumns

Defines the columns that should be imported. If all columns from the input file should be imported (the default), this parameter can be omitted. If only certain columns should be imported, the list of columns can be specified here. The column names should match the names provided with the -filecolumns switch. The same result can be achieved by providing the columns that should be excluded as $wb_skip$ columns in the -filecolumns switch. Which one you choose is mainly a matter of taste. Listing all columns and excluding some using -importcolumns might be more readable because the structure of the file is still "visible" in the -filecolumns switch.

This parameter is ignored when the -sourcedir parameter is used.

-delimiter

Defines the character which separates columns in one line. Records are always separated by newlines (either CR/LF or a single LF character) unless -multiLine=true is specified.

Default value: \t (a tab character)

-columnWidths

To import files that do not have a delimiter but a fixed width for each column, this parameter defines the width of each column in the input file. The value for this parameter is a comma separated list, where each element defines the width for a single column. If this parameter is given, the -delimiter parameter is ignored.

e.g.: -columnWidths='name=10,lastname=20,street=50,flag=1'

Note that the whole list must be enclosed in quotes as the parameter value contains the equal sign.

If you want to import only certain columns you have to use -fileColumns and -importColumns to select the columns to import. You cannot use $wb_skip$ in the -fileColumns parameter with a fixed column width import.
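
Putting this together, a sketch of a fixed-width import (column names and widths are illustrative) that reads four columns but imports only three:

WbImport -file=c:/temp/persons.txt
         -table=person
         -columnWidths='name=10,lastname=20,street=50,flag=1'
         -fileColumns=name,lastname,street,flag
         -importColumns=name,lastname,street;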

-dateFormat

The format for date columns.

-timestampFormat

The format for datetime (or timestamp) columns in the input file.
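
For example (the format patterns are illustrative; they follow the Java date/time patterns used in the other examples of this chapter):

WbImport -file=c:/temp/contacts.txt
         -table=person
         -dateformat="dd.MM.yyyy"
         -timestampformat="yyyy-MM-dd HH:mm:ss";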

-quoteChar

The character which was used to quote values that contain the delimiter. This parameter has no default value; thus if it is not specified, no quote checking will take place. If you use -multiLine=true you have to specify a quote character in order for multi-line records to work properly.

-quoteCharEscaping

Possible values: none, escape, duplicate

Defines how quote characters that appear in the actual data are stored in the input file.

You have to define a quote character in order for this option to have an effect. The character defined with the -quoteChar switch will then be imported according to the setting defined by this switch.

If escape is specified, it is expected that a quote that is part of the data is preceded with a backslash, e.g. the input value here is a \" quote character will be imported as here is a " quote character.

If duplicate is specified, it is expected that the quote character is duplicated in the input data. This is similar to the handling of single quotes in SQL literals. The input value here is a "" quote character will be imported as here is a " quote character.
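
For example, a sketch (file and table names are illustrative) for an input file where values are quoted with " and quotes inside the data are doubled:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -quoteChar='"'
         -quoteCharEscaping=duplicate;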

-multiLine

Possible values: true, false

Enable support for records spanning more than one line in the input file. These records have to be quoted, otherwise they will not be recognized.

If you create your exports with the WbExport command, it is recommended to encode special characters using the -escapetext switch rather than using multi-line records.

The default value for this parameter can be controlled in the settings file and it will be displayed if you run WbImport without any parameters.

-decimal

The decimal symbol to be used for numbers. The default is a dot.
-header

Possible values: true, false

If set to true, indicates that the file contains a header line with the column names for the target table; the data from the first line of the file will then be ignored. If the column names to be imported are defined using the -filecolumns or the -importcolumns switch, this parameter still has to be set to true, otherwise the first row would be treated as a regular data row.

This parameter is always set to true when the -sourcedir parameter is used.

The default value for this option can be changed in the settings file, and it will be displayed if you run WbImport without any parameters. It defaults to true.

-decode

Possible values: true, false

This controls the decoding of escaped characters. If the export file was e.g. written with escaping enabled then you need to set -decode=true in order to interpret string sequences like \t, \n or escaped Unicode characters properly. This is not enabled by default because applying the necessary checks has an impact on the performance.

-columnFilter

This defines a filter on column level that selects only certain rows from the input file to be sent to the database. The filter has to be defined as column1="regex",column2="regex". Only rows matching all of the supplied regular expressions will be included by the import.

This parameter is ignored when the -sourcedir parameter is used.

-lineFilter

This defines a filter on the level of the whole input row (rather than for each column individually). Only rows matching this regular expression will be included in the import.

The complete content of the row from the input file will be used to check the regular expression. When defining the expression, remember that the (column) delimiter will be part of the input string of the expression.
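
For example, the following sketch (the expression is illustrative) imports only rows that contain the string Beeblebrox somewhere in the line:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -lineFilter=".*Beeblebrox.*";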

-emptyStringIsNull

Possible values: true, false

Controls whether input values for character type columns with a length of zero are treated as NULL (value true) or as an empty string.

The default value for this parameter is true

Note that input values for non-character columns (such as numbers or date columns) that are empty or consist only of whitespace will always be treated as NULL.

-trimValues

Possible values: true, false

Controls whether leading and trailing whitespace are removed from the input values before they are stored in the database. When used in combination with -emptyStringIsNull=true this means that a column value that contains only whitespace will be stored as NULL in the database.

The default value for this parameter can be controlled in the settings file and it will be displayed if you run WbImport without any parameters.

Note that input values for non-character columns (such as numbers or date columns) are always trimmed before converting them to their target datatype.

-blobIsFilename

Possible values: true, false

When exporting tables that have BLOB columns using WbExport into text files, each BLOB will be written into a separate file, and the actual column data of the text file will contain the file name of the external file. When importing text files that do not reference external files into tables with BLOB columns, setting this parameter to false will send the content of the BLOB column "as is" to the DBMS. This will of course only work if the JDBC driver can handle the data that is in the BLOB columns of the text file. The default for this parameter is true.

-clobIsFilename

Possible values: true, false

When exporting tables that have CLOB columns using WbExport and the parameter -clobAsFile=true, the generated text file will not contain the actual CLOB contents, but a filename indicating the file in which the CLOB content is stored. In this case -clobIsFilename=true has to be specified in order to read the CLOB contents from the external files. The CLOB files will be read using the encoding specified with the -encoding parameter.
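
For example, a sketch (file names and the encoding are illustrative) that reads the CLOB contents from the external files referenced in the input file:

WbImport -file=c:/temp/documents.txt
         -table=documents
         -clobIsFilename=true
         -encoding=UTF-8;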

10.3. Text Import Examples

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=lastname,firstname,birthday
         -dateformat="yyyy-MM-dd";

This imports a file with three columns into a table named person. The first column in the file is lastname, the second column is firstname and the third column is birthday. Values in date columns are formatted as yyyy-MM-dd.

[Note]

A special timestamp format millis is available to identify times represented in milliseconds (since January 1, 1970, 00:00:00 GMT).

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=lastname,firstname,$wb_skip$,birthday
         -dateformat="yyyy-MM-dd";

This will import a file with four columns. The third column in the file does not have a corresponding column in the table person, so it is specified as $wb_skip$ and will not be imported.

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=lastname,firstname,phone,birthday
         -importcolumns=lastname,firstname;

This will import a file with four columns where all columns exist in the target table. Only lastname and firstname will be imported. The same effect could be achieved by specifying $wb_skip$ for the last two columns and leaving out the -importcolumns switch. Using -importcolumns is a bit more readable because you can still see the structure of the input file. The version with $wb_skip$ is mandatory if the input file contains columns that do not exist in the target table.

If you want to import certain rows from the input file, you can use regular expressions:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=lastname,firstname,birthday
         -columnfilter=lastname="^Bee.*",firstname="^Za.*"
         -dateformat="yyyy-MM-dd";

The above statement will import only rows where the column lastname contains values that start with Bee and the column firstname contains values that start with Za. So Zaphod Beeblebrox would be imported, Arthur Beeblebrox would not be imported.

If you want to learn more about regular expressions, please have a look at http://www.regular-expressions.info/

If you want to limit the rows that are updated but cannot filter them from the input file using -columnfilter or -linefilter, use the -updatewhere parameter:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=id,lastname,firstname,birthday
         -keycolumns=id
         -mode=update
         -updatewhere="source <> 'manual'"

This will update the table PERSON. The generated UPDATE statement would normally be: UPDATE person SET lastname=?, firstname=?, birthday=? WHERE id=?. The table contains entries that are maintained manually (identified by the value 'manual' in the column source) and should not be updated by SQL Workbench/J. By specifying the -updatewhere parameter, the above UPDATE statement will be extended to WHERE id=? AND (source <> 'manual'), thus skipping records that are flagged as manual even if they are contained in the input file.

WbImport -sourceDir=c:/data/backup
         -extension=txt
         -header=true

This will import all files with the extension txt located in the directory c:/data/backup into the database. This assumes that each filename indicates the name of the target table.

WbImport -sourceDir=c:/data/backup
         -extension=txt
         -table=person
         -header=true

This will import all files with the extension txt located in the directory c:/data/backup into the table person regardless of the name of the input file. In this mode, the parameter -deleteTarget will be ignored.

10.4. Parameters for the type XML

The XML import only works with files generated by the WbExport command.

-verboseXML

Possible values: true, false

If the XML was generated with -verboseXML=false, this needs to be specified when importing the file as well. Beginning with build 78, SQL Workbench/J writes the information about the used tags into the meta information, so it is no longer necessary to specify whether -verboseXML was true when creating the XML file.

-sourceDir

Specify a directory which contains the XML files. All files in that directory ending with ".xml" (lowercase!) will be processed. The table into which the data is imported is read from the XML file, as are the columns to be imported. The parameters -keycolumns, -table and -file are ignored if this parameter is specified. If XML files are used that were generated with a version prior to build 78, all files need to use either the long or the short tag format, and the -verboseXML=false parameter has to be specified if the short format was used.

When importing several files at once, the files will be imported into the tables specified in the XML files. You cannot specify a different table (apart from editing the XML file before starting the import).
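
For example, a minimal sketch (the directory name is illustrative) that imports all XML files from a directory into the tables recorded in those files:

WbImport -type=xml
         -sourceDir=c:/data/export;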

-importColumns

Defines the columns that should be imported. If all columns from the input file should be imported (the default), this parameter can be omitted. When specified, the columns have to match the column names available in the XML file.

-createTarget

Possible values: true, false

If this parameter is set to true, the target table will be created if it does not exist.

10.5. Update mode

The -mode parameter controls the way the data is sent to the database. The default is INSERT. SQL Workbench/J will generate an INSERT statement for each record. If the INSERT fails no further processing takes place for that record.

If -mode is set to UPDATE, SQL Workbench/J will generate an UPDATE statement for each row. In order for this to work, the table needs to have a primary key defined, and all columns of the primary key need to be present in the import file. Otherwise the generated UPDATE statement will modify rows that should not be modified. This can be used to update existing data in the database based on the data from the export file.

To either update or insert data into the table, both keywords can be specified for the -mode parameter. The order in which they appear as the parameter value defines the order in which the respective statements are sent to the database. If the first statement fails, the second will be executed. For -mode=insert,update to work properly, a primary or unique key has to be defined on the table: SQL Workbench/J will catch any exception (=error) when inserting a record, and will then try updating the record, based on the specified key columns. -mode=update,insert works the other way around: first SQL Workbench/J will try to update the record based on the primary keys; if the DBMS signals that no rows have been updated, it is assumed that the row does not exist and the record will be inserted into the table. This mode is recommended when no primary or unique key is defined on the table and an INSERT would therefore always succeed.

The keycolumns defined with the -keycolumns parameter don't have to match the real primary key, but they should identify one row uniquely.
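
For example, the following sketch (reusing the illustrative contacts file from the examples above) first tries to update each person by id, and inserts the row if no matching record exists:

WbImport -file=c:/temp/contacts.txt
         -table=person
         -filecolumns=id,lastname,firstname,birthday
         -keycolumns=id
         -mode=update,insert
         -dateformat="yyyy-MM-dd";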

You cannot use update mode if the tables in question consist only of key columns (or if only key columns are specified): the values from the source are used to build up the WHERE clause for the UPDATE statement, so there would be no columns left to update.

If you specify a combined mode (e.g.: update,insert) and one of the tables involved consists only of key columns, the import will revert to insert mode. In this case database errors during an INSERT are not considered as real errors and are silently ignored.

For maximum performance, choose the update strategy that will result in a successful first statement more often. As a rule of thumb:

  • Use -mode=insert,update, if you expect more rows to be inserted than updated.

  • Use -mode=update,insert, if you expect more rows to be updated than inserted.