The basic components are the file type (fixed or delimited), the delimiters, and the columns that make up the data set.

| attribute | values | meaning |
|---|---|---|
| format-type | delimited, fixed | How columns are demarcated: either by length (fixed) or by a delimiter (delimited). |
| row-delimiter | | How records are separated. |
| column-delimiter | | How columns are delimited. Specifying it on a fixed length file is an error. |
| null-token | | The literal token used to represent a null column value. If not specified, the word 'null' is used. |
| primaryKey | | A list of column names that make up the primary key for the table. Uniqueness is generally enforced, even for files, so the list should be correct if specified. |
| orderBy | | A list of columns to order the data set by. Ordering is critical when comparing two different data sources. The order by tells the comparator when two records correspond, so that it can report two rows that match with one column difference rather than two non-matching rows. For a tutorial on this, refer to this page. |
| columns | | A list of the columns in the data set. Must not be missing or empty. |
| columns.id | | The unique identifier for this column. |
| columns.type | | The JDBC type for this column. See the JDBC Types class documentation for a list of values. |
| columns.length | | The length of this column. Required for fixed length files; helpful in delimited files for enforcing correctness. For a decimal / numeric column, this serves as the precision. |
| columns.scale | | The scale of the column. For a decimal / numeric column, this is the number of decimal places. |
| columns.basic-type | string, integer, numeric | Used for ordering the data set when it is handled as a file. |
| columns.read-only | | Identifies the column as not writable. Primarily used for staging data into tables which don't accept values for that column. |
| columns.format | | A string identifying the format for this data point. Useful when, for example, a file contains a date as a packed integer but is being staged into a table which stores the date as a datetime. |
| columns.default-value | | A value that can be used when none is present. This is distinct from null: if a column is explicitly set to null, null is used; if the data point isn't present at all, the default value may be used, depending on the context. |
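
To make the attribute rules concrete, here is a minimal sketch in Python. It is not the tool's implementation. The attribute names come from the table, but the dict layout, the function names `validate` and `parse_row`, and the sample columns are all hypothetical. It also assumes that "not present at all" means a record has fewer fields than the descriptor has columns, which is only one possible reading of "depending on the context".

```python
# Hypothetical descriptor expressed as a plain dict. The attribute names come
# from the table above; the dict layout itself is an assumption.
DESCRIPTOR = {
    "format-type": "delimited",
    "row-delimiter": "\n",
    "column-delimiter": "|",
    "primaryKey": ["id"],
    "orderBy": ["id"],
    "columns": [
        {"id": "id", "type": "INTEGER", "basic-type": "integer"},
        {"id": "name", "type": "VARCHAR", "length": 20, "basic-type": "string"},
        {"id": "region", "type": "VARCHAR", "length": 10,
         "default-value": "unknown"},
    ],
}


def validate(descriptor):
    """Return a list of violations of the rules stated in the attribute table."""
    errors = []
    fmt = descriptor.get("format-type")
    if fmt not in ("delimited", "fixed"):
        errors.append("format-type must be 'delimited' or 'fixed'")
    columns = descriptor.get("columns") or []
    if not columns:
        errors.append("columns must not be missing or empty")
    if fmt == "fixed":
        if "column-delimiter" in descriptor:
            errors.append("column-delimiter is an error on a fixed length file")
        for col in columns:
            if "length" not in col:
                errors.append(f"column {col.get('id')!r} requires a length")
    return errors


def parse_row(descriptor, line):
    """Split one delimited record into a dict keyed by column id.

    Values are kept as strings; no type conversion is attempted.
    """
    null_token = descriptor.get("null-token", "null")  # default per the table
    fields = line.split(descriptor["column-delimiter"])
    row = {}
    for index, col in enumerate(descriptor["columns"]):
        if index >= len(fields):
            # Assumed meaning of "not present at all": the default may apply.
            row[col["id"]] = col.get("default-value")
        elif fields[index] == null_token:
            # An explicit null stays null, even when a default exists.
            row[col["id"]] = None
        else:
            row[col["id"]] = fields[index]
    return row
```

With this sketch, `parse_row(DESCRIPTOR, "1|Ada|null")` keeps `region` as null because the null token is explicit, while `parse_row(DESCRIPTOR, "2|Bob")` falls back to the default value because the third field is absent.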