...
```json
{
  "flat-file" : {
    "format-type" : "delimited",
    "row-delimiter" : "\n",
    "column-delimiter" : "\t",
    "null-token" : null,
    "columns" : [
      { "id" : "SEQ_", "type" : "INTEGER", "length" : -1, "basic-type" : "integer" },
      { "id" : "FLOAT_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric" },
      { "id" : "DECIMAL_", "type" : "DECIMAL", "length" : -1, "basic-type" : "numeric" },
      { "id" : "DOUBLE_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric" },
      { "id" : "REAL_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric", "default-value" : 1 },
      { "id" : "NUMERIC_", "type" : "NUMERIC", "length" : -1, "basic-type" : "numeric", "read-only" : true }
    ],
    "primaryKey" : [ "SEQ_" ],
    "orderBy" : [ "SEQ_", "FLOAT_", "DECIMAL_", "DOUBLE_", "REAL_", "NUMERIC_" ]
  }
}
```
The basic components are the file type (fixed or delimited), the delimiters, and the columns that make up the data set.
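To make the role of these components concrete, here is a minimal Python sketch of how a reader might use them to split a delimited file into records. It is an illustration only: `parse_rows` is a hypothetical helper, not part of the tool, and it assumes a descriptor in the same shape as the example above, trimmed to two columns. Values are left as strings; no type conversion is attempted.

```python
# Trimmed-down descriptor in the same shape as the example above (assumed shape).
DESCRIPTOR = {
    "flat-file": {
        "format-type": "delimited",
        "row-delimiter": "\n",
        "column-delimiter": "\t",
        "columns": [
            {"id": "SEQ_", "type": "INTEGER", "basic-type": "integer"},
            {"id": "FLOAT_", "type": "DOUBLE", "basic-type": "numeric"},
        ],
    }
}


def parse_rows(text, descriptor):
    """Split raw text into one dict per record, keyed by column id.

    Hypothetical helper for illustration; handles delimited files only.
    """
    spec = descriptor["flat-file"]
    if spec["format-type"] != "delimited":
        raise ValueError("this sketch only handles delimited files")
    ids = [col["id"] for col in spec["columns"]]
    records = []
    for line in text.split(spec["row-delimiter"]):
        if not line:  # skip the empty piece after a trailing row delimiter
            continue
        values = line.split(spec["column-delimiter"])
        records.append(dict(zip(ids, values)))
    return records


rows = parse_rows("1\t0.5\n2\t1.25\n", DESCRIPTOR)
print(rows)
```

The row delimiter separates records, the column delimiter separates fields within a record, and the order of the `columns` list decides which field belongs to which column id.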
attribute | values | meaning |
---|---|---|
format-type | delimited, fixed | How columns are demarcated: by length (fixed) or by a delimiter (delimited). |
row-delimiter | | How records are separated. |
column-delimiter | | How columns are delimited. Specifying it for a fixed-length file is an error. |
null-token | | The literal token used to represent a null column value. If not specified, the word 'null' is used. |
primaryKey | | A list of column names that make up the primary key for the table. Uniqueness is generally enforced, even for files, so the list should be correct if specified. |
orderBy | | A list of columns to order the data set by. Ordering is critical when comparing two different data sources. The order by tells the comparator when two records match, so that it can report two matching rows with one column difference rather than two non-matching rows. For a tutorial on this refer to this page. |
columns | | A list of the columns in the data set. Must not be missing or empty. |
columns.id | | The unique identifier for this column. |
columns.type | | The JDBC type for this column. See the Jdbc Types class documentation for a list of values. |
columns.length | | The length of this column. Required for fixed-length files; helpful in delimited files for enforcing correctness. For a decimal or numeric column, this serves as the precision. |
columns.scale | | The scale of the column. For a decimal or numeric column, this is the number of decimal places. |
columns.basic-type | string, integer, numeric | Used for ordering the data set when it is handled as a file. |
columns.read-only | | Identifies the column as not writable. Primarily used for staging data into tables which don't accept values. |
columns.format | | A string identifying the format for this data point. Useful when a file contains a date as, e.g., a packed integer, but it is being staged into a table which stores the date as a datetime. |
columns.default-value | | A value that can be used when none is present. This is distinct from null: if a column is explicitly set to null, null is used; if the data point isn't present at all, the default value may be used, depending on the context. |
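
The role of `orderBy` in comparisons can be illustrated with a small merge-style comparator. This is a sketch under stated assumptions, not the tool's actual comparator: `compare` is a hypothetical function, both inputs are assumed to be already sorted by the `orderBy` columns, and two rows are treated as "the same record" when their `orderBy` values are equal.

```python
def compare(left, right, order_by):
    """Merge-compare two row lists that are both sorted by the order_by columns.

    Returns a list of (status, key, diffs) tuples, where status is
    "match", "left-only" or "right-only". Hypothetical sketch only.
    """
    def key(row):
        return tuple(row[c] for c in order_by)

    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl == kr:
            # Same record on both sides: list the columns whose values differ.
            diffs = sorted(c for c in left[i] if left[i][c] != right[j].get(c))
            out.append(("match", kl, diffs))
            i += 1
            j += 1
        elif kl < kr:
            # Left key sorts first, so it has no counterpart on the right.
            out.append(("left-only", kl, None))
            i += 1
        else:
            out.append(("right-only", kr, None))
            j += 1
    # Whatever remains on either side is unmatched.
    out.extend(("left-only", key(r), None) for r in left[i:])
    out.extend(("right-only", key(r), None) for r in right[j:])
    return out


left = [{"SEQ_": 1, "FLOAT_": 0.5}, {"SEQ_": 2, "FLOAT_": 1.25}, {"SEQ_": 3, "FLOAT_": 2.0}]
right = [{"SEQ_": 1, "FLOAT_": 0.5}, {"SEQ_": 2, "FLOAT_": 9.0}, {"SEQ_": 4, "FLOAT_": 2.0}]
result = compare(left, right, ["SEQ_"])
for entry in result:
    print(entry)
```

With the ordering in place, the row with `SEQ_` 2 is reported as one matching record with a difference in `FLOAT_`. Without a shared ordering, the comparator would have no way to pair the two rows and could only report them as two unrelated, non-matching rows.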