Concept
Etlunit relies on reference file types, which are core to how it handles data in almost every context. Some examples of where reference file types are used:
- The database stage operation - putting data into a database table for testing.
- The database assertion operation - reading data from a database table and comparing it to an expected data set.
- The file assertion operation - taking a file written by a process and comparing it to an expected data set.
When discussing ETL processes, and especially their testing, the data inputs for a test and the assertions performed on the result of the data transformation are the key features of a testing platform. Etlunit takes that seriously, and the discussion that follows should help you understand the process fully so that your tests are well understood and stable, both of which are key to agile unit testing and continuous integration success.
Determining effective reference file type
Whenever a reference file type is required, etlunit uses the following algorithm to determine which one to use.
- If a reference-file-type is specified, that is used.
- If the operation is operating on a named thing, such as a database table or assertion file, and a reference-file-type exists that has a matching name, then that is used.
- If the thing being acted upon has a generic name, then a reference file type matching that name is used. For example, a database table named Table, in the schema edw, with connection id db, will first match a reference file type named Table.fml, and then db-edw-Table.fml.
- If there are two data sets involved, and the other data set has been identified using this same process, the same file type will be used for this one.
- If none of the rules above produces a match, no file type has been found, which may cause an error.
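The lookup order above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not etlunit's actual implementation: the function resolve_file_type, its parameters, and the set of available type names are all hypothetical.

```python
from typing import Optional, Set


def resolve_file_type(
    available: Set[str],
    explicit: Optional[str] = None,
    name: Optional[str] = None,
    qualified_name: Optional[str] = None,
    other_side: Optional[str] = None,
) -> Optional[str]:
    """Return the reference file type name to use, or None if nothing matches.

    All names here are hypothetical; 'available' stands in for the set of
    reference file type names that can be found (without the .fml suffix).
    """
    # An explicitly specified reference-file-type always wins.
    if explicit is not None:
        return explicit
    # Try the plain name first (Table), then the longer form (db-edw-Table).
    for candidate in (name, qualified_name):
        if candidate is not None and candidate in available:
            return candidate
    # Reuse the type already resolved for the other data set, if any.
    if other_side is not None:
        return other_side
    # Nothing matched; the caller decides whether this is an error.
    return None
```

For the example table, resolve_file_type({"Table"}, name="Table", qualified_name="db-edw-Table") picks Table, because the plain name is tried before db-edw-Table.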
Locating reference file types
A reference file type named name is looked up first in the project, at src/main/reference/file/fml/name.fml. If it is not found there, the classpath is searched for reference/file/fml/name.fml.
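As a rough sketch of that two-step search, the Python function below checks the project folder first and then a list of directories standing in for the classpath. The function locate_file_type and its parameters are hypothetical, and a real Java classpath can also contain jar files, which this sketch does not handle.

```python
from pathlib import Path
from typing import Iterable, Optional


def locate_file_type(
    name: str, project_root: Path, classpath_roots: Iterable[Path]
) -> Optional[Path]:
    """Return the path of name.fml, or None if it cannot be found."""
    # First choice: the project's own reference folder.
    project_candidate = (
        project_root / "src" / "main" / "reference" / "file" / "fml" / f"{name}.fml"
    )
    if project_candidate.is_file():
        return project_candidate
    # Otherwise search each classpath root, in order (directories only here).
    for root in classpath_roots:
        candidate = root / "reference" / "file" / "fml" / f"{name}.fml"
        if candidate.is_file():
            return candidate
    return None
```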
The Reference File Type
Reference file types are stored as JSON objects in files with the '.fml' extension. This is a sample reference file type:
{
  "flat-file" : {
    "format-type" : "delimited",
    "row-delimiter" : "\n",
    "column-delimiter" : "\t",
    "null-token" : null,
    "columns" : [
      { "id" : "SEQ_", "type" : "INTEGER", "length" : -1, "basic-type" : "integer" },
      { "id" : "FLOAT_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric" },
      { "id" : "DECIMAL_", "type" : "DECIMAL", "length" : -1, "basic-type" : "numeric" },
      { "id" : "DOUBLE_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric" },
      { "id" : "REAL_", "type" : "DOUBLE", "length" : -1, "basic-type" : "numeric", "default-value" : 1 },
      { "id" : "NUMERIC_", "type" : "NUMERIC", "length" : -1, "basic-type" : "numeric", "read-only" : true }
    ],
    "primaryKey" : [ "SEQ_" ],
    "orderBy" : [ "SEQ_", "FLOAT_", "DECIMAL_", "DOUBLE_", "REAL_", "NUMERIC_" ]
  }
}
The basic components are the format type (fixed or delimited), the row and column delimiters, and the columns that make up the data set.
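To show how these components fit together, the following Python sketch parses a cut-down reference file type and uses its delimiters and column ids to split a small data set into rows. It is not etlunit's parser: the function read_rows and the FML_TEXT sample are hypothetical, only the delimited format is handled, and values are left as strings with no handling of null-token, default-value, or type conversion.

```python
import json

# A shortened, hypothetical reference file type with two of the sample columns.
# The raw string keeps \n and \t as JSON escapes for json.loads to decode.
FML_TEXT = r'''
{ "flat-file" : {
    "format-type" : "delimited",
    "row-delimiter" : "\n",
    "column-delimiter" : "\t",
    "columns" : [
      { "id" : "SEQ_", "type" : "INTEGER", "basic-type" : "integer" },
      { "id" : "FLOAT_", "type" : "DOUBLE", "basic-type" : "numeric" }
    ]
} }
'''


def read_rows(fml_text, data):
    """Split delimited data into one dict per row, keyed by column id."""
    spec = json.loads(fml_text)["flat-file"]
    if spec["format-type"] != "delimited":
        raise ValueError("this sketch only handles delimited files")
    column_ids = [column["id"] for column in spec["columns"]]
    rows = []
    for line in data.split(spec["row-delimiter"]):
        if not line:
            continue  # skip the empty piece after a trailing row delimiter
        values = line.split(spec["column-delimiter"])
        rows.append(dict(zip(column_ids, values)))
    return rows
```

Calling read_rows(FML_TEXT, "1\t2.5\n2\t3.5\n") yields two rows, each mapping SEQ_ and FLOAT_ to the raw string values from the data.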