Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 5 Next »

Concept

Etlunit uses a concept of reference file types which are core to how it handles data in almost every context.  Some examples of when reference file types are used:

  1. The database stage operation - putting data into a database table for testing.
  2. The database assertion operation - reading data from a database table and comparing to an expected data set for comparison.
  3. The file assertion operation - taking a file written by a process and comparing to an expected data set for comparison.

 

When discussing ETL processes - especially testing - the data inputs for the test and the assertions performed on the result of the data transformation are the key features of a testing platform.  In etlunit, we take that seriously and the discussion that follows should help to fully understand the process so that your tests are well understood and stable - keys to agile unit testing and continuous integration success.

Determining effective reference file type

Whenever a reference file type is required, etlunit uses the following algorithm to determine which one to use.

  1. If a reference-file-type is specified, that is used.
  2. If the operation is operating on a named thing, such as a database table or assertion file, and a reference-file-type exists that has a matching name, then that is used.
  3. If the thing being acted upon has a generic name, then a reference file type matching that name is used.  E.G., a database table named Table, in the schema edw, and connection id db, will first match a reference file type named Table.fml, and secondly db-edw-Table.fml.
  4. If there are two data sets involved, and the other data set has been identified using this same process, the same file type will be used for this one.
  5. At this point no file type has been found and may cause an error.

Locating Reference File types

Reference file types are located in the src/main/reference/file/fml/name.fml folder in the project.  Any named file type will be located in this folder first.  The next step is to search the classpath for reference/file/fml/name.fml.

The Reference File Type

Reference file types are stored as json objects in files named with '.fml' extensions.  This is a sample reference file type:

{
  "flat-file" : {
    "format-type" : "delimited",
    "row-delimiter" : "\n",
    "column-delimiter" : "\t",
    "null-token" : null,
    "columns" : [ {
      "id" : "SEQ_",
      "type" : "INTEGER",
      "length" : -1,
      "basic-type" : "integer"
    }, {
      "id" : "FLOAT_",
      "type" : "DOUBLE",
      "length" : -1,
      "basic-type" : "numeric"
    }, {
      "id" : "DECIMAL_",
      "type" : "DECIMAL",
      "length" : -1,
      "basic-type" : "numeric"
    }, {
      "id" : "DOUBLE_",
      "type" : "DOUBLE",
      "length" : -1,
      "basic-type" : "numeric"
    }, {
      "id" : "REAL_",
      "type" : "DOUBLE",
      "length" : -1,
      "basic-type" : "numeric",
	  "default-value": 1
    }, {
      "id" : "NUMERIC_",
      "type" : "NUMERIC",
      "length" : -1,
      "basic-type" : "numeric"
	  "read-only": true
    } ],
    "primaryKey" : [ "SEQ_" ],
    "orderBy" : [ "SEQ_", "FLOAT_", "DECIMAL_", "DOUBLE_", "REAL_", "NUMERIC_" ]
  }
}

The basic components are the file type, fixed or delimited, the delimiters, and the columns that make up the data set.

 

 

  • No labels