ConvertingAssertionsTo3.2.0
Starting in etlunit 3.2.0, assertions behave differently so that they work more consistently across both files and databases.
What's Changed
Prior to 3.2.0, the source file type drove the operation. For example, in this assertion:
assert(source-table: 'TABLE', target: 'MASTER');
The reference file type is the relational definition of the TABLE table, and the target file is opened using that definition. Even if you specify file types for both the source and the target, the source is still the driver:
assert( source-table: 'TABLE', source-reference-file-type: 'T', target: 'MASTER', target-reference-file-type: 'M' );
In this case, since etlunit has been given enough information to interpret the file types on both sides, it uses the source file type to extract data from the source table and the target file type to open the local data file. It then intersects the source and the target, and that intersection becomes the basis of the assertion. If a column list is also specified, as in the following assertion, the column list is processed against the source, and the result is then intersected with the target:
assert( source-table: 'TABLE', source-reference-file-type: 'T', column-list-mode: 'exclude', column-list: ['ID'], target: 'MASTER', target-reference-file-type: 'M' );
What happens here is that the source file type, 'T', is narrowed with the column list (in this case the ID field is excluded), and then the source data is extracted. The target file type, 'M', is narrowed to match the source, and then the target file is opened using that narrowed definition. Any columns that don't match are silently ignored. The result is that the local target file must exactly match the final, narrowed definition of the target, not the original.
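To make the pre-3.2.0 behavior concrete, here is a minimal Python sketch that models a file type as a plain list of column names. It is not etlunit code: the function name and every column other than ID are hypothetical, and only the 'exclude' column list mode shown above is modeled.

```python
def legacy_reference_columns(source_cols, target_cols, excluded=()):
    """Model of the pre-3.2.0 flow, where the source file type is the driver."""
    # Narrow the source file type with the column list (exclude mode only).
    narrowed_source = [c for c in source_cols if c not in excluded]
    # Narrow the target to match the source; columns without a match
    # are dropped silently, with no error.
    return [c for c in target_cols if c in narrowed_source]


# Hypothetical column sets standing in for file types 'T' and 'M'.
source_t = ["ID", "NAME", "AMOUNT", "LOAD_TS"]
target_m = ["ID", "NAME", "AMOUNT"]

print(legacy_reference_columns(source_t, target_m, excluded=["ID"]))
# prints ['NAME', 'AMOUNT']
```

In this sketch LOAD_TS disappears without any error, and the expected data file would have to contain only NAME and AMOUNT rather than the full 'M' definition.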
This entire process has changed in 3.2.0 - the target file must match the target reference file type exactly. This same operation in 3.2.0 is processed like this:
1. Open the target file type.
2. Process the column list on the target file type. This is the file type which drives the entire operation.
3. Verify that the reference file type (the type created in (2)) is exactly a subset of the source file type. If it isn't, an error is thrown.
4. Compare both the source and the target using the reference file type from (2).
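The steps above can be sketched in Python as follows. This is only a model under stated assumptions, not etlunit's implementation: file types are reduced to lists of column names, "subset" is checked by name only, the function name and all columns other than ID are hypothetical, and only the 'exclude' mode is handled.

```python
def reference_columns_3_2(source_cols, target_cols, excluded=()):
    """Model of the 3.2.0 flow, where the target file type is the driver."""
    # Steps 1-2: start from the full target file type and apply the
    # column list to it. The result is the reference file type.
    reference = [c for c in target_cols if c not in excluded]
    # Step 3: the reference type must be a subset of the source file type.
    missing = [c for c in reference if c not in source_cols]
    if missing:
        raise ValueError(f"columns not in source file type: {missing}")
    # Step 4: both source and target are compared using this reference type.
    return reference


source_t = ["ID", "NAME", "AMOUNT", "LOAD_TS"]
target_m = ["ID", "NAME", "AMOUNT"]
print(reference_columns_3_2(source_t, target_m, excluded=["ID"]))
# prints ['NAME', 'AMOUNT']

try:
    # A target column that the source lacks is now an error, not ignored.
    reference_columns_3_2(source_t, target_m + ["EXTRA"], excluded=["ID"])
except ValueError as err:
    print("rejected:", err)
```

The design point the sketch tries to capture is that a mismatch now fails loudly instead of being dropped, and that the column list is applied only after the full target file type has been opened.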
File Assertions
This part requires a little more explanation. This is what a minimal file assertion looks like in versions before 3.2.0:
assert(file: 'name');
This compares a file named 'name', which is the output of some job, to a local file also named 'name'. Here is a complete example:
assert( file: 'name', target-file-name: 'target', source-reference-file-type: 'src', target-reference-file-type: 'tgt' );
Hopefully this illustrates the problem. Here, with target-file-name specified, file refers to the expected data file, what etlunit calls the target, and target-file-name refers to the actual file, what etlunit calls the source. Beyond that, though, since the file assertion shares its implementation with the database assertion, the two file types, source and target, refer to target-file-name and file, respectively.
So, to address this, we scuttled the existing file assertion and made it match database assertions so a single definition fits both. The minimal assertion above is now represented like this:
assert(source-file: 'name');
Pretty simple. Just copy and paste, right? Not quite. When the source and target names differ, it has to be handled like this:
assert( source-file: 'target', target: 'name', source-reference-file-type: 'src', target-reference-file-type: 'tgt' );
In this case, the attribute that was named file is renamed to target, and the attribute that was named target-file-name is now source-file. The fact that it isn't a simple search and replace illustrates why it was inconsistent in the first place.
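As a rough illustration of the renaming rules, here is a small Python sketch that converts the attributes of an old-style file assertion, held in a dict, to the 3.2.0 names. The helper and the dict representation are hypothetical and are not part of etlunit; the sketch covers only the two cases shown above.

```python
def convert_file_assertion(old):
    """Rename pre-3.2.0 file assertion attributes to their 3.2.0 names."""
    new = dict(old)  # leave the caller's dict untouched
    file_name = new.pop("file")
    if "target-file-name" in new:
        # Names differ: the old target-file-name becomes source-file,
        # and the old file becomes target.
        new["source-file"] = new.pop("target-file-name")
        new["target"] = file_name
    else:
        # Minimal form: file simply becomes source-file.
        new["source-file"] = file_name
    # The reference file type attributes keep their names and values.
    return new


print(convert_file_assertion({"file": "name"}))
print(convert_file_assertion({
    "file": "name",
    "target-file-name": "target",
    "source-reference-file-type": "src",
    "target-reference-file-type": "tgt",
}))
```

The branch on target-file-name is the reason a plain search and replace doesn't work: the same old attribute, file, maps to two different new attributes depending on what else is present.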
Assertion Files
Beyond the operations themselves, the target files will have to change any time the columns in the assertion don't match the target fml. Before 3.2.0, the assertion data file (the expected file, stored in the local data folder) had to match the final definition for the assertion, since the column specs were completely determined by the source before the data file was ever opened. Now, however, the expected data file must match the target file type 100%, and any column specs, etc., are processed on the target afterwards.