...
- Open the target file type.
- Process the column list on the target file type. This is the file type that drives the entire operation.
- Verify that the reference file type (the type created in (2)) is exactly a subset of the source file type. If it isn't, an error is thrown.
- Compare both the source and the target using the reference file type from (2).
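The steps above can be sketched roughly as follows. This is an illustration in Python, not etlunit's implementation: the function names are hypothetical, file types are reduced to plain lists of column names, and the column list is assumed to be a simple "include these columns" list.

```python
def build_reference_columns(target_columns, column_list=None):
    """Derive the reference columns from the target file type.

    Assumption: the column list, when given, names the target columns to keep.
    """
    if column_list is None:
        return list(target_columns)
    unknown = [c for c in column_list if c not in target_columns]
    if unknown:
        raise ValueError(f"column list names columns missing from the target: {unknown}")
    # Keep the target's own column order.
    return [c for c in target_columns if c in column_list]


def verify_reference_is_subset(reference_columns, source_columns):
    """Raise an error if the reference is not a subset of the source."""
    missing = [c for c in reference_columns if c not in source_columns]
    if missing:
        raise ValueError(f"reference columns missing from the source: {missing}")


def rows_match(source_rows, target_rows, reference_columns):
    """Compare source and target using only the reference columns."""
    def project(row):
        return tuple(row[c] for c in reference_columns)

    return [project(r) for r in source_rows] == [project(r) for r in target_rows]


# Hypothetical data: the target drives the operation, the source may carry extra columns.
target_columns = ["id", "name", "loaded_at"]
source_columns = ["id", "name", "batch"]

reference = build_reference_columns(target_columns, ["id", "name"])
verify_reference_is_subset(reference, source_columns)
same = rows_match(
    [{"id": 1, "name": "a", "batch": 7}],
    [{"id": 1, "name": "a", "loaded_at": "x"}],
    reference,
)
```

The point of the sketch is the ordering: the reference is built from the target first, checked against the source second, and only then used for the comparison.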
File Assertions
This part requires a little more explanation. This is what a minimal file assertion looks like in versions before 3.2.0:
```
assert(file: 'name');
```
This compares a file named 'name', which is the output of some job, to a local file also named 'name'. Here is a complete example:
```
assert(
    file: 'name',
    target-file-name: 'target',
    source-reference-file-type: 'src',
    target-reference-file-type: 'tgt'
);
```
Hopefully this illustrates the problem. Here, with target-file-name specified, file refers to the expected data file (what etlunit calls the target), and target-file-name refers to the actual file (what etlunit calls the source). Beyond that, because the file assertion shares its implementation with the database assertion, the two file types, source and target, refer to target-file-name and file, respectively.
So, to address this, we scuttled the existing file assertion and made it match database assertions so a single definition fits both. The minimal assertion above is represented like this:
```
assert(source-file: 'name');
```
Pretty simple. Just copy and paste, right? Not quite. When the source and target names differ, it has to be handled like this:
```
assert(
    source-file: 'target',
    target: 'name',
    source-reference-file-type: 'src',
    target-reference-file-type: 'tgt'
);
```
In this case, the attribute that was named file is renamed to target, and the attribute that was named target-file-name is now source-file. The fact that this isn't a simple search and replace shows how inconsistent the old naming was.
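The renaming rule can be written down as a small helper. This is only a sketch in Python, not a tool that ships with etlunit: the function name is hypothetical, and the assertion is modelled as a plain dictionary of attribute names to values. It encodes just the two cases shown above and passes every other attribute through unchanged.

```python
def migrate_file_assertion(old_attrs):
    """Translate pre-3.2 file assertion attributes to the 3.2 names.

    Hypothetical helper; attributes are modelled as a plain dict.
    """
    remaining = dict(old_attrs)
    if "target-file-name" in remaining:
        # Old 'target-file-name' was the actual (source) file,
        # and old 'file' was the expected (target) file.
        new_attrs = {
            "source-file": remaining.pop("target-file-name"),
            "target": remaining.pop("file"),
        }
    else:
        # Minimal form: source and target share one name.
        new_attrs = {"source-file": remaining.pop("file")}
    # The reference file type attributes keep their names.
    new_attrs.update(remaining)
    return new_attrs


minimal = migrate_file_assertion({"file": "name"})
complete = migrate_file_assertion({
    "file": "name",
    "target-file-name": "target",
    "source-reference-file-type": "src",
    "target-reference-file-type": "tgt",
})
```

Note that the meaning of file depends on whether target-file-name is present, which is exactly why a plain search and replace is not enough.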
Assertion files
Beyond the operations themselves, the target files will have to change any time the columns in the assertion don't match the target fml. In versions before 3.2, the assertion data file (the expected file, stored in the local data folder) had to match the final definition for the assertion, since the column specs were completely determined by the source before the data file was ever opened. Now, however, the expected data file must match the target file type exactly, and any column specs, etc., are processed on the target afterwards.
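The new ordering can be illustrated with a short sketch. It is written in Python under stated assumptions and does not reflect etlunit's internals: the function names are hypothetical, the expected data file is represented by its list of column names plus rows as dictionaries, and a column spec is assumed to be a list of columns to keep.

```python
def check_expected_file(expected_columns, target_columns):
    """Step 1: the expected data file must match the target file type exactly."""
    if list(expected_columns) != list(target_columns):
        raise ValueError(
            f"expected file columns {list(expected_columns)} "
            f"do not match target file type {list(target_columns)}"
        )


def apply_column_spec(rows, columns):
    """Step 2: column specs are applied afterwards, on the target."""
    return [{c: row[c] for c in columns} for row in rows]


# Hypothetical target file type and expected data file contents.
target_columns = ["id", "name", "loaded_at"]
expected_rows = [{"id": 1, "name": "a", "loaded_at": "x"}]

check_expected_file(["id", "name", "loaded_at"], target_columns)
narrowed = apply_column_spec(expected_rows, ["id", "name"])
```

Under this model, an expected file that was trimmed to the asserted columns (as was required before 3.2) fails the first check and has to be regenerated with the full target layout.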