stage
Draft in Progress
This document is a draft and is under development.
Description
Set up test environment by putting data into tables or files required by the ETL test.
Attributes
List of stage()
Attributes
stage()
for Database
- >> REF:
source: - >> TYPE: string
source-file: - >> TYPE: string
target-table: - >> TYPE: string
- - REQUIRED
connection-id: - >> TYPE: string
target-schema: - >> TYPE: string
reference-file-type:
- >> REF:
column-list: - >> TYPE: list of unique strings
column-list-mode:
- >>
POSSIBLE VALUES
:
include
exclude
mode: - >> TYPE: string
modes: - >> TYPE: list of strings
connections: - >> TYPE: object (key-value pairs, comma-separated)
{
<connection-name>: - >> TYPE: string
{
mode: - >> TYPE: string
modes: - >> TYPE: list of strings
}, ... multiple connections possible
}
stage()
for File
- >> REF:
file: - >> TYPE: string - - REQUIRED
variable-name: - >> TYPE: string
classifier: - >> TYPE: string
destination-name: - >> TYPE: string
context-name: - >> TYPE: string
stage()
for List-File
- >> REF:
list-file: - >> TYPE: string - - REQUIRED
variable-name: - >> TYPE: string
classifier: - >> TYPE: string
destination-name: - >> TYPE: string
context-name: - >> TYPE: string
stage()
for Data Sets
data-set-name: - >> TYPE: string - REQUIRED
data-set-id: - >> TYPE: string
ignore-data-set-properties: - >> TYPE: boolean
Individual stage()
Attributes
stage()
for Database Context
[Database Feature Module - Configuration properties not listed here.]
source:
- A source data file in the standard location.
- This file will be found in folder
data/
, a sibling of the ETLUnit test class file. - Typically these files on the file system should be named with
.fixed
or.delimited
file extension. - The
source
property value, however, should not include the file extension (.delimited
or.fixed
). - Typically used with
target-table
. - e.g.:
source: DATA_FILE,
- The file named by
source
would be placed in folderdata/
, a sibling to the ETLUnit test class file. - The file would there be named
DATA_FILE.delimited
orDATA_FILE.fixed
.
stage(
source: 'DATA_FILE',
target-table: 'TEST_TABLE'
);
source-file:
- A source data file in a custom location.
- Include the path with the file name. The path will start at the project root.
- Include the file name extension. It may be anything.
- If you do not provide a reference-file-type for the
source-file
, ETLUnit will apply the FML it finds for the target table.
stage(
source-file: 'src/test/etlunit/data2/BUCKET_GAP_FILLER_FF_COLM_ORD_src_file.abc',
connection-id:'pos_ods',
target-table: 'TICKET'
);
target-table:
- Database table name.
connection-id:
- Corresponds to
id
property in the@Database
annotation. - Any value other than one that corresponds to the
id
property in the@Database
annotation is not allowed.
target-schema:
- Name of target table's schema, if different from default.
reference-file-type:
- Applied to source, not target
- If you want to supply FML for the target, just name the FML so that it will be found by ETLUnit when it is looking for FML for the target.
- TODO: Fix link when available: SEE FML File Naming Options.
column-list:
- List of source columns to include or exclude when staging data.
- Only the column list that results from the combined filters of
column-list
andcolumn-list-mode
will be chosen from the source file and staged to the target table. - Q: What if FML is provided to give the source data file columns with different names than those in the target table?
- A: FAIL! The purpose of FML is to match files. Without matching field names, ETLUnit cannot match the file layouts.
- Q: What if the count of the net number of columns in the column list (after the include or exclude filter is applied) does not match the number of columns in the source data file?
- A:
- If
column-list
is found insidereference-file-type
structure, this will fail. - If
column-list
is found outsidereference-file-type
structure, and it has fewer columns than thesource
data file, only the enumerated columns will be applied.
- If
column-list-mode:
- Valid values:
include
- Include only the columns in the
column-list
.
- Include only the columns in the
exclude
- - DEFAULT, if column-list-mode is omitted.
- Exclude the columns in the
column-list
.
- Exclude the columns in the
mode:
- Represents a database connection.
- Corresponds to a mode declared in the @Database annotation.
- You cannot use a mode that has not been declared in the @Database annotation.
modes:
- A list of modes.
connections:
<connection-name>:
mode:
modes:
- Use the
connections
structure if the database connection you need is out of scope (overridden by method-level @Database annotation). - Example:
@Database(id: 'qpt-bmw-mart', modes: ['src', 'tgt'])
@Database(id: 'edw', modes: ['src'])
@Database(id: 'qpt_pet', modes: ['src'])
class load_pfix_agg_v2
{
@BeforeTest
loadReferenceData()
{
stage(
source: 'PRODUCT_ALL_GM',
target-table: 'PRODUCT',
connections:
{
qpt_pet: {mode: 'src'},
qpt-bmw-mart: {modes: ['src', 'tgt']}
}
);
stage(
source: 'REPORTING_ACTIVE_MODIFIER_GM',
target-table: 'REPORTING_ACTIVE_MODIFIER',
target-schema: 'CBE',
connections:
{
qpt_pet: {mode: 'src'},
qpt-bmw-mart: {mode: 'src'}
}
);
stage(
source: 'PLANT_SMALL_GM',
target-table: 'PLANT',
connections:
{
qpt_pet: {mode: 'src'},
qpt-bmw-mart: {modes: ['src', 'tgt']}
}
);
stage(
source: 'VW_QPT_SYSTEM_EFFECTIVE_DATE_20150509',
target-table: 'VW_QPT_SYSTEM_EFFECTIVE_DATE',
// When you only need to specify one connection...
connection-id: 'edw'
);
set(variable: 'population_date', value: '20150302');
}
...
}
stage()
for File Context
[File Feature Module - Configuration properties not listed here.]
file:
- Required
- Name of the file being staged.
- Include file name extension, if any.
- File will be locating in folder
files/
, a sibling to the ETLUnit test class.
variable-name:
- The variable into which is placed the value of the
file
property (above). - The value provided to this property will accomplish two things:
- At runtime, a variable named by this
variable-name
property will be populated with the name of the file specified in thefile
property of thisstage
operation. - This variable will be made available to the
execute()
operation.
- At runtime, a variable named by this
- In the case of testing with Informatica workflows, look for the variable in the Informatica parameter file used by the workflow being tested. The parameter file may be found in
src/main/informatica/<InformaticaWorkflowFolderName>
. The parm file name should be identical to the workflow name, with an ".xml
" file name extension. - In the example below:
- A variable "
inFile
" is populated with the value "bool_conversion_src.txt
," because that is the value of thefile
property of the samestage
operation. - Because of the configuration of the parm file "
wkf_BOOL_CONVERSION.TEST.xml
," the$InputFile1
" parameter used by the workflowwkf_BOOL_CONVERSION_TEST
will resolve to "bool_conversion_src.txt
," the value implicitly placed in variable "inFile
." In the parm file this variable is represented as "$inFile
." - The Informatica workflow will pull its data at the beginning of its run from file "
bool_conversion_src.txt
." You can see confirmation of this in the excerpt from the workflow log.
- A variable "
O|Y|Y|Y|Y|Y
O|N|N|N|N|N
O|||||
O|x|x|x|x|x
... [ more data ]
[Global]
$BadFileName1=nogood
$InputFile1=${inFile}
... [ more parameters ]
class bool_conversion_test {
@Test
testBounds() {
stage(file: 'bool_conversion_src.txt', variable-name: 'inFile');
//Run workflow
execute(workflow: 'wkf_BOOL_CONVERSION_TEST', folder: 'SHARED_EDW');
//asssertions
assert(
target: 'BOOL_CONVERSION_TGT',
reference-file-type: 'bool',
source-file: 'output'
);
}
}
classifier:
- When the
execute
operation that follows thisstage
operation involves Informatica workflow:- Valid values:
source
- - relates internally to variable "sourceFiles
"target
- - relates internally to variable "targetFiles
"lookup
- - relates internally to variable "lookupFiles
"
- In the Informatica PowerCenter Workflow Manager, the Edit Tasks window, the Mapping tab, the "Files, Directories and Commands" panel, the attribute Value column, you may see file directory variables like
$PMSourceFileDir
,$PMTargetFileDir
or$PMLookupFileDir
. When you follow where the file is being staged to, and what the file name is, you should be able to see what the classifier should be by looking on this panel. - File copy destination:
source
= "SourceFiles
" - - If global variable$PMSourceFileDir
is then set to${sourceFiles}
, the staged file will end up in subfolderSourceFiles/
.target
= "TargetFiles
" - - If global variable$PMTargetFileDir
is then set to${targetFiles}
, the staged file will end up in subfolderTargetFiles/
.lookup
= "LookupFiles
" - - If global variable$PMLookupFileDir
is then set to${lookupFiles}
, the staged file will end up in subfolderLookupFiles/
.
- When only
file
andclassifier
are specified: - File is copied to a location used by Informatica workflow, using the name found in the
file
property.
- File is copied to a location used by Informatica workflow, using the name found in the
- When
file
,classifier
andvariable-name
are specified:- File is copied to a location used by Informatica workflow, using the name found in the
file
property, as processed through thevariable-name
property.
- File is copied to a location used by Informatica workflow, using the name found in the
- When
file
,classifier
anddestination-name
are specified:- File is copied to a location used by Informatica workflow, using the name found in the
destination-name
property.
- File is copied to a location used by Informatica workflow, using the name found in the
- Valid values:
destination-name:
- The destination name becomes available as a file to, for example, Informatica workflows in the
execute
operation. - The generated file may be found locally in a location like this:
target/generated-sources/log/<testClassName>/<testMethodName>/<operationOrdinal>_execute
- The generated file may be found in the Informatica test environment in the location configured through the workflow's parameter files.
context-name:
- Used by etl-agent module on behalf of Informatica instance.
- SEE
execute
operation's list of attributes for Informatica. - SEE
context-name
in theexecute
operation.
stage()
for List-File Context
[File Feature Module - Configuration properties not listed here.]
list-file:
- Required
- Name of the list file being staged.
- Include file name extension, if any.
- File will be locating in folder
files/
, a sibling to the ETLUnit test class.
variable-name:
- The variable into which is placed the value of the
file
property. - SEE File context variable-name.
classifier:
- When the
execute
operation that follows thisstage
operation involves Informatica workflow:- Valid values:
source
- - relates internally to variable "sourceFiles
"target
- - relates internally to variable "targetFiles
"lookup
- - relates internally to variable "lookupFiles
"
- Valid values:
- Similar behavior as that for file context classifier property.
destination-name:
context-name:
stage()
for Data Sets Context
For example of the stage operation using a data set, see the Data Sets page.
data-set-name:
- Required
- Matches the name of a text file in folder '
dataset
,' a sibling of the ETLUnit test class. - The data set text file has file extension '
.dataset
.'
data-set-id:
- The
data-set-id
's value matches the value of theid
property in the data set member's properties section.
failure-id:
- The failure to be generated by ETLUnit if the operation using the data set fails.
- A failure elsewhere in the test method will not invoke this failure ID.
ignore-data-set-properties:
- Causes the ETLUnit test to ignore properties included in the data set member's properties section.
- This does not include the
id
property in the data set member's properties section.
Practices
stage()
for File Context
Stage a flat file without a target
- When working with Informatica tests, for instance, you may want to stage a flat file without specifying a target. The flat file itself is the target.
- In this case, you may want to specify a classifier, so Informatica will find the file in the right context (source, target, lookup).
@Database(id: 'edw', modes: ['src'])
class gap_bucket
{
@Test
columnsloadinorder_bucket()
{
stage(
file: 'FF_BUCKET_V2.txt',
classifier: 'source'
);
stage(
source: 'BUCKET_V1_GAP',
connection-id: 'edw',
target-table: 'BUCKET'
);
execute(
workflow: 'wkf_GAP_FILLER_CREATE_V1_BUCKET_DELTA',
folder: 'EDW_QPT_TICKET',
context:
{
edw-connection-src: 'informatica.connections.edw.src',
edw-connection-tgt: 'informatica.connections.edw.tgt'
}
);
assert(
source-file: 'FF_BUCKET_V1_DELTA_20150512',
reference-file-type: 'gap_bucket'
);
}
}
In Informatica, the workflow's source file name is based on a variable.
That variable is set in the ETLUnit project's parameter file for the workflow being tested.
...
$InputFile_FF_BUCKET_V2=FF_BUCKET_V2.txt
...
And we know from the parameter file above that our staged file, FF_BUCKET_V2.txt
is expected by the workflow. This stage operation instance makes that flat file available to the workflow.