Input files format and data validation
Input files
Actito ETLs integrate CSV flat files that feed the Data Model of your Actito licence.
Compression
The files you provide should be compressed with ZIP or GZIP. If that is not possible, they can be provided uncompressed.
A ZIP archive can contain several CSV files, but GZIP compresses only one file at a time.
The name of the compressed file and the names of the CSV files can be defined separately. No particular file extension is required, so you have full flexibility when naming the files to retrieve.
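As an illustration, the two compression options could be produced with Python's standard library. This is a minimal sketch: the function names and file paths are hypothetical, and the sketch does not cover how the archive is then delivered to Actito.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_csv_files(csv_paths: list[Path], archive_path: Path) -> None:
    """Pack several CSV files into one ZIP archive (hypothetical helper)."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for csv_path in csv_paths:
            # Store each file under its bare name, without local directories.
            archive.write(csv_path, arcname=csv_path.name)


def gzip_csv_file(csv_path: Path, gz_path: Path) -> None:
    """Compress a single CSV file with GZIP (one file per GZIP stream)."""
    with open(csv_path, "rb") as source, gzip.open(gz_path, "wb") as target:
        shutil.copyfileobj(source, target)
```

For example, `zip_csv_files([Path("contacts.csv"), Path("orders.csv")], Path("daily_export.zip"))` bundles two files. To send them with GZIP instead, you would call `gzip_csv_file` once per file.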
CSV format
CSV files are flat files that follow the CSV RFC (RFC 4180). In Actito ETLs, as in most CSV integration systems, the comma separator can be replaced by any single character.
It is essential to pay close attention to the formatting of these CSV files. A common issue is poorly quoted or improperly enclosed column data, which can corrupt the file and make it impossible to parse.
Make sure that any cell containing the separator character is enclosed, and that any enclosing character appearing inside a cell is escaped (RFC 4180 escapes it by doubling it).
Be careful with columns containing free text. These often include carriage return or line feed characters, and cells containing them must also be enclosed so that these characters are not interpreted as the start of a new CSV line.
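The enclosing and escaping rules can be handled by a standard CSV writer rather than by hand. The minimal sketch below uses Python's `csv` module and assumes `;` as separator and `"` as enclosing character. These are example choices only: the actual characters are whatever your ETL definition declares.

```python
import csv
import io


def to_csv(records: list[dict[str, str]], delimiter: str = ";", quotechar: str = '"') -> str:
    """Serialize records with a header line, enclosing cells only where needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=csv.QUOTE_MINIMAL,  # enclose cells containing the separator, quotes or line breaks
        doublequote=True,           # escape an embedded quote character by doubling it
        lineterminator="\r\n",      # RFC 4180 line ending
    )
    writer.writeheader()  # column headers on the first line
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"email": "john.smith@actito.com", "comment": "Likes cycling; prefers\nmorning e-mails"},
    {"email": "jane.doe@example.com", "comment": 'Said "call me later"'},
]
csv_text = to_csv(records)
```

The first comment contains both the separator and a line feed, so it is enclosed. The second one contains quote characters, which are doubled inside the enclosed cell.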
A valid CSV must contain the exact same number of columns on each line. This number must match the first line, which is expected to contain the column headers.
Actito ETLs always expect to find these column headers in the first line, so they do not need to rely on column order to map data to the data model attributes.
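Before sending a file, you could check the column-count rule locally. The sketch below is a hypothetical pre-flight check, not an Actito tool. It parses records with Python's `csv` module, so a properly enclosed multi-line cell still counts as a single record. The separator and enclosing character are assumptions that must match your ETL definition.

```python
import csv
import io


def check_column_counts(text: str, delimiter: str = ";", quotechar: str = '"') -> list[int]:
    """Return the 1-based record numbers whose column count differs from the header."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quotechar)
    header = next(reader, None)  # first record: the column headers
    if header is None:
        raise ValueError("empty file: no header line")
    expected = len(header)
    bad_records = []
    for record_number, record in enumerate(reader, start=2):
        if len(record) != expected:
            bad_records.append(record_number)
    return bad_records
```

Note that the returned numbers count CSV records, not physical lines. A cell spanning two lines shifts the physical line numbers but not the record numbers.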
The ETL definition lets you set the separator, enclosing and quoting characters.
Note that there is no constraint on the extension of the file to retrieve.
Although Actito ETLs only process CSV-formatted files, a file can be named myfile.txt, for instance.
Encoding
Actito supports the following encodings:
- UTF-8
- ISO-8859-1
You must declare the encoding in every ETL definition and, above all, make sure that the files you provide are actually encoded in the declared encoding.
A common issue is a byte order mark (BOM) at the start of the file. UTF-8 with BOM and UTF-16 with BOM are not allowed in Actito ETLs.
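A leading BOM can be detected and removed before upload. The minimal sketch below checks the raw bytes against the BOM constants from Python's `codecs` module. It then verifies that the remainder decodes as UTF-8. Any byte sequence decodes as ISO-8859-1, so that check only makes sense for UTF-8 files. The helper names are hypothetical.

```python
import codecs
from typing import Optional

# Byte order marks that Actito ETLs reject.
_BOMS = {
    "UTF-8 with BOM": codecs.BOM_UTF8,
    "UTF-16 (LE) with BOM": codecs.BOM_UTF16_LE,
    "UTF-16 (BE) with BOM": codecs.BOM_UTF16_BE,
}


def detect_bom(data: bytes) -> Optional[str]:
    """Return a label for the BOM found at the start of data, or None."""
    for label, bom in _BOMS.items():
        if data.startswith(bom):
            return label
    return None


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, then check that the rest is valid UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    data.decode("utf-8")  # raises UnicodeDecodeError if the content is not UTF-8
    return data
```

A UTF-16 file cannot be fixed by stripping its BOM alone. It has to be re-encoded to UTF-8 or ISO-8859-1.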
Data formatting
When mapping a CSV file column to an attribute of an Actito Data Model table, the data must be provided in a format compatible with the type defined in the table structure. For example, johnsmith cannot be integrated into an INTEGER attribute.
The representation patterns expected for each type of attribute found in a profile or custom table are listed below:
Raw data
String
- Max 255 characters (unless specifically defined in the attribute definition)
- Should fit the optional REGEXP (can be defined in the attribute definition)
Numeric
- Should contain integers, longs or decimals
- Negative numbers should be prefixed with the dash character (-)
- Positive numbers should not be prefixed with the plus character (+)
- The decimal separator should be the dot character (.)
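These numeric rules can be expressed as a regular expression. The sketch below is hypothetical. It also assumes that digits are required on both sides of the dot, and that exponent notation and thousands separators are not accepted. The rules above do not state either point.

```python
import re

# Optional leading "-", digits, then optionally "." followed by more digits.
_NUMERIC = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def is_valid_numeric(value: str) -> bool:
    """True if value matches the numeric representation described above."""
    return _NUMERIC.fullmatch(value) is not None
```

With this sketch, "-3.14" passes, while "+5" and "3,14" are rejected because of the "+" prefix and the comma decimal separator.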
Boolean
Can be:
- TRUE or FALSE (case insensitive)
- 1 or 0
- Y or N (case insensitive)
- YES or NO (case insensitive)
- T or F (case insensitive)
- VRAI or FAUX (case insensitive)
- OUI or NON (case insensitive)
- O or N (case insensitive)
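The accepted boolean values listed above map naturally onto a lookup table. In this minimal sketch, the function name is hypothetical, and returning None for an unrecognized value is a design choice of the sketch. Upper-casing the input implements the case-insensitive matching.

```python
from typing import Optional

_TRUE_TOKENS = {"TRUE", "1", "Y", "YES", "T", "VRAI", "OUI", "O"}
_FALSE_TOKENS = {"FALSE", "0", "N", "NO", "F", "FAUX", "NON"}


def parse_boolean(value: str) -> Optional[bool]:
    """Map an accepted boolean token to True/False; None if it is not accepted."""
    token = value.upper()  # case-insensitive comparison
    if token in _TRUE_TOKENS:
        return True
    if token in _FALSE_TOKENS:
        return False
    return None
```

"N" appears in both the Y/N and the O/N pairs. This is not ambiguous, because it means false in both.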
Date
- Should be formatted with the yyyy-MM-dd pattern
Date-time
- Should be formatted with the yyyy-MM-dd hh:mm:ss pattern
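Dates and date-times can be checked in two steps: a strict shape check, then a calendar check. The sketch below takes that approach. Python's `strptime` alone would also accept non-padded values such as "2024-2-9". The sketch reads the hour field on a 24-hour clock. That is an assumption: the pattern above does not say whether a 12-hour or a 24-hour clock is meant.

```python
import re
from datetime import date, datetime

_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_DATETIME_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")


def parse_date(value: str) -> date:
    """Validate a yyyy-MM-dd value (zero-padded) and return it as a date."""
    if not _DATE_SHAPE.fullmatch(value):
        raise ValueError(f"not a yyyy-MM-dd value: {value!r}")
    return datetime.strptime(value, "%Y-%m-%d").date()  # also rejects 2023-02-29


def parse_datetime(value: str) -> datetime:
    """Validate a date-time value, reading the hour on a 24-hour clock (assumption)."""
    if not _DATETIME_SHAPE.fullmatch(value):
        raise ValueError(f"not a date-time value: {value!r}")
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
```

When exporting from Python, `value.strftime("%Y-%m-%d %H:%M:%S")` produces the same shape.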
When UPDATING existing records, an empty value in the CSV file clears the value currently stored in the database, unless the ignoreEmptyValues parameter of the ETL is set to true, in which case the empty value is ignored.
However, an empty value for a boolean field (including subscriptions) never removes the current value, as a boolean must always be true or false.
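The update rules for empty values can be summarized by the small model below. It is a hypothetical illustration of the behaviour described above and does not reproduce Actito's actual implementation. Records are modelled as plain dictionaries, and None stands for a voided value.

```python
def apply_update(
    current: dict[str, object],
    csv_row: dict[str, str],
    boolean_fields: set[str],
    ignore_empty_values: bool = False,
) -> dict[str, object]:
    """Hypothetical model of how an update treats empty CSV cells."""
    updated = dict(current)
    for field, raw in csv_row.items():
        if raw == "":
            if field in boolean_fields or ignore_empty_values:
                continue  # keep the value currently stored
            updated[field] = None  # the stored value is voided
        else:
            updated[field] = raw
    return updated
```

In this model, a boolean field always keeps its value when the cell is empty, whatever the ignoreEmptyValues setting.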
Standard Actito attributes
E-mail address
- Should be a valid e-mail address
- Ex: john.smith@actito.com
Phone number
- Should only contain these characters: +().- /0123456789 (the space is allowed)
- Should be provided with the international country prefix
- Ex: +3210458514
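A phone number can be checked against these rules with a character class. The sketch below is hypothetical. It assumes that the international prefix is written with a leading "+", as in the example, so a "00" prefix is not accepted. It only checks the allowed characters and the prefix, not whether the number exists.

```python
import re

# Digits, "+", "(", ")", ".", "-", space and "/" only.
_ALLOWED_PHONE = re.compile(r"[0-9+().\- /]+")


def is_valid_phone(value: str) -> bool:
    """Allowed characters only, with a '+' international prefix (assumption)."""
    return _ALLOWED_PHONE.fullmatch(value) is not None and value.startswith("+")
```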
Mother language
- ISO 639-1 format (2 characters)
- Ex: FR
Country
- ISO 3166-1 alpha-2 format (2 characters)
- Ex: BE
Sex
- Should be one of M (male) or F (female)
Person title
- Should be one of Mr, Mrs or Ms
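The mother language, country, sex and person title rules can be combined in one pre-flight check. This minimal sketch is hypothetical, including the field labels it returns. It only checks the shape of the language and country codes (two letters), not whether they appear in the ISO lists. Accepting either letter case for these codes is an assumption, whereas M/F and Mr/Mrs/Ms are matched exactly as listed above.

```python
import re

_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")
_SEXES = {"M", "F"}
_TITLES = {"Mr", "Mrs", "Ms"}


def check_standard_attributes(language: str, country: str, sex: str, title: str) -> list[str]:
    """Return the (hypothetical) labels of the fields that fail the rules above."""
    errors = []
    if not _TWO_LETTERS.fullmatch(language):
        errors.append("motherLanguage")
    if not _TWO_LETTERS.fullmatch(country):
        errors.append("country")
    if sex not in _SEXES:
        errors.append("sex")
    if title not in _TITLES:
        errors.append("title")
    return errors
```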
UTM Coordinates
- WGS 84 format: the separator between X (longitude) and Y (latitude) should be |, and the decimal separator should be .
- Ex: 4.610927|50.675338
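Coordinates can be read and written as X and Y values separated by "|". The helpers below are a minimal, hypothetical sketch. They rely on Python's `float()`, which only accepts "." as decimal separator, so a value such as "4,610927" is rejected.

```python
def parse_coordinates(value: str) -> tuple[float, float]:
    """Split an 'X|Y' value into two floats."""
    parts = value.split("|")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '|' separator: {value!r}")
    # float() raises ValueError on a ',' decimal separator.
    return float(parts[0]), float(parts[1])


def format_coordinates(x: float, y: float) -> str:
    """Write X and Y with '.' as decimal separator and '|' between them."""
    return f"{x}|{y}"
```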
If your system cannot extract data in the formats above, you can define data transformations that the ETL applies to your raw extracted data before integrating it into the Actito data model.
See the Transform section for more information on the available transformations.
Subscriptions
Subscriptions are a specific type of attribute in a Profile table that represent the opt-in preferences of an individual. In the CSV file, a column is needed for each subscription.
- The expected value for a subscription is a Boolean, with "true" and "false" as possible values.
- In the "attributesMapping" parameter of the ETL, the "attributeName" value should be subscriptions#xxxx, where xxxx is the name of the subscription.
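The naming and value conventions for subscriptions can be generated from a simple preference dictionary. In this sketch, the function and the subscription names "Newsletter" and "Promotions" are hypothetical. The sketch only builds the attributeName strings and the cell values. The overall structure of the "attributesMapping" parameter is not shown here.

```python
def subscription_columns(preferences: dict[str, bool]) -> dict[str, str]:
    """Turn {subscription name: opted in?} into {attributeName: CSV cell value}."""
    return {
        f"subscriptions#{name}": "true" if opted_in else "false"
        for name, opted_in in preferences.items()
    }
```

For example, `subscription_columns({"Newsletter": True})` yields the attributeName `subscriptions#Newsletter` with the cell value `"true"`.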
Segmentations
Segmentations are a specific type of attribute in a Profile table that represent the business categories an individual can belong to. A segmentation can be 'simple' or 'exclusive'.
In the CSV file, a column is needed for each segmentation.
- For simple segmentations, the expected value is "Member" if the profile belongs to the segmentation and an empty value if not.
- For exclusive segmentations, the expected value is the name of the segment sub-category in which the profile must be inserted. If the segmentation is not mandatory, an empty field indicates that the profile should not be put in any category of the segmentation.
- In the "attributesMapping" parameter of the ETL, the "attributeName" value should be S_xxxx, where xxxx is the name of the segmentation.
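The cell values for both kinds of segmentation follow directly from the rules above. The helper names in this sketch are hypothetical. Raising an error when a mandatory exclusive segmentation has no segment is a choice made for the sketch, since an empty value is only described for non-mandatory segmentations.

```python
from typing import Optional


def segmentation_attribute_name(segmentation: str) -> str:
    """attributeName to use in the ETL mapping for a segmentation."""
    return f"S_{segmentation}"


def simple_segmentation_cell(is_member: bool) -> str:
    """'Member' if the profile belongs to the segmentation, empty otherwise."""
    return "Member" if is_member else ""


def exclusive_segmentation_cell(segment: Optional[str], mandatory: bool = False) -> str:
    """Name of the segment sub-category, or empty for no category."""
    if segment is None:
        if mandatory:
            raise ValueError("a mandatory exclusive segmentation needs a segment")
        return ""
    return segment
```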
Multi-value attributes
Multi-value attributes are fields of a Profile table that can hold several distinct values at once. For example, "hobbies": "football, hockey, cycling".
- The comma (,) must always be used as the separator between the different values. It cannot be customized.
- If there are multi-value attributes in your ETL, make sure to use another character as the main separator between columns.
- The comma (,) is never allowed in the values of the multi-value attributes (even when escaped).
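A multi-value cell can be built and checked with a few lines of code. This is a hypothetical sketch. It joins values with a bare comma and adds no spaces, because it is not stated whether spaces around the commas are kept as part of the values. It also refuses "," as column separator, in line with the rules above.

```python
import csv
import io


def multi_value_cell(values: list[str]) -> str:
    """Join values with ',' after checking that none of them contains a comma."""
    for value in values:
        if "," in value:
            raise ValueError(f"comma not allowed in a multi-value item: {value!r}")
    return ",".join(values)


def csv_line(cells: list[str], delimiter: str = ";") -> str:
    """Write one CSV line, using a column separator other than ','."""
    if delimiter == ",":
        raise ValueError("use another column separator when the file has multi-value attributes")
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\r\n").writerow(cells)
    return buffer.getvalue()
```

For example, `csv_line(["john.smith@actito.com", multi_value_cell(["football", "hockey", "cycling"])])` writes the hobbies as a single unenclosed cell, because the comma is not the column separator.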