Setting up an ETL to feed a Profile or a Custom table
Introduction
Automated ETL synchronizations are the flat file exchanges used to feed data into your Actito data model.
ETL stands for Extract - Transform - Load
The files are extracted from a file server (also referred to as 'file transfer location') shared between you and Actito, then, after applying optional transformations, the data is loaded into your profile or custom tables.
Once an ETL is set-up, it operates automatically: it is either scheduled at a defined frequency or triggered when a file is uploaded on the file transfer location.
This means that ETLs are ideal if your aim is to minimize developments and technical intervention. Exports from your CRM or partner system can be defined to be dropped on the file transfer location, where they will be extracted automatically by Actito and loaded into your data model.
Setting up an ETL can be done either by the Actito team, or by yourself, using our public API. It is indeed a one-off call, that does not need to be integrated into code by developers, but that can be carried out by any tech savvy user on an API client like Postman or Thunderclient.
This use case will guide you through the different step required to set-up an automated ETL synchronization.
Step by step
-
Define the file transfer location
-
1.a List the file transfer locations of your license
-
(1.b Create a new file transfer location)
-
-
Set up an automated ETL synchronization
-
2.a Set up a scheduled ETL synchronization
-
2.b Set up a triggered ETL synchronization
-
Step 1. Define the file transfer location
Step 1.a List the file transfer locations of your license
File transfer configurations are FTP(S) or SFTP servers used to transfer the data in ETL synchronization.
The server can be:
- Provided by Actito: in this case, it will always use the FTPS protocol
- Your own server: the supported protocols are FTP(S) and SFTP
Unsecured FTP are supported if you provide your own server, but not recommended.
If you use the FTPS provided by Actito, it will be set up in your license by the Actito team.
You can retrieve its id with the call :
GET /file-transfers/v5/entities/MyEntity/filetransfer-configs
This gives you the list of file transfer configurations in your license.
Note the id of the Actito FTPS.
{
"_embedded": {
"configurations": [
{
"id": "123456",
"name": "An_Actito_FTP",
"displayName": "Actito FTP",
"connection": {
"serverType": "FTP",
"parameters": {
"host": "ftp.actito.com",
"port": 21,
"passiveMode": true,
"encryption": "EXPLICIT_OVER_TLS"
},
"authentication": {
"type": "LOGINPASSWORD",
"login": "actito"
}
},
"_links": {
"self": {
"href": "/file-transfers/v5/entities/MyEntity/filetransfer-configs/123456"
}
},
"_audit": {
"createdAt": "2018-12-02T17:38:41.783Z",
"updatedAt": "2019-01-07T16:00:43.560Z"
}
}
]
},
"filter": {},
"page": {
"pageNumber": 0,
"pageSize": 200,
"totalPages": 1,
"totalElements": 3
},
"_links": {
"self": {
"href": "/file-transfers/v5/entities/MyEntity/filetransfer-configs?page=0&pageSize=200"
},
"first": {
"href": "/file-transfers/v5/entities/MyEntity/filetransfer-configs?page=0&pageSize=200"
},
"last": {
"href": "/file-transfers/v5/entities/MyEntity/filetransfer-configs?page=0&pageSize=200"
}
}
}
If there is no Actito FTPS set up in your license and you require one, please contact support or your account manager.
A file transfer config is made of file transfer locations, which represent the directories of the FTPS.
Typically, the Actito FTPS is configured with an /IN and an /OUT directory.
Retrieve the file transfer locations with the following call and the id retrieved at the previous step :
GET /file-transfers/v5/entities/MyEntity/filetransfer-configs/123456/locations
Note the id of the /IN folder so you can use it in your ETL.
{
"_embedded": {
"locations": [
{
"id": "987321",
"name": "incoming_folder",
"displayName": "Incoming Folder",
"path": "/IN",
"_audit": {
"createdAt": "2019-10-07T02:31:20.093Z",
"updatedAt": "2019-10-29T05:54:23.111Z"
},
"_links": {
"self": {
"href": "/v5/entities/MyEntity/filetransfer-configs/123456/locations/987321"
}
}
},
{
"id": "987654",
"name": "outgoing_folder",
"displayName": "Outgoing Folder",
"path": "/OUT",
"_audit": {
"createdAt": "2019-10-27T19:48:09.792Z",
"updatedAt": "2019-10-01T02:18:26.651Z"
},
"_links": {
"self": {
"href": "/v5/entities/MyEntity/filetransfer-configs/123456/locations/987654"
}
}
}
]
},
"filter": {},
"page": {
"pageNumber": 0,
"pageSize": 200,
"totalPages": 1,
"totalElements": 3
},
"_links": {
"self": {
"href": "/v5/entities/MyEntity/filetransfer-configs/123456/locations?page=0&pageSize=200"
},
"first": {
"href": "/v5/entities/MyEntity/filetransfer-configs/123456/locations?page=0&pageSize=200"
},
"last": {
"href": "/v5/entities/MyEntity/filetransfer-configs/123456/locations?page=0&pageSize=200"
}
}
}
Step 1.b Create a new file transfer location
If you want to use your own server (regardless of the protocol), you need to set up the file transfer config.
Use the following call and follow the specs to push the correct body depending on the server type: FTP(S), SFTP with login/password or SFTP with private key authentication.
POST /file-transfers/v5/entities/MyEntity/filetransfer-configs
If you use your own file transfer server, you must provide the IP address of this server to Actito for whitelisting.
Please forward them to support or your account manager.
Here we are configuring a SFTP with a SSH private key.
{
"name": "SFTP_3RD_PARTY",
"displayName": "SFTP Third Party",
"connection": {
"serverType": "SFTP",
"parameters": {
"host": "sftp.thirdparty.com",
"port": 22
},
"authentication": {
"type": "PRIVATE_KEY",
"login": "actito"
}
}
}
Note the id you receive as response.
For a private key SFTP, you also need to generate the key, with the following call (and the id of the config):
POST file-transfers/v5/entities/MyEntity/filetransfer-configs/456789/generate-ssh-private-key
You will obtain the public key required for the set-up of the server on your side.
After creating the file transfer configuration, you need to create the locations, which represent the directories of the (S)FTP(S) with the following call:
POST /file-transfers/v5/entities/MyEntity/filetransfer-configs/456789/locations
Here we are setting up the reference to the /IN folder, that we will use in our ETL:
{
"name": "input_folder",
"displayName": "Input folder",
"path": "/IN"
}
Please note the id you receive as response.
Automated ETLs must always retrieve the files on a file transfer server, enabling a no-code integration that runs automatically once it is defined.
If you want to import a file directly without going through such a server, it is possible to program API calls with a concept of one-shot ETLs.
Step 2. Set up an automated ETL synchronization
There are 2 types of automated ETLs relying on file transfer servers:
-
a scheduled ETL runs at a set frequency (based on a CRON expression): at the defined time, Actito will check that the file with the matching name pattern is present on the file server and extract it. A retry policy can be implemented to keep looking if the file is missing. This is the default type of ETL.
-
a file synchronization triggered ETL is based on a file synchronization, which is an active polling configuration: every 5 minutes, Actito checks whether a file with the matching name pattern is present on the file server and extract it. This means there is no set frequency: the ETL runs as soon as the file is available and it can be triggered multiple times per day.
Please consider the most relevant option depending on your scheduling constraints and the availability of the source file.
Step 2.a Set up a scheduled ETL
You can create a scheduled ETL using the following call and the SCHEDULED
type in the specs:
POST /mass-imports/v5/entities/MyEntity/etls
If the type of the ETL is declared as SCHEDULED
, the call must match the expected body of a scheduled ETL.
A scheduled ETL always contains the following parameters:
-
the "frequency" parameter in the "scheduling" object: represented by a valid CRON expression. A scheduled ETL can run once a day.
-
an "input" object in the "fileTransfer" object:
- the "remoteLocationId" parameter is the id of the file transfer location retrieved/created at step 1.
- the "retryPolicy" lets you define an interval after which Actito will check again, if the file was missing at the scheduled time of the ETL. You also need to define after how long Actito will keep trying, after which the ETL will fall in error if the file is still missing.
Instead of a file transfer location, you can also use the transferbox of the license, although it is unadvised for input files, as the transferbox cannot be used to build integrations.
However, output files (result or error files) can be dropped in the transferbox for manual review.
{
"type": "SCHEDULED",
"name": "erp-daily-integration",
"description": "The daily integration of all retail data coming from our ERP. Set up and managed by John Smith.",
"scheduling": {
"frequency": "0 0 5 * * ?",
"paused": false
},
"reportRecipients": [
"john.smith@actito.com"
],
"fileTransfer": {
"input": {
"files": [
{
"fileCode": "shop",
"fileNamePattern": "shops_$YYYYMMDD.csv",
"compressionType": "NONE"
},
{
"fileCode": "customer",
"fileNamePattern": "customers.csv",
"compressionType": "GZIP",
"compressedFileNamePattern": "customers_$YYYYMMDD.csv.gz"
},
{
"fileCode": "order",
"fileNamePattern": "orders.csv",
"compressionType": "ZIP",
"compressedFileNamePattern": "orders_$YYYYMMDD.zip"
},
{
"fileCode": "order_details",
"fileNamePattern": "orders_details.csv",
"compressionType": "ZIP",
"compressedFileNamePattern": "orders_$YYYYMMDD.zip"
}
],
"location": {
"type": "REMOTE",
"remoteLocationId": "1"
},
"deleteFilesOnSuccess": true
},
"output": {
"location": {
"type": "REMOTE",
"remoteLocationId": "1"
}
}
},
"inputFilesProperties": [
{
"fileCode": "customers",
"csvFormat": {
"encoding": "UTF-8",
"separator": ";",
"enclosing": "\""
}
},
{
"fileCode": "shops",
"csvFormat": {
"encoding": "UTF-8",
"separator": ";",
"enclosing": "\""
}
},
{
"fileCode": "orders",
"csvFormat": {
"encoding": "UTF-8",
"separator": ",",
"enclosing": "'"
}
},
{
"fileCode": "order_details",
"csvFormat": {
"encoding": "UTF-8",
"separator": ",",
"enclosing": "'"
}
}
],
"dataTransformations": [
{
"fileCode": "orders",
"transformations": [
{
"header": "purchase_moment",
"transformation": {
"type": "DATE_FORMAT",
"format": "MM-dd-yyyy HH:mm:ss"
}
},
{
"header": "withShipping",
"transformation": {
"type": "MAPPING",
"mapping": [
{
"values": [
"no"
],
"replacement": "false"
},
{
"values": [
"yes"
],
"replacement": "true"
}
]
}
},
{
"header": "delivery_address_country",
"transformation": {
"type": "COUNTRY_FORMAT",
"format": "ISO 3166-1 alpha-1"
}
},
{
"header": "webshop_language",
"transformation": {
"type": "LANGUAGE_FORMAT",
"format": "ISO 639-2"
}
},
{
"header": "delivery_contact_phone_number",
"transformation": {
"type": "PHONE_NUMBER_FORMAT",
"countryHeader": "delivery_address_country",
"defaultCountry": "FR"
}
},
{
"header": "delivery_address_street",
"transformation": {
"type": "STRING_NORMALISATION",
"normalisationRule": "TO_LOWER_CASE"
}
},
{
"header": "total_amount",
"transformation": {
"type": "STRING_NORMALISATION",
"rule": "REPLACE",
"text": ".",
"replacement": ","
}
}
]
}
],
"dataLoadings": [
{
"fileCode": "shops",
"destination": {
"type": "CUSTOM_TABLE",
"id": "bezoeiodif-sdfsdf-dfs-sdfsdf",
"attributesMapping": [
{
"header": "name",
"attributeName": "name"
}
]
},
"parameters": {
"mode": "CREATE_OR_UPDATE",
"generateResultFiles": false,
"generateErrorFiles": true
}
},
{
"fileCode": "customers",
"destination": {
"type": "PROFILE_TABLE",
"id": "1",
"attributesMapping": [
{
"header": "adresseEmail",
"attributeName": "emailAddress"
}
]
},
"parameters": {
"mode": "CREATE_OR_UPDATE",
"generateResultFiles": false,
"generateErrorFiles": true
}
},
{
"fileCode": "order_details",
"destination": {
"type": "CUSTOM_TABLE",
"id": "15",
"attributesMapping": [
{
"header": "quantity",
"attributeName": "quantity"
}
]
},
"parameters": {
"mode": "CREATE_OR_UPDATE",
"generateResultFiles": false,
"generateErrorFiles": true
}
},
{
"fileCode": "orders",
"destination": {
"type": "CUSTOM_TABLE",
"id": "16",
"attributesMapping": [
{
"header": "amount",
"attributeName": "amount"
}
]
},
"parameters": {
"mode": "CREATE_OR_UPDATE",
"generateResultFiles": false,
"generateErrorFiles": true
}
}
]
}
Step 2.b Set up a triggered ETL
To define a triggered ETL that will automatically check for new files, you must first configure the active polling by creating a file synchronization with the call:
POST /file-transfers/v5/entities/MyEntity/file-synchronizations
The "remoteLocationId" is the file transfer location retrieved/created at step 1, where the files corresponding to the "fileNamePattern" will be dropped. Various date patterns are accepted, as documented in the specs. The declared mimeType must match the file extension.
{
"name": "continuous-leadgeneration-filesync",
"userName": "Lead Generation Integration Flow",
"location": {
"type": "REMOTE",
"remoteLocationId": "1"
},
"fileDescription": {
"fileNamePattern": "leadgeneration_$yyyyMMdd.csv",
"mimeType": "text/csv"
},
"paused": false,
"postAction": {
"type": "DELETE"
}
}
As the synchronization could run several times on the same day, with the same date pattern, the postAction
is always to DELETE the file after retrieval.
After noting the file synchronization "id" that you receive in response, you can set-up a triggered ETL, using the following call and the TRIGGERED
type in the specs:
POST /mass-imports/v5/entities/MyEntity/etls
If the type of the ETL is declared as TRIGGERED
, the call must match the expected body of a triggered ETL.
A triggered ETL always contains the following parameter:
- the "fileSynchronizationId" parameter in the "triggering" object: this is the id you received at the previous step.
{
"type": "TRIGGERED",
"name": "lead-generation-automated-integration",
"description": "The integration of generated lead all along the day",
"triggering": {
"type": "FILE_SYNCHRONIZATION",
"fileSynchronizationId": "456789",
"paused": false
},
"reportRecipients": [
"john.smith@actito.com"
],
"fileTransfer": {
"input": {
"files": [
{
"fileCode": "leads",
"fileNamePattern": "leads_$YYYYMMDD.csv",
"compressionType": "ZIP",
"compressedFileNamePattern": "leadgeneration_$YYYYMMDD.zip"
},
{
"fileCode": "forms",
"fileNamePattern": "forms.csv",
"compressionType": "ZIP",
"compressedFileNamePattern": "leadgeneration_$YYYYMMDD.zip"
}
]
},
"output": {
"location": {
"type": "REMOTE",
"remoteLocationId": "1"
}
}
},
"inputFilesProperties": [
{
"fileCode": "leads",
"csvFormat": {
"encoding": "UTF-8",
"separator": ";",
"enclosing": "\""
}
},
{
"fileCode": "forms",
"csvFormat": {
"encoding": "UTF-8",
"separator": ";",
"enclosing": "\""
}
}
],
"dataTransformations": [
{
"fileCode": "leads",
"transformations": [
{
"header": "acquisitionMoment",
"transformation": {
"type": "DATE_FORMAT",
"format": "MM-dd-yyyy HH:mm:ss"
}
}
]
}
],
"dataLoadings": [
{
"fileCode": "leads",
"destination": {
"type": "PROFILE_TABLE",
"id": "1",
"attributesMapping": [
{
"header": "eMail",
"attributeName": "emailAddress",
"ignoreEmptyValues": false,
"ignoreInvalidValues": false,
"ignoreValuesWhenAlreadyKnown": false,
"mergeValuesWhenMultivalued": false
}
]
},
"parameters": {
"mode": "CREATE_OR_UPDATE",
"generateResultFiles": false,
"generateErrorFiles": true
}
},
{
"fileCode": "forms",
"destination": {
"type": "CUSTOM_TABLE",
"id": "bezoeiodif-sdfsdf-dfs-sdfsdf",
"attributesMapping": [
{
"header": "source",
"attributeName": "formSource",
"ignoreEmptyValues": false,
"ignoreInvalidValues": false,
"ignoreValuesWhenAlreadyKnown": false,
"mergeValuesWhenMultivalued": false
}
]
},
"parameters": {
"mode": "CREATE_ONLY",
"generateResultFiles": false,
"generateErrorFiles": true
}
}
]
}
You are done!
You've now defined your ETL, which will run at a fixed frequency (if SCHEDULED
) or when a matching file is detected (if TRIGGERED
).
If you've defined a "reportRecipient", they'll automatically receive the execution result, and the result and/or error files (if activated) will be dropped automatically on the "outputFiles" file transfer location.
Marketeers can also monitor ETL executions in the 'Manage imports' app of the UI.