Best Practices for Flat File Source

This page lists the best practices that we recommend for Flat File Sources.

At Least One User ID per Source

Each file must have at least one user ID. This can be the client’s first-party user ID (which must be the same as the one sent in the web JS or app SDKs), preferably along with a digital identifier such as a MAID that can be used for user matching.
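
As a pre-upload check, the sketch below flags records that carry no identifier at all. It reads the requirement as "every record should have at least one ID", which is an interpretation rather than a documented rule. The column names user_id and maid are hypothetical placeholders for whatever ID columns your source uses.

```python
import csv
import io

# Hypothetical column names; replace with the ID columns your source uses.
ID_COLUMNS = ("user_id", "maid")


def rows_missing_user_id(csv_text, id_columns=ID_COLUMNS, delimiter=","):
    """Return 1-based data-row numbers in which every ID column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    missing = []
    for row_number, row in enumerate(reader, start=1):
        # A row passes if at least one ID column holds a non-blank value.
        if not any((row.get(col) or "").strip() for col in id_columns):
            missing.append(row_number)
    return missing


sample = "user_id,maid,country\nu1,,ESP\n,,USA\n,abc-123,DEU\n"
print(rows_missing_user_id(sample))
```

Rows reported by this check can be fixed or dropped before the file is uploaded.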

Accurate Country to Region Mapping

Ensure that the files you upload belong to the correct region for the country of the records. For example, Spanish data must not be uploaded to the US bucket. If you have data or users belonging to multiple countries, first split your files by country (see Appendix below) and then upload each file to the appropriate region bucket to ensure data privacy and compliance.
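
The split can be sketched in a few lines of Python. This is an illustration only: the country column name and the country-to-region mapping below are assumptions, not the actual region assignment for any account, and rows with an unknown country are set aside rather than guessed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping for illustration; use the regions agreed for your account.
COUNTRY_TO_REGION = {"ESP": "EU", "DEU": "EU", "USA": "US"}


def split_rows_by_region(csv_text, country_column="country", delimiter=","):
    """Group the rows of one file by the region their country belongs to."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    by_region = defaultdict(list)
    for row in reader:
        country = (row.get(country_column) or "").strip().upper()
        # Unknown or missing countries go to a separate group for manual review.
        region = COUNTRY_TO_REGION.get(country, "UNMAPPED")
        by_region[region].append(row)
    return dict(by_region)


sample = "user_id,country\nu1,ESP\nu2,USA\nu3,DEU\n"
grouped = split_rows_by_region(sample)
for region, rows in sorted(grouped.items()):
    print(region, [row["user_id"] for row in rows])
```

Each group can then be written to its own file and uploaded to the bucket of the matching region.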

Same File Format for Multiple Files

When uploading multiple files under a source, ensure that all files use the same file format and delimiter that were defined when the source was created. For example, if the source was created as a CSV file with a semicolon (;) delimiter, all subsequent uploads to this source must use the same format. Failure to do so results in data corruption. Note that CSV files must not contain double quotes ("") and their values must be separated by a comma (,) only. Click here to download the sample file.
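
A quick way to catch a delimiter mismatch before upload is to parse the header with the delimiter chosen for the source and compare it with the header you expect. The sketch below assumes a hypothetical three-column source; the column names are placeholders.

```python
import csv
import io


def header_for(csv_text, delimiter):
    """Parse the first line of a file with the delimiter chosen for the source."""
    return next(csv.reader(io.StringIO(csv_text), delimiter=delimiter), [])


def uses_expected_format(csv_text, delimiter, expected_header):
    """True if the header splits into exactly the expected column names."""
    return header_for(csv_text, delimiter) == list(expected_header)


# Hypothetical header of a source that was created with a semicolon delimiter.
expected = ["user_id", "country", "signup_date"]
good = "user_id;country;signup_date\nu1;ESP;2024-01-31\n"
bad = "user_id,country,signup_date\nu1,ESP,2024-01-31\n"

print(uses_expected_format(good, ";", expected))
print(uses_expected_format(bad, ";", expected))
```

A file written with a different delimiter collapses into a single header column, so the comparison fails and the file can be corrected before it reaches the source.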

Casing of Column Names

When uploading multiple files under a source, ensure that the same attribute always has the same name, with the same casing. This ensures proper mapping at the collection level. Otherwise, the attribute is treated as a new column.
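
The following sketch compares the header of a new file with the header of an earlier upload and reports names that differ only by casing. The column names are made up for the example.

```python
def casing_mismatches(reference_header, new_header):
    """Return (reference_name, new_name) pairs that differ only by casing."""
    reference_by_lower = {name.lower(): name for name in reference_header}
    mismatches = []
    for name in new_header:
        reference_name = reference_by_lower.get(name.lower())
        # Same name when lowercased, but not an exact match: a casing drift.
        if reference_name is not None and reference_name != name:
            mismatches.append((reference_name, name))
    return mismatches


first_upload = ["user_id", "Country"]
next_upload = ["User_ID", "Country", "city"]
print(casing_mismatches(first_upload, next_upload))
```

Genuinely new columns, such as city here, are not reported; only names that would otherwise be treated as a new column because of casing are.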

Configure the Correct Path

If the file upload is set up through a method other than the Collect UI, ensure that files are uploaded to the exact path specified for the source. Otherwise, processing of the data can be affected, and subsequently reporting and segment creation.

Country Data in the File

Records must be tied to a country in order to create data collections. Always include the country as a field in your files. Otherwise, the country has to be hardcoded later. We recommend sending the country information as ISO 3166-1 alpha-3 (ISO 3) country codes.
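
A lightweight check on the country column might look like the sketch below. It only verifies the shape of an alpha-3 code (three uppercase letters) and does not confirm that the code is actually assigned to a country. Treating lowercase values as invalid is an assumption made for this example.

```python
import re

# Shape of an ISO 3166-1 alpha-3 code: exactly three uppercase ASCII letters.
ALPHA3_SHAPE = re.compile(r"[A-Z]{3}")


def invalid_country_values(values):
    """Return the values that do not look like an alpha-3 country code."""
    return [value for value in values if not ALPHA3_SHAPE.fullmatch(value.strip())]


print(invalid_country_values(["ESP", "usa", "", "DE", "DEU"]))
```

Empty values are reported as well, which helps to find records that would otherwise need a hardcoded country later.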

Number of Fields

While there’s no restriction on the schema, we recommend that you check the Zeotap catalogue and start with only the fields that are relevant to the source. You can create new sources for newer data points that you want to capture. If new fields are sent for an existing source, ensure that you map the new fields so that their data starts being ingested.

Validation to be performed on Files, Catalogue and Mapping

The following validations must be performed on files, the catalogue and the mapping.

On File

Ask the customer to share a sample data file with the actual data. Validate the sample file to ensure the following points:

Encoding type is one of the supported types
File format matches the selected source option
Delimiter matches the selected option
No fields are repeated
All columns have a header (after ingestion, check whether any header column appears as _CX in the Preview section)
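
Some of these file checks can be automated before the sample is uploaded. The sketch below is a rough illustration: it assumes UTF-8 is among the supported encodings and a comma delimiter, both of which should be replaced with the options selected for the source.

```python
import csv
import io
from collections import Counter


def check_file_bytes(raw_bytes, delimiter=",", encoding="utf-8"):
    """Return a list of problems found in the raw content of a sample file."""
    try:
        text = raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        return [f"file is not valid {encoding}"]

    problems = []
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    if len(header) < 2:
        problems.append("header did not split; check the delimiter")
    # Positions (1-based) of columns whose header cell is blank.
    blank = [i for i, name in enumerate(header, start=1) if not name.strip()]
    if blank:
        problems.append(f"columns without a header: {blank}")
    # Field names that occur more than once in the header.
    counts = Counter(name.strip() for name in header if name.strip())
    repeated = sorted(name for name, n in counts.items() if n > 1)
    if repeated:
        problems.append(f"repeated fields: {repeated}")
    return problems


sample = b"user_id,country,,country\nu1,ESP,x,ESP\n"
for problem in check_file_bytes(sample):
    print(problem)
```

An empty result means that none of the automated checks failed; the sample should still be reviewed in the Preview section after ingestion.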

On Catalogue

After a source is created, ask the customer to push a sample data file with the actual data. The values passed for each field in the sample file must be in line with the catalogue definition. Validate the sample data against your catalogue to ensure the following points:

Data type
Attribute type
Date field’s timestamp format is one of the acceptable formats
Country enricher is ISO 2, ISO 3 or hardcoded
Field name (including casing) is consistent across the Source catalogues wherever the field is repeated

On Mapping

Before saving the mapping, ensure the following points:

The fields are mapped to the correct Zeotap field.
Enrichers are applied wherever required and as per the incoming value.
The timestamp format is selected from the list of supported formats. Once a format is selected, the system continues to expect the same format throughout the source’s lifetime; otherwise, ingestion may fail.
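
Because the timestamp format is fixed once it is selected, it can help to check a sample of values against that format before each upload. The sketch below uses Python's strptime directives; the format string is a hypothetical stand-in for whichever supported format was chosen in the mapping.

```python
from datetime import datetime


def values_not_matching_format(values, timestamp_format):
    """Return the values that cannot be parsed with the given format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, timestamp_format)
        except ValueError:
            # Raised for a wrong layout as well as for impossible dates.
            bad.append(value)
    return bad


# Hypothetical format; use the one selected when the mapping was saved.
SELECTED_FORMAT = "%Y-%m-%d"
print(values_not_matching_format(["2024-01-31", "31/01/2024", "2024-02-30"], SELECTED_FORMAT))
```

Values in a different layout and impossible dates are both reported, so they can be corrected before they cause an ingestion failure.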
