Overview

Databricks is a cloud-based data platform, built around Apache Spark, that unifies data engineering, data science, and analytics in a single collaborative workspace. It enables organizations to process large-scale data, build machine learning models, and perform advanced analytics efficiently. With this integration, you can push audiences created in Zeotap CDP to your Databricks instance.

Supported Identifiers

You can send any identifier or attribute of your choice from Zeotap CDP to Databricks using this integration.
Note: Sending event data is currently not supported.

Available Actions and Supported Features

The following table lists the available action types for the integration and the supported features for each action type:
Action Name | ID EXTENSION | DELETE | DELTA UPLOAD
Send attributes and identifiers to Databricks | - | - | -

Prerequisites

Before you create a Databricks Destination in Zeotap CDP, ensure you have the following details ready.
1
Databricks Host: The host can be found in the URL of your Databricks account.
2
Databricks Access Token: Access tokens can be managed under Settings → User settings → Developer → Generate new token.
3
Cluster ID: The cluster ID can be found under Compute → cluster name → automatically added tags.
4
Catalog: Catalog name within Unity Catalog or Databricks Metastore (e.g., zeotap_new).
5
Schema: Schema name where the table exists or will be created (e.g., db_promotion).
6
Table: Target table name for loading the data (e.g., activation).
7
GCP Bucket: The name of the GCS bucket where the exported data will be delivered (e.g., gcs-ireland-all-eu-qa-backend-export-zeotap-com). This is the client’s bucket where the data is placed before being picked up by Databricks Autoloader into Delta tables.
8
GCP Project ID: Project ID owning the Client’s GCS bucket (e.g., zeotap-staging-datalake).
9
Service Account:
  • Client Service Account JSON: This is the Client Service Account provided by the client, which Zeotap uses to upload transformed audience (segment) data into the Google Cloud Storage (GCS) bucket. The uploaded data is then picked up automatically by Databricks Autoloader for further processing and loading into Databricks tables.
  • Zeotap Service Account: This is the Zeotap Service Account that is used to access the Google Cloud Storage account. Ensure that you whitelist the Zeotap Service Account to successfully push audiences (segments) from Zeotap CDP to Google Cloud Storage.
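Before creating the destination, you may want to sanity-check the prerequisite values locally. The sketch below is illustrative only: the example values, the `validate_config` helper, and the assumed service-account key fields are not part of the product; substitute your own workspace details.

```python
import json
from urllib.parse import urlparse

# Example values only (not real credentials) -- replace with your own.
config = {
    "host": "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com",
    "cluster_id": "0123-456789-abcde123",
    "catalog": "zeotap_new",
    "schema": "db_promotion",
    "table": "activation",
    "gcs_bucket": "gcs-ireland-all-eu-qa-backend-export-zeotap-com",
    "gcp_project_id": "zeotap-staging-datalake",
}

def validate_config(cfg):
    """Return a list of problems found in the prerequisite values."""
    problems = []
    parsed = urlparse(cfg.get("host", ""))
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("host must be an https workspace URL")
    for key in ("cluster_id", "catalog", "schema", "table",
                "gcs_bucket", "gcp_project_id"):
        if not cfg.get(key):
            problems.append(key + " is required")
    return problems

# Typical fields in a GCP service-account key file (assumed shape;
# check against the key file Google Cloud actually issued).
REQUIRED_SA_KEYS = {"type", "project_id", "private_key",
                    "client_email", "token_uri"}

def missing_service_account_keys(raw_json):
    """Return required key names missing from an uploaded key file."""
    return sorted(REQUIRED_SA_KEYS - json.loads(raw_json).keys())

print(validate_config(config))  # [] when every field is present
```

Running checks like these before filling in the destination form can catch a missing field or a malformed host URL early, instead of at activation time.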

Create a Destination for Databricks

Perform the following steps to create a Destination for Databricks:
1
Log into the Zeotap CDP App and go to the DESTINATIONS application.
2
Click + Create Destination.
3
Under the All Destinations section, search for Databricks.
4
Click Databricks. A screen appears displaying details about the destination on the left. On the right-hand side of the screen, you will find the list of fields required to establish the integration. Provide the required details as described in the following steps:
a. Enter a name for the Destination.
b. Enter the Destination Instance Name.
c. In the Databricks Host field, enter your Databricks workspace URL.
d. In the Databricks Access Token field, provide the Databricks personal access token for API authentication.
e. In the Cluster ID field, enter the Databricks cluster ID where the job should run.
f. In the Catalog field, enter the Unity Catalog name used for table registration in Databricks.
g. In the Schema field, enter the schema (or database) name in the Databricks metastore.
h. In the Table field, enter the table name where the data should be written in Databricks.
i. In the Upload Type dropdown, choose the Upload Type to define the kind of connection or location to which you want to push your data.
  • Currently, only the GCP upload type is supported.
j. In the Bucket field, enter the name of the Google Cloud Storage bucket where input files are stored.
k. In the Project Id field, enter the Google Cloud project ID associated with your GCS bucket.
l. Under Account, choose either Zeotap Service Account or Client Service Account Json from the drop-down menu, based on the type of authentication you need.
  • If you choose Zeotap Service Account as the Account, ensure that you whitelist the service account provided by Zeotap CDP to push audiences (segments) from Zeotap CDP to Google Cloud Storage. This service account information auto-populates under Service Account to be Whitelisted.
  • If you choose Client Service Account Json as the Account, upload the JSON file containing the required authentication information using the + Select File option so that Zeotap CDP can push the audiences to Google Cloud Storage.
5
In the new screen that appears, choose the appropriate action and mapping as explained below. Under Choose your Action, select Send JSON to GCS as the action for activating your audience (segment) in Audiences.
a. You can send any number of identifiers and attributes to your Databricks instance using this action.
6
Click Create Destination. The created Destination is listed in the Audiences application, where it can be linked to an audience.
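Once the destination is linked to an audience, the exported data lands in the configured GCS bucket before Autoloader ingests it into the Delta table. The sketch below shows what a newline-delimited JSON export might look like; the field names are purely illustrative, as the real fields depend on the identifiers and attributes you map in the destination.

```python
import json

# Illustrative records only: the actual field names depend on the
# identifiers and attributes mapped in the Zeotap destination.
records = [
    {"user_id": "u-001", "email_sha256": "<hash>", "audience": "activation"},
    {"user_id": "u-002", "email_sha256": "<hash>", "audience": "activation"},
]

# Newline-delimited JSON (one object per line) is a common file layout
# for JSON ingestion with Databricks Autoloader.
ndjson = "\n".join(json.dumps(r) for r in records)
print(ndjson)
```

Each line parses as an independent JSON object, which lets downstream ingestion read the file incrementally without loading it whole.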
Last modified on February 26, 2026