If you’re a Salesforce Admin or Developer and have never heard of Data Pipelines, don’t worry; you might be surprised by how much you already know about it. Data Pipelines may seem like Salesforce’s newest feature, but it’s actually already six years old. This seemingly new addition is part of Tableau CRM, but it can now be used separately as a no-code Extract, Transform, and Load (ETL) tool inside Salesforce.
Data Pipelines is the data preparation engine inside Tableau CRM. It can ingest data from Salesforce as well as external sources, perform complex transformations on that data, and then output the results to Salesforce records, external systems, or Tableau Online and Tableau CRM datasets for analysis and visualization. The best part is it can do all of this without writing any code, which makes it another one of the awesome declarative options in a Salesforce Admin’s toolkit.
This opens the door to many potential use cases, like integrating with external systems (or with multiple Salesforce instances), cleansing and standardizing data, and running complex calculations across Records and Objects. It even includes some machine learning-powered functionality, like sentiment detection and clustering, that was previously only available to developers who could write code against those APIs.
Two main components of Data Pipelines
Connections allow you to connect to your Salesforce data, as well as over 50 external sources, such as AWS (S3, Redshift, etc.), Azure, Snowflake, Marketo, NetSuite, and more. Connecting to these sources allows the data to be brought into Data Pipelines so it can be used in a Recipe.
A Recipe is the data preparation editor that can bring in data from any connected source and apply transformations to it. It can then output the results back to a Salesforce Record, an Amazon S3 bucket, a Snowflake database, an Azure Data Lake, or even a CSV file. It can also be used to register a Tableau CRM dataset that can power advanced dashboard apps or generate Einstein Discovery predictions (like the propensity for a customer to buy a product or service, the likelihood for a service case to escalate, or the probability that a customer will cancel their service).
The best part of using Connections and Recipes is that they allow Salesforce Admins to build complex data transformations without writing any code, and to move data around without having to buy extra ETL applications or wait for IT teams to set up custom integrations.
How do you use Data Pipelines?
To frame Data Pipelines in the context of real-world applications, here are some common use cases we’re implementing for our customers:
- Simple Extract, Transform, Load (ETL)
As previously mentioned, Data Pipelines is a great alternative to implementing an off-platform ETL application (i.e. one that runs outside of Salesforce). With more than 50 supported external sources for ingesting data (and growing), and the ability to output to AWS, Azure, and Snowflake, an integration built and managed with Data Pipelines reduces the complexity and cost of basic ETL jobs.
- Weighted scoring models
Forget about trying to build an aggregate score using a formula field or Apex; Data Pipelines eliminates the limitations of formula fields, and you don’t need to know how to write Apex code. A weighted scoring model lets you assign points based on any attributes of a record (e.g. account type, age, status), add point values generated from child objects (e.g. lifetime opportunity value, open case counts, most recent survey results), and produce a value that represents a customer’s “Health Score” based on how your organization defines a healthy customer. Take weighted scoring to the next level by including additional data points from your Enterprise Resource Planning (ERP) and marketing automation platforms (or even your proprietary systems) using the supported external connectors.
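To make the arithmetic concrete, here is a minimal sketch of a weighted scoring calculation in Python. The inputs, weights, and normalization caps are all hypothetical; in a Recipe you would express the same logic declaratively rather than in code.

```python
# Hypothetical weighted "Health Score": each component is normalized
# to a 0-100 scale, then combined using weights that reflect how the
# organization defines a healthy customer.
def health_score(lifetime_opp_value, open_cases, survey_score,
                 max_opp_value=1_000_000, max_cases=20):
    # Normalize each input to a 0-100 range
    opp_component = min(lifetime_opp_value / max_opp_value, 1.0) * 100
    case_component = (1 - min(open_cases / max_cases, 1.0)) * 100  # fewer open cases = healthier
    survey_component = survey_score  # assumed to already be 0-100

    # Weights sum to 1.0; tune them to your organization's definition of "healthy"
    weights = {"opp": 0.5, "cases": 0.2, "survey": 0.3}
    return round(weights["opp"] * opp_component
                 + weights["cases"] * case_component
                 + weights["survey"] * survey_component, 1)

print(health_score(250_000, 2, 80))  # → 54.5
```

The same pattern scales to any number of components: normalize each signal, weight it, and sum.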
- Roll-up fields
In the past, we needed third-party AppExchange apps or custom Apex code to generate roll-up summary fields for child records without a Master-Detail relationship. Calculating those same roll-ups in a Recipe is a much easier option. Be aware that a Recipe runs on a schedule, so you won’t want to use this option for real-time scenarios, but it’s still a great fit when hourly or daily latency is acceptable.
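To illustrate the computation a roll-up performs, here is a small sketch in Python with pandas; the object and field names are hypothetical, and a Recipe's aggregation step produces the same one-row-per-parent shape.

```python
import pandas as pd

# Hypothetical child records: Opportunities related to Accounts
# via a lookup (not Master-Detail, so no native roll-up field)
opps = pd.DataFrame({
    "AccountId": ["001A", "001A", "001B"],
    "Amount": [10_000, 5_000, 7_500],
})

# Roll the child Amounts up to one summary row per parent Account
rollup = (opps.groupby("AccountId", as_index=False)
              .agg(Total_Opp_Amount=("Amount", "sum"),
                   Opp_Count=("AccountId", "size")))
print(rollup)
```

Each row of `rollup` is what would be written back to a custom summary field on the parent record when the scheduled Recipe runs.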
- Data blending
Imagine being able to see your customers’ most recent order or invoice information from the ERP without all the extra overhead of setting up a middleware integration between those systems; that’s where Data Pipelines comes in. Results can be written to new Records in Salesforce, or, if you only need summary-level information, they can be aggregated and written to custom fields so you don’t consume valuable Object storage.
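As a sketch of the blending logic, assuming hypothetical Account and ERP invoice extracts, the summary-level output could be computed like this in pandas:

```python
import pandas as pd

# Hypothetical extracts: Salesforce Accounts and ERP invoices
accounts = pd.DataFrame({"AccountId": ["001A", "001B"],
                         "Name": ["Acme", "Globex"]})
invoices = pd.DataFrame({"AccountId": ["001A", "001A", "001B"],
                         "InvoiceDate": ["2023-01-05", "2023-03-10", "2023-02-20"],
                         "Amount": [1200, 800, 450]})

# Aggregate to summary level first, so the blended result can be
# written to custom fields instead of consuming Object storage
summary = (invoices.groupby("AccountId", as_index=False)
                   .agg(Last_Invoice_Date=("InvoiceDate", "max"),
                        Total_Invoiced=("Amount", "sum")))

# Blend the ERP summary onto the Salesforce Accounts
blended = accounts.merge(summary, on="AccountId", how="left")
print(blended)
```

The left join keeps every Account even if it has no invoices yet, which mirrors how you would want the Salesforce side to behave.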
- Data visualization
Since Data Pipelines is already built into Tableau CRM, the on-platform data visualization and predictive analytics solution, using it to prepare data for dashboards and predictions is a natural fit. With the recent addition of a Tableau Online output connector, it’s now also the best way to get your Salesforce data into Tableau Online, in the form of .hyper files. And pulling data from external sources and registering it as a Tableau CRM dataset, as an alternative to writing the data to an Object with an integration, is a way to “virtualize” the data in Salesforce without consuming (and paying for) additional storage.
- Data cleansing
The Transformation node in Recipes includes powerful functions that can keep Salesforce data clean and values standardized. My favorite of these transformations is “Predict Missing Values”, where blank fields are filled in based on values in other strongly correlated columns. Other functions allow bucketing of values, standardizing formats, and clustering based on common characteristics.
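The actual Predict Missing Values transformation uses machine learning under the hood; as a rough illustration of the idea only, this Python sketch fills blanks using the most common value observed in a strongly correlated column (the column names and values are hypothetical):

```python
from collections import Counter, defaultdict

# Hypothetical rows where Country strongly correlates with Region,
# so blanks in Region can be inferred from Country
rows = [
    {"Country": "DE", "Region": "EMEA"},
    {"Country": "DE", "Region": "EMEA"},
    {"Country": "JP", "Region": "APAC"},
    {"Country": "DE", "Region": None},   # missing value to fill
    {"Country": "JP", "Region": None},
]

# Learn the most common Region per Country from rows that have one
by_country = defaultdict(Counter)
for r in rows:
    if r["Region"] is not None:
        by_country[r["Country"]][r["Region"]] += 1

# Fill each blank with the predicted value for its Country
for r in rows:
    if r["Region"] is None:
        r["Region"] = by_country[r["Country"]].most_common(1)[0][0]

print([r["Region"] for r in rows])  # → ['EMEA', 'EMEA', 'APAC', 'EMEA', 'APAC']
```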
Additionally, a column “Profile” feature is built into Recipes, so you can view which field values occur most frequently and view completeness (valid vs. missing values). This gives you a better understanding of the composition of the data while you’re working within the Recipe.
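The kind of summary a column Profile surfaces, value frequencies plus completeness, can be sketched in plain Python (the sample values are made up):

```python
from collections import Counter

# Hypothetical column values pulled into a Recipe (None = missing)
industry = ["Tech", "Tech", "Finance", None, "Tech", None, "Retail"]

# Frequency of each value, most common first
freq = Counter(v for v in industry if v is not None)
print(freq.most_common())

# Completeness: valid vs. missing values
valid = sum(v is not None for v in industry)
completeness = valid / len(industry)
print(f"{completeness:.0%} complete")  # 5 of 7 → 71% complete
```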
- Einstein predictions
Einstein predictions are no longer constrained to Salesforce data: you can now run external data through a predictive model built with Einstein Discovery inside a Recipe, then output the prediction results to any of the supported destinations using an Output Connector.
Data Pipelines will allow better decision making
Data Pipelines is a game-changer for all of us who work with data on the Salesforce platform. In the past, the use cases covered in this post would have required extra development overhead and a lot of complex code. Today, with Data Pipelines, you can point and click your way through building external integrations, managing and cleansing your data, and ultimately making better decisions using all of the data you’re collecting.