


How education incorporates data warehouses and data lakes into their Salesforce strategy

July 23, 2021

Incorporating data warehouses and data lakes

A common question from customers with a university-wide Salesforce presence is how to incorporate a data warehouse into their strategy. Many institutions striving to achieve a 360-degree view of the constituent envision Salesforce as a “single source of truth.” However, once they dive into strategic planning, some institutions encounter limitations when attempting to use Salesforce as a repository for all the information they hope to capture long term (think everything from year-over-year program insights all the way down to records of student interactions with their advisor or a donor’s direct mail history). Colleges and universities hope to leverage larger data sets to drive engagement and decision making, but that volume of data can tax the CRM over time, leading to performance issues. That is where a data warehouse or data lake comes in.

In a slight departure from past “single source of truth” messaging, Salesforce has pivoted to support the needs of enterprise customers through its new partnership with Amazon Web Services (AWS) for the nonprofit and education sectors. This takes the form of a tighter integration with AWS via declarative configuration within the Salesforce platform, making AWS easier for customers to use. Additionally, Salesforce.org (SFDO) makes data warehouse and data lake technologies more accessible by offering heavily discounted rates to both nonprofit and education customers.

Some higher education institutions may elect to leverage solutions other than AWS, but regardless of approach, customers need to know where, when, and how data warehouses and data lakes support their enterprise CRM strategy. Below, we shed light on issues in education that can be addressed using a data warehouse strategy and walk through considerations for defining your institution’s path forward.

Getting started with data warehouses and data lakes

Before we can start designing solutions around the technologies available from AWS and how they complement the data already stored in Salesforce, we need to define data warehouses and data lakes. Both are used to store large amounts of data, but there are several differences to consider before determining which is best for a specific use case.

Data warehouse vs. data lake

  • What is a data warehouse? A purpose-built database with a defined structure that powers business intelligence and analytics activities. Generally, a business or staff member is the intended user, and the data warehouse structure ensures that reports and KPIs are generated consistently without users defining the schema or structure themselves.
  • What is a data lake? A repository that stores raw data; it may contain structured, semi-structured, and unstructured data in many different formats. Data scientists and analysts are typically the primary users of data lakes, and they determine how to use the data and define its structure independently for each use case. Data lakes are typically built as data repositories and don't always need predefined KPIs or reporting use cases, but that doesn't mean they can be ungoverned; without careful planning and governance, a data lake can quickly deteriorate into a data swamp.
  • Example: Practically speaking, a data warehouse would power a Tableau app that Recruiting and Admissions staff use to monitor prospective students through the intake process. A data lake might contain an archive of all historical engagement data that an analyst would use to plan and model future campaigns.
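To make the structural difference concrete, the Python sketch below contrasts the two approaches without using any AWS service. The warehouse side validates records against a fixed structure before storing them (often called schema-on-write); the lake side keeps raw JSON lines untouched and lets each analysis impose its own structure when reading (schema-on-read). The table shape and field names (`contact_id`, `program`, `term`, `channel`, `minutes`) are hypothetical.

```python
import json
from dataclasses import dataclass


# Hypothetical warehouse table: the structure is fixed before any data lands (schema-on-write).
@dataclass(frozen=True)
class EnrollmentRow:
    contact_id: str
    program: str
    term: str


def load_into_warehouse(raw: dict) -> EnrollmentRow:
    """Validate and shape a record before storing it; reject anything that doesn't fit."""
    missing = {"contact_id", "program", "term"} - set(raw)
    if missing:
        raise ValueError(f"rejected record, missing fields: {sorted(missing)}")
    return EnrollmentRow(str(raw["contact_id"]), str(raw["program"]), str(raw["term"]))


# Hypothetical lake: raw JSON lines of mixed shapes are stored as-is (schema-on-read).
lake = [
    '{"contact_id": "003A", "program": "Nursing", "term": "2021FA"}',
    '{"contact_id": "003B", "channel": "email", "opened": true}',
    '{"contact_id": "003A", "channel": "lms", "minutes": 42}',
]


def read_engagement_minutes(lines: list[str]) -> dict[str, int]:
    """One analyst's view of the lake: total LMS minutes per contact, ignoring other shapes."""
    totals: dict[str, int] = {}
    for line in lines:
        record = json.loads(line)
        if record.get("channel") == "lms":
            cid = record["contact_id"]
            totals[cid] = totals.get(cid, 0) + record["minutes"]
    return totals
```

Calling `load_into_warehouse(json.loads(lake[0]))` returns a typed row, while the second lake record is rejected by the warehouse loader because it lacks `program` and `term`. The lake keeps that record anyway, and `read_engagement_minutes(lake)` simply ignores shapes it doesn't need, returning `{"003A": 42}`.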

With these definitions in mind, we walk through three typical higher education use cases and provide our recommendations on how to use big data in education.

Use Case 1: 360-degree view of a constituent

To consider the data necessary to support a 360-degree view of a constituent in education, let's break down a common student lifecycle. The image below illustrates how many potential activities occur between prospect and alumnus status.

[Figure: Student lifecycle. Source: Traction on Demand, Education Practice]

In the Salesforce Education Data Architecture, we leverage a common Contact record and Administrative Account, then add related Affiliation, Relationship, Opportunity, Case, Notes, Files, and Program Enrollment records at each step of the student lifecycle. Tack on Attributes, Campaigns and marketing automation information, and all of a sudden we have hundreds of records per student, per year. Multiply that by decades of enrollment, add on other constituent information (such as non-alumni donors and corporate relations), and the university starts to experience cost and performance issues (like account data skew and ownership skew).

Our recommendation

Keep the critical data in Salesforce; this includes the Contact record, plus any recent Opportunities, Cases, and other data points that users still interact with on a daily basis. Older records that may still need to be referenced, but are no longer actively used or modified, can be moved to an archive on AWS. Don’t worry: the data is still accessible without forcing users to leave Salesforce. Our clients commonly use two patterns to show “external” data to Salesforce users, ultimately giving them a 360-degree view of the Contact’s lifetime history with the organization:

  • Salesforce Connect (External Objects) - Using a connector to the AWS data warehouse, the legacy data can be displayed in a Related List on the record page. For the end-user, this creates a seamless experience where the transaction-level information is displayed like normal related records, but doesn’t take any storage space in Salesforce.
  • Tableau CRM - By integrating the data from AWS (via the Tableau CRM S3 or Redshift Connectors), all archive data can be replicated to a Tableau CRM dataset, which can then be displayed on the record page (via an embedded dashboard component). Again, no Salesforce storage is consumed.

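Deciding what stays in Salesforce and what moves to the archive is ultimately a business rule. As a minimal sketch, assuming a hypothetical rule that closed records untouched for two years are archive candidates while open records always stay in the CRM, the split could be expressed as follows. It does not call Salesforce or AWS APIs, and the `Record` fields are illustrative rather than actual Salesforce field names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    object_type: str      # e.g. "Opportunity" or "Case" (illustrative only)
    is_closed: bool
    last_modified: date


# Hypothetical retention window: closed records untouched for ~two years get archived.
RETENTION = timedelta(days=730)


def split_for_archive(records: list[Record], today: date) -> tuple[list[Record], list[Record]]:
    """Return (keep_in_crm, move_to_archive) according to the retention rule above."""
    keep: list[Record] = []
    archive: list[Record] = []
    for r in records:
        stale = today - r.last_modified > RETENTION
        # Open records stay in the CRM regardless of age because users still work them.
        (archive if r.is_closed and stale else keep).append(r)
    return keep, archive
```

With this rule, an Opportunity closed in 2018 would be archived, while an open Case from the same year stays in Salesforce. The archived set is what would then be loaded into the AWS store and surfaced back through Salesforce Connect or Tableau CRM.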
Use Case 2: Valuable historical insights at the program-level

With revenue margins slimmer than ever and the mandate to support new modalities like blended learning, academic programs are tasked with planning future program size, courses, resources, facility utilization, student retention, and more well in advance, all with incredible accuracy. As a result, we turn to past performance to identify trends and analyze historical data to develop reliable projections. When looking year-over-year and even term-over-term, slicing data countless ways, staff want as much information at their fingertips as possible. But is there value in housing so much historical data in your CRM simply to support periodic snapshot reports?

Our recommendation

This is where a data lake becomes helpful. Data lakes can power analysis of the vast, complex, and highly granular data generated through student interactions with the LMS and other platforms. Blended with CRM-sourced insights, these granular data sources can reveal trends in student engagement, retention, interests, and achievement that are key to an academic program's success but may not be feasible to uncover without the support of a data lake. For instance, a program that has just launched new online courses will want to understand how students interact with those resources and how those students take advantage of other on-campus resources like tutoring and advising. Blending CRM and LMS data, made possible through well-structured and well-governed data lakes, enables the complex analytics that can drive improvements in academic programs.
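As an illustration of what blending means in practice, the sketch below joins hypothetical CRM advising-appointment records with hypothetical LMS activity events on a shared student ID, then flags students in a new online course who show little LMS activity and have not visited an advisor. The record shapes, the course code, and the 30-minute threshold are all assumptions for the example; in a real data lake this join would typically run in a query engine over far larger data.

```python
# Hypothetical extracts keyed by a shared student ID.
crm_advising = [
    {"student_id": "S1", "appointment": "2021-03-02"},
    {"student_id": "S1", "appointment": "2021-04-11"},
    {"student_id": "S3", "appointment": "2021-03-15"},
]
lms_events = [
    {"student_id": "S1", "course": "NURS101-ONLINE", "minutes": 120},
    {"student_id": "S2", "course": "NURS101-ONLINE", "minutes": 15},
    {"student_id": "S3", "course": "NURS101-ONLINE", "minutes": 90},
]


def blend(crm_rows: list[dict], lms_rows: list[dict]) -> list[dict]:
    """Full outer join of per-student advising visit counts and total LMS minutes."""
    advising: dict[str, int] = {}
    for row in crm_rows:
        advising[row["student_id"]] = advising.get(row["student_id"], 0) + 1
    minutes: dict[str, int] = {}
    for row in lms_rows:
        minutes[row["student_id"]] = minutes.get(row["student_id"], 0) + row["minutes"]
    # Include every student that appears in either source.
    return [
        {"student_id": s, "advising_visits": advising.get(s, 0), "lms_minutes": minutes.get(s, 0)}
        for s in sorted(set(advising) | set(minutes))
    ]


def needs_outreach(blended: list[dict], min_minutes: int = 30) -> list[str]:
    """Students with low LMS activity who have not seen an advisor (hypothetical rule)."""
    return [
        r["student_id"]
        for r in blended
        if r["lms_minutes"] < min_minutes and r["advising_visits"] == 0
    ]
```

Neither source alone identifies S2: the LMS shows low activity but not whether the student sought help, and the CRM shows no advising visits but not how engaged the student is online.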

Storing massive volumes of data from different sources is far less expensive than it once was. By using Amazon S3 as a data lake, all of your CRM, LMS, SIS, website analytics, and email engagement data (plus any other data sources) can be stored and used for advanced analytics. If Amazon S3 is just the storage, how do you actually use all of this information? That’s where the Tableau connector for Amazon Athena comes in. Athena runs SQL queries directly against data stored in S3, so by connecting Tableau to the data lake through Athena, analysts can more easily slice and dice that information without provisioning ETL tools or writing custom API code.
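Tableau's connector issues the SQL for you, but the same S3 data lake can also be queried from a script. The sketch below uses the Amazon Athena API as exposed by boto3 (`start_query_execution`, `get_query_execution`, `get_query_results`): it submits a query, polls until Athena reports a terminal state, and returns the first page of results as rows of strings. The client is passed in rather than created inside the function, which keeps the sketch testable; the database, table, and bucket names in the usage note are hypothetical.

```python
import time


def run_athena_query(athena, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list[list[str]]:
    """Run SQL against an S3 data lake through Amazon Athena and return result rows.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page of results only; for a SELECT, the first row holds the column names.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

For example, with `boto3.client("athena")` as the client, a call such as `run_athena_query(client, "SELECT program, COUNT(*) FROM lms_events GROUP BY program", "edu_lake", "s3://example-bucket/athena-results/")` would return a header row followed by one row per program. Athena pages larger result sets with a `NextToken`; fetching the remaining pages is left out of this sketch.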

Use Case 3: Institutional reporting for strategy development

Similar to program-level insights, universities must rely on accurate data to drive short- and long-term strategic planning. A combination of program, enrollment, application, operational, investment, philanthropic, and grant information helps those performing institution-wide analysis inform long-term planning. CRM is a key component in producing, managing, and analyzing this information, but a data warehouse can blend multiple complex data sources in a way that’s vital to institutional strategy development and planning.

Understanding the impact of the institution on students, whether for accreditation, continuous improvement, or strategic planning, requires constructing data sources beyond the CRM. To support this capability, there must be access to large data sets, reconciled and standardized across a variety of sources. Institutions embarking on big data projects aimed at improving student outcomes are harnessing granular data sets that capture interactions with systems and services across the institution. Together with the core of CRM records, these additional strands of data allow institutions to produce vital insights into the complex patterns and interactions that impact student outcomes.
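Reconciling and standardizing sources usually starts with agreeing on shared keys. As a minimal sketch, assume the CRM records terms as "Fall 2021" while the SIS uses codes like "202109", and that student numbers arrive with inconsistent prefixes and padding. The hypothetical helpers below normalize both into a common (student ID, term) key so records from the two systems can be matched; the formats and the nine-digit padding rule are assumptions, not conventions from any particular SIS.

```python
import re

# Hypothetical mapping between season names (CRM) and term-code months (SIS).
SEASON_TO_MONTH = {"spring": "01", "summer": "05", "fall": "09"}
MONTH_TO_SEASON = {month: season for season, month in SEASON_TO_MONTH.items()}


def standard_term(value: str) -> str:
    """Normalize 'Fall 2021' (CRM style) or '202109' (SIS style) to '2021-fall'."""
    value = value.strip()
    m = re.fullmatch(r"(?i)(spring|summer|fall)\s+(\d{4})", value)
    if m:
        return f"{m.group(2)}-{m.group(1).lower()}"
    m = re.fullmatch(r"(\d{4})(01|05|09)", value)
    if m:
        return f"{m.group(1)}-{MONTH_TO_SEASON[m.group(2)]}"
    raise ValueError(f"unrecognized term format: {value!r}")


def standard_student_id(value: str) -> str:
    """Keep only digits and zero-pad to nine characters (hypothetical rule)."""
    digits = re.sub(r"\D", "", value)
    if not digits:
        raise ValueError(f"no digits in student ID: {value!r}")
    return digits.zfill(9)


def reconcile_key(student_id: str, term: str) -> tuple[str, str]:
    """Shared join key for CRM and SIS records."""
    return standard_student_id(student_id), standard_term(term)
```

Once both sources produce the same key, for example `reconcile_key("S-12345", "Fall 2021") == reconcile_key("000012345", "202109")`, their rows can be joined and loaded as one consistent data set.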

Our recommendation

This is where a structured data warehouse is most useful. Using the data already stored in the data lake, a warehouse can be constructed with Amazon Redshift, and access across the organization is provided through interactive Tableau dashboards. The same information can be used to build models with Einstein Discovery that predict the risk of student attrition or the likelihood that alumni will donate, so likely outcomes can be proactively managed by student advisors and fundraisers in Salesforce.

Bringing it all together

We’ve mentioned a lot of different technologies in this post, so this might help illustrate how it all fits together.

[Figure: Salesforce integration with an Amazon data warehouse and an Amazon data lake. Source: Salesforce]

Salesforce is a powerful solution, but it’s only as useful as the information within it, and at Traction on Demand, we often say that data is our bread and butter. To support your institution's data needs, from basic data hygiene to complex integrations, Traction on Demand has dedicated Analytics and Integrations practices that support your implementations. And with Propel, our Managed Integration Solution, organizations that want to focus on their business purpose rather than on managing integration middleware can benefit.

Data is our thing, let’s make it your thing too

We’re excited to announce our official partnership with AWS as a preferred partner supporting your institution's data strategy, including implementing data warehouses and data lakes on AWS, integrating data from Salesforce and other sources, and using analytics to make better decisions based on all of that data.

From basic data hygiene to complex integrations and data warehouse strategies—we have a team that can help

Have questions about the approach that best fits your institution’s needs? Our dedicated teams can help you reach your goals.
