Dev Blog

Building Funnel’s Data Platform for Marketing Analytics

Written by Funnel Dev | Dec 5, 2024 9:30:45 AM

By: Jonas Björk

Funnel’s data platform is designed to make it easy for customers to integrate all of their marketing data and make it business-ready. To achieve this, Funnel pulls data from multiple providers and stores it so it can be efficiently queried and transformed. The storage layer serves data to Funnel’s own applications for reporting, analytics, activation and measurement, as well as to third-party data tools such as BI platforms, spreadsheets, and more. Because marketing data is only part of our customers’ full data estate, we also export the same data they work with in Funnel to their data warehouse, enabling cross-department use cases that go beyond the marketing scope.

Data Integration

Funnel has over 500 connectors covering the providers our customers need, and in addition we develop bespoke connectors for providers that aren’t yet part of the Funnel connector catalog. Most providers, such as Meta Ads or Google Analytics, offer APIs for data export or reporting, but these APIs are rarely designed for cross-provider analysis. Cross-provider analysis is a core use case of Funnel, so our connectors don’t just extract and load data as-is from the provider APIs; they extract and structure data for analytical purposes. For example, we may denormalise datasets and entities to make sure the data is structured and ready to be queried across providers. At the same time, we do store the raw provider data, apart from some basic cleaning (datetime normalization, whitespace trimming, etc.). This lets customers with their own data warehouses export raw provider data from Funnel, as well as transformed data for cross-provider analytics, to suit their needs. Our connectors are written in Python and range widely in complexity, depending on the provider APIs (especially their aggregation-level options) and on how much transformation is needed to structure the data for querying and analysis.
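To make the denormalisation idea concrete, here is a minimal sketch of flattening a nested provider API response into one row per (campaign, ad, date), so rows from different providers can share a common set of columns. The response shape and field names are illustrative assumptions, not Funnel’s actual schema or connector code.

```python
# Hypothetical sketch: denormalising a nested provider report into flat
# rows suited for cross-provider analysis. Field names are illustrative.

def denormalise_report(response: dict) -> list[dict]:
    """Flatten a campaign -> ad -> daily-metrics hierarchy into one row
    per (campaign, ad, date)."""
    rows = []
    for campaign in response.get("campaigns", []):
        for ad in campaign.get("ads", []):
            for day in ad.get("daily_metrics", []):
                rows.append({
                    "campaign_id": campaign["id"],
                    "campaign_name": campaign["name"],
                    "ad_id": ad["id"],
                    "date": day["date"],
                    "impressions": day["impressions"],
                    "spend": day["spend"],
                })
    return rows

response = {
    "campaigns": [{
        "id": "c1", "name": "Spring Sale",
        "ads": [{
            "id": "a1",
            "daily_metrics": [
                {"date": "2024-11-01", "impressions": 1200, "spend": 14.5},
                {"date": "2024-11-02", "impressions": 900, "spend": 11.0},
            ],
        }],
    }]
}

rows = denormalise_report(response)
```

Once every provider’s data is flattened onto shared columns like these, rows can be unioned and compared across providers without per-provider join logic at query time.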

We approach data integration with marketing data and “business ready” as the focus, embracing all the quirks of the marketing domain and of the marketing data providers’ platforms. Our data platform isn’t designed to just schedule a cron job that extracts data as-is from a platform. It is designed to automatically handle, for example, mutable data that we know will be updated multiple times at the provider after being downloaded to Funnel: providers’ reporting metrics can arrive with different delays, and some providers compute them only once per day and mutate historical data in those daily runs. So rather than running downloads on a cron schedule, our scheduler is built to optimize for “data freshness” and provide eventually immutable partitions. Funnel partitions data automatically, eliminating the need for manual partitioning by users. Each partition has its own time-to-live, which depends on the platform and the type of data, and the Funnel platform refreshes the data by downloading a newer version of it from the provider. We average about 7 million download jobs and roughly 20 TB of downloaded data per day.
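The freshness-driven refresh logic can be sketched as follows: each partition carries a TTL, and partitions old enough to have settled at the provider become immutable and are never re-downloaded. The class names, TTLs, and settling windows below are illustrative assumptions, not Funnel’s actual scheduler.

```python
# Hypothetical sketch of freshness-driven refresh with eventually
# immutable partitions. All names and durations are illustrative.

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Partition:
    provider: str
    date: datetime            # the day of data this partition covers
    downloaded_at: datetime   # when we last fetched it
    ttl: timedelta            # provider/data-type specific time-to-live
    settles_after: timedelta  # age after which the provider stops mutating it

    def is_immutable(self, now: datetime) -> bool:
        return now - self.date >= self.settles_after

    def needs_refresh(self, now: datetime) -> bool:
        if self.is_immutable(now):
            return False  # settled data is never re-downloaded
        return now - self.downloaded_at >= self.ttl

def due_for_download(partitions, now):
    return [p for p in partitions if p.needs_refresh(now)]

now = datetime(2024, 12, 5, 9, 0)
partitions = [
    # downloaded an hour ago, TTL 6h: still fresh
    Partition("meta", datetime(2024, 12, 4), now - timedelta(hours=1),
              timedelta(hours=6), timedelta(days=30)),
    # downloaded 12h ago, still mutable at the provider: due for refresh
    Partition("meta", datetime(2024, 12, 3), now - timedelta(hours=12),
              timedelta(hours=6), timedelta(days=30)),
    # over 30 days old: settled, immutable, never refreshed again
    Partition("meta", datetime(2024, 10, 1), now - timedelta(days=40),
              timedelta(hours=6), timedelta(days=30)),
]
due = due_for_download(partitions, now)
```

The key property is that refresh work decays to zero for historical data: once a partition passes its settling window, it is served forever from storage without touching the provider API again.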

Funnel’s storage layer is a distributed, fault-tolerant system backed by Amazon Web Services S3. We previously used a proprietary storage format but have recently migrated to Parquet. Funnel’s download scheduler and connector runtime are a proprietary system designed specifically for marketing data, optimizing for “data freshness.” To ensure a secure platform for our customers, we sandbox the download jobs in a restricted environment that only has access to signed paths in storage. This guarantees complete isolation between tenants and minimizes the risk of a security breach from misconfiguration or bugs in the connectors. Both the download scheduler and the connector runtime are written in Rust to ensure a safe and performant platform.
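A toy model of the isolation idea, in Python for brevity (the real sandbox is in Rust): before a job runs, it is granted a small set of signed storage prefixes, and every access is checked against that grant, so a buggy or misconfigured connector cannot touch another tenant’s data. Class and path names are hypothetical.

```python
# Hypothetical sketch of per-job storage isolation via granted prefixes.
# A toy model of the idea only; names and paths are illustrative.

class SandboxedStorage:
    def __init__(self, granted_prefixes: set[str]):
        self._granted = granted_prefixes

    def _check(self, path: str) -> None:
        if not any(path.startswith(p) for p in self._granted):
            raise PermissionError(f"path outside sandbox: {path}")

    def write(self, path: str, data: bytes) -> None:
        self._check(path)
        # ... in a real system: upload via a pre-signed URL scoped
        # to this prefix, so the check is enforced server-side too ...

storage = SandboxedStorage({"tenant-42/job-7/"})
storage.write("tenant-42/job-7/raw.parquet", b"...")   # inside the grant
try:
    storage.write("tenant-13/other.parquet", b"...")   # another tenant
    blocked = False
except PermissionError:
    blocked = True
```

Pre-signed, prefix-scoped URLs mean the storage service itself rejects out-of-scope access, so isolation does not depend on the job code behaving correctly.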

Data Transformation

One thing that differentiates Funnel from our competitors is that we store the raw data in a structure optimized for analytics, and all actual transformation of the values happens when the data is queried and read. Funnel comes with a set of pre-defined Fields that let the customer transform data across providers into a single dataset that can be used for reporting and analytics, or exported to a data warehouse for easy integration of the marketing data with the rest of the company’s data.
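The read-time transformation model can be sketched like this: a Field is an expression over raw provider columns, evaluated when a query runs rather than precomputed and stored. The raw rows, Field names, and mapping rules below are illustrative assumptions, not Funnel’s actual Field definitions.

```python
# Hypothetical sketch of read-time Field evaluation across providers.
# Raw columns differ per provider; Fields map them onto shared metrics.

RAW_ROWS = [
    {"provider": "meta",   "spend": 120.0, "clicks": 300},
    {"provider": "google", "cost":  80.0,  "clicks": 160},
]

def _cost(r):
    # One provider calls it "spend", another "cost"; the Field unifies them.
    return r.get("spend", r.get("cost", 0.0))

FIELDS = {
    "cost": _cost,
    "cpc":  lambda r: _cost(r) / r["clicks"],
}

def query(rows, field_names):
    # Fields are evaluated here, at read time: nothing is precomputed.
    return [{f: FIELDS[f](r) for f in field_names} for r in rows]

result = query(RAW_ROWS, ["cost", "cpc"])
```

Because Fields are just expressions over raw data, redefining one changes the result of the very next query; there is no stored derived table to invalidate and rebuild.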

Funnel does not need to compute and store Field values up front; the platform is designed so that users can modify their business logic and see the results of those changes without waiting for a batch job to be scheduled and run. Analytics on marketing data is a challenging, dynamic problem. For companies analyzing marketing data at scale across many markets, it is a constantly evolving challenge to define and stick to campaign naming conventions and tracking and conversion targets, and in general to keep a sound data structure. Our architecture and design enable an immediate feedback loop directly in Funnel. The request load on our query engine is a mix of exports and live querying, currently averaging about 1.7 million queries per day with a “spiky” request pattern.

Our no-code UI for defining Fields makes it easy to define how to analyze data across providers, offering flexibility for each customer’s specific conventions. Our proprietary query engine computes Fields at run-time, with computations running on workers in a distributed compute system powered by Amazon Web Services EC2 and Lambda. This part of the Funnel platform has evolved organically over the years, and we are now rewriting parts of it, which includes moving it to Rust. We recently released a rewrite of our file compaction for better control over chunk sizes in memory, and migrated to Apache Arrow as the in-memory columnar format. We’re investing in more transformation capabilities and looking to rewrite parts of the transformation layer in Rust as well.
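The compaction idea behind “better control over chunk sizes” can be illustrated with a simple greedy policy: many small input batches are merged into output chunks whose total size never exceeds a target. The sizes and the greedy policy are illustrative assumptions (the real system works on Apache Arrow batches in Rust).

```python
# Hypothetical sketch of size-bounded compaction: group small batch sizes
# into chunks whose total stays under a target in-memory size.

def compact(batch_sizes: list[int], target: int) -> list[list[int]]:
    """Greedily group batch sizes into chunks whose total stays <= target.
    A single batch larger than target becomes its own chunk."""
    chunks, current, total = [], [], 0
    for size in batch_sizes:
        if current and total + size > target:
            chunks.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        chunks.append(current)
    return chunks

chunks = compact([40, 10, 30, 80, 5, 5], target=64)
```

Bounding chunk size this way keeps per-worker memory predictable, which matters when query workers process many compacted files concurrently.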

Ongoing Initiatives

Our decision to gradually move to Rust, especially for the systems that process customer data, is due to Rust’s increased traction for generic infrastructure: all three major cloud vendors are writing more and more of their infrastructure components in Rust. We see additional traction for Rust in the data infrastructure ecosystem; one example is that Apache DataFusion, a Rust-based query engine, took the lead in querying Parquet files on the ClickBench benchmark as of November 2024. Rather than introducing JVM- or C++-based systems in Funnel, we are doubling down on Rust as the default programming language for infrastructure, and Python for connector libraries. We have started to identify and leverage open source components that fit well within our data platform, and we are looking to do more of that, as well as contribute to these ecosystems going forward.

From our customers’ perspective, our data platform solves problems that would otherwise require a number of tools in a bespoke solution built around a cloud-based data stack: a data integration tool, a data modeling tool, a data warehouse, and a BI/reporting tool. Even if the customer already uses some of these tools within their organization, they aren’t built specifically for marketing use cases. Considerable time and effort is needed to develop the initial solution and to maintain it, since it depends on so many external data providers that are constantly evolving, with updates and breaking changes in their APIs. Funnel’s data platform is thus an end-to-end solution for marketing data.

We keep evolving our data platform to be not just the best solution for the challenges our customers face today, but an outstanding platform in which marketing data remains easy to work with as the industry and the problem space continue to evolve. Many of the things we do overlap with tools in a generic cloud-based data stack, but that’s not what we’re building. We are building the world’s largest, most capable, and most performant data platform for Marketing Intelligence.