By Ola Gustafsson, Data Scientist at Funnel
Transcripts are everywhere, but recorded speech quickly becomes overwhelming once it is transferred to text. As a B2B SaaS company, we commonly have more than fifty fully transcribed calls with one customer organization within a few months. Preparing for the next call or QBR, and keeping track of who said what to whom, is a challenge.
Even if you paste all of that conversation text into your LLM of choice, you will run out of room, consuming your available context window quicker than you think. The product and dev organizations face an even greater challenge: they want to know which feature requests or needs customers have raised across all conversations. Meanwhile, marketing is asking how customers reacted to changes in pricing. The answers exist, but they are buried in transcripts we already record. The real problem is scale.
You could do recursive summarization, but you'll lose resolution with each pass: who said what, and when, gets flattened away.
A solution built on retrieval-augmented generation (RAG) will go a long way, but it eventually uses up your token budget and fills the context window with information you don't care about, muddying the results. RAG is also bad at tracking decisions over time and at aggregating.
Either way, the approach is impractical, bordering on impossible. It isn't auditable or traceable back to the source. The corpus is too big to paste into ChatGPT, and text doesn't aggregate well. Until now.
We have found a way to query the unstructured, qualitative data and get very relevant answers. Questions like these become answerable:
Here’s how it works at Funnel.
What fundamentally changes things is looking at conversation transcripts not as documents or chunks but as "claims." The same holds for any text where people make claims: support threads, email, account notes, survey comments, not only call transcripts. By “claim” we mean a statement or assertion that cannot be broken down into smaller units without losing meaning; claims are the atomic units. We have the LLM extract claims from each chunk of the conversation in parallel, and for each one we capture a set of structured metadata:
We can call this "lossy compression" because it discards most of the conversational noise and keeps the signal. A 30-minute conversation can yield 50–100 independently addressable claims. Most of them are trivial, but that is the point: we can discard those at query time and keep only what is valuable in our context window. This is also what makes text meaningfully aggregatable.
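To make the idea concrete, here is a minimal sketch of the extraction step in Python. The schema fields and the `parse_claims` helper are illustrative assumptions, not Funnel's actual implementation; in practice the JSON would come from an LLM prompted over one transcript chunk.

```python
import json
from dataclasses import dataclass

@dataclass
class Claim:
    """One atomic, independently addressable statement."""
    text: str       # the claim itself, phrased to stand alone
    speaker: str    # who made the claim
    source_id: str  # conversation chunk it came from, for citations
    topic: str      # coarse dimension used for filtering at query time

def parse_claims(llm_json: str, source_id: str) -> list[Claim]:
    """Validate the LLM's JSON output into typed Claim records;
    malformed entries are dropped rather than guessed at."""
    claims = []
    for item in json.loads(llm_json):
        if not all(key in item for key in ("text", "speaker", "topic")):
            continue  # skip entries missing required metadata
        claims.append(Claim(item["text"], item["speaker"], source_id, item["topic"]))
    return claims

# Example: what the LLM might return for one chunk of a call transcript.
raw = ('[{"text": "We need SSO before rollout", "speaker": "customer", "topic": "security"},'
       ' {"text": "The pricing change was confusing", "speaker": "customer", "topic": "pricing"}]')
claims = parse_claims(raw, source_id="call-042#chunk-3")
print(len(claims))  # 2
```

Because each claim carries a `source_id`, every downstream answer can cite its way back to the original conversation.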
We organize this as a chain of events:
Only the most relevant claims fill the context window; the rest are discarded at query time. On top of this query engine sits an AI agent that iteratively selects, filters and slices claims to best answer the question asked.
By doing this, we enable answering questions that were previously out of reach.
These questions have depth (all conversations for one customer) and breadth (one question across all customers), and they require counting, filtering and tracking state transitions, things summarization can't do, all while providing source references back to the original conversation when you want quotes to support the findings.
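As an illustration of the state-transition part, here is a toy sketch over hypothetical claim records; the fields and status labels are invented for the example:

```python
# Hypothetical per-claim records: (customer, date, feature, status).
claim_rows = [
    ("acme", "2024-01-10", "sso", "requested"),
    ("acme", "2024-03-02", "sso", "promised"),
    ("acme", "2024-05-15", "sso", "delivered"),
    ("globex", "2024-02-01", "sso", "requested"),
]

def transitions(rows, customer: str, feature: str) -> list[str]:
    """Order one feature's status mentions by date to expose state changes
    across conversations, something a summary of each call cannot do."""
    history = sorted((date, status) for cust, date, feat, status in rows
                     if cust == customer and feat == feature)
    return [status for _, status in history]

print(transitions(claim_rows, "acme", "sso"))  # ['requested', 'promised', 'delivered']
```

Because claims are dated and typed, "which customers are still waiting on a promised feature" becomes a filter over transitions rather than a re-read of fifty transcripts.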
This allows us to build tools that we put in the hands of an analyst: our AI agent. Each tool has a distinct purpose and capability, and the agent decides which to use in order to reach the objective:
The real unlock is cross-source coordination. The agent bridges qualitative claims and quantitative business data:
Entity-scoped deep dive: "Prepare me for my QBR with customer X." The agent pulls claims, metrics, deal status and usage for one customer in a single pass.
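One way to picture that single pass, with invented tool names standing in for the real ones: each tool is just a function the agent can call, and the agent composes them per question.

```python
from typing import Callable

# Hypothetical tool registry; the agent selects tools by name.
TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as an agent-callable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("claims_for_customer")
def claims_for_customer(customer: str) -> str:
    # Stand-in for querying the claim index scoped to one entity.
    return f"claims where entity == {customer}"

@tool("usage_metrics")
def usage_metrics(customer: str) -> str:
    # Stand-in for querying the quantitative side (usage, deal status).
    return f"usage time series for {customer}"

# "Prepare me for my QBR with customer X": the agent bridges both sources.
plan = ["claims_for_customer", "usage_metrics"]
briefing = [TOOLS[step]("acme") for step in plan]
print(briefing)
```

The point is not the registry mechanics but the shape: qualitative and quantitative lookups share one calling convention, so the agent can interleave them freely.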
From here, analytics take on a different character altogether. The agent can pull up charts while also telling you how and why the numbers are what they are. Some example queries:
Dimensional counting with business context: "Which product area generates the most escalations from EMEA customers above $50K ARR?"
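That query boils down to a filter-then-count across both data sources. A toy sketch with invented customers and fields:

```python
from collections import Counter

# Qualitative side: escalation claims tagged with (product_area, customer).
escalations = [("ingest", "acme"), ("ingest", "acme"), ("auth", "acme"),
               ("ingest", "globex"), ("export", "initech")]
# Quantitative side: CRM attributes per customer, (region, ARR in $).
crm = {"acme": ("EMEA", 80_000), "globex": ("EMEA", 30_000),
       "initech": ("AMER", 120_000)}

def escalations_by_area(min_arr: int, region: str) -> Counter:
    """Count escalation claims per product area, filtered by CRM attributes."""
    return Counter(area for area, cust in escalations
                   if crm[cust][0] == region and crm[cust][1] > min_arr)

print(escalations_by_area(50_000, "EMEA").most_common(1))  # [('ingest', 2)]
```

Counts are counts: the answer is a number you can audit, and each counted claim still points back to its source conversation.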
You will get descriptions and be able to generate hypotheses about what makes things different, as well as why they happen. These hypotheses are how we learn: by observing and making connections. They can inform which hypotheses to test with causal methods like incrementality testing or marketing mix modeling.
There is always a trace back to the source documents. You can go back and verify by reading the quotes. Citations are validated at synthesis time and hallucinated references are caught. The output is inspectable: counts are counts, not vibes.
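Citation validation at synthesis time can be as simple as checking that every reference in the drafted answer resolves to a claim that was actually retrieved; anything unresolvable is flagged as hallucinated. The `[claim:…]` marker format below is an assumption for illustration:

```python
import re

def invalid_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return citation IDs in the answer that don't resolve to a retrieved claim."""
    cited = re.findall(r"\[claim:([\w#-]+)\]", answer)
    return [cid for cid in cited if cid not in retrieved_ids]

retrieved = {"call-042#c3", "call-017#c9"}
answer = "SSO was requested twice [claim:call-042#c3] [claim:call-099#c1]."
print(invalid_citations(answer, retrieved))  # ['call-099#c1']
```

A non-empty result means the synthesis step fabricated a reference, and the answer is rejected or repaired before anyone reads it.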
Lossy compression is genuinely lossy. Nuance, tone and hedging can be flattened; the claim schema is a choice about what to keep and what to discard. Taxonomies that map free-text mentions to canonical dimensions need ongoing curation as vocabulary evolves. Resolution coverage varies by dimension, and some claims stay partially unresolved. And extraction quality depends on the underlying LLM which means it can drift. None of this is free to run; extraction costs LLM calls per chunk and indexing costs embedding calls per claim.
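A back-of-envelope of those run costs; every unit price below is a hypothetical placeholder, not real vendor pricing, and the volumes are illustrative:

```python
# Assumed volumes and unit prices (placeholders, not real pricing).
calls = 1_000                  # transcribed conversations
chunks_per_call = 10           # extraction is one LLM call per chunk
claims_per_call = 75           # midpoint of the 50-100 range above
llm_cost_per_chunk = 0.01      # $ per extraction call (assumed)
embed_cost_per_claim = 0.0001  # $ per claim embedded for the index (assumed)

extraction = calls * chunks_per_call * llm_cost_per_chunk
indexing = calls * claims_per_call * embed_cost_per_claim
print(f"extraction ${extraction:.2f}, indexing ${indexing:.2f}")
```

Whatever the real prices, the shape holds: extraction scales with chunks, indexing with claims, and both are one-time costs per conversation that queries then amortize.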
These are real operational costs. The bet is that structured, queryable, auditable answers across thousands of conversations are worth more than the nuance lost in compression.
The combination of claims (qualitative data) and metrics (quantitative data) with a reasoning layer you can talk to will transform your understanding of the business and what goes on: