By Ola Gustafsson, Data Scientist at Funnel
Transcripts are everywhere, but recorded speech quickly becomes overwhelming once it is transferred to text. As a B2B SaaS company, we commonly have more than fifty fully transcribed calls with one customer organization within a few months. Preparing for the next call or QBR, and keeping track of who said what to whom, is a challenge.
Even if you paste all of that conversation text into your LLM of choice, you will run out of room, consuming your available context window quicker than you think. The product and dev organizations face an even greater challenge: they want to know which feature requests or needs customers have raised across all conversations. Meanwhile, marketing is asking how customers reacted to changes in pricing. The answers exist, but they are buried in transcripts we already record. The real problem is scale.
You could do recursive summarization, but you'll lose resolution with each pass: who said what, and when, gets flattened away.
A solution built on retrieval-augmented generation (RAG) will go a long way, but it eventually uses up your token budget and fills the context window with information you don't care about, muddying the results. RAG is also bad at tracking decisions over time and at aggregating.
Either way, the approach is impractical, bordering on impossible. It isn't auditable or traceable back to the source. The corpus is too big to paste into ChatGPT, and text doesn't aggregate well. Until now.
We have found a way to query the unstructured, qualitative data and get very relevant answers. Questions like these become answerable:
Here’s how it works at Funnel.
What fundamentally changes things is looking at conversation transcripts not as documents or chunks but as "claims." The same holds for any text where people make claims: support threads, email, account notes, survey comments, not only call transcripts. By “claim” we mean a statement or assertion that cannot be broken down into smaller units without losing meaning; claims are the atomic units. We have the LLM extract claims from each chunk of the conversation in parallel, and for each one we capture a set of structured metadata:
We can call this "lossy compression" because it discards most of the conversational noise and keeps the signal. A 30-minute conversation can yield 50–100 independently addressable claims. Most of them are trivial, but that is the point: we can discard those at query time and keep only what is valuable in our context window. This is also what makes text meaningfully aggregatable.
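To make the idea concrete, here is a minimal sketch of the extraction step in Python. The schema fields and the `parse_claims` helper are illustrative assumptions, not Funnel's actual implementation; in practice the JSON would come from an LLM prompted over one transcript chunk.

```python
import json
from dataclasses import dataclass

@dataclass
class Claim:
    """One atomic, independently addressable statement."""
    text: str       # the claim itself, phrased to stand alone
    speaker: str    # who made the claim
    source_id: str  # conversation chunk it came from, for citations
    topic: str      # coarse dimension used for filtering at query time

def parse_claims(llm_json: str, source_id: str) -> list[Claim]:
    """Validate the LLM's JSON output into typed Claim records;
    malformed entries are dropped rather than guessed at."""
    claims = []
    for item in json.loads(llm_json):
        if not all(key in item for key in ("text", "speaker", "topic")):
            continue  # skip entries missing required metadata
        claims.append(Claim(item["text"], item["speaker"], source_id, item["topic"]))
    return claims

# Example: what the LLM might return for one chunk of a call transcript.
raw = ('[{"text": "We need SSO before rollout", "speaker": "customer", "topic": "security"},'
       ' {"text": "The pricing change was confusing", "speaker": "customer", "topic": "pricing"}]')
claims = parse_claims(raw, source_id="call-042#chunk-3")
print(len(claims))  # 2
```

Because each claim carries a `source_id`, every downstream answer can cite its way back to the original conversation.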
We organize this as a chain of events:
Only the most relevant claims fill the context window; the rest are discarded at query time. On top of this query engine sits an AI agent that iteratively selects, filters and slices claims to best answer the question asked.
By doing this, we enable answering questions that were previously out of reach.
These questions have depth (all conversations for one customer) and breadth (one question across all customers), and they require counting, filtering and tracking state transitions, things summarization can't do, all while providing source references back to the original conversation when you want quotes to support the findings.
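As an illustration of the state-transition part, here is a toy sketch over hypothetical claim records; the fields and status labels are invented for the example:

```python
# Hypothetical per-claim records: (customer, date, feature, status).
claim_rows = [
    ("acme", "2024-01-10", "sso", "requested"),
    ("acme", "2024-03-02", "sso", "promised"),
    ("acme", "2024-05-15", "sso", "delivered"),
    ("globex", "2024-02-01", "sso", "requested"),
]

def transitions(rows, customer: str, feature: str) -> list[str]:
    """Order one feature's status mentions by date to expose state changes
    across conversations, something a summary of each call cannot do."""
    history = sorted((date, status) for cust, date, feat, status in rows
                     if cust == customer and feat == feature)
    return [status for _, status in history]

print(transitions(claim_rows, "acme", "sso"))  # ['requested', 'promised', 'delivered']
```

Because claims are dated and typed, "which customers are still waiting on a promised feature" becomes a filter over transitions rather than a re-read of fifty transcripts.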
This allows us to build tools that we put in the hands of an analyst: our AI agent. Each tool has a distinct purpose and capability, and the agent decides which to use in order to reach the objective:
The real unlock is cross-source coordination. The agent bridges qualitative claims and quantitative business data:
Entity-scoped deep dive: "Prepare me for my QBR with customer X." The agent pulls claims, metrics, deal status and usage for one customer in a single pass.
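One way to picture that single pass, with invented tool names standing in for the real ones: each tool is just a function the agent can call, and the agent composes them per question.

```python
from typing import Callable

# Hypothetical tool registry; the agent selects tools by name.
TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as an agent-callable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("claims_for_customer")
def claims_for_customer(customer: str) -> str:
    # Stand-in for querying the claim index scoped to one entity.
    return f"claims where entity == {customer}"

@tool("usage_metrics")
def usage_metrics(customer: str) -> str:
    # Stand-in for querying the quantitative side (usage, deal status).
    return f"usage time series for {customer}"

# "Prepare me for my QBR with customer X": the agent bridges both sources.
plan = ["claims_for_customer", "usage_metrics"]
briefing = [TOOLS[step]("acme") for step in plan]
print(briefing)
```

The point is not the registry mechanics but the shape: qualitative and quantitative lookups share one calling convention, so the agent can interleave them freely.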
From here, analytics take on a different character altogether. The agent can pull up charts while also telling you how and why the numbers are what they are. Some example queries:
Dimensional counting with business context: "Which product area generates the most escalations from EMEA customers above $50K ARR?"
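That query boils down to a filter-then-count across both data sources. A toy sketch with invented customers and fields:

```python
from collections import Counter

# Qualitative side: escalation claims tagged with (product_area, customer).
escalations = [("ingest", "acme"), ("ingest", "acme"), ("auth", "acme"),
               ("ingest", "globex"), ("export", "initech")]
# Quantitative side: CRM attributes per customer, (region, ARR in $).
crm = {"acme": ("EMEA", 80_000), "globex": ("EMEA", 30_000),
       "initech": ("AMER", 120_000)}

def escalations_by_area(min_arr: int, region: str) -> Counter:
    """Count escalation claims per product area, filtered by CRM attributes."""
    return Counter(area for area, cust in escalations
                   if crm[cust][0] == region and crm[cust][1] > min_arr)

print(escalations_by_area(50_000, "EMEA").most_common(1))  # [('ingest', 2)]
```

Counts are counts: the answer is a number you can audit, and each counted claim still points back to its source conversation.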
You will get descriptions and be able to generate hypotheses about what makes things different, as well as why they happen. These hypotheses are how we learn: by observing and making connections. They can inform which hypotheses to test with causal methods like incrementality testing or marketing mix modeling.
There is always a trace back to the source documents. You can go back and verify by reading the quotes. Citations are validated at synthesis time and hallucinated references are caught. The output is inspectable: counts are counts, not vibes.
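Citation validation at synthesis time can be as simple as checking that every reference in the drafted answer resolves to a claim that was actually retrieved; anything unresolvable is flagged as hallucinated. The `[claim:…]` marker format below is an assumption for illustration:

```python
import re

def invalid_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return citation IDs in the answer that don't resolve to a retrieved claim."""
    cited = re.findall(r"\[claim:([\w#-]+)\]", answer)
    return [cid for cid in cited if cid not in retrieved_ids]

retrieved = {"call-042#c3", "call-017#c9"}
answer = "SSO was requested twice [claim:call-042#c3] [claim:call-099#c1]."
print(invalid_citations(answer, retrieved))  # ['call-099#c1']
```

A non-empty result means the synthesis step fabricated a reference, and the answer is rejected or repaired before anyone reads it.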
Lossy compression is genuinely lossy. Nuance, tone and hedging can be flattened; the claim schema is a choice about what to keep and what to discard. Taxonomies that map free-text mentions to canonical dimensions need ongoing curation as vocabulary evolves. Resolution coverage varies by dimension, and some claims stay partially unresolved. And extraction quality depends on the underlying LLM which means it can drift. None of this is free to run; extraction costs LLM calls per chunk and indexing costs embedding calls per claim.
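A back-of-envelope of those run costs; every unit price below is a hypothetical placeholder, not real vendor pricing, and the volumes are illustrative:

```python
# Assumed volumes and unit prices (placeholders, not real pricing).
calls = 1_000                  # transcribed conversations
chunks_per_call = 10           # extraction is one LLM call per chunk
claims_per_call = 75           # midpoint of the 50-100 range above
llm_cost_per_chunk = 0.01      # $ per extraction call (assumed)
embed_cost_per_claim = 0.0001  # $ per claim embedded for the index (assumed)

extraction = calls * chunks_per_call * llm_cost_per_chunk
indexing = calls * claims_per_call * embed_cost_per_claim
print(f"extraction ${extraction:.2f}, indexing ${indexing:.2f}")
```

Whatever the real prices, the shape holds: extraction scales with chunks, indexing with claims, and both are one-time costs per conversation that queries then amortize.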
These are real operational costs. The bet is that structured, queryable, auditable answers across thousands of conversations are worth more than the nuance lost in compression.
The combination of claims (qualitative data) and metrics (quantitative data) with a reasoning layer you can talk to will transform your understanding of the business and what goes on: