In Universal Analytics (UA), 'Sessions' was a go-to metric for many of us. It was an easy way to see how much traffic a website or app was getting, and how well different acquisition channels were working.
When Google Analytics 4 (GA4) came along, it introduced a new data model based on 'events' rather than 'hits', and this had some important effects on the Sessions metric. In addition, Google Analytics began showing estimated numbers for some metrics, including Sessions, right in the interface. These changes mean that using the Sessions metric isn't as straightforward as it used to be. If not well understood, it can lead to confusion and mistakes in reports.
In this post, we're going to look at how Sessions in GA4 has changed from how it was in UA. We'll see what's causing mismatches in reports and how you can avoid them.
What are the differences between Sessions in GA4 and UA?
Before delving into the differences in reporting on Sessions in Google Analytics 4 compared to Universal Analytics, let’s take a quick look at what has changed in terms of measuring. In Universal Analytics, you can think of a Session as a container of interactions with a website or app. A session would last until there were 30 minutes of inactivity, the session crossed midnight (based on the timezone the UA View used), or new campaign parameters were encountered. A user could have one or several Sessions in a day, week or month.
By contrast, everything is an event in Google Analytics 4, including the start of a Session (event name session_start). When a new Session is started in Google Analytics 4, a session ID is generated and subsequent events by the same user are assigned to the same session ID. A Session will only end after 30 minutes of inactivity. If the same user comes back after that, a new session ID will be generated.
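To make the session ID model concrete, here is a minimal Python sketch of the 30-minute inactivity rule described above. It is an illustration only, not GA4's actual implementation: the function name assign_session_ids, the integer session IDs, and the choice to treat a gap of exactly 30 minutes as the same session are all assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity window described in the text


def assign_session_ids(timestamps):
    """Return one session ID per event timestamp for a single user.

    Hypothetical helper: a new ID starts whenever the gap since the
    previous event is longer than SESSION_TIMEOUT.
    """
    session_ids = []
    current_id = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > SESSION_TIMEOUT:
            current_id += 1  # comparable to a new session_start event
        session_ids.append(current_id)
        previous = ts
    return session_ids


events = [
    datetime(2024, 1, 1, 23, 50),
    datetime(2024, 1, 2, 0, 5),   # crosses midnight, still the same session
    datetime(2024, 1, 2, 0, 40),  # 35 minutes of inactivity, so a new session
]
print(assign_session_ids(events))  # prints [1, 1, 2]
```

Note that the second event crosses midnight without starting a new session, which is one of the ways this rule differs from the UA rules described earlier.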
Another important difference is that in UA, Sessions were created in the backend, whereas GA4 calculates everything in the frontend of the app. As we will see, this has some important implications when it comes to reporting on Sessions, both inside and outside the user interface.
Google Analytics 4 is summing Sessions incorrectly - why?
When you start using Google Analytics 4, the look and feel of the platform will be very different, but you will still find many familiar dimensions and metrics. For example, you can still get your Session counts broken down by whatever dimension you are interested in. However, when you start working with granular data, things might quite literally not add up. In the example below, we can see a screenshot from GA4 where Sessions is broken down by just one dimension, Device Category. Values are spread across four device categories, with the total row showing 55,266 Sessions. So far so good, right?
However, as some eagle-eyed readers might have already noticed, the total row (55,266) is not the sum of the individual Device Category rows, as you can see in the image below:
This is because in GA4, the total number of Sessions is estimated. Google Analytics 4 estimates the cardinality (distinct count) of some of the most commonly used metrics, including Sessions, using an algorithm called HyperLogLog++. As a result, you will typically see that summing the rows together will not give the same value as you see on the Totals row.
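The effect can be reproduced with a toy estimator. The sketch below is a simplified HyperLogLog, not HyperLogLog++ and not GA4's implementation, and the class name SimpleHLL, the precision of 12, and the generated session IDs are all assumptions made for illustration. It estimates distinct sessions per device category and for all sessions together, so the two figures can be compared.

```python
import hashlib
import math


class SimpleHLL:
    """Simplified HyperLogLog (not HLL++, and not GA4's implementation)."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw


def estimate_distinct(ids):
    hll = SimpleHLL()
    for session_id in ids:
        hll.add(session_id)
    return hll.estimate()


# Hypothetical session IDs split over three device categories.
sessions = {
    "mobile": [f"m-{i}" for i in range(6000)],
    "desktop": [f"d-{i}" for i in range(3000)],
    "tablet": [f"t-{i}" for i in range(1000)],
}
per_device = {device: estimate_distinct(ids) for device, ids in sessions.items()}
total = estimate_distinct([i for ids in sessions.values() for i in ids])

# Each figure is an independent estimate, so the two need not match.
print(round(sum(per_device.values())), round(total))
```

Because every row and the total are estimated separately, each carries its own small error, and there is no reason for the row estimates to add up exactly to the total estimate.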
Reporting on Sessions outside the GA4 interface
It’s not only in the GA4 interface that you will see this discrepancy. A simple way to test from outside GA4 is to use Google’s own developer tool Query Explorer. As you can see, the response from the API gives the same mismatch as the one we see in the GA4 interface:
The challenge that this example highlights is that Sessions is no longer a metric that is reliably reproducible via aggregation in GA4, in contrast to how it was in UA. This means that when you work with GA4 data in a database, in Sheets, or anywhere else where you do your own aggregation, you have to be mindful of this fact or find a way to work around it.
“But the difference is only -0.82%, why should I care?”
In the example of Device Category, the difference is tiny and it’s important to remember that Sessions is not and has never been an accounting metric, so it does not have to be exact. What is important is that it’s not too far off from the true value so that it directionally still gives a correct idea about how things are moving.
However, if you were to request Sessions by a dimension such as Event Name and/or Page and then sum Sessions across rows, the totals can be inflated by hundreds of percent. This is because, as we mentioned earlier, everything in GA4 is an event. If you request Sessions by Event Name or Page, the API returns a Session value for every Event Name or Page, and since one Session can span many Events or Pages, the same Session is counted on several rows. A Session metric inflated by hundreds of percent is not usable, so this needs to be mitigated.
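A small Python sketch shows where the inflation comes from. The event log below is made up for illustration, and sessions_by_event_name is a hypothetical helper that mimics a Sessions-by-Event-name breakdown by counting distinct session IDs per event name.

```python
from collections import defaultdict

# Hypothetical event log: (session_id, event_name) pairs.
events = [
    ("s1", "session_start"), ("s1", "page_view"), ("s1", "scroll"),
    ("s2", "session_start"), ("s2", "page_view"),
    ("s3", "session_start"), ("s3", "page_view"), ("s3", "purchase"),
]


def sessions_by_event_name(rows):
    """Count distinct sessions per event name."""
    seen = defaultdict(set)
    for session_id, event_name in rows:
        seen[event_name].add(session_id)
    return {name: len(ids) for name, ids in seen.items()}


breakdown = sessions_by_event_name(events)
true_sessions = len({session_id for session_id, _ in events})
summed_rows = sum(breakdown.values())

print(breakdown)      # {'session_start': 3, 'page_view': 3, 'scroll': 1, 'purchase': 1}
print(true_sessions)  # 3
print(summed_rows)    # 8
```

Three sessions turn into a row sum of eight, because each session is counted once for every event name it contains.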
How to get accurate Session counts for GA4
As we’ve seen, Sessions can be challenging to work with because depending on the dimensions you include in the request, the totals can be more or less off the mark. Sessions is not a metric where exactness is necessary, but we need it to be close enough so that we can rely on its directionality and for it to be easily reproducible. Thankfully, GA4 has thrown us a bone that allows just that.
When you set up GA4, everyone gets the same basic set of Events, and one of these events is called “session_start”. As expected, this event is triggered every time a new session is started. Since session_start can only happen once per session, the aggregate count of session_start events should be (and is) very close to the number of Sessions that GA4 has estimated. In the example below, you can see what it looks like in GA4 when you break down Sessions and Event count by Event name and filter for session_start:
As you can see, the aggregated totals using either Sessions or Event count together with the Event name session_start get us within around 1% of GA4's estimated Session value, which is well within the limits for reporting purposes. Best of all, this solution scales: no matter how many dimensions you query, as long as you include Event name and filter for session_start, the summed Sessions or Event count will be very close to GA4's estimated value.
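If you query GA4 through the Data API rather than the interface, the same idea can be expressed as a request filter. The Python sketch below only builds a request body and does not call the API. The field names follow the Data API runReport request format as we understand it, so check them against the official reference before relying on them. The helper name build_session_start_request and the dates are hypothetical.

```python
import json


def build_session_start_request(start_date, end_date, extra_dimensions=()):
    """Build a runReport-style request body that counts session_start events.

    Hypothetical helper: eventName is always included as a dimension so the
    filter on session_start can be applied alongside any extra dimensions.
    """
    dimensions = [{"name": "eventName"}] + [{"name": d} for d in extra_dimensions]
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": dimensions,
        "metrics": [{"name": "eventCount"}],
        "dimensionFilter": {
            "filter": {
                "fieldName": "eventName",
                "stringFilter": {"matchType": "EXACT", "value": "session_start"},
            }
        },
    }


body = build_session_start_request("2024-01-01", "2024-01-31", ["deviceCategory"])
print(json.dumps(body, indent=2))
```

Summing eventCount over the returned rows then gives a Session count that stays reproducible by aggregation, whichever extra dimensions are added.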
How does Funnel help?
Seeing as Session counts are reproducible by aggregation if you use the "session_start" event, Funnel has created two derived metrics that you are able to pick in the connect configuration for GA4: Sessions (event based) and Sessions (session based). If you include Event name and Event count in your GA4 data source, we recommend that you use the Sessions (event based) metric in your reporting. As we saw in the example above, this will get you very close to the estimated total in GA4 but also provide the assurance that totals will not be inflated when you use this metric.
If you want to include Sessions in your data source, we recommend that you use Sessions (session based) as the metric to report on Sessions. This value will be even closer to GA4's estimated total value. However, note that in order to derive this metric, you also need to include the "normal" Sessions metric in your data source. If you have many users in your Funnel subscription, this opens up the possibility that a user who is unaware of how Sessions works in GA4 might create reports with inflated values by using the "normal" Sessions metric instead of one of the derived metrics that Funnel has provided.
In summary, we have looked at how Sessions works in GA4, how it differs from UA, and what you have to keep in mind when using Sessions in your GA4 reporting. We have also looked at an alternative where you use a Sessions metric based on the Event name "session_start", and the benefits it has. The key benefit is that it provides a Session metric that is reproducible by aggregation and results in almost exactly the same values as the estimated Session totals you see in GA4.