Cohort analysis is conceptually simple, yet it is one of the most important and powerful analysis approaches a startup can adopt. In an earlier post I discussed the importance of the Lean methodology for startups: minimizing wasted resources and getting to product/market fit before scaling up. Cohorts play a crucial role in helping us understand how user behavior changes with each iteration or improvement to the product. Plenty of other business questions can also be understood better using cohort analysis. To give you some examples:
1) How are the optimizations made to the product in a defined period affecting conversions?
2) Which traffic source is generating maximum conversions?
3) Which source tends to bring in users with maximum engagement on the platform?
4) Are customers acquired via email marketing more likely to make repeat purchases or to upgrade than those acquired via, for example, AdWords?
And more. Products such as Mixpanel and Kissmetrics make it easy to create and analyze cohorts. Cohorts have never been a core part of Google Analytics; however, there are certain hacks you can use to make it work. Even then, there are restrictions on the types of cohorts you can create in GA, for example a cohort based on the date of purchase of any product on the website. With the latest update, GA does allow one to segment users based on the date of their first visit.
What is a Cohort?
A cohort is simply a group of people who share something in common and is time bound, i.e., they had something in common when the grouping was first made. A cohort is very similar to a segment, and there is often a lot of confusion about the difference. To understand it better, think of a segment as “Employees working in the Marketing Department”, while a cohort would be more like “Employees who joined in November 2013”.
Cohort analysis is very popular in medicine, where it is used to study the long-term effects of drugs and vaccines:
A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, are exposed to a drug or a vaccine, etc.). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort. The comparison group may be the general population from which the cohort is drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.
We can apply the same concepts to an online portal or startup to better understand the different types of users and their behavior on the platform. How we define the cohorts to compare, and what we compare about their behavior, depends on the business question we are trying to answer. In the case of a Lean Startup, the basic premise is that the product is constantly iterated to find product/market fit and then iterated on further to optimize conversions and scale. This is one of the prime applications of cohort analysis: we can group the users acquired during each iteration and compare their behavior on the platform in terms of retention, engagement, conversions, etc. Joshua Porter’s excellent blog post on Twitter’s use of cohort analysis to track engagement with product improvements is a great example of this.
If you look at the figure, the rows are cohorts (users acquired during each month are grouped as a separate cohort) and the columns give the engagement or retention figures for each cohort over a 12-month period. Laid out this way, one can clearly see whether the iterations and product improvements that Twitter was rolling out on a regular basis were continually improving engagement on the platform. In a normal graph without cohorts, this picture often gets lost, because the engagement of the early set of users masks the engagement metrics of any particular group, whether negatively or positively.
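To make the row-and-column layout concrete, here is a minimal sketch of how such a table could be computed from a raw activity log. It is not how Twitter built theirs; the function names, the `(user_id, date)` event format and the sample data are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date

def month_index(d):
    """Count months from year 0 so two dates can be compared by month."""
    return d.year * 12 + (d.month - 1)

def retention_table(events):
    """events: iterable of (user_id, date) activity records.

    Returns {cohort: [share of cohort active in month 0, 1, 2, ...]},
    where cohort is the (year, month) of a user's first recorded activity.
    """
    first_seen = {}
    for user, day in events:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # active[cohort][offset] = users active `offset` months after joining
    active = defaultdict(lambda: defaultdict(set))
    cohort_users = defaultdict(set)
    for user, day in events:
        start = first_seen[user]
        cohort = (start.year, start.month)
        offset = month_index(day) - month_index(start)
        active[cohort][offset].add(user)
        cohort_users[cohort].add(user)

    table = {}
    for cohort, by_offset in active.items():
        size = len(cohort_users[cohort])
        horizon = max(by_offset) + 1
        table[cohort] = [len(by_offset.get(k, ())) / size for k in range(horizon)]
    return table

# Hypothetical activity log, invented for illustration only.
events = [
    ("a", date(2013, 1, 5)), ("a", date(2013, 2, 9)), ("a", date(2013, 3, 1)),
    ("b", date(2013, 1, 20)),
    ("c", date(2013, 2, 2)), ("c", date(2013, 3, 15)),
    ("d", date(2013, 2, 11)), ("d", date(2013, 3, 3)),
]
table = retention_table(events)
for cohort in sorted(table):
    print(cohort, [round(share, 2) for share in table[cohort]])
```

Each printed row is one cohort and each position in the list is a month since acquisition, which is the same shape as the figure. Comparing the rows column by column shows whether later cohorts retain better than earlier ones.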
The above example from Twitter represents just one application of cohort analysis. As discussed earlier, there are various business questions that can be answered using cohorts. Let’s first understand the various ways to define cohorts:
1. Cohorts defined by when the user first Visits:
Often a user does not sign up or engage the first time they visit a platform. Grouping users by the date of their first visit helps one understand how many touches are needed before they sign up or engage, and which product iterations improve the conversion or engagement metric for users who first arrived in a given period. The earlier case study of Twitter is a good example of using cohorts to understand user engagement for a product.
2. Cohorts defined by when the user Converts:
By Converts, I mean any type of conversion or micro-conversion on the platform. It could be signing up, registering, making a first purchase, subscribing to the list etc.
3. Cohorts defined by what channel the user was acquired on:
It’s really important to understand the best channels of user acquisition and the behavior of the users acquired through each channel, so that one can focus on the channels that yield the best results. Cohorts based on the channel of acquisition help with this.
4. Cohorts based on User behavior:
Users can also be grouped based on the behavior they exhibit on the platform. For example, in the case of Zoomdeck, there are users who are frequent visitors and users who are infrequent visitors. Users can be grouped into various cohorts based on their re-visit rate and engagement on the platform. This is important because it lets us understand each group better by looking at the other metrics it exhibits. An e-commerce company, for instance, would need different strategies for frequent and infrequent buyers, and cohorts make this easier.
5. Cohorts based on Customer Lifecycle:
For a platform with a number of stages, it’s important to track metrics like retention, Customer Lifetime Value and engagement at each stage. Take a simple game with various levels: classifying users by the level they are in, and understanding the metrics exhibited by each of these cohorts, would help one take better decisions about how to incentivize users and move them to the next level.
6. Cohorts based on User Characteristic:
There might be cases where one would also want to create cohorts based on certain user characteristics, such as men vs. women, country of origin or age group, in order to create targeted campaigns or provide customized incentives that improve the engagement, retention or revenue metrics of each group.
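The definitions above differ only in the key used to group users, so they can share the same comparison code. Below is a rough sketch of that idea: a few interchangeable grouping functions fed into one conversion-rate calculation. The `User` fields, the function names, the visit threshold and the sample users are all hypothetical and would need adapting to whatever your tracking actually stores.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class User:
    # All fields are hypothetical placeholders for your own tracking data.
    user_id: str
    first_visit: date
    channel: str
    visits_last_30_days: int
    converted: bool

def by_first_visit_month(u):
    return u.first_visit.strftime("%Y-%m")

def by_channel(u):
    return u.channel

def by_visit_frequency(u, threshold=4):
    # The cut-off of 4 visits is arbitrary and only for illustration.
    return "frequent" if u.visits_last_30_days >= threshold else "infrequent"

def conversion_rate_by_cohort(users, cohort_key):
    """Group users with cohort_key and return {cohort: conversion rate}."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for u in users:
        key = cohort_key(u)
        totals[key] += 1
        converted[key] += u.converted  # True counts as 1, False as 0
    return {k: converted[k] / totals[k] for k in totals}

# Invented sample users, not real data.
users = [
    User("a", date(2013, 10, 3), "email", 9, True),
    User("b", date(2013, 10, 17), "adwords", 1, False),
    User("c", date(2013, 11, 2), "email", 5, True),
    User("d", date(2013, 11, 8), "adwords", 2, True),
]
for key in (by_first_visit_month, by_channel, by_visit_frequency):
    print(key.__name__, conversion_rate_by_cohort(users, key))
```

Keeping the grouping key separate from the metric means a new cohort definition, say by lifecycle stage or country, is just one more small function.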
We have covered in general the various cohorts that can be created, although there may well be a few specific ones related to the niche you are operating in. Creating cohorts is just one part of the puzzle. The most important part is to use various metrics to understand the behavior exhibited by these cohorts, which is what enables you to take business decisions. The metrics one needs to track depend on the niche, the type of product and the lifecycle stage the product is in.
Metrics most often tracked between cohorts are:
1. Measures of User Engagement:
During the early stage of a product, before validation, user engagement (including activation) and retention become two of the most important metrics. Cohorts based on the date of first visit or conversion enable us to understand how product iterations are improving user engagement, or whether any change made to the product has negatively affected it. The earlier example of Twitter was about tracking engagement on the platform. Depending on the product, you can define which user action counts as engagement or activation on your platform.
2. Measures of Retention:
Just as engagement is an important metric, any successful product should have good retention figures as well. I covered the importance of retention, and how it affects virality, cost of user acquisition and customer lifetime value, in my earlier posts on virality. Cohorts help us understand retention better by letting us pin down which features and user flows are improving the retention numbers. Funnel tools don’t help here, because tracking retention requires recording user activity over longer periods.
3. Customer Lifetime Value:
Customer Lifetime Value (CLV) is probably the most difficult metric to track. Questions we might want to answer include: which acquisition channels give us the maximum CLV, which particular activity drives a user to upgrade plans, which of several split-tested pricing plans is the optimum one, and which feature or user-flow changes result in better CLV. All of these are best understood with cohorts, since a cohort lets us track one group over a period of time and observe its behavior on the platform.
4. Measuring long life-cycle events:
A product undergoes many iterations and feature roll-outs. It is very hard to measure long-lifecycle events using funnels alone. Prime examples are revenue and retention, which typically play out over the long term.
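As a rough illustration of tracking a long-cycle metric per cohort, the sketch below builds a cumulative revenue-per-acquired-user curve for each cohort, which is one simple way to approximate how lifetime value accumulates. The function name, the input format and all the numbers are assumptions for the example, and a real CLV model would also account for costs, margins and churn.

```python
from collections import defaultdict

def cumulative_revenue_per_user(cohort_sizes, purchases):
    """cohort_sizes: {cohort: number of users acquired}.
    purchases: iterable of (cohort, months_since_acquisition, amount).

    Returns {cohort: [cumulative revenue per acquired user after month 0, 1, ...]}.
    """
    revenue = defaultdict(lambda: defaultdict(float))
    for cohort, month, amount in purchases:
        revenue[cohort][month] += amount

    curves = {}
    for cohort, size in cohort_sizes.items():
        by_month = revenue.get(cohort, {})
        horizon = max(by_month, default=-1) + 1
        running, curve = 0.0, []
        for m in range(horizon):
            running += by_month.get(m, 0.0)  # months without purchases add nothing
            curve.append(running / size)
        curves[cohort] = curve
    return curves

# Invented figures: cohorts keyed by acquisition channel.
cohort_sizes = {"email": 10, "adwords": 20}
purchases = [
    ("email", 0, 100.0), ("email", 1, 50.0), ("email", 3, 50.0),
    ("adwords", 0, 100.0), ("adwords", 2, 100.0),
]
curves = cumulative_revenue_per_user(cohort_sizes, purchases)
for cohort, curve in curves.items():
    print(cohort, curve)
```

Dividing by the number of users acquired, rather than by the number still active, keeps the curves comparable across cohorts of different sizes and ages.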
Depending on your niche and the stage of growth your startup is in, you will have to choose which metrics to track, and for which of the cohorts described earlier. At the end of the day, for any product, things boil down to user growth, engagement, retention and revenue. Analytics enables us to improve each of those metrics, and cohort analysis is a technique that gives us great insight into metrics that typically have a long cycle.
Cohort Analysis Presentation (Example)
I love this presentation of cohort analysis (quoted from this blog post):
What you can see immediately is that the area on the right (Period 5) stacks up the current status with users from Period 1 to Period 4. The really interesting piece of the puzzle comes into play when you are considering what exactly your users represent: active, subscribers, etc. So here is what we can infer from the chart:
- The height of the chart at Period 5 (at 280) is the number of users currently using (or paying for) our system/app.
- The individual stacks have a drop-off. As we can see, the drop-off is high in the beginning and then starts to level out but does not go down to zero. Since this is homogeneous across all periods, we can infer that there is something we are doing right: user behavior becomes predictable.
- For each period 1 to 4, new users were signing up, and the users from Period 1 make up about 17.9% (50 out of 280) of the users in Period 5.
- The fall off of users from one Period to the next is higher in subsequent Periods, leveling out at about 25% of the original sign-ups after 3 periods.
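The share quoted for Period 1 is simply that cohort's slice of the stack divided by the total height. Here is a small sketch of that arithmetic. Only the Period 1 count (50) and the total (280) come from the chart description above; the split across Periods 2 to 4 is invented so that the example adds up.

```python
def cohort_shares(active_by_cohort):
    """Return each cohort's fraction of the currently active users."""
    total = sum(active_by_cohort.values())
    return {cohort: count / total for cohort, count in active_by_cohort.items()}

# Users still active in Period 5, keyed by the period they signed up in.
# Period 1 = 50 and the total of 280 are from the text; the rest are hypothetical.
active_in_period_5 = {1: 50, 2: 60, 3: 70, 4: 100}

shares = cohort_shares(active_in_period_5)
for period, share in shares.items():
    print(f"Period {period}: {share:.1%} of current users")
```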