Understanding Cohort Analysis

Cohort analysis is conceptually simple, yet it’s one of the most important and powerful analysis approaches a startup can adopt. In an earlier post I discussed the importance of Lean Methodology for startups: minimizing wasted resources and getting to product/market fit before scaling up. Cohorts play a crucial role in helping us understand how user behavior changes with each iteration or improvement to the product. There are plenty of other business questions that cohort analysis helps answer. To give you some examples:

1) How are the optimizations made to the product in a defined period affecting conversions?
2) Which traffic source is generating maximum conversions?
3) Which source tends to bring in users with maximum engagement on the platform?
4) Are customers acquired via email marketing more likely to make a repeat purchase or to upgrade than those acquired via, e.g., AdWords marketing?

And more. Products such as Mixpanel and Kissmetrics make it easy to create and analyze cohorts. Cohorts have never been a core part of Google Analytics (GA); however, there are certain hacks you can use to make it work. Even then, there are restrictions on the types of cohorts you can create in GA; for example, you cannot easily build a cohort based on the date of purchase of any product on the website. With the latest update, GA does allow one to segment users based on the date of their first visit.

What is a Cohort?

A cohort is simply a group of people who share something in common and is time-bound, i.e., they had something in common when the grouping was first made. A cohort is very similar to a segment, and the difference between the two often causes confusion. To understand it better, think of a segment as “Employees working in the Marketing Department”, while a cohort would be more like “Employees who joined in November 2013”.

Cohort Analysis
Cohort Analysis is very popular in medicine where it is used to study the long term effects of drugs and vaccines:

A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, are exposed to a drug or a vaccine, etc.). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort. The comparison group may be the general population from which the cohort is drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.
Source: Wikipedia

We can apply the same concepts to an online portal/startup to better understand the different types of users and their behavior on the platform. How we define the cohorts to compare, and what we compare about their behavior, will depend on the business question we are seeking to answer. In the case of a Lean Startup, the basic premise is that the product is constantly iterated on to find product/market fit, and then iterated on further to optimize conversions and scale. This is one of the prime applications of cohort analysis. We can use it to compare the users acquired during each iteration in terms of retention, engagement, conversions, etc. Joshua Porter’s excellent blog post on Twitter’s use of cohort analysis to track engagement with product improvements is a great example of this.


If you look at the figure, the rows are cohorts (users acquired during each month are grouped as a separate cohort) and the columns give the engagement or retention figures for each cohort over a 12-month period. This is the only way to clearly see whether the iterations and product improvements that Twitter was regularly rolling out were continually improving engagement on the platform. In a normal graph without cohorts, this picture often does not show up, because the engagement of the early users masks the engagement metrics of any particular group, whether negatively or positively.
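To make the shape of such a table concrete, here is a minimal sketch in Python. The activity log, the user IDs and the function names are all made up for illustration; this is not Twitter’s data or method, just one simple way to turn (signup date, activity date) pairs into rows of monthly cohorts and columns of months since signup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (user_id, signup_date, activity_date).
events = [
    ("u1", date(2013, 1, 5), date(2013, 1, 5)),
    ("u1", date(2013, 1, 5), date(2013, 2, 10)),
    ("u2", date(2013, 1, 20), date(2013, 1, 21)),
    ("u3", date(2013, 2, 3), date(2013, 2, 3)),
    ("u3", date(2013, 2, 3), date(2013, 3, 15)),
    ("u4", date(2013, 2, 9), date(2013, 2, 9)),
]


def months_between(start, end):
    """Number of calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def retention_table(events):
    """Return {cohort_month: {months_since_signup: share_of_cohort_active}}."""
    cohort_users = defaultdict(set)  # cohort -> every user who signed up then
    active = defaultdict(set)        # (cohort, offset) -> users active at that offset
    for user, signup, activity in events:
        cohort = (signup.year, signup.month)
        cohort_users[cohort].add(user)
        active[(cohort, months_between(signup, activity))].add(user)

    table = defaultdict(dict)
    for (cohort, offset), users in active.items():
        table[cohort][offset] = len(users) / len(cohort_users[cohort])
    return {cohort: dict(sorted(row.items())) for cohort, row in sorted(table.items())}


table = retention_table(events)
for cohort, row in table.items():
    print(cohort, row)
```

Each printed row is one cohort; reading down a column compares cohorts at the same age, which is the comparison that shows whether later iterations retain users better.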

The above example from Twitter represents just one application of cohort analysis. As discussed earlier, there are various business questions that can be answered using cohorts. Let’s first understand the various ways to define cohorts:

1. Cohorts defined by when the user first Visits:
Users often do not sign up or engage the first time they visit a platform. Grouping users by their first visit helps one understand the number of touches required before they sign up or engage, and which product iterations improve the conversion or engagement metric for a given first-visit date. The earlier case study of Twitter is a good example of using cohorts to understand user engagement for a product.

2. Cohorts defined by when the user Converts:
By Converts, I mean any type of conversion or micro-conversion on the platform. It could be signing up, registering, making a first purchase, subscribing to the list etc.

3. Cohorts defined by what channel the user was acquired on:
It’s really important to understand the best channels of user acquisition and the behavior of the users acquired through each channel, so that one can focus on the channels that yield the best results. Cohorts based on the channel of acquisition help with this.

4. Cohorts based on User behavior:
Users can also be grouped based on the behavior they exhibit on the platform. For example, in the case of Zoomdeck, there are frequent visitors and infrequent visitors. Users can be grouped into various cohorts based on their re-visit rate and engagement on the platform. This is important, as looking at the other metrics these groups exhibit helps us understand them better. An e-commerce company, for instance, would need to strategize differently for frequent buyers vs infrequent buyers, and this can be done better through cohorts.

5. Cohorts based on Customer Lifecycle:
For a platform with a number of stages, it’s important to track metrics like retention, Customer Lifetime Value, engagement, etc. at each stage. Take a simple game with various levels: classifying users by the level they are in and understanding the metrics exhibited by these cohorts would help one make better decisions on how to incentivize users to move up levels.

6. Cohorts based on User Characteristic:
There might be cases where one would also want to create cohorts based on certain user characteristics, like men vs women, country of origin, age group, etc., to create targeted campaigns or provide customized incentives that improve the engagement, retention or revenue metrics of those groups.
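Whichever definition you pick, building the cohort comes down to choosing a grouping key. The sketch below is only an illustration: the user records, field names and the `conversion_by_cohort` helper are hypothetical, and the same function is reused with two different keys (first-visit month and acquisition channel).

```python
from collections import defaultdict

# Hypothetical user records; the field names are assumptions for illustration.
users = [
    {"id": "u1", "first_visit": "2013-11", "channel": "email", "converted": True},
    {"id": "u2", "first_visit": "2013-11", "channel": "adwords", "converted": False},
    {"id": "u3", "first_visit": "2013-12", "channel": "email", "converted": True},
    {"id": "u4", "first_visit": "2013-12", "channel": "adwords", "converted": True},
    {"id": "u5", "first_visit": "2013-12", "channel": "email", "converted": False},
]


def conversion_by_cohort(users, key):
    """Group users by the given field and return each group's conversion rate."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for user in users:
        totals[user[key]] += 1
        converted[user[key]] += int(user["converted"])
    return {cohort: converted[cohort] / totals[cohort] for cohort in totals}


by_month = conversion_by_cohort(users, "first_visit")
by_channel = conversion_by_cohort(users, "channel")
print(by_month)
print(by_channel)
```

Swapping the key for a behavior bucket, a lifecycle stage or a user characteristic gives the other cohort types listed above without changing the rest of the code.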

We have covered in general the various cohorts that can be created, although there might be a few specific ones related to the niche you are operating in. Creating cohorts forms just one part of the puzzle; the most important part is to use various metrics to understand the behavior exhibited by these cohorts, which enables you to make business decisions. The metrics one needs to track depend on the niche, the type of product and the lifecycle stage the product is in.

Metrics most often tracked between cohorts are:

1. Measures of User Engagement:
During the early stage of a product, before validation, User Engagement (including activation) and Retention become two of the most important metrics. Cohorts based on date of first visit/conversion enable us to understand how product iteration is improving user engagement, or whether any change made to the product has negatively affected engagement. The earlier example of Twitter was about tracking engagement on the platform. Depending on the product, you can define what user action counts as engagement or activation on your platform.

2. Retention:
Just as engagement is an important metric, any successful product should have good retention figures as well. I covered the importance of retention and how it affects virality, cost of user acquisition and customer lifetime value in my earlier posts on Virality. Cohorts help us understand retention better by enabling us to pin down which features and user flows are improving the retention numbers. Funnel tools don’t help here, because tracking retention requires recording user activity over longer periods.

3. Customer Lifetime Value:
Customer Lifetime Value (CLV) is probably the most difficult metric to track. Questions we might want to answer include: which channels of user acquisition give us the highest CLV, which particular activity drives a user to upgrade plans, which of several split-tested pricing plans is optimal, and which features or user flow changes result in better CLV. All of these can be understood better using cohorts, as they allow us to track a group over a period of time and observe its behavior on the platform.

4. Measuring long life-cycle events:
A product undergoes many iterations and feature roll-outs. It’s impossible to measure long life-cycle events using just funnels. Prime examples are revenue and retention, which typically play out over the long term.
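As an illustration of comparing a long-cycle metric between cohorts, the sketch below computes revenue per user to date for each acquisition channel. Treat it as a rough proxy for Customer Lifetime Value rather than a full CLV model; the data, the channel names and the function name are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical data: the acquisition channel of each user and their purchases so far.
acquired_via = {"u1": "email", "u2": "email", "u3": "adwords"}
purchases = [("u1", 30.0), ("u1", 20.0), ("u2", 10.0), ("u3", 45.0)]


def revenue_per_user_by_channel(acquired_via, purchases):
    """Average revenue to date per acquired user, grouped by acquisition channel."""
    revenue = defaultdict(float)
    for user, amount in purchases:
        revenue[acquired_via[user]] += amount

    user_count = defaultdict(int)
    for channel in acquired_via.values():
        user_count[channel] += 1

    # Users with no purchases still count in the denominator.
    return {channel: revenue[channel] / user_count[channel] for channel in user_count}


value_by_channel = revenue_per_user_by_channel(acquired_via, purchases)
print(value_by_channel)
```

Re-running the same calculation as each cohort ages shows how the value of a channel develops over time, which a single funnel snapshot cannot show.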

Depending on the niche and the stage of growth your startup is in, you would have to choose the metrics you need to track, and the cohorts described earlier for which to track them. At the end of the day, for any product, things boil down to user growth, engagement, retention and revenue. Analytics enables us to improve each of those metrics, and cohort analysis is a technique that gives us great insight into metrics that are typically long-cycle.

Cohort Analysis Presentation (Example)

I love this presentation of cohort analysis (quoted from this blog post):


What you can see immediately is that the area on the right (Period 5) stacks up the current status with users from Period 1 to Period 4. The really interesting piece of the puzzle comes into play when you are considering what exactly your users represent: active, subscribers, etc. So here is what we can infer from the chart:

  • The height of the chart at Period 5 (at 280) is the number of users currently using (or paying for) our system/app.
  • The individual stacks have a drop-off. As we can see, the drop-off is high in the beginning and then starts to level out but does not go down to zero. Since this is homogeneous across all periods, we can infer that there is something we are doing right: user behavior becomes predictable.
  • For each period 1 to 4, new users were signing up and the number of users from Period 1 makes up about 17.9% (50 out of 280) of the users in Period 5.
  • The fall off of users from one Period to the next is higher in subsequent Periods, leveling out at about 25%  of the original sign-ups after 3 periods.
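The share calculation in the list above can be reproduced with a few lines of Python. In this sketch only the Period 1 figure (50) and the total (280) come from the text; the split of the remaining users across Periods 2 to 4 is invented so that the numbers add up.

```python
# Users from each signup period who are still active in Period 5.
# Only the Period 1 value (50) and the total (280) are from the text;
# the other three values are made up for illustration.
active_in_period_5 = {
    "Period 1": 50,
    "Period 2": 60,
    "Period 3": 75,
    "Period 4": 95,
}

# Height of the stacked chart at Period 5.
total = sum(active_in_period_5.values())

# Each cohort's share of the current user base, in percent.
shares = {period: round(100 * count / total, 1)
          for period, count in active_in_period_5.items()}

print(total)
print(shares)
```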


The Only Growth Hacking Resource List You’ll Ever Need

If you are worried about user acquisition, growth rate, engagement, retention and more, here is something that will cheer you up: a list of all the resources you can fall back upon to drive up your growth numbers.

Growth Hacking Tools List:

User Acquisition & Retention:

Blogs & Communities for Inspiration:

 (PS: Thank you autosend.io for the list)

The Only Digital Marketing Guide You’ll Ever Need

I can understand the feeling when someone asks you to set up the entire digital marketing engine at an emerging organization, and how completely it can overwhelm you! The sheer number of channels you can leverage to engage with your consumers is both a boon and a bane. It does give you options to measure conversions across channels and optimize your marketing spend based on whichever channel works best for your business, but it will also stretch you to the limit with the effort required to set up the various channels and optimize conversions for each of them.

Oli Gardner has created this massive infographic on all the tasks you have on your hands as someone setting up the digital marketing engine from scratch. This is impressive, beautiful and brilliant – Digital Marketing Guide!


The Noob Guide to Online Marketing - Infographic
Unbounce – The DIY Landing Page Platform

Using Data & Analytic Tools to Better Understand Your Users – Measuring the Right Metrics

Startups, whether product- or service-based, operate in an extremely competitive landscape, vying for every impression they can get among the millions of potential customers online. Getting your startup visible or discoverable is one thing; getting visitors to convert on your website and retaining them is an even tougher task, given the plethora of services and products pushed at the consumer. This is why it is so important for startups to understand every activity of the user, right from the first time a potential customer/user discovers their service or product on the web to the point they convert and start coming back to the website.

There is plenty of data available to a startup these days and a wide variety of analytics tools to analyze it. A few years back, one could manage analytics and data tracking using just a visitor analytics tool like Google Analytics, but that is no longer the case. With growing competition, you have far less room to fail. Based on your website and your requirements, you can choose from the various analytics tools available to you. More often than not, you will need a combination of these tools to better understand user behavior. The chart below gives you the various classes of analytics tools and their strengths in measuring various parameters:


Source: www.moz.com

It is crucial for a marketer to appreciate the insights data can provide on user behavior and to take the necessary actions to correct and optimize wherever required. It is equally crucial to measure the right data and understand its essence in order to improve the customer lifecycle on the website.

In my previous post, we discussed the importance of measuring the right macro metrics. For understanding and validating product/market fit, one needs to measure Activation and Retention. However, to completely understand the lifecycle of the customer, one also needs to measure the other three elements: Acquisition, Revenue and Referral.


Funnels are a great way to understand user behavior on your website. They are visual, simple and map well to most of the events related to measuring the macro metrics. But funnels alone have their limitations. Imagine you wanted to measure the impact on revenue of the repeated product iterations you pushed out during a period. That is extremely difficult to track using only a funnel: first, because the impact on revenue is a long-term thing, and second, because you would need to segment users by the period in which they signed up, so that you can see the revenue impact for the set of users who started off with a particular variation of the product. This is where cohorts play an important part. I will cover cohorts in the next post and explain in detail the methodology to track metrics like retention, revenue, the impact of feature iterations on both, and more.

In this post, we will focus on using Google Analytics to track the channels through which users interact with your brand, convert on your product/service, and come back to it. The Digital Marketing Funnel, as represented in the figure earlier, can be broken down into 3 components:

  • TOF – Top of Funnel
  • MOF – Middle of Funnel
  • BOF – Bottom of Funnel

Top of Funnel:

Top of the funnel represents the first interaction a user has with your brand/product. There are plenty of channels on which this interaction can happen, and in principle one would need to optimize for each of them. In practice, the best approach is to focus on at most two channels where the interactions seem to be most effective. With the new Universal Analytics in Google Analytics, you can get the channel details at Acquisition » All Traffic.


The above table gives you a good understanding of the various channels that drive traffic to your platform. You can export the data to an Excel sheet and then use a pivot table to see which medium works best for driving first-time traffic, so that you can focus on and optimize for that channel/medium.
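If you prefer a script to a spreadsheet, the same pivot can be sketched in Python. The CSV content, the column names (`source`, `medium`, `new_visits`) and the figures below are assumptions standing in for an exported report; adjust them to match whatever columns your own export contains.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported traffic report; columns and numbers are made up.
export = io.StringIO(
    "source,medium,new_visits\n"
    "google,organic,420\n"
    "google,cpc,150\n"
    "bing,organic,60\n"
    "newsletter,email,90\n"
)


def new_visits_by_medium(csv_file):
    """Sum first-time visits per medium and rank the mediums, best first."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["medium"]] += int(row["new_visits"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = new_visits_by_medium(export)
for medium, visits in ranking:
    print(medium, visits)
```

To run it on a real export, replace the `io.StringIO` stand-in with an opened file handle.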

You can drill down further to understand the best referral sources through Acquisition » All Referrals.


Determining which sites have referred the best traffic to your website is important, as it enables you to focus on those channels. You can look at parameters like Bounce Rate and Time Spent on Site to understand the engagement of users coming from various channels. You can also identify websites similar to the ones driving traffic to your website by searching on Google [use the search query related:”site name”] or on SimilarWeb, and then try to reach the similar audience on those sites to generate traffic. For example, if weheartit.com is a major referrer to your site, then searching for related websites on Google gives you these results:


The above search result gives you a healthy number of similar sites whose target audience would be interested in your site. Refining and cross-posting your content across these websites can also help you get additional traffic. You can even automate some of this by using a service like IFTTT, where you create recipes for posting simultaneously on a number of these platforms.

Remember, it’s always good practice to tag the URLs you use to drive traffic from campaigns on referring sites. You can use the standard URL builder that Google provides to generate the tags.

By generating campaign URLs, you can identify the source of referrals to your website, whether visitors found the link in a newsletter, a social media post or another marketing campaign. Once you name the three main campaign tagging elements (source, medium and campaign), Google Analytics will display information about where the referral originated. Simply complete the tool’s three-step form.
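If you need to tag many links, the same result can be scripted. The sketch below appends the standard `utm_source`, `utm_medium` and `utm_campaign` query parameters to a URL using Python’s standard library; the helper name `tag_url`, the example URL and the campaign values are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, source, medium, campaign):
    """Append campaign tags to a URL, keeping any query parameters it already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),      # where the link was placed, e.g. a newsletter
        ("utm_medium", medium),      # the marketing medium, e.g. email
        ("utm_campaign", campaign),  # the campaign name
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = tag_url(
    "https://www.example.com/landing?ref=1",
    source="newsletter",
    medium="email",
    campaign="spring_sale",
)
print(tagged)
```

Keeping the tagging in one function also helps keep source and medium names consistent across campaigns, which makes the reports easier to read.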

Here are just a few examples of valuable KPI data points you might consider tracking as part of acquisition:

  • Organic Search (SEO)
  • Paid Search Marketing (SEM)
  • Social Campaigns
  • Banner Campaigns
  • Links from External Sites
  • Links from Online Videos
  • Email Recipients
  • RSS Subscribers

Another important thing to track is your landing pages and how you can optimize them for better conversions. Google Analytics helps you identify the most important landing pages on your site and the user flow thereafter. This gives you a better understanding of which pages are performing badly and what you can do to improve user interaction on those pages. [Behavior » Site Content » Landing Pages or Content Drilldown]


On Improving weak landing pages:

  • Optimize the content to make it relevant if it’s outdated.
  • If it’s your main landing page, change the message or positioning if required. Use the heatmap tool to better understand the user interaction on the pages and optimize your page accordingly.
  • Make the content more comprehensive so that more people will find it interesting and informative.
  • Build more relevant internal links to the weaker pages to give them more link juice.
  • You can prompt the user to sign-up for email newsletters or at least try and convert them on any of your micro-conversions before the user leaves.

Middle of Funnel:

Middle of Funnel in the Digital Marketing Funnel is the point where the user is moving from an initial product or brand interaction to a first sale or to any major interaction on the platform. You might not be able to get a user to convert during this stage, but it’s crucially important for companies to target micro-conversions here.

It’s important to track the sources or channels through which users come back to your site during this stage, and it’s also important to measure the paths users take in completing the micro-conversions or goals set on your page. For understanding user paths, GA has an option called Visitors Flow under Audience that visually represents the user path on the website and the drop-offs at each stage. The Visitors Flow report is a nicer representation than the traditional click path report. One can view the visitors moving between nodes. One also has the option to view particular segments of users based on region, campaign, traffic source, country, etc. and their flow/browsing pattern on the website.


You can also create your own funnel for any of the goals you have set in GA to better understand where users are dropping off. Before setting up goals (micro-conversions) on your site, you would need to clearly define the business objectives behind them. A few examples of good engagement goals to track:

  • Account signup
  • Email signup
  • RSS subscription
  • Watching video
  • Content interactions (e.g. photo zoom, faceted search attributes, etc.)
  • Product Purchase

The goals would vary based on the type of website you are measuring. To set up these goals, log in to the admin panel of your Google Analytics dashboard and then click on the Goals tab.


You have different goal types to choose from: Destination, Duration, Pages/Screens per visit or Event. In the case of an e-commerce website, for example, if the marketer needs to track how many users complete the check-out process, then he/she would have to choose “Destination” as the goal type in the first step. In the second step, he/she would have to define the destination page that completes the goal (conversion).


For creating the funnel, you would need to specify each step (page) the user traverses before completing the final goal. The funnel visually represents each stage in the micro-conversion process and shows the drop-offs at each stage. Based on your requirements, you can create multiple micro-conversions and funnels to better understand user flow during this middle stage of the user lifecycle.


[Fig: A funnel representation of a goal set to White paper Downloads from the start page clearly indicating the conversions and drop-offs at each stage.]
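The numbers behind such a funnel are simple to compute yourself. The sketch below uses made-up step names and visitor counts (they are not read from the figure) and reports, for each step, the share of visitors who continue to the next step and how many drop off.

```python
# Hypothetical visitor counts at each funnel step of a white paper download goal.
funnel = [
    ("Start page", 1000),
    ("White paper page", 400),
    ("Form", 150),
    ("Download", 90),
]


def funnel_report(steps):
    """For each step, return (name, share continuing to next step, visitors lost)."""
    report = []
    for (name, count), (_, next_count) in zip(steps, steps[1:]):
        report.append((name, next_count / count, count - next_count))
    return report


report = funnel_report(funnel)
overall_conversion = funnel[-1][1] / funnel[0][1]

for name, continue_rate, dropped in report:
    print(f"{name}: {continue_rate:.1%} continue, {dropped} drop off")
print(f"Overall conversion: {overall_conversion:.1%}")
```

The step with the lowest continuation rate is usually the first place to look when optimizing the flow.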

In the middle of the funnel (MOF) of the Digital Marketing Funnel, it’s also important to analyze the most effective and popular channels that bring the user back. For this, GA provides Multi-Channel attribution tools under the “Conversions” section. There are various attribution models one could use. For a full guide, refer to this. The Linear Attribution Model, which gives equal weight to every channel in the funnel irrespective of where it appears, gives us great insight into which channel accounts for the most revenue overall. You can use the Model Comparison Tool in GA to find this out:


To figure out the most popular channels in the MOF, we would have to do some manipulation in Excel to weed out the first and the last interaction channels.
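Both ideas, equal credit for every touch and isolating the middle touches, can be sketched in a few lines of Python. The conversion paths, channel names and revenue figures below are hypothetical, and the functions are a simplified illustration rather than a reproduction of how GA computes its reports.

```python
from collections import defaultdict

# Hypothetical conversion paths: the ordered channels a user touched, and the revenue.
paths = [
    (["organic", "email", "direct"], 90.0),
    (["cpc", "email", "social", "direct"], 120.0),
    (["organic", "direct"], 40.0),
]


def linear_attribution(paths):
    """Split each conversion's revenue equally across every touch in its path."""
    credit = defaultdict(float)
    for channels, revenue in paths:
        for channel in channels:
            credit[channel] += revenue / len(channels)
    return dict(credit)


def middle_touch_counts(paths):
    """Count channels appearing strictly between the first and the last touch."""
    counts = defaultdict(int)
    for channels, _ in paths:
        for channel in channels[1:-1]:
            counts[channel] += 1
    return dict(counts)


linear_credit = linear_attribution(paths)
middle_counts = middle_touch_counts(paths)
print(linear_credit)
print(middle_counts)
```

Paths with only two touches contribute nothing to the middle counts, which is the same effect as weeding out the first and last interactions by hand.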

Bottom of the Funnel:

The bottom of the funnel is the last touch before someone buys. These channels are very important, as they let you identify which channels to focus on to complete conversions. You can find this data in Conversions » Attribution » Model Comparison Tool by selecting Last Interaction as your model.


You can use this data on the channels that best drive traffic to your website to further improve and optimize them.
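For comparison with a model that spreads credit across the path, a last-interaction model gives all of a conversion’s revenue to the final channel. The sketch below is a minimal illustration with made-up paths and revenue figures, not GA’s implementation.

```python
from collections import defaultdict

# Hypothetical conversion paths: the ordered channels a user touched, and the revenue.
paths = [
    (["organic", "email"], 90.0),
    (["cpc", "direct"], 120.0),
    (["organic", "direct"], 40.0),
]


def last_interaction_attribution(paths):
    """Give all of each conversion's revenue to the last channel in its path."""
    credit = defaultdict(float)
    for channels, revenue in paths:
        credit[channels[-1]] += revenue
    return dict(credit)


last_touch_credit = last_interaction_attribution(paths)
print(last_touch_credit)
```

Channels that only ever appear early in a path, such as organic search in this made-up data, receive no credit under this model, which is why it is best read alongside the other models.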


In addition to the standard segments available in GA (you would have noticed these when we discussed the user flow path), there is also a wide variety of custom segmenting options that let you better understand each set of users. You can create your own segments from the dashboard by clicking on the drop-down next to the All Visits tab that’s present by default. With the latest update, GA can now segment visitors and not just visits, which is something it lacked compared to tools like Kissmetrics and Mixpanel.

Now click on the Create Segments icon to define your segments. There is a wide variety of parameters you can use to create segments, or you can use any of the events you have created to define a segment.



Refer to this post for a great list of custom advanced segments which you can use.

Using segments, you can slice and dice your audience in ways never imagined before. You can create segments based on first purchase value, browser, platform, the device on which the visitor opened the site, purchase value during a period, etc. You can very well use this data to do a cohort analysis, which is very important at an early stage, especially if you follow a lean methodology and are constantly iterating and measuring the behavior of the set of users who come in during each iteration. Even otherwise, analyzing segments will give you a tremendous amount of insight. I will cover cohort analysis in detail in the next post.