Tuesday, 23 April 2013

[How To] Set Up Cohort Analysis in 30 minutes using Google Analytics

[Alert: this article is startup-oriented and presumes a basic understanding of the related areas. Still, no worries if that's not you: simply skip to the "Solution" section, grab the insight as a quick takeaway, and leave the rest as a bedtime story for later. ;) ]

Cohort analysis has been a very hot topic in the analytics and startup world, especially after Eric Ries published his startup bible, The Lean Startup. Needless to say, it's a powerful tool (otherwise why would it be a hot topic?). For those who don't know the methodology: it originated in the medical field, where a cohort means a group of subjects (usually people) who share a common characteristic (say, a user experience that we've delivered) within a defined period (ref: Cohort Study - Wikipedia: http://bit.ly/XZn5Fb); a cohort analysis is then an analysis performed across different cohorts. While that might sound confusing, simply put, cohort analysis lets us understand the behavior of groups of subjects, defined by time, over time.

Got it? No? Don't worry, there are plenty of articles out there that talk about this topic (see the end of the article). I might cover this area more later, but not today. Today we focus mainly on how to prepare a proper installation for Cohort Analysis, using Google Analytics and client-side JavaScript, in a quick and dirty way.

Hey wait, why another article for installation?

Seriously, because I would love to have a purely client-side solution. Current solutions usually work like this:

  • 1. The client script checks whether a local cookie exists, to determine if this is a first-time visitor
  • 2a. If not, the client script asks the server for a first-time-visitor cookie with assigned values
  • 3a. Fire the GA tag, either as an Event or as a Custom Variable, based on the values stored in the cookie (usually the UID + timestamp of the first visit)
  • 2b. If it does, the script grabs the cookie value and passes it to GA.

But what if we can't assign cookies and have no access to server-side operations? With so many cloud-based solutions for quick market testing, including LaunchRock, Strikingly, or even just Blogspot or WordPress, this challenge is becoming more obvious. A solution like the one above hence creates a bottleneck for a quick and proper Lean testing process. Well, sounds like a workaround is needed.

The (Dirty) Solution

As long as we can get the job done (quickly). :P

Let's break down the problem. Indeed, to implement a proper Cohort, we only need two steps:

  • 1. Find a way to determine if it's a First-time Visitor
  • 2a. if it is, create a First-Visit timestamp and fire it;
  • 2b. Otherwise, we fire the old timestamp again (as we need to let GA know it's the same "group" of people)

Existing solutions usually handle the creation and storage of the timestamp on the server... is there a way to abuse the Google Analytics API so that, say, we simply "store" the value on GA and get it back on the fly for further operations? Absolutely.

(Ref: Google Developer - ga.js Basic Methods - http://bit.ly/17HzSuO)
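For a Custom Variable (CV) stored at visitor level, the CV value can be fetched back with the _getVisitorCustomVar() call documented in the reference above. In other words, we can have the visitor's timestamp stored through GA and read it back for checking whenever we need. (Thanks Google!) Strictly speaking, ga.js keeps visitor-level values in its own first-party cookie, so the read happens in the browser and no server of ours is involved. With this function in place, we can implement the following logic: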
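
To make the read-back concrete, here is a minimal sketch rather than the exact call from my setup. readVisitorCustomVar is a hypothetical helper name, and slot 1 is an assumption; _getVisitorCustomVar, _gaq.push and _gat._getTrackerByName are the legacy ga.js calls.

```javascript
// Hypothetical helper: read a visitor-level custom variable from a ga.js tracker.
// `slot` is the custom variable index; slot 1 is an assumption for this post.
function readVisitorCustomVar(tracker, slot) {
  var value = tracker._getVisitorCustomVar(slot);
  // Treat "nothing stored yet" (undefined, null or empty string) as null.
  if (value === undefined || value === null || value === '') {
    return null;
  }
  return String(value);
}

// In the browser, once the async ga.js snippet is in place, usage could look like:
// _gaq.push(function () {
//   var tracker = _gat._getTrackerByName();
//   var stored = readVisitorCustomVar(tracker, 1);
// });
```

Wrapping the call in _gaq.push(function) matters because the tracker only exists once ga.js has loaded.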

  • 1. Ask GA for the VisitorCustomVar() of the current visitor
  • 2a. If the variable is "empty", we create a timestamp using JavaScript and fire it from the client side
  • 2b. If the variable is already there, we resend it to GA (this tells GA that "the current visit comes from someone we first met at a certain timestamp")

Easy enough. I then spent a couple of minutes writing the following script, placed after the ga.js initiation tag (I simply :
Side-note: make sure to use "1" as the last parameter of the function, so that the value is stored at visitor level!
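
The three steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not the original script: tagCohortVisit, the slot number, the variable name 'Cohort' and the makeTimestamp callback are all hypothetical, while _getVisitorCustomVar and _setCustomVar (with scope 1 for visitor level) are the ga.js calls.

```javascript
// Sketch of the first-visit check. All names except the two ga.js methods are hypothetical.
function tagCohortVisit(tracker, slot, name, makeTimestamp) {
  // Step 1: read whatever is stored for this visitor.
  var stored = tracker._getVisitorCustomVar(slot);
  // Step 2a: nothing stored means a first-time visitor, so create a new timestamp.
  // Step 2b: otherwise reuse the old value so the visitor stays in the same cohort.
  var isFirstVisit = !stored;
  var value = isFirstVisit ? makeTimestamp() : stored;
  // The last argument "1" sets visitor-level scope.
  tracker._setCustomVar(slot, name, value, 1);
  return { firstVisit: isFirstVisit, value: value };
}

// In the browser, usage could look like this (the value travels with the next
// tracking request, so a pageview or event has to follow the call):
// _gaq.push(function () {
//   tagCohortVisit(_gat._getTrackerByName(), 1, 'Cohort', function () {
//     return 'Y2013M4D23H14I5W2';
//   });
// });
```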

visitorCustomVariValue is the timestamp we wish to store, in the format "Y_M_D_H_I_W_", where each letter marks a different part of the time. Sweet, isn't it?
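
For illustration, a timestamp in that shape could be built like this. It is a sketch under assumptions: formatCohortTimestamp is a hypothetical name, "I" is taken to mean minutes and "W" the weekday (0 = Sunday), and the numbers are not zero-padded, which matches values like Y2013M4D1 used later in this post.

```javascript
// Hypothetical formatter for the "Y_M_D_H_I_W_" layout.
// Assumptions: I = minutes, W = weekday with 0 = Sunday, no zero padding.
function formatCohortTimestamp(date) {
  return 'Y' + date.getFullYear() +
         'M' + (date.getMonth() + 1) +  // getMonth() is zero-based
         'D' + date.getDate() +
         'H' + date.getHours() +
         'I' + date.getMinutes() +
         'W' + date.getDay();
}

// Example: Tuesday, 23 April 2013 at 14:05 local time.
// formatCohortTimestamp(new Date(2013, 3, 23, 14, 5)) gives "Y2013M4D23H14I5W2"
```

Keeping a letter in front of every part is what makes the value easy to slice with regular expressions later.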

So we're halfway through. Now take a night's rest; we will play with the data tomorrow (as we need to wait for the data anyway...).

How about Reporting?

Side-note: this is what I got from the above setup...

We have the timestamp ready; now all we need is to create the cohort segments, which can be done with regular expressions.

Easy enough: since the timestamp is broken down into year, month, day, hours, and weekday, regular expressions are our best friend for getting the dirty job done. Assuming you want to define an "April 2013" cohort, all you need is the following statement:


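Here the D[1,30] part is meant to cover every timestamp from Y2013M4D1 and Y2013M4D2 all the way to Y2013M4D30. (Do note that in standard regular-expression syntax square brackets define a set of single characters rather than a numeric range, so test the pattern against your own data before trusting the segment.)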
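
As a hedged illustration of such a month filter, here is a JavaScript sketch. It assumes the unpadded "Y_M_D_H_I_W_" layout, so the day number is always followed by "H"; isInApril2013 is a hypothetical helper, and whether GA's segment builder accepts exactly the same pattern text is something to check in your own account.

```javascript
// Sketch: match timestamps whose day is 1 to 30 in April 2013.
// The trailing "H" stops "D3" from also matching "D31" or "D300".
var APRIL_2013 = /^Y2013M4D([1-9]|[12][0-9]|30)H/;

function isInApril2013(timestamp) {
  return APRIL_2013.test(timestamp);
}

// isInApril2013('Y2013M4D2H9I30W2')  is true
// isInApril2013('Y2013M5D2H9I30W4')  is false
```

Since April only has 30 days, the shorter pattern ^Y2013M4D would select the same visitors; the explicit day range is shown because the same trick is needed for partial months.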

What if you need the last week of April as a cohort? (i.e. 28th Apr - 4th May)


See the "|" in between? It means an "OR" condition for text matching, so it's either the left expression OR the right one. No fancy magic. ;)   (Learn more about it at http://bit.ly/128zGUV)
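
A small sketch of the "|" idea for that 28th Apr - 4th May week, again assuming the unpadded "Y_M_D_H_I_W_" layout. The pattern and the isInLastWeekOfApril helper are illustrative, not the exact statement from my segment.

```javascript
// Sketch: left side covers 28-30 April, right side covers 1-4 May.
// The "|" means a timestamp only has to match one of the two sides.
var LAST_WEEK_OF_APRIL = /^Y2013M4D(28|29|30)H|^Y2013M5D[1-4]H/;

function isInLastWeekOfApril(timestamp) {
  return LAST_WEEK_OF_APRIL.test(timestamp);
}

// isInLastWeekOfApril('Y2013M4D29H8I15W1')  is true  (29 April)
// isInLastWeekOfApril('Y2013M5D5H8I15W0')   is false (5 May)
```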

After you have created the different cohort segments, you can pull the Unique Visitors report and observe something like the following:

Side-note: no good-looking data from my GA, so I grabbed an image from Jono's answer on Stack Overflow to give you an idea of how it will look: http://stackoverflow.com/questions/12436255/doing-cohort-analytics-on-google-analytics. Credit to Jono of course. :)

The peaks denote the first-time visits, while the long tails indicate returning visits! Guess you now know where to look for the retention rate of different cohorts, right?  :)

Bonus - Cohort for a Blog?

As usual, as a proof of concept, I would love to take it a step further.

While a cohort based on first visit is definitely great for retention studies, as a blogger (or any other business or startup, for that matter) I care about the "first experience" that visitors have encountered. For a blog, that "first experience" comes from the article through which they found the blog. Would certain entries, or simply certain types of entries, provide the best experience to my readers and encourage them to consume more as well as come back again (especially in the post-Google Reader era...)? Could a similar mindset be applied to startups seeking quick pivots, and would it actually benefit their decision making in practice? I don't know, so I set up a "First-Impression" cohort for my blog and left the questions open. Let's see if we can draw more conclusions for later discussion. :)     (Do share your thoughts on this one with me as well!)

Side-note: I should use "window.location.pathname" instead of "document.url" in order to filter out the domain... updated but guess that won't hurt. :P
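
A rough sketch of how such a "First-Impression" cohort could be tagged, under assumptions: tagFirstImpression, the slot number 2 (so it does not clash with the timestamp slot) and the variable name 'FirstImpression' are hypothetical, and locationLike stands in for window.location so the function can be tried outside a browser.

```javascript
// Sketch: remember the first page a visitor landed on, at visitor level.
// Names and slot number are hypothetical; only the two ga.js methods are real.
function tagFirstImpression(tracker, slot, locationLike) {
  var stored = tracker._getVisitorCustomVar(slot);
  // Keep the original landing path for returning visitors,
  // otherwise record the current path (without the domain).
  var value = stored || locationLike.pathname;
  tracker._setCustomVar(slot, 'FirstImpression', value, 1);
  return value;
}

// In the browser, usage could look like:
// _gaq.push(function () {
//   tagFirstImpression(_gat._getTrackerByName(), 2, window.location);
// });
```

ga.js caps the length of custom variable names and values, so very long paths may need truncating; check the limits in the custom variables guide before relying on this.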

Last Thought(s)

A short summary for this tricky approach...

Pros - Easy setup, quick to use, best for situations without server-side scripting or a technical background.
Cons - Just like other hacks, it's never a complete solution for a 100% proper Cohort Analysis. Please consider trying KISSmetrics or RJMetrics if you need one. :P

Last word: it's just a tricky hack for measuring cohorts and nothing fancy, actually. Do remember that it's never about how good you are at measuring stuff but about the actual work afterwards: learning and pivoting. Guess I might cover this more in the future.

Leave me a message below, whether good or bad; your involvement is always my motivation to continue writing. :)

Glad to have you as my reader, again.


Additional Stuff

Suggested Article
Pinterest Analytics - From Strategic Planning to Tactical Measurement (Trilogy)


  1. Dickson, best article, thanks!
    Why did you create the parameters I and W in the custom var?