
Create a Google Analytics implementation doc using R

You can get the code of this post, and a bit more, on GitHub: R_Google_Analytics_Doc

One of the most frustrating things you can face during a day is someone asking you for the definitions of a funnel that somebody configured a couple of years ago for 10 product categories. The only thing that can help you here is an implementation document. Oh, you bet you keep yours updated! (Side note: with the spread of Google Drive, people tend to create documents for fun; however, they rarely get to update them or even revisit them.)

God, you definitely need to have written down, for each account you handle:

  1. What you are measuring for
  2. What the configuration of each view is
  3. What special things you have opted into for each view

How are we going to do this fast?

After you run this code, you'll have an Excel file with all of the following configuration points noted down:

  1. Accounts
  2. Web Properties
  3. Views
  4. Goals
  5. Segments
  6. Custom Data Sources

The utility that enables us to get items 1-6 is the Google Analytics Management API.

The Management API (v3) exposes multiple Google Analytics configuration entities. Most are organized hierarchically:

  • Each user’s Account can have one or more Web Properties, each of which can have one or more Views (Profiles) or Custom Data Sources.
  • Each user’s Account can also have one or more Filters.
  • Views (Profiles) can have one or more Goals, Unsampled Reports or Experiments, and Custom Data Sources can have one or more Uploads. Unsampled Reports are available for Google Analytics Premium customers only.
  • There are also entities that allow users to manage user permissions by creating a User Link at the Account, Web Property, or View (Profile) level.
  • Similarly, AdWords Links can be constructed at the Web Property level.
  • Filters can be connected to a particular View (Profile) with a Profile Filter Link. A user can also define Segments, which are not hierarchically related to any of the other entities. This diagram represents the parent-child relationships among the entities:

[Diagram: Google Analytics Management API entity hierarchy]

Limitations of the Management API

There is some interesting stuff we cannot get with the API:

  1. Channel groupings
  2. Brand keywords
  3. Enhanced eCommerce settings
  4. Events
  5. Custom dimensions/metrics

to name a few.

Another R Google Analytics API package…

Let's go on and load a package that enables us to connect to the various Google Analytics APIs: RGA (all capitals, since there is another package with the lowercase name rga).

I will not go through the details on how to get authenticated on Google Console as it is well explained in the package repository.

The main functions we will need to create our implementation doc are

  1. get_accounts()
  2. get_profiles()
  3. get_segments()
  4. get_webproperties()
  5. get_custom_sources()
  6. get_goals()

Off to the real thing!

In the lines below we load the package and pass the client.id and client.secret to get authorised against the API. Once we have done this, we can get the views (or profiles, as we used to call them a few months ago).

# Initiate RGA ------------------------------------------------------------
# load the package
library(RGA)

# Connect to API ----------------------------------------------------------
# get an access token; client.id and client.secret come from the Google Console
ga_token <- authorize(client.id = "XXX.apps.googleusercontent.com",
                      client.secret = "XXXXXXXX")

# Get the accountIDs ------------------------------------------------------
# get the unique accountIds (assuming get_accounts() returns a data frame
# with an id column, as in the RGA versions of this era)
accounts <- unique(get_accounts()$id)
head(accounts)
# [1] "183263"   "431217"   "4476693"  "7369781"  "22603640" "27175492"

Now that we have the unique accountIds, we can iterate over them (say, in a for() loop) and get the elements we need.

We will create an Excel file for each account, since this is the standard file businesses use. We could also change this to work on a website (URL) level, but let's assume that things are organised under one accountId per client. I will wrap it with an introductory sheet that you can edit to show more of your documentation style. It will be as plain as:

This is a technical report of your Google Analytics Implementation
Updated at : D/M/YYYY h:mm

The code to produce the result of this process is below

# Start getting the information! ------------------------------------------
require("dataframes2xls")

# Add an intro sheet
intro <- data.frame(report = c(
  "This is a technical report of your Google Analytics Implementation",
  paste("Updated at :", format(Sys.time(), "%d/%m/%Y %H:%M"))))

for (i in 1:length(accounts))
{
  # argument names follow the RGA conventions of this era and may differ
  # slightly between package versions
  tmp_goals          <- get_goals(account.id = accounts[i])
  tmp_profiles       <- get_profiles(account.id = accounts[i])
  tmp_webproperties  <- get_webproperties(account.id = accounts[i])
  tmp_segments       <- get_segments()
  tmp_custom_sources <- get_custom_sources(account.id = accounts[i])
  # Let's write the results to xls files, one sheet per data frame
  write.xls(c(intro, tmp_webproperties, tmp_profiles, tmp_goals,
              tmp_segments, tmp_custom_sources),
            file = paste0("ga_doc_", accounts[i], ".xls"))
}

Is that all there is to document?

Well, if you have everything near at hand you can do something for yourself as well, like getting all the data together and checking whether anything is misconfigured (or "forgotten"). You can do this easily by creating a master table over all accountIds, as in the following lines.

# We will use gdata to read the excel files
require("gdata")
# Now we can merge them to run some diagnostics. Adapted from
# http://psychwire.wordpress.com/2011/06/03/merge-all-files-in-a-directory-using-r-into-a-single-dataframe/
setwd("./reports")

# list the files and keep only the Excel ones
file_list <- list.files(pattern = "\\.xls$")

for (file in file_list)
{
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset"))
  {
    dataset <- read.xls(file, sheet = "Sheet2")
  }
  # if the merged dataset does exist, append to it
  # (note: else, not a second if, so the first file isn't read twice)
  else
  {
    temp_dataset <- read.xls(file, sheet = "Sheet2")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}

Now, with a simple plot we can find things that are going wrong (e.g. an e-commerce setting that is OFF where it should be ON).
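As a minimal sketch of such a diagnostic (the column names websiteUrl and eCommerceTracking are assumptions; check what your export actually contains):

library(ggplot2)

# one bar per web property, coloured by the e-commerce flag of its views;
# websiteUrl and eCommerceTracking are assumed column names
ggplot(dataset, aes(x = websiteUrl, fill = eCommerceTracking)) +
  geom_bar() +
  coord_flip() +
  labs(title = "E-commerce tracking per view",
       x = "Web property", y = "Number of views")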

The end is the beginning

This covers only the basics of Google Analytics documentation. There is a lot of stuff you might consider adding to this document, like:

  1. The actual business questions that the implementation aims at
  2. KPIs you are trying to optimise for
  3. Definitions of the technical side of the tracking (what values are passed, when an event is triggered, etc.)
  4. A sheet with the change history. You can get this information from the Google Analytics Admin console.

A Roll-up GA dashboard using Google Sheets & Apps Script

If you manage a lot of web properties then you have surely run into the problem of replicating the same report across all of them. Having an aggregation of web properties (a roll-up account) is the most convenient option, but there are cases where you cannot actually implement it, or where implementing it would require a significant cut-back on the data points sent to the roll-up, since Google Analytics comes with quota limitations. It is always advisable to check your current utilisation; you can do it using this tool. As always, reality is not ideal, so a roll-up account might not be available. In this case our fallback data retrieval mechanism is the Google Analytics API. We have described ways to access the API using R in two posts: Google Analytics + R = FUN! and Basic A/B Testing plots and stats with R. In the context of this post we need a simple dashboard that tracks the conversion rate of a handful of web properties across channels, so we will pursue this using Google Apps Script, under the notorious name Magic Script.

Getting the Magic Script

Open the Script Editor in a Google Spreadsheet, clear the code that appears, and paste the Magic Script code found in this gist. Note: use this gist, because it contains some modifications compared to the initial Magic Script from the GA team.

If you haven’t used the Google APIs in apps scripts before go through the following steps to enable the Google Analytics API:

  • In the script editor go to: Resources > Use Google APIs
  • Turn ‘Google Analytics API’ to ON
  • Click the link to Google APIs Console
  • Turn ‘Google Analytics API’ to ON
  • Accept the terms of service
  • Close the ‘Google APIs Console’ window
  • Click OK in the script editor window and close that window
  • Back in the spreadsheet, the ‘Google Analytics’ menu should now be working. (You may need to re-authenticate.)

Now a new tab should appear next to 'Help', named Google Analytics. If you click it you will see that you can create not only basic report configurations but Multi-Channel Funnel reports as well.

Define a report

We will track the conversion rate for the last 30 days. This means that we will not store data; we will only use the fact that the API can query based on relative dates. Let's create a core report:

query1         value1
type           core
ids            ga:XXXXX
start-date
end-date
last-n-days    30
metrics        ga:transactions,ga:visits
dimensions     ga:date
sort
filters
segment
start-index
max-results
sheet-name     Website_A_data

The report you define here will be replicated for all web properties you have. To replicate it, simply drag the first row until the auto-increment of queryX,valueX reaches the number of web properties, and name each target sheet to reflect its web property. Then you need to create an intermediate sheet to aggregate the data, since the data on the target sheets is refreshed on every run (each run clears all contents of the sheet), and in the future you might add more dimensions to the query, which will result in more columns. Essentially, in a new sheet we aggregate data from the query-fed sheets, like below.

Tip: Create the first formula and then select Show all formulas from the View menu. Then simply Find & Replace the sheet name in the respective web property column and you are done!

=UNIQUE(website_A_data!$A$12:$A$150)  website_A                 website_B                 …  website_N                 Total
1/31/2014                             =sum(website_A_data!B13)  =sum(website_B_data!B13)  …  =sum(website_N_data!B13)  =sum(B2:G2)
2/1/2014                              =sum(website_A_data!B14)  =sum(website_B_data!B14)  …  =sum(website_N_data!B14)  =sum(B3:G3)
2/2/2014                              =sum(website_A_data!B15)  =sum(website_B_data!B15)  …  =sum(website_N_data!B15)  =sum(B4:G4)
2/3/2014                              =sum(website_A_data!B16)  =sum(website_B_data!B16)  …  =sum(website_N_data!B16)  =sum(B5:G5)

Create blocks like the one above, one for each metric you defined in the report. This means you will make another table as above with source data sheet-name!C13, which will give you a table with the visits. Creating row totals then gives daily totals across all web properties, and column totals give period-wide totals for each web property. Lastly, a sum over all cells in a block gives the total visits/conversions/dimensionX for the period, summed over web properties. Dividing the block totals gives a period-wide conversion rate over all web properties, and a cell-by-cell division gives the conversion rate of a specific web property for a specific day, as sketched below.
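For instance, a minimal sketch of those divisions, with hypothetical sheet and cell references (a transactions block and a visits block laid out identically):

=SUM(transactions!B2:G31)/SUM(visits!B2:G31)   period-wide conversion rate over all web properties
=transactions!B2/visits!B2                     conversion rate of website_A for a single day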

How to fight Execution Errors

Google Spreadsheets have a time limit of around 5 minutes when running scripts, so if you have a query-heavy spreadsheet you might get execution errors. The solution is to break the queries down into more spreadsheets and then import the data into your dashboard (the master spreadsheet) using the importrange() function, which is magical by all means (see the sketch after the next formula)! However, if an error does occur (let's assume it occurs in the last query), you can let viewers of the dashboard know with a message in a cell, using the following if() statement. The 0.019 is in days, roughly 27 minutes, so the check flags a last-run timestamp (assumed here to live in B2) that is older than about half an hour.

=if(now()-B2>0.019,"Beware : Execution Error in the last run!","")
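And a minimal importrange() sketch, with a hypothetical spreadsheet key and range:

=IMPORTRANGE("SPREADSHEET_KEY_OF_THE_QUERY_SHEET","Website_A_data!A12:B150")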

Dashboard without sparklines?

No sir! In case you didn't know, Google Spreadsheets have a sparkline() function to use in the dashboards you design; sparklines are a great source of eye-catching information for spotting ups and downs in the time trends of your metrics and KPIs. Make sure that you don't exaggerate with sparklines like I do…

Sparkline of Type I : Line

Line sparklines show you time evolution, for your conversion rates, traffic, qualified visits etc. Look for peaks and lows, or relatively flat lines.
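A minimal sketch, assuming the last 30 daily values sit in B2:B31 (line is the default chart type):

=SPARKLINE(B2:B31)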

Sparkline of Type II : Bar

Bars are really nice for showing relative performance, say today vs yesterday (or vs the same day last week). To use this, the formula is simply

=SPARKLINE(B6:C6,{"charttype","bar"})

In the end

The following is a sample dashboard with a few runs on my personal site and some other blogs I have access to. It's not really an e-commerce conversion rate but rather the goal conversion rate of a contact form submission. Take a look and get some ideas on how to create more for your personal dashboard.

There is no end, if there is no tracking code added…

The Magic Script you got above is modified so that it lets you track the usage of the spreadsheet (using pageviews) and the execution of the getData function (with event tracking). Simply open the Script Editor and replace the UA number in the script with your own. I suggest having a separate Google Apps web property and using a content grouping per Google App (Spreadsheets, Docs, Forms etc.) to track utilisation of your Drive creations. It is really exciting to see how (and if!) people are actually using the docs you shared, like in the screenshot below.

Happy dashboarding!

Basic A/B Testing plots and stats with R

Have you used Google Analytics Content Experiments?

I've been using it lately, and the most useful thing is that all the data are integrated into the API, so you can go beyond the standard reports and do what you want. You can pull the data into your favourite data analysis tool and get more things done. In the following I simulate an example for an e-commerce site, where I am interested in checking (among others):

  • did the proposed variation increase the funnel entrances?
  • did the feature uplift the Average Order Value (AOV)?
  • did the conversion rate (sales-wise) improve?

We can easily accomplish this using the Google Analytics API and R. We have previously found a way to get data off Google Analytics using R (see: Google Analytics + R = FUN!).
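As a sketch of the last question, a two-proportion test on conversions per variation; the counts below are illustrative, not real experiment data:

# illustrative counts, not real experiment data
conversions <- c(original = 180, variation = 212)
visits      <- c(original = 4500, variation = 4480)

# two-proportion test: did the variation improve the conversion rate?
prop.test(conversions, visits)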

[Figure: ab.test.comparison.Rplot]

The code that gets the above chart into a presentation of yours is hosted over at GitHub. We source the multiplot.r file in order to be able to add multiple ggplot plots within a loop (see this).

The repo contains:

  • initiate_RGoogleAnalytics.r
  • multiplot.r
  • plot_experiments_across_profiles.r

Google Analytics + R = FUN!

The scope of this post is to show how simple it is to get data out of Google Analytics and create your own reports (which, you hope, can be at least semi-automated) and your favourite statistical graphs (those that GA is currently missing). As you already know, R is a favourite tool of mine, so it will be the main tool to get the data, reshape them and depict them. You will need elementary knowledge of the R language, and in the end you'll soon realise that a Google search covers more than 40% of the code polishing your code will ever need…

R packages

There are two packages (or libraries) that connect to the Google Analytics API and return data to you: the older one is RGoogleAnalytics and the new champion is rga. Both are excellent and I've used both on various occasions. rga seems a bit nicer, but RGoogleAnalytics is certainly more robust and works on all occasions. Apart from the core, I will use ProjectTemplate for my personal organisation (you won't see it, however) and ggplot2 for graphics.

Google Authentication

First of all go to Google’s API Console and create a new API application after you make sure that you have Google Analytics Service enabled.

<Pause>

Because the lengthy scripts would ruin the flow of the post, I have created a GitHub repo where all the scripts reside [zip].

Ready, Set, Goooo!

Now that the API access is set up on the one side, we should make the connection to R. You should already know that RCurl is a bit tricky, as I have outlined in the A tiny RCurl headache note. The solution proposed there is applied here as well. Note that this issue will be solved in the next release of rga; on the other hand, RGoogleAnalytics seems to be already on the spot. Have in mind that using

ssl.verifypeer = FALSE

isn't the most secure way to handle network communication in R. You can use the following to create a connection to the API [rga_initiate_API_connection.R]. This is heavily copy-pasted from Randy Zwitch's (not provided): Using R and the Google Analytics API post.

One issue is how to get the Profile IDs. The hard way would be to go to the Query Explorer and cycle through all profiles, writing down the IDs that you are interested in. However, luck is on your side, as there is a function that will return all the profiles that the account tied to your API access can reach. (By the way, it is excellent that the rga package provides access to the Management API.)
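A minimal sketch with the rga package (the instance name "ga" is the package's convention; the first run opens a browser window for OAuth):

library(rga)
rga.open(instance = "ga")

# all profiles the authorised account has access to
profiles <- ga$getProfiles()
head(profiles)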

In the next I will assume that you have defined the ids that you are interested in.

The main hypothesis that I want to get a taste of is whether the different post categories (e.g. measure, statistics, music etc.) have different load times. This will be interesting given that the categories don't all carry the same load burden (images vary, YouTube videos, scripts). To achieve this you will need a filters vector to loop over, giving appropriate names in the page.group vector (see the sketch after this list). Note that we have created extra metrics:

  • e-commerce rate : this is not meaningful in the case of a blog, but if you are advanced in analytics you might have implemented goals as e-commerce events as B. Clifton suggests.
  • bounce rate : the bounce rate should be correlated to the page load
  • buckets of page load time : we use a 4-second range for each bucket to be consistent with the Apdex standard.
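A hedged sketch of that loop, using the rga instance opened above; the filters, page.group values and metric names are illustrative assumptions, not the exact vectors of the repo scripts (ga:visitBounceRate is the v3 name of the bounce rate metric):

# category filters and their labels (illustrative values)
filters    <- c("ga:pagePath=~/measure/", "ga:pagePath=~/statistics/", "ga:pagePath=~/music/")
page.group <- c("measure", "statistics", "music")

results <- data.frame()
for (i in seq_along(filters)) {
  tmp <- ga$getData(ids, start.date = "2013-01-01", end.date = "2013-12-31",
                    metrics = "ga:pageviews,ga:avgPageLoadTime,ga:visitBounceRate",
                    dimensions = "ga:date",
                    filters = filters[i])
  tmp$page.group <- page.group[i]
  results <- rbind(results, tmp)
}

# bucket the page load time in 4-second ranges, consistent with Apdex
results$load.bucket <- cut(results$avgPageLoadTime, breaks = seq(0, 60, by = 4), right = FALSE)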

Because I want more metrics (11) than a single query allows, I use another query in the loop to get the rest and then merge them, as in the sketch below. Now, if you run all these scripts, you will end up with a data frame like the one printed at the end of the script using the head() function.
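A sketch of that second query and the merge, continuing the loop above (the metric names are placeholders):

  # inside the same loop: fetch the remaining metrics with a second query
  extra <- ga$getData(ids, start.date = "2013-01-01", end.date = "2013-12-31",
                      metrics = "ga:visits,ga:avgServerResponseTime",
                      dimensions = "ga:date",
                      filters = filters[i])
  # merge on the shared dimension
  tmp <- merge(tmp, extra, by = "date")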

Enough with scripting!

Now, that the data are on our console we can finally get some graphs. The following histogram is the aggregated page load speed histogram of this blog. You should note that there is a significant volume of sample units that belongs to the 12-16 bucket. I have the suspicion that they also belong to a specific country group as well as the host is providing good page load timings in the US and Western Europe. (Note to myself : I should add the ga:country dimension in the second query run).

[Figure: Rplot, aggregated page load time histogram]

OK, this is not a nice picture at all! I know that I have experimented with various analytics scripts in the last months, plus in the first 3 months of 2013 I was using a significantly heavier WordPress theme, but I still think the sample is skewed by the geographic distribution of the readers (a new post will come soon on this!).

[Figure: Rplot01]

Extend the script to your needs

In a modification of the script above, I can use a loop over the web properties that I have access to, and use R to store data and create a roll-up report in a fast way. If you look at the comments section of the scripts you will notice the following:

# In the future we should only get data for incremental dates. Don't we?
# (using max(), since the next run should start from the latest stored date)
get.start.date <- max(final_dataset$date)

I use this to incrementally query and store data in the final_dataset data frame (this will help with the sampling that I would run into the first time I query a long period). I am pretty sure a cron job could be streamlined here; however, I have no idea about cron jobs…
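For the record, a hypothetical crontab entry that would run the collector daily at 06:00 (the script path is made up):

# m h dom mon dow  command
0 6 * * * Rscript /path/to/collect_ga_data.R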

Head now to https://github.com/IronistM/R_Google_Analytics !


Purge Demographic data in Analytics using Adwords

One of the most useful but rarely accomplished data integrations for your web analytics tool is demographic data (in this case the tool is Google Analytics). There are a lot of approaches, but it turns out that there is one that may have been in front of your eyes from the very beginning…

The data provider

We will get data from the Google Display Network, so we will need a campaign running on the GDN. If you don't know how to set one up, it's pretty straightforward: just set your ad campaign to “Display Network only” or “Search & Display Networks – All features.”

Add demographic targeting to an ad group

Now that you have a campaign set up, you can add an Audience to run on. There is more than one type of Audience, as you can see in the AdWords help. We are interested in the Demographic option, so we will create an Ad Group that specifically serves us traffic by age group. I will copy-paste the AdWords instructions below:

  1. Select the ad group to which you’d like to add demographic categories.
  2. Click the +Change display targeting button. Click Age to update your demographic targeting by age.

    [Screenshot: Change display targeting button]

  3. In the “Age” section, click Edit.
  4. Select an option below:
    • All ages
    • Specific ages
      • 18-24
      • 25-34
      • 35-44
      • 45-54
      • 55-64
      • 65 or more
      • Unknown

    [Screenshot: Demographic categories for age]

  5. Click Done.
  6. Click Save when you’ve finished updating all of your targeting settings.

After this, we have an Ad Group to start with. Now, the tricky part: you need a budget… ;x

Spot data in Google Analytics

Where can you find the data in Google Analytics? You have likely noticed that GDN campaigns send information to Google Analytics in the Keyword dimension, such as:

1. (remarketing/content targeting)
2. (content targeting)
3. (not set)
4. (category matching)

Obviously these are not keywords… Among these you will also find keywords that are exactly the age groups you selected in your Ad Group targeting. To check, you can use the following regex filter in the AdWords Keyword report, et voila!

^(..)to(..)$
   Keyword   Visits   Contribution
1. 25to34     2,328         36.61%
2. 18to24     1,728         27.17%
3. 35to44     1,395         21.94%
4. 45to54       908         14.28%

Segment and slice at will

Obviously the above is only of descriptive value; the real thing starts when you apply this as a segment (remember, segmentation provides insight) or apply a segment to this report. Imagine how insightful this will be when you are testing a new feature that you think will change the way visitors interact with your website or engage with your upper conversion funnel steps. I would be happy to have a budget set for traffic of this structure to evaluate my landing pages…

PS: A similar approach using Facebook PPC data is outlined in the post Segment Google Analytics data by age and gender by Junta Sekimori


Back up your WordPress Installation on Google Drive

So, I got lured into setting up a back-up strategy for the present blog, mostly because I change a lot of things at times while trying to learn about WordPress. I could probably do this via a scheduled job on my hosting provider, but I am not that techy, so I searched for relevant plug-ins, and there are a ton (see this list). I should stress that I am getting extremely fond of Google Drive at the moment, so I narrowed the list to those plug-ins that are capable of storing the back-up on my Drive.

Confession : I am not sure if this is optimal, though..

From my shortlist I preferred UpdraftPlus. It claims to simplify backups and restoration. You can back up copies of the WordPress database to the cloud (Amazon S3, Dropbox, Google Drive), to FTP, or to email (the latter is of little practical use if you have a large uploads folder; anything over ~20MB will probably get cut by the email provider).

The main options are below. You can configure the frequency of the backup (the shortest is 4 hours) and the number of backups that will be retained (1 is enough for me). Next, check what should be backed up; I back up everything :) You can set an email address to alert you of possible issues (if you don't, you can always access the log of the process). Last, you can set an encryption password for use when you need to restore your WordPress database.

Configure Backup Contents And Schedule

  • File backup intervals, and how many backups to retain.
  • Database backup intervals, and how many backups to retain. If you would like to automatically schedule backups, choose schedules from the dropdowns. Backups will occur at the intervals specified, starting just after the current time; if the two schedules are the same, the two backups will take place together. If you choose “manual”, you must click the “Backup Now!” button whenever you wish a backup to occur.
  • Include in files backup: Plugins, Themes, Uploads, and any other directories found inside wp-content (with the option to exclude directories). Include all of these unless you are backing them up outside of UpdraftPlus. These directories are usually everything, except for WordPress core itself, which you can download afresh from WordPress.org; if you have made customised modifications outside these directories, you need to back them up another way. (Use WordShell for automatic backup, version control and patching.)
  • Email: enter an address here to have a report sent to it (and the whole backup, if you choose).
  • Database encryption phrase: if you enter text here, it is used to encrypt backups (Rijndael). Make a separate record of it and do not lose it, or all your backups will be useless. Presently, only the database file is encrypted. This is also the key used to decrypt backups from the admin interface (so if you change it, automatic decryption will not work until you change it back). You can also use the file example-decrypt.php from inside the UpdraftPlus plugin directory to decrypt manually.

Next you can select where you will store the backups; as explained, I chose Google Drive, so I must authorise UpdraftPlus to access my Drive. If you haven't done this before, it's a little awkward at first sight, but it's easy. The details are really clear in the links provided below. Essentially you need to create a New Project (say “WP-Back Up”), generate a Client ID, and copy it and the Client Secret into the console. Then go to the folder where you will store the backups and copy the part of the URL that comes after /#folders/.

Choose your remote storage: Google Drive.

Google Drive is a great choice, because UpdraftPlus supports chunked uploads: no matter how big your blog is, UpdraftPlus can upload it a little at a time and not get thwarted by timeouts. For longer help, including screenshots, follow the link in the plugin settings; the description there is sufficient for more expert users. Follow the link to your Google API Console and create a Client ID in the API Access section, selecting ‘Web Application’ as the application type. You must add http://www.yourdomain.com/wp-admin/options-general.php?page=updraftplus&action=updraftmethod-googledrive-auth as the authorised redirect URI (under “More Options”) when asked. N.B. If you install UpdraftPlus on several WordPress sites, you cannot re-use your client ID; you must create a new one from your Google API Console for each blog.

After all is set up, you will need to enable the service via the WordPress plug-in page, and you will start seeing the backups in your Drive folder like this…

[Screenshot: statsravingmad.com backups in Google Drive]

Hope this was helpful. May the Power be with you!


Track the leads to your 404 pages

A common issue when you have a lot of interlinking in your content is that at some point you will either delete a page or mistype a link URL (you soooo rush to push the Publish button). There are two ways to track the page that led to the 404 mishap: the first is to use the Navigation Summary, the second is to use Event Tracking. If you are not familiar with event tracking, it is a Google Analytics gem that you can use to measure interactions with elements of your site and much more. To implement event tracking, you will need to add a code snippet like the one below to the 404 page of your site, to record the page that led to the 404.
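A minimal sketch of such a snippet, assuming the classic (ga.js) event tracking syntax of the era; it records the referring page as the Event Action, under the Event Category 404:

<script type="text/javascript">
  // Event Category "404"; Event Action = the page the visitor came from
  _gaq.push(['_trackEvent', '404', document.referrer]);
</script>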

Then you will see in your Event Reports the Event Category 404.

Drilling down to the Event Action will reveal the page that led to the 404 page. This way you can spot the page and possibly the broken link (most 404 pages stem from broken links or users typing a wrong URL).

Now I should go and fix the broken link in the Software page of mine : )

NOTE: If you are using WordPress, you will find the previous page appended to the 404 URL, like this, where the homepage is generating the error:

/404.html?page=/blog/