Over the last few months, Google Analytics has been the victim of a lot of spam attacks in the form of “pageviews” that never actually happened. Some of the main websites running these attacks are “sexyteens.hol.es”, “buy-cheap-online.info”, “free-share-buttons.com”, “Get-Free-Traffic-Now.com”, “darodar.com”, “webmaster-traffic.com”, and many more…
These practices damage our Google Analytics views by showing us data that is not real. “What are you talking about, Alan? Why would you say that? Is Analytics lying to us? Why? Why, Google, why???” Yes, I mean it, and I repeat: those visits you are seeing are not real.
A specter is haunting Analytics
This is how they look in our “Pageviews” report:
If we set “Source/Medium” as our secondary dimension, we will see the craziest sources ever:
How can we make sure that we aren’t really receiving this precious traffic on our site? How can we know it’s 100% spam?
Simple: let’s select “hostname” as the secondary dimension:
That column should always say “yoursite.com”, or “subdomain.yoursite.com”, or whatever hostname your website is served from. However, when we are looking at ghost visitors, we will see a lot of “(not set)” and other websites as the hostname:
This is the final proof that your Google Analytics property is measuring ghosts.
The how-to: abuse of the Measurement Protocol
With the release of Universal Analytics, Google added one of its most interesting features ever: it lets us record in GA not only the actions users take on our website, but also the ones that happen offline, clearing the way for a whole new universe of possibilities. This feature is called the Measurement Protocol. With it, we can send data to our account directly through HTTP requests from our own servers.
However, this protocol has no authentication mechanism (like a unique token) to prevent others from sending data to our account, so anybody can send information to any Analytics ID, which by the way is public for every account on the internet (it sits in the tracking code of every page). Many shitty sites take advantage of this to inject their annoying, low-quality advertising into our dear reports.
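To make the missing authentication concrete, here is a minimal Python sketch of what a Measurement Protocol pageview hit looks like as a plain form-encoded string. It only builds the payload and prints it; nothing is sent anywhere. The function name, the “UA-XXXXX-Y” placeholder ID, the client ID and the “spam.example” referrer are all made up for illustration. The parameter names (v, tid, cid, t, dp, dh, dr) are the ones used by version 1 of the protocol.

```python
from urllib.parse import urlencode


def build_pageview_hit(tracking_id, client_id, page_path, hostname=None, referrer=None):
    """Return the form-encoded body of a pageview hit. Nothing is sent."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the public Analytics ID; no secret is involved
        "cid": client_id,    # arbitrary client identifier chosen by the sender
        "t": "pageview",     # hit type
        "dp": page_path,     # page path that will show up in the reports
    }
    # The hostname is optional, which is why ghost hits often show "(not set)".
    if hostname is not None:
        params["dh"] = hostname
    if referrer is not None:
        params["dr"] = referrer
    return urlencode(params)


# Hypothetical values: a placeholder ID and a made-up spam referrer.
payload = build_pageview_hit("UA-XXXXX-Y", "555", "/fake-page", referrer="http://spam.example/")
print(payload)
```

Notice that the payload contains no token, password or signature: the tracking ID is the only thing that identifies the account, and the hostname can simply be left out.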
The best way to solve this is to create a filter that only includes in your view the sessions whose hostname is one of your own. Caution: once a filter is active, if you set it up the wrong way by mistake, the sessions it leaves out can’t be recovered. To make sure you are doing it correctly, I recommend you first take a look at your “hostname” report in Analytics and build a filter that includes all your valid hostnames. Some examples: yoursite.com, subdomain.yoursite.com, AnotherSiteOfYoursThatForSomeReasonYouAreAlsoTrackingInTheSameView.com.
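Since sessions dropped by a wrong filter are gone for good, it can help to rehearse the filter offline first. The following Python sketch assumes you have copied the rows of your “hostname” report into a list of (hostname, sessions) pairs; the function name and the numbers are invented for the example, and this is not how Analytics itself applies filters, just a way to preview the effect.

```python
# Hostnames you consider your own (taken from the examples in the text).
VALID_HOSTNAMES = {"yoursite.com", "subdomain.yoursite.com"}


def split_hostname_report(rows, valid_hostnames):
    """Split (hostname, sessions) rows into what an include filter keeps and drops."""
    valid = {h.lower() for h in valid_hostnames}
    kept, dropped = {}, {}
    for hostname, sessions in rows:
        target = kept if hostname.strip().lower() in valid else dropped
        target[hostname] = target.get(hostname, 0) + sessions
    return kept, dropped


# Made-up report rows, only to show the shape of the data.
report = [
    ("yoursite.com", 120),
    ("(not set)", 45),
    ("free-share-buttons.com", 30),
    ("subdomain.yoursite.com", 12),
]

kept, dropped = split_hostname_report(report, VALID_HOSTNAMES)
print("kept:", kept)
print("dropped:", dropped)
```

If one of your real hostnames shows up under “dropped”, add it to the list before touching the filter in Analytics.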
That being said, let’s click on “Admin” in the top menu, and then select the “Filters” option at the view level (third column):
Let’s click on “New Filter”:
And configure it this way:
In this example, I only have one valid hostname: “mysite.com”. If you have several, adjust this filter’s options to include all of them. If you have subdomains, you should change “that are equal to” to “that contain”.
That’s it! Click on “Save” and, from now on (this doesn’t remove previous ghost visits), you can say goodbye to the Analytics ghosts 😉
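If you have several hostnames, one option is a single regular expression that covers all of them. The sketch below is an assumption on my side, not the predefined filter shown above: it builds an anchored pattern that you could try in a custom include filter on the hostname field, and tests it locally with Python’s own regex engine. The function name and the `include_subdomains` flag are made up, and the regex dialect in Analytics may differ slightly, so verify the pattern there before saving.

```python
import re


def build_hostname_pattern(hostnames, include_subdomains=False):
    """Build one anchored regex that matches only the given hostnames."""
    # Escape each hostname so the dots are matched literally.
    alternatives = "|".join(re.escape(h.lower()) for h in sorted(hostnames))
    # Optionally allow any subdomain in front, e.g. blog.mysite.com.
    prefix = r"(.+\.)?" if include_subdomains else ""
    return "^" + prefix + "(" + alternatives + ")$"


pattern = build_hostname_pattern(["mysite.com"], include_subdomains=True)

# Try the pattern against hostnames like the ones seen in the reports.
for candidate in ["mysite.com", "blog.mysite.com", "(not set)", "free-share-buttons.com"]:
    verdict = "keep" if re.match(pattern, candidate) else "drop"
    print(candidate, "->", verdict)
```

Anchoring the pattern is stricter than a plain “contains” match: a hostname that merely has your domain somewhere in the middle of it does not get through.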