How To Fix Referral Spam In Google Analytics

Are you receiving a lot of referral spam in Google Analytics? Are you worried your reports may be tainted by it but aren’t quite sure?

In this post, we’re going to cover a couple of different methods you can use to block referral spam in your reports. We’re primarily going to focus on accomplishing this with one filter.

First, let’s talk about what referral spam is and why it’s something you want to avoid.

What is referral spam?

Referral traffic is traffic that does not originate from search engines (organic traffic) or from users who visit your website by entering its domain in their address bars (direct traffic).

Examples of referral traffic include visits sent from social media sites or from another site linking to yours.

A hit is a single interaction Google Analytics records when someone uses your website. Hits are recorded as pageviews, events, transactions and more. Referral spam generates fake hits that mostly originate from bots or fake websites.

Every website with a Google Analytics account has its own tracking code that identifies it. This is why you’re required to add the Google Analytics script to your site’s files in order to have the service record traffic data and user behavior for your site. This code is typically placed in the header, though it’s much easier to add it via a plugin.

When a legitimate user visits your website, your server delivers the page, and the tracking script on that page sends the visit data over to Google Analytics.

A common form of referral spam is known as “ghost spam.” Attackers use automated scripts to send fake traffic to random Google Analytics tracking codes. When these fake hits are sent to your code, they are recorded in your analytics even though the traffic never reached your site.

Sometimes fake referrals come from malicious crawlers. Traffic sent via this type of referral spam does go through your server, but it ignores the rules in your site’s robots.txt file in the process. The traffic is then sent over to Google Analytics and recorded as a hit.

How to spot referral spam in Google Analytics

You can find referral spam alongside the other referrals Google Analytics records for your site. You’ll find these by going to Acquisition → All Traffic → Referrals.

Some spam websites are easy to spot. They’ll typically have odd domains with unprofessional names, phrases like “make money” or references to adult content in them.

They may also have a lot of hyphens or use nonstandard domain extensions. Other spam referrals aren’t as easy to spot, so you’ll need to use alternative methods.

By the way, make sure you use a custom range when looking at your referrals in Google Analytics. Set it to view the last two months at the very least, but you can go back as far as you wish. Just note that the further you go back, the more data you’ll need to sift through.

Because hits in the form of ghost spam do not originate from your site’s actual server, they’ll typically have bounce rates of 100% and sessions lasting 0 minutes and 0 seconds. Click the Bounce Rate column to sort the data by highest bounce rates first to make things easier on yourself.
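If you export your Referrals report, the same check can be sketched in a few lines of Python. This is only an illustration: the ReferralRow fields and the sample sources below are hypothetical and do not reflect the column names of an actual Google Analytics export.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    # Hypothetical fields standing in for columns of an exported Referrals report.
    source: str
    sessions: int
    bounce_rate: float          # percentage, 0-100
    avg_session_seconds: float


def likely_ghost_spam(rows):
    """Return sources with a 100% bounce rate and zero-second sessions."""
    return [
        row.source
        for row in rows
        if row.bounce_rate >= 100.0 and row.avg_session_seconds == 0
    ]


rows = [
    ReferralRow("news.example.org", 42, 38.1, 95.0),
    ReferralRow("free-traffic.example", 17, 100.0, 0.0),
    ReferralRow("partner.example.net", 3, 100.0, 12.0),
]

suspects = likely_ghost_spam(rows)
print(suspects)  # ['free-traffic.example']
```

A source flagged this way is only a suspect. Real visitors who leave immediately can produce similar numbers, so check each one before filtering it.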

Crawler spam is much harder to detect as these bots do visit your site, so they typically use valid URLs and have accurate bounce and session data. If you think a source URL in your referral reports is spam, don’t visit the site to confirm it.

Instead, run it through a Google search by surrounding it in quotes (“google.com” for example) to see if it’s been reported as spam.

If you do visit these sites, make sure you’re using the latest versions of browsers like Chrome and Firefox, both of which have safeguards in place to protect you from malicious sites. Make sure your computer or device also has live antivirus software installed and active on it.

Why is referral spam bad?

The Referrals report isn’t the only place where data from referral spam seeps into Google Analytics. You’ll find it throughout your reports, particularly in the master view, where the total number of hits for your site or individual pages is shown.

If your reports are tainted by hits that do not represent real people, you may wind up making misguided marketing decisions that lead to campaigns that either don’t take off or don’t earn revenue.

Although Google has done a lot to stop referral spam from affecting your data, it remains a common occurrence that affects the majority of sites on the web.

You should always choose a quality host, use a security plugin if you don’t use a managed WordPress host, and only install themes and plugins from trusted sources. Even so, you can’t do much to deter spam, since spammers either don’t attack your site directly or have ways to make their traffic look legitimate.

That’s why we’re going to show you how to fix referral spam by filtering it in Google Analytics.

How to fix referral spam in Google Analytics

Filters in Google Analytics are permanent, and filtered data cannot be retrieved. This is why you should always create an unfiltered view for your site, as it lets you see data that may have been incorrectly filtered out. It also helps you monitor the amount of spam your site receives even after you apply filters to remove it.

Creating an unfiltered view for your site’s Analytics account is easy. Start from the Admin screen (the Admin button is located at the bottom, left-hand corner), and click View Settings under the View panel (right-hand panel).

Start by renaming your current view, which is called “All Web Site Data” by default, to “Master View” by changing the name in the View Name field. Click Save.

If you scroll back up to the top, you’ll see a button toward the top right-hand portion of the screen labelled “Copy View.” Click it, name the new view “Unfiltered View,” and click Copy View to confirm it.

You may also want to go back to Master View and repeat this process to create another view called “Test View.” You can use this view to test new filters before applying them to the master view.

You now have an unfiltered, and possibly test, view in Google Analytics. If you applied filters to your master view, remove them from the unfiltered and test views. If you didn’t, you’ll receive a notification about redundant views from Google Analytics, which you can safely ignore.

Fixing ghost referral spam with a single filter

You’ve already identified spam URLs in your referral reports. Many webmasters go right ahead and create filters to block these URLs from appearing in their reports.

Unfortunately, spammers rarely use a single source name in their attacks, which means you’ll need to create new filters continuously to block any subsequent spam that appears in your reports.

What you should do instead is create a filter that only includes data from real hostnames.

Behind every domain is the computer and network it’s attached to, which can be identified by an IP address. These IP addresses are given unique “hostnames” to identify them with easy-to-remember alphanumeric names.

The prefix “www” is a hostname as is every domain on the web since they’re both connected to computers or networks with IP addresses.

Ghost spam is sent to random Google Analytics tracking codes rather than the hostnames linked to your site, so it uses fake hostnames instead. This means it’s much more effective to filter out referrals that use fake hostnames.

The filter we’re going to create will also remove fake hits created by fake hostnames in your keyword, pageview and direct traffic reports.

Creating a regular expression for your filter

We’re going to create a filter that only includes hits from valid hostnames as a way to exclude fake ones. This means you’ll need to create a list of valid hostnames associated with your site.

If you have filters applied to your master view, switch to the unfiltered view you created earlier. You’ll find hostnames identified by Google Analytics by going to Audience → Technology → Network and switching the primary dimension to Hostname.

Here’s a list of the types of hostnames you’ll want to include in your reports:

  • Domain – This is the primary hostname used to identify your site on the web and the one legitimate referrals will pass through, so it needs to be included. You can ignore any of the subdomains you’ve created as they’ll be covered by your main domain.
  • Tools & Services – These are tools you use on your website and may have linked to your analytics account to collect data for campaigns. They include your email marketing service provider, payment gateways, translation services and booking systems. External tools you’ve integrated into your account, such as YouTube, count as well.

Make a list of all of the valid hostnames associated with your site based on these tips, being sure each name matches how it looks in the Hostname field. Exclude the following hostnames:

  • Hostnames that aren’t set
  • Development environments, such as localhost or the subdomain of your staging environment
  • Archive and scraping sites
  • Hostnames that look legitimate but are either sites you don’t own or tools and services that are not integrated with your Google Analytics account. These are likely spam being disguised as legitimate sources.

You should now have a list of valid hostnames of sources you either manage or use with your Analytics account. You now need to create a regular expression, or “regex,” that combines all of these.

A regular expression is a special text string for describing a search pattern. That search pattern is a list of valid hostnames in this case. Google Analytics will use this expression to identify the hostnames you want to include in your data after you create your filter.

Here’s an example of how your expression should look:

yourdomain.com|examplehostname.com|anotherhostname

The pipe | characters are important here. They mean OR and help Google Analytics distinguish each hostname from one another. Never insert a pipe character at the beginning or end of an expression, so do not start or end your regular expression with |yourdomain.com or anotherhostname|. A pipe at either end adds an empty alternative that matches every hostname, which would let spam through the filter.
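Before pasting an expression into Google Analytics, you can try it out locally. The Python sketch below assumes the filter matches the pattern anywhere in the hostname, which is how re.search behaves. The helper names build_hostname_pattern and is_valid_hostname are made up for this example, and the hostnames are the placeholders from the expression above.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames with pipes, escaping dots so they only match a literal dot."""
    cleaned = [name.strip() for name in hostnames if name.strip()]
    return "|".join(re.escape(name) for name in cleaned)


def is_valid_hostname(pattern, hostname):
    """True if the pattern matches anywhere in the hostname (partial match)."""
    return re.search(pattern, hostname) is not None


pattern = build_hostname_pattern(
    ["yourdomain.com", "examplehostname.com", "anotherhostname"]
)
print(pattern)  # yourdomain\.com|examplehostname\.com|anotherhostname

# A subdomain is covered by the main domain because the match is partial.
print(is_valid_hostname(pattern, "www.yourdomain.com"))  # True
print(is_valid_hostname(pattern, "(not set)"))           # False

# A trailing pipe adds an empty alternative, so everything matches.
print(is_valid_hostname(pattern + "|", "spam.example"))  # True
```

Building the expression from a list avoids stray spaces and misplaced pipes, and escaping the dots keeps a hostname such as yourdomainxcom from matching by accident.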

Creating the filter

Navigate to the Admin screen, and switch to Test View if you created one. Switch to Master View if not.

Click the Filters link under the right-hand View column, then click the red Add Filter button. Here’s a quick list of how to configure this filter:

  • Method: Create new filter
  • Name: Something descriptive, such as “Valid Hostnames”
  • Filter Type: Custom
    • Make sure Include is selected
  • Filter Field: Hostname
  • Filter Pattern: Copy and paste your expression here, making sure there are no spaces

Once you’ve pasted your expression in the Filter Pattern field, click the Verify This Filter link to see if unwanted hostnames will be filtered out correctly. Click Save to create the filter once you’re done.

If all is well, repeat the process with your master view, and delete the test version.

Filter spam from crawler bots

Some spammers use crawler bots to send fake hits to your site. Some third-party tools you’ve integrated into your site, including project management and site monitoring tools, also operate via crawler bots.

You can block this type of spam by creating a similar expression but using source names instead of hostnames. Navigate to Audience → Technology → Network again, and add Source as a secondary dimension.

Here are two different prebuilt expressions you can use from Carlos Escalera Alonso’s site if you want to make things easier on yourself.

Expression 1:

semalt|ranksonic|timer4web|anticrawler|dailyrank|sitevaluation|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|mycheaptraffic|bestbaby\.life|(blogping|blogseo)\.xyz|(10best|auto|express|audit|dollars|success|top1|amazon|commerce|resell|99)\-?seo

Expression 2:

(artblog|howblog|seobook|merryblog|axcus|dotmass|artstart|dorothea|artpress|matpre|ameblo|freeseo|jimto|seo-tips|hazblog|overblog|squarespace|ronaldblog|c\.g456|zz\.glgoo|harriett)\.top|penzu\.xyz

You’ll need to go through your source URLs to determine which tools send crawlers to your site and create your own expression for them.

When you add these filters to your test and master views, use Exclude as a Filter Type and Campaign Source as your Filter Field.
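To see what an Exclude filter on Campaign Source would do with Expression 1, you can run the expression against a few source names first. The sketch below is a local approximation in Python, not Google Analytics itself. It assumes partial matching and lowercases each source before testing it. The keep_hit helper and the sample sources are made up for illustration.

```python
import re

# Expression 1 from above, copied as-is.
CRAWLER_PATTERN = re.compile(
    r"semalt|ranksonic|timer4web|anticrawler|dailyrank|sitevaluation"
    r"|uptime(robot|bot|check|\-|\.com)|foxweber|:8888|mycheaptraffic"
    r"|bestbaby\.life|(blogping|blogseo)\.xyz"
    r"|(10best|auto|express|audit|dollars|success|top1|amazon|commerce|resell|99)\-?seo"
)


def keep_hit(source):
    """An Exclude filter drops a hit when the pattern matches its source."""
    return CRAWLER_PATTERN.search(source.lower()) is None


sources = [
    "news.example.org",
    "semalt.com",
    "uptimerobot.com",
    "top1-seo-service.example",
    "partner.example.net",
]

kept = [source for source in sources if keep_hit(source)]
print(kept)  # ['news.example.org', 'partner.example.net']
```

Note that uptimerobot.com is dropped here. If you rely on a monitoring tool and want its visits in your reports, remove it from the expression before using it.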

Final thoughts

Referral spam can wreak havoc on your site’s analytics. It can make it seem as though you have more hits and a higher bounce rate than you do. That’s why it’s important to block referral spam in your reports.

Just be sure to have three different views for your site—one master view, one for unfiltered data and one for testing. Double check the Filters area for your unfiltered view to ensure there are none as it’s important for you to monitor what gets blocked.

While this article focused on referral spam, it’s important to note there are additional ways you can filter spam in Google Analytics. For instance, you can use the above guide to find and filter spam for the following reports:

  • Language
    • Filter Field: Language Settings
  • Referral
    • Filter Field: Campaign Source*
  • Organic Keyword
    • Filter Field: Search Term
  • Service Provider
    • Filter Field: ISP Organization
  • Network Domain
    • Filter Field: ISP Domain

Note: If you’re going to filter referral spam by source, consider adding items from Matomo’s referrer blacklist (spammers.txt).
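A long blacklist will not fit into a single filter pattern, so it has to be split across several filters. The Python sketch below shows one way to do that. It assumes the list contains one domain per line and that the Filter Pattern field accepts up to 255 characters. Both are assumptions you should verify, and the limit is a parameter you can change. The chunk_patterns helper and the sample domains are hypothetical.

```python
import re


def chunk_patterns(domains, max_length=255):
    """Group escaped domains into pipe-separated patterns of at most max_length characters.

    max_length is an assumed limit for the Filter Pattern field, not a confirmed value.
    """
    patterns = []
    current = ""
    for domain in domains:
        domain = domain.strip()
        if not domain:
            continue  # skip blank lines
        escaped = re.escape(domain)
        candidate = escaped if not current else current + "|" + escaped
        if current and len(candidate) > max_length:
            # The next domain would overflow, so close this pattern and start another.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Sample lines standing in for the contents of a blacklist file.
sample = ["spam-one.example", "", "spam-two.example", "spam-three.example"]

patterns = chunk_patterns(sample, max_length=40)
for pattern in patterns:
    print(len(pattern), pattern)
```

Each resulting pattern goes into its own Exclude filter with Campaign Source as the Filter Field.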
