Search engine spiders will crawl your entire website to index your content. Most of the time, that’s what you want. Sometimes, however, you may want to block search engines from indexing a particular page or post on your WordPress site. As you might have already discovered, when you create new content in WordPress, there is no option that lets you block crawlers for individual pages or posts. Don’t let anyone tell you this isn’t possible, though. This blog post explores possible workarounds.

A Brief Overview of Search Engine Crawlers

There are a variety of search engine bots crawling the net every single day. The most common ones are as follows:

  • Googlebot – Google
  • APIs-Google – Google API
  • Mediapartners-Google – Google AdSense and Google Mobile AdSense
  • AdsBot-Google-Mobile – Google AdsBots Mobile
  • AdsBot-Google – Google AdsBots
  • Googlebot-Image – Google Images
  • Googlebot-News – Google News
  • Googlebot-Video – Google Video
  • Bingbot – Bing
  • AdIdxBot – Bing Ads
  • BingPreview – Bing (creates page snapshots)
  • MSNbot – Predecessor of Bingbot
  • MSNBot-Media – Bing Images and Video
  • Slurp – Yahoo!’s crawler before Yahoo! contracted with Bing to use Bingbot
  • DuckDuckBot – DuckDuckGo
  • teoma – Ask

A Brief Overview of Indexing

When a crawler arrives on a page, it checks the page’s HTTP headers and HTML code for a noindex directive, such as a noindex meta tag. If the crawler sees that directive, it will not index the page. If the page is blocked by a robots.txt file (through the “disallow” rule), however, the crawler never fetches the page, so it never sees the noindex tag, and the page can still appear in search results. That can happen when another page links to it. In these cases, Google simply constructs a title and snippet for that URL from the reference.

Therefore, a best practice is to block URLs using the robots meta tag rather than robots.txt. This is done by placing the following code snippet in the header section of the page, that is, between the <head> and </head> tags.

<meta name="robots" content="noindex">

In this example, robots refers to all search engines, and noindex tells these search engines not to index the page.

If you prefer to block specific crawlers, you can replace robots with a crawler’s name. Use one meta tag per crawler rather than listing several names in a single tag, and check each search engine’s documentation for the names it honors. For example, if you do not want your multimedia assets indexed, such as images and video, you can use the following meta tags:

<meta name="googlebot-image" content="noindex">
<meta name="googlebot-video" content="noindex">
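If you generate these tags from PHP rather than typing them by hand, a small helper can emit one meta tag per crawler name. This is only a minimal sketch: robots_meta_tags is a hypothetical function name (it is not part of WordPress), and whether a given crawler name is honored depends on the search engine.

```php
<?php
// Hypothetical helper (not a WordPress function): builds one robots
// meta tag per crawler name, each on its own line.
function robots_meta_tags(array $crawlers, string $content = 'noindex'): string
{
    $tags = array();
    foreach ($crawlers as $crawler) {
        // Escape both values so they are safe inside HTML attributes.
        $tags[] = sprintf(
            '<meta name="%s" content="%s">',
            htmlspecialchars($crawler, ENT_QUOTES),
            htmlspecialchars($content, ENT_QUOTES)
        );
    }
    return implode("\n", $tags);
}

// Prints one tag for each of the two crawlers named in this post.
echo robots_meta_tags(array('googlebot-image', 'googlebot-video')), "\n";
```

Passing array('robots') instead produces the generic tag that addresses all search engines.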

Now that we’ve reviewed how to stop search engines from indexing a page in their search results, I’d like to clarify that if you put this exact code into your WordPress theme’s header, you would block search engines from indexing your entire website.

There are a few additional steps needed to block search engine spiders (aka crawlers) from specific pages and posts. Let’s look at how this is done.

Adding the Meta Tag to Your Theme Header

What you will need:

  1. Admin access to your website’s header file.
  2. Your post’s ID. You can note it down while you “edit” the post (see example below):

https://www.sunnystartupmarketing.com/wp-admin/post.php?post=2642&action=edit

In this URL, the post ID is the number that follows post=, which is 2642.

If your coding skills are limited, I would highly recommend making a backup of your theme’s header file before attempting this, just in case something goes wrong. If you do not have any coding skills, I would recommend contacting a web developer to help with this. If you don’t have a web developer at hand, feel free to reach out for help.


Step 1: Head to your theme header file.

To access it, go to Appearance in the left-hand panel and click Editor (called Theme File Editor in newer WordPress versions). On the right-hand side, you will now see a list of Theme Files. Select the one that reads “Theme Header (header.php)”. Alternatively, you can open the file directly on your server.

As we’ve already discussed, the meta tag needs to be inserted between the <head> and </head> tags. You can place the code anywhere within this section, as long as you don’t disturb already existing code, as this might break your site.


Step 2: Place the code.

This is the code that will instruct search engine bots not to index the post. Again, robots can be replaced with specific bots, such as googlebot-image or googlebot-video, if that’s your preference. In this example, I’m going to instruct all search engine spiders not to index the post.

<?php if ($post->ID == X) { echo '<meta name="robots" content="noindex,nofollow">'; } ?>

Step 3: Replace X with the ID of your blog post.

In the example above, we’ve determined that the blog post that I do not want indexed has the ID 2642. Hence, I need to adjust the code we’ve placed in Step 2 to make sure that this is the blog post that will not be indexed:

<?php if ($post->ID == 2642) { echo '<meta name="robots" content="noindex,nofollow">'; } ?>

This workaround is the same for pages. Simply note down the page ID and insert it into this code.

If you have more than one blog post, let’s assume posts 2642, 2643, and 2644, the code would look like this:

<?php if ($post->ID == 2642 || $post->ID == 2643 || $post->ID == 2644) { echo '<meta name="robots" content="noindex,nofollow">'; } ?>
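As the list of IDs grows, chaining comparisons becomes hard to read. One possible alternative, sketched here under assumptions, keeps the IDs in an array and checks membership with in_array. The function name noindex_tag_for is hypothetical; the commented-out call shows how it might be wired into header.php using the WordPress functions is_singular() and get_queried_object_id(), which also avoids tagging archive pages that merely list one of these posts.

```php
<?php
// Hypothetical helper (not a WordPress function): returns the noindex
// meta tag if $current_id is in the blocked list, else an empty string.
function noindex_tag_for(int $current_id, array $blocked_ids): string
{
    // Strict mode compares type as well as value, so only integer IDs match.
    if (in_array($current_id, $blocked_ids, true)) {
        return '<meta name="robots" content="noindex,nofollow">' . "\n";
    }
    return '';
}

// IDs taken from the example in this post.
$blocked_ids = array(2642, 2643, 2644);

// Inside header.php you might call it like this. These are WordPress
// functions, so this part only works within a WordPress theme:
// if ( is_singular() ) {
//     echo noindex_tag_for( get_queried_object_id(), $blocked_ids );
// }

echo noindex_tag_for(2643, $blocked_ids); // ID is in the list: prints the tag
echo noindex_tag_for(1, $blocked_ids);    // ID is not in the list: prints nothing
```

To block another post or page, you would only need to add its ID to the array.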

Verifying that the Meta Tag was Added Correctly

Every time you make changes to the code, you should verify that it works the way it is supposed to.

The easiest way to do this is by heading to the page in question and looking at the source code. If you’re working on a Mac and are using Google Chrome, head to View >> Developer >> View Source.

If you are working on a PC and/or a different browser, you can simply google “how to see source code” and add your computer model and browser to the search.

Once you see the source code, press Control + F (or Command + F on a Mac) to open the browser’s search bar. This will allow you to search for the meta tag, rather than having to manually locate the code. I would recommend entering noindex into the search bar. If you have added the code correctly, you will now see <meta name="robots" content="noindex,nofollow"> somewhere between the <head> and </head> tags.
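If you have several pages to check, the same verification can be scripted. The following is a rough sketch, assuming PHP’s DOM extension is enabled; has_noindex_meta is a hypothetical function name, and the sample HTML is made up for illustration. It looks for any meta tag named robots whose content includes noindex.

```php
<?php
// Hypothetical helper: reports whether the given HTML contains a meta tag
// named "robots" whose content attribute includes "noindex".
function has_noindex_meta(string $html): bool
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely perfectly valid, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name    = strtolower($meta->getAttribute('name'));
        $content = strtolower($meta->getAttribute('content'));
        if ($name === 'robots' && strpos($content, 'noindex') !== false) {
            return true;
        }
    }
    return false;
}

// Made-up sample page that carries the tag.
$sample = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>';
var_dump(has_noindex_meta($sample));
```

To check a live page, you could pass in the HTML returned by file_get_contents() for that page’s URL, provided your PHP configuration allows opening URLs.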


Lastly, I would also recommend keeping a record of your code changes that indicates:

  • the date the change was made
  • the file where the code was added/removed/changed
  • the code that was added/removed/changed
  • who made the code change
  • whether the code was verified to work as intended
  • who verified that the code was working as intended
  • when the code change was verified

This record will save you a lot of time and hassle should the code ever need debugging.

How to Remove a Post or Page from Search Engine Results

If you forgot to add the noindex meta tag to your theme header and search engines have already crawled and indexed your site, all is not lost. You can remove your post or page from search engine results.


Removing a Page or Post from Google

To do this, you need access to your Google Search Console (previously, Google Webmaster Tools).

On the left-hand side, you will now see a menu.

Step 1: Head to Google Index 

Step 2: Navigate to Remove URLs 

Step 3: Click the “Create a new removal request” button

Step 4: Enter the URL you’d like to remove


Removing a Page or Post from Bing

To do this, you need access to your Bing Webmaster Tools.

Step 1: Head to the Bing Content Removal Tool

Step 2: Enter the URL you’d like to remove

Step 3: Select whether you want to remove the page from their index or remove an outdated cached version of the page

Discourage Search Engines from Indexing an Entire WordPress Website

If you don’t want search engines to index the entire website, e.g. while you’re still developing the site, there is a much simpler way of accomplishing this.

Step 1: On the left hand menu of your WordPress console, head to Settings

Step 2: Select Reading

Step 3: Check the box next to Discourage search engines from indexing this site

Step 4: Click Save Changes 


I hope you have found this blog post helpful. As always, if you have any questions or require any help, please reach out.