
How to Identify and Resolve Duplicate Content (Ultimate Guide 2021)

Duplicate content can harm a website or blog in the search results, not to mention its effect on the user experience. In short, it must be avoided, and there is no reason that justifies its existence. But what is duplicate content anyway?

I can make things simple and tell you it is exactly what the name says: content that has one or more identical or very similar copies on other pages. Those extra pages may be under the same domain or published anywhere else. They add absolutely no value for the people who access the content, which is an obvious reason to believe they can hurt your SEO results.

Why is It Important to Resolve Duplicate Content?

  1. When there is duplicate content on the same website, it might be difficult for users to identify which page is the most relevant. That impacts the user experience and might even make the website lose its credibility.
  2. Even when the content contains useful information, other websites that want to link to it will probably split their links across the duplicates. In other words, shares and recommendations that could benefit one page are dispersed and fragmented because of the duplicate content.
  3. Search engines might not index the most relevant page, and the duplicates will end up competing against each other for the same rankings.
  4. There might be a technical issue resulting in duplicate content that can lead to other problems.

What are the Best Solutions to Resolve Duplicate Content?

Fortunately, it is possible to resolve duplicate content without much effort or specialized knowledge. I did it myself in no more than a couple of minutes. As soon as there is some room in your schedule, take a moment to apply one of the solutions below.

301 Redirects

If old or unnecessary versions of the same content exist, it is better to apply 301 redirects to them rather than simply remove the content. After all, they might already have some authority that can be passed along through the 301.

In addition to the SEO benefits, users who access the page you replaced will be redirected to the one with the best information.
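To make the idea concrete, here is a minimal sketch of a 301 redirect, assuming a Flask application; the /old-post and /new-post paths are just hypothetical examples.

```python
# A minimal sketch of a 301 redirect, assuming a Flask app.
# The /old-post and /new-post paths are hypothetical examples.
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-post")
def old_post():
    # Permanently redirect the outdated URL to the consolidated page,
    # so link equity and visitors both end up in the right place.
    return redirect("/new-post", code=301)

@app.route("/new-post")
def new_post():
    return "The consolidated, up-to-date version of the content."
```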

Add Meta No-Index

Redirects are not always the solution, although they cover most cases. When the pages must remain accessible for some reason, it is possible to apply a noindex meta tag. It lets crawlers visit the page and follow its links without adding the page to the index. One common application is content that might be flagged as a duplicate within a pagination context.
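As a rough sketch, this is how a page could declare the noindex instruction, again assuming a Flask application; the /print-version path is a hypothetical example, and the X-Robots-Tag header is an equivalent way to send the same signal.

```python
# A minimal sketch of marking a page as noindex, assuming a Flask app.
# Crawlers can still visit the page and follow its links, but they are
# told not to index it.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/print-version")
def print_version():
    html = """<!doctype html>
<html>
  <head>
    <!-- Tells crawlers not to index this page while still following its links -->
    <meta name="robots" content="noindex, follow">
    <title>Print version</title>
  </head>
  <body>Printer-friendly copy of an article.</body>
</html>"""
    response = make_response(html)
    # The same instruction can also be sent as an HTTP response header.
    response.headers["X-Robots-Tag"] = "noindex"
    return response
```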

Canonical Tags

Similar to the previous case, you can keep a page that is useful but only slightly different from the original without hurting your rankings. The rel="canonical" tag lets readers and visitors keep seeing all the similar pages, while only one is indicated to crawlers as the most relevant.

A great example is e-commerce websites with similar products whose pages cannot really differ much in terms of content.
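For illustration, here is a small sketch of how near-duplicate product pages could all declare one preferred URL; the product URLs are hypothetical.

```python
# A minimal sketch of adding rel="canonical" to near-duplicate product pages.
# The product URLs below are hypothetical examples.
def canonical_tag(canonical_url: str) -> str:
    """Return the <link> tag that points crawlers at the preferred URL."""
    return f'<link rel="canonical" href="{canonical_url}">'

# Three colour variants of the same product all declare one preferred page.
variant_urls = [
    "https://website.com/shirt-red",
    "https://website.com/shirt-blue",
    "https://website.com/shirt-green",
]
for url in variant_urls:
    # Each variant keeps its own URL for visitors, but crawlers are told
    # that /shirt is the page to index and rank.
    print(url, "->", canonical_tag("https://website.com/shirt"))
```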

Differentiate the Content

Finally, an obvious but effective solution is to add content that differentiates the pages. However, it should not be applied aimlessly: the concern about the impact on SEO should be as relevant as the consideration for the user experience, and visitors should find the additional content useful to some extent.

Possible Causes of Duplicate Content and Their Solutions

All the actions above can help you resolve duplicate content easily, but the job is not done if the underlying causes persist. In fact, some technical cases do not require any of the solutions above, just one straightforward fix.

URL Typos

When creating pages or blog posts, pay close attention to the URL. It is important not only to add a keyword with SEO value but also to avoid uppercase letters (URLs can be case sensitive) and to be consistent about the “trailing slash”, the final slash with no letters after it. To clarify both cases: website.com/Page and website.com/page can be treated as two different URLs, and so can website.com/page and website.com/page/.
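As a rough sketch, the snippet below normalizes paths so that case and trailing-slash variants collapse into a single URL; the chosen policy (lowercase, no trailing slash) is only an assumption, and what actually matters is being consistent.

```python
# A minimal sketch of normalizing URL paths so case and trailing-slash
# variants do not create duplicate URLs. The policy chosen here (lowercase,
# no trailing slash) is an assumption; the important part is consistency.
def normalize_path(path: str) -> str:
    path = path.lower()                      # /Blog/Post -> /blog/post
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")              # /blog/post/ -> /blog/post
    return path

print(normalize_path("/Blog/Post/"))  # /blog/post
print(normalize_path("/blog/post"))   # /blog/post (already canonical)
```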

Server Misconfiguration

The content looks perfect and it targets a brand-new niche with lots of search volume, but something seems to be hurting the page in the search results. It may be that the page is accessible via different URLs, which is a common mistake. For example, website.com/page, website.com/page/, and website.com/index.php?p=page might all serve the same content and be treated as duplicates.

The solutions involve getting rid of the additional URLs by configuring the server properly, implementing 301 redirects, or simply using the rel="canonical" tag if they are all necessary.
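Here is a minimal sketch of the server-side approach, assuming a Flask application: every non-canonical variant of a path is answered with a single 301 to the clean URL instead of serving the content twice. The paths and normalization rules are just assumptions for the example.

```python
# A minimal sketch of collapsing duplicate URL variants with a single 301,
# assuming a Flask app. The normalization rules below are assumptions.
from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def redirect_duplicate_variants():
    path = request.path
    clean = path.lower()
    if clean != "/" and clean.endswith("/"):
        clean = clean.rstrip("/")
    if clean.endswith("/index.php"):
        clean = clean[: -len("/index.php")] or "/"
    if clean != path:
        # One permanent redirect sends every variant to the clean URL.
        return redirect(clean, code=301)

@app.route("/page")
def page():
    return "Only this URL serves the content directly."
```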

Use of WWW and HTTP vs. HTTPS

Another issue that can usually be resolved by properly configuring the server is the inconsistent use of WWW and of HTTP vs. HTTPS. First, decide whether your website will use WWW or not and stick to it. The same applies to HTTP or HTTPS.

If necessary, apply 301 redirects to solve this problem too, but do not ignore it. Test entering the website with all four combinations; if they all end up on the same URL, the configuration is correct.
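A quick way to run that test is sketched below with the requests library; website.com is a hypothetical domain.

```python
# A minimal sketch that checks how the four scheme/host combinations respond.
# "website.com" is a hypothetical domain.
import requests

variants = [
    "http://website.com/",
    "http://www.website.com/",
    "https://website.com/",
    "https://www.website.com/",
]

for url in variants:
    # allow_redirects=False shows whether the server answers directly (200)
    # or sends a 301 to the single preferred version, which is what we want.
    response = requests.get(url, allow_redirects=False, timeout=10)
    print(url, response.status_code, response.headers.get("Location", "-"))
```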

Filtering Parameters

Parameters such as “?size=big” are often added to URLs to help users find what they are looking for through filters. The problem is that indexing those filtered pages results in duplicate content, and all of the pages involved might be affected negatively.

While filters are excellent in terms of user experience, they can create an enormous problem, especially because the order in which they are applied also generates new URLs. Fortunately, you are already aware of the power of rel="canonical" and can implement it on each relevant page.

In addition, the suggestion is to handle parameters properly in Google Search Console and Bing Webmaster Tools so crawlers can identify those parameters.
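As a small illustration, the sketch below derives the canonical URL for a filtered listing by dropping the filter parameters; the URL and parameters are hypothetical.

```python
# A minimal sketch that derives the canonical URL for a filtered listing by
# dropping the filter parameters, using only the standard library.
from urllib.parse import urlsplit, urlunsplit

def canonical_for(filtered_url: str) -> str:
    parts = urlsplit(filtered_url)
    # Keep scheme, host and path; discard the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

url = "https://website.com/shoes?size=big&color=red"
print(f'<link rel="canonical" href="{canonical_for(url)}">')
# -> <link rel="canonical" href="https://website.com/shoes">
```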

Taxonomy and Different Categories

Taxonomy causes problems when a page belongs to more than one category or is part of any other system that generates different paths to the same page, e.g.:

  1. website.com/way-1/page
  2. website.com/way-2/page
  3. website.com/way-3/page

In those cases, choose one to be the main category – make sure to analyze which one is the wiser choice in terms of SEO – and use rel="canonical" once more.
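A minimal sketch of that idea, reusing the hypothetical paths from the list above and treating way-1 as the main category:

```python
# A minimal sketch of declaring one main category as canonical when the same
# page is reachable through several category paths. The paths are the
# hypothetical ones from the list above.
CANONICAL_CATEGORY = {
    "/way-2/page": "/way-1/page",
    "/way-3/page": "/way-1/page",
}

def canonical_tag_for(path: str) -> str:
    preferred = CANONICAL_CATEGORY.get(path, path)
    return f'<link rel="canonical" href="https://website.com{preferred}">'

print(canonical_tag_for("/way-3/page"))
# -> <link rel="canonical" href="https://website.com/way-1/page">
```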

Comment and Search Pages

Enabling comments can enhance the user experience, but it might also be the reason why posts are generating duplicate content. That can happen when comment pagination is activated and the next pages repeat the same post, changing only the comments. When that is the case, apply pagination link relationships so the pages are not indexed as duplicates.

Another useful feature that can cause the same problem is the search functionality. All search results pages on a website should be marked as non-indexable through the noindex meta tag. They provide no value in the search results and will definitely generate lots of duplicate pages.
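To illustrate the pagination link relationships mentioned above, here is a small sketch that builds rel="prev" and rel="next" tags for paginated comment pages; the post URL is hypothetical. Search results pages, in turn, can reuse the noindex approach shown earlier.

```python
# A minimal sketch of pagination link relationships: each comment page points
# at its neighbours so crawlers see the pages as a sequence rather than as
# duplicates. The post URL is a hypothetical example.
def pagination_tags(base_url: str, page: int, last_page: int) -> str:
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{base_url}?comment-page={page - 1}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{base_url}?comment-page={page + 1}">')
    return "\n".join(tags)

print(pagination_tags("https://website.com/post", page=2, last_page=4))
# <link rel="prev" href="https://website.com/post?comment-page=1">
# <link rel="next" href="https://website.com/post?comment-page=3">
```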

Landing Pages and Testing Projects

Landing pages are often similar to other regular pages of a website. Therefore, all those pages created for traffic acquisition purposes can be flagged as duplicates. Simply apply the meta noindex solution to them to preserve the original pages.

Also, make sure to block access to any staging website where new layouts and features are tested. If it gets indexed anyway, remove it using Google Search Console and Bing Webmaster Tools.

Localization to Regions of the Same Language

It is great to have your website ready for all sorts of markets, but it is important to remember that different countries can share the same language. Make sure to use the hreflang attribute to signal that a certain page is dedicated to one country, even when it looks almost identical to another version. As an example, just imagine the Canadian and the British versions of a website.
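For illustration, the sketch below generates hreflang tags for hypothetical Canadian and British versions of the same page.

```python
# A minimal sketch that generates hreflang tags for the Canadian and British
# versions of a page; the URLs are hypothetical examples.
REGIONAL_VERSIONS = {
    "en-ca": "https://website.com/ca/page",
    "en-gb": "https://website.com/uk/page",
}

def hreflang_tags() -> str:
    # Every regional version lists all alternates, itself included, so
    # crawlers know these near-identical pages target different countries.
    return "\n".join(
        f'<link rel="alternate" hreflang="{lang}" href="{url}">'
        for lang, url in REGIONAL_VERSIONS.items()
    )

print(hreflang_tags())
```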

How to Find Duplicate Content?

Do not feel apprehensive after seeing all those causes of duplicate content. I know it can be difficult to imagine how to handle all those possibilities and where to start checking them. Fortunately, there is an automated and much easier way of finding duplicate content on your website.

Finding Duplicate Content Using Search Console

Under “Index” in Google Search Console, the Coverage report is useful for identifying duplicate content on your website. You may see a few different statuses when duplicate pages are found:

  • Duplicate without user-selected canonical: Duplicate URLs that did not use the rel=canonical solution I just taught you.
  • Duplicate, Google chose a different canonical than user: Google decided to use its own canonical instead of the one it found on your website.
  • Duplicate, submitted URL not selected as canonical: Google decided to ignore the canonicals of a URL submitted via an XML sitemap.

That should help you find where the problem is. Although Google might exclude those pages automatically, you should not ignore the importance of avoiding duplicate content in the first place. The last status above shows, for example, that Google might decide not to index the very URL you submitted.

Finding Stolen Content on Other Websites

Unfortunately, not all duplicate content can be resolved on your own website, as other sites might be copying your content without necessarily being punished for it. In that case, there are two ways of finding out:

  • Copy and paste a relevant part of your content into Google and check whether any website uses very similar or identical wording.
  • Use a paid tool to verify whether your content is unique or can be found somewhere else on the internet.

If you find copies, you can simply contact the webmasters and ask for the content to be removed. Sometimes the people responsible for a website are not even aware of the copied content, especially when multiple people work on it. When that does not work, filing a DMCA complaint through Google Search Console might do the trick.

FAQ

Can Duplicate Content Give me a Penalty?

Duplicate content does not result in a direct penalty. However, it does impact your SEO strategy and results and must be fixed. The exception is if you copied the content of someone else.

What is the Absolute Best Fix for Duplicate Content?

The cause of the duplicate content and your own strategy will define the best way to resolve it. Even so, it is clear that the noindex meta tag and the canonical attribute are useful companions.

What if I Copied Content From Other Websites?

Remove it and, if you want to cover the same subject or present the same type of content, write it in your own words. The alternative is to risk a penalty while knowingly harming someone else's work.

What to Do If Someone is Stealing my Content?

File a DMCA complaint through Google Search Console if they do not respond to your request to remove the content.

Can I Increase my SERP by Resolving Duplicate Content?

If duplicate content is a relevant issue on your website, you will most likely see improvements once it is resolved.

How Much Duplicate Content is Not Too Much?

That depends. To avoid risks, keep it to the minimum required for the best performance of your website in terms of user experience combined with SEO.
