We crawled over 200 domains, including their homepages and all URLs linked from those homepages, and examined the differences between the rendered and non-rendered HTML on every page. You can find a summary of the results and some detailed examples in this post. At the end of the article, you can also access the full crawling data set we collected and analysed.
Differences between the two HTML versions can play a role in indexing, especially for larger websites, and should therefore be taken into account during optimisation and analyses.
96% of the domains we crawled have differences between the rendered HTML and the original source code (non-rendered HTML).
Of the more than 200 domains we crawled, 96% show differences in SEO-relevant areas such as text, internal links, title tags or meta tags. However, in many cases not every subpage of a domain differed between the two versions: in total, 56% of the crawled URLs were affected.
On 81 (approx. 35%) of the crawled domains, only the subpages differ between the rendered and non-rendered HTML, while the homepages do not.
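To make this kind of comparison concrete, here is a minimal Python sketch. It is not the tooling used for the study. It assumes you already have both versions of a page as strings (the raw server response and the DOM serialised after JavaScript has run) and only looks at the title tag, the robots meta tag and the canonical link. The names `SeoSignalParser`, `extract_signals` and `diff_signals` are made up for this illustration.

```python
from html.parser import HTMLParser


class SeoSignalParser(HTMLParser):
    """Collects title text, meta robots content and canonical URL from one HTML document."""

    def __init__(self):
        super().__init__()
        self.signals = {"title": "", "robots": None, "canonical": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # Directives are case-insensitive, so normalise to lower case.
            self.signals["robots"] = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.signals["canonical"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.signals["title"] += data


def extract_signals(html):
    """Return a dict with the title, robots and canonical values of one HTML string."""
    parser = SeoSignalParser()
    parser.feed(html)
    parser.close()
    parser.signals["title"] = parser.signals["title"].strip()
    return parser.signals


def diff_signals(raw_html, rendered_html):
    """Return only the signals that differ, as (raw value, rendered value) pairs."""
    raw = extract_signals(raw_html)
    rendered = extract_signals(rendered_html)
    return {key: (raw[key], rendered[key]) for key in raw if raw[key] != rendered[key]}


# Hypothetical example page: the title changes and a robots tag is added by JavaScript.
raw_page = "<html><head><title>Shop</title></head><body></body></html>"
rendered_page = (
    "<html><head><title>Shop - Offers</title>"
    '<meta name="robots" content="NOINDEX"></head><body></body></html>'
)
differences = diff_signals(raw_page, rendered_page)
```

An empty result means the two versions agree on these three signals. It says nothing about text content or links, which need their own comparison.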
Differences by area
A link is a link, regardless of how it comes to the page. It wouldn’t really work otherwise.
John Müller, https://s.viu.one/013fc
We also found some cases where links that the user sees do not appear in the rendered HTML for Google.
One reason for this is that certain links are only written into the rendered HTML when there is a mouseover event. The links do not exist in the original source code or in the rendered HTML before the first interaction. As far as we know, Googlebot currently does not carry out any mouseover events – therefore, this type of link is not “visible” to Google.
An example from our crawl is rebuy.de. In this short video, you can clearly see that the links only appear after the mouseover event (here, the link to the iPhone XS Max in the main menu):
So, when manually analysing internal linking structures, if you want to be sure that links are available in the rendered HTML, you should look at the rendered HTML before any user interaction (or use a website crawling tool with JS rendering).
An example from our crawl can be found on klingel.de. On the subpages, an Ajax request is sent to the server to load the main menu, and the server returns the categories. When this request is blocked (which is the case for Googlebot but not for a normal browser), the main navigation no longer works:
You can test this yourself with the Chrome Extension Asset Swapper.
On the homepage of klingel.de, unlike on the subpages, these internal links are available without the Ajax call, because the code needed to show the navigation is included directly in the original HTML source code.
And then there are cases where links are removed from the rendered HTML.
Here the link element is completely deleted from the DOM – and not just hidden.
At mediamarkt.de, for example, links in the main navigation that are present in the original HTML are deleted from the rendered HTML on subpages:
Google will discover the links in the original HTML in this case and most likely crawl them. It is currently not known how Google handles the signal that a link is apparently not relevant for the user because it is deleted from the rendered HTML. However, it can be assumed that Google takes this information into account in some way.
It is difficult to tell whether the internal linking structures from the examples are unintended or whether they are exactly what the website owners wanted to achieve. It is just important to take such influences into account when optimising.
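The link cases described above (links that only exist in the rendered HTML, and links that are removed from it) can be told apart with a simple set comparison. The following Python sketch is an illustration under stated assumptions, not the method used in the study: it expects the raw and the rendered HTML as strings, resolves relative URLs against a base URL you supply, and ignores URL fragments. `LinkParser`, `extract_links` and `classify_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the absolute target URLs of all <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative URLs and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.add(absolute)


def extract_links(html, base_url):
    """Return the set of absolute link targets found in one HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def classify_links(raw_html, rendered_html, base_url):
    """Split links into those removed by rendering, added by rendering, or unchanged."""
    raw = extract_links(raw_html, base_url)
    rendered = extract_links(rendered_html, base_url)
    return {
        "only_in_raw": raw - rendered,
        "only_in_rendered": rendered - raw,
        "in_both": raw & rendered,
    }


# Hypothetical markup: /b is deleted by JavaScript, /c is injected by it.
raw_nav = '<a href="/a">A</a><a href="/b">B</a>'
rendered_nav = '<a href="/a">A</a><a href="https://example.com/c#top">C</a>'
link_report = classify_links(raw_nav, rendered_nav, "https://example.com/")
```

Links that only appear after a mouseover would not show up in either set if the rendered HTML is captured before any interaction, which matches what a crawler without interaction would see.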
In the area of content, we found many domains that contain additional content in the rendered HTML.
In the examples we examined, where relevant content is loaded into the rendered HTML, this content was always indexed by Google.
Changes in title tags
If we look at the domains that have different title tags in the non-rendered and rendered HTML, lidl.de stands out:
Bing, on the other hand, uses the title tag from the original source code (non-rendered HTML):
Changed canonical and robots tags
For canonical and robots tags, there were only a few domains in our study that showed differences between the rendered HTML and the original source code (non-rendered HTML).
Notably, saturn.de has a “noindex” on the privacy page in the rendered HTML and no robots meta tag at all in the original source code. Still, the page is listed in the Google index. In this case, Google does not seem to take the “noindex” from the rendered HTML into account.
Access to the complete data
You are welcome to access the complete data we used for this study in a Google Data Studio report.
Simply register here and you will receive the link to the Data Studio Dashboard by email.