Next Comes the Indexing

Once a site has been found and crawled by a Google spider, the next order of business is to index the site and its pages.

Indexing means that Google stores your pages in its library so they can be recalled later when a user enters a relevant query. Not every page that Google finds will be indexed. A page may be left out because it has been redirected or removed, because it is blocked, or because it has been penalized.
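For readers who like to see the idea in code, here is a minimal Python sketch of how you might check a page for the first three obstacles. It is an illustration under stated assumptions, not a description of Google's own process: the function name `indexing_obstacle` and its inputs are hypothetical, "blocked" is interpreted here as a `noindex` directive in a robots meta tag or an `X-Robots-Tag` header, and penalties are left out because they cannot be detected from the page itself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def indexing_obstacle(status_code, headers, html):
    """Hypothetical helper: return a likely reason a page is not indexed, or None.

    status_code: the HTTP status the page returned
    headers:     a plain dict of response headers (assumed to use the key "X-Robots-Tag")
    html:        the page's HTML as a string
    """
    if 300 <= status_code < 400:
        return "redirected"
    if status_code in (404, 410):
        return "removed"

    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = headers.get("X-Robots-Tag", "").lower()
    if "noindex" in header_value or "noindex" in parser.directives:
        return "blocked"

    # Nothing obvious stands in the way; this does not guarantee indexing.
    return None
```

Calling `indexing_obstacle(301, {}, "")` returns `"redirected"`, while a normal page with no `noindex` directive returns `None`. A result of `None` only means none of these particular checks fired.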

After a page is crawled but before it’s indexed, Google must render it. In very simple terms, rendering is how a bot, which doesn’t “see” the way humans do, assembles a page from its code. It does this by requesting and downloading the page’s HTML (a simple markup language used to build many sites), processing that code, and then crawling the rendered page again. If the page is then indexed, the rendered version is saved to Google’s index. According to Google, its index is currently over 100,000,000 gigabytes in size.1
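To make the idea of a bot "reading" code a little more concrete, the Python sketch below takes a string of HTML and pulls out the visible text and the links, which is roughly the raw material a crawler works with. It is a deliberately simplified illustration: the names `PageReader` and `read_page` are made up for this example, and real rendering does much more (for instance, running the page's JavaScript), which this sketch skips entirely.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Hypothetical helper that gathers visible text and link targets from HTML."""

    def __init__(self):
        super().__init__()
        self.text = []    # pieces of visible text, in page order
        self.links = []   # href values of <a> tags
        self._skip = 0    # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore whitespace-only chunks and anything inside script/style.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def read_page(html):
    """Return (visible_text, links) for a string of HTML."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return " ".join(reader.text), reader.links


sample = (
    "<html><head><title>Shop</title><style>p{color:red}</style></head>"
    '<body><h1>Welcome</h1><p>Buy <a href="/shoes">shoes</a> here.</p>'
    "<script>var x = 1;</script></body></html>"
)
text, links = read_page(sample)
print(text)   # Shop Welcome Buy shoes here.
print(links)  # ['/shoes']
```

Notice that the styling and the script never show up in the output. The bot is left with words and links, not the finished visual layout a human visitor sees.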

There are times when you may not want Google to index a page, because you don’t want it ranking and sending search traffic to pages that make poor landing pages (e.g., a checkout page or a login page). Later in this book, we’ll talk about how to prevent Google from indexing a page as well as how to encourage Google to index other pages.