Google search engine crawling experiment



Recently I ran an experiment with the way Google crawls a site. I had a client site that had not been spidered in spite of being submitted to Google a good while back. I looked at the site and saw nothing amiss. There was plenty of text on the page and everything looked good. I found their page in Google, but there was no cached text.


After a bit of searching I found others with similar problems, and the speculation was that the site had been banned for “invisible text”. Some developers have used that technique to load keywords into pages to take advantage of the search engines. Now, I hadn’t done anything of the sort, but I did write the pages in PHP and I had a few PHP comments to remind myself what I had done and why I had done it.

I removed the comment text and within a few days the main index page was spidered with searchable text, and the summary was showing up in Google. So I went about building a sitemap and submitting it to get the rest of the pages crawled. At that point the pages were all in the main directory with .php file extensions. I submitted the sitemap and waited and watched. A couple of weeks went by and the sub-pages still hadn’t been crawled.

In that time, I had been posting frequently on this site and found Google spidering all over the place. WordPress uses easy-to-remember permalinks that look like directory paths, for instance http://www.averyjparker.com/2005/08/15/pcbsd-configuration/ instead of http://www.averyjparker.com/20050815pcbsd-configuration.php or something similar. Google seemed to like this, as it was spidering and caching the various posts with about a 2 or 3 day delay.

About this time I found an interesting writeup on search engine optimization, specifically about Google. Among other things, the author noted that he had seen this same behavior. No one at Google could explain it to him, but it seemed that directories got spidered more quickly than specific document files. So contact.php, aboutus.php and skills.php might not get indexed as soon as domain.com/contact/, domain.com/aboutus/ and domain.com/skills/.

I tested this theory out on the site in question. I moved each sub-page of the site to its own directory and renamed it to index.php (so that it would display automatically when the directory is viewed), then updated my sitemap and the site’s menu to reflect the change. Within 2 days the Googlebot had spidered each of the sub-directories, after a wait of several weeks prior.
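For anyone who wants to try the same restructuring, here is a rough sketch of the bookkeeping in Python. It is only an illustration, not the process I actually used: the function names are made up for this example, domain.com is a placeholder, and the sitemap layout assumes the sitemaps.org 0.9 format, which may differ from whatever format you submit. It only works out the new paths and URLs; it does not move any files.

```python
from pathlib import PurePosixPath
from xml.etree import ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; adjust if you submit another format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def plan_move(filename):
    """Map a flat page such as 'contact.php' to 'contact/index.php'."""
    page = PurePosixPath(filename)
    return str(PurePosixPath(page.stem) / "index.php")


def directory_url(base_url, filename):
    """Return the directory-style URL for a page, with a trailing slash."""
    return base_url.rstrip("/") + "/" + PurePosixPath(filename).stem + "/"


def build_sitemap(base_url, filenames):
    """Build a minimal sitemap listing the directory-style URL of each page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for name in filenames:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = directory_url(base_url, name)
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page list, taken from the examples in this post.
    pages = ["contact.php", "aboutus.php", "skills.php"]
    for page in pages:
        print(page, "->", plan_move(page))
    print(build_sitemap("http://domain.com", pages))
```

Remember that the site’s menu links need the same update, or visitors and spiders will keep landing on the old .php addresses.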

So my best advice, if you’re eager to get a site spidered by Google, is to plan on giving separate pages their own directories with relevant names.
