I am one of a number of web professionals who prefer their (X)HTML, CSS, and other code to meet current web standards.
One current incentive for webmasters and designers to write clean code could be the Webmaster Guidelines in Google's Webmaster Help Center.
In this document, Google presents information which they say "will help Google find, index, and rank your site." For those of us interested in SEO, this is what we want to happen.
One of their Design and Content Guidelines is as follows:
Check for broken links and correct HTML
However, in an interview published at http://blog.outer-court.com/archive/2005-11-17-n52.html, Matt said:
"...We can’t throw out 40% of the web on the principle that sites should validate; we have to take the web as it is and try to make it useful to searchers, so Google’s index parsing is pretty forgiving..."
We all know to check for broken links. That has been an ongoing duty for webmasters since the inception of the WWW. But what about "correct" HTML?
In my view, this means "according to current standards". If your page is HTML 4.01, then mark it up as such. If you are using XHTML 1.0 strict, the markup should reflect that standard.
So then, what kind of HTML is Google using? Since so many of us (in the web business or not) use Google, shouldn't we look to them for a bit of inspiration as to how we should be constructing our web sites?
A DOCTYPE declaration appears at the top of an HTML document, before even the <html> tag itself. It gives browsers a hint about how the code should be interpreted.
A quick look at the Yahoo! home page shows that it uses HTML 4.01 transitional, as does DMOZ. Wikipedia uses XHTML 1.0 strict.
Interestingly enough, Google doesn't provide a DOCTYPE at all, leaving the browser to interpret the code as it sees fit. This is often referred to as "quirks mode".
The HTML 4.01 specification, and the later XHTML versions, require that a DOCTYPE be included. Leaving out the DOCTYPE implies that your HTML is of a version prior to HTML 4.01.
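For anyone who wants to check their own pages, here is a minimal Python sketch that reports whether a document opens with a DOCTYPE declaration. The function name find_doctype is my own invention, and this is only a rough regular-expression check: real browsers choose their rendering mode from a much more detailed list of DOCTYPE variants, so treat a result of None as "probably quirks mode" rather than a definitive verdict.

```python
import re
from typing import Optional

# Rough heuristic: a DOCTYPE must come first in the document, optionally
# preceded by whitespace or HTML comments. Matching is case-insensitive.
_DOCTYPE_RE = re.compile(
    r'^\s*(?:<!--.*?-->\s*)*<!DOCTYPE\s+([^>]*)>',
    re.IGNORECASE | re.DOTALL,
)

def find_doctype(html: str) -> Optional[str]:
    """Return the body of the DOCTYPE declaration, or None if there is none.

    None means the browser has no version hint and will most likely fall
    back to quirks mode (hypothetical helper, not a browser algorithm).
    """
    match = _DOCTYPE_RE.match(html)
    return match.group(1).strip() if match else None
```

A page like Google's, which starts straight with <html>, gives None, while a page starting with an HTML 4.01 strict declaration gives back its public and system identifiers.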
In order for your code to be "correct" then, it must validate. Validating your code can be done any number of ways, but the simplest is to use one of the on-line validators such as the W3C Markup Validation Service.
When you have the validator examine the code from Google's main page, it fails, utterly and miserably. To be fair, Yahoo! also fails. DMOZ, however, is valid HTML 4.01, and Wikipedia validates as XHTML 1.0 strict.
Since Google has no DOCTYPE, the validator is left to make a "best guess" as to what type of HTML is being used. You can tell the validator to use a particular version, regardless of the DOCTYPE, but no matter which is chosen, Google fails.
How well would Google's own site be indexed by the Google search engine? Probably fairly well. Millions of pages indexed by Google do not conform to any web standards. They tell us in their guidelines that "correct HTML" can be an important issue in having our site indexed properly, yet they fail to comply with the current standards for (X)HTML.
I think it might be time for Google to "put their money where their mouth is." How about Google showing us how good they really are by bringing their HTML into the twenty-first century? It seems to me that perhaps they have been so busy with their behind-the-scenes programming that they have neglected their web interface, through which the world sees them. How easy is Google to use for someone with a screen-reader or a PDA?
Web developers frequently make use of the wide variety of tools Google offers, such as Google Maps and AdSense. It can be very frustrating to create a web page that meets (X)HTML and accessibility standards, only to be greeted by validation errors when you insert code provided by Google.
So, as web developers, how do we impress upon Google our desire for valid code? Perhaps this forum will provide us some small voice that might be heard by people in the right places, or we might go so far as to set up a blog in order to reach a wider audience and get more input from the community.
In the above-mentioned interview, Matt also said:
"...Google’s home page doesn’t validate and that’s mostly by design to save precious bytes. Will the world end because Google doesn’t put quotes around color attributes? No, and it makes the page load faster..."
So Google's home page doesn't validate, mostly by design, in order to save precious bytes? Couldn't Google save bytes, and keep the page loading fast, while still writing valid code with quotes around its color attributes? And does skipping validation really make their pages load noticeably faster?
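To put the "precious bytes" in perspective, here is a small sketch that measures what quoting costs. The helpers are hypothetical and use a naive regular expression rather than a real HTML parser; they only strip quotes from values with no spaces, quotes, or ">" in them, like #ffffff. Each pair of quotes costs exactly two bytes. Worth noting: HTML 4.01 only allows unquoted attribute values made of letters, digits, hyphens, periods, underscores, and colons, so color=#ffffff is itself one of the things a validator will flag.

```python
import re

# Naive pattern: ="value" where value has no whitespace, quotes or '>'.
# Values containing spaces must stay quoted, so they are left alone.
_QUOTED_VALUE_RE = re.compile(r'="([^"\s>]+)"')

def strip_attribute_quotes(html: str) -> str:
    """Remove double quotes around simple attribute values (illustration only)."""
    return _QUOTED_VALUE_RE.sub(r'=\1', html)

def quote_savings(html: str) -> int:
    """Bytes (UTF-8) saved by dropping those quotes: 2 per attribute."""
    before = len(html.encode("utf-8"))
    after = len(strip_attribute_quotes(html).encode("utf-8"))
    return before - after
```

For example, quote_savings('<body bgcolor="#ffffff" text="#000000">') comes to 4 bytes, two per quoted color value.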
What about a CSS solution to the deprecated font elements and other issues?
For a single page, the stylesheet would have to be built and its size compared against that of the revised page code.
However, if they standardized on one global CSS file for all their search pages, wouldn't that file be cached on users' machines, potentially saving millions? After all, their search page looks almost identical everywhere.
It seems that might be a low-cost, high-ROI improvement that should be implemented at once.
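Here is a back-of-the-envelope sketch of the caching argument. The function only does arithmetic, and it assumes the best case: the external stylesheet is downloaded once and then served from the browser cache on every later page view. It ignores HTTP headers and request overhead. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration, not measurements of Google's pages.

```python
def css_bytes_transferred(page_views: int, inline_css_bytes: int,
                          external_css_bytes: int,
                          link_tag_bytes: int) -> tuple[int, int]:
    """Return (inline_total, external_total) bytes sent for styling.

    inline:   every page carries its own styling markup.
    external: every page carries only a <link> tag, and the stylesheet
              itself is fetched once and then cached (best-case assumption).
    """
    inline_total = page_views * inline_css_bytes
    external_total = external_css_bytes + page_views * link_tag_bytes
    return inline_total, external_total
```

With made-up sizes of 600 bytes of inline styling, an 800-byte stylesheet, and a 60-byte link tag, css_bytes_transferred(1, 600, 800, 60) gives (600, 860), so for a single view the external file costs more, which is the one-page trade-off. css_bytes_transferred(1000, 600, 800, 60) gives (600000, 60800), which is where the caching pays off.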
Perhaps we can determine whether this option could, indeed, be a way for Google to increase both their ROI and the number of friends they have.
How does the rest of the community here see this issue, if it is an issue at all for you? All input will be greatly appreciated.