Thursday, July 9, 2009

Google Search is a web search engine owned by Google Inc. and is the most-used search engine on the Web. Google receives several hundred million queries each day through its various services. Google Search was originally developed by Larry Page and Sergey Brin in 1997.
Beyond the original word-search capability, Google Search provides more than 22 special features, such as: synonyms; weather forecasts; time zones; stock quotes; maps; earthquake data; movie showtimes; airports; home listings; and sports scores. There are special features for numbers: prices; temperatures; money/unit conversions ("10.5 cm in inches"); calculations ("3*4+sqrt(6)-pi/2"); package tracking; patents; area codes; plus rudimentary language translation of displayed pages.
A Google search-results page is ordered by a priority rank based on "PageRank"; the full ranking criteria are kept secret to prevent spammers from forcing their pages to the top. Google Search provides many options for customized search (see below: Search options), such as: exclusion ("-xx"), inclusion ("+xx"), alternatives ("xx OR yy"), and wildcard matching ("*").

The search engine

PageRank

Google's algorithm uses a patented system called PageRank to help rank web pages that match a given search string. The PageRank algorithm computes a recursive score for web pages, based on the weighted sum of the PageRanks of the pages linking to them. PageRank derives from human-generated links, and is thought to correlate well with human concepts of importance. The exact percentage of all web pages that Google indexes is not known, as it is very hard to calculate. Previous keyword-based methods of ranking search results, used by many search engines that were once more popular than Google, would rank pages by how often the search terms occurred in the page, or how strongly associated the search terms were within each resulting page. In addition to PageRank, Google also uses other secret criteria for determining the ranking of pages on result lists, reportedly more than 200 of them.
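The recursive score described above can be illustrated with a small iterative sketch. This is not Google's production method: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from the scores of the pages that link to them.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with a uniform score

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
```

In this toy web, page C collects links from three pages and ends up with the highest score, while page D, which nothing links to, ends up with the lowest.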

Search results
Google not only indexes and caches web pages but also takes "snapshots" of other file types, which include PDF, Word documents, Excel spreadsheets, Flash SWF, plain text files, online videos such as YouTube and much more. Except in the case of text and SWF files, the cached version is a conversion to (X)HTML, allowing those without the corresponding viewer application to read the file.
Users can customize the search engine by setting a default language, using the "SafeSearch" filtering technology, and setting the number of results shown on each page. Google has been criticized for placing long-term cookies on users' machines to store these preferences, a tactic which also enables the company to track a user's search terms and retain the data for more than a year. For any query, up to the first 1000 results can be shown with a maximum of 100 displayed per page.

Non-indexable data
Despite Google's immense index, a considerable amount of data is available in online databases that are accessible by means of queries but not by links. This so-called invisible or deep Web is minimally covered by Google and other search engines. The deep Web contains library catalogs, official legislative documents of governments, phone books, and other content which is dynamically prepared to respond to a query.

Google optimization

Since Google is the most popular search engine, many webmasters have become eager to influence their website's Google rankings. An industry of consultants has arisen to help websites increase their rankings on Google and on other search engines. This field, called search engine optimization, attempts to discern patterns in search engine listings, and then develop a methodology for improving rankings to draw more searchers to their clients' sites.
Search engine optimization encompasses both "on page" factors (like body copy, title elements, H1 heading elements and image alt attribute values) and "off page" factors (like anchor text and PageRank). The general idea is to affect Google's relevance algorithm by incorporating the keywords being targeted in various places "on page", in particular the title element and the body copy (note: the higher up in the page, presumably the better its keyword prominence and thus the ranking). Too many occurrences of the keyword, however, cause the page to look suspect to Google's spam checking algorithms.
Google has published guidelines for website owners who would like to raise their rankings when using legitimate optimization consultants.


Functionality

The Google search engine has many intuitive features that make it more functional, which may have played a role in making it as popular as it is today. Google is one of the top ten most-visited websites. Some of its features include a definition link for most searches that include dictionary words, a count of the results found for a search, links to other searches (e.g. if a word is misspelled, a link to the results for the corrected spelling), and many more. It is unknown whether functionality, speed, or luck brought it to its peak status.

Search syntax
Google's search engine normally accepts queries as simple text, and breaks up the user's text into a sequence of search terms. These will usually be words that are to occur in the results, but may also be phrases, delimited by quotation marks ("), or qualified terms, with a prefix such as "+" or "-" or one of several advanced operators, such as "site:". The "Google Search Basics" webpages describe each of these additional queries and options.
Google's Advanced Search web form gives several additional fields which may be used to qualify searches by such criteria as date of first retrieval. All advanced queries transform to regular queries, usually with additional qualified terms.
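As a rough illustration of how a query might be broken into words, quoted phrases and qualified terms, here is a minimal tokenizer sketch. The Term record, the regular expression and the sample query are hypothetical; Google's actual parser is not public, and this sketch makes no attempt to handle "OR" or wildcards.

```python
import re
from typing import NamedTuple


class Term(NamedTuple):
    text: str        # the word or phrase itself
    is_phrase: bool  # True when the term was delimited by quotation marks
    modifier: str    # "+", "-" or "" when no prefix was given
    operator: str    # advanced operator such as "site", or ""


# One token: optional +/- prefix, optional "operator:" prefix,
# then either a quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query string into a list of Term records."""
    terms = []
    for modifier, operator, phrase, word in TOKEN.findall(query):
        text = phrase or word
        if text:  # skip empty quoted phrases
            terms.append(Term(text, bool(phrase), modifier, operator))
    return terms


parsed = parse_query('jaguar "big cat" -car site:example.com')
```

For the sample query, the result holds four terms: the plain word jaguar, the phrase "big cat", the excluded word car, and example.com qualified by the site operator.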

Query expansion
Google applies query expansion to the submitted search query, transforming it into the query that will actually be used to retrieve results. As with page ranking, the exact details of the algorithm Google uses are deliberately obscure, but certainly the following transformations are among those that occur:
Term reordering: in information retrieval this is a standard technique to reduce the work involved in retrieving results. This transformation is invisible to the user, since the results ordering uses the original query order to determine relevance.
Stemming: this is used to increase search quality by also matching small syntactic variants of search terms.
Spelling correction: there is a limited facility to fix possible misspellings in queries.
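
The three transformations listed above can be sketched together in a few lines. Since the real algorithm is deliberately obscure, everything here is an assumption: the DOC_FREQUENCY table is hypothetical index data, the suffix-stripping stemmer is a naive stand-in for a real one, spelling correction uses Python's difflib against the known stems, and reordering puts the rarest term first, a common way to reduce retrieval work.

```python
import difflib

# Hypothetical index statistics: stem -> number of documents containing it.
DOC_FREQUENCY = {"search": 900, "engine": 400, "rank": 120, "page": 1500}


def stem(word):
    """Naive suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct(word):
    """Replace an unknown word with the closest known stem, if one is close enough."""
    if stem(word) in DOC_FREQUENCY:
        return word
    matches = difflib.get_close_matches(word, list(DOC_FREQUENCY), n=1, cutoff=0.8)
    return matches[0] if matches else word


def expand_query(query):
    """Return (term, stem) pairs, rarest stem first.

    Terms missing from the table count as frequency 0 and sort to the front.
    """
    terms = [correct(term) for term in query.lower().split()]
    pairs = [(term, stem(term)) for term in terms]
    return sorted(pairs, key=lambda pair: DOC_FREQUENCY.get(pair[1], 0))


expanded = expand_query("serch engines ranking")
```

With the hypothetical table above, the misspelled "serch" is corrected to "search", "engines" and "ranking" are reduced to the stems "engine" and "rank", and the rarest stem ("rank") is moved to the front.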

"I'm Feeling Lucky"
Google's homepage includes a button labeled "I'm Feeling Lucky". When a user clicks on the button, the user is taken directly to the first search result, bypassing the search engine results page. The idea is that, if a user is "feeling lucky", the search engine will return the perfect match the first time without the user having to page through the search results.
According to a study by Tom Chavez of "Rapt", this feature costs Google $110 million a year as 1% of all searches use this feature and bypass all advertising.



Rich Snippets
On 12 May 2009, Google announced that they would be parsing the hCard, hReview and hProduct microformats, and using them to populate search result pages with what they called "Rich Snippets".
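To give a feel for what parsing a microformat involves, here is a minimal sketch that pulls a few hCard properties out of an HTML fragment using Python's standard html.parser module. It is only an illustration, not how Google extracts Rich Snippets: the sample markup is hypothetical, only three property names are recognised, and the parser assumes every element is explicitly closed.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements nested inside class="vcard"."""

    PROPERTIES = {"fn", "org", "tel"}  # small subset of hCard property names

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # class sets of the currently open elements

    def handle_starttag(self, tag, attrs):
        # Assumption: every element has a matching end tag (no bare <br> or <img>).
        self._open.append(set((dict(attrs).get("class") or "").split()))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        inside_card = any("vcard" in classes for classes in self._open)
        if not text or not inside_card:
            return
        # Attribute the text to the innermost enclosing property element.
        for classes in reversed(self._open):
            found = classes & self.PROPERTIES
            for prop in found:
                self.card[prop] = (self.card.get(prop, "") + " " + text).strip()
            if found:
                break


# Hypothetical sample markup.
SAMPLE = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
          '<span class="org">Example Corp</span></div>')

parser = HCardParser()
parser.feed(SAMPLE)
parser.close()
card = parser.card
```

For the sample fragment, card maps "fn" to "Jane Doe" and "org" to "Example Corp"; the comma between the two spans is ignored because it does not sit inside a property element.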

Special features
Besides the main search-engine feature of searching for text, Google Search has more than 22 "special features" (activated by entering any of dozens of trigger words) when searching:
synonym search - A search can match words similar to those specified, by placing the tilde sign (~) immediately in front of a search term, such as: ~fast food.
weather - The humidity, temperature and forecast for many cities can be viewed by typing "weather" followed by the city and state, U.S. zip code, or city and country (such as: weather Lawrence, Kansas; weather Paris; weather Bremen, Germany).
stock quotes - The market data[6] for a specific company or fund can be viewed by typing the ticker symbol (optionally followed by "stock"), such as: CSCO; MSFT; IBM stock; F stock (lists Ford Motor Co.); or AIVSX (fund). Results show inter-day changes, a 5-year graph, etc.
time zone - The current time in many cities (worldwide), can be viewed by typing "time" and the name of the city (such as: time Cairo; time Pratt, KS).
sports scores - The scores and schedules for sports teams can be displayed by typing the team name or league name into the search box.
calculator - Results can be calculated live by entering a formula in numbers or words, such as: 6*77 +pi +sqrt(e^3)/888 plus 0.45. The user is given the option to search for the formula, after calculation.
unit conversion - Measurements can be converted by entering a phrase such as: 10.5 cm in inches; or 90 km in miles.
currency conversion - A money or currency converter can be selected, by typing the names or currency codes (listed by ISO 4217): 6789 Euro in USD; 150 GBP in USD; 5000 Yen in USD; 5000 Yuan in lira (the U.S. dollar can be USD or "US$" or "$", while Canadian is CAD, etc.).
dictionary lookup - A definition for a word or phrase can be found by entering "define" plus the word(s) to look up (such as: Define philosophy).
maps - Some related maps can be displayed, by typing in the name or U.S. ZIP code of a location and the word "map" (such as: New York map; Kansas map; or Paris map).
movie showtimes - Reviews or film showtimes can be listed for any movies playing nearby, by typing "movies" or the name of any current film into the search box. If a specific location was saved on a previous search, the top search result will display showtimes for nearby theaters for that movie.
public data - Trends for population (or unemployment rates) can be found for U.S. states & counties, by typing "population" or "unemployment rate" followed by a state or county name.
real estate and housing - Home listings in a given area can be displayed, using the trigger words "housing", "home", or "real estate" followed by the name of a city or U.S. zip code.
travel data/airports - The flight status for arriving or departing U.S. flights can be displayed by typing the name of the airline and the flight number into the search box (such as: american airlines 18). Delays at a specific airport can also be viewed (by typing the name of the city or three-letter airport code plus the word "airport").
package tracking - Package mail can be tracked by typing the tracking number of a UPS, FedEx or USPS package directly into the search box. Results will include quick links to track the status of each shipment.
patent numbers - U.S. patents can be searched by entering the word "patent" followed by the patent number into the search box (such as: Patent 5123123).
area code - The geographical location (for any U.S. telephone area code) can be displayed by typing a 3-digit area code (such as: 650).
U.S. Government search - U.S. government websites can be searched from the webpage www.google.com/ig/usgov.
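
The trigger-word behaviour in the list above could be approximated by matching the query against a table of patterns before falling back to an ordinary web search. The following is a minimal sketch covering only a handful of the features; the patterns, their order and the feature labels are assumptions made for illustration, not Google's actual rules.

```python
import re

# (feature label, pattern) pairs, checked in order; the first match wins.
TRIGGERS = [
    ("weather", re.compile(r"^weather\s+(?P<arg>.+)$", re.I)),
    ("time zone", re.compile(r"^time\s+(?P<arg>.+)$", re.I)),
    ("dictionary lookup", re.compile(r"^define\s+(?P<arg>.+)$", re.I)),
    ("patent numbers", re.compile(r"^patent\s+(?P<arg>\d+)$", re.I)),
    ("conversion", re.compile(r"^(?P<arg>[\d.]+\s+\w+\s+in\s+\w+)$", re.I)),
    ("maps", re.compile(r"^(?P<arg>.+)\s+map$", re.I)),
    ("area code", re.compile(r"^(?P<arg>\d{3})$")),
]


def detect_feature(query):
    """Return (feature label, argument) for a query, or a plain web search."""
    text = query.strip()
    for label, pattern in TRIGGERS:
        match = pattern.match(text)
        if match:
            return label, match.group("arg")
    return "web search", text


examples = [
    detect_feature("weather Paris"),
    detect_feature("Define philosophy"),
    detect_feature("10.5 cm in inches"),
    detect_feature("650"),
]
```

A query such as "New York map" is routed to the maps feature with the argument "New York", while a query that matches no pattern, such as "larry page", falls through to the ordinary web search.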
