A Two-Sided Game Plan

In 1998, Lawrence “Larry” Page and his Stanford colleague Sergey Brin outlined the then-current state of search engines. Engines such as Yahoo! and WebCrawler were said to index between 2 and 100 million web pages, figures tracked by Search Engine Watch, and a user’s search took anywhere from 1 to 9 seconds. All of this, Page and Brin said, would pale in comparison to what users would demand of the Internet in less than a decade’s time.

They were correct. Today the largest search engine in the world labels and stores up to 20 billion web pages a day and handles billions of queries daily, returning results in less than half a second. Yet something odd happened on the way to this enormous progress: a small startup that grew out of Page and Brin’s 1998 paper managed to obliterate every competing search engine. Google emerged, and Google remained.

Google’s success can be attributed in part to two strategic advantages, very different in nature: one involved the efficient use of material resources, while the other rested on the adoption of a new networking theory. Together, they expanded Google’s capacity to store and manage its data.

One core component of Google’s operation is its servers. In a network, a server is a computer dedicated to receiving requests and serving up data. When a user visits a website, the user’s device sends a request to that website’s server, and the server responds by sending the page’s data back to the device. In Google’s case there is not just one server at the end of the line, but close to a million of them.
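To make that request-and-response cycle concrete, here is a minimal sketch in Python of what a user’s device does when it fetches a page. The host name is only a placeholder, and the standard-library http.client module stands in for whatever a real browser would use.

    import http.client

    # Open a connection to the site's server (placeholder host).
    conn = http.client.HTTPSConnection("www.example.com")

    # Ask the server for a specific page.
    conn.request("GET", "/")

    # The server answers with a status line and the page's bytes.
    response = conn.getresponse()
    print(response.status, response.reason)   # e.g. 200 OK
    body = response.read()                    # the data sent back to the device
    print(len(body), "bytes received")

    conn.close()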

Google’s first brilliant strategic move was in how it acquired its servers. While its competitors bought equipment from companies like Sun Microsystems or EMC, Google set out to build its own servers from the cheapest components it could find. These machines failed far more often than those of competing search engines, so to compensate, Google programmed groups of servers to perform the same function and hold the same data. The net trade-off, Larry and Sergey found, was rather favorable: it was cheaper to buy lots and lots of low-quality servers and constantly replace the faulty ones than to buy expensive servers that failed less often. This approach let Google field far more servers than its competitors and therefore store far larger volumes of data.
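The replication idea behind that trade-off can be sketched in a few lines of Python. The snippet below is a toy illustration rather than anything resembling Google’s actual software: several interchangeable “servers” each hold a copy of the same data, and a lookup simply falls through to the next replica whenever one machine happens to be down.

    import random

    class CheapServer:
        """A toy server that holds a copy of the data but fails often."""
        def __init__(self, name, data):
            self.name = name
            self.data = data

        def fetch(self, key):
            # Simulate the high failure rate of commodity hardware.
            if random.random() < 0.3:
                raise ConnectionError(f"{self.name} is down")
            return self.data[key]

    def fetch_with_failover(replicas, key):
        """Try each replica in turn; fail only if every copy is unreachable."""
        for server in replicas:
            try:
                return server.fetch(key)
            except ConnectionError:
                continue  # losing one cheap box is fine: another holds the same data
        raise RuntimeError("all replicas failed")

    # Five cheap machines holding identical copies of a tiny index.
    index = {"stanford.edu": "crawl data for stanford.edu"}
    replicas = [CheapServer(f"server-{i}", index) for i in range(5)]
    print(fetch_with_failover(replicas, "stanford.edu"))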

A second deliberate move by Google to further dominate the search-engine industry was its development and use of a principle called Software-Defined Networking. Software-Defined Networking, or SDN, is a centralized approach to controlling the switches in a network. In any network there are small ‘bridges’, called switches, that take data in and forward it toward the appropriate destination. Before SDN, each switch carried its own built-in logic for “deciding” where to send data: it would exchange information with nearby devices to build up a picture of the network and then choose the next hop on its own. SDN changed this. It proposed switches that hold no individual routing logic and instead receive their forwarding rules from a central controller. This made networks far more flexible, because to alter how data flows the operator only has to change the software running on the central controller, without worrying about what each individual switch is doing in hardware.
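A rough Python sketch of what “forwarding rules from a central controller” can look like follows; the four-switch topology, the class names, and the shortest-path policy are all illustrative assumptions, not Google’s implementation. The switches hold nothing but a forwarding table, while the controller uses its global view of the network to compute paths and push rules down to each switch.

    from collections import deque

    class Switch:
        """A 'dumb' switch: it only forwards according to rules it is given."""
        def __init__(self, name):
            self.name = name
            self.forwarding_table = {}   # destination -> next hop

        def install_rule(self, destination, next_hop):
            self.forwarding_table[destination] = next_hop

    class Controller:
        """The central hub: it sees the whole topology and programs every switch."""
        def __init__(self, links):
            self.links = links                                   # switch -> neighbours
            self.switches = {name: Switch(name) for name in links}

        def _shortest_path(self, src, dst):
            # Plain breadth-first search over the controller's global view.
            parents, frontier = {src: None}, deque([src])
            while frontier:
                node = frontier.popleft()
                if node == dst:
                    break
                for neighbour in self.links[node]:
                    if neighbour not in parents:
                        parents[neighbour] = node
                        frontier.append(neighbour)
            path, node = [], dst
            while node is not None:          # walk back from destination to source
                path.append(node)
                node = parents[node]
            return list(reversed(path))

        def program_route(self, src, dst):
            path = self._shortest_path(src, dst)
            # Push a forwarding rule to every switch along the chosen path.
            for here, next_hop in zip(path, path[1:]):
                self.switches[here].install_rule(dst, next_hop)
            return path

    # Hypothetical topology: s1 can reach s4 either through s2 or through s3.
    topology = {"s1": ["s2", "s3"], "s2": ["s1", "s4"],
                "s3": ["s1", "s4"], "s4": ["s2", "s3"]}
    controller = Controller(topology)
    print(controller.program_route("s1", "s4"))        # e.g. ['s1', 's2', 's4']
    print(controller.switches["s1"].forwarding_table)  # {'s4': 's2'}

Rerouting traffic in this model means editing only the controller’s logic; the switches themselves never have to be reprogrammed one by one, which is exactly the flexibility described above.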

“…It was neither the strategic use of material resources alone nor the implementation of fancy theory alone that propelled the company’s success. It was both.”

Such SDN links not only made data flow more flexible and easier to manage, they also improved the switches’ ability to transmit data. In its paper “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network”, Google distills the challenges it faced over the years into a single one: increasing the switches’ bandwidth, that is, the rate at which they can move data. With challenges such as an “explosion [of] data set sizes with more photo/video content” and “co-resident applications [that] share substantial data with one another in the same cluster”, increasing bandwidth became the single point of focus for Google’s engineers. Once SDN links were adopted, bandwidth across Google’s data centers improved significantly, since data flow was not only more flexible but also better managed. SDN was able to “ease the congestion of data center networks where bandwidth needs were exploding.”
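To see why topology matters so much for this bandwidth problem, here is a back-of-the-envelope estimate for a simple two-stage Clos (leaf-spine) fabric. Every number below is a made-up illustration rather than a figure from the Jupiter paper; the point is only that aggregate capacity grows with the number of parallel paths the topology provides.

    # Back-of-the-envelope bisection-bandwidth estimate for a leaf-spine
    # (two-stage Clos) fabric. All figures are hypothetical.
    link_speed_gbps = 40                 # speed of one leaf-to-spine uplink
    spine_switches = 16                  # switches in the spine layer
    leaf_switches = 32                   # top-of-rack (leaf) switches
    uplinks_per_leaf = spine_switches    # each leaf connects once to every spine

    # Each leaf can push uplinks_per_leaf * link_speed_gbps toward the spine,
    # so cross-fabric traffic is bounded by the total leaf-to-spine capacity.
    aggregate_gbps = leaf_switches * uplinks_per_leaf * link_speed_gbps
    bisection_gbps = aggregate_gbps / 2  # half the hosts talking to the other half

    print(f"aggregate leaf-to-spine capacity: {aggregate_gbps / 1000:.1f} Tb/s")
    print(f"approximate bisection bandwidth:  {bisection_gbps / 1000:.1f} Tb/s")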

As these two developments were carried out alongside several others, Google gradually grew into the information giant it is today. Over the span of a decade it increased its bandwidth capacity a hundredfold and the number of pages it could label and store more than a thousandfold. Looking at this path of progress, one can see that it was neither the strategic use of material resources alone nor the implementation of fancy theory alone that propelled the company’s success. It was both. Google took the two hand in hand all along. That’s why it won.

Sources

http://blog.fibermountain.com/blog/four-reasons-to-make-the-leap-to-sdn-in-your-data-center

http://www.wired.com/2012/04/going-with-the-flow-google/

http://www.derekchristensen.com/why-google-is-better-faster-and-cheaper/

http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf

http://infolab.stanford.edu/~backrub/google.html
