Copyright August 8, 2006 by Mike Banks Valentine
The Research Laboratories session at SES San Jose 2006 brought representatives from the top three search engines to talk about how projects emerge from their labs to become actual search tools. Each offered a different perspective, and each seemed to place a different emphasis on moving from ideas to products.
First up was Peter Norvig, Research Director at Google, who began by asking, “What comes out of research?” He suggested that most of the tools emerging from Google’s labs are developed in a “bottom up fashion … We have a bunch of engineers trying things out and some of them bubble up to the top.” He gave several examples and revealed that one of the most popular publisher tools, AdSense, came out of looking for a way to monetize Gmail, Google’s free webmail product.
He showed an example of factual search: a Google query for “What is the population of Japan?” returns a direct answer, 127,417,244, as the first result on the page, followed by the source link, with more possible sources displayed below. Answers to clear, fact-based questions can be drawn from authoritative sources, continually updated, and displayed as “One Box” results.
He discussed “statistical machine translation,” which builds a model from English documents online and compares it with models of other languages, using sources such as news stories published in several languages as reliable, good-quality material for statistical comparison. Norvig proudly displayed results from a National Institute of Standards and Technology (NIST) competition for this type of translation that showed Google coming out on top. Google does this by comparing the same text in different languages found online, without anyone actually speaking those languages.
Moving to more challenging computational and algorithmic research projects, Norvig discussed image-processing work on “face localization,” an attempt to determine from group photos where a photo was taken. Identifying the people in photos on the web is not so easy: the best they have reliably achieved is determining whether a face is male or female.
In what appeared to be an unintentional segue, Norvig’s mention of image processing was followed by Bradley Horowitz, VP of product strategy for Yahoo. Horowitz studied computer vision and imaging before his involvement in search, and he said the science had progressed incrementally over the years. He saw an improvement when he first viewed tagging on Yahoo’s Flickr as a way of determining photo content “to avoid the heavy lifting of image processing algorithms.” In his words, “People plus algorithms are greater than algorithms.” This led to an emphasis on the “Authority of Trust” of social search, which relies more on users than on algorithms. He sees engines finding ways to reintroduce “content and metadata” as reliable sources of classification.
Horowitz described his “areas of focus” on community at Yahoo, stressing “Better search through people” and social media such as Yahoo’s social tagging site, del.icio.us, and its social photo site, Flickr. He also mentioned the importance of the microeconomics of information navigation and search, with an emphasis on the user experience. He pointed out that there are “2.6 words on average in search box,” in contrast to Yahoo Answers, where ordinary people ask a question in natural language and other ordinary people answer in natural language. He suggested that the turnaround from question to answer is within a day, sometimes within hours.
One function he wished for aloud is probably one many people would love to see from search engines: the ability to ask where the most convenient Starbucks is on his route to the conference. Norvig (of Google) had to be biting his tongue, since Google currently shows exactly that on Google Maps pages linked from an address query.
Horowitz wrapped up by discussing the utility and value he sees in Yahoo Answers pages and suggested those would soon be factored into the algorithm. He also reminded the audience of the recent promotion in which celebrities asked questions for people to answer. Stephen Hawking, for example, asked “Will the universe survive the next 100 years?” That is, of course, NOT the “normal person asking questions” Horowitz described above, but a PR move by Yahoo to bring in celebrity questions.
James Colborn of MSN spoke about the adCenter labs, which work in the paid search environment and aim to take that data to the next level. He said MSN has a higher conversion ratio, and the tools in adCenter help advertisers do even better. One tool for advertisers shows the probability that a query is commercial: “Microsoft” scores 33%, while “Buy digital camera” shows a 91% probability of commercial intent. The site, http://adlab.microsoft.com, is available for anyone to use at any time. A keyword mutation tool shows misspellings, and another handles acronym resolution and expansion. Colborn said the labs are “looking for feedback”: “If there are things that you don’t like, please feel free to tell us as well.”
During audience questions, panelists were asked how the labs determine where to focus their time and how “20% time” projects bubble up. Ideas from the “20% time” allowed to Google engineers are first submitted to product management teams for review, where they are voted on to move to the next level and into product development in the labs.
While little of substance was revealed about possible new products from any of the engines’ research labs, it was at least illuminating to hear the differing emphasis and mindset of each company’s approach to research in the space. Now if they could only tell me where I left my keys in a “one box” result, and get it out of beta before I’m late for the Google Dance tonight.