Microsoft Search

Occasionally, someone will brag about their progress in such a way that not only do you realize they’re actually far behind, but that they have absolutely no idea how far behind they are or how much lies ahead.

In software engineering, there’s a certain type of programmer fitting this mold that everyone runs into sooner or later. They’re not very communicative, but whenever asked about the status of their part of the project, they respond positively, reporting things are going well. No big issues.

They usually have some experience, meaning nobody really has to watch over their shoulder. Besides, everyone else is busy getting their part done.

There is a very distinct moment at which you realize one is on your project. A moment where your heart fills with an icy chill, and your mind can’t help but lock in on the words “OH FUCK.”

It’s days before QA — or even before launch — when everyone needs to have everything done. Not “mostly done.” Not “almost done.” Really done. Done done.

As in the users can use it now.

As the group assembles for the status check, maybe others in the room report a bug or two they are putting the last touches on, or a last-minute cosmetic issue they’re taking care of. When it comes to our friend, he proudly proclaims his status, so gleeful you can almost see the “Mission Accomplished” banner behind him: “Great!” he says. “I got it to compile this morning! It’s done!”

OH FUCK.

Typically, this moment is filled with silence. Maybe after a brief pause, someone will clarify that this is, in fact, the first time it compiled. Usually the response, a grinning acknowledgment with an “Ain’t it great!?” feel, confirms the worst.

The rest of the team continues to reflect in silence. Not because they don’t want to burst his bubble, but because they are realizing how utterly fucked they are. How the launch is delayed. And how their personal lives are about to go out the window, as it will now fall to them to find all the guaranteed problems with the untested code & spend late nights & weekends getting it ready enough to launch.

It’s not quite the same, but that sort of thing was going through my mind when I read how Microsoft had announced improvements for their search engine.

To quote from the article:

Nadella said the updates will be phased in by the end of the month. Among them, he said, are improvements to the way Live Search interprets what users are looking for, even if they misspell a word, type in two separate words instead of a compound word or use a variation, like “driving” instead of “drive.”

Live Search also will better detect what Nadella calls “stop words” — keywords or phrases that aren’t considered unless there’s a specific combination or context — like the name “Will Smith” in the U.S.

“It’s a huge improvement,” Nadella said. “We believe we can now compete with Google.”

WTF?!?

They didn’t have these things before?!

Understanding “driving” vs. “drive”? That’s a pretty basic problem called stemming. How basic? Put it this way: there’s been open source code out there to do this since before there was a Microsoft Search. Fuck, even the Wikipedia page has existed since 2007. All they had to do was go there. I know Microsoft is big on “eating their own dogfood”, but damn, use Google just this once to find it.
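To give a feel for how basic the idea is, here is a toy sketch in Python. It is not the Porter stemmer or anything a real search engine ships; the suffix list and the function names (`naive_stem`, `same_stem`) are made up for illustration, and the rules are deliberately crude.

```python
# Toy suffix-stripping stemmer: an illustration only, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule list


def naive_stem(word: str) -> str:
    """Lowercase the word, strip one known suffix, then drop a trailing 'e'."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Dropping a final 'e' lets "drive" and "driving" meet at "driv".
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def same_stem(a: str, b: str) -> bool:
    """True when both words reduce to the same crude stem."""
    return naive_stem(a) == naive_stem(b)
```

With these rules, “drive”, “drives” and “driving” all collapse to the same stem. A real stemmer handles far more cases, and far more carefully, but the core trick is about this simple.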

Now let’s assume for a second they’ve had stemming and it’s a problem that’s actually harder, something like knowing “car” and “automobile” are the same thing. Granted, that’s harder, but here’s the thing: when I transferred down to Yahoo Search Marketing some 4-5 years ago, they already had this technology. The process was called canonicalization, and by the time I got there, it was an old, well-established piece of technology. So well-established that despite being fairly technical, everyone in the company (business, product, support, etc.) knew what it was, at least at a high level: its purpose and so on.
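At that high level, the idea can be sketched as a lookup that maps different words for the same thing onto one canonical term. To be clear, this is not how Yahoo implemented it; the `CANONICAL` table and the `canonicalize` function below are hypothetical, and the hard part in practice is building the table, not applying it.

```python
# Hypothetical synonym table, invented for this example. A production system
# would hold vastly more entries and would not be hand-written like this.
CANONICAL = {
    "automobile": "car",
    "auto": "car",
}


def canonicalize(query: str) -> str:
    """Replace each token with its canonical form, leaving unknown tokens alone."""
    tokens = query.lower().split()
    return " ".join(CANONICAL.get(token, token) for token in tokens)
```

Under this sketch, a search for “used automobile” and a search for “used car” end up as the same query.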

Likewise their updated stop words. Yahoo has long had a process called “Units”, which looks for things like “Denzel Washington” and knows it’s not the same as the Northwest state or the national capital. You can see a related patent application from Yahoo here. Although, if you work for Microsoft (or Google), it’s probably best you didn’t.
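I can’t show how Units actually works, but the general flavor of treating a known phrase as a single thing can be sketched with a greedy longest-match over a phrase list. Everything here is an assumption for illustration: the `KNOWN_UNITS` set, the `segment` function, and the matching strategy itself.

```python
from typing import List

# Hypothetical list of multi-word phrases that should be kept together.
KNOWN_UNITS = {("denzel", "washington"), ("will", "smith")}
MAX_UNIT_LEN = max(len(unit) for unit in KNOWN_UNITS)


def segment(query: str) -> List[str]:
    """Split a query into units, preferring the longest known phrase at each position."""
    tokens = query.lower().split()
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, falling back to a single token.
        for n in range(min(MAX_UNIT_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in KNOWN_UNITS:
                units.append(" ".join(candidate))
                i += n
                break
    return units
```

So “Denzel Washington movies” comes out as the unit “denzel washington” plus “movies”, while “Washington state” stays as two separate words, since that phrase is not in the list.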

So really what was the crux of Microsoft’s announcement? Two things:


  1. They have absolutely no idea how far behind they are.
  2. They’ve launched technology that will catch them up to where the other big players were 5-6 years ago.

And now they think they’re on par with Google?

Seriously, it’s like they got a Revolutionary War era powder-gun off eBay and declared “We believe we can now compete with the U.S. military.”

15 Responses to “Microsoft Search”

  1. ben Says:

    I think where Yahoo! really screwed up was not getting their search to be the default search in Firefox and Safari. I’m not sure if other people would agree with me on that, but that’s one of the key reasons I use Google.

    And where Microsoft screwed up was trying to run their search on Windows Server. :)
    Also for some reason, Microsoft marketing and branding is horrible. Windows Live Search?
    Windows Live in general is a stupid concept — it’s not fucking Windows. It’s the internet.

  2. Jeremy Osborne Says:

    Just a random reader directed here to your website by a friend. Amazingly good read, and a great follow up about developers just like this he and I run into.

  3. dk Says:

    really liked this line *after* the patent stuff “Although, if you work for Microsoft (or Google), it’s probably best you didn’t.”

  4. Bob Warfield Says:

    Did you try typing “Driving” and “Drive” into Google to see if you get the same search result?

    Hmmm,

    BW

  5. RR Says:

    You are oversimplifying (somewhat, at least).

    Take the case of stemming. While it sounds good (and obvious), I am not sure if it leads to really great gains. For example, I found this page on a google search:
    http://grapeshot.co.uk/shared/porter-stemmers.php
    In 3 real evaluations, the most commonly used (Porter) stemmer did slightly better than no stemming on 2 evaluations, and did not do better on a third.

    Bottom line is, I cannot believe Microsoft Live Search programmers didn’t already know about stemming. The latest “innovation” is probably smarter than just a naive stemming algorithm. You are oversimplifying the task!

    And, the search for “Denzel Washington” works just fine on Live Search. The suggestions are interesting too (e.g., when I search Pitt, I get both the college and Brad Pitt as suggestions).

    [Disclaimer: I don't work at Microsoft!]

  6. Jeffrey Friedl Says:

    I think you meant 2004 WRT Wikipedia’s entry on Stemming.

  7. wdr1 Says:

    RR, I’m not oversimplifying, Satya Nadella, the MS VP is. The “drive” vs “driving” example is his.

  8. wdr1 Says:

    Yep, thanks for the correction Jeffrey. Your eye for detail is much better than mine. ;-)

  9. Rob Mathieson Says:

    It pisses me off when people don’t proof read their own articles first. Your open source text editor probably doesn’t have spell check though. Hahahahaha.

  10. Sean Hederman Says:

    Funny that Rob Mathieson laughs at you when his CV (at http://195.97.193.35/robmathieson/CV.htm) is such a grammatical disaster. My favourite: “This was alongside advanced some .Net functionality, such as HTTPXML”

  11. Bogdan Ghervan Says:

    This is really a good piece of article you wrote! I really enjoyed when you compared to competing with the U.S. Military with a powder-gun :).

    As for the stemming issue, it is more probable that Nadella didn’t know the exact situation or how to put it in words that sounded better for the media. However, I’m not sure if he realizes how lame is to admit how poor their engine is and recognize in the public (as a competitor) that they’re going to become a worthy Google rival soon.

  12. Tom Primožič Says:

    In response to Ben:

    Although I speak on my behalf, I believe that many many users agree. The reason I use Google (over Yahoo!) is that it is Concentrated – search only. Whenever I stumble across Yahoo!’s home page, I am horrified by the amount of non-search-related content (possibly called spam in this context?) present on the site. When I want to search, I want search only – and that is exactly what Google gives me. Nothing more.

    – Tom

  13. Bob B. Says:

    @Tom

    I really think this is no longer a valid excuse. http://www.yahoo.com has always been cluttered, it’s a portal page, not a search page. If you want a clean Y! search experience, use search.yahoo.com. Better yet, assign a bookmark shortcut on your browser, y for Yahoo, g for Google. That way you can just use the address bar.

  14. Andy Freeman Says:

    search.yahoo.com is a search box and nothing more – it’s more sparse than http://www.google.com .

    http://www.yahoo.com is a portal page.

  15. rick Says:

    Do you think I would like Microsoft Search?
