
A call for better Wikipedia search

by Robert W Gehl on October 8th, 2009


My main point in this post: someone somewhere needs to make a better Wikipedia search engine, one capable of searching the vast material below the articles on the surface.

To prove this, I’m going to tell you about a research project I’m currently working on for my dissertation. (This is a bit involved, but my main point will be clear in the end.) A consistent argument in analyses of the Web is that there is a balkanization of political discourse happening. Probably the most notable argument is Nicholas Negroponte’s notion of the “daily me”: the ability of users to customize their online news consumption to the point that they purposely avoid news and opinion that do not affirm their views. For example, through Google News, I can expose myself only to left-leaning news and editorial content and completely avoid right-leaning material. Or, I can read only the blogs Daily Kos and Talking Points Memo. Meanwhile, my neighbor reads only the Washington Times and Michelle Malkin’s blog. And never the twain shall meet, unless it’s at one of the many protests and counterprotests which happen in DC.

I agree with this basic analysis; at least anecdotally, it seems as if political discourse is extremely rancorous in the US. But this analysis raises the question: are there any places where Web users are in fact exposed to differing points of view, whether they like it or not? Where balkanization is mitigated? I’ve concluded that Wikipedia is that space.

At first, this seems odd. Wikipedia’s editorial policy is based on three rules: neutral point of view, no original research, and verifiability. That is, they want every article to be as objective as possible, to contain no original syntheses of secondary sources, and to be well-cited. This seems like an unlikely place for debate.

However, in order to produce articles, Wikipedia editors – who can be anyone in the world – must have a place to debate neutrality and verifiability. Every Wikipedia article has a corresponding “Talk” page. In those pages, editors debate the quality of sources and how to phrase critical issues in a neutral way.

For an article on an innocuous, bland American television show like The Hogan Family, the Talk page has little debate. However, consider an article on Hurricane Katrina. Katrina tore aside a veil in American culture, a veil over the importance of race and class and the role of the state in a neoliberalized society. In that article’s Talk page, Wikipedia editors have debated sources and ideas which tackle the role of class and race in the disaster. They do so with exceptional logic and openness. While the argument can get heated, it is simply amazing how a bunch of “nobodies” were able to debate these controversies in a civil and in-depth manner.

I’ve been examining this debate, which is now four years in the making and spans thousands of edits and possibly over 100,000 words on the talk page alone. I detail this debate in one of my dissertation chapters.

And this is where I need search help. Wikipedia’s own search only returns current articles. If I want to find the word “race” in the Katrina article, I will get the current version only and none of the 10,000+ previous revisions. If I want to find out which editor wrote the word “race” in the article… I can’t.
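To make the kind of search I have in mind concrete, here is a rough sketch in Python. It is not an existing tool, just an illustration under some assumptions: that revision text can be pulled from the MediaWiki web API (`action=query` with `prop=revisions`), and that "who wrote the word" can be approximated by the oldest revision whose text contains it. The function names `fetch_revisions` and `first_appearance` and the User-Agent string are made up for this example.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_revisions(title, limit=50):
    """Yield {'user', 'timestamp', 'text'} dicts for a page, oldest first.

    Hypothetical helper: it pages through the revision history using the
    MediaWiki API's continuation mechanism.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|content",
        "rvslots": "main",
        "rvdir": "newer",  # oldest revision first
        "rvlimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(
            url, headers={"User-Agent": "revision-search-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for page in data.get("query", {}).get("pages", []):
            for rev in page.get("revisions", []):
                yield {
                    "user": rev.get("user", ""),
                    "timestamp": rev.get("timestamp", ""),
                    "text": rev.get("slots", {}).get("main", {}).get("content", ""),
                }
        if "continue" not in data:
            break
        # Carry the continuation tokens into the next request.
        params.update(data["continue"])


def first_appearance(revisions, word):
    """Return the oldest revision containing `word` as a whole word, or None."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for rev in revisions:
        if pattern.search(rev["text"]):
            return rev
    return None
```

Something like `first_appearance(fetch_revisions("Hurricane Katrina"), "race")` would then walk the history from the beginning. With 10,000+ revisions this would be slow, and it only finds the first time a word entered the page, not every later removal and re-addition. In principle the same call could be pointed at a title such as "Talk:Hurricane Katrina", but that is exactly the sort of thing a real tool should handle for me.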

There are some tools out there. Wikipedia has an article history search called Wikiblame, but it is quite limited. I have to know the date I want to search, and if I know that, there’s not much need to search, is there? Wikitrust is a brilliant tool, developed by PARC. However, it doesn’t seem to work on semi-protected Wikipedia pages; “Hurricane Katrina” is one such page. History Flow, from IBM, looks great, but isn’t available to the public.

And none of them search the “Talk” page, where all the debate happens.

In the case of the “Hurricane Katrina” article, there are 8 archived “Talk” pages. That’s a hell of a lot of material. And it can’t be searched.

There has got to be a better way! If you know of one, PLEASE LET ME KNOW.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
