Chronological Data’s Influence On Relevancy Analysis

Last night, I read “Content should be experienced by relevance and importance and interestingness, not chronologically” after Louis Gray shared it on FriendFeed. This is my take on chronology and other relevancy metrics. While I agree with Garry on the outcome, I don’t want to shrug off chronology as lacking importance; it is a great tool for weighting an object’s relevance.

Chronological data is a very nice weight to have when you’re looking at the whole set of objects. Chronology isn’t as important for ideas or thoughts as it is for news, but within the subset that is actual news, it matters much more. Cadmus succeeds because it focuses on a small chronological window (at least in my case, showing me the most relevant items from the last 24 hours) as well as on a focused source of input, Twitter. Twitter, and most social tools, have high entropy in relevance over a sustained period, so if you really want to provide relevant or important information, focusing on what has happened in the past 24 hours is a great idea. Unfortunately, the web isn’t just Twitter, Facebook, or other social tools; there are thousands of blogs and news sources that are also relevant.

I’m going to use the example of a feed reader, since that’s where I was focused when looking at these things, and thus where I have the most insight into the discussion at hand. Trying to determine relevance or importance for items, particularly news, doesn’t work so well when you’re working with a set of items you haven’t read that spans days, weeks, months, or even years. What you end up with is outdated news ranking as more relevant than current news, or items that don’t fit the user’s interests.

Some quick techniques for boosting relevance involve chronological data. Though it’s not as necessary in the later stages of relevancy analysis, it plays a huge role in cutting the set down to size for analysis. Here are a few methods of using chronological data to quickly sculpt more relevant information.

  • Create a window, static or sliding; this helps capture and condense echo. (48-72 hours is good.)
  • Over a period larger than your initial window, you can remove stale items by comparing the condensed sets and dropping older items on topics that have more current coverage.
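
The two chronological steps above can be sketched roughly as follows. This is only a minimal illustration, not how any particular feed reader does it: the item fields (`topic`, `published`), the function names, and the idea that each item already carries a single topic label are all assumptions I made up for the example.

```python
from datetime import datetime, timedelta


def within_window(items, now, hours=72):
    """Keep only items published within the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [item for item in items if item["published"] >= cutoff]


def drop_stale(items, now, window_hours=72):
    """Drop older items whose topic also appears inside the current window.

    Assumption: an older item is 'stale' when something more current
    already covers the same (hypothetical) topic label.
    """
    cutoff = now - timedelta(hours=window_hours)
    current_topics = {
        item["topic"] for item in items if item["published"] >= cutoff
    }
    return [
        item
        for item in items
        if item["published"] >= cutoff or item["topic"] not in current_topics
    ]


if __name__ == "__main__":
    now = datetime(2011, 1, 10, 12, 0)
    feed = [
        {"id": 1, "topic": "launch", "published": now - timedelta(hours=5)},
        {"id": 2, "topic": "launch", "published": now - timedelta(days=10)},
        {"id": 3, "topic": "essay", "published": now - timedelta(days=10)},
    ]
    print([item["id"] for item in within_window(feed, now)])
    print([item["id"] for item in drop_stale(feed, now)])
```

The first function is the plain window; the second keeps older items only when nothing more current covers the same topic, which is one way to read "removing stale items".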

Chronology is an extremely quick and dirty tool, but it can help tremendously in narrowing the items down quickly, so that the data that needs to be processed for each user is much smaller. However, it is far from the be-all and end-all of the process for determining relevant data. Here is a list of other factors for determining importance or relevancy:

  • An external source weighting similar to PageRank, allowing high-value content to be controlled by peers as well as to share its clout. (Source-Data relationships)
  • A personalized weighting based on your relationships, similar to EdgeRank, allowing your personal interactions to show trust and interest in items. (Human-Human relationships)
  • A personalized weighting based on your habits and usage of various items, similar to APML, allowing your content usage to be analyzed and weighted. (Human-Data relationships)
  • An aggregate weighting of both EdgeRank and APML (an idea like GAP(ML)) that weights topics based on human relationships and habit comparison, allowing your common interests and friendships to expose a more complex set of relevant data. (Human-Data-Human relationships)
  • A set of common related data carriers, plus the user’s relationship with their sources, somewhere in between EdgeRank and PageRank, allowing quick analysis and overview of sources to determine what is important currently, as well as what is important among the sites the user trusts. (Source-Source & Source-Human relationships)
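
One simple way to fold several of these weightings into a single ordering is a weighted average. The sketch below assumes each item already carries per-relationship scores between 0 and 1; the signal names (`source`, `social`, `habit`) and the weights are hypothetical placeholders, not the actual PageRank, EdgeRank, or APML computations.

```python
def relevance_score(signals, weights):
    """Weighted average of the signals; a missing signal counts as 0.0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weight * signals.get(name, 0.0) for name, weight in weights.items()
    )
    return weighted_sum / total_weight


def rank(items, weights):
    """Return items ordered from most to least relevant."""
    return sorted(
        items,
        key=lambda item: relevance_score(item["signals"], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical weights: habits count twice as much as the other signals.
    weights = {"source": 1.0, "social": 1.0, "habit": 2.0}
    items = [
        {"id": "x", "signals": {"source": 0.9, "social": 0.1, "habit": 0.2}},
        {"id": "y", "signals": {"source": 0.3, "social": 0.5, "habit": 0.9}},
    ]
    for item in rank(items, weights):
        print(item["id"], relevance_score(item["signals"], weights))
```

A linear blend like this is the crudest possible combination; its main virtue is that each relationship stays a separate, tunable input.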

Those are the main relationships, in my mind, though there are a few others. One is media relationships (text, audio, video): not all people prefer the same type of information format. Another is the tools the user is using: you may want to provide data to the user in a different manner depending on how, with what, and where they are observing the data. If the user is the key, then the relationships and objects around the user are most definitely the teeth, and you have to hit as many tumblers as you can without getting stuck.

These are just the ones I’ve focused on, and I’m sure there are others just as valuable that I have skimmed over, but this should provide a good base for starting out, and there are probably a million little tweaks and touches that I skipped. To quote Garry, because he was right:

“The field is still a bit wide open because few people have both the dataset to work and test on, AND the financial backing to see the project all the way through.”

We will definitely push the boundaries over the next few years, and we’ll have a better order for our information; I have no doubt of this. However, I’m betting many individuals will still rely on the very simplicity that we rely on today: chronological ordering. And even if it isn’t shown that way on the surface, deep down it will be at the very base of relevance. Some under-the-radar companies in this area are and BagtheWeb (bundles are an excellent source of the information required for relevancy*); Quora and Stack Overflow (Q&A is a huge resource for personal interests); and Cadmus, Hunch, and My6Sense (all have experience working with this, and I have no doubt all three will only get better).

* I may be biased here: I’ve been working on and off with bundling for close to 2 years, but ultimately got lost in perfecting how the data was stored.