Chronological Data’s Influence On Relevancy Analysis

Posted on December 18, 2010 by James Fuller

Last night, I read “Content should be experienced by relevance and importance and interestingness, not chronologically” after Louis Gray shared it on FriendFeed. This is my take on chronological and other relevancy metrics. While I agree with Garry on the outcome, I don’t want to shrug off chronology as lacking importance; it is a great tool for weighting an object’s relevance.

Chronological data is a very useful weight to have when you’re looking at the whole set of objects. Chronology isn’t as important for ideas or thoughts as it is for news, but when you look at the subset of the most current news, it works much better. Cadmus succeeds because it focuses on a small chronological window (at least it does in my case), showing me the most relevant items from the last 24 hours, and on a focused source of input: Twitter. Twitter, like most social tools, has high entropy in relevance over a sustained period, so if you really want to provide relevant or important information, focusing on what has happened in the past 24 hours is a great idea. Unfortunately, the web isn’t just Twitter, Facebook, or other such tools; there are thousands of blogs and news sources that are also relevant.

I’m going to use the example of a feed reader, since that’s where I was focused when looking at these things, and so I have more insight into the discussion in that area. Trying to determine relevance, particularly for news, doesn’t work so well when you’re looking at a set of items you haven’t read that spans a few days, weeks, months, or even years. What you can end up with is outdated news ranked as more relevant than current news, or items that don’t fit the user’s interests.

Some really quick techniques for boosting relevance involve chronological data. Though chronology is less necessary in the later stages of relevancy analysis, it plays a huge role in cutting the set down to size for analysis. Here are a few methods of using chronological data to quickly sculpt more relevant information:

Create a window, static or sliding; this helps capture and condense echo (48-72 hours is good).
Over a period larger than your initial window, remove stale items by comparing them against the condensed sets and keeping those on topics that are more current.
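A minimal sketch of the two steps above, assuming each feed item already carries a single topic label (the `Item` class, the topic labels, and the 7-day stale period are all hypothetical; only the 48-72 hour window comes from the post). "Echo" is approximated here as the number of items per topic inside the window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    # Hypothetical feed item; assumes a topic label has already been assigned.
    title: str
    topic: str
    published: datetime


def condense_window(items, now, window=timedelta(hours=48)):
    """Keep items inside the sliding window and count how often each topic echoes."""
    recent = [i for i in items if now - window <= i.published <= now]
    return recent, Counter(i.topic for i in recent)


def drop_stale(items, now, current_topics, period=timedelta(days=7)):
    """Over a longer period, keep only items whose topic also appears in the window."""
    return [
        i for i in items
        if now - period <= i.published <= now and i.topic in current_topics
    ]


now = datetime(2010, 12, 18, 12, 0)
items = [
    Item("Launch rumor", "gadget", now - timedelta(hours=5)),
    Item("Launch confirmed", "gadget", now - timedelta(hours=30)),
    Item("Election recap", "election", now - timedelta(days=4)),
    Item("Gadget teardown", "gadget", now - timedelta(days=5)),
]
recent, echo = condense_window(items, now)   # echo == Counter({"gadget": 2})
kept = drop_stale(items, now, echo)          # the three "gadget" items survive
```

Sliding the window is just a matter of calling `condense_window` again with a new `now`; a static window would fix `now` to a chosen cut-off instead.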

Chronology is an extremely quick and dirty tool, but it can help tremendously in narrowing the items down, so that the data that needs to be processed for each user is much smaller. However, it is far from the be-all and end-all of determining relevant data. Other factors for determining importance or relevancy include:

An external source weighting similar to PageRank: allowing high-value content to be controlled by peers as well as sharing its clout. (Source-Data relationships)
A personalized weighting based on your relationships, similar to EdgeRank: allowing your personal interactions to show trust and interest in items. (Human-Human relationships)
A personalized weighting based on your habits and usage of various items, similar to APML: allowing your content usage to be analyzed and weighted. (Human-Data relationships)
An aggregate weighting of both EdgeRank and APML to determine the weighting of topics based on human relationships and habit comparison, an idea like GAP(ML): allowing your common interests and friendships to expose a more complex set of relevant data. (Human-Data-Human relationships)
A set of commonly related data carriers, plus the user’s relationship with their sources, somewhere between EdgeRank and PageRank: allowing quick analysis and overview of sources to determine what is currently important, as well as what is important among the sites the user trusts. (Source-Source & Source-Human relationships)
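For the first item in that list, here is a minimal power-iteration PageRank over a graph of sources, as a sketch of what a "Source-Data" weighting could start from. The link graph and source names are hypothetical, the 0.85 damping factor is the conventional default rather than anything from this post, and the EdgeRank, APML, and GAP(ML) weightings are not modeled.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each source to the list of sources it links to (hypothetical data).
    Sources with no outgoing links spread their rank evenly over all sources.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every source gets the same "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling source: distribute its rank over every node.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


scores = pagerank({"blog_a": ["news_c"], "blog_b": ["news_c"]})
```

Here `news_c`, which both blogs link to, ends up with the highest score; in a feed reader that score would be one input weight alongside recency, not a ranking on its own.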

Those are the main relationships, in my mind, though there are a few others. One is media relationships (text, audio, video), since not all people prefer the same information format. Another is the tools the user is using: you may want to present data differently depending on how, with what, and where the user observes it. If the user is the key, then the relationships and objects around the user are most definitely the teeth, and you have to hit as many tumblers as you can without getting stuck.

These are just the ones I’ve focused on, and I’m sure there are others just as valuable that I have skimmed over, but this should provide a good base for starting out, and there are probably a million little tweaks and touches that I skipped. To quote Garry, because he was right:

The field is still a bit wide open because few people have both the dataset to work and test on, AND the financial backing to see the project all the way through.

We will definitely push the boundaries over the next few years, and we’ll have a better order for our information; I have no doubt about this. However, I’m betting many individuals will still rely on the very simplicity that we rely on today: chronological ordering. And even if it isn’t shown that way on the surface, deep down it will be at the very base of relevance. Some under-the-radar companies in this area are Bit.ly and BagtheWeb (bundles are an excellent source of the information required for relevancy*); Quora and Stack Overflow (Q&A is a huge resource for personal interests); and Cadmus, Hunch, and My6Sense, which all have experience working with this, and I have no doubt all three will only get better.

*- I may be biased here; I’ve been working on and off with bundling for close to 2 years, but ultimately got lost in perfecting how the data was stored.

Budgeting on Variable Income

Posted on December 13, 2010 by James Fuller

The three main things that need to be taken into account are:

Definite Expenses
Estimated Income
Savings required to cover deficiencies between Income & Expense

Defining Expenses

Write down all non-variable monthly expenses (e.g. Rent, Utilities, Insurance, etc.)
Estimate all recurring variable expenses for the month (e.g. Food, Fuel, etc.)

Add these together and multiply the total by 1.10-1.20 to give yourself a buffer in case of any upward fluctuations in variable expenditures. Any excess at the end of the month should be saved or split between discretionary spending and savings, with at most 10-20% going to discretionary spending and the remaining 80-90% saved.
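A minimal sketch of the expense rule, assuming "excess" means whatever is left of the buffered budget after actual spending (the function names are hypothetical, and the post does not say exactly how the excess is measured).

```python
def monthly_budget(fixed, variable, buffer=1.15):
    """Fixed plus estimated variable expenses, scaled by a 1.10-1.20 buffer."""
    if not 1.10 <= buffer <= 1.20:
        raise ValueError("buffer should be between 1.10 and 1.20")
    return (sum(fixed) + sum(variable)) * buffer


def split_excess(budget, actual_spent, discretionary_share=0.20):
    """Split what is left of the budget: at most 20% discretionary, the rest saved."""
    if not 0.0 <= discretionary_share <= 0.20:
        raise ValueError("discretionary share should be at most 20%")
    excess = max(budget - actual_spent, 0.0)  # overspending leaves nothing to split
    discretionary = excess * discretionary_share
    return discretionary, excess - discretionary


# Illustrative numbers only: rent, utilities, insurance / food, fuel.
budget = monthly_budget([250, 60, 40], [80, 50], buffer=1.10)  # about 528
fun, saved = split_excess(budget, actual_spent=428)            # about 20 and 80
```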

Estimating Income

Average your past 6-12 months of income. Avg. Inc. = (Total income/months)
If possible, also average your highest and lowest monthly levels of income. H&L = ([Highest + Lowest]/2)
Average both of these numbers to come up with a good estimate of your monthly income. Est. Inc. = ([Avg. Inc. + H&L]/2)

* You can also add in quarterly averages if you’ve seen a recent change in your income, up or down, such as a change in employment status.
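The three income formulas above combine into one small function. This is a sketch with a hypothetical name and made-up sample incomes; the optional quarterly adjustment is not modeled, since the post leaves its weighting open.

```python
def estimate_income(monthly_incomes):
    """Est. Inc. = (Avg. Inc. + H&L) / 2 over the last 6-12 months of income."""
    if not monthly_incomes:
        raise ValueError("need at least one month of income")
    avg_inc = sum(monthly_incomes) / len(monthly_incomes)          # Avg. Inc.
    h_and_l = (max(monthly_incomes) + min(monthly_incomes)) / 2    # H&L
    return (avg_inc + h_and_l) / 2


# Six illustrative months: Avg. Inc. is about 816.67, H&L is 950.
estimate = estimate_income([500, 900, 700, 1400, 600, 800])  # about 883.33
```

Averaging in the high/low midpoint pulls the estimate toward the middle of the observed range, which matters when one or two unusual months skew the plain average.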

Savings

Take 10-15% of your estimated income, along with any excess after your budgeted expenses, and try to save it for handling monthly deficiencies. You can use a portion of this as discretionary spending or a rainy day fund to help maintain your personal happiness. Also, set a baseline buffer for your bank account that you can check against, say $200, and raise it as you progress.

*You don’t have to place it in a savings account (interest rates are horrible currently anyway), just so long as you try not to overuse these funds.
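A sketch of the savings target and the balance check, assuming the month's excess is computed separately and passed in (the names and sample numbers are hypothetical; only the 10-15% rate and the $200 example baseline come from the post).

```python
def planned_savings(estimated_income, excess=0.0, rate=0.10):
    """Target 10-15% of estimated income plus any excess after budgeted expenses."""
    if not 0.10 <= rate <= 0.15:
        raise ValueError("rate should be between 10% and 15%")
    return estimated_income * rate + max(excess, 0.0)


def below_buffer(balance, baseline=200.0):
    """True when the account balance has dropped under the buffer baseline."""
    return balance < baseline


target = planned_savings(800, excess=50)  # about 130
warn = below_buffer(150)                  # True: time to ease off spending
```

Raising `baseline` over time mirrors the advice to lift the buffer as you progress.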

These techniques aren’t perfect, but they do provide a very good starting line for determining how much you can reasonably spend per month. I used these basic ideas as a set of tools, slightly modified for my personal usage, and managed to pay off $1800 in student loans and save ~$1200 over 13 months on a variable income that ranged between $500 and $1400 a month. My average monthly expenses were $400 and my average monthly income was $750. I also managed to maintain fairly consistent spending on entertainment, about $50 a month.

Ordered Networking: 4munity/hIphS

Posted on December 3, 2010 by James Fuller

Since I feel it’s good to recognize your failures and look at what went wrong, here is the first of several posts on some of my failures. I’m posting these for two reasons: to store my failures and lessons, but also my ideas, however loosely bound they may be.

Date: December 2007 – September 15th, 2008

Core Ideas:

Limited number of relationships based on Dunbar’s Number (150); segregation of various groups (e.g. Work, Family, etc.); focus on forums for communication; making an environment extremely unfriendly to spammers.

From the notes:

Privatized Comments: scalable conversations from 1:1 to 1:100, with publication allowed by the Owner [commenter]
Features: OpenID Profiles, Tweet Threads, Forums, API Integration, Collab Napkin Interface
Ordered Network: 150 Friends Max (later relaxed to 200); 2nd Order: 149×150 (22,350); 3rd Order: 149×149×150 (~3.3 million)
Access: Friends can view profile, comment, and message; 2nd Order can view profile without comments and message; 3rd Order can message
AJAX Threading; Personal Styles (Pre-Designed Offerings)
Mobile entry coding: (P[post]/R[read]) GGTT [Group/Thread Depth] for new thread posting location; AL appends last message (within 15 minutes)
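The ordered-network numbers and access rules from the notes can be sketched in a few lines. This is not the original code (that was Java with text files, since discarded). It is a hypothetical Python model where `friends` maps each person to a set of friends. The reach figures are upper bounds: each of your 150 friends has at most 149 friends besides you, and so on, assuming no overlap.

```python
from collections import deque

FRIEND_LIMIT = 150  # the notes say this was later relaxed to 200

# Access rights per relationship order, as listed in the notes.
ACCESS = {
    1: {"view_profile", "comment", "message"},
    2: {"view_profile", "message"},  # profile shown without comments
    3: {"message"},
}


def max_reach(limit=FRIEND_LIMIT):
    """Upper bound per order: 150, 149×150, 149×149×150."""
    return {order: (limit - 1) ** (order - 1) * limit for order in (1, 2, 3)}


def add_friend(friends, a, b, limit=FRIEND_LIMIT):
    """Add a mutual friendship unless either side is already at the cap."""
    if b in friends.get(a, set()):
        return True  # already friends
    if len(friends.get(a, ())) >= limit or len(friends.get(b, ())) >= limit:
        return False
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)
    return True


def relationship_order(friends, viewer, target, max_order=3):
    """Breadth-first search for the number of hops between viewer and target."""
    seen = {viewer}
    queue = deque([(viewer, 0)])
    while queue:
        person, depth = queue.popleft()
        if person == target:
            return depth
        if depth == max_order:
            continue  # do not look past the 3rd order
        for friend in friends.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return None


def permissions(friends, viewer, target):
    """Rights the viewer has on the target; yourself (order 0) and 4th+ order get none here."""
    return ACCESS.get(relationship_order(friends, viewer, target), set())
```

The hard cap is what keeps the 3rd-order reach bounded, which is the property the design leaned on to keep spammers from fanning out.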

Background:

I got really sick of Facebook (I had already deleted 2 accounts and was barely using a 3rd), Myspace, and forums because of the security and overall interaction present on those platforms. The information I saw was spammy or irrelevant to me. I was more interested in finding a way to maximize the value of relationships and communications.

It started off as just an interesting piece of forum software, and then evolved into using relationships to promote and control the relevance of the data and collaboration. Unfortunately, I was utterly clueless about what I was doing, and went about using Java to build the interface and text files for storing relationship data; I had no clue about true databases at this point. The whole database was a set of folders and text files; what could go wrong? Yep, not much else got done, except for a barely working version of the napkin, and it definitely wasn’t as functional as I would have liked.

Ultimately, I fell in love with Twitter and decided to walk away from the project, but not before publishing a blog post detailing the basics of the service, which I’ve included at the bottom. Overall, I was so disturbed that I tossed most of the code out immediately, and have since thrown most of the other pieces from here and there out as well, even the original logo, which I would have liked to add to this post.

Lessons:

Lay out a solid plan, and pick your tools wisely.

Don’t try to do everything; doing too much means it takes forever to get things done.

If you need help, ask for advice from people you trust.

If you lose interest before you’ve even completed the project, stop throwing more time into it.

Sep 05, 2008

What is hIphS?

A lot of people who have found this place are probably wondering what hIphS is, so I’ll give you some back story on what it is and why it needs to be. hIphS is something I am currently developing to remove the problems I have found in the social networking area. It is here to confront spamming, promote relationships, and help people collaborate on projects. It is a support group, a conference for your team, and a place to connect with your family; above all, it is here for you to use. I see it as a social experiment at this stage because of some of the boundaries I have set to enforce its goals: layered networks, personal forums, and above all a 200-person limit for friends, family, and partners.

I feel that if you can enforce the limit constraint, it will promote relationships and reduce spam (a lighter version of this has just begun to be implemented on Digg and Twitter). I am also working on a napkin interface for collaborative work that would allow you to upload files to share during a conversation, along with an IM client and a whiteboard, all to open the web to truly interactive collaborative projects. At some point in the future, I would also like to allow saving these collaborative events so you can provide them to clients or share them in-house. The one other aspect I’m working on is Twitter-like threading within a threaded forum: you only receive the threads of your friends, and you can keep a thread going with someone who is mutually related to your friend, so long as your friend is the one who began the thread.
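The threading rule in that last sentence reduces to one check: you can see and reply to a thread only if you started it or are friends with the person who did. Two people who are not friends with each other can still talk inside the thread, because both are friends with the starter. A minimal sketch, with hypothetical names and `friends` as a mapping from each person to a set of friends:

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    starter: str
    replies: list = field(default_factory=list)  # (author, text) pairs


def can_join(friends, user, thread):
    """A user may see and reply only if they began the thread or are the starter's friend."""
    return user == thread.starter or thread.starter in friends.get(user, set())


def reply(friends, user, thread, text):
    if not can_join(friends, user, thread):
        raise PermissionError(f"{user} is not a friend of {thread.starter}")
    thread.replies.append((user, text))


def feed(friends, user, threads):
    """Only threads begun by the user or their friends show up in the user's feed."""
    return [t for t in threads if can_join(friends, user, t)]


friends = {"ann": {"bob", "cat"}, "bob": {"ann"}, "cat": {"ann", "dan"}, "dan": {"cat"}}
t = Thread("ann")
reply(friends, "bob", t, "hi")   # bob and cat are not friends with each other,
reply(friends, "cat", t, "hey")  # but both are friends of ann, who began the thread
```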

Why the name hIphs and why the odd capitalization?

This is actually one of the last names I came up with for the site, as my favorites were already taken. The name works on several levels. First is the likeness of a closed social network to a beehive: you and your friends are more productive when you’re dealing in a trust-based system. Second is what you get when you split the word apart, “Hi Phs”, which I came to read as “Hello Friends”. Third, I feel that the site will be providing multiple ‘I’nternet ‘S’ervices in the future, thus the emphasis on those two letters. That’s my description of what hIphS is and why I chose the name. Thank you for reading this if you somehow found it, at this

The Innovationist

Irregular Ideas on Business, Philosophy, and Tech
