An Antithetical Post On How Narrowing Is The Key to Curated Data

So this whole thing about curation has my head in a state where I am seeing data, meta-data, and users as distinct entities in three-dimensional space. I’d love to provide an image of how they are related, but I can’t, because when it comes to placing them in a 2-D or even 3-D rendering, there is warping and tunneling between these objects, outside of the third dimension, to maintain the proper relations.

Still here? Good. This post may be a bit vague; I’m going to try to keep it simple and understandable, for you as well as for myself, since I’m already a bit confused after several hours of trying to map this. If you would like to discuss this in a more in-depth, though possibly less coherent, form, feel free.

To begin, we have three entities: data, meta-data, and users. These entities have varying ranges of relationship, which run from near to distant, and occasionally don’t exist at all. To describe the range using friendship as an example: best friends with very similar taste are near (1), friends with much different taste (2), acquaintances with similar taste (3), acquaintances with different taste (4), and people you’ve never met (0). We’ll approach range using this method, based on relational distance between entities.

Data is, in my view, the front-facing objects, whether text, images, video, or even tactile objects. Data by itself has a weak presence in terms of the value it represents; coupled with meta-data, it becomes stronger.

Meta-data is data about data. It is the entity that is manipulated and understood to provide us with relationship information on any level. There are many forms of meta-data (temporal, location, authorship, topics, etc.) that provide us with fantastic ways of connecting data, but oftentimes it includes disparate entities that aren’t necessary.

The user, in my case, is a human who interprets the regular data and may create meta-data tags, but it can also be a machine, in which case it is likely to work with meta-data directly or to compose meta-data from data sources.

Now that the entities are somewhat defined, I can get into the discussion of how they are connected in creating relevant connections, in both basic terms and user-specific terms.

Oftentimes the simplest way to construct a relevancy map between data objects is to use meta-data about the objects; social-bookmarking tools work this way, by way of topical tagging, and the distance between objects is in the range of 4. Making the system a bit more complex, you take your tagged set and add in user selection, using how much a user likes various items to influence which topics they are likely to see; this is in the range of 3, because it is still picking out items by topic, which is very wide. Or you can surface what your user’s friends have read recently; this is still in the range of 3, because while adding in what other people read can narrow the area of focus, it’s possible to land in areas the user doesn’t care much for. If you add in what the user’s friends like, rather than just what they read, you get closer to the range of 2.

In order to get to the optimal range of 1, you have to add two more things to your system: direct relations between data objects and concentrated interaction between users. Both can be defined explicitly by users and can be shown as a simple social graph, with one object/user in the center and the closest elements nearby. Direct relations, which are somewhat like Techmeme, can be created on a broad scale by a user-based system of bundling links to content based on relationship. Concentrated interaction is a bit more complex, because it requires an analysis of interaction, but it presents an interesting system that helps reach the range of 1.

Note: If you treat users like data objects, which they are in a database, you can apply meta-data to make the concentrated interaction more specific according to the topics the user is most familiar with.

So I’ve discussed five ways, at varying levels of implementation, to reduce the range of relevancy (a toy sketch of how they might combine follows the list):

1. Use tagging to create a quick reduction in the range of relevant data.
2. Use user selection to narrow down which topics the user likes, or aggregate content that the user’s friends are looking at.
3. Narrow it down further by what those friends like.
4. Allow bundling of content that is directly related.
5. Analyze the concentrated-interaction graph to narrow down trusted sources.
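
Purely as a toy illustration (this is my own mapping, not an implementation of any existing system), here is how those five signals might fold into the 0-4 range defined earlier; every function name and the exact thresholds are assumptions made for the sketch:

```python
# Toy sketch: fold the five narrowing signals into the 0-4 relevancy range,
# where 0 means no relation, 4 is a topical match only, and 1 is optimal.
# All names and the exact mapping are assumptions made for illustration.

def relevancy_range(shared_tags, user_selected_topics, friends_read,
                    friends_liked, bundled_together, concentrated_interaction):
    """Each argument is a boolean signal about a candidate item."""
    if not shared_tags:
        return 0  # no meta-data overlap: "people you've never met"
    rng = 4       # tagging alone: a wide, topic-only match
    if user_selected_topics or friends_read:
        rng = 3   # user selection or friends' reading narrows the focus
    if friends_liked:
        rng = 2   # friends' explicit likes narrow it further
    if bundled_together and concentrated_interaction:
        rng = 1   # direct relations plus concentrated interaction: optimal
    return rng

# An item that matches on every signal lands at the optimal range of 1.
print(relevancy_range(True, True, True, True, True, True))  # -> 1
```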

I’m sure I’ve lost someone in this antithetical pile; I had to get this out of my head because it was driving me crazy, and I’m going to call it the beginning of a new arcling, to be adjusted down the line. So if you are interested, I’m sure we can make it a bit clearer by having a discussion.

The Future Of Privacy Is Full Publicy

Zuckerberg was right that “privacy was no longer a ‘social norm’”; being public is the new social norm, though most people will still tend to reject that reality, even myself. I’ve finally gotten over about 90% of privacy issues; I might still get upset by/at them, but even if something is exposed, I’m preparing for it now. Anyone under the age of 21 in the US who has ever used the internet has already lost their identity, so why should they worry about what any company is exposing about them? It’s time to get over these feelings and accept the change that is coming; a ton of privacy isn’t worth an ounce of knowledgeable protection.

Just the other day, Facebook proposed an update to their privacy policy to allow third parties access to your data at some point in the future, and with this comes yet another wave of criticism. People are jumping all over Facebook because they feel users will be paranoid that their data is vulnerable, and that their data shouldn’t be given out willy-nilly to just any third-party site that Facebook comes to an agreement with. You would think people would be used to this type of move from Facebook by now; this is their fourth or fifth slip-up, yet still people complain for a few months and then calm down, until it happens again.

Our most personal data in the US, the social security number, is insecure, especially if you were born after 1988. The numbers can be predicted from two data points, date and location of birth, plus a little brute forcing. So for the younger generation, nothing is private, not even our government-issued personal identification. If we aren’t protected in that regard, should we really be worried about those images from last weekend, who our friends are, or what our opinions are? I think Eric Schmidt said it best, in an interview where he discussed privacy: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”

I know I jumped on Facebook, but they aren’t the only site holding huge inventories of data on their users in hopes of adding relevancy; so do Google, Yahoo, Microsoft, et al. Facebook is simply the easiest to jump on because of its repeated transgressions in the area. Google has faced this as well, though, when it didn’t exercise enough discretion in opening up its Gmail users’ privacy through Buzz. As the web keeps advancing, privacy options are going to be off by default, and it will be up to users to change the settings to keep themselves private; this has been called ‘publicy’.

Are you prepared for the next generation, the age of publicy? Are you ready to get dirty mucking around with settings to protect what little privacy you will have in the future? Will you let everything go and change how you interact on the web? These are questions we will all face, but I think I’m prepared to be completely open when it comes to social matters; they aren’t anything compared to my financial information or my social security number, which can apparently be brute forced by a botnet of 10,000 machines in ~1.27 seconds.

Update: Tyler Romeo’s latest post, Why I Dislike Facebook & Foursquare, makes a great point in contrast to the opinions I expressed here. I agree with quite a bit of what he has to say about respecting your users and offering secure protocols to help protect them. Take your time and go check that post out.

7 Tips To Remember During Human Interaction

We’ve all had human interactions where we feel we aren’t getting our point across. It is one of the most annoying feelings to sense you’re not being heard, or to be skipped over for no particular reason. Here are a few tips that I use, on a daily basis, to have deep, meaningful human interactions.

1. Listen First, Speak Later
If you aren’t listening to them, you have no clue where the conversation is going. If you don’t know where the conversation is headed, you don’t have a clue what you should say. Hold your words back and carefully sculpt them to what is being said; that way you give credence to what the person is saying, even if you don’t agree with them.

2. Be Happy, Be Calm
You should never get upset in a conversation, because you will become short-sighted. If you become short-sighted, you risk killing the conversation, or even worse, destroying the relationship you have with the person. One thing I do when I get aggravated is pause the conversation. On the internet, I take a stroll through the house before going back. In real-time interactions, such as over the phone or in person, I ask to be excused to do something important or to use the restroom.

3. Be Responsible
With great power comes great responsibility. In a world that treasures the passing of knowledge, you wield the greatest power of all: your words. Try your best to make sure that what you say is accurate and not offensive. If you do misspeak, make sure you remedy it, which leads to the next point.

4. Apologize Quickly
An apology might not right every wrong, but it shows that you understand you made a mistake. It is not an excuse to get people off your back; if you use it that way, you’re not being sincere. To truly apologize, you first have to state that you are sorry, then show proof that you understand why you were wrong.

5. Be Accepting
Always be willing to accept someone’s ideas, even those you might not agree with. Being open to new ideas only leads to a more open and intellectually satisfying discussion. Acceptance is the first step in understanding something new.

6. Be Understanding
Once you have accepted external views, your next task is to step into the person’s shoes, as best you can, and attempt to understand what they are saying. Understanding what someone says makes you much more inviting to converse with, even if, after understanding, you point out where they have erred, a courtesy that hopefully gets reciprocated.

7. Offer Help
If someone is having a problem that you can possibly help with, offer your assistance. In offering assistance, you have very little to lose and much to gain, possibly a new best friend. I’ve been through this cycle many times and have made some very good friends by helping them when they needed it.

Here are a few bonus tips for interacting in the physical world.

Smile
The best way to lighten the mood is to smile; it lets everyone know that you enjoy their company. A smile is also a very attractive thing that can make you, and your ideas, more appealing. As with an apology, though, if you aren’t sincere it’s not hard for people to figure it out, even if it might take a bit longer with a smile.

Eye Contact
Eye contact is a great way to show that you are engaged with what the person has to say, and that you aren’t just shrugging them off. A few tips on eye contact: don’t stare, and occasionally break contact, for 1-2 seconds, to observe your surroundings.

Splitting the Web Markets

I’ve been looking into the web, trying to figure out what it’s going to look like in a few years. I’m still looking at various scopes, but I decided to analyze some of the more generalized markets we have right now. You’re not going to find anything new here, just five areas of the web where we will see changes, and the coming monetization of the web.

Infrastructure = Hosting & ISPs

Data Resources = Data

Data Access & Storage Protocols = APIs

Services = Applications that modify the data through the use of APIs to provide value

Directories = Provide the ability to find what you’re looking for quite rapidly; can be pseudo-static or dynamic

Each of these markets can, and most likely will, be monetized within the coming years, with the money most likely coming from the users themselves. Hosting & ISPs have already done it. Directories that aren’t fully dynamic can do it with advertising, and even some of the dynamic real-time directories will be able to use the advertising model. The Data and DASPs (Data Access & Storage Protocols) will be subsidized, for the most part, by the initial service’s charges; or possibly the service will be subsidized by external developers paying for access to the data, or for the data itself.

The benefits we will see are that our data becomes more stable (at least in the sense that the company isn’t going to go belly up), services should be better, and, hopefully, there will be more jobs. We all walked around expecting everything to be free, when we should have been asking how we could help make more services. Maybe the free world was just the accelerant for innovation: it got the initial business models developed, promoted an open generation, and allowed everyone a shot at getting their ideas out there; it’s easier to pick up users for a simple service when you’re not charging them, after all. The problem we had with free is that we all became so jaded by it.

Focus on one of these markets and how you can change it. Each one branches easily into another; you can traverse up or down that list from where you started. Look at Google: they exist in each of these markets. They started with a DASP that collected vast amounts of data, then initially used this data to create a directory service, along with quite a few other services, one of which is App Engine, which exists to share their infrastructure.

As the web evolves we’ll see these markets split and converge on each other time and time again; we may even see a new general market pop up. As an example of splitting, look at services: there are so many sub-markets within that space that it would be hard to categorize them all. For an example of convergence, you just have to look at the various projects being developed to better connect the web; one of the most recent to pop onto my radar is Salmon, which is working to pull comments back to the original source and re-disperse them with the source feeds. Time to watch the ebb and flow, and maybe enter one or more of these markets.

Thoughts are Evolutionary: The Idea for Arclings

Do you really want to keep pushing ideas out, but have problems fleshing the concepts out fully? Or maybe you just want to express the basis of an idea really quickly, get feedback, and iterate. The problem with current systems is that it’s hard to keep track of an idea’s evolution if you post a lot of other stuff around it.

Micro-blogging lets you throw the idea out there, but doesn’t allow much room for the idea to evolve, or for tracking that evolution.

Blogging in the conventional sense is much too concrete (though I’m doing it right now). I find the preconception of blogging to be that you must push out a fully formed thought. Why?

I propose a release-quick, release-often blogging structure: build arcs as your story develops, making branching trees using link structures. Let the ideas build over weeks or months, rather than waiting for one single burst of insight and fleshing it out on the spot.

I propose using story arcs, with links to the latest preceding events in the evolution and trackbacks to the succeeding story events. Though this is possible in current blogging systems, it’s complicated. I want an Arcling platform that makes the connection process easy, if not intelligent, in managing the tracing of the structure.
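
No such platform exists yet, so purely as a sketch of the link structure I have in mind (all class and field names here are hypothetical), an arcling post could carry its backward links and collect trackbacks like this:

```python
# A minimal sketch of the arcling link structure: each post points back to
# the posts it evolved from, and collects trackbacks from the posts that
# evolve out of it. Names are illustrative; no Arcling platform exists yet.

class ArclingPost:
    def __init__(self, url, title, preceding=None):
        self.url = url
        self.title = title
        self.preceding = preceding or []   # links back to earlier posts in the arc
        self.trackbacks = []               # filled in as later posts link here

    def evolve(self, url, title):
        """Create the next post in the arc and register the trackback."""
        child = ArclingPost(url, title, preceding=[self])
        self.trackbacks.append(child)
        return child

    def arc(self):
        """Walk back to the root to trace this branch of the idea."""
        node, trail = self, []
        while node:
            trail.append(node.title)
            node = node.preceding[0] if node.preceding else None
        return list(reversed(trail))

# Usage: an idea that branches after the second post.
seed = ArclingPost("/posts/1", "Rough idea")
v2 = seed.evolve("/posts/2", "Refined after feedback")
branch = v2.evolve("/posts/3", "A side branch")
print(v2.arc())  # ['Rough idea', 'Refined after feedback']
```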

A Few Innovative Ideas for Short URLs

Over the past six months or so I’ve seen numerous posts raise flags about URL shorteners. These focus on several issues: security, non-relative link titles, no pass-through for SEO purposes, and the possibility of data loss. Each of these problems has at least a partial solution, but those solutions are still not effective enough. Here are some of the modifications I plan to work on to enhance the service.

Pretty URLs – Security & Relative Titles

Making the short URL as human-readable as possible is a plus; however, with the shorteners on the market, readable URLs are quite hard to get, because every URL is an ID that can be linked to only once by the service. My solution is to embed user data within the link. This abstraction reduces the actual URL locator to 1-3 characters (base 62, a range of 62 to 238,328) and stores the user data in 4-6 characters (base 62, 14,776,336 to 56,800,235,584) at the end. This means the minimum length required for a link is 5 characters and the maximum is 9.

The benefit of the user encoding is that it provides the ability to parse a user’s links, along with any meta-data associated with each link, such as secondary access to the URL via a user-specific vanity title, e.g. http://examp.le/URLxUser = http://examp.le/SteveJ/apple and http://examp.le/XbUser = http://examp.le/LarryP/apple. The user encoding also allows the linking system to be used for a quick account review if there is any suspicion of malware or spyware being sent by a specific source. One requirement of the user encoding is that you define how many characters the user data takes up and where it is located. I feel that 5 characters (~1 billion unique IDs) is optimal at this point in time, and that placing the user data at the very end of the string is slightly simpler to parse, but that’s just preference. However, at no point can you change either of these choices without destroying the entire system of links that has spread over the internet, so you must choose wisely before you begin.
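
To make the encoding concrete, here is a minimal sketch of the scheme as described above, with a fixed 5-character user field at the end of the code; the function names and example IDs are my own placeholders, not a production implementation:

```python
# A minimal sketch of the proposed encoding: the last USER_WIDTH base-62
# characters carry the user ID; the leading 1-3 characters carry the URL ID.
# Names and example IDs are placeholders for illustration only.

import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base 62
USER_WIDTH = 5  # fixed up front; changing it later breaks every issued link

def b62_encode(n, width=0):
    out = ""
    while n:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
    return (out or "0").rjust(width, "0")

def b62_decode(s):
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

def make_code(url_id, user_id):
    """URL ID (1-3 chars, up to 238,328) + fixed-width 5-char user ID."""
    return b62_encode(url_id) + b62_encode(user_id, width=USER_WIDTH)

def parse_code(code):
    """Split from the end, since only the user field has a fixed width."""
    return b62_decode(code[:-USER_WIDTH]), b62_decode(code[-USER_WIDTH:])

code = make_code(url_id=4821, user_id=1234567)
print(code, parse_code(code))  # round-trips back to (4821, 1234567)
```

Splitting from the end is what makes the variable-width URL field workable: the parser never has to guess where the user data starts.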

Multiple-Links – More Data, Less Space

Allowing users to batch related content reduces the total length per link to 22/n to 24/n characters, where n equals the number of links. Adding link specification to the API extends the length, but also makes large batches more usable for sharing data (e.g. http://examp.le/XbUser?link=1,3). The next topic of discussion is how to handle statistics, because regular statistics become blurred when multiple links can be accessed at the same time. The most accurate collection is of inbound links to the page only; outbound is much more complicated, as there are multiple permutations of exit paths. The best you can do is count clicks per link and measure use of the Open All button by counting all active links. One benefit of the multiple-link structure is that it encourages users to become link curators; this provides plenty of data for machine learning, as well as associations that aren’t easily discernible to machines, such as what the user likes. It also makes the system an active aggregation center for real-time data. Here is an example of a multi-link (Safari 4 has issues and will open windows instead of tabs): http://lnkr.hiphs.com/socialme
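
As a sketch of how the ?link= selection might be resolved on the server side (the bundle store, counters, and example targets here are hypothetical stand-ins for a real backend):

```python
# A small sketch of resolving a multi-link selection like ?link=1,3 against
# a stored bundle, counting an inbound click per resolved link. The store
# and counters are hypothetical; a real service would use a database.

from urllib.parse import urlparse, parse_qs

BUNDLES = {  # hypothetical store: short code -> ordered list of targets
    "XbUser": ["https://example.com/a",
               "https://example.com/b",
               "https://example.com/c"],
}
CLICKS = {}  # (code, position) -> inbound click count

def resolve(short_url):
    parts = urlparse(short_url)
    code = parts.path.lstrip("/")
    links = BUNDLES[code]
    # ?link=1,3 selects 1-based positions; no parameter means "Open All".
    raw = parse_qs(parts.query).get("link")
    indexes = [int(i) for i in raw[0].split(",")] if raw else range(1, len(links) + 1)
    selected = []
    for i in indexes:
        CLICKS[(code, i)] = CLICKS.get((code, i), 0) + 1
        selected.append(links[i - 1])
    return selected

print(resolve("http://examp.le/XbUser?link=1,3"))
# ['https://example.com/a', 'https://example.com/c']
```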

Data Storage – Open Access, Uptime, and Redundant Stores

After stories such as Ma.gnol.ia’s data loss, the Cli.gs hacking, and various services shutting their doors, link rot becomes a very big concern. So I’ve looked into various solutions, and one that sticks out is based on work by Directeur on federated real-time systems (Socnodes) and the Ouroboros and Lernaean Hydra problems he had to solve. His solution to the Ouroboros problem was to use Atom feed UIDs together with the service title, allowing the systems to check incoming items against themselves. The usefulness of the Socnode layout is that you can store and update remote databases with your data, creating not only a redundant remote store but also n-ary accessible domains, assuming you use separate data-storage sites and DNS providers and build otherwise independent systems that operate on the same data in parallel.
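
My reading of that dedup idea, sketched in a few lines (this is my own illustration, not Directeur’s code): a node remembers each (service title, entry UID) pair it has stored, so an entry echoed back by a mirror is recognized and dropped instead of looping around forever:

```python
# Sketch of the Ouroboros fix as I understand it: when mirrors re-exchange
# Atom entries, each node keys stored items by (service title, entry UID)
# and refuses to re-store or re-broadcast anything it has already seen.
# All names here are illustrative assumptions.

seen = set()  # (service_title, entry_uid) pairs already stored locally

def accept_entry(service_title, entry_uid, payload, store):
    key = (service_title, entry_uid)
    if key in seen:
        return False          # already have it; don't re-store or re-broadcast
    seen.add(key)
    store.append(payload)
    return True

store = []
accept_entry("lnkr", "urn:uuid:123", {"url": "https://example.com"}, store)
# The same entry echoed back from a mirror is dropped:
print(accept_entry("lnkr", "urn:uuid:123", {"url": "https://example.com"}, store))  # False
```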

There will be a point when URLs aren’t nearly as important, and I see this as a step toward reaching it. These steps toward ease of access, toward safely securing the data (through redundancy, encoding and embedded data, and review systems), and toward the ability to collect related and relevant data are steps in the right direction.

The Twitter Tradeoff

Do you follow many people or few? This is the most essential question and most disputed aspect of Twitter, although it is a huge part of other networks as well. I’ve been thinking about it a lot over the past month, and the answer is both, depending on how you want to use the service. You can go small, extract a lot of data, and build deeper relationships, or you can go big and funnel your relationships, though they will be diluted.

Why go Small?

The main reason to go small is that you can stay heavily connected and have relevant data flowing constantly without much noise in the stream. The system was originally designed for keeping track of friends, so it makes sense to stay small. There are still problems with following only a few people, and the main one stems from the reciprocal friending that occurs on the service: if someone follows you, they want you to follow them back. Having only a small group also makes it hard to get a large set of advice and responses when you ask a question.

Why go Big?

The main reason to go big is to spur the reciprocity I mentioned above, which allows you to poll your followers for answers. Also, the reciprocal reaction that gives you lots of followers lets you market yourself and your products to them. Another plus of mass friending is that, if you’re able to monitor and track the data coming through your stream, you can pull out large amounts of focused data.

Now, the downside of big is that you can’t easily build meaningful relationships with your friends based on their tweets. You are opening the door to spammers by (auto-)following everyone back. And it makes apps harder to use, because too much data comes through the API for your user.

My Choice: Small

I’d rather have a large group of followers that I can ping off of, while following only a subset of them, so that I have a wealthy stream of information that’s relevant to me. I don’t want a lot of crap; I want valuable, wealth-inducing assets in my stream. Whether you are there to market or to extract information and build relationships, it’s up to you to decide which path you want.

Note: this is equally applicable across the broad area of social media, and it’s up to you. Twitter just takes this single aspect and inflames it in how the service is used, making the way you use it change based on the numbers. One site with a similar dynamic, where the data changes based on the numbers of friends and followers, is Digg, where you can shout a story to your friends (a feature currently being analyzed for removal) to get it dugg up.