A Few Innovative Ideas for Short URLs

Over the past six months or so I’ve seen numerous posts raise flags about URL shorteners. They focus on a handful of issues: security, link titles that say nothing about the destination, no pass-through for SEO purposes, and the possibility of data loss. Each of these problems has at least a partial solution, but the solutions are still not effective enough. Here are some of the modifications I plan to work on to improve the service.

Pretty URLs – Security & Relevant Titles

Making the short URL as human-readable as possible is a plus. With the shorteners currently on the market, however, readable URLs are hard to get, because every URL is an ID that the service can assign only once. My solution is to embed user data within the link. This abstraction reduces the actual URL location to 1-3 characters (base 62, giving 62 to 238,328 values), and the User data can be stored in 4-6 characters (base 62, giving 14,776,336 to 56,800,235,584 values) at the end. The minimum length of a link code is therefore 5 characters and the maximum is 9.

The benefit of the User encoding is that it makes it possible to parse out a user’s links, along with any metadata associated with a link, such as a secondary route to the URL via a user-specific vanity title, e.g. http://examp.le/URLxUser = http://examp.le/SteveJ/apple and http://examp.le/XbUser = http://examp.le/LarryP/apple. The User encoding also lets the linking system double as a quick account review if there is any suspicion that a specific source is sending malware or spyware.

One requirement of the User encoding is that you define how many characters the User data takes up and where it is located. I feel that 5 characters (62^5 = 916,132,832, or roughly 1 billion unique IDs) is optimal at this point in time, and that placing the User data at the very end of the string is slightly simpler to parse, but that’s just personal preference. However, at no point can you change either of these choices without breaking every link that has already spread over the internet, so you must choose wisely before you begin.
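
To make the layout concrete, here is a minimal Python sketch of the encoding described above. It assumes a 0-9, A-Z, a-z alphabet (the ordering is my choice for illustration) and a 5-character User field at the end of the code. The names make_code and split_code, and the use of plain integers for link and user IDs, are hypothetical and not taken from any existing service.

```python
import string

# Base-62 alphabet: digits, then upper-case, then lower-case letters.
# The ordering is an assumption made for this sketch.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
USER_LEN = 5          # fixed width of the User field, placed at the end


def encode(number, width=0):
    """Encode a non-negative integer in base 62, left-padded to `width`."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = ""
    while True:
        number, rem = divmod(number, BASE)
        digits = ALPHABET[rem] + digits
        if number == 0:
            break
    return digits.rjust(width, ALPHABET[0])


def decode(text):
    """Decode a base-62 string back to an integer."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value


def make_code(link_id, user_id):
    """Build a short code: 1-3 link characters followed by the User field."""
    if not 0 <= link_id < BASE ** 3:
        raise ValueError("link_id does not fit in 3 characters")
    if not 0 <= user_id < BASE ** USER_LEN:
        raise ValueError("user_id does not fit in the User field")
    return encode(link_id) + encode(user_id, USER_LEN)


def split_code(code):
    """Split a short code into (link_id, user_id)."""
    if not USER_LEN + 1 <= len(code) <= USER_LEN + 3:
        raise ValueError("unexpected code length")
    return decode(code[:-USER_LEN]), decode(code[-USER_LEN:])
```

Under these assumptions, make_code(61, 1) gives z00001, and split_code recovers (61, 1) simply by slicing off the last five characters. That slice is the reason the width and position of the User field can never change once links are in circulation.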

Multiple Links – More Data, Less Space

Allowing users to batch related content reduces the total length per link to between 22/n and 24/n characters, where n is the number of links in the batch and 22 to 24 characters is the full length of one short URL such as http://examp.le/XbUser. Adding link selection to the API lengthens the URL, but it also makes large batches more usable for sharing data (e.g. http://examp.le/XbUser?link=1,3).

The next topic is how to handle statistics, because regular statistics become a bit blurred when multiple links can be accessed at the same time. The most accurate figure is the count of inbound visits to the page. Outbound traffic is much more complicated, since the exit paths are permutations of multiple links. The best you can do is count clicks on individual links, and measure use of the Open All button by counting every active link.

One benefit of the multiple-link structure is that it encourages users to become link curators. This provides plenty of data for machine learning, along with associations that machines can’t easily discern, such as what the user likes. It also makes the system an active aggregation center for real-time data. An example of a multi-link (Safari 4 has issues and will open windows instead of tabs): http://lnkr.hiphs.com/socialme
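
As a rough illustration of the link selection and click counting described above, here is a short Python sketch. It assumes the link parameter holds 1-based positions in the batch, which the example URL suggests but does not confirm. The selected_links function and the BatchStats class are hypothetical names used only for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def selected_links(url, batch):
    """Return the batch entries chosen by the `link` query parameter.

    Positions are assumed to be 1-based. Without a `link` parameter
    the whole batch is returned.
    """
    query = parse_qs(urlparse(url).query)
    if "link" not in query:
        return list(batch)
    picks = []
    for part in query["link"][0].split(","):
        index = int(part)
        if not 1 <= index <= len(batch):
            raise ValueError(f"link index {index} is out of range")
        picks.append(batch[index - 1])
    return picks


class BatchStats:
    """Inbound visits to the batch page plus per-link click counts."""

    def __init__(self):
        self.inbound = 0
        self.clicks = Counter()

    def record_visit(self):
        self.inbound += 1

    def record_click(self, target):
        self.clicks[target] += 1

    def record_open_all(self, active_links):
        # Open All is counted as one click on every active link.
        for target in active_links:
            self.clicks[target] += 1
```

Counting Open All as a click on each active link keeps the per-link totals comparable, at the cost of not knowing which of the opened tabs the visitor actually read.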

Data Storage – Open Access, Uptime, and Redundant Stores

After stories such as Ma.gnolia’s data loss, the Cli.gs hacking, and various services shutting their doors, link rot has become a very big concern. I’ve looked into various solutions, and one that stands out is based on work by Directeur for use in federated real-time systems: Socnodes, and the Ouroboros and Lernaean Hydra problems that he had to solve. His solution to the Ouroboros problem was to use Atom feed UIDs together with the service title, so that the systems can check against themselves. The useful part of the Socnode layout is that you can store and update remote databases with your data, which gives you not only a remote redundant store but also n-ary accessible domains, assuming you use separate data storage sites and DNS providers and build otherwise independent systems that operate on the same data in parallel.
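
The following Python sketch shows one way the redundancy idea could look in miniature. It is my own simplified reading, not Directeur’s Socnode design: I assume the Ouroboros problem is a record looping back to a node that already has it, and I key every record by an (origin service, UID) pair so a node can recognise and drop such repeats. The Node class and resolve function are hypothetical.

```python
class Node:
    """One independent store; `peers` are the nodes it replicates to."""

    def __init__(self, name):
        self.name = name
        self.links = {}   # (origin service, uid) -> target URL
        self.peers = []

    def publish(self, uid, target):
        """Store a link created on this node and push it to the peers."""
        self.receive((self.name, uid), target)

    def receive(self, key, target):
        """Accept a record once; a repeated key has looped back, so stop."""
        if key in self.links:
            return
        self.links[key] = target
        for peer in self.peers:
            peer.receive(key, target)


def resolve(nodes, key):
    """Ask each node in turn, so one lost store does not lose the link."""
    for node in nodes:
        if key in node.links:
            return node.links[key]
    return None
```

Even with three nodes wired in a ring, a published link is stored once per node and the replication stops when the record arrives back at its origin. If one node later loses its data, resolve still finds the link on the others.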

There will come a point when URLs aren’t nearly as important as they are now, and I see this as a step toward reaching it. Ease of access, securing the data through redundancy, encoding and embedded data, review systems, and the ability to collect related and relevant data are all steps in the right direction.