Anne's Ark

Navigating the information flood without a leader.

The following is a preliminary draft dated 6-3-00.

Premise:

People have computers. People have internet access for their computers. People have files on their computers. People would like to share their files:

with other computers they own
with other people they know
even, with other people they don't know

and the internet makes it easy to do this, if you're a net-nerd.

More and more people use computers, and the internet, without achieving high-nerdom. Software that allows them to easily share their files is well received, e.g. Napster, Scour Exchange, and Gnutella.

Napster has relative ease of use, Scour Exchange has relatively powerful search capabilities, and Gnutella has a decentralized architecture.

Anne's Ark implements the best of all three.

Ease of Use:

Download the install program, run it, select all the defaults and start the app.
The first time it runs, the app setup can automatically scan your computer for files you might like to share. Adjust a few check boxes, then one click does the dirty work. Other check boxes indicate your areas of interest, to speed searches.
The program can automatically connect you to the "General Public Net", or if you know the website of a "Netpost", you can browse there and click on it to join that net.
Now your files are being shared, and you may initiate searches to find files of interest to you. Just click on them to copy them to your computer.

Powerful Search Capabilities:

Two things influence what you find:

Where you look
How you look

Your Special Interest Groups (SIGs) are defined when you start the program. These are used to connect you with people with similar interests; starting your searches there yields quicker results.
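
One way to picture "starting your searches there" is to rank the peers you know about by how much their declared interests overlap with yours. The sketch below is only an illustration: the function name, the use of plain sets of strings for interest groups, and the Jaccard overlap score are all assumptions, not part of any Anne's Ark specification.

```python
def rank_peers(my_sigs, peers):
    """Return peer addresses, best interest match first.

    my_sigs: set of interest-group names for this user (hypothetical format).
    peers:   dict mapping a peer address to that peer's set of group names.
    """
    def overlap(sigs):
        # Jaccard similarity: shared groups divided by all groups either side has.
        union = my_sigs | sigs
        return len(my_sigs & sigs) / len(union) if union else 0.0

    # Highest overlap first; ties are broken by address so the order is stable.
    return sorted(peers, key=lambda addr: (-overlap(peers[addr]), addr))


peers = {
    "peer-a": {"jazz", "linux"},
    "peer-b": {"jazz", "cooking"},
    "peer-c": {"sports"},
}
print(rank_peers({"jazz", "linux"}, peers))  # ['peer-a', 'peer-b', 'peer-c']
```

A search would then be sent to the peers at the front of that list first.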

In addition to searching by filename, deeper searches can peer into the files to extract the author, artist, abstract, or full body of text.

Anne's Ark can sail in many seas of information, public and private. A business or other group can set up a private sharing network to share their files amongst themselves without incurring any overhead from the "General Public Net."

Web-mirroring allows Anne's Ark servers to "post" valuable network information (links to known users, frequently accessed files, etc.) to an ordinary HTTP server via ordinary FTP. Even when the user's computer is shut down, the most valuable information (as of the time of shutdown) is still visible via the "webpage."

Decentralized Architecture:

Internet Protocol (IP) is a decentralized architecture, but the search engines and other structures built on it mostly depend on most of the servers being up most of the time.

Personal servers will go in and out of the net in a very dynamic fashion, with downtimes exceeding 90% in many cases, as opposed to less than 1% for most commercial web servers.

Mirroring on commercial networks is a viable possibility (Scour Exchange works closely with idrive.com), and can be used by Anne's Ark servers as well. This is not the primary focus of Anne's Ark, though, since personal computer storage space is measured in Gigabytes, while most readily available (and cheap) commercial FTP sites are limited to 50 Megs or less. 50 Megs can hold the text of an entire Encyclopedia, or JPGs of every Playboy Centerfold ever published, or MP3s of a single LP album, or about 5 minutes of video. Computers and their owners are moving up the media food chain quickly, and they are going to need to store things on their own hard drives.

Gnutella is based on a VERY simple protocol, with some inherent flaws - it will be impossible for the current Gnutella net to grow very large (10,000 users or more) and still allow all users to effectively search the whole net.

The primary function of the Anne's Ark protocol is to support searches of members' local computers. A secondary layer searches additional data which is kept on "always on" HTTP servers (webpages).

Searches are used to:

Find a good place to connect to the net
Maintain an efficient network topology
Actually look for things, too.

The only purpose of the net is to propagate search requests, and possibly serve as a proxy between firewalled servants. A typical search proceeds as follows:

Node injects search request into the net
Search request is flooded out to all interested nodes
Nodes which have something that fits the search criteria respond directly to the searching node.
The searching node evaluates all responses (within a reasonable time) and acts on the most attractive response(s), again by direct contact.
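
The steps above can be sketched as a small in-memory simulation. This is a toy under stated assumptions, not the Anne's Ark protocol itself: the Node class, the max_hops limit, and plain substring matching on filenames are all invented for illustration, and filtering by "interested" nodes is left out for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    address: str                                   # return address for direct replies
    files: list = field(default_factory=list)      # names of shared files
    neighbors: list = field(default_factory=list)  # directly connected Node objects


def flood_search(origin, term, max_hops=3):
    """Flood a filename search outward from origin; return (address, filename) hits."""
    responses = []
    seen = {origin.address}            # nodes that have already received the request
    queue = deque([(origin, max_hops)])
    while queue:
        node, hops_left = queue.popleft()
        if node is not origin:
            # A matching node answers the searcher directly, not back along the path.
            for name in node.files:
                if term.lower() in name.lower():
                    responses.append((node.address, name))
        if hops_left == 0:
            continue                   # hop budget spent: do not forward further
        for peer in node.neighbors:
            if peer.address not in seen:   # never hand a node the same request twice
                seen.add(peer.address)
                queue.append((peer, hops_left - 1))
    return responses


# A four-node chain: a - b - c - d, searched from a with a two-hop limit.
a, b, c, d = (Node(x) for x in "abcd")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b, d]
b.files, c.files, d.files = ["song.mp3"], ["Song2.mp3"], ["song3.mp3"]
print(flood_search(a, "song", max_hops=2))  # [('b', 'song.mp3'), ('c', 'Song2.mp3')]
```

Node d holds a match but sits three hops away, so a two-hop search never reaches it. The searching node would then pick the most attractive of the returned responses and contact that address directly.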

Firewalls put an interesting wrinkle in the concept of "direct contact." Automatic detection of firewall presence, by exchange of direct pings, can be used to select the necessary protocol (push, pull, proxy), and avoid bothering the user with such nerdness.
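
Once each side knows whether it can accept inbound connections, the protocol choice might boil down to a small decision table. The mapping below is an assumption (this draft names the three modes but does not say which case gets which), and the function name is made up for illustration.

```python
def choose_transfer(searcher_reachable, holder_reachable):
    """Pick a transfer mode from the results of the direct-ping exchange.

    Each flag is True if that side answered an inbound direct ping,
    i.e. it is not hidden behind a firewall.
    """
    if holder_reachable:
        return "pull"    # searcher connects to the file holder and fetches the file
    if searcher_reachable:
        return "push"    # holder connects out to the searcher and sends the file
    return "proxy"       # neither accepts connections: relay through a third node


print(choose_transfer(searcher_reachable=True, holder_reachable=False))  # push
```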

Social Issues:

Anonymity is not guaranteed with Anne's Ark - every search request has a return address attached to it. Users, at a minimum, will be advertising their current IP address. Some users may wish to go further than that, and publish a screen name, or even a full real life bio, depending upon the community they are interacting with.

Regardless, when search results are returned, the searching user gets the respondent's net address. Both searcher and searchee may choose to keep this address information for future use, even to go and check out what else the other is sharing on the net.

A "lurker" might log on to the net, share no files, respond to no search requests, and just go check everyone else's files out.

A bad actor might flood the net with bogus, or simply annoying, search requests. A particularly bad actor might pose as a routing node, then intentionally drop, garble, or otherwise disrupt legitimate search traffic.

IP addresses can be banned from the system. Using cryptographically authenticated commands, a particular IP address can be shut out of the system quickly, and entirely. This type of power obviously can cause quite a bit of trouble, as well.

Automated "abuse monitor" algorithms can watch for nodes that aren't playing along nicely, and temporarily suspend them.
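
As a rough idea of what such a monitor could look like, here is a sliding-window request counter that temporarily suspends chatty addresses. Everything in it is an assumption made for illustration: the class name, the thresholds, and the choice to measure only request rate (a real monitor would presumably watch for dropped or garbled traffic too).

```python
from collections import defaultdict, deque


class AbuseMonitor:
    """Suspend any address that sends too many search requests too quickly."""

    def __init__(self, max_requests=20, window=60.0, suspension=300.0):
        self.max_requests = max_requests   # requests tolerated per window
        self.window = window               # window length, in seconds
        self.suspension = suspension       # suspension length, in seconds
        self.recent = defaultdict(deque)   # address -> timestamps of recent requests
        self.suspended_until = {}          # address -> time the suspension ends

    def allow(self, address, now):
        """Return True if a request from address at time now should be forwarded."""
        if self.suspended_until.get(address, 0.0) > now:
            return False
        times = self.recent[address]
        # Forget requests that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)
        if len(times) > self.max_requests:
            # Over the limit: drop this request and suspend the sender for a while.
            self.suspended_until[address] = now + self.suspension
            times.clear()
            return False
        return True
```

Because the suspension simply expires, a node that was flagged by mistake gets back on the net without anyone having to step in.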

If "spoof" software becomes a problem, cryptographic authentication can be used to slow down the spammers. Keeping the keys from the spoofers will be a challenge, but even imperfect authentication might slow them down a bit.

Private Nets:

The protocol provides for cryptographically authenticated, and even secret, payloads. In the "General Public Net", authenticated packets might make sense to reduce address spoofing, but secret payloads really only have application in private nets.

A private net might choose to encrypt its search traffic (and file transfers) for reasons of security. All members of the private net would possess the "net password", allowing them to make sense of the packets, while outsiders cannot. Lurkers wouldn't be able to tell much about an encrypted private net, except perhaps the level of activity. Spoofers wouldn't have a chance on a private net, unless they somehow got the key. For this reason, secure redistribution of key changes to select members (all but the spoofer) is necessary.
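
To make the "net password" idea concrete, here is a minimal sketch of packet authentication using HMAC-SHA256 from the Python standard library. The choice of HMAC-SHA256, the tag-then-payload packet layout, and the function names are assumptions for illustration only; this draft does not name an algorithm. The sketch covers authentication only, so the payload stays readable, and using the raw password as the key is a simplification.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(net_password, payload):
    """Prefix payload with an HMAC-SHA256 tag keyed by the shared net password."""
    tag = hmac.new(net_password, payload, hashlib.sha256).digest()
    return tag + payload


def open_packet(net_password, packet):
    """Return the payload if the tag checks out, else None."""
    tag, payload = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(net_password, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None


packet = seal(b"net password", b"search: jazz")
print(open_packet(b"net password", packet))   # b'search: jazz'
print(open_packet(b"wrong password", packet)) # None
```

A spoofer without the password cannot produce a packet that passes the check, which is why changing the key (and getting the new one to everybody but the spoofer) matters.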

There are few, if any, reasons why the public net should carry private traffic, since the private net members can contact each other directly through IP.

Private nets don't need to encrypt their traffic if they don't want to. They can form with limited, or no, interaction with the public net, and perform in the same manner, just exchanging search information locally.

HTTP: Mirroring and "Netposts":

Since individual computers come in and out of the network with great unreliability, using the established and relatively reliable commercial network makes a good deal of sense.

Lists of last known contact addresses can be published on web pages forming a "Netpost" where contact with the net may be reliably initiated. Frequently requested files can also be put up on HTTP servers, and all of this can be done automatically by the software.
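
A Netpost contact list could be as simple as a text page with one address per line, most recently seen first. The sketch below assumes exactly that; the "address timestamp" line format, the function names, and the 50-entry cap are all hypothetical, and the upload step (FTP to the web server) is left out.

```python
def render_netpost(contacts, limit=50):
    """Build the text of a Netpost page.

    contacts maps a "host:port" string to the time (seconds since the epoch)
    that node was last seen. Only the newest `limit` entries are published.
    """
    newest = sorted(contacts.items(), key=lambda item: item[1], reverse=True)[:limit]
    return "".join(f"{addr} {int(seen)}\n" for addr, seen in newest)


def parse_netpost(text):
    """Return the addresses listed on a Netpost page, in published order."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


page = render_netpost(
    {"10.0.0.1:7000": 100.0, "10.0.0.2:7000": 300.0, "10.0.0.3:7000": 200.0},
    limit=2,
)
print(parse_netpost(page))  # ['10.0.0.2:7000', '10.0.0.3:7000']
```

A newly started client would fetch the page over HTTP and try the listed addresses in order until one answers.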

The final, and somewhat unattractive, upshot of this is that the software will also "like to" search the HTTP servers in the absence of their AA hosts, which is certainly possible, but rather inefficient unless the "content focusing" of users' self-declared SIGs proves useful.

As previously mentioned, mirroring and netposts can be considered a second level protocol, optional to the proper functioning of the primary Anne's Ark architecture.

On the other hand, properly done, these archives can automatically cross-link with each other based on SIG, automatically update to carry the information of "most recent interest," and do all sorts of other clever things - making themselves useful to people who don't even have the AA software (a la Scour Exchange).

Implementation:

If some kind of Java client can be achieved, even as just a leaf node, it would do great things for cross-platform usage.

A "Strong" client (or servant, in Gnutellaese) program would have LDAP, file indexing tables, and all sorts of other heavy duty search tools, enabling quick and efficient searches of the local database. On the other hand, many local databases will consist of a small number of large media files - making strong search tools extreme overkill.

Stone Soup Net:

As new services are developed (e.g. local news announcements, calendar synchronizing and sharing, payment systems, etc.) they may be added to the Anne's Ark protocol seamlessly, operating with those nodes which understand the new services.
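
"Operating with those nodes which understand the new services" suggests that a node should quietly ignore anything it does not recognize. A minimal sketch of that idea follows; the class, the service names, and the message shape are all invented for illustration.

```python
class ServiceNode:
    """A node that handles the services it knows and ignores the rest."""

    def __init__(self):
        self.handlers = {}   # service name -> function taking the payload

    def register(self, service, handler):
        self.handlers[service] = handler

    def handle(self, service, payload):
        handler = self.handlers.get(service)
        if handler is None:
            return None      # unknown service: ignore it rather than fail
        return handler(payload)


node = ServiceNode()
node.register("search", lambda term: f"searching for {term}")
print(node.handle("search", "jazz"))        # searching for jazz
print(node.handle("calendar", "sync now"))  # None
```

An older node keeps working when a new service shows up on the net; it simply does not take part in it.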

Because the protocol is open source, anyone may then create services and software to access those services. There are some services (like realtime stock quotes) that should be handled by "the big boys" with their large, dedicated central servers - but there's no reason to run to a big server when you just want to chat with your friends.

World Domination:

There has been a lot of talk, maybe, maybe too much talk (Bono) about Gnutella being the "new paradigm" that will wipe out the current central server search engine model. Ummmm, maybe. Central servers embody a "familiar," trustable entity - and I believe people will still need that, at some level.

What the peer-to-peer model can do is put significantly more CPU horsepower to work in a distributed real-time search of what is available right now. This is especially important in the "home server" arena where we might anticipate upwards of 90% "server" downtime for any given server. But, given a million servers overall, 90% downtime still leaves 100,000 online at any given time - certainly a resource worth tapping into.