An alternate approach to routing. - comments to nravi@operamail.com

Last update: May 04, 2000

in short: let's record the packet's route, so we know where to send a reply.

guess that line really says it all, but... well...



The current gnutella protocol route _most_ replies by the packet-id of the response, which is a good approach, with maximum security for a very small memory-tradeoff. What characterise these services however, are the fast response times. Both ping requests and queries are processed automatically, and the reply is back for the server to route in a couple of seconds at most. This means the entry can be dropped from the routing table, say, 10 sec. after it arrive.
Such a routing table would work by replacing the oldest entry whenever a new one arrives. Thus to hold an entry for 10 sec., it would need to be large enough to hold an entry for each packet it sees in that period of time.

This is no problem if we can expect to see the reply within 10 sec, but say a certain amount of user interaction is required before the packet is sent, like for a push-request. For a push-request to be sent, the user must first select a file to download amongst the matches from a query. Then there is an attempt to establish a direct connection, but this times out, as the servant is behind a firewall. _Then_ the request is finally sent, possibly as long as 5-10 minutes after the query-reply passed through the network. To route that packet with the scheme detailed above, you would need a routing table with an entry for every packet seen in the last 10 minutes... now that is a waste of memory!

Gnutella/0.4 solves this by allowing you to route push-requests by host.. Every host has an unique identifier which is included in it's query replies. By keeping a separate routing table for the hosts it has seeen for the last 10-20 minutes, a server is able to route push-requests in the right direction. I imagine it's not very scalable, though, as backbone-servents will need a pretty large routing-table to accomplish this. What is worst about this scheme though, at least in my mind, is the compromised security. All query-replies you send out contain a unique identifier that point back to you... well... it's not that big a security issue, but....;)

No more routing-tables

ok.. what we do is simply to record the path for each packet as we pass it on. I suppose no client will ever have more than 256 different gnutella connections, thus we can use a byte to represent each.
When you receive a packet, you first determine whether it is a broadcasted or routed. If it's a broadcast you record the path in the reply-to-field. If the packet is routed, the packet-id should be unused, so we can record the path in that. (There is no need whatsoever for unique id's on routed packets - they are never duplicated, and should be dropped *only* when TTL expires; or if the next host on the path disconnected...).

Rewrote this part, as it was "strange and difficult to follow...":) - may not be a lot better now, but at least i've made an attempt. ;)
To record routing info in a packet, move each entry one step back, discarding the last one, then insert the connectionID as the first entry in the field. To route a packet, pull out the first entry from the reply-to-field, move all entries forward, then send the packet to the connection identified by that first entry. You should also insert some random value for the last entry, making it harder to determine how many hops are left...

Pros/cons

+
simple
no routing-tables
the path remains legal as long as none of the hosts drop out - unlike for routing-tables, where an entry is dropped once X new messages are seen.

-
not quite as flexible as routing with a table.. - impose limits for max TTL / max nr of connections
i guess we could use a path-recording-field that's not of fixed size to overcome this, say tagging the routing info to the end of the packet or something, but i kinda like having all info needed for routing in the header, and i consider fixed-size packet headers the only way to go. any comments on this?

Security

I suppose it would be extremely hard for anyone but the host that inserted an entry to get anything meaningful out of it. ;)
To make sure noone can tell who sent the request for which the path is to be recorded, the route-field could be filled with random values before you send it on it's way / insert a random value in the last position when you pull out the first.
Also, to be absolutely sure it's impossible to get anything meaningful from the path-list, a host could use a different set of identifiers for each connection it has.

Possible modifications

We could do with just 4 bits to represent each entry.. This would mean a limit of 15 connections/client (2^4 = 16, but at least 1 value is needed to indicate that a packet is intended for self). With 8 bytes for the packet-id, this scheme would allow us to store a path of 15 hosts, which should be more than enough no matter how GnutellaNG turn out..

An 8-16 bit owner-field could be included in each packet. This would be copied from request to reply and give the client an additional field for deciding if the packet is the reply to some request it actually posted, or just caused by a routing-error.
With 4 bits for each connection and an 8-bit owner-id, the chance of a client processing a bad message is 1/16 * 1/256 = 0.00024414. For a 16 bit owner-id, divide that by 256... Adding another byte will allow the client to differentiate between it's different requests as well, if that is needed.

As pointed out to me, this scheme WILL mess up once connection-ID's start being reused. This is would hardly be a problem if we use a whole byte for the ID, but it could mess things up with only 15 unique values... ok... so the packet never reach the host it's intended for, but rather dies as the TTL expires... A routed packet is never duplicated, so with a TTL of 7, i'd say there is an extra load of 3-4 packets (average of TTL/2) * 100 (about 100 packets being routed through that host?) to the gnutella-net... I suppose the gains from killing those routing-tables could be worth this additional load on the bandwidth? And if we stick with an ID of 8bits/host, it shouldn't be a problem at all...?