Gnutella protocol negotiation, Proposal, Jukka Santala, donwulff@nic.fi, 12 May 2000

Revised 14th May: Tried to make message-passing instructions slightly more clear.
Revised 15th May: Packet->message, HTML:ized, added max-hops
Revised 16th May: Added consideration about routed hops, FAQ...

This proposal outline has been written with a number of goals in mind. First, many low-bandwidth clients connect to the Gnutella network, and would much prefer that the host they connect to do elementary filtering of the messages forwarded to them. For example, there are currently 25k messages with incoherent function-IDs flying around that totally kill the performance of a modem-connected client. Secondly, the GnutellaDev group currently recommends dropping clients that send more than 20 unknown function-ID messages; this is a position that simply cannot hold in the long run, as it totally kills extendability and chains the Gnutella protocol to the stone age. Yet it's clear that some protocol management is needed.

So what we do is invent a new message type to negotiate protocol details. Negotiating is perhaps not quite the right word here, since in the GnutellaNet's spirit of sometimes creative anarchy the idea is for each client to simply specify which kinds of content they want delivered. Here's a brief outline of a content-request message. It uses the same conventions as the other Gnutella protocol messages, so the numbers on the left are byte offsets:

0 Number of content-identifiers (N).
1+ N records of 16 bytes each, laid out as follows:
- 0 Function ID
- 1 Function descriptor
  - 0x00 Broadcast
  - 0x01 Route
  - 0x02 Echo
  - 0x04 Notice *
  - 0x08 Random
- 2-5 Minimum payload size
- 6-9 Maximum payload size
- 10-11 Rate limit
- 12 Max-hops
- 13-15 Reserved

Rate limit is expressed as messages of that type per second; a client SHOULD NOT be forwarded more than that. A rate limit of 0 means unlimited, should it ever be required. The reserved bytes should all be set to 0. Minimum and maximum payload sizes are specified like the payload length in the message header, and give the inclusive range within which a message SHOULD be forwarded to the link which sent this content-request message. Here, 0 means 0. For example, to indicate that no messages of a given type with any sort of payload are to be forwarded to said link, set both values to 0. To accept no messages of a type at all, don't include a record for that function-ID.
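The record layout above can be sketched in Python as follows. This is only an illustration, not part of the proposal: the names `ContentRequest`, `pack_requests` and `parse_requests` are made up here, and little-endian byte order is an assumption based on the sizes being "specified like payload length in the message header".

```python
import struct
from dataclasses import dataclass

# Function descriptor values from the field list above.
BROADCAST, ROUTE, ECHO, NOTICE, RANDOM = 0x00, 0x01, 0x02, 0x04, 0x08

# One 16-byte record: function ID, descriptor, min size, max size,
# rate limit, max-hops, three reserved bytes (written as zeros).
# "<" selects little-endian with no padding; the byte order is an assumption.
RECORD = struct.Struct("<BBIIHB3x")


@dataclass(frozen=True)
class ContentRequest:  # hypothetical helper type
    function_id: int
    descriptor: int
    min_payload: int   # inclusive
    max_payload: int   # inclusive
    rate_limit: int    # messages per second, 0 = unlimited
    max_hops: int      # inclusive, 255 = send any


def pack_requests(requests):
    """Build the payload: one count byte followed by N 16-byte records."""
    body = b"".join(
        RECORD.pack(r.function_id, r.descriptor, r.min_payload,
                    r.max_payload, r.rate_limit, r.max_hops)
        for r in requests
    )
    return bytes([len(requests)]) + body  # raises ValueError above 255 records


def parse_requests(payload):
    """Parse a content-request payload back into a list of records."""
    if not payload:
        raise ValueError("empty content-request payload")
    count = payload[0]
    if len(payload) != 1 + count * RECORD.size:
        raise ValueError("payload length does not match record count")
    return [
        ContentRequest(*RECORD.unpack_from(payload, 1 + i * RECORD.size))
        for i in range(count)
    ]
```

For instance, `pack_requests([ContentRequest(0x80, ROUTE, 0, 0xFFFF, 10, 7)])` yields a 17-byte payload: the count byte plus one record.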

The Max-hops field is the preferred maximum Hops value to accept, although a servent MAY establish a lower standard. Since the currently published specifications and validation recommendations aren't clear on exactly when Hops or TTL are to be measured or set, one SHOULD be careful with this value. However, the Max-hops field here refers specifically to the value of the Hops field as it would be right after being received at the target link, and the value is again considered inclusive, i.e. a value of 255 means send any. The max-hops for a routed message SHOULD always be the same as or higher than for the equivalent broadcast message, when known. Messages which didn't get sent due to any of the limits do not count towards the rate limit. If a content-request message is received from a link, any messages with a non-listed function-identifier are to be dropped.
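A minimal sketch of how a servent might apply these limits to one link and one function-ID is given below. The class name `LinkFilter` and the fixed one-second counting window are assumptions; the proposal only says "messages per second" and does not prescribe how to measure it. Size and hops are checked first, so messages rejected for those reasons do not count towards the rate limit.

```python
import time


class LinkFilter:  # hypothetical helper, one per (link, function-ID)
    def __init__(self, min_payload, max_payload, rate_limit, max_hops,
                 clock=time.monotonic):
        self.min_payload = min_payload
        self.max_payload = max_payload
        self.rate_limit = rate_limit   # 0 = unlimited
        self.max_hops = max_hops       # inclusive, 255 = send any
        self._clock = clock
        self._window = None            # current one-second window
        self._sent = 0                 # messages forwarded in that window

    def allow(self, payload_len, hops_at_target):
        """Return True if the message may be forwarded to this link.

        hops_at_target is the Hops value as it would be right after
        being received at the target link.
        """
        if not (self.min_payload <= payload_len <= self.max_payload):
            return False
        if hops_at_target > self.max_hops:
            return False
        if self.rate_limit:
            window = int(self._clock())
            if window != self._window:
                self._window, self._sent = window, 0
            if self._sent >= self.rate_limit:
                return False
            self._sent += 1            # only forwarded messages are counted
        return True
```

A smoother limiter (a token bucket, say) would serve equally well; the fixed window is used here only because it is the shortest thing that fits the wording.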

The content-request message is not replied to or acknowledged in any way; if the client supports content-negotiation, it will send such a message of its own and start obeying the received instructions. The message is not passed on anywhere, although the client MAY decide to change its content-requests to its other links. This SHOULD be done with caution so as to avoid DoS attacks or totally messing up the system. The client may, however, change its mind at any point during the connection and send another content-request message. Each function-ID SHOULD appear in the message only once; if it appears multiple times, behaviour is undefined. The same applies to request-lines with multiple function descriptors set.

After receiving the content-request message, it would be helpful for the servent to parse its contents into a 4k "routing table" containing the instructions for each message type at its index position. From the point of view of the servent receiving the content-request message, the recommended message-passing behaviour is as follows:

If the originating link has the Echo flag set for said message-type, send a copy of the message back to the same link.
Send a copy of the message to any link that has the Route flag set for that message type and from which a broadcast message with a matching Message-ID has been received.
If any link other than the originating one has the Random flag set and the Message-ID is unique, pick one such link at random to send a copy to.
If the Message-ID is unique, forward the message to any link that has the Broadcast flag set for that message type, except for the one that it was received from.
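The four rules above can be sketched as one function. Everything named here (`choose_targets`, `tables`, `broadcast_sources`, `is_unique`) is hypothetical. The sketch treats the "send back to the same link" rule as the Echo descriptor, in line with the statement that Notice requires no action, and it only considers links that have sent a content-request; size, rate and hops limits are left out.

```python
import random

# Function descriptor values from the content-request record.
BROADCAST, ROUTE, ECHO, NOTICE, RANDOM = 0x00, 0x01, 0x02, 0x04, 0x08


def choose_targets(tables, origin, function_id, broadcast_sources,
                   is_unique, rng=random):
    """Pick the links that should receive a copy of an incoming message.

    tables            -- {link: {function_id: descriptor}}, one entry per
                         link that has sent a content-request
    origin            -- link the message arrived on
    broadcast_sources -- links from which a broadcast with the matching
                         Message-ID was received earlier
    is_unique         -- True if this Message-ID has not been seen before
    """
    def mode(link):
        # None means the link did not list this function-ID: drop.
        return tables.get(link, {}).get(function_id)

    targets = []
    # Rule 1: echo back to the originating link.
    if mode(origin) == ECHO:
        targets.append(origin)
    # Rule 2: route back along links the matching broadcast came from.
    targets += [l for l in tables
                if mode(l) == ROUTE and l in broadcast_sources]
    if is_unique:
        # Rule 3: one randomly chosen Random-mode link, never the origin.
        candidates = [l for l in tables
                      if l != origin and mode(l) == RANDOM]
        if candidates:
            targets.append(rng.choice(candidates))
        # Rule 4: every Broadcast-mode link except the origin.
        targets += [l for l in tables
                    if l != origin and mode(l) == BROADCAST]
    return targets
```

Because each link has exactly one descriptor per function-ID, no link can be selected twice by different rules.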

There is no action required for the "Notice" message-type, and it is thus recommended that these content-requests never be used. A message SHOULD be forwarded to a link only if it fits within the size and rate parameters specified in the request. Messages failing to obey these limits MAY be considered bad messages. It is not necessary for a servent to implement any message-passing conventions beyond Route and Broadcast. No message-passing conventions SHOULD be assumed by default for a link from which such a request-message has been received, and it is perfectly valid for a link to request not to even be sent content-request messages.

A client receiving a Random message it has already seen (as may occur in a redundant hierarchy) MAY send it back to the originator with no change in TTL and Hops; the originator should then find yet another random link that it hasn't tried yet. As with broadcast-routing, if there are no further links available, the message is discarded. Any servent honoring Random routed messages SHOULD be capable of robustly handling this situation. Remember it's a duplicate for the originator too, so this needs some thought!

Now, all that remains to be decided is the function identifier for the content-request message itself. I'm going to be bold and suggest 0x41, as that will be left "free" from the Push request and has the LSB set, and thus will hopefully not be broadcast. It really should be 0x44 to follow the logic, but after this message that logic is no longer necessary. The message TTL SHOULD be set to 1 to make sure that the message won't go further than the first link.
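As an illustration, a complete content-request message could be framed like this. The 23-byte header layout (16-byte Message-ID, function ID, TTL, Hops, little-endian payload length) follows the published Gnutella 0.4 header; the function name `build_content_request` and the use of random bytes for the Message-ID are assumptions made for this sketch.

```python
import os
import struct

CONTENT_REQUEST = 0x41  # function ID suggested in this proposal

# Message-ID, function ID, TTL, Hops, payload length (little-endian).
HEADER = struct.Struct("<16sBBBI")


def build_content_request(payload, message_id=None):
    """Prefix a content-request payload with a Gnutella message header."""
    if message_id is None:
        message_id = os.urandom(16)  # assumption: any unique 16 bytes will do
    # TTL is 1 so the message goes no further than the first link; Hops is 0.
    return HEADER.pack(message_id, CONTENT_REQUEST, 1, 0, len(payload)) + payload
```

A payload of a single zero byte (no records) would, under the rules above, ask the peer to forward nothing at all.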

Although this proposal relieves some of the pressure on the Gnutella extensions, it must still be underlined that due to the limited number of function identifiers and the distributed nature of the Gnutella protocol, all function identifier assignments SHOULD still be subject to oversight and at least semi-central administration to prevent collisions. The "Reserved" designation of the bytes in the content-request message really does mean reserved, and they SHOULD NOT be taken into use without my approval, to keep things from getting tangled.

FAQ

Are you trying to shortcircuit GnutellaNG work?

In a way, yes. I finished the first version of this proposal even before the GnutellaNG deadlines, and they apparently have yet to even decide what workgroups should exist. Yet it is clear something like this is sorely needed NOW, and not once all the talking is done. However, I sincerely hope that GnutellaNG will ratify this proposal (possibly with improvements). Since I've been working on it beyond their submission deadline, and all their workgroups are going to be decidedly closed and private, I doubt that, though.

So why did you start writing it anyway?

One of the motivating factors for this proposal was that GnutellaNG's charter says it exists to extend the Gnutella protocol and give recommendations to steer the development of the current GnutellaNet in that direction, but the current GnutellaDev recommendations limit any kind of extension, for example by ordering all unknown message-types dropped. When I asked GnutellaNG's team manager about this discrepancy, he simply said that it was prohibited to extend the existing Gnutella protocol. I wonder how this fits together with the concept of open source and open standards, and how those who have already extended Gnutella feel about it.

Doesn't this lead to huge bandwidth waste and explosion?

Quite the contrary. Currently, the large majority of Gnutella clients simply broadcast every unknown packet they receive, which leads to a huge waste of bandwidth. When dropped at all, packets are only dropped after having already been received. Extension of the message-types/functions is going to happen in any case; this extension-protocol proposal just ensures that those who don't want to deal with them don't have to, and provides some level of insulation between conflicting protocols.

Isn't the lower-nybble routing type indication good enough?

In a perfect world it would be, but we have already redefined PUSH as a routed message, breaking this coding, so it alone couldn't be used. Note that PUSH is an anomalous message in other ways as well, and admittedly requires handling outside of the protocol considered in this system. All new message function-types should still preferably follow the same numbering scheme, or have the same lower nybble (4 bits) as the function ID in the content-request message.

Can I extend the content-request message?

No, not as it currently stands... In fact, I already have a pretty good idea of what I want to do with it. I didn't want to make it more complicated than needed, though, so it isn't part of the current spec, or I may find another required use. According to the plans, byte 13 will be a "version flag" that will be taken into use _if_ some idiots start causing protocol-function-number collisions with their extensions, and can essentially subdivide the network propagation into sections using the same propagation. The current protocol version is #0. Byte 15 would hold the function-ID of the message this message-type should be routed by if byte 14 is set in a routed message, and byte 14 would give the index of the Host-ID in that packet if the PUSH type of exception ever became popular. Now do you see why it's just "Reserved"? ;) Besides, I'm still waiting to see the recommendation on current message routing before I could finalize that...

I just want to leech, can I disable all routed messages?

Well, no. First of all, if you disable routed messages, you will also lose all query replies sent to you, and will never see anything to download. Secondly, you should always accept routed messages for anything you've forwarded a broadcast for, unless the messages are seriously broken, because routed messages are very low bandwidth compared to broadcasts and are sent along only that one route. This means that the max-hops on routed messages should be at least as high as for the matching broadcast message, and the minimum and maximum message sizes should follow the GnutellaDev recommendations, or be 0x00000000 and 0x0000ffff respectively. If you're operating strictly as a leaf, or if you stop forwarding broadcast messages between your links, you can depart from this recommendation, but a client which doesn't fulfill these request recommendations could be considered broken and dropped.

I'm working on variable sized messages, and...

It's possible to use this protocol to validate variable sized messages as well, in theory at least. But it's so complicated I haven't made it part of the present protocol, as it should be more or less compatible in any case, because the additions are made to the higher-order bytes of the upper bound on message size. If the upper 16 bits of the "maximum size" are used, they instead specify an index in the packet (in the upper 8 bits) and the maximum multiplier (next 8 bits) at that index, of blocks sized exactly the number in the lower 16 bits, added on top of the "minimum size", for validating variable size messages with fixed size "chunks". Got it? Again, that's the reason why I didn't include it in the present protocol ;) Besides, only the content-request message could currently be validated using this. In its case, for example, minimum-size would be 0x00000001, and maximum-size 0x00ff000f. I would advise against the use of this form, though, as it requires unwarranted work from the sending servent. This is a good reason to never use an upper bound higher than 0x0000ffff, though.