Proposed Changes to gnutella protocol 04/15/2000
comments to bowser@cheerful.com
Purpose of Changes
While toying with gnutella, a work in beta, it became fairly obvious that
scalability is a concern.  As its use becomes more popular, the software
must not break as users are added.  The most effective way to keep it
reasonably scalable is to avoid any "n-squared" operation in the
protocol, that is, any operation whose cost grows quadratically as more
users are added.
Proposed Objectives in Changes to Protocol and Client Software
- Priority #1. Reducing the amount of bandwidth needed to
support many users in gnutella, therefore allowing greater scalability.
- Priority #2. Providing a means whereby slow users (those with
modem connections) are not relied upon to relay information between parts of
the network.
- Priority #3. Keeping the network efficiently knit, so that
connections maximize the number of hosts reachable in the fewest hops.
Proposed Changes to Protocol and Software Rules
I will list these changes in order from the most simple and obvious to the
most complicated.
1. KEEP MODEM USERS FROM BEING ROUTERS
The network is heavily slowed down when bogged-down modems are relied upon
for propagating searches and results throughout the network. Quite simply,
modem users should not be expected to participate in relaying.
Forming the backbone of the network is a task best left to those with
high-speed dedicated connections (such as cable modem, DSL, or leased
lines). This does not exclude modem users from using gnutella, rather it
improves speed both for them and for everybody else.
The simplest way to prevent modem users from routing is to allow them a
single outgoing connection to a fast site, and no incoming connections.
Thus they become the outer "leaves" of the network, prevented from
routing packets while still taking advantage of other people's fast
servers.
At a minimum, this feature could be implemented with no protocol changes
at all.  Without protocol changes, however, a few drawbacks arise.  The
most notable is that outside users who connect to a "slow" computer
would have to be turned down.  That could be frustrating for a user who
only knows the addresses of other "slow" computers (perhaps he got
them from IRC).  It would be beneficial to allow a response packet of
"Go away, I'm not open for connection, but here's someone else you can
connect to [some random fast host]".
Protocol changes that would benefit this feature also include inserting a
"flag" in ping packets indicating whether systems are "fast"
or "slow", so that other hosts would know only to give out the address
of a "fast" server (and also not to attempt to make connections to
slow ones).
"Fast" and "Slow" would be configured by the user.
"Slow" would mean less overhead for the user, and would actually be
preferable if one wanted to be selfish. In order to keep "fast"
people from setting their systems to "slow" in order to be greedy, an
idea of a 10KB/sec file transfer cap for "slow" users (enforced by the
app) would take away the motivation of greedy users to set their system to slow.
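As a rough sketch of that admission rule (the `Host` structure, message
fields, and addresses here are illustrative assumptions, not part of any
existing protocol):

```python
import random

FAST, SLOW = "fast", "slow"
SLOW_CAP_BPS = 10 * 1024  # 10KB/sec transfer cap deters greedy "slow" settings

class Host:
    def __init__(self, speed):
        self.speed = speed      # configured by the user
        self.known_fast = []    # addresses of "fast" hosts (from flagged pings)
        self.incoming = []

    def handle_incoming(self, peer_addr):
        """Fast hosts accept; slow hosts turn the caller away but hand
        back a random fast host to try instead."""
        if self.speed == FAST:
            self.incoming.append(peer_addr)
            return {"status": "ok"}
        redirect = random.choice(self.known_fast) if self.known_fast else None
        return {"status": "go-away", "try": redirect}

slow = Host(SLOW)
slow.known_fast = ["10.0.0.5:6346"]
print(slow.handle_incoming("192.168.1.9:6346"))
# {'status': 'go-away', 'try': '10.0.0.5:6346'}
```

A slow host with an empty `known_fast` list would simply refuse, which is
the no-protocol-change behavior described above.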
2. GET RID OF THE PONG MESSAGE
Judging from the protocol, the Ping and Pong mechanism implemented in
gnutella serves two purposes:
- It is used to propagate totals of how many hosts/bytes/files are on the
net.
- It helps hosts locate other hosts.
Ping messages are efficiently distributed throughout the network, ensuring
that duplicates cannot occur (using the GUID). But one great disadvantage
is that the Pong messages are sent from each user on the net to every other
user, a cost that grows quadratically as more users log
on.  (For example, 2000 users means 2000x2000 pongs, or 4,000,000.
But 10,000 users now means 10,000 x 10,000 pongs, or 100,000,000 pongs.)
As the number of users grows, the network overhead increases considerably.
My idea is that PONGS are not necessary in the first place. This is my
idea for the replacement of the current PING/PONG mechanism:
- Get rid of the PONG entirely. Knowing if a host is able to reply on
the network is entirely irrelevant when doing searches.
- Put all of the information that's currently in the PONG, into the
PING. The new PING message would become: "Hi, I'm xxxx on
the network, and I am 1 host offering 330mb in 50 files". Such a
message would propagate throughout the net, and all the client recipients
would simply add that host to their tables and add the aggregated file
counts to their totals.
- Make Ping messages expire every 15 minutes (each host must re-ping every
15 minutes to tell everyone it's alive). Machines will automatically
cache the pings and expire them by themselves after 15 minutes (the 15
minute value would be part of the program, not coded into the packet).
While pinging several times an hour may seem like overkill, remember that
in one hour, 10,000 users * 4 pings = 40,000 pings, which is far better
than 100,000,000 pongs.
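A minimal sketch of such a client-side ping cache (names like `PingCache`
and its fields are my own, not from the proposal):

```python
import time

PING_TTL = 15 * 60  # expiry lives in the client, not in the packet

class PingCache:
    """Caches self-announcing pings ("I am 1 host offering 330mb in 50
    files") and drops hosts that have not re-pinged within PING_TTL."""
    def __init__(self):
        self.hosts = {}  # host id -> (files, mbytes, last_seen)

    def record(self, host, files, mbytes, now=None):
        self.hosts[host] = (files, mbytes, time.time() if now is None else now)

    def expire(self, now=None):
        now = time.time() if now is None else now
        self.hosts = {h: v for h, v in self.hosts.items()
                      if now - v[2] <= PING_TTL}

    def totals(self):
        """Aggregate host/file/MB counts, replacing the old pong tallies."""
        return (len(self.hosts),
                sum(v[0] for v in self.hosts.values()),
                sum(v[1] for v in self.hosts.values()))

cache = PingCache()
cache.record("host-a", 50, 330, now=0)
cache.record("host-b", 10, 40, now=0)
cache.record("host-a", 50, 330, now=10 * 60)  # host-a re-pings; host-b doesn't
cache.expire(now=16 * 60)
print(cache.totals())  # (1, 50, 330): only host-a survives
```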
Now, there is one purpose that removing the pong does not address by itself,
and that is routing. But I believe I have a routing method that would far
outweigh the current one in terms of efficiency.
3. INTELLIGENT ROUTING
The current routing situation seems to be rather haphazard: a client
overhears packets that it is routing, and uses them to glean additional
hosts.  But the hosts connected to are essentially random, and will usually
only reflect hosts that are nearby on the network.  The result is many
"locales" on a big network: really a collection of little networks
that can't communicate with each other.
The most important goal in creating an efficient routing table is to eliminate
wherever possible any situation where there is an unreasonable number of hops to
get from one single point to another. It is not simply enough to have
steady connections to the same "neighbors" all the time - the
connections are much more effective when they are spread diversely and maximize
the number of nearby sites with the least connections.
Think of this: if your host has five outgoing connections, and three of
them connect to hosts that are "near" each other on the network, then
those three connections are wasted bandwidth.  They'd be much more
effectively used if they connected to hosts that are "far" away, in
terms of the number of hops necessary to reach them.  This is why I
propose the following:
1. Keep a small, manageable number of connections. By
this, I mean approximately enough to fit in a handful - four to six. More
may be permissible for faster reliable connections. (I don't count
low-bandwidth "slow" connections in this.)
2. Attempt to make connections to sites you know are far away.
If you have a goal of getting as many people as you can within 7 hops, then
connect to people that are 14 hops away. (When you do this, the site that
was 13 hops away is now 2 hops away; the site that was 12 is now 3, and so
on.)
3. Evaluate which connections are less effective (redundant) and set them
as the first ones to be eliminated. If you have connection A and B,
and you send packets out A and receive them back on B two hops and 200ms later,
then chances are there is over-redundancy here which could be eliminated and
used to strengthen a different part of the network.
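The hop arithmetic in step 2 can be checked with a tiny distance
computation (a sketch only; a real client observes hop counts and never
holds the full topology):

```python
# We are host 0 on a chain of hosts; host i starts i hops away.
before = {i: i for i in range(15)}

# Connect directly to the host 14 hops away.  Every host near the far end
# can now be reached through the new link: the new distance is the shorter
# of the old route and one hop to the new neighbor plus its distance onward.
after = {i: min(i, 1 + (14 - i)) for i in range(15)}

print(before[13], "->", after[13])  # 13 -> 2
print(before[12], "->", after[12])  # 12 -> 3
print(max(after.values()))          # no host is more than 7 hops away now
```

One new link thus halves the network's diameter from this host's point of
view, which is exactly the "everyone within 7 hops" goal.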
In order for a site to know which connections are far away, there should be a
new form of routed packet that says "Connect To Me!". This type
of packet would be directed at a pinger by any site on the network which
receives the ping packet with certain hop values, say 8 and 14. Each
pinger would receive a deluge of "Connect To Me" packets from a whole
load of sites that are 8 or 14 hops away.  The new host could simply pick
and choose at random, and suddenly gaps both small and large have been closed in
the network, since now all those sites are one hop away instead of many.
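The trigger-and-pick behavior might look like this (the packet layout and
function names are hypothetical; only the 8-and-14 hop values come from
the proposal):

```python
import random

TRIGGER_HOPS = {8, 14}  # hop counts that trigger a reply, per the proposal

def maybe_connect_to_me(ping_hops, my_addr):
    """A host that receives a routed ping at one of the trigger hop
    counts answers the pinger with a "Connect To Me!" packet."""
    if ping_hops in TRIGGER_HOPS:
        return {"type": "connect-to-me", "addr": my_addr}
    return None

def pick_new_neighbors(offers, want=2):
    """The pinger picks at random from the deluge of offers it receives."""
    return random.sample(offers, min(want, len(offers)))

offers = []
for hops in (3, 8, 11, 14):
    reply = maybe_connect_to_me(hops, "peer-%d" % hops)
    if reply is not None:
        offers.append(reply)

print([o["addr"] for o in offers])  # ['peer-8', 'peer-14']
```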
In order to figure out which connections are less effective, I propose the
addition of a new field in the ping packet, which I'll call a LUID (Locally
Unique Identifier). This can be a 32-bit quantity that is unique only
on the machine that issued it, and is used only by that machine in
evaluating redundant links.
The method would go something like this: Suppose you are a host with 6
connections, and you're sending a ping. You'll send the ping out on all
six connections. Eventually, due to routing, some of them are going to
come back to you (which of course you'll ignore).
If you attach a different LUID to each copy of the ping you send out (note
the GUID is the same because it's the same ping), your host will be able to
track the pings when they come back. If a ping comes back in less than two
or three hops, you know you have an over-redundant link and you know which link
that is by the LUID. (Of course, other systems would simply pass the LUID
from host to host without changing it or evaluating it - it is the sole property
of the machine that issued the ping.)
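A sketch of the LUID bookkeeping on the issuing machine (class and field
names are my own; the one-GUID-many-LUIDs rule is from the text above):

```python
import itertools
import uuid

REDUNDANCY_HOPS = 3  # a ping returning within this many hops flags redundancy

class LuidTracker:
    """Tags each outgoing copy of one ping (same GUID) with a per-link
    LUID so the issuer can tell which link a returning copy left on."""
    def __init__(self):
        self._counter = itertools.count(1)
        self.luid_to_link = {}  # meaningful only on this machine

    def send_ping(self, links):
        guid = uuid.uuid4().hex  # one GUID: it is all the same ping
        copies = []
        for link in links:
            luid = next(self._counter)
            self.luid_to_link[luid] = link
            copies.append({"guid": guid, "luid": luid})
        return copies

    def on_ping_returned(self, luid, hops):
        """Return the over-redundant link, or None if the loop was long."""
        if hops <= REDUNDANCY_HOPS:
            return self.luid_to_link.get(luid)
        return None

tracker = LuidTracker()
copies = tracker.send_ping(["link-A", "link-B", "link-C"])
# The copy sent out link-A loops back to us only 2 hops later:
print(tracker.on_ping_returned(copies[0]["luid"], hops=2))  # link-A
print(tracker.on_ping_returned(copies[1]["luid"], hops=9))  # None
```

Other hosts would pass the LUID along untouched; only the issuer ever
looks it up.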
4. DATABASE CACHING AND STATISTIC AGGREGATION FOR SLOW USERS
Think back to proposal #1 where I suggested classing users as
"slow" or "fast". All slow users have the limitation
that they only have ONE connection to the entire cloud. But knowing that
fact allows you, as the up-linked "fast" host, to take advantage of
your monopoly on that client's connection and use it to save bandwidth and
increase speed.
Instead of requiring each of the "slow" people to participate in
the network as a whole, the "fast" person would become a proxy for the
"slow" one.  In particular, it would do the following things:
- Cache the ENTIRE DATABASE of offered files for all connected
"slow" links, and handle ALL search requests from the network on
behalf of the slow clients. If the search algorithm were written well
(using b-trees or the like), this would not be a bad performance hit for the
"fast" person. There would be no need to propagate search
commands to the "slow" people, and people doing searches would not
have to wait for the "slow" computer to reply. The fast
computer would use a standard low-overhead ping-pong method to make sure its
proxy clients are still there, and would instantly drop a client's database
from the cache upon discovering a client disconnected.
- Aggregate (add up) all the PINGs for all the slow clients. Instead
of each of the slow clients joining the network announcing, "I am 1
host with 40 files totaling 34 MB", let the "fast" host
create one ping packet on behalf of the whole group, for example, "I
represent 9 hosts with 220 files totaling 185MB."  Doing it this
way, ten people (the "fast" host and its nine "slow" clients) get
announced on the network for the price of one ping.  That results in a
tenfold bandwidth savings, and therefore ten times as many people can use
the same network in this situation.
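The aggregation step might be sketched like this (whether the proxy counts
itself in the aggregate is my assumption, chosen to make the host count
come out to ten):

```python
class SlowClient:
    def __init__(self, files, mbytes):
        self.files = files
        self.mbytes = mbytes

def aggregate_ping(own_files, own_mbytes, slow_clients):
    """One ping issued by the "fast" proxy on behalf of itself plus all
    of its connected "slow" clients, replacing one ping per client."""
    hosts = 1 + len(slow_clients)
    files = own_files + sum(c.files for c in slow_clients)
    mbytes = own_mbytes + sum(c.mbytes for c in slow_clients)
    return "I represent %d hosts with %d files totaling %dMB" % (
        hosts, files, mbytes)

slow = [SlowClient(40, 34) for _ in range(9)]  # nine "slow" clients
print(aggregate_ping(20, 15, slow))
# I represent 10 hosts with 380 files totaling 321MB
```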
Possible Issues And Solutions
Q1. If pings only occur once every 15 minutes, won't that mean that
it takes a minimum of 15 minutes for a newly connected host to see the whole
network?
A1.  Yes and no.  If a host were programmed to send its entire host
table to the new client, the client would be instantly refreshed with
current information.  Even if it weren't, the only thing affected would be
the file/host/MB count, which isn't really critical anyway.
Q2. Won't caching other people's files cause inconsistencies in the
backbone?
A2. No. The only systems that will cache files will be
"fast" hosts that are already directly connected to "slow"
hosts. In fact, all they would do is respond on behalf of the
"slow" host - systems on the other end would not know that a proxy
answered the request. When a "slow" host disappears from the
network, the "fast" host is directly connected and will quickly
know. At that time, it can immediately scrub its cache of any files
available on the disconnected system.
Q3. In this network, everyone is equal. If you make some nodes
"above" other nodes, then people could abuse this power to tyrannize
the network.
A3.  Not true at all.  I am not suggesting "promoting"
anyone above anyone else on the network.  Rather, I am simply suggesting
"demoting" those who are on slow connections, who are likely to bog
down the network.  "Slow" people would have all the same powers as
"fast" people; they would simply be spared the responsibilities of
routing and handling search requests.