Monday 4 December 2006

standing in line

BitTorrent is a good answer to a very basic problem in computer science: what is the most efficient way to deliver data from a server to some number of clients? The simplest solution is FIFO (first in, first out): have the N clients queue up, and the server sends the data to each in turn. Websites with little traffic or small files can often use this model without incident, but increase the number of clients or the size of the package sufficiently and things quickly get out of control.
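
A minimal sketch of that single line in Python (the client names here are made up for illustration):

```python
# The single-queue, single-server picture: clients line up and the
# server sends the file to each in turn (first in, first out).
from collections import deque

waiting = deque(f"client-{i}" for i in range(1, 6))   # five hypothetical clients
while waiting:
    current = waiting.popleft()      # the client at the head of the line
    print("serving", current)        # everyone behind `current` just waits
```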

In queuing theory, we assume that clients “arrive” at random after the original server makes the file available, with arrivals following a Poisson process, while service times are exponentially distributed.

The real math is hard, but the math of averages is pretty easy. If λ is the average arrival rate and μ is the service rate (note λ must be < μ, which we ensure by taking a large enough time slice that all the clients get served), then traffic intensity is ρ = λ/μ,
average line length is LQ = ρ/(1 - ρ) = λ/(μ - λ),
which makes the average wait time TQ = LQ/λ = 1/(μ - λ).
An example: suppose λ = 3 and μ = 4. Then ρ=3/4, LQ=3, and TQ=1. If we double μ to 8, ρ=3/8, LQ=3/5, and TQ=1/5. Add 4 again, μ = 12, ρ=1/4, LQ=1/3, and TQ=1/9. Add 4 again, μ = 16, ρ=3/16, LQ=3/13, and TQ=1/13.
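
For anyone who wants to check the arithmetic, here is a small Python sketch, using exact fractions and the formulas above, that reproduces those numbers (the function name is mine):

```python
# Reproduce the averages above with exact fractions:
# rho = lambda/mu, LQ = rho/(1 - rho), TQ = LQ/lambda.
from fractions import Fraction

def queue_averages(arrival_rate, service_rate):
    rho = Fraction(arrival_rate, service_rate)   # traffic intensity
    lq = rho / (1 - rho)                         # average line length
    tq = lq / arrival_rate                       # average wait time
    return rho, lq, tq

for mu in (4, 8, 12, 16):
    rho, lq, tq = queue_averages(3, mu)
    print(f"mu = {mu:2d}: rho = {rho}, LQ = {lq}, TQ = {tq}")
# mu =  4: rho = 3/4, LQ = 3, TQ = 1
# mu =  8: rho = 3/8, LQ = 3/5, TQ = 1/5
# mu = 12: rho = 1/4, LQ = 1/3, TQ = 1/9
# mu = 16: rho = 3/16, LQ = 3/13, TQ = 1/13
```
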
Okay, enough math; we all know what it’s like to be stuck in line. In the simple FIFO model, there are many clients and just one server, rather like the lineup at a store with a single checkout lane. Traffic varies widely, and major pileups can occur. There are obvious solutions to this problem: multiple queues, like at the supermarket, or a single line feeding multiple cashiers, like at a bank. BitTorrent takes this idea one step further. It takes advantage of the fact that, in the computer world, a digital package can be split up into lots of smaller pieces for convenient delivery to the client (the airlines are testing out this approach with luggage, but so far travelers don’t seem to appreciate the increased efficiency). The server, or “seed”, sends a different piece of the file to each of the clients, while a tracker lets the whole community of clients know who has which pieces. Now the pieces are spread out amongst all of the participants in the torrent (the “swarm”), and if a given client is missing a given piece, it can be found in several locations. Every participant in the swarm is thus a “client” only for the pieces it doesn’t yet have, and a “server” for those it does. Before long, some of the clients have complete copies and become seeds themselves.
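
Before getting to the numbers, here is a rough sketch of that client-and-server-at-once idea (the names and structure are mine, for illustration only, not BitTorrent’s actual wire protocol):

```python
# A toy model of a swarm participant: each peer tracks the piece indices
# it holds, serves those, and requests the rest.
NUM_PIECES = 32

class Peer:
    def __init__(self, name, pieces=()):
        self.name = name
        self.have = set(pieces)        # pieces this peer can serve to others

    def missing(self):
        return set(range(NUM_PIECES)) - self.have   # pieces it still needs

    def is_seed(self):
        return len(self.have) == NUM_PIECES

seed = Peer("seed", range(NUM_PIECES))
newcomer = Peer("newcomer", {0, 5})    # has pieces 0 and 5, needs the other 30
print(newcomer.is_seed(), len(newcomer.missing()), seed.is_seed())
# False 30 True
```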

Some toy examples show how well this system can work. Suppose that there are 32 clients, and that each client can transfer (or receive) the whole file in 32 minutes.

In a FIFO situation, it takes 32 x 32 = 1024 minutes to serve all of the clients; one of them gets the whole package in 32 minutes, while another gets nothing until the 993rd minute. A very unfair situation. Even with mirrors (multiple servers), whether arranged like supermarket queues or a bank line, we only cut the total time by a factor of the number of mirrors; the improvement depends on the number of mirrors, not the number of clients, because the clients don’t upload anything themselves.
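
The same arithmetic as a quick sketch, with the client count and transfer time from the toy example (the mirror handling is my own simplification):

```python
# Finish times under plain FIFO, assuming 32 clients, a 32-minute transfer
# per full copy, and (optionally) several mirrors working in parallel.
def fifo_finish_times(clients=32, minutes_per_copy=32, mirrors=1):
    # client i waits for the i // mirrors transfers ahead of it to finish
    return [minutes_per_copy * (i // mirrors + 1) for i in range(clients)]

one_server = fifo_finish_times()
print(one_server[0], one_server[-1])       # 32, 1024: the last client starts in minute 993
print(max(fifo_finish_times(mirrors=4)))   # 256: four mirrors cut the time by a factor of 4
```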

Suppose we improve matters by asking clients to become servers once they have the whole file. Then we have 1 new copy after the first 32 minutes, or 2 servers at the start of the second round; 2 more copies and 4 servers after 64 minutes; 4 more and 8 servers after 96; 8 more and 16 servers after 128; and all 32 copies are done in just 160 minutes. A vast improvement: the number of copies grows as 2^N with the number of rounds, so the total time is of order log N (in FIFO it is of order N). Notice, though, that the clients still spend most of their time waiting to get started: in the first 32 minutes only 2 computers are active, in the second just 4. It’s not until the last 32 minutes that the whole swarm is involved.
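
A sketch of that doubling, counting the seed’s own copy among the 32 just as the paragraph does:

```python
# Copies of the file in existence when every finished client turns around
# and serves, counting the original seed's copy among the 32.
copies, minutes = 1, 0
while copies < 32:
    copies *= 2        # every holder of the file produces one new copy per round
    minutes += 32      # each round is one full 32-minute transfer
    print(f"after {minutes} minutes: {copies} copies")
# after 32 minutes: 2 copies ... after 160 minutes: 32 copies
```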

We solve this problem by cutting up the original file into 32 chunks, each of which can be transferred in just one minute. In the first minute, the server uploads piece one to one client. In the second minute, the server uploads piece two to a second client while the first client passes piece one along to another client. In the third minute, a third piece comes out, 3 clients have the first piece, and 2 have the second. In the tenth minute, a tenth piece goes out, 10 have #1, 9 have #2, 8 have #3, 7 have #4, and so on. In the thirty-second minute, the thirty-second piece goes out, everyone has the first piece, one client is missing the second, two are missing the third, and so on. More importantly, everyone is now fully involved in the torrent. In the fortieth minute, everyone has the first through ninth pieces, one client is missing the tenth, and 23 are still missing the last piece. And within just 64 minutes, everyone has all the pieces.
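
And a minimal sketch of the piece schedule, under my reading of the description above (one new piece released per minute, and each released piece reaching one more client per minute; real BitTorrent piece selection is messier):

```python
# The idealized schedule: the seed releases one new piece per minute, and
# each piece already in the swarm reaches one more client per minute.
CLIENTS, PIECES = 32, 32

def holders(piece, minute):
    """How many clients hold `piece` (1-based) at the end of `minute`."""
    return 0 if minute < piece else min(CLIENTS, minute - piece + 1)

for minute in (10, 32, 40):
    counts = [holders(p, minute) for p in range(1, PIECES + 1)]
    print(minute, counts[:4], counts[-1])
# 10 [10, 9, 8, 7] 0
# 32 [32, 31, 30, 29] 1
# 40 [32, 32, 32, 32] 9

done = next(m for m in range(1, 2 * PIECES)
            if all(holders(p, m) == CLIENTS for p in range(1, PIECES + 1)))
print("everyone has every piece by minute", done)   # 63
```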

There’s a lot of simplification in this model. I’ve neglected the processor time needed to work out the transfer orders and the bandwidth used up negotiating new connections, and I’ve idealized the system so that all the clients show up at the same time, every machine has the same capabilities, and everyone plays nice. It is nevertheless a good indication of the sort of vast improvement BitTorrent offers over FIFO. What’s the order of the BitTorrent solution? It’s log N, just like the earlier version. But it’s a much more efficient version, because by dividing up the file, the whole swarm gets in on the torrent earlier; there’s simply less time wasted inactively waiting in line.
