Distributed computing (DC) is the process of using multiple computers to approach a complex and lengthy task. Internet-based DC normally involves many client computers connected to a server that will coordinate the clients. The server breaks up the main task into many smaller, more manageable parts for each client to compute. When a client finishes, it sends the results back to the server. The server will recombine all the results when the task is complete.
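The split, compute, and recombine cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task (a sum of squares), the function names, and the use of threads in place of networked clients are all hypothetical and are not the software used in this project.

```python
from concurrent.futures import ThreadPoolExecutor

def split_task(start, stop, part_size):
    """Server side: break the range [start, stop) into smaller parts."""
    return [(lo, min(lo + part_size, stop)) for lo in range(start, stop, part_size)]

def compute_part(part):
    """Client side: compute one part (a hypothetical sum of squares)."""
    lo, hi = part
    return sum(n * n for n in range(lo, hi))

def run_task(start, stop, part_size, num_clients):
    """Hand the parts to the clients, then recombine their results."""
    parts = split_task(start, stop, part_size)
    # Threads stand in for client computers here (an assumption for the sketch).
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        results = list(pool.map(compute_part, parts))
    # Server side: recombine the partial results into the final answer.
    return sum(results)

if __name__ == "__main__":
    print(run_task(0, 1000, part_size=100, num_clients=4))
```

The answer is the same no matter how many parts or clients are used; only the time needed to reach it changes.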
DC has many advantages over traditional linear computing, which does one thing at a time. The first and most obvious benefit is speed: since DC splits up a large task, many computers can work on it at the same time. DC can also be more reliable, because if one client crashes, another client can simply take over its part.
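The idea of another client taking over a crashed client's part can be sketched as a queue that puts unfinished parts back in line. The names `ClientCrashed`, `run_with_reassignment`, and `make_flaky` are hypothetical, and the clients are plain functions rather than real networked machines.

```python
from collections import deque

class ClientCrashed(Exception):
    """Raised by a simulated client that quits before finishing its part."""

def run_with_reassignment(parts, clients):
    """Hand out parts in turn; if a client crashes, requeue its part."""
    pending = deque(parts)
    active = deque(clients)
    results = {}
    while pending:
        if not active:
            raise RuntimeError("no clients left to finish the task")
        part = pending.popleft()
        client = active.popleft()
        try:
            results[part] = client(part)
            active.append(client)   # client is still healthy, keep using it
        except ClientCrashed:
            pending.append(part)    # another client will take over this part
    return results

def reliable(part):
    """A simulated client that always finishes."""
    return part * part

def make_flaky(crash_on):
    """Build a simulated client that crashes on one particular part."""
    def client(part):
        if part == crash_on:
            raise ClientCrashed(part)
        return part * part
    return client

if __name__ == "__main__":
    print(run_with_reassignment(range(6), [reliable, make_flaky(3)]))
```

In this sketch a crashed client is simply dropped from the rotation, so the task still completes as long as at least one client stays alive.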
A big problem in distributed computing is how to divide the parts among the clients. As I will show in my experiment, it isn’t efficient to simply divide the task equally among the clients, as some clients will be significantly faster than others. This brings us to load balancing, which allows faster clients to do more work than the slower ones.
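To make the difference concrete, here is a small simulation that compares equal division with a simple form of load balancing in which each client asks for a new part whenever it is free. The client speeds (in parts per second) and the number of parts are made-up values, and the one-part-at-a-time scheme is an assumption for illustration, not necessarily the scheme used in this project.

```python
import heapq

def equal_division_time(num_parts, speeds):
    """Each client gets the same number of parts; the slowest one sets the overall time."""
    parts_each = num_parts / len(speeds)
    return max(parts_each / speed for speed in speeds)

def load_balanced_time(num_parts, speeds):
    """Clients take one part at a time, so faster clients end up doing more parts."""
    # Heap of (time at which the client becomes free, client index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_parts):
        start, i = heapq.heappop(free_at)   # next client to become free
        done = start + 1.0 / speeds[i]      # time to compute one part
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

if __name__ == "__main__":
    speeds = [4.0, 3.0, 1.5]  # hypothetical parts per second for three clients
    print(f"equal division: {equal_division_time(120, speeds):.2f} s")
    print(f"load balanced:  {load_balanced_time(120, speeds):.2f} s")
```

The simulation ignores communication time, so it only shows the effect of uneven client speeds.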
Another obstacle in distributed computing is determining the size of the smaller parts of the task. If the parts are too big, performance suffers whenever a client quits in the middle of a part, because the whole part has to be recalculated. If the parts are too small, performance also suffers, because too much time is spent on communication between the clients and the server. This is commonly known as communication overhead.
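The trade-off can be illustrated with a toy cost model. Everything in it is an assumption chosen for illustration: the amount of work, the per-part overhead, the number of clients that quit, and the guess that a quitting client wastes half a part's compute time on average. None of these numbers were measured in this project.

```python
def estimated_time(part_size, total_work=10_000, num_clients=5,
                   overhead=0.05, unit_time=0.001, expected_quits=3):
    """Toy estimate, in seconds, of the total time for a given part size.

    All parameter values are hypothetical.
    """
    num_parts = total_work / part_size
    # Useful computation, shared among the clients.
    compute = total_work * unit_time / num_clients
    # Every part costs one round of client/server communication.
    communication = num_parts * overhead / num_clients
    # Assume each quit wastes, on average, half of one part's compute time.
    rework = expected_quits * (part_size * unit_time) / 2
    return compute + communication + rework

if __name__ == "__main__":
    for size in (10, 100, 1_000, 10_000):
        print(f"part size {size:>6}: {estimated_time(size):.2f} s")
```

In this model the estimate is high for very small parts (communication dominates) and for very large parts (lost work dominates), with the best part size somewhere in between.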
The purpose of my science fair project was to explore the relationship between overall performance and the number of clients in an internet-based distributed computing network. It also investigated the need for proper load balancing.
I hypothesize that the more computers participate, the greater the overall performance will be. The effect of load balancing will probably be most prominent if the computers used vary greatly in speed.
I used the following computers.
| Function | Processor | Memory |
|---|---|---|
| Server | Intel Pentium III 800 MHz | 256 MB |
| Client 1 | Intel Centrino 1.7 GHz | 512 MB |
| Client 2 | Intel Centrino 1.5 GHz | 512 MB |
| Client 3 | Intel Pentium IV 1.8 GHz | 512 MB |
| Client 4 | Intel Pentium IV M 1.8 GHz | 512 MB |
| Client 5 | Intel Pentium IV 2.8 GHz | 256 MB |

Figure 1. Developing the client and server software.

Figure 2. Running the client software.
Equally Divided (no load balancing)
| # Clients | Overall | Client 1 | Client 2 | Client 3 | Client 4 | Client 5 |
|---|---|---|---|---|---|---|
| 1 | 7.922051239 | 7.810612 | | | | |
| 2 | 4.933914042 | 3.8953 | 4.9053 | | | |
| 3 | 5.556477642 | 2.609612 | 3.369413 | 5.529838 | | |
| 4 | 4.180690098 | 1.957225 | 2.57615 | 4.153212 | 3.288763 | |
| 5 | 3.354543447 | 1.561088 | 2.09675 | 3.32515 | 2.654925 | 1.6343 |
Table 1. This table shows the average amount of time, in seconds, it took the distributed computing network to complete the assigned task without load balancing.
Load Balanced
| # Clients | Overall | Client 1 | Client 2 | Client 3 | Client 4 | Client 5 |
|---|---|---|---|---|---|---|
| 1 | 8.297824621 | 8.138538 | | | | |
| 2 | 4.940483189 | 4.79875 | 4.776275 | | | |
| 3 | 4.105860615 | 3.988375 | 4.010687 | 3.973312 | | |
| 4 | 3.407188749 | 3.290188 | 3.27655 | 3.266625 | 3.270188 | |
| 5 | 2.811429405 | 2.6934 | 2.706375 | 2.683838 | 2.706063 | 2.677725 |
Table 2. This table shows the average amount of time, in seconds, it took the distributed computing network to complete the assigned task with load balancing.
The above tables of data can be graphed like this:

Figure 3. Graph of distributed computing performance without load balancing.

Figure 4. Graph of distributed computing performance with load balancing.
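You can see that the overall time is always slightly higher than the time of the slowest client, because of the small amount of time the server requires to process the messages from the clients. In the graph without load balancing, you can see that the slowest client (client 3) held the whole network back: when it joined as the third client, the overall time rose from about 4.93 seconds to about 5.56 seconds, even though clients 1 and 2 each finished sooner than before.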
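The gap between the overall time and the slowest client can be checked directly from the numbers in Table 1. The short script below is only a sketch for redoing that arithmetic; the dictionary layout and the function names are my own choices.

```python
# Average times in seconds, copied from Table 1 (no load balancing).
# Each entry: number of clients -> (overall time, [client 1, client 2, ...]).
TABLE_1 = {
    1: (7.922051239, [7.810612]),
    2: (4.933914042, [3.8953, 4.9053]),
    3: (5.556477642, [2.609612, 3.369413, 5.529838]),
    4: (4.180690098, [1.957225, 2.57615, 4.153212, 3.288763]),
    5: (3.354543447, [1.561088, 2.09675, 3.32515, 2.654925, 1.6343]),
}

def slowest_client(client_times):
    """Return (client number, time) for the client that took longest."""
    index = max(range(len(client_times)), key=client_times.__getitem__)
    return index + 1, client_times[index]

def server_gap(overall, client_times):
    """Time the overall run took beyond the slowest client."""
    return overall - max(client_times)

if __name__ == "__main__":
    for n, (overall, times) in TABLE_1.items():
        client, slowest = slowest_client(times)
        print(f"{n} clients: slowest is client {client} ({slowest:.3f} s), "
              f"gap {server_gap(overall, times):.3f} s")
```

For every row the gap is positive, and from three clients onward the slowest client is client 3.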
My experiment shows that adding more clients to an efficiently designed distributed computing network increases performance.
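One common way to put a number on this is the speedup, the one-client time divided by the time with n clients. The sketch below computes it from the overall times in Tables 1 and 2, using each table's own one-client run as the baseline (a choice made here for illustration).

```python
# Overall times in seconds, copied from Table 1 and Table 2.
OVERALL_EQUAL = {1: 7.922051239, 2: 4.933914042, 3: 5.556477642,
                 4: 4.180690098, 5: 3.354543447}
OVERALL_BALANCED = {1: 8.297824621, 2: 4.940483189, 3: 4.105860615,
                    4: 3.407188749, 5: 2.811429405}

def speedups(overall_times):
    """Speedup for each client count: one-client time / n-client time."""
    baseline = overall_times[1]
    return {n: baseline / t for n, t in overall_times.items()}

if __name__ == "__main__":
    equal = speedups(OVERALL_EQUAL)
    balanced = speedups(OVERALL_BALANCED)
    for n in sorted(equal):
        print(f"{n} clients: equal division {equal[n]:.2f}x, "
              f"load balanced {balanced[n]:.2f}x")
```

With load balancing the speedup grows with every added client, while with equal division it drops when the third client joins.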
Without load balancing, the parts of a task would be equally divided amongst the clients. This means that the overall performance is bounded by the slowest client. For example, client 3 was extremely sluggish compared to the others, so it dragged the overall performance down.
However, when I added load balancing, all the clients worked for about the same amount of time, with the faster clients doing more parts than the slower ones. In my trials, every added client reduced the overall time, which suggests that a new client will benefit performance even if it has very little computing power.
My results show that distributed computing, when implemented correctly, is an effective system that can utilize massive reserves of computing power to approach complex problems. This confirms my initial hypothesis.
I was initially interested in this topic because I’ve been doing Internet programming for a while now, and I had some vague awareness of distributed computing. I wanted to further my understanding and find out more. Along the way, I’ve learned many useful technologies and programming techniques, including: using Excel for data manipulation, automating processes (such as running trials for my experiment), Flash design, and Perl server and socket programming. I think my project was very successful and educational.
There are numerous applications of internet-based distributed computing in the real world—you may already have heard of some of the projects going on. For example, the SETI@home project uses thousands of computers to search for extraterrestrial radio signals. Folding@home simulates the dynamics of protein folding, in an effort to better understand diseases like Alzheimer’s, caused by protein misfolding. The Great Internet Mersenne Prime Search is the DC project that found the largest prime currently known. Many, many more uses for distributed computing are yet to be explored.
I’d like to thank Mr. Lee, my advisor, for guiding me through all the necessary steps of completing a science fair project.