Blog Post

Background

 

Distributed computing (DC) is the process of using multiple computers to approach a complex and lengthy task. Internet-based DC normally involves many client computers connected to a server that will coordinate the clients. The server breaks up the main task into many smaller, more manageable parts for each client to compute. When a client finishes, it sends the results back to the server. The server will recombine all the results when the task is complete.
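The split, compute, and recombine cycle described above can be sketched in a few lines of Python. This is only an illustration, not the software used in the experiment: the function names (`split_range`, `compute_part`, `run_task`) are made up for this example, and the "clients" are simulated by a plain loop rather than separate machines.

```python
from typing import Callable, List, Tuple


def split_range(start: int, stop: int, parts: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into `parts` contiguous sub-ranges of near-equal size."""
    size, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` sub-ranges get one additional element.
        hi = lo + size + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def compute_part(f: Callable[[int], int], part: Tuple[int, int]) -> List[int]:
    """What one client would do: evaluate f over its own sub-range."""
    lo, hi = part
    return [f(x) for x in range(lo, hi)]


def run_task(f: Callable[[int], int], start: int, stop: int, parts: int) -> List[int]:
    """What the server would do: split the task, collect results, recombine in order."""
    pieces = [compute_part(f, p) for p in split_range(start, stop, parts)]
    return [y for piece in pieces for y in piece]


if __name__ == "__main__":
    print(split_range(0, 10, 3))
    print(run_task(lambda x: x * x, 0, 10, 3))
```

Because the sub-ranges are recombined in their original order, the final result is the same as if a single computer had evaluated the whole range.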

 

DC has many advantages over traditional linear computing, which does one thing at a time. The first and most obvious benefit is speed. Since DC splits up a large task, many computers can work on it at the same time. DC can also be more reliable: if one client crashes, another client can simply take over its part.

 

A big problem in distributed computing is how to divide the parts among the clients. As I will show in my experiment, it isn’t efficient to simply divide the task equally for each client, as some clients will be significantly faster than other ones. This brings us to load balancing, which allows faster clients to work more than the slower ones.
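To see why equal division can be wasteful, here is a small simulation sketch. The client speeds (in parts per second) are invented for illustration and are not measurements from the experiment, and the model ignores communication time entirely.

```python
import heapq
from typing import List


def equal_split_time(num_parts: int, speeds: List[float]) -> float:
    """Finish time when every client receives the same number of parts.

    `speeds` are hypothetical rates in parts per second.
    """
    per_client = num_parts / len(speeds)
    # Everyone waits for the slowest client to finish its share.
    return max(per_client / s for s in speeds)


def balanced_time(num_parts: int, speeds: List[float]) -> float:
    """Finish time when each client takes a new part as soon as it is free."""
    # Heap of (time at which the client becomes free, client index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_parts):
        t, i = heapq.heappop(free_at)   # next client to become free
        done = t + 1.0 / speeds[i]      # time to compute one part
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    speeds = [4.0, 2.0, 1.0]  # assumed: one fast, one medium, one slow client
    print(equal_split_time(84, speeds))
    print(balanced_time(84, speeds))
```

With these assumed speeds, the equal split takes as long as the slowest client needs for its 28 parts, while the balanced version lets the fast client absorb most of the work.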

 

Another obstacle in distributed computing is determining the size of the smaller parts of the task. If the parts are too big, performance suffers when a client quits partway through a part, because that whole part must be recalculated. If the parts are too small, performance also suffers, because too much time is spent on communication between the client and the server. This cost is commonly known as communication overhead.
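The trade-off can be made concrete with a rough cost model. Everything here is an assumption made for illustration: the function name `estimated_time`, the fixed per-part communication cost, and the worst-case rule that each client that quits loses one whole part, which is then redone after the rest of the work.

```python
import math


def estimated_time(total_work: float, num_parts: int, num_clients: int,
                   overhead_per_part: float, quits: int = 0) -> float:
    """Rough estimate of the total time in seconds under a simplified model.

    total_work        -- seconds one client would need for the whole task
    overhead_per_part -- assumed communication cost per part, in seconds
    quits             -- number of clients assumed to quit mid-part
    """
    part_time = total_work / num_parts
    # Clients work in parallel, so the parts are processed in "rounds".
    rounds = math.ceil(num_parts / num_clients)
    compute = rounds * (part_time + overhead_per_part)
    # Worst case: every quit throws away one full part, which is recomputed.
    redo = quits * (part_time + overhead_per_part)
    return compute + redo


if __name__ == "__main__":
    for parts in (5, 50, 5000):
        print(parts, estimated_time(60.0, parts, 5, 0.05, quits=1))
```

Under these assumptions, 5 parts come out at roughly 24 seconds (one lost part is expensive), 5000 parts at roughly 62 seconds (overhead dominates), and 50 parts at roughly 14 seconds, which is the kind of middle ground the paragraph above describes.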

 

Question/Purpose

 

The purpose of my science fair project was to explore the relationship between overall performance and the number of clients in an Internet-based distributed computing network. It also investigated the need for proper load balancing.

 

Hypothesis

 

I hypothesize that the more computers participating, the greater overall performance will be. The effect of load balancing will probably be most prominent if the computers used vary greatly in speed.

 

Procedure

Materials

 

I used the following computers.

 

| Function | Processor | Memory |
|----------|-----------|--------|
| Server | Intel Pentium III 800 MHz | 256 MB |
| Client 1 | Intel Centrino 1.7 GHz | 512 MB |
| Client 2 | Intel Centrino 1.5 GHz | 512 MB |
| Client 3 | Intel Pentium IV 1.8 GHz | 512 MB |
| Client 4 | Intel Pentium IV M 1.8 GHz | 512 MB |
| Client 5 | Intel Pentium IV 2.8 GHz | 256 MB |

Method

  1. First, I developed client and server software for my experiment. The two programs work together to calculate and graph an equation. I created two versions of the programs: one that simply divides the parts equally between the clients, and one that incorporates load balancing.
    1. I designed the load balancing so that each client tells the server when it has finished its part. The server then gives it another part to calculate. The faster clients will naturally work faster, and thus, get more parts. In the end, the clients should all finish around the same time, although the faster clients will have completed more parts.

 

Figure 1. Developing the client and server software.

 

  2. After constructing the software, I began experimentation using the non-load-balancing version of my software. I started out using one client and one server, and had it complete the task. The server and client record the times they took to complete their parts. I did 5 trials, and then added another client. From there I repeated the process, continuing until I had tested with 5 clients.

 

Figure 2. Running the client software.

 

  3. I repeated step 2 with load balancing.
  4. I analyzed the data collected by the computers.
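The load-balancing design described in step 1 (a client reports that it has finished, and the server hands it another part) can be sketched as a shared work queue. This is a stand-in, not the original software: the real programs talked over sockets, whereas this example uses Python threads as pretend clients, and the names `run_balanced` and `client` are hypothetical.

```python
import queue
import threading
from collections import Counter


def run_balanced(parts, num_clients, work):
    """Hand out parts one at a time and return (results, parts done per client).

    A client only gets its next part after returning the previous result,
    so faster clients naturally end up completing more parts.
    """
    todo = queue.Queue()
    for index, part in enumerate(parts):
        todo.put((index, part))

    results = [None] * len(parts)
    parts_done = Counter()
    lock = threading.Lock()

    def client(name):
        while True:
            try:
                index, part = todo.get_nowait()  # ask the "server" for a part
            except queue.Empty:
                return                           # nothing left to do
            value = work(part)
            with lock:                           # report the result back
                results[index] = value
                parts_done[name] += 1

    threads = [threading.Thread(target=client, args=(f"client-{i + 1}",))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, parts_done


if __name__ == "__main__":
    results, parts_done = run_balanced(list(range(20)), 3, lambda x: x * x)
    print(results)
    print(dict(parts_done))
```

How many parts each pretend client completes varies from run to run, but the recombined results are always in the original order because each result is stored by its part index.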

 

Results

 

Equally Divided (no load balancing)

 

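| Number of clients | Overall (s) | Client 1 (s) | Client 2 (s) | Client 3 (s) | Client 4 (s) | Client 5 (s) |
|---|---|---|---|---|---|---|
| 1 | 7.922051239 | 7.810612 | | | | |
| 2 | 4.933914042 | 3.8953 | 4.9053 | | | |
| 3 | 5.556477642 | 2.609612 | 3.369413 | 5.529838 | | |
| 4 | 4.180690098 | 1.957225 | 2.57615 | 4.153212 | 3.288763 | |
| 5 | 3.354543447 | 1.561088 | 2.09675 | 3.32515 | 2.654925 | 1.6343 |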

Table 1. This table shows the average amount of time, in seconds, it took the distributed computing network to complete the assigned task without load balancing.

 

Load Balanced

 

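| Number of clients | Overall (s) | Client 1 (s) | Client 2 (s) | Client 3 (s) | Client 4 (s) | Client 5 (s) |
|---|---|---|---|---|---|---|
| 1 | 8.297824621 | 8.138538 | | | | |
| 2 | 4.940483189 | 4.79875 | 4.776275 | | | |
| 3 | 4.105860615 | 3.988375 | 4.010687 | 3.973312 | | |
| 4 | 3.407188749 | 3.290188 | 3.27655 | 3.266625 | 3.270188 | |
| 5 | 2.811429405 | 2.6934 | 2.706375 | 2.683838 | 2.706063 | 2.677725 |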

Table 2. This table shows the average amount of time, in seconds, it took the distributed computing network to complete the assigned task with load balancing.

 

The above tables of data can be graphed like this:

Figure 3. Graph of distributed computing performance without load balancing.

 

Figure 4. Graph of distributed computing performance with load balancing.

 

You can see that the overall time is always slightly higher than the time of the slowest client, because of the small amount of time the server requires to process the messages from the clients. In the graph without load balancing, you can see that the slowest client (client 3) determined the overall time: adding it as the third client actually raised the overall time from about 4.93 seconds to about 5.56 seconds.
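These observations can be checked directly against the numbers in Tables 1 and 2. The short script below is a sketch written for this post, with made-up helper names (`speedups`, `server_gaps`). It computes the speedup of each run relative to the one-client run, and the gap between the overall time and the slowest client's time.

```python
# Times in seconds copied from Tables 1 and 2; index 0 is the one-client run.
equal_overall = [7.922051239, 4.933914042, 5.556477642, 4.180690098, 3.354543447]
equal_slowest = [7.810612, 4.9053, 5.529838, 4.153212, 3.32515]
balanced_overall = [8.297824621, 4.940483189, 4.105860615, 3.407188749, 2.811429405]
balanced_slowest = [8.138538, 4.79875, 4.010687, 3.290188, 2.706375]


def speedups(overall):
    """Speedup of each run relative to the one-client run."""
    return [overall[0] / t for t in overall]


def server_gaps(overall, slowest):
    """Overall time minus the slowest client's time, for each run."""
    return [o - s for o, s in zip(overall, slowest)]


if __name__ == "__main__":
    runs = [
        ("Equally divided", equal_overall, equal_slowest),
        ("Load balanced", balanced_overall, balanced_slowest),
    ]
    for label, overall, slowest in runs:
        print(label)
        rows = zip(speedups(overall), server_gaps(overall, slowest))
        for n, (s, g) in enumerate(rows, start=1):
            print(f"  {n} client(s): speedup {s:.2f}, gap {g:.3f} s")
```

With five clients, the speedup works out to about 2.36 without load balancing and about 2.95 with it. The gap between the overall time and the slowest client is positive in every run, which matches the server processing time described above.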

 

Discussion and Conclusion

 

My experiment shows that adding more clients to an efficiently designed distributed computing network increases performance.

 

Without load balancing, the parts of a task would be equally divided amongst the clients. This means that the overall performance is bounded by the slowest client. For example, client 3 was extremely sluggish compared to the others, so it dragged the overall performance down.

 

However, when I added load balancing, all the clients worked for about the same amount of time, with the faster clients doing more parts than the slower ones. This suggests that adding a new client should benefit performance even if that client has very little computing power, as long as the communication overhead stays small. In my trials, every added client reduced the overall time.

 

My results show that distributed computing, when implemented correctly, is an effective system that can utilize massive reserves of computing power to approach complex problems. This confirms my initial hypothesis.

 

I was initially interested in this topic because I’ve been doing Internet programming for a while now, and I had some vague awareness of distributed computing. I wanted to further my understanding and find out more. Along the way, I’ve learned many useful technologies and programming techniques, including: using Excel for data manipulation, automating processes (such as running trials for my experiment), Flash design, and Perl server and socket programming. I think my project was very successful and educational.

 

There are numerous applications of internet-based distributed computing in the real world—you may already have heard of some of the projects going on. For example, the SETI@home project uses thousands of computers to search for extraterrestrial radio signals. Folding@home simulates the dynamics of protein folding, in an effort to better understand diseases like Alzheimer’s, caused by protein misfolding. The Great Internet Mersenne Prime Search is the DC project that found the largest prime currently known. Many, many more uses for distributed computing are yet to be explored.

 

Acknowledgements

 

I’d like to thank Mr. Lee, my advisor, for guiding me through all the necessary steps of completing a science fair project.

 
