
Scalability of the Gnutella Network and Business Opportunities of Peer-to-Peer Networking

Diploma thesis, 2002, 78 pages

Electrical Engineering

Excerpt

Table of Contents

List of Definitions

List of Equations

List of Figures

List of Tables

Acknowledgements

Abstract

Preface
Organization of the Document
Introduction

Chapter 1 Definitions and Models
1.1. Peer-to-peer - Definition and models
1.1.1. A technical definition
1.1.2. Delimitation to Client-Server based networks
1.1.3. Network models
1.1.4. PIE: Presence, Identity and Edge Resources
1.1.5. Mindset or Technology?
1.2. Scalability - Definition and Interpretation
1.2.1. Scalability of the number of users
1.2.2. Scalability of bandwidth
1.2.3. Scalability of the number of connections and traffic flows

Chapter 2 Existing Work on Scalability in peer-to-peer Networks
2.1. Why scalability matters
2.2. Small World Networks and peer-to-peer
2.3. Other work on peer-to-peer scalability
2.4. Gnutella Can not Scale. No, Really!

Chapter 3 Thesis & Expectations

Chapter 4 The Experiment
4.1. Definitions
4.1.1. Horizon
4.1.2. Reachable Servents
4.1.3. Stage
4.2. Technical feasibility
4.3. Experiment setup
4.4. What did we measure?
4.5. How did we measure? Changes to the Qtella client
4.5.1. Ping analysis
4.5.2. Loop analysis
4.6. Preparation of the measurement results

Chapter 5 Results of the experiment
5.1. Number of reachable hosts
5.2. Loop analysis in the Gnutella network
5.3. Interpretation of the results
5.3.1. Ping analysis
5.3.2. Loop analysis
5.3.3. Gnutella’s scalability
5.3.4. Possible sources of errors

Chapter 6 Where is the business?
6.1. Next generation Internet
6.2. What is the network worth?
6.2.1. Metcalfe’s Law
6.2.2. Gilder’s Law
6.2.3. The critical mass of a network
6.3. Problems of centralized Client-Server systems
6.4. Benefits of decentralization and peer-to-peer
6.5. Range of applications
6.6. Distributed Processing
6.6.1. Market Opportunities
6.6.2. Business Models
6.6.3. Benefits & Challenges
6.6.4. Risks & Threats
6.7. Distributed Storage Systems
6.7.1. Market Opportunities
6.7.2. Business Models
6.7.3. Benefits & Challenges
6.7.4. Risks & Threats
6.8. Collaboration Software / Groupware
6.8.1. Market Opportunities
6.8.2. Business Models
6.8.3. Benefits & Challenges
6.8.4. Risks & Threats
6.9. Instant Messaging
6.9.1. Market Opportunities
6.9.2. Business Models
6.9.3. Benefits & Challenges
6.9.4. Risks & Threats

Chapter 7 Summary and Outlook
Appendix A: The Gnutella Protocol
Appendix B: Experiment results

Bibliography

Glossary

List of Definitions

Definition 1 - Peer-to-peer

Definition 2 - “Pure” peer-to-peer

Definition 3 - Peer-to-peer according to Shirky

Definition 4 - Scalability according to http://www.whatis.com

Definition 5 - Scalability according to Stefan Butenweg (see [5])

Definition 6 - Metcalfe's Law

List of Equations

Equation 1 - Total number N of reachable nodes (see [8])

Equation 2 - Generation function (see [8])

Equation 3 - Bandwidth generated in return to a query (see [8])

Equation 4 - Probability for building a tree structure after h hops (lower bound)

Equation 5 - Probability for building a tree structure after h hops (upper bound)

Equation 6 - Maximum number of reachable servents* for 100 connections

Equation 7 - Total number N of reachable nodes for nconn connections of our servents*

Equation 8 - Estimation of the upper bound of the total number of reachable servents*

Equation 9 - Estimation of the lower bound of the total number of reachable servents*

Equation 10 - Maximizing Metcalfe’s law, The value V of the network has to be equal to the number

List of Figures

Figure 1 - Pure peer-to-peer network structure

Figure 2 - Hybrid peer-to-peer network structure

Figure 3 - Probability of a pure tree structure in the Gnutella* network versus the distance in hops (see [20])

Figure 4 - Simulated bound for the cumulative number of pings for a total of Nall=1000 servents* with nc=3 connections each. The blue lines show the 90% confidence interval. (see [20])

Figure 5 - Example of a log entry while conducting the Ping analysis

Figure 6 - In the bottom left, the counts for the currently running experiment are displayed; on the right, the loop analysis can be started

Figure 7 - A screenshot from one of our helper applications in action

Figure 8 - The theoretical number of reachable servents* (dashed) versus the measured number of reachable clients (solid). For clarity only two different numbers of maximum connections are shown.

Figure 9 - The total number of reachable servents* broken down by the number of connections of the client. The red line represents the linear trend for the lower numbers of connections. The black line shows the logarithmic trend of the values.

Figure 10 - Loop percentage in the connections for different numbers of connections open on average

Figure 11 - The figure shows the development of the loop distance in hops relative to our Qtella client

Figure 12 - Showing the upper and lower bound for the total number of reachable servents* for 5.00 connections

Figure 13 - Number of reachable servents* on a per-connection basis for the different average numbers of connections

Figure 14 - Metcalfe and Gilder: Plotting the laws of network dynamics (x-axis: year y-axis: bandwidth in Kbps, hosts in units) (see [23] page 39)

Figure 15 - Distributed Computation — Aggregate Market Opportunity ($ thousands) (see [23] page 71)

Figure 16 - Comparing processing metrics and costs of a supercomputer in 1989, a standard P4 desktop PC in 2001, and an up-to-date supercomputer with 100,000 desktop PCs installed, e.g., in a company (see [23] page 56)

Figure 17 - Annual Network Line Cost Compared with the Cost of 9.6GB Storage — One Day’s Transmission on T1 Line Rates ($) (see [23] page 84)

Figure 18 - Estimated bandwidth cost saving (see [23] page 84)

Figure 19 - Worldwide Content Delivery/Streaming Media Market Opportunity (see [23] page 101)

Figure 20 - Accelerated file-transfer (see [23] page 95)

Figure 21 - Market opportunities for decentralized Collaboration Software (see [23] page 115)

Figure 22 - Total number of reachable servents* for the different average numbers of connections of our Qtella client

Figure 23 - Graphical preparation of the results from Table 2

Figure 24 - Decrease in the number of reachable servents* per connection

Figure 25 - Breaking down the Ping experiment by the individual number of connections over all measurement results. The black line shows the logarithmic trend in the graph.

Figure 26 - The average path length in hops during our experiments remained almost constant for the different numbers of connections

Figure 27 - The figure shows the Gaussian distribution of the relative frequency of direct and indirect loops in our measurements with 82.33 open connections on average.

List of Tables

Table 1 - Peer-to-peer models (modified from [2])

Table 2 - Number of reachable servents* in our measurements related to theoretical values

Acknowledgements

The author would like to thank his supervisors, Martin Deubler and Ruediger Schollmeier, for spending their time discussing the document and the measurement results with him, as well as for reviewing the document before submission. In addition, he wants to thank the Institute of Communication Networks of the Munich Technical University and Siemens AG for making this document possible.

Special thanks go to Mark “Moak” Seuffert for the long and productive discussions on IRC #gnutelladev and by email. I hope he will never give up fighting for the “real” Gnutella* ideals ;-) on gnutellaforums.com.

Finally I want to thank Bernd Lojewski for correcting my English ;-) and Martina Korff for helping me not to lose my patience and enthusiasm for my thesis in the last six months.

Abstract

In this document we analyze the Gnutella* network’s scalability by measuring the number of reachable servents* in the network and the loop probability. We distinguish direct and indirect loops and show that the loop percentage grows with the number of connections per client. For this reason our measurements show a reduced number of reachable servents* from a single client on a per-connection basis.

We argue that loops lead to a reduction in the number of reachable hosts. This implies a reduction in the bandwidth occupied by the Gnutella* protocol. Therefore the network scales much better than one might expect.

In the second part we highlight business opportunities for decentralized networks and discuss benefits, challenges, market opportunities, risks, and threats for four different areas of peer-to-peer computing. We come to the conclusion that peer-to-peer technology offers great revenue streams in the future and that common platforms are needed to further push peer-to-peer into the markets.

Preface

Organization of the Document

This document is structured into two parts. The first part is about the scalability of the Gnutella* network; the second discusses business opportunities in the peer-to-peer networking market.

First we define the terms “peer-to-peer” and “scalability” for further usage in the document (compare Chapter 1). We also define special models of peer-to-peer networks, such as the “hybrid” and “pure” peer-to-peer models. In addition we introduce the PIE model, standing for Presence, Identity and Edge Resources. To sum up this chapter, we briefly discuss whether peer-to-peer is a technology or a mindset.

In Chapter 2, we introduce the reader to existing work on the scalability of peer-to-peer networks. We cover small-world networks as a model for peer-to-peer networks and prove Jordan Ritter’s approach to calculating bandwidth usage in a Gnutella* network to be wrong.

Chapter 3 presents our point of view on the scalability of pure peer-to-peer networks that we have been able to prove with our experiments.

The experiment itself is presented in Chapter 4. Here we go into great detail and describe what and how we measured, the technical feasibility and the preparation of our results.

In Chapter 5 we present and analyze our measurement results. We show that loop building in the Gnutella* network and the limited number of users prevent the creation of a binary tree structure in the network. This leads to a much lower number of reachable clients in the network, which implies less traffic overhead due to the Gnutella* protocol. This chapter concludes our first part.

Chapter 6 introduces the reader to business opportunities of decentralized networks, which are often based on peer-to-peer network models. First we discuss the impact of Metcalfe’s and Gilder’s laws on networks. After that we explore how to measure and maximize the value of a network. Rounding off these theoretical approaches to specifying a network’s value, we discuss flaws in centralized Client-Server design and present benefits of the peer-to-peer network model.

Next we present the four areas in which peer-to-peer technology provides applications: Distributed Processing, Distributed Storage Systems, Collaboration Software and Instant Messaging. For each of these applications we analyze benefits, challenges, market opportunities, risks and threats.

Finally, in Chapter 7 we summarize all our results and finish this document.

At this point we would like to mention that all terms marked with * can be found in the glossary at the end of the document.

Introduction

The next-generation Internet, or ”Internet 3.0” as Bear, Stearns & Co. Inc. call it in [23], is evolving. We do not talk about an IPv6-enabled Internet but about the development expected over the next years. It started in the 1960s with a small group of engineers and academics connecting a handful of computers through the so-called ARPANET*. The network developed and became what is today known as the Internet.

Today’s Internet is best described as the Web era: the network is based on a mainly centralized Client-Server infrastructure, and routers, e-business and browsers are the main drivers that pushed the Internet to what it is today.

However, technology, software and networking keep improving and offer more and more possibilities. Media streaming, broadband access at the edges of the network, faster and more powerful machines and decentralization are key terms today that offer a variety of improvements.

In this document we focus on a networking model that is based on the idea of decentralization to get rid of the flaws of centralized Client-Server design. It follows the basic ideas of how the Internet was once planned: peer-to-peer networking.

When hearing about peer-to-peer most people immediately think of Napster and file sharing, since this is the most widely known application of peer-to-peer networks at present. It gained questionable fame through the lawsuits the RIAA* filed against most leading file-sharing application vendors, among them Napster.

But business has also discovered peer-to-peer: with an estimated market of more than $25 billion in 2004 (see [23]) it offers interesting possibilities to many companies. Numerous startups have been founded in the peer-to-peer area in recent years. More than $560 million (see [3] page 15) have been poured into startup companies by banks and software or hardware producers such as Microsoft, Intel, Sun Microsystems, etc. Many of these big players have started “initiatives” like .NET*, SunONE* or the P2P Working Group. In the second part of this report we have a look at some potential markets for peer-to-peer technology and analyze their backgrounds (see Chapter 6).

Before this, we focus on a special issue of peer-to-peer networks: scalability. It is still an unsolved problem and a hot topic of discussion in newsgroups, bulletin boards and user communities.

Peer-to-peer based networking as a technology has evolved in recent years and products are able to offer many solutions to overcome the flaws of Client-Server based systems. Peer-to-peer models have not yet been fully analyzed academically and many aspects are still under development.

From the protocol level to the different possibilities of network structuring and scalability, developers constantly work on new improvements to the network. In the following sections we discuss peer-to-peer networks and their scalability and present our measurement results.

Chapter 1 Definitions and Models

1.1. Peer-to-peer - Definition and models

Peer-to-peer - also written as P2P, P-to-P, peer-2-peer or, capitalized, Peer-to-Peer - is often taken as a synonym for any kind of network that deals with peering technology.

1.1.1. A technical definition

Ruediger Schollmeier, Munich Technical University, Germany, faced the problem of many different interpretations of the meaning of peer-to-peer networking in the media when he started his PhD. In his first paper he proposed a highly technical definition of peer-to-peer (see [7]) and managed to clean up and summarize the different approaches.

The following paragraph defines the expression “peer-to-peer”:

“A distributed network architecture may be called Peer-to-Peer (P-to-P, P2P, …) network, if the participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers, …). These shared resources are necessary to provide the Service [1] and content offered by the network (e.g. file sharing, storage capacity, network link capacity, printers, …). They are accessible by other peers directly, without passing intermediary entities. The participants of such a network are thus resource (Service and content) providers as well as resource (Service and content) requestors (Servent-concept).” (see [1] page 3)

Definition 1 - Peer-to-peer

From Definition 1 we learn about the most important properties of peer-to-peer networks:

- Each peer in the network shares parts of its resources (CPU time, storage capacity, etc.) with other participating peers in the network.
- Every peer offers a defined collection of services to the network. The type of service provided may vary from peer to peer, but it is accessible with equal rights by others.
- Peers are resource providers and requestors, which play an equal role in the network topology.

The peers’ behavior as content providers and requestors has led to the expression “Servent”* to describe them. The name for peers participating in a peer-to-peer environment is derived from the word server, to account for their server-like behavior of providing services (content) to the network, and from the word client, accounting for their client-typical role in the network as service (content) consumers.

In many other papers authors often refer to “Services” as “Content”, which is a narrower description than the one given by Ruediger Schollmeier but - at least at the moment - reflects better what services are in today’s peer-to-peer networks: mainly files and database content that users share and make accessible to other participants in the network.

Although Ruediger Schollmeier’s definition manages to account for the most important properties of a peer-to-peer network, it has one shortcoming: it seems that servers as we know them from today’s Client-Server based networks cannot participate in peer-to-peer networks. This is because they will more often be content providers rather than content consumers in a peer-to-peer environment and therefore do not fall under the definition of peer-to-peer. From our point of view this behavior does not contradict the peer-to-peer paradigm as long as those servers act like any other servent* in the network, submitting themselves to the rules of the peer-to-peer network protocols.

1.1.2. Delimitation to Client-Server based networks

The major difference to traditional Client-Server based architectures is the equality among servents* in peer-to-peer environments. While servents* provide services to one another - and in order to do so offer hardware resources (CPU time, storage capacities, etc.) - the whole service provision in Client-Server environments is server based and the client is nothing more than a service requestor.

1.1.3. Network models

It is clear that in many cases system administrators or IT departments are not willing to hand complete network control to the network itself, even though peer-to-peer network design would allow them to do so. Different approaches and scenarios lead to two different models of peer-to-peer networks: the so-called “Pure” and “Hybrid” peer-to-peer networks.

Pure peer-to-peer networks are characterized by the equality of all participating servents. No central point of intelligence is needed to coordinate, authenticate, etc. peers joining the network. Ruediger Schollmeier defines pure peer-to-peer architecture as follows:

“A distributed network architecture has to be classified as a “ Pure ” Peer-to-Peer network, if it is firstly a Peer-to-Peer network according to Definition 1 [compare Definition 1 - Peer-to-peer] and secondly if any, arbitrarily chosen Terminal Entity[2] can be removed from the network without having the network suffering any loss of network service.” (see [1] page 4)

Definition 2 - “Pure” peer-to-peer

We conclude that a pure peer-to-peer network contains at least two Terminal Entities and does not contain any servers offering specialized services to the network. Again, from our point of view this definition is not completely satisfying, since servers prevent a system from being of a pure peer-to-peer architecture even if they behave like servents. In the article by Sweeney et al., pure peer-to-peer networks are also referred to as “Atomistic”, in which “[c]lients make contact using known network addresses or via broadcast announcements. This model needs no server capability, but assumes that bandwidth is readily available.” (see [2] page 1)

Abbildung in dieser Leseprobe nicht enthalten

Figure 1 - Pure peer-to-peer network structure

The article brings up further considerations regarding peer-to-peer networks. In real life pure peer-to-peer design often is not an option, since ongoing development has not yet solved some important networking issues or other restrictions do not permit the use of pure peer-to-peer technology. One example is enterprise file sharing systems, where version control is of high relevance; developers often choose to add a server to the network that is responsible for version control of the documents shared on the peer-to-peer network. Another example is access control: without a central service it is currently hard to implement authentication mechanisms in peer-to-peer networks. Peer-to-peer networks show that security systems with central control will have to be re-evaluated against user responsibility in the future - an inevitable evolution for computing according to [2].

Sweeney et al.’s article provides some ideas on how network models may look in order to satisfy customers’ demands and current technical possibilities. Summarizing the article, they take a detailed look at the second form of peer-to-peer networks, the so-called “Hybrid” peer-to-peer networks. The difference to pure peer-to-peer systems is simply the existence of some central authority in the network to manage critical demands, like indexing, directories, security, work flow, policy, etc. (see Figure 2).

Abbildung in dieser Leseprobe nicht enthalten

Figure 2 - Hybrid peer-to-peer network structure

Indices and directories are both used to collect service information in peer-to-peer networks, but there is a subtle difference: while indices explicitly point to data in the network, directories point to users. Neither has to reside on a single machine. Depending on the network architecture, many or only one index or directory may exist, as illustrated in Figure 2. It is important to recognize that using an index one cannot find users based on identity, only based on the data they share. In contrast, a directory can only be used to locate users, not data.
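As a purely illustrative sketch (our own, not taken from [2]), the distinction can be expressed as two small lookup structures - an index keyed by shared data, a directory keyed by user identity. All names and addresses below are hypothetical:

```python
# Hypothetical illustration of the index/directory distinction (not from [2]).

# Index: shared data -> servents offering it; lookups by content, not by user.
index = {
    "song.mp3": ["peer-17", "peer-42"],
    "report.pdf": ["peer-08"],
}

# Directory: user identity -> current network address; lookups by user, not by data.
directory = {
    "alice": "203.0.113.17:6346",
    "bob": "198.51.100.42:6346",
}

print(index["song.mp3"])   # who shares this file (the owners' identities stay unknown)
print(directory["alice"])  # where to reach this user (what she shares stays unknown)
```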

It is also important to note that there are different policies for accessing a directory or index. With informal access, access to indices, directories and resources is controlled by individual decisions or community norms; with formal access it is strictly controlled by additional (authentication) services. As an example of informal access, ICQ* allows users to choose which of their personal data is shared with the rest of the community. The latter is often found in business peer-to-peer solutions like Groove* and NextPage[3].

Abbildung in dieser Leseprobe nicht enthalten

Table 1 - Peer-to-peer models (modified from [2])

Table 1 illustrates the models discussed in this chapter. The distinction between user-, data- and compute-centered peer-to-peer models illustrates the different usage of indices and directories.

The “Web Mk 2” model in Table 1 is a convergence of the hybrid peer-to-peer models and existing web architectures. Groove* Networks moves strongly in this direction by offering a hybrid peer-to-peer infrastructure to customers while providing the possibility to add features to their client software via plug-ins and to add servers with special services to the network.

1.1.4. PIE: Presence, Identity and Edge Resources

After a very technical inspection, this chapter presents the basic ideas behind peer-to-peer. In [3], Section One, the authors introduce PIE as the three fundamental ideas behind peer-to-peer. They stick to the very general definition of peer-to-peer provided by Clay Shirky (see [4]):

“P2P is a class of applications that takes advantage of resources—storage, cycles, content, human presence—available at the edges of the Internet.”

Definition 3 - Peer-to-peer according to Shirky

Although Definition 1 provides a more solid definition, Shirky points out the importance of what Dale Dougherty et al. call PIE. From their point of view peer-to-peer is a mindset that approaches three main issues. Those issues have been on network designers’ minds for years in order to improve shortcomings of the “traditional” Client-Server architectures as we know them from the past.

The Internet as it exists today has several properties and flaws that have occupied developers’ minds. One issue is that machine-centric network design - as the whole Internet with its IP based addressing scheme is - does not meet users’ demands. The large number and heavy usage of search engines on the web and the popularity of file-sharing and Instant Messaging applications show that people want to uniquely identify persons or content in the net. In addition, people’s mobility and the possibility to reach a person at any place and on any communication device are issues that developers have been trying to resolve in the past. Results are protocols like SIP or Mobile IP, and even though developers succeeded, the crux is always the machine-centric addressing scheme in the net that makes it so hard to achieve this goal.

A second issue is fault tolerance and the implied costs. To be on the safe side, redundancy in the form of backup servers is often added to the network to prevent a total failure of a system in case a server breaks down or is compromised by an attack from outside. A lot of money has to be paid for these machines, while administration in particular exceeds these costs by far. Peer-to-peer networks are less reliant on central servers and therefore much more robust against failures and attacks. In addition, the administration overhead is likely to be less than in the classical Client-Server environment. In this sense peer-to-peer technology is a step towards higher reliability of systems and reduced administrative costs.

Finally, broadband access is spreading among home users, while the majority of the content available on the Internet is located in the core of the network on server farms. While the content is delivered on high-speed links from servers close to the core to the edges of the network, ISPs* spend tremendous amounts of money to transfer the requested data while available bandwidth in their networks sits idle. Also taking into account that most home PCs nowadays have storage capacities and processing power comparable to servers of five years ago, one may ask why resources at the edges of the network are so scarcely used.

Summarizing these points, Dale Dougherty et al. came up with PIE as the fundamental mindset on which peer-to-peer is based. Presence is the demand for knowledge about the availability of content or individual users, Identity stands for the need to uniquely identify content or persons in the net, and Edge Resources describes the fact that resources at the edges of the Internet are scarcely used.

1.1.5. Mindset or Technology?

In [3] peer-to-peer is referred to as a mindset, and the authors stick to this position in their book even though they examine peer-to-peer infrastructure in Section 6. Is there really nothing like a peer-to-peer technology or infrastructure? Is it all a mindset? Of course the fundamentals of PIE represent the mindset peer-to-peer is based on, and it is truly a rather old idea. To manage PIE in today’s networks, technology platforms and infrastructure like Sun’s JXTA* or Microsoft’s .NET* are needed. The authors themselves show this by adding Section 6 to their book. At the moment companies are trying to build peer-to-peer infrastructures, though cooperation among them and progress towards a standard are hardly enforced. It will be necessary to build a common and compatible infrastructure in order to successfully push peer-to-peer technology in the long run. From this we can say that peer-to-peer is not only a mindset but also an evolving technology whose infrastructure is not standardized yet and is currently being developed.

1.2. Scalability - Definition and Interpretation

Before we discuss the experiments on Gnutella’s scalability it is important to define what scalability in a network means. This is difficult since many effects influence network performance - from bandwidth usage and the number of network nodes to the number of connections and data streams. Often only bandwidth usage is considered when people deal with scalability.

After some research we found some definitions, which are summarized best by http://www.whatis.com:

“In information technology, scalability (frequently spelled scaleability) seems to have two usages:

1) It is the ability of a computer application or product (hardware or software) to continue to function well as it (or its context) is changed in size or volume in order to meet a user need. Typically, the rescaling is to a larger size or volume. The rescaling can be of the product itself (for example, a line of computer systems of different sizes in terms of storage, RAM, and so forth) or in the scalable object’s movement to a new context (for example, a new operating system). […]

2) It is the ability not only to function well in the rescaled situation, but to actually take full advantage of it. For example, an application program would be scalable if it could be moved from a smaller to a larger operating system and take full advantage of the larger operating system in terms of performance (user response time and so forth) and the larger number of users that could be handled. […]”

Definition 4 - Scalability according to http://search390.techtarget.com/sDefinition/0,,sid10_gci212940,00.html, 04/12/2002

This rather solid definition is what first comes to people’s minds when talking about scalability, even though it is not very useful when applied to a peer-to-peer network.

Stefan Butenweg (see [5]) has defined scalability in the following way and states that it has to be considered in three different categories when applied to a network[4]:

“The property of a network to manage a number of users growing by an order of magnitude, as well as the growing number of connection requests and traffic flows, and to provide sufficient bandwidth for data traffic growing by an order of magnitude, is called scalability.

In this context the question of how to extend a communication network in the easiest way is often discussed. This question targets the complexity of the work needed to enhance network bandwidth. This property is called extensibility.

The question of the scalability of a network can be separated into three areas:

- Scalability of the number of users
- Scalability of the number of connections and traffic flows
- Scalability of bandwidth”

Definition 5 - Scalability according to Stefan Butenweg (see [5])

Since we follow Definition 5 in this work, we focus on these three areas.

1.2.1. Scalability of the number of users

This factor accounts for the number of users participating in the network. It means that the network has to provide access to users willing to join it. According to the Gnutella* protocol specification (see [6]) the network scales in this respect, since it is always possible to join the network. This can be done by contacting a servent* after receiving IP addresses of active Gnutella* network participants from one of the available host caches or by using locally stored IP addresses from a previous session. Although the number of connections a servent* can handle is limited, it is only a matter of time until a servent* is found that can establish a connection to the new user. Even if many servents* reject connection setups from others joining the network because they have reached their maximum number of connections, contact can always be established.
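As an illustration of this joining step, the following minimal sketch (ours, not part of the measurement software) tries cached servent addresses one after another using the connection handshake of the Gnutella* protocol v0.4 (see [6] and Appendix A). The host addresses are placeholders; 6346 is the customary Gnutella port:

```python
import socket

def try_connect(host, port=6346, timeout=5.0):
    """Attempt the Gnutella v0.4 connection handshake with one servent.

    Returns True if the servent accepts the connection, False if it is
    full or unreachable - in which case the next cached address is tried.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"GNUTELLA CONNECT/0.4\n\n")
            reply = sock.recv(128)
            return reply.startswith(b"GNUTELLA OK")
    except OSError:
        return False

# Addresses would normally come from a host cache or a previous session;
# these are placeholders, not real servents.
candidates = [("203.0.113.17", 6346), ("198.51.100.42", 6346)]
connected = next((addr for addr in candidates if try_connect(*addr)), None)
print("connected to:", connected)
```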

From this point of view the Gnutella* network scales. This issue of scalability is not of interest for further investigation in this paper.

1.2.2. Scalability of bandwidth

To raise the bandwidth of a network there are different approaches: One is to raise the available bandwidth of each network node. In this case the network size is not changing but the network nodes are scaling. Another approach is to add new nodes to the network. Here the network size grows while the bandwidth available in a network node is constant. The protocol properties determine if available bandwidth is sufficient to manage the traffic necessary to keep up network functionality.

The scalability of the network nodes is up to the users joining the Gnutella* network and their link capacities, ranging from high-speed LAN* access to dial-up connections. Therefore node scalability relies entirely on the network users and can only be affected by the Gnutella* protocol indirectly, by lowering the organization traffic necessary for network functionality.

Adding new nodes to the Gnutella* network should - at first glance - increase the organization traffic in the network, and the necessary bandwidth would rise. In the experiment we tried to determine the network structure in more detail to find out how this structure affects the Gnutella* protocol traffic in the network.

1.2.3. Scalability of the number of connections and traffic flows

These issues account for the number of possible connections of a network node and the signaling load it can handle. Both are closely related to the available bandwidth of the node. Most desktop PCs have enough processing power, storage capacity, etc. to handle incoming requests; the problem is rather the available bandwidth. According to [7] and [8], 91% of the traffic consists of queries. A dial-up connection at 56 kbit/s can therefore handle a maximum of 84 queries[5] per second. So the number of connections must be limited to keep the message flows manageable. In this example no other network traffic such as file transfers, web browsing, etc. is considered. This shows that the scaling of connection numbers and traffic flows is mainly a bandwidth issue. We try to analyze how the network structure influences the request rates and what the results imply for the network’s behavior.
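For reference, the 84 queries per second follow directly from the 83-byte query size given in the footnote, assuming (our back-of-the-envelope reading) that the full dial-up link were used for queries:

\[
\frac{56\,000\ \tfrac{\text{bit}}{\text{s}}}{8\ \tfrac{\text{bit}}{\text{byte}}}
  = 7\,000\ \tfrac{\text{byte}}{\text{s}},
\qquad
\frac{7\,000\ \tfrac{\text{byte}}{\text{s}}}{83\ \tfrac{\text{byte}}{\text{query}}}
  \approx 84\ \tfrac{\text{queries}}{\text{s}}
\]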

Chapter 2 Existing Work on Scalability in peer-to-peer Networks

First of all we point out that not much research on scalability has been done yet. Most of the papers are analyses of the network, not entirely focused on scalability. A large part of them give insight into the network size - which peaks at about 50,000 users -, the message rates of the protocol, average node connectivity, etc. Another point of investigation is describing the Gnutella* network as a so-called small-world network. In this area most investigations rely on simulations.

2.1. Why scalability matters

Scalability is a very important property of a network since it shows that the network can keep operating when the number of users, the number of connections and traffic flows, and of course the bandwidth increase. Especially from a business point of view it is vital, since customers can be sure that they do not run into problems when extending a scalable system - or, in the networking case, that the system won’t eat up all the available bandwidth. For the provider, scalability of a system means that he won’t run into unforeseen problems when extending customers’ systems. In addition, from a provider’s point of view it is important that the system can be extended to a large scale, since providers intend to sell as many systems, and as large in extent, as possible.

An even more important point is what economists call the “network effect”: the more participants a network has, the more content is - ideally - available. If this content is interesting to a broad mass of users, the network becomes more valuable to each user. In this sense it is vital for a product to be scalable in order to reach the network effect at some point, offering great revenues.

For these reasons it is very important to prove scalability of peer-to-peer network models before companies start bringing products to the market.

2.2. Small World Networks and peer-to-peer

Watts and Strogatz (see [16]) have shown how graph theory can be used to map networks to so-called small-world networks[6]. The small-world model consists of a network of nodes. Its topology is a regular lattice, with the addition of a low density p[7] of connections between random pairs of nodes (see [15]). Small-world networks are characterized by their characteristic path length and their clustering coefficient. The phenomenon of a small-world network graph is its clustering into smaller groups with long-range connections among the different clusters, making the average path length to content in the network relatively short (see [17], chapter 14). The clustering coefficient gives a measure of the clustering in the network, ranging from 0 for a random graph to 1 for a fully connected regular graph. The larger the clustering coefficient is, the more likely the clusters are to break apart if the nodes that connect the different clusters in the network fail. Surprisingly, peer-to-peer networks, as well as social networks, can be simulated using small-world network models.

In [17], chapter 14, Theodore Hong calculates the clustering coefficients for a Gnutella* network, based on a random graph with an average number of connections per node k set to 3 and a total number of nodes n of 1000. His work shows that in this model Gnutella* is relatively robust against node failures and node attacks. It starts to lose connectivity among the clusters only when more than 40% of the nodes fail or are compromised. He also states that, according to the model, the average path length in the Gnutella* network scales logarithmically with the size of the network, as does the request path length. However, he mentions that his results do not reflect the actual bandwidth usage in the network. He estimates that bandwidth usage scales linearly with the network size, but notes that this is a lower bound.
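For readers who want to reproduce this kind of figure, the following minimal sketch computes clustering coefficient, average path length and robustness against node failure on a small-world graph using the third-party networkx library. The Watts-Strogatz parameters (1000 nodes, 4 initial neighbours, 10% rewiring) are illustrative stand-ins, not the exact model used in [17]:

```python
import random
import networkx as nx

# Illustrative small-world overlay: 1000 nodes, each initially linked to 4
# neighbours, 10% of the edges rewired at random (parameters are stand-ins).
G = nx.connected_watts_strogatz_graph(n=1000, k=4, p=0.1, seed=42)

print("clustering coefficient:", round(nx.average_clustering(G), 3))
print("average path length:", round(nx.average_shortest_path_length(G), 3))

# Robustness check: remove a growing fraction of the nodes at random and see
# how many nodes the biggest remaining cluster still contains.
for fraction in (0.2, 0.4, 0.6):
    H = G.copy()
    victims = random.sample(list(H.nodes()), int(fraction * H.number_of_nodes()))
    H.remove_nodes_from(victims)
    largest = max(nx.connected_components(H), key=len)
    print(f"{int(fraction * 100)}% node failures -> largest cluster: {len(largest)} nodes")
```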

Jovanovic, Annexstein and Berman were able to show that the actual clustering coefficient is higher than the ones calculated with a random graph model as in [17]. According to their measurements the coefficient is between 0.5 and 0.7, while the calculations result in coefficients from approximately 0.25 to 0.5. Additionally, the measured average path length is shorter than the calculated one (see [10]). Newman and Watts introduced a scaling function to calculate the dimensions and surface areas of small-world network graphs and their percolation (see [15]). Simulations of the model have confirmed their results.

The question of how well the models used and their parameters actually reflect the real network structure of peer-to-peer networks - more precisely, the network structure of a pure peer-to-peer network like Gnutella - is still a subject of discussion.

2.3. Other work on peer-to-peer scalability

Though the small-world model is likely to be a good model for pure peer-to-peer networks, none of the mentioned papers calculates the resulting bandwidth usage. In his draft Scaling Issues on Internet Networks (see [11]), Bill Arnaud uses the “N squared” phenomenon[8] - which is related to the small-world network model - to estimate the impact of a growing network size on the network topology, network traffic, etc. Arnaud’s paper is not based on simulations and also lacks calculated or measured numbers. It relies on examples and statistics about the development of applications and networks over the past years.

The papers [12], [9] and [7] analyze the network traffic in the Gnutella* network and present graphs showing the number of links versus the number of nodes in the network, the development of the network size, the percentage distribution of path-length values, etc. [7] and [12] also measure the available bandwidth per network participant, resulting in an available bandwidth of 6 kBps per user. Taking this value, estimating the network size at 50,000 users - a rather large value - and assuming 3.4 connections on average, the network has to cope with a bandwidth of about 1 GBps in total. Over a month this amounts to 330 TB. It has to be mentioned that these measurements are from November 2000 and the values may have changed in the meantime. But they still give us a good estimate of the traffic in a pure peer-to-peer network.
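A back-of-the-envelope check (our own reading, not stated explicitly in the cited papers): the quoted total and monthly figures fit together if the 6 kBps per user is understood as roughly 6 kbit/s on each of the 3.4 connections:

\[
6\ \tfrac{\text{kbit}}{\text{s}} \times 3.4 \times 50\,000 \approx 1.02\ \tfrac{\text{Gbit}}{\text{s}},
\qquad
1.02\ \tfrac{\text{Gbit}}{\text{s}} \times 30 \times 86\,400\ \text{s}
  \approx 2.6 \times 10^{15}\ \text{bit} \approx 330\ \text{TB}
\]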

2.4. Gnutella Can not Scale. No, Really!

The chapter’s heading is the title of an often-cited article by Jordan Ritter of Xerox PARC (see [8]). Ritter assumes that the topology of the Gnutella* network is a binary tree structure and calculates the bandwidth generated in the network by queries. In a binary tree the total number of reachable nodes in the network - in our case servents* connected to the Gnutella* network - is calculated as follows:

illustration not visible in this excerpt

Equation 1 - Total number N of reachable nodes (see [8])
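The equation image is not included in this excerpt; based on the symbol description below and on [8], it presumably has the form

\[
N(t,n) \;=\; \sum_{i=1}^{t} n\,(n-1)^{\,i-1}
\]

where each of the n directly connected nodes forwards a message to (n-1) further nodes per additional hop, up to the maximum hop distance t.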

In Equation 1, n equals the number of connections of each node, while the sum runs over all hop distances up to t. From there, Ritter uses the following equation to calculate the bandwidth generated by relaying a transmission of s bytes for given n and t. This Generation is defined as:

illustration not visible in this excerpt

Equation 2 - Generation function (see [8])

f(n,x,y) is a function describing the maximum number of reachable users that are at least x, but not more than y, hops away (see [8]). In addition, he gives an equation for the incurred bandwidth when transmitting s bytes for given n and t. Incurrence is defined as the reception or transmission of data across a unique connection to a network. In fact, as Kabanov argues (see [19]), by using this equation Ritter accounts for the generated bandwidth twice.

In the next step Ritter estimates the generated traffic in response to a query employing the function k(n,t,R) defined as

illustration not visible in this excerpt

Equation 3 - Bandwidth generated in return to a query (see [8])

R represents the constant response factor and denotes the product of the percentage of users responding and the amount of data generated.

Taking it all together, Ritter ends up with tremendous amounts of bandwidth necessary to operate the Gnutella* network, which is not practical in real life and, from Ritter’s point of view, shows that the Gnutella* network does not scale.

In fact his calculations have one big error in them: the model they are based on. The Gnutella* net does not have a binary tree structure, as Ruediger Schollmeier shows in [20]. In his calculations the probability of building a tree structure in the Gnutella* network can be calculated using the following equations:

illustration not visible in this excerpt

Figure 3 - Probability of a pure tree structure in the Gnutella* network versus the distance in hops (see [20])

illustration not visible in this excerpt

Equation 4 - Probability for building a tree structure after h hops (lower bound).

illustration not visible in this excerpt

Equation 5 - Probability for building a tree structure after h hops (upper bound) (see [20])

In the equations, h denotes the hop count. Nsearch(h) is the total number of servents* needed to establish a worst-case tree up to hop h, while Nsearch(h-1) equals the number of servents* searching for (nc-1) new connections at hop h-1; nc represents the maximum number of connections of a servent*. Nfree is the total number of servents* not yet connected.

Figure 3 shows Ptree(h) calculated for different values of h and for different numbers of total available servents. It is obvious from the graph that with rising h the probability of establishing a connection to an already connected servent* rises, while Ptree(h) rapidly approaches zero. These calculations show that the binary tree structure Ritter used as the foundation of his calculations does not hold and that his calculations are therefore wrong.

Chapter 3 Thesis & Expectations

From our point of view, Gnutella* scales for the following two reasons. First, the number of users is always limited, and it is therefore very unlikely for a servent* to join the network without creating a loop. This is especially the case the later a servent* joins the network, i.e. the larger the share of the total number of available hosts that is already connected. For this reason the network does not build a binary tree structure as proposed by Ritter (see [8]).

As soon as loops are established within the horizon of a servent*, the Gnutella* protocol prevents a servent* that receives the same message over two different connections from propagating it a second time in the network. In the Gnutella* case loops may thus be healthy for the network, since they limit the number of messages - and the necessary bandwidth - propagated, while the Gnutella* protocol itself is rather bandwidth-wasting than efficient. On the other hand, these loops prevent a servent* from reaching the theoretical maximum number of servents* with its search queries, given that the TTL* of the packets it sends out is smaller than the distance to the farthest servent*.
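A small simulation sketch (our own illustration, not the measurement code from Chapter 4) makes this effect visible. It uses the third-party networkx library to build a randomly wired overlay in which every servent holds exactly nc = 3 connections, floods a message breadth-first from one node, and compares the number of distinct servents reached within a given TTL against the loop-free bound that sums nc(nc-1)^(i-1) over the hops:

```python
from collections import deque
import networkx as nx

def reachable_within_ttl(graph, start, ttl):
    """Breadth-first flood counting distinct nodes reached within ttl hops."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == ttl:
            continue
        for peer in graph.neighbors(node):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, dist + 1))
    return len(seen) - 1  # do not count the starting servent itself

def tree_bound(nc, ttl):
    """Loop-free upper bound: sum of nc * (nc - 1)^(i - 1) for i = 1 .. ttl."""
    return sum(nc * (nc - 1) ** (i - 1) for i in range(1, ttl + 1))

# Randomly wired overlay in which every servent holds exactly nc = 3 connections.
overlay = nx.random_regular_graph(d=3, n=1000, seed=1)

for ttl in range(1, 11):
    reached = reachable_within_ttl(overlay, start=0, ttl=ttl)
    print(f"TTL {ttl:2d}: reached {reached:4d}   loop-free bound {tree_bound(3, ttl):5d}")
```

With higher TTL the reached count saturates near the network size, while the loop-free bound keeps growing - the gap between the two is exactly the reduction caused by loops.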

illustration not visible in this excerpt

Figure 4 - Simulated bound for the cumulative number of pings for a total of Nall=1000 servents* with nc=3 connections each. The blue lines show the 90% confidence interval. (see [20])

[...]


[1] According to Schollmeier, "[a] Service is a packaged set of capabilities of a communication network, to manage the exchange of information, which finally fulfills a certain demand of the (Service-)user. This Service is provided by the network to the user via an appropriate access point." (see [1] page 3)

[2] Schollmeier defines a Terminal Entity as "an element specified in the network protocol, as an endpoint in the network, at which several virtual and physical links converge. In Peer-to-Peer networks this Terminal entity may either be a peer, or a server". (see [1] page 4)

[3] http://www.nextpage.com

[4] Translated to English by the author

[5] Given an 18-character search string, the query size results in 83 bytes/query including the TCP and IP headers

[6] The name goes back to Harvard professor Stanley Milgram, who mailed 160 letters to randomly chosen people living in Omaha, Nebraska in 1967. He asked them to participate in his experiment, in which they should try to pass the letter on to a stockbroker in Boston, Massachusetts, using only intermediaries known to one another on a first-name basis. Surprisingly, 42 letters made it through, with only 5.5 intermediaries on average. This experiment became popularly known as the small-world effect. (see [18])

[7] In the reference denoted as Φ, but changed to be compliant with most other papers on the subject.

[8] The number of edges needed to fully connect N nodes in a network is N(N-1)/2, which for large N grows approximately as N².

Details

Pages
78
Year
2002
ISBN (eBook)
9783638136327
File size
1.3 MB
Language
English
Catalogue number
v5911
Institution / College
Technische Universität München – Electrical Engineering and Faculty for Communications Networks
Grade
1,3 (A)
Keywords
Peer-to-Peer P2P Networking Business Model Gnutella hybrid pure Napster Groove JXTA .NET
