Thursday, August 13, 2009

Review: "Looking Up Data in P2P Systems" by H. Balakrishnan et. al.

The paper[1] introduces some of the efficient P2P algorithms that have been designed and still be developed to resolve the one the main challenges in P2P systems, the look up problem. According to the context, P2P system has been the target for research because of its structure and decentralized nature.

The recent algorithms pertaining to the lookup problem in a P2P system present a simple and general interface, a distributed hash table (DHT). From the term itself, the idea is that each node in the decentralized and distributed system maintains a hash table (key, value) pair which is used to track the location of some data items within the system.

The authors also mention some of the difficulties attached with the approaches used by some P2P applications in the spring of its implementation. Hierarchies, centralized database, and unbalanced nodes were some of the hitches in the system. The paper then introduces more stable algorithms that have been designed recently as both structured and symmetric and implement the DHT abstraction. What each algorithm makes them distinct from one another is the data structure that they uses as a routing table to provide O(log N) look ups. Chord uses a skiplist structure while Kademlia, Pastry, and Tapestry utilize a tree-like data structure. Viceroy maintains a butterfly data structure while allowing only a constant number of nodes for look up while being consistent with O(log N) lookup. Lastly, CAN uses a d-dimensional Cartesian coordinate space to implement the DHT abstraction.

The design of each approach depends upon the priorities of different issues: distance function, operation costs, fault tolerance and concurrent changes, proximity routing, malicious nodes, and indexing and keyword search.

Reference:
[1] H. Balakrishnan, F.M. Kaashoek, D. Karger, R. Morris and. I. Stoica, '"Looking up Data in P2P Systems", Communcations of the ACM, Vol. 46, No. 2. (February 2003), pp. 43-48.

REVIEW: “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications” by Ion Stoica et. al.

The paper presents a distributed protocol to address content lookup problem on peer-to-peer applications. This involves finding any given data within a P2P system at the same time, adapting to scalability and independence from centralized servers or hierarchies.

Chord maps key-value pairs into a node. A data item can be traced via O(log N) messages to other nodes (ordered in an identifier circle modulo 2m) in a N-node system using unique identifier keys. One of the features of Chord is load-balancing between nodes (nodes receiving same number of keys) by implementing a variant of consistent hashing in assigning keys to nodes. Each node in the system also maintains a finger table which lists information about the keys assigned to it.

The paper also presents how Chord can cope well in a dynamic network where nodes join and depart any time by producing only short overheads. Stabilization processes are also implemented to correct inaccurate information on each node. Simulation results prove the mentioned capabilities of Chord and show how it provides robust correctness despite outdated information in some nodes.

The features of Chord make it a scalable, cost-effective, and fault-tolerant solution for P2P systems. This approach can be effectively applied to web caching where data are mapped to caches by consistent hashing functions and eventually finding the cache that contains the desired web data, using its URL as a key for the DHT.

However, a problem may exist in a file-sharing application. Note that Chord enables only exact key lookup which requires a file-sharing application to create a mapping from keywords to DHT keys. Furthermore, nodes need to have a copy of the file mapped to them rather than just submitting a list of files that they want to share directly from their hard drive.

Finally, an issue that Chord must check involves varying sizes of data items. This might disrupt the stability of the load-balanced nodes. In web caching for an instance, data can be a web text data or large videos, etc.

Reference:
[1] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," Proc. ACM SIGCOMM 2001, pp.149–160, Aug. 2001.

Thursday, July 2, 2009

Review: On Inferring Autonomous System Relationships in the Internet

Summary
Global connectivity of millions of end-hosts worldwide is possible through interdomain routing in the Internet coordinated by BGP. BGP allows each AS to set its own routing policies. Oftentimes, contractual commercial agreements between administrative domains poses constraint on these routing policies e.g. A customer must not provide transit services between its providers. Indeed, policies like this shows that AS relationships stand as important features in Internet architecture.

“On Inferring Autonomous System Relationships in the Internet” by Lixin Gao [1] proposes an augmented graph representation that infers AS relationships classified as customer-provider, peering or sibling relationships. The author also presents heuristic algorithms for inferring these relationships. But first, the concept of an annotated AS graph and the theorems regarding the selective export rule (and the routing table entry patterns derived from it) are formally discussed. The author presents algorithms for the mentioned AS relationships based on the patterns. For inferring the first two relationships, the author constructed two algorithms: Basic and Refined. The difference of the two is that the first assumes that all BGP routers are configured correctly while the last perceives possible misconfigured BGP routers. For inferring peering relationships, the author presents the Final algorithm that simply checks if an AS pair does not transit traffic for each other. To test the accuracy of these algorithms, an experimental study of AS relationships in the Internet was done using BGP routing tables retrieved from the Route Views server in Oregon. Based from the results, more than 90.5% of the AS pairs have customer-provider relationships, less than 1.5% of the AS pairs have sibling relationships and less than 8% of the AS pairs have peering relationships. AT&T internal information checked the inference results and confirmed 100% of the inferred customers and peers and 20% of the inferred siblings. As a whole, 98.9% of all the inference results are confirmed by AT&T internal information. The inferred sibling results are also verified from WHOIS lookup service and more than half are confirmed. At the end of the paper, the author discussed the possible causes of unconfirmed siblings and these are: router configuration typo, misconfiguration of small ISPs, unusual AS relationships, and inaccuracy of the heuristic.


Evaluation
I think that this is a good study that must be continued rather updated since the Internet has experienced a huge increase in its size and complexity since it was commercialized and since the date that this paper was published. An example of updating would be to check the constants used for each algorithms like the constant R indicated in the Final Algorithm. Would it still be applicable today? Would it produce better results if it would increase its value?

A further study that must follow this research could be looking for solutions on the possible causes of unconfirmed cases. With the current technology, these misconfigurations may be lessened by improving the physical devices used for interdomain routing which could in turn improve also inferring AS relationships.


Reference:
[1] Lixin Gao, On inferring autonomous system relationships in the internet, IEEE/ACM Transactions on Networking (TON), v.9 n.6, p.733-745, December 2001

Lecture on Interdomain Internet Routing

Summary
The lecture entitled “Interdomain Internet Routing”, from the name itself explains how routing between different administrative domains works in the Internet. An important issue in the current Internet routing layer is tackled initially which pertains to the difficulty of global connectivity of various end-hosts in the Internet. This is due to unequal creation of Internet service providers (ISPs) which are categorized according to the size of their network. Meanwhile, a true picture of the Internet routing is introduced. The existing Internet routing infrastructure is divided into autonomous systems (ASes) that exchange reachability information using the current wide-area routing protocol called BGP (Border Gateway Protocol).

The next part discusses two prevalent forms of relationships happening between different ASes namely transit and peering. The former which involves a financial settlement, requires an ISP to provide access to all (or most) destinations in its routing tables. The latter allows two ASes to provide mutual access to a subset of each others' routing tables and may not involve any financial transactions. ASes involved in either transit or peering needs to construct export policies (rules on what routes to export) and import policies (rules on what routes to install) so as not to lose revenue.

The third section is dedicated to BGP which includes the protocol's design goals, its attributes, and how it disseminates routes within an AS. The current structure of BGP was influenced by these 3 factors: scalability (the Internet routing infrastructure must be scalable as the size of the network increases); policy (AS must enforce various routing policies); and cooperation under competitive circumstances. A BGP consists of several attributes such as NEXT HOP (IP address of the next-hop router), ASPATH (sequence of AS identifiers that the route has traversed), LOCAL PREF (first attribute considered in selecting routes) , and MED (multiple-exit descriminator). Among these, the last three are the most important. Disseminating routes within an AS are done using two types of BSP sessions: eBGP are between BGP in different ASes while iBGP are between BGP in the same AS. The dissemination is done with care using two important goals: loop-free forwarding and complete visibility.

Lastly, failures over BGPs are discussed, including its causes and solutions that has already been made. Moreover, scalability is further explained with the discussion of multi-homing.

Evaluation
This lecture provides a good background on the technicalities behind interdomain Internet routing. The authors have done a great job of including a detailed discussion of BGP’s structure and the goals that served as bases for the protocol’s creation.

Open issues regarding interdomain Internet routing have also been recently identified through surveys and studies done by a group of researchers [1]. Some of these include slow convergence and chattiness of BGP, scalability problems due to multi-homing which is discussed in the paper, security issues, etc. They searched for the possible causes of these issues and if there are any studies done to resolve these issues.

Reference:
[1] M. Yannuzzi, X. Masip-Bruin, and O. Bonaventure, Open Issues in Interdomain Routing: A Survey, IEEE Network, November/December 2005.

Wednesday, June 24, 2009

Review: The Design Philosophy of the DARPA Internet Protocols


The paper, “The Design Philosophy of the DARPA Internet Protocols” by David D. Clark [1] presents the reasons, perhaps the goals which explain the Internet's current architecture. The fundamental goal of the Internet base on the paper is: “...to develop an effective technique for multiplexed utilization of existing interconnected networks.” For multiplexing to be made possible, packet switching technique is used, a fundamental component of the Internet architecture.

Moreover, a list of second level goals is also established as a guide for Internet's architecture and each item is discussed in detail. The list consists of 7 additional goals mentioned in order of importance:

    1. Internet communication must continue despite loss of networks or gateways.

    2. The Internet must support multiple types of communications service.

    3. The Internet architecture must accommodate a variety of networks.

    4. The Internet architecture must permit distributed management of its resources.

    5.The Internet architecture must be cost effective.

    6. The Internet architecture must permit host attachment with a low level of effort.

    7. The resources used in the Internet architecture must be accountable.


The first second level goal is achieved using an approach called “fate-sharing” in which only the end-hosts maintain the state information about their connection. The top level goals (first 3) resulted in the development and use of datagrams as transporters across the network while the remaining low level goals are taken into minimum account. The paper concludes by upholding the success of the current Internet architecture because of its wide range of military and commercial utilization. Also, it suggests further development of the current datagram to achieve the low level goals (resource management and accountability) as well.


The author of the paper clearly stated the reasons why the Internet was designed in that certain way. He also did a great job of introducing a protocol feature and expounding it for each goal such as packet switching and the use of a datagram as a building block. I especially commend him for suggesting that a better building block might be suited for the next generation of architecture. He is aware of the fact that indeed datagrams came from the requirements of the DARPA project.


Today, we must look at the usage of the Internet very differently from the date that it was constructed. An example would be the increasing multimedia traffic and mobility services. We must again question Internet's design and look for what best suits the needs of today's network community.


Reference:

[1] David D. Clark, "The Design Philosopy of of the DARPA Internet Protocols." ACM SIGCOMM Computer Communications Review, vol.18, issue 4, August 1998.

Review: End-to-End Arguments in System Design


The paper entitled “End-to-End Arguments in System Design” by J.H. Salter, D.P. Reed and D.D. Clark[1] proposes a design principle on how to properly place functions on the modules of a distributed computer system. Following the end-to-end principle and the emergence of data communication network as a component of a computer system, the paper argues that functions usually implemented at the low level modules can instead be implemented on high level modules to avoid redundancy and improve system performance. Specifically, functions referring to communication protocol operations must be defined to occur at the end points of a communication system. The paper also discusses certain examples (applications) to site advantages of using the end-to-end design. But of course, there are cases when functions can only be implemented on the low level especially when issues of efficiency and performance arise. So, the end-to-end arguments must not be treated as an absolute but as a guide on protocol design analysis. Finally, it stresses the importance of identifying the end points to which the said arguments must be applied.


The end-to-end principle is one of the central design principles which guided the design of the Internet. From the date of this design proposal which was in 1981 until the next 20 years, Internet's design was shaped by the end-to-end arguments. But due to the growth of Internet's commercial use and other specific purposes, there's an emerging risk of rethinking the Internet's original design principles. M.S. Blumenthal and D.D. Clark (one of the authors of the end-to-end arguments) [2], list some of the trends which can influence the requirements for the Internet. Among those mentioned are:

"..the growth of Internet Service Providers, government interests, changing motivations of the growing user base; and the tension between the demand for trustworthy overall operation and the inability to trust the behavior of individual users."


They conclude that risks for innovation is inevitable. At the end, they argue that the flexibility, openness and generality in the Internet caused by using the end-to-end arguments as a design principle must be preserved in any circumstances that might take place.

In this context, the importance of taking the end-to-end argument into consideration in communication protocol design were clearly stated. Examples are given to provide a more perceivable view of common functions often misplaced on a communication subsystem. Basing the design of the Internet on a simple concept allowed fast accommodation of new applications thus, fostering innovation. But how ever good the design principle is, it still has drawbacks along with it, as mentioned in the context. It is notable the authors do not always advocate that the end-to-end approach is the best since there maybe cases when functions implemented at low level modules is beneficial.

References:

[1] J.H. Saltzer,D.P. Reed and D.D. Clark. "End-to-End Arguments in System Design." ACM Transactions on Computer Systems, Vol. 2, No. 4, pp. 277-288, November 1984.

[2] D. Clark and M. S. Blumenthal: “Rethinking the design of the Internet: The end to end arguments vs. the brave new world”; Workshop on The Policy Implications of End-to-End, Dec. 2000