Thursday, August 13, 2009

Review: "Looking Up Data in P2P Systems" by H. Balakrishnan et. al.

The paper[1] introduces some of the efficient P2P algorithms that have been designed and still be developed to resolve the one the main challenges in P2P systems, the look up problem. According to the context, P2P system has been the target for research because of its structure and decentralized nature.

The recent algorithms pertaining to the lookup problem in a P2P system present a simple and general interface, a distributed hash table (DHT). From the term itself, the idea is that each node in the decentralized and distributed system maintains a hash table (key, value) pair which is used to track the location of some data items within the system.

The authors also mention some of the difficulties attached with the approaches used by some P2P applications in the spring of its implementation. Hierarchies, centralized database, and unbalanced nodes were some of the hitches in the system. The paper then introduces more stable algorithms that have been designed recently as both structured and symmetric and implement the DHT abstraction. What each algorithm makes them distinct from one another is the data structure that they uses as a routing table to provide O(log N) look ups. Chord uses a skiplist structure while Kademlia, Pastry, and Tapestry utilize a tree-like data structure. Viceroy maintains a butterfly data structure while allowing only a constant number of nodes for look up while being consistent with O(log N) lookup. Lastly, CAN uses a d-dimensional Cartesian coordinate space to implement the DHT abstraction.

The design of each approach depends upon the priorities of different issues: distance function, operation costs, fault tolerance and concurrent changes, proximity routing, malicious nodes, and indexing and keyword search.

Reference:
[1] H. Balakrishnan, F.M. Kaashoek, D. Karger, R. Morris and. I. Stoica, '"Looking up Data in P2P Systems", Communcations of the ACM, Vol. 46, No. 2. (February 2003), pp. 43-48.

REVIEW: “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications” by Ion Stoica et. al.

The paper presents a distributed protocol to address content lookup problem on peer-to-peer applications. This involves finding any given data within a P2P system at the same time, adapting to scalability and independence from centralized servers or hierarchies.

Chord maps key-value pairs into a node. A data item can be traced via O(log N) messages to other nodes (ordered in an identifier circle modulo 2m) in a N-node system using unique identifier keys. One of the features of Chord is load-balancing between nodes (nodes receiving same number of keys) by implementing a variant of consistent hashing in assigning keys to nodes. Each node in the system also maintains a finger table which lists information about the keys assigned to it.

The paper also presents how Chord can cope well in a dynamic network where nodes join and depart any time by producing only short overheads. Stabilization processes are also implemented to correct inaccurate information on each node. Simulation results prove the mentioned capabilities of Chord and show how it provides robust correctness despite outdated information in some nodes.

The features of Chord make it a scalable, cost-effective, and fault-tolerant solution for P2P systems. This approach can be effectively applied to web caching where data are mapped to caches by consistent hashing functions and eventually finding the cache that contains the desired web data, using its URL as a key for the DHT.

However, a problem may exist in a file-sharing application. Note that Chord enables only exact key lookup which requires a file-sharing application to create a mapping from keywords to DHT keys. Furthermore, nodes need to have a copy of the file mapped to them rather than just submitting a list of files that they want to share directly from their hard drive.

Finally, an issue that Chord must check involves varying sizes of data items. This might disrupt the stability of the load-balanced nodes. In web caching for an instance, data can be a web text data or large videos, etc.

Reference:
[1] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," Proc. ACM SIGCOMM 2001, pp.149–160, Aug. 2001.