Repository of Availability Traces
Maintained by Brighten Godfrey
This package includes, in a common file format, a distilled form of the data from a number of studies of machine availability and failure. The goal is to build a repository of availability data that makes it easy for distributed systems researchers to obtain, use, and compare the data sets. Currently, the package includes traces of PlanetLab (Stribling 2005), web servers (Bakkaloglu et al. 2002), corporate PCs (Bolosky et al. 2000), DNS servers (Pang et al. 2004), Skype superpeers (Guha et al. 2006), and KAD peers (Steiner et al. 2007).
For more details, see readme.pdf. The readme also includes a handy list of availability datasets beyond those in this repository.
Downloads
Download the collection: availability-0.3.tar.bz2 (4 MB)
New! steiner-kad.avt.bz2 (29 MB), a trace of a 179-day crawl of over 400,000 peers on KAD by Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack, converted to .avt format by Lluís Pàmies i Juárez. Since it's much larger, this file is not included in the main collection.
List of availability datasets
Here's a woefully incomplete catalog of availability datasets, or papers that used such datasets, in reverse chronological order.
Included in this repository
- Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack. A global view of KAD. Proc. of Internet Measurement Conference (IMC), October 2007, San Diego, USA.
- Saikat Guha, Neil Daswani, and Ravi Jain. An Experimental Study of the Skype Peer-to-Peer VoIP System. In Proceedings of The 5th International Workshop on Peer-to-Peer Systems (IPTPS'06), Santa Barbara, CA, February 2006.
- Jeremy Stribling. PlanetLab all-pairs ping.
- Jeffrey Pang, James Hendricks, Aditya Akella, Bruce Maggs, Roberto De Prisco, and Srinivasan Seshan. Availability, usage, and deployment characteristics of the domain name system. In Proc. IMC, 2004.
- Mehmet Bakkaloglu, Jay J. Wylie, Chenxi Wang, and Gregory R. Ganger. On correlated failures in survivable storage systems. Technical Report CMU-CS-02-129, Carnegie Mellon University, May 2002.
- William J. Bolosky, John R. Douceur, David Ely, and Marvin Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In Proc. SIGMETRICS, 2000.
Not included in this repository
- Stevens Le Blond, Fabrice Le Fessant, and Erwan Le Merrer. Finding good partners in availability-aware p2p networks. In International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'09), Nov 2009. [Edonkey] Data set available at http://fabrice.lefessant.net/traces/edonkey2/.
- Bianca Schroeder and Garth Gibson. A Large-scale Study of Failures in High-performance-computing Systems. Proceedings of the International Conference on Dependable Systems and Networks (DSN 2006), Philadelphia, PA, USA, June 25-28, 2006. See also project website.
- PlanetLab All Sites Ping: http://ping.ececs.uc.edu/ping/
- D. Stutzbach and Reza Rejaie. Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems. In IMC 2005. [Gnutella]
- C. Chambers and W. Feng. Measurement-based Characterization of a Collection of On-line Games. In IMC 2005.
- J. A. Pouwelse, P. Garbacki, D. H. J. Epema, H. J. Sips. The Bittorrent P2P File-sharing System: Measurements and Analysis. In 4th International Workshop on Peer-to-Peer Systems (IPTPS’05), Feb 2005.
- K. P. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In Proc. ACM SOSP, Oct. 2003. [Kazaa]
- R. Bhagwan, S. Savage, and G. Voelker. Understanding availability. In Proc. IPTPS, Feb. 2003. [Overnet]
- S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. In Proc. of ACM SIGCOMM Internet Measurement Workshop, Nov. 2002. [FastTrack]
- J. Chu, K. Labonte, and B. N. Levine. Availability and locality measurements of peer-to-peer file systems. In Proc. of ITCom: Scalability and Traffic Control in IP Networks, July 2002. [Gnutella, Napster]
- Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In Proc. MMCN, San Jose, CA, USA, January 2002. [Gnutella, Napster]
- RON Project Data: http://nms.csail.mit.edu/ron/data (This has only a few days of availability data.)
- D. Long, A. Muir, R. Golding. A longitudinal survey of Internet host reliability. 14th Symposium on Reliable Distributed Systems, 1995.
Related projects
- The Peer-to-Peer Trace Archive at TU Delft has a large collection of traces of P2P systems.