Archive-name: os-research/part2
Version: $Revision: 1.22 $
Posting-Frequency: monthly
Last-Modified: Tue Aug 13 21:03:28 1996
URL: http://www.serpentine.com/~bos/os-faq/

Answers to frequently asked questions for comp.os.research: part 2 of 3

Copyright (C) 1993--1996 Bryan O'Sullivan

TABLE OF CONTENTS

1.     Available software
1.1.   Where can I find Unix process checkpointing and restoration packages?
1.2.   What threads packages are available for me to use?
1.3.   Can I use distributed shared memory on my Unix system?
1.4.   Where can I find operating systems distributions?
1.4.1. Distributed systems and microkernels
1.4.2. Unix lookalikes
1.4.3. Others

2.     Performance and workload studies
2.1.   TCP internetwork traffic characteristics
2.2.   File system traces
2.3.   Modern Unix file and block sizes
2.3.1. File sizes
2.3.2. Block sizes
2.3.3. Inode ratios

3.     Papers, reports, and bibliographies
3.1.   From where are papers for distributed systems available?
3.2.   Where can I find other papers?
3.3.   Where can I find bibliographies?

4.     General Internet-accessible resources
4.1.   Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers
4.2.   Refdbms---a distributed bibliographic database system
4.3.   Willow -- the information looker-upper
4.4.   Computer science bibliographies and technical reports
4.5.   The comp.os.research archive
4.6.   Miscellaneous resources

5.     Disclaimer and copyright

------------------------------

Subject: [1] Available software
From: Available software

This section covers various software packages, operating systems
distributions, and miscellaneous other such items which may be of
interest to the operating systems research community.

If you have written, or know of, some software which you believe would
be of fairly wide interest, please get in touch with the FAQ maintainer
with a view to having a short spiel and availability information
included here.
------------------------------ Subject: [1.1] Where can I find Unix process checkpointing and restoration packages? From: Available software - [93-01-21-10-18.30] The Condor system is available via anonymous ftp from <URL:ftp://ftp.cs.wisc.edu>. Condor works entirely at user level [no kernel modifications required] but doesn't currently support interprocess communication, signals, or fork(). Definitely worth a look. - Bennet S Yee implemented a `mostly portable' checkpoint and restore package back around 1987. When the programmer invokes the checkpoint procedure, it saves the state to a file; when a second process with the same program (but with different arguments) is started which calls the restore procedure, it reads the old state from the file. Available via anonymous ftp from <URL:ftp://play.trust.cs.cmu.edu/usr/bsy/pub/save_world.shar.Z>. This package is known to work for Pmaxen, Sun4's, Sun3's, IBM RTs, and VAXen. Porting it to a new architecture should be relatively simple -- look at the README file. ------------------------------ Subject: [1.2] What threads packages are available for me to use? From: Available software Now that POSIX has arrived at a standard threads interface, it is expected that all major Unix vendors will soon release conformant threads packages. Currently, vendor-supplied threads packages vary widely in the interfaces they provide. Some vendors' packages conform to various drafts of the POSIX standard, while others provide their own interfaces. OS/2, Windows NT and Windows 95 all provide threads interfaces. None conforms to the POSIX standard, and neither IBM nor Microsoft has signalled any intention to provide conformant threads interfaces. - Michael T. Peterson <mtp@big.aa.net> has written a POSIX and DCE threads package, called PCthreads, for Intel-based Linux systems. See <URL:http://www.aa.net/~mtp/PCthreads.html> for more information. 
- Christopher Provenzano <proven@mit.edu> has written a portable
  implementation of draft 8 of the IEEE Pthreads standard. See
  <URL:http://www.mit.edu:8001/people/proven/pthreads.html> for further
  details, or fetch the software itself from
  <URL:ftp://sipb.mit.edu/pub/pthreads>.

  Currently supported are i386/i486/Pentium processors running NetBSD
  1.0, FreeBSD 1.1, Linux 1.0, and BSDi 1.1; DECstations running
  Ultrix-4.2; SPARCstations running SunOS 4.1.3; and HP/PA machines
  running HP/UX-9.03.

  As far as I can see, development of this library has halted (at least
  temporarily), and it still contains many serious bugs.

- Georgia Tech's OS group has a fairly portable user-level threads
  implementation of the Mach Cthreads package. It is called Cthreads,
  and can be found at
  <URL:ftp://ftp.cc.gatech.edu/pub/groups/systems/Falcon/cthreads_distribution.tar.gz>.
  It also contains the Falcon integrated monitoring system. It
  currently runs under SunOS 4.1.X, Irix 4.0.5, Irix 5.3, AIX 3.2.5,
  Linux 1.0 and higher, and KSR1 and KSR2. It is fairly easy to port to
  other architectures. Current ports in progress are Solaris 2.4 and
  AIX 4.X.

- The POSIX / Ada-Runtime Project (PART) has made available an
  implementation of draft 6 of the POSIX 1003.4a Pthreads
  specification, which runs under SunOS 4.x; the current release is
  version 1.20. Available using anonymous ftp from
  <URL:ftp://ftp.cs.fsu.edu/pub/PART>.

- Elan Feingold has written a threads package called ethreads; I don't
  know anything about it, other than that it is available from
  <URL:ftp://frmap711.mathp7.jussieu.fr/pub/scratch/rideau/misc/threads/ethreads/ethreads.tgz>.

- Stephen Crane has written a `fairly portable' threads package, which
  runs under Sun 3, Sun 4, MIPS/RISCos, Linux, and 386BSD. It is
  available via anonymous ftp from
  <URL:ftp://dse.doc.ic.ac.uk/rex/lwp.tar.gz>, with documentation in
  the same directory named lwp.ps.gz.

- QuickThreads is a toolkit for building threads packages, written by
  David Keppel.
  It is available via anonymous ftp from
  <URL:ftp://ftp.cs.washington.edu/pub/qt-001.tar.Z>, with an
  accompanying tech report at
  <URL:ftp://ftp.cs.washington.edu/tr/1993/05/UW-CSE-93-05-06.PS.Z>.
  The code as distributed includes ports for the Alpha, x86, 88000,
  MIPS, SPARC, VAX, and KSR1.

- On CONVEX SPP Exemplar machines there is a Compiler Parallel Support
  Library (CPSlib), a library of thread management and synchronisation
  routines. CPSlib is not compatible with anything else, but the
  interface is sufficiently similar to the Solaris threads or pthreads
  interface to allow straightforward porting. One special feature of
  CPSlib is the (possible) distinction between "symmetric" and
  "asymmetric" parallelism.

A small number of vendors provide DCE threads packages for various Unix
systems.

------------------------------

Subject: [1.3] Can I use distributed shared memory on my Unix system?
From: Available software

- CRL is a simple all-software distributed shared memory system
  intended for use on message-passing multicomputers and distributed
  systems. CRL 1.0 can be compiled for use on the MIT Alewife Machine,
  Thinking Machines' CM-5, and networks of Sun workstations running
  SunOS 4.1.3 communicating with one another using TCP and PVM.

  Because CRL requires no functionality from the underlying hardware,
  compiler, or operating system beyond that necessary to send and
  receive messages, porting CRL to other platforms should prove to be
  straightforward.

  General information about CRL can be found at
  <URL:http://www.pdos.lcs.mit.edu/crl>. The CRL 1.0 source
  distribution (sources for CRL 1.0 and several applications, user
  documentation, and a PostScript version of a paper about CRL to
  appear at SOSP later this year) is available at
  <URL:http://www.pdos.lcs.mit.edu/crl/source.html>.

- Ron Minnich <rminnich@earth.sarnoff.com> has implemented a
  distributed shared memory system called MNFS, which is a modified
  version of NFS and runs alongside NFS in the kernel.
  Performance is good; page faults under FreeBSD 2.0R run at about the
  same speed as NFS (~5.9 milliseconds per page). If you need to update
  a page from one host to many clients, it can be done at a cost of 1.2
  milliseconds or so per client. This scales: networks of 128 nodes
  running MNFS have been set up, and times should improve over faster
  LANs than Ethernet.

  The MNFS programming model uses mmap'ed files. Programs map files in
  and then use them as ordinary memory. Cache consistency of a page is
  maintained by the MNFS servers, ensuring that there is only one
  writeable copy in the network at a time. The model is not strongly
  coherent; read-only copies of a page are only refreshed by an
  explicit action on the part of the holder of a writeable page (using
  msync).

  For those who don't like this style of programming, a parallel C
  compiler has been retargeted to use MNFS on clusters and networks of
  computers running Condor. Both performance and scalability matched
  explicitly mmap-coded systems.

  The system has been implemented on SunOS 4.1.x, Solaris 2.2 and 2.3,
  IRIX 5.2 and 5.3, and AIX 3.2. All of these were legally encumbered,
  so the FreeBSD version is currently the only freely-available
  implementation.

  MNFS is available from <URL:ftp:ftp.sarnoff.com/pub/mnfs>, and may be
  installed either as a set of diffs to the FreeBSD 2.0.5R kernel or
  in place. Also included in this directory is a slightly out-of-date
  paper on MNFS, and a more current manual. A Linux port of MNFS is in
  the works.

------------------------------

Subject: [1.4] Where can I find operating systems distributions?
From: Available software

This section covers the availability of several well-known systems; the
only criterion for inclusion of a system here is that it be of interest
to some segment of the OS research community (commercial systems will
be accepted for inclusion, so long as they are pertinent to research).
------------------------------

Subject: [1.4.1] Distributed systems and microkernels
From: Available software

See part one of the FAQ for further information on some of the systems
listed below.

- [93-03-31-22-49.53] ACE is the distribution, support and sales
  channel for Amoeba.

  `Due to overwhelming response from non-profit organisations wishing
  to obtain Amoeba for their research activities', VU is offering
  Amoeba 5.2 to research institutions for more or less free (via ftp at
  no charge, or on tape for $500 on Exabyte or $800 on QIC-24).

  Amoeba currently supports 68020 and 68030-based VME board machines,
  as well as i386- and i486-based AT PCs and Sun 3 and 4 machines.

  For further information on `commercial' Amoeba, you can contact ACE
  by email at <amoeba@ace.nl>, by phone at +31 20 664 6416, or by fax
  at +31 20 675 0389. Universities interested in obtaining a license
  should send mail to <amoeba-license@cs.vu.nl>, or fax to
  +31 20 642 7705.

- Chorus Systemes has special programmes for universities interested in
  using Chorus. For more information on the offerings available,
  conditions, and other details, get the following files:

  - <URL:ftp://ftp.chorus.fr/pub/README>

  - <URL:ftp://ftp.chorus.fr/pub/academic/README>

  - <URL:ftp://ftp.chorus.fr/pub/academic/offerings>

- The Cronus object-oriented distributed system may be obtained via ftp
  from <URL:ftp://pineapple.bbn.com>; email <cronus-help@bbn.com> for
  details of the account name and password. Before attempting to get
  the Cronus distribution, you must obtain, via anonymous ftp,
  <URL:ftp://pineapple.bbn.com/Cronus-via-FTP-Terms>.

  Maintenance, hotline support, and training for Cronus are available
  from BBN. Send email to the above address for information on these,
  or on obtaining a commercial license.

- Flux is a Mach-based toolkit for developing operating systems; you
  can find more information about it on the Web at
  <URL:http://www.cs.utah.edu/projects/flux>.
- Horus is available for research use; contact Ken Birman <ken@cs.cornell.edu> or Robbert van Renesse <rvr@cs.cornell.edu> for details. - Isis has not been publicly available since 1989, but may (I'm not sure) still be obtained using anonymous ftp from <URL:ftp://ftp.uu.net> or <URL:ftp://ftp.cs.cornell.edu>. After 1989, the code was picked up by Isis Distributed Systems, which has subsequently developed and supported it. The commercial version of Isis (available `at very low cost' to academic institutions) is available from the company. Email <info@isis.com> for information, or call +1-212-979-7729 or +1-607-272-6327. - Information on obtaining the latest Mach 4 distribution is available from the University of Utah's Mach 4 pages, at <URL:http://www.cs.utah.edu/projects/flux/mach4/html/Mach4-proj.html>. - The Plan 9 distribution is now commercially available for $350; it consists of a two-volume manual, a CD-ROM with all the sources, and four PC diskettes comprising a binary-only installation of a fairly complete version of the system that runs on a PC. For more information, <URL:http://plan9.att.com/plan9/index.html>; this site houses ordering information, a browsable copy of all the documentation, and the PC binary distribution. Kernels exist for the Sun SLC, Sun4Cs of various types, NeXTstations, MIPS Magnum 3000, SGI 4D series, AT&T Safari, `a whole bunch of' PCs, and the Gnot. Sydney University Basser Department of Computer Science has a port of Plan 9 underway to the DEC Alpha at the moment. A port to the Sun 3 has been completed. Contact <plan9info@cs.su.oz.au> for details. The Plan 9 user mailing list may be subscribed to by sending mail to <9fans-request@cse.psu.edu>. - QNX is available for academic applications through an education support programme run by QNX Software Systems, whereby QNX systems can be obtained for educational purposes at very low cost. 
  For commercial and education availability and pricing, contact:

  QNX Software Systems            QNX Software Systems
  175 Terrence Matthews Cr.       Westendstr. 19
  Kanata, Ontario K2M 1W8         6000 Frankfurt am Main 1
  Canada                          Germany
  1 800 363 9001                  +49 69 9754 6156 x299
  +1 (613) 591 0931
  +1 (613) 591 3579 (fax)         +49 69 9754 6110 (fax)

  Versions after 4.2 of QNX run on the i386 and later processors, with
  a 16-bit kernel included for i286 machines. Native optimisations and
  a compiler for the Pentium are also included.

  Further marketing information can be obtained on the World Wide Web
  from <URL:http://www.qnx.com>.

- The 1.1 Research Distribution of the Spring distributed object
  oriented operating system is available.

  Spring is a highly modular, object-oriented operating system, which
  is focused around a uniform interface definition language (IDL). The
  system is intrinsically distributed, with all system interfaces being
  accessible both locally and remotely.

  The 1.1 Research Distribution adds a number of fixes and
  improvements, including a Spring-Java IDL system that facilitates
  writing Java applets that can talk across Spring IDL interfaces.

  The Spring SRD 1.1 Binary CDROM is $75 to Universities and $750 to
  commercial research institutions. This includes all of the software
  and documentation necessary for installing, running, and developing
  new system modules and applications in Spring. All binaries, IDL
  files, development tools, key exemplary sources, and course teaching
  materials are included. A standard full source license and source
  CDROM is also available for $100 to Universities and $1000 to
  commercial research institutions.

  For more details and ordering information, see
  <URL:http://www.sun.com/tech/projects/spring>.

- [93-02-07-16-03.48] The Sprite Network Operating System is available
  on CD-ROM. The disc contains the source code and documentation for
  Sprite, a research operating system developed at the University of
  California, Berkeley.
  All the research papers from the Sprite project are also included on
  the disc. The software on this disc is primarily intended for
  research purposes, and is not really intended to be used as a
  production system.

  Boot images are provided for Sun SPARCstations and DECstations. The
  CD-ROM is in ISO-9660 format with Rock Ridge extensions.

  The disc contains about 550 megabytes of software. You can get an
  overview of the Sprite Project, and a complete list of what is on
  this disc, by anonymous ftp from
  <URL:ftp://cdrom.com/pub/cdroms/sprite>.

  If you would like a CD-ROM please send $25. Add $4.95 if you would
  like a caddy too. S&H is $5 (per order, not per disc) for US/Can/Mex,
  and $10 for overseas. If you live in California, please add sales
  tax. You can send a check or money order, or you can order with
  Mastercard/Visa/AmEx.

  Bob Bruce <rab@cdrom.com>
  Walnut Creek CDROM
  1547 Palos Verdes Mall, Suite 260
  Walnut Creek, CA 94596
  United States
  1 800 786-9907 (USA only)
  +1 510 947-5996
  +1 510 947-1644 (fax)

- VSTa is a copylefted system written by Andrew Valencia
  <vandys@cisco.com> which uses ideas from several research operating
  systems in its implementation. It is currently in an `experimental
  but usable' state, and supports `lots of' POSIX, and runs on a number
  of different PC configurations. For further information, send mail to
  <vsta-request@cisco.com>, or ftp to
  <URL:ftp://ftp.cygnus.com/pub/embedded/vsta>.

[Chorus, Clouds?, Choices?]

------------------------------

Subject: [1.4.2] Unix lookalikes
From: Available software

- FreeBSD is available via ftp from
  <URL:ftp://ftp.freebsd.org/pub/FreeBSD>,
  <URL:ftp://ftp.cosy.sbg.ac.at/pub/mirror/FreeBSD>, and
  <URL:ftp://pdq.coe.montana.edu/pub/mirrors/unix/freebsd>. The latest
  version is derived from 4.4BSD Lite, and contains many extensions.
  See <URL:http://www.freebsd.org> for further information.

- NetBSD is available via ftp from
  <URL:ftp://ftp.netbsd.org/pub/NetBSD>, and is also derived from
  4.4BSD Lite.
  See <URL:http://www.netbsd.org> for more information.

- Linux is available via anonymous ftp from
  <URL:ftp://tsx-11.mit.edu/pub/linux>,
  <URL:ftp://ftp.funet.fi/pub/OS/Linux>, and
  <URL:ftp://sunsite.unc.edu/pub/Linux>. It is a freely-distributable
  System V compatible Unix, and is covered by the GNU General Public
  License. Linux runs on almost all PCs with i386 or better CPUs and at
  least 4 megabytes of memory. See <URL:http://www.linux.org> for
  further details.

- 386BSD is available via ftp from
  <URL:ftp://agate.berkeley.edu/pub/386BSD> or
  <URL:ftp://ftp.uu.net:systems/unix/386BSD>. It lies mid-way between
  4.3BSD Reno and 4.4BSD internally, and contains no AT&T-copyrighted
  code. 386BSD runs on ISA bus PCs with i386 or better CPUs.

  Use of 386BSD is not recommended, since it is unstable and has long
  since been superseded by FreeBSD and NetBSD.

- The Hurd is the GNU operating system, being written by Michael
  Bushnell. It is based on Mach 3.0, and should be available on most
  systems to which Mach has been ported. A preliminary runnable image
  may be fetched from
  <URL:ftp://alpha.gnu.ai.mit.edu/gnu/hurd-snap.tar.gz>.

  Trent A. Fisher <trent@gnurd.uu.pdx.edu> runs an unofficial Hurd page
  at <URL:http://www.cs.pdx.edu/~trent/gnu/hurd.html>.

- Lites is a free 4.4BSD-based Unix server which runs on top of Mach.
  Lites provides binary compatibility with 4.4BSD, NetBSD (0.8, 0.9,
  and 1.0), FreeBSD (1.1.5 and 2.0), 386BSD, UX (4.3BSD), and Linux on
  the i386 platform. It has also been ported to the pc532 and PA-RISC.
  Preliminary ports to the R3000 and Alpha processors have also been
  made.

  For more information, see the Lites home page at
  <URL:http://www.cs.hut.fi/lites.html>, and see also
  <URL:http://www.cs.utah.edu/projects/flux/lites/html>.

------------------------------

Subject: [1.4.3] Others
From: Available software

[93-03-18-10-19.02] Microsoft is making sources of Windows NT available
under license to universities and research laboratories.
You should have the appropriate officials contact <ntsrcreq@microsoft.com> to get started on this process. Patrick Bridges' operating systems home page at <URL:http://www.cs.arizona.edu/people/bridges/oses.html> is an excellent source of information on a variety of other operating systems. ------------------------------ Subject: [2] Performance and workload studies From: Performance and workload studies This section covers various different publicly-available traces and studies, libraries and source distributions, which may be of use. ------------------------------ Subject: [2.1] TCP internetwork traffic characteristics From: Performance and workload studies - The Internet Traffic Archive is a moderated repository to support widespread access to traces of Internet network traffic. The traces can be used to study network dynamics, usage characteristics, and growth patterns, as well as providing the grist for trace-driven simulations. The archive is also open to programs for reducing raw trace data to more manageable forms, for generating synthetic traces, and for analyzing traces. The archive is available on the Web at <URL:http://town.hall.org/Archives/pub/ITA>. There you will find a description of the archive, its associated mailing lists, the moderation policy and submission guidelines, and the contents of the archive (traces and programs). - [92-10-20-15-04.39] Peter Danzig and Sugih Jamin of USC have made available a report and a source library which simulates realistic day-to-day network traffic between nodes. The library, tcplib, `is motivated by our observation that present-day wide-area tcp/ip traffic cannot be accurately modeled with simple analytical expressions, but instead requires a combination of detailed knowledge of the end-user applications responsible for the traffic and certain measured probability distributions'. 
The technical report and the source library it describes are available via anonymous ftp from <URL:ftp://jerico.usc.edu/pub/jamin/tcplib>. All you need to transfer to use the library are: README, brkdn_dist.h, tcpapps.h, tcplib.1, and one of libtcp* that matches your setup. You need tcplib.tar.Z only if you must generate the library yourself. The file tcplibtr.ps.Z is the PostScript version of the report. The authors may be contacted at <traffic@excalibur.usc.edu>. - [93-08-09-15-15.54] Vern Paxson of Lawrence Berkeley Laboratories has a report available via anonymous ftp which describes analytic models for wide-area TCP connections based upon a set of wide-area traffic traces. The report may be obtained from <URL:ftp://ftp.ee.lbl.gov/WAN-TCP-models.{1,2}.ps.Z>. - [93-05-13-10-54.09] Vern Paxson also has made available another report, <URL:ftp://ftp.ee.lbl.gov/WAN-TCP-growth-trends.ps.Z>, which provides an analysis of the growth trends of a medium-sized research laboratory's wide-area TCP connections over a period of more than two years. ------------------------------ Subject: [2.2] File system traces From: Performance and workload studies - Randy Appleton <randy@dcs.uky.edu> has a set of filesystem traces which detail every operation performed during a period of more than a week (several hundred thousand events). Timestamps on the traces are accurate to under a millisecond. For more details, contact the author, or visit <URL:http://www.dcs.uky.edu/~randy/Research/index.html>. - Chris Ruemmler has done a study on low-level disk access patterns for a workstation, a server, and a time-shared system which appeared in the Winter 1993 USENIX proceedings. A copy may be obtained via anonymous ftp from <URL:ftp://ftp.hpl.hp.com/wilkes/HPL-92-152.ps.Z>. - Stephen Russell <smr@cs.unsw.oz.au> has instrumented the SunOS 4.1.x kernel running on Sun 3 machines. The system allows time-stamped event records to be obtained from various points in the kernel. 
  Events can be categorised (e.g., paging, file system), and are read
  via pseudo-devices. Ioctl calls allow substreams to be enabled or
  disabled, buffer status to be checked, and so on. An external high
  resolution timer is used for timestamping.

- [93-05-09-09-23.32] The traces used in `Measurements of a distributed
  file system' (SOSP 1991) may be obtained from
  <URL:http://now.cs.berkeley.edu/Xfs/SpriteTraces>.

------------------------------

Subject: [2.3] Modern Unix file and block sizes
From: Performance and workload studies

The following sections are lifted more or less verbatim from a number
of traces which were co-ordinated and analysed by Gordon Irlam
<gordoni@home.base.com>. The numbers quoted below are based on Unix
file size data for 12 million files, residing on 1000 file systems,
with a total size of 250 gigabytes.

Further information may be obtained on the World Wide Web at
<URL:http://www.base.com/gordoni/ufs93.html>.

------------------------------

Subject: [2.3.1] File sizes
From: Performance and workload studies

There is no such thing as an average file system. Some file systems
have lots of little files. Others have a few big files. However, as a
mental model the notion of an average file system is invaluable.

The following table gives a breakdown of file sizes and the amount of
space they consume.

   file size    #files  %files  %files  disk space  %space  %space
(max. bytes)                      cum.        (Mb)            cum.
           0    147479     1.2     1.2         0.0     0.0     0.0
           1      3288     0.0     1.2         0.0     0.0     0.0
           2      5740     0.0     1.3         0.0     0.0     0.0
           4     10234     0.1     1.4         0.0     0.0     0.0
           8     21217     0.2     1.5         0.1     0.0     0.0
          16     67144     0.6     2.1         0.9     0.0     0.0
          32    231970     1.9     4.0         5.8     0.0     0.0
          64    282079     2.3     6.3        14.3     0.0     0.0
         128    278731     2.3     8.6        26.1     0.0     0.0
         256    512897     4.2    12.9        95.1     0.0     0.1
         512   1284617    10.6    23.5       566.7     0.2     0.3
        1024   1808526    14.9    38.4      1442.8     0.6     0.8
        2048   2397908    19.8    58.1      3554.1     1.4     2.2
        4096   1717869    14.2    72.3      4966.8     1.9     4.1
        8192   1144688     9.4    81.7      6646.6     2.6     6.7
       16384    865126     7.1    88.9     10114.5     3.9    10.6
       32768    574651     4.7    93.6     13420.4     5.2    15.8
       65536    348280     2.9    96.5     16162.6     6.2    22.0
      131072    194864     1.6    98.1     18079.7     7.0    29.0
      262144    112967     0.9    99.0     21055.8     8.1    37.1
      524288     58644     0.5    99.5     21523.9     8.3    45.4
     1048576     32286     0.3    99.8     23652.5     9.1    54.5
     2097152     16140     0.1    99.9     23230.4     9.0    63.5
     4194304      7221     0.1   100.0     20850.3     8.0    71.5
     8388608      2475     0.0   100.0     14042.0     5.4    77.0
    16777216       991     0.0   100.0     11378.8     4.4    81.3
    33554432       479     0.0   100.0     11456.1     4.4    85.8
    67108864       258     0.0   100.0     12555.9     4.8    90.6
   134217728        61     0.0   100.0      5633.3     2.2    92.8
   268435456        29     0.0   100.0      5649.2     2.2    95.0
   536870912        12     0.0   100.0      4419.1     1.7    96.7
  1073741824         7     0.0   100.0      5004.5     1.9    98.6
  2147483647         3     0.0   100.0      3620.8     1.4   100.0

A number of observations can be made:

- the distribution is heavily skewed towards small files

- but it has a very long tail

- the average file size is 22k

- pick a file at random: it is probably smaller than 2k

- pick a byte at random: it is probably in a file larger than 512k

- 89% of files take up 11% of the disk space

- 11% of files take up 89% of the disk space

Such a heavily skewed distribution of file sizes suggests that, if one
were to design a file system from scratch, it might make sense to
employ radically different strategies for small and large files.

The seductive power of mathematics allows us to treat a 200 byte and a
2MB file in the same way. But do we really want to?
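
The summary figures in the list of observations can be re-derived from
the table itself. The following is only a rough sketch in Python, not
part of the original survey: the rows are copied from the table, the
names summarise and bucket_reaching_half are made up for illustration,
and it assumes that `Mb' means 2^20 bytes and that every file is
counted in the first bucket whose maximum size it does not exceed.

```python
# Rows copied from the file size table: (max_bytes, files, disk_space_mb).
ROWS = [
    (0, 147479, 0.0), (1, 3288, 0.0), (2, 5740, 0.0), (4, 10234, 0.0),
    (8, 21217, 0.1), (16, 67144, 0.9), (32, 231970, 5.8),
    (64, 282079, 14.3), (128, 278731, 26.1), (256, 512897, 95.1),
    (512, 1284617, 566.7), (1024, 1808526, 1442.8),
    (2048, 2397908, 3554.1), (4096, 1717869, 4966.8),
    (8192, 1144688, 6646.6), (16384, 865126, 10114.5),
    (32768, 574651, 13420.4), (65536, 348280, 16162.6),
    (131072, 194864, 18079.7), (262144, 112967, 21055.8),
    (524288, 58644, 21523.9), (1048576, 32286, 23652.5),
    (2097152, 16140, 23230.4), (4194304, 7221, 20850.3),
    (8388608, 2475, 14042.0), (16777216, 991, 11378.8),
    (33554432, 479, 11456.1), (67108864, 258, 12555.9),
    (134217728, 61, 5633.3), (268435456, 29, 5649.2),
    (536870912, 12, 4419.1), (1073741824, 7, 5004.5),
    (2147483647, 3, 3620.8),
]


def bucket_reaching_half(rows, column):
    """Return max_bytes of the first bucket at which the running total
    of the given column reaches half of that column's grand total."""
    half = sum(row[column] for row in rows) / 2
    running = 0
    for row in rows:
        running += row[column]
        if running >= half:
            return row[0]
    raise ValueError("empty table")


def summarise(rows):
    """Hypothetical helper: recompute the headline figures."""
    total_files = sum(files for _, files, _ in rows)
    total_mb = sum(mb for _, _, mb in rows)
    small = [row for row in rows if row[0] <= 16384]
    return {
        # Assumption: one "Mb" in the table is 2**20 bytes.
        "mean_bytes": total_mb * 2**20 / total_files,
        # Bucket holding the median file (pick a file at random).
        "median_file_bucket": bucket_reaching_half(rows, 1),
        # Bucket holding the median byte (pick a byte at random).
        "median_byte_bucket": bucket_reaching_half(rows, 2),
        "pct_files_up_to_16k": 100 * sum(r[1] for r in small) / total_files,
        "pct_space_up_to_16k": 100 * sum(r[2] for r in small) / total_mb,
    }


if __name__ == "__main__":
    for name, value in summarise(ROWS).items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the mean comes out at roughly 22k, the median
file falls in the bucket of files up to 2k, and the median byte falls
in the bucket of files between 512k and 1M, in line with the
observations listed above.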
Are there any problems in engineering where the same techniques would
be used in handling physical objects that span 6 orders of magnitude?
A quote from sci.physics that has stuck with me: `When things change by
2 orders of magnitude, you are actually dealing with fundamentally
different problems'.

People I trust say they would have expected the tail of the above
distribution to have been even longer. There are at least some files in
the 1-2G range. They point out that DBMS shops with really large files
might have been less inclined to respond to a survey like this than
some other sites. This would bias the disk space figures, but it would
have no appreciable effect on file counts. The results gathered would
still be valuable because many static disk layout issues are determined
by the distribution of small files and are largely independent of the
potential existence of massive files.

(It should be noted that many popular DBMSs, such as Oracle, Sybase,
and Informix, use raw disk partitions instead of Unix file systems for
storing data, hence the difficulty in gathering data about them in a
uniform way.)

------------------------------

Subject: [2.3.2] Block sizes
From: Performance and workload studies

The last block of a file is normally only partially occupied, and so as
block sizes increase, so too does the amount of wasted disk space.

The following historical values for the design of the BSD FFS are given
in `Design and implementation of the 4.3BSD Unix operating system':

fragment size    overhead
      (bytes)         (%)
          512         4.2
         1024         9.1
         2048        19.7
         4096        42.9

Files have clearly gotten larger since then; I obtained the following
results:

fragment size    overhead
      (bytes)         (%)
          128         0.3
          256         0.6
          512         1.1
         1024         2.5
         2048         5.4
         4096        12.3
         8192        27.8
        16384        61.2

By default the BSD FFS typically uses a 1k fragment size. Perhaps this
size is no longer optimal and should be increased.

(The FFS block size is constrained to be no more than 8 times the
fragment size. Clustering is a good way to improve throughput for FFS
based file systems, but it doesn't do very much to reduce the not
insignificant FFS computational overhead.)

It is interesting to note that even though most files are less than 2K
in size, having a 2K block size wastes very little space, because disk
space consumption is so totally dominated by large files.

------------------------------

Subject: [2.3.3] Inode ratios
From: Performance and workload studies

The BSD FFS statically allocates inodes. By default one inode is
allocated for every 2K of disk space. Since an inode consumes 128
bytes, this means that by default 6.25% of disk space is consumed by
inodes.

It is important not to run out of inodes, since any remaining disk
space is then effectively wasted. Despite this, allocating one inode
for every 2K is excessive.

For each file system studied I worked out the minimum sized disk it
could be placed on. Most disks needed to be only marginally larger than
the size of their files, but a few disks, having much smaller files
than average, needed a much larger disk---a small disk had insufficient
inodes.

bytes per    overhead
    inode         (%)
     1024        12.5
     2048         6.3
     3072         4.5
     4096         4.2
     5120         4.4
     6144         4.9
     7168         5.5
     8192         6.3
     9216         7.2
    10240         8.3
    11264         9.5
    12288        10.9
    13312        12.7
    14336        14.6
    15360        16.7
    16384        19.1
    17408        21.7
    18432        24.4
    19456        27.4
    20480        30.5

Clearly, the current default ratio of 2K of data per inode is too
small.

Earlier results suggested that allocating one inode for every 5-6k was
in some sense optimal, and allocating one inode for every 8k would only
be 0.4% worse. The new data suggests one inode for every 4k is optimal,
and allocating one inode for every 8k would be 2.1% worse.

The analysis technique I used is very sensitive to even a few file
systems with very small files. The main source of file systems with
lots of small files would appear to be netnews servers. The typical
Usenet message would appear to be 1-2k in length.
Ignoring such file systems would drastically alter the conclusions I reach. If, as I believe might already be the case, news servers are manually tuned to have a lower than normal bytes per inode ratio, it would then be possible to justify setting the default ratio much higher. Clearly it is best if the file system dynamically allocate inodes; I believe AIX does this for instance. Systems that statically allocate inodes should probably increase the bytes per inode ratio, but it is not clear to exactly what value. The engineer in me says `it is important to play this one conservatively: stick to 6k', the artist goes `as Chris Torek says: aesthetics, 8k'. ------------------------------ Subject: [3] Papers, reports, and bibliographies From: Papers, reports, and bibliographies Network-available documents are listed in this section. I'd like to see information for obtaining other sets of reports which aren't electronically-available included here as well, at some stage. ------------------------------ Subject: [3.1] From where are papers for distributed systems available? 
From: Papers, reports, and bibliographies

Amoeba
    <URL:ftp://ftp.cs.vu.nl/amoeba>
    <URL:http://www.cs.vu.nl/vakgroepen/cs/amoeba.html>
    <URL:ftp://ftp.cse.ucsc.edu/pub/amoeba>
Arjuna
    <URL:ftp://arjuna.ncl.ac.uk/pub/Arjuna>
Choices
    <URL:ftp://choices.cs.uiuc.edu/Papers>
Chorus
    <URL:ftp://ftp.chorus.fr/pub/chorus-reports>
    <URL:ftp://cse.ogi.edu/pub/chorus/reports>
Clouds
    <URL:ftp://helios.cc.gatech.edu/pub/papers>
Cronus
    <URL:ftp://pineapple.bbn.com/doc>
Mungi
    <URL:http://i30www.ira.uka.de/projects/cosy/index.html>
ExOS
    <URL:http://www.pdos.lcs.mit.edu/exo.html>
Flexmach
    <URL:http://www.cs.utah.edu/projects/flexmach>
Fox
    <URL:http://www.cs.cmu.edu/afs/cs.cmu.edu/project/fox/mosaic/HomePage.html>
Guide
    <URL:ftp://ftp.imag.fr/pub/GUIDE/doc>
Horus
    <URL:ftp://ftp.cs.cornell.edu/pub/Horus>
Isis
    <URL:ftp://ftp.cse.ucsc.edu/pub/bib/isis.bib>
    <URL:ftp://ftp.cs.cornell.edu/pub>
Mach
    <URL:ftp://mach.cs.cmu.edu/doc>
    <URL:http://www.cs.cmu.edu/afs/cs.cmu.edu/project/mach/public/www/mach.html>
    <URL:http://riwww.osf.org:8001/os/index.html>
    <URL:http://www.cs.utah.edu/projects/flexmach/mach4/html/Mach4-proj.html>
Nebula
    <URL:http://www.sys.cse.psu.edu/NEBFS/nebula.html>
PEACE
    <URL:http://www.gmd.de/FIRST/peace/peace.html>
Plan 9
    <URL:ftp://plan9.att.com/plan9/plan9doc>
    <URL:http://www.ecf.toronto.edu/plan9>
    <URL:http://plan9.att.com/plan9/plan9doc>
    <URL:http://cooper.edu:9000/~rp/plan9/plan9-info.html>
    <URL:ftp://plan9.att.com/plan9/plan9man>
RTmach
    <URL:http://www.cs.cmu.edu:8001/afs/cs.cmu.edu/project/art-6/www/rtmach.html>
Spring
    <URL:http://www.sun.com/technology-research/spring>
SUNMOS / Puma
    <URL:http://www.cs.sandia.gov/~rolf/puma/puma.html>
Tigger
X kernel / Scout
    <URL:ftp://cs.arizona.edu/pub/xkernel>
    <URL:http://www.cs.arizona.eduxkernel/www>

Papers covering Amoeba, Choices, Chorus, Clouds, the Hurd, Guide, Mach,
Mars, NonStop, and Plan 9 are also available via anonymous ftp from
<URL:ftp://ftp.funet.fi/pub/doc/OS>.
[I'd like to find the authoritative home for V. Mars and NonStop are a bit more obscure, I think; they certainly aren't asked after much.]

------------------------------

Subject: [3.2] Where can I find other papers?
From: Papers, reports, and bibliographies

Angel
  <URL:ftp://ftp.cs.city.ac.uk/papers>
Apertos
  <URL:http://www.csl.sony.co.jp/project/Apertos>
Cache kernel
  <URL:http://www-dsg.stanford.edu/papers/cachekernel/main.html>
Hive
  <URL:http://www-flash.stanford.edu/OS>
Mungi
  <URL:ftp://ftp.vast.unsw.edu.au/pub/Mungi>
KeyKOS
  <URL:ftp://cs.dartmouth.edu/pub/sasos/papers/KeyKOS>
Pegasus
  <URL:http://www.cl.cam.ac.uk/Research/SRG/pegasus.html>
QNX [93-09-19-22-22.26]
  <URL:ftp://ftp.cse.ucsc.edu/pub/qnx>
  <URL:ftp://ftp.qnx.com/pub/papers>
  <URL:http://www.qnx.com>
Solaris 2.x [93-02-23-12-12.43]
  <URL:ftp://opcom.sun.ca/pub/docs/papers>
  <URL:ftp://opcom.sun.ca/pub/docs/solaris>
SPIN
  <URL:http://www.cs.washington.edu:80/research/projects/spin/www>
Synthetix
  <URL:http://www.cse.ogi.edu/DISC/projects/synthetix>
VSTa
  <URL:http://www.cen.uiuc.edu/~jeske/VSTa>
Windows NT [92-09-18-11-46.16]
  <URL:ftp://ftp.uu.net/vendor/microsoft/win32-api>
  <URL:ftp://ftp.uu.net/vendor/microsoft/isv-communications>

------------------------------

Subject: [3.3] Where can I find bibliographies?
From: Papers, reports, and bibliographies

Distributed shared memory
  <URL:http://www.cs.uno.edu/~rasit/dsmbiblio.html>
Load balancing
  <URL:ftp://ftp.cse.ucsc.edu/pub/bib/load-balancing.bib>
Mobile computing
  <URL:ftp://ftp.comp.lancs.ac.uk/pub/mpg>
Multimedia operating systems [94-04-15-23-29.51]
  <URL:ftp://cs.ucsd.edu/pub/multimedia>
  <URL:ftp://ftp.cse.ucsc.edu/pub/bib/mmos.bib>
Object-oriented operating systems
  <URL:ftp://ftp.cse.ucsc.edu/pub/bib/ooos.bib.Z>
  <URL:ftp://ftp.inria.fr/INRIA/bib/ooos.bib.gz>
Parallel and distributed I/O
  <URL:ftp://ftp.cse.ucsc.edu/pub/bib/io.bib>
Sprite network operating system
  <URL:ftp://ftp.cs.berkeley.edu/ucb/sprite/sprite.html>

See also the section on General Internet-accessible resources.

[There's quite a lot more at <URL:ftp://ftp.cse.ucsc.edu/pub/bib>, if anyone wants to add more to this list.]

------------------------------

Subject: [4] General Internet-accessible resources
From: General Internet-accessible resources

This section contains information about a variety of services available to the OS research community via the Internet.

------------------------------

Subject: [4.1] Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers
From: General Internet-accessible resources

[92-09-21-16-38.23] The Loughborough University high-performance networking and distributed systems archive may be accessed via the World Wide Web at <URL:http://hill.lut.ac.uk/DS-Archive>.

This archive contains, according to Jon Knight <J.P.Knight@lut.ac.uk>, the organiser:

- Technical reports and papers written at LUT by the networks and distributed systems researchers in the Department of Computer Studies.
- Technical reports, papers and theses which have been produced at other sites and then made available for public electronic access.
- Software which is of use in research or which has been produced by a specific research project.
- Details of relevant conferences, collected from a variety of sources (USENET, email, flyers, etc).
- Information on ongoing research projects.
- Bibliographies that have been generated for research at LUT and also access to other WAIS indexed bibliographies, both at LUT and elsewhere.
- A list of contacts in the field, with details of their research interests. This is entirely voluntary (i.e. people have agreed to Jon entering their details rather than him just rooting round the Internet to build up the information).

Bibliographies in the comp.os.research collection are accessible via WAIS from UCSC.

  (:source
     :version 3
     :ip-address "128.114.134.19"
     :ip-name "ftp.cse.ucsc.edu"
     :tcp-port 210
     :database-name "os-bibliographies"
     :cost 0.00
     :cost-unit :free
     :maintainer "paul@cse.ucsc.edu"
     :description "Server created with WAIS release 8 b5 on Jul 9 22:38:27 1992 by paul@cse.ucsc.edu
  The files of type bibtex used in the index were:
     /home/ftp/pub/bib"
  )

------------------------------

Subject: [4.2] Refdbms---a distributed bibliographic database system
From: General Internet-accessible resources

[92-10-01-11-39.32] The 13th alpha release of refdbms version 3, developed by John Wilkes of the Concurrent Systems Project at Hewlett-Packard Laboratories and Richard Golding of the Concurrent Systems Laboratory at UC Santa Cruz, is now available. It can be obtained by anonymous ftp from <URL:ftp://ftp.cse.ucsc.edu/pub/refdbms>.

The system has been tested on Sun 3 and 4 systems running SunOS 4.1.x, and on DECstations running Ultrix 4.1. It is an experiment in building weak-consistency wide-area distributed applications, and the databases currently available for the system have good systems coverage.

The system includes tools to query the database, to produce bibliographies for LaTeX documents, and to enter new references into the database. It is part of ongoing research into wide-area distributed information systems on the Internet.

Features include:

- Distributed databases: a reference database can be shared among multiple sites.
  Updates can be entered at any site, and will be propagated to the other sites holding a replica of the database.
- Multiple databases: every database has a name, and users specify the order in which databases will be searched.
- Private databases: databases can be private, available site-wide, or they can be made available to other sites.
- Database query by keyword, author, and title word.
- Translator for refer-format databases.
- Usable with LaTeX documents: the internal refdbms format can be translated into a special BibTeX format.

An up-to-date list of bibliographies exported by various institutions may be obtained using anonymous ftp from <URL:ftp://ftp.cse.ucsc.edu/pub/refdbms/current-databases>.

------------------------------

Subject: [4.3] Willow -- the information looker-upper
From: General Internet-accessible resources

The University of Washington's Willow system provides a Motif-based user interface to a heterogeneous collection of on-line bibliographic databases. It will compile and run on most systems which provide a Motif library.

For further information, see the Willow home page at <URL:http://www.cac.washington.edu/willow/home.html>.

------------------------------

Subject: [4.4] Computer science bibliographies and technical reports
From: General Internet-accessible resources

- A collection of bibliographies in various fields of computer science is available via anonymous ftp and the World Wide Web. The bibliographies contain about 260,000 references, most of which are references to journal articles, conference papers or technical reports. The collection has been formed by using various freely accessible services in the Internet (anonymous ftp, mailserver, wais, telnet) and converting each bibliography into a uniform BibTeX format. It is organised in files containing references to a (more or less) specific area within computer science. The database has been organised by Alf-Christian Achilles <achilles@ira.uka.de>.
  It may be accessed on the Web at <URL:http://liinwww.ira.uka.de/bibliography/index.html>, via ftp from <URL:ftp://ftp.cs.umanitoba.ca/pub/bibliographies>, and through a more useful search mechanism on the Web at <URL:http://glimpse.cs.arizona.edu/1994/bib>.

- As part of the ARPA Electronic Library Project, the Database Group at Stanford is providing a Selective Dissemination of Information (SDI) service to disseminate information about computer science technical reports. You can have a server email you periodic announcements of new papers on topics that interest you. See <URL:http://cs-tr.cs.cornell.edu/Info/cstr.html> for details, or contact Tak Yan <tyan@cs.stanford.edu> or the mail server itself at <elib@db.stanford.edu>.

------------------------------

Subject: [4.5] The comp.os.research archive
From: General Internet-accessible resources

[93-02-18-21-18.31] An archive of all messages posted to comp.os.research since 1988 is maintained at UC Santa Cruz. It may be accessed via anonymous ftp at <URL:ftp://ftp.cse.ucsc.edu/pub/comp.os.research>. The archive is organised by year.

Postings may also be found via WAIS at UCSC's Computer Science gopher hole:

  (:source
     :version 3
     :ip-address "128.114.134.19"
     :ip-name "ftp.cse.ucsc.edu"
     :tcp-port 210
     :database-name "comp-os-research"
     :cost 0.00
     :cost-unit :free
     :maintainer "paul@cse.ucsc.edu"
     :description "Server created with WAIS release 8 b5 on Jul 9 03:51:11 1992 by paul@cse.ucsc.edu
  The files of type netnews used in the index were:
     /home/ftp/pub/comp.os.research"
  )

------------------------------

Subject: [4.6] Miscellaneous resources
From: General Internet-accessible resources

- Paul Harrington <phrrngtn@dcs.st-andrews.ac.uk> maintains a World Wide Web page on checkpointing, at <URL:http://warp.dcs.st-and.ac.uk/warp/systems/checkpoint>.

- Jay Lepreau <lepreau@cs.utah.edu> has made available an electronic version of the proceedings of OSDI '94 at <URL:http://www.cs.utah.edu/~lepreau/osdi94>.
  Available are such things as

    - Papers: abstracts, papers, slides, bibtex entries, and for most, the actual software.
    - Keynote: audio and slides
    - Extensible OS panel: audio, slides, project URLs
    - Insularity panel: audio
    - Mach/Chorus workshop: TRs for most, slides, some software
    - Tutorials: slides for half, descriptions for all
    - Miscellaneous: summary report from ;login, list of works-in-progress talks, hard-copy proceedings ordering info, CFP, proceedings introduction, list of referees.

------------------------------

Subject: [5] Disclaimer and copyright
From: Disclaimer and copyright

Note that this document is provided as is. The information in it is not warranted to be correct; you use it at your own risk.

Following recent reports on the <faq-maintainers@mit.edu> list I think it wise to change the copyright:

NOTICE OF COPYRIGHT AND PERMISSIONS

Answers to Frequently Asked Questions for comp.os.research (hereafter referred to as These Articles) are Copyright (C) 1993, 1994, 1995, and 1996 by Bryan O'Sullivan <bos@serpentine.com>. They may be reproduced and distributed in whole or in part, subject to the following conditions:

- This copyright and permission notice must be retained on all complete or partial copies of These Articles.
- These Articles may be copied or distributed in part or in full for personal or educational use. Any translation, derivative work, or copies made for other purposes must be approved by the copyright holder before distribution, unless otherwise stated.
- If you distribute These Articles, instructions for obtaining the complete current versions of them free or at cost price must be included. Redistributors must make reasonable efforts to maintain current copies of These Articles.

Exceptions to these rules may be granted, and I shall be happy to answer any questions about this copyright notice -- write to Bryan O'Sullivan, PO Box 62215, Sunnyvale, CA 94088-2215, USA or email <bos@serpentine.com>.
These restrictions are here to protect the contributors, not to restrict you as educators and learners.