
Laboratory of Molecular Biophysics
Laboratory Journal 2002
R. K. Bryan



Computing
R. K. Bryan

1.1.1 Central Services.

We have made a significant upgrade to the central file services and backup facilities. Important considerations are the availability and security of the storage, as well as its capacity and speed of access. The model of a RAID storage system served via dual computer systems, up to now provided by the StorageWorks array and AlphaServer systems, has given reliable service for the last five years. It provides resilience against the most common causes of loss of access to data, such as disk failure and system crashes, as well as providing access to the storage during a software upgrade on one of the systems. A fully redundant system with no single point of failure would, however, require considerably more hardware (dual storage systems, dual controllers and dual connections for all possible data paths).
Equally important is the software running on the file servers, which coordinates disk access between the servers and ensures that external clients are provided with an appropriate access path to the storage whenever one or more of the servers is functioning. After some investigation of such 'high-availability' software and demonstrations (some of which showed that potential software packages did not perform as advertised), we decided on the 'Convolo' package from Mission Critical Linux. Although less sophisticated than VMS clustering, Convolo provides several important mechanisms for checking the availability of the servers, including both serial-line and dedicated ethernet interfaces for heartbeat, which enable each server to ascertain the status of the other. The file services are seen by the clients as being served from a number of 'virtual' host addresses, which are switched between systems either when failover occurs or by management command. The services for specific filesystems are associated with these virtual addresses, so a change to the alternate server is completely transparent to the client, involving only an update of its ARP table to reflect the change of host hardware address.
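The failover behaviour described above can be illustrated with a minimal sketch. It is not Convolo's actual logic: the server names, hardware addresses, virtual host names and the rule that a peer counts as alive while either heartbeat channel responds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str                  # hypothetical node name
    mac: str                   # hardware address that clients hold in their ARP table
    serial_ok: bool = True     # heartbeat seen on the serial line
    ethernet_ok: bool = True   # heartbeat seen on the dedicated ethernet link

    def alive(self) -> bool:
        # Assumption: the peer is considered up while either channel responds.
        return self.serial_ok or self.ethernet_ok


def assign_services(virtual_hosts, preferred, servers):
    """Map each virtual host address to a live server, preferring its usual owner."""
    live = [s for s in servers if s.alive()]
    if not live:
        raise RuntimeError("no server available")
    table = {}
    for vhost in virtual_hosts:
        owner = preferred[vhost]
        # Fail over to the first live server if the usual owner is down.
        table[vhost] = owner if owner.alive() else live[0]
    return table


if __name__ == "__main__":
    a = Server("node-a", "00:00:00:00:00:0a")
    b = Server("node-b", "00:00:00:00:00:0b")
    preferred = {"vhost1": a, "vhost2": b}
    a.serial_ok = a.ethernet_ok = False  # simulate node-a going down
    for vhost, server in assign_services(["vhost1", "vhost2"], preferred, [a, b]).items():
        print(vhost, "->", server.name, server.mac)
```

In this sketch the client keeps using the same virtual host name throughout; only the hardware address behind it changes, which corresponds to the ARP table update mentioned above.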
Reliable hardware is also essential, and we therefore required well-constructed server systems that do not fail, as cheap hardware often does, for trivial reasons such as poor-quality power supplies or inadequate ventilation. The final choice of hardware was
All the above fit into one standard rack, with space still available for expanding the storage, either with additional disk arrays or greater magazine capacity for the SDLT robot. The SuperDLT tapes are a development of the DLT technology that we have been using with great success since 1994, and are backwards compatible with the existing DLT IV tapes.
Each system has its own system disk, which allows it to operate should the other fail or be unavailable due to maintenance or upgrading, but does mean that there is a certain duplication of effort when installing or upgrading.
The storage array has a separate SCSI connection to each system, which allows one system to be powered off and disconnected without affecting disk access by the other. It supports the most common RAID levels, but for our purposes it has been configured as a RAID5 array, so a single disk can fail with no loss of data. It is partitioned into four logical disk devices, and each of these is further partitioned in software to present appropriately-sized filesystems to the users.
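The reason a RAID5 array survives a single disk failure is that each stripe carries a parity block, the byte-wise XOR of its data blocks, from which any one missing block can be recomputed. The following is a minimal sketch of that principle only; the block contents are made up, and a real array also rotates the parity position across the disks, which is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


if __name__ == "__main__":
    # One stripe spread over three data disks (illustrative contents).
    stripe = [b"lab1", b"data", b"2002"]
    parity = xor_blocks(stripe)  # stored on a fourth disk

    # Suppose the disk holding stripe[1] fails: XOR the survivors with the parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])
```

Losing two blocks of the same stripe leaves too little information for this reconstruction, which is why the guarantee covers only a single disk failure.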
As of October 2002, a number of filesystems are being served via both NFS and Samba, and migration of filesystems currently hosted on the older servers will take place progressively.
At the moment this system is supplementing rather than replacing the AlphaServers, and the following systems continue to be in use:-

1.1.2 Laboratory Network.

The laboratory network continues to be based on a 100Mbit/sec Fast Ethernet network. The only significant addition this year has been a 3Com 4900 12-port Gigabit Ethernet switch. This initially provides connectivity between the Proliant server nodes and the other network switches, with of course the latter links still operating at 100Mbit/sec, but will provide Gigabit links for further servers and upgraded peripheral switches in the future.
An Uninterruptible Power Supply (UPS) has been installed to supply power to the main network components in the computer room (switches and media converter for the University backbone link). This protects against mains power surges and dropouts, and also provides battery backup power for about 15 minutes in case of power failure. This has been invaluable during the series of power disconnections in the building for testing, etc., as the equipment can continue to run while the power source is changed to a temporary supply.
The thinwire network has finally been disconnected, and the other older networks (LocalTalk, FDDI) serve rather few nodes: LocalTalk carries just two older printers, although the FDDI provides a useful addition to the network capacity to the AlphaServers.

1.1.3 Reorganising NIS services and NFS mounting.

A vital component of the network services is the NIS service, which provides password information and other network information to all Unix and Linux client systems. For several years various SGI workstations were being used as both master and slave servers. In Feb 2002 this service was migrated to the AS4100 as master and the dual Linux servers as slaves, a rather fiddly process, as it had to ensure that information was supplied to clients without interruption and that, e.g., password changes were communicated to the correct servers. This was fortuitously performed just a month before the system disk on the previous SGI master failed!
In addition, a set of NIS maps was produced to distribute automatically the NFS automounting information to all clients. This information relates network filesystem names to the appropriate server hosting the disk and the mount point to use, and had previously been maintained separately on each client. After a one-off reconfiguration of the automounter on each client, we are now assured that the disk mounting information is consistent on all clients, and further changes and additions to NFS disk mounting information can be propagated by a single change to the NIS maps. This is invaluable as new disk services are set up and existing ones migrated between servers.
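The benefit of keeping the mounting information in one shared map can be shown with a small sketch. It models the map as a plain dictionary rather than querying NIS, and the filesystem key, server names, export paths and mount root are all hypothetical.

```python
def resolve_mount(nis_map, key, mount_root="/data"):
    """Return (mount point, NFS source) for a filesystem name.

    nis_map stands in for a centrally distributed automount map;
    each value has the assumed form "server:/exported/path".
    """
    server, export = nis_map[key].split(":", 1)
    return f"{mount_root}/{key}", f"{server}:{export}"


if __name__ == "__main__":
    # Hypothetical shared map, as every client would see it.
    shared_map = {"xtal": "oldserver:/export/xtal"}
    print(resolve_mount(shared_map, "xtal"))

    # Migrating the filesystem is a single change to the shared map;
    # every client then resolves the same name to the new server.
    shared_map["xtal"] = "newserver:/export/xtal"
    print(resolve_mount(shared_map, "xtal"))
```

The mount point seen by users stays the same across the migration; only the server behind it changes.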
There are now nearly 200 host addresses registered on the Laboratory network, covering a wide variety of systems:- the central servers as described above; Mark Sansom's group's PC cluster; a large number of desktop systems, including Intel systems running either Windows or Linux, and Apple Macintosh computers; personal laptop computers; and a number of systems dedicated to specific tasks, such as control of the Area Detectors and the Optronics scanner.




Last updated: 11-MAR-2004 13:19