UNIVERSITY MATHEMATICAL LABORATORY, CAMBRIDGE

CAMBRIDGE SUPERVISOR : PLANNING DOCUMENT No. 24
File Master Dump System - Weekly Activity

This document gives details of our proposed system for tidying up material copied from disc to tape so that it can readily be recovered. The system will be implemented as soon as possible and then amended if necessary on the basis of experience.

1. General Principles

Files are written onto magnetic tape as they are produced and there can be no control over the sequence in which this information is written.

Once or twice daily the dump tapes are copied so that two copies of all dumped information exist. After this the system seeks to maintain a minimum of two copies of all dumped information.

The weekly activity is required to obtain the following results:

  1. Remove unwanted files and their copies from the system.
  2. Arrange the dumped information in such a way that recovery is made practical and efficient.

It is expected that about 14K blocks of file space on disc will be used to hold about 10K blocks of permanent file data. Archive facilities might then be allowed to extend this to 20 - 30K blocks of dumped information. At 4K blocks per magnetic tape the dumped volume would cover some 8 magnetic tapes.

Since two copies must be guaranteed and since, in a simple updating cycle, one set of tapes is in preparation, the minimum backing store contains three sets of 8 tapes each. The weekly activity will seek to provide the required support with the fewest tapes above the basic minimum of 24.
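The arithmetic behind these figures can be checked with a short sketch. The function name and the choice to work in units of K blocks are illustrative assumptions; only the 4K, 20 - 30K and three-set figures come from the text.

```python
import math

BLOCKS_PER_TAPE_K = 4  # "4K blocks per magnetic tape"

def tapes_needed(volume_k, per_tape_k=BLOCKS_PER_TAPE_K):
    """Whole tapes needed to hold volume_k thousand blocks (hypothetical helper)."""
    return math.ceil(volume_k / per_tape_k)

tapes_per_set = tapes_needed(30)          # upper estimate of 30K blocks
minimum_backing_tapes = 3 * tapes_per_set  # two guaranteed copies plus one set in preparation

print(tapes_needed(20), tapes_per_set, minimum_backing_tapes)
```

The upper estimate of 30K blocks gives 7.5 tapes, which rounds up to the 8 tapes per set quoted above.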

2. Tape Use

The two main weekend activities which link the dump tapes and backing tapes are compression and sorting. The compression is achieved by removing copies of files which have dump names that do not correspond to the dump names in the file directories at the end of the week in question. Sorting is required so that the updated backing store will contain files organised by owner identity rather than date of creation.

Multi-reel sorts are clumsy to operate and are to be avoided if possible. In view of this, the compression must either result in one reel of dumped information or make a preliminary split on the basis of user identity. In the first instance we shall assume that no split is necessary and that the compressed weekly output does not exceed 4K blocks.

The compressed weekly information must be kept for at least two weeks if we are to guarantee two copies of all information and if the backing store tapes are not to be duplicated. A cycle of three tapes will therefore be used to hold the compressed dump information.

The weekend activity is therefore as follows:

  1. Change over to using a new set of dump tapes.
  2. Compress and sort the past week's data putting the result on a single reel of tape.
  3. Use the compressed dump to update the cycle of backing store tapes.
  4. Release the dump tapes for re-use.

If the "weekend" activity can be guaranteed to be complete before the end of the following week, only two sets of dump tapes will be required. On the assumption that the weekly output can be accommodated on, say, five reels of 4K block tape, some 10 tapes will be required for dumping and a further 10 for copies of these. The total tape requirement would be:

  3 x 8 backing store tapes    = 24
  3 x 1 compressed dump tapes  =  3
  2 x 5 dump tapes             = 10
  2 x 5 archive tapes          = 10
  Total                        = 47
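
The total can be reproduced from the set and reel counts. This is only a check of the figures above; the dictionary layout and function name are illustrative.

```python
# (number of sets, reels per set) for each class of tape, as listed above
TAPE_BUDGET = {
    "backing store": (3, 8),
    "compressed dump": (3, 1),
    "dump": (2, 5),
    "archive": (2, 5),
}

def total_tapes(budget):
    """Sum sets x reels over every class of tape (hypothetical helper)."""
    return sum(sets * reels for sets, reels in budget.values())

print(total_tapes(TAPE_BUDGET))
```

Changing the assumed five reels of weekly output alters only the dump and archive rows.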

3. Tape Titles

The dump tapes have titles of the form

αβDP

where α = set number and β = reel number within the set. On the basis of the figures given above

  α = 1 or 2
  β = 0 (1) 4, i.e. 0 to 4 in steps of 1

The archive tapes which are duplicate copies of dump tapes have titles of the form

αβAC
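
As an illustration, the dump and archive titles implied by these ranges can be enumerated as below. The sketch assumes that α and β are each written as a single decimal digit immediately before the two-letter suffix; the text does not state the exact character layout.

```python
def tape_titles(suffix, sets=(1, 2), reels=range(0, 5)):
    """List titles of the form <set><reel><suffix> (layout is an assumption)."""
    return [f"{a}{b}{suffix}" for a in sets for b in reels]

dump_titles = tape_titles("DP")     # dump tapes
archive_titles = tape_titles("AC")  # their duplicate copies

print(dump_titles)
print(archive_titles)
```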

The backing store tapes have titles of the form

γβwσ

where β = 1 under normal circumstances, γ is the set number and wσ identifies the different tapes in one set. On the basis of the figures given above

γ = 1, 2 or 3

and wσ is a two-letter identifier drawn from a set of 8 different identifiers.

The compressed dump tapes will have titles of the form

γβCP

where β = 1 under normal circumstances and γ is a set number such that if the tape γβCP contains the dumped information for some week t then the tapes γβwσ contain information dumped prior to week t.

The total backing store data up to the end of week t is therefore on tapes γβCP + γβwσ, and if γ1 is the value of γ for week t + 1 then the information on γ1βwσ = information on γβCP + γβwσ less the information deleted during week t.
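
The relation can be restated with sets of dump names standing in for the tape contents. This is a minimal sketch: both function names are hypothetical, and the rotation order 1, 2, 3, 1 for γ is an assumption, since the text says only that γ takes the values 1, 2 or 3.

```python
def next_set_number(gamma, sets=3):
    """Assumed rotation of the set number: 1 -> 2 -> 3 -> 1."""
    return gamma % sets + 1

def carry_forward(backing, compressed, deleted):
    """Contents of the new backing set: old backing set plus the
    compressed dump, less whatever was deleted during the week."""
    return (backing | compressed) - deleted

week_t_backing = {"f1", "f2"}   # hypothetical dump names
week_t_compressed = {"f3"}
deleted_in_week_t = {"f2"}

print(next_set_number(3))
print(sorted(carry_forward(week_t_backing, week_t_compressed, deleted_in_week_t)))
```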

4. Weekend Runs

The single weekend updating activity will be split into two distinct runs.

(a) Compression and Sort

To produce γβCP from αβDP (or αβAC)

(b) Main Update

To produce γ'βwσ from γβwσ and γβCP

5. Dump tape format

The existing dump tape format will be used as standard throughout the system. If, subsequently, it is considered necessary an additional type of marker block will be introduced which marks the beginning of a group of several files.

All tapes will contain identifying data in block 1 (with copies in the following 4 blocks) and the information proper will start at block 20.

Marker blocks are as follows (identified by first half-word):

  0.0    Start of a file
  0.1    Start of a directory
  0.2    Dummy
  0.4    Copy of Control block
  J4     End of data
  .1J4   Unused block
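
A program reading such a tape might classify blocks with a simple lookup. In the sketch below the first half-word is represented by the printed form used in the table; the real value is a machine half-word, so the string encoding and all names here are assumptions.

```python
from enum import Enum

class Marker(Enum):
    # Values are the printed forms from the table, not machine encodings.
    FILE_START = "0.0"
    DIRECTORY_START = "0.1"
    DUMMY = "0.2"
    CONTROL_COPY = "0.4"
    END_OF_DATA = "J4"
    UNUSED = ".1J4"

def classify(first_half_word):
    """Map the printed first half-word of a marker block to its kind."""
    return Marker(first_half_word)

print(classify("0.1").name)
```

An unrecognised value raises `ValueError`, which is one reasonable way to flag a damaged marker block.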

On a dump tape there will always be at least one copy of every directory. Copies of files for one person will also be followed by a copy of his directory. The last thing to be dumped on any one tape will be a directory if that tape is the last for the week.

On the compressed dump tape only one copy of each directory will be held and this will always precede the files for that person. The same arrangement will also apply on the backing store tapes.

Each file owner will be associated with just one backing store tape identifier (i.e. one value of wσ). The compressed dump tape will contain user information sorted in ascending value of backing store tape identifier and in alphabetic order of user title within that. On each backing store tape file owners will be sorted into alphabetic order. The files for one person will be held together but in no particular order.
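
The ordering on the compressed dump tape amounts to a two-level sort key. The sketch below uses hypothetical owner titles and tape identifiers to show the effect.

```python
from typing import NamedTuple

class Owner(NamedTuple):
    title: str    # user title (sample values are hypothetical)
    tape_id: str  # two-letter backing store tape identifier

def compressed_dump_order(owners):
    """Ascending tape identifier, then alphabetic user title within it."""
    return sorted(owners, key=lambda o: (o.tape_id, o.title))

owners = [Owner("SMITH", "AB"), Owner("JONES", "CD"), Owner("BROWN", "AB")]
for owner in compressed_dump_order(owners):
    print(owner.tape_id, owner.title)
```

Because each owner has exactly one tape identifier, restricting this order to a single identifier gives the alphabetic order required on that backing store tape.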

6. Compression and Sort

This run will be carried out in four phases:

(a) Directory summary.

Prepare a list of the dump names of all files to be carried forward onto the compressed dump tape. This list is obtained from the directories on the last dump tape for the week; the latest version of each being used.
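
A minimal sketch of this phase follows, assuming the directories are read in tape order as (owner, dump names) pairs so that a later copy of a directory supersedes an earlier one. The data layout and names are illustrative.

```python
def directory_summary(directories):
    """Return the dump names to carry forward, using only the latest
    copy of each owner's directory. Input is in tape order."""
    latest = {}
    for owner, dump_names in directories:
        latest[owner] = set(dump_names)  # a later copy replaces an earlier one
    wanted = set()
    for names in latest.values():
        wanted |= names
    return wanted

directories = [
    ("A", ["a1", "a2"]),
    ("B", ["b1"]),
    ("A", ["a2", "a3"]),  # later copy of A's directory: a1 has gone
]
print(sorted(directory_summary(directories)))
```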

(b) Compress and split.

Files from all dump tapes for the week are checked against the list of dump names. If required they are copied over to working tapes and the major key is added into the marker block. This is a single half-word in which the m.s. bit = 0 for a file and = J4 for a directory. Dummy blocks and control blocks are ignored and only the most recent copy of each directory is output.

Two working tapes are used and the output seeks to produce an equal number of maximum length strings on each.
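
The following sketch illustrates one way to do this. It keeps only files whose dump names are in the list, cuts the output into naturally ascending strings, and writes them alternately to two working tapes. The use of natural strings is an assumption (the text does not say how maximum length strings are formed), and the major key is treated as an opaque comparable value.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    major_key: str  # stands in for the half-word major key (assumption)
    owner: str
    dump_name: str

def sort_key(rec):
    return (rec.major_key, rec.owner)

def compress_and_split(records, wanted):
    """Drop unwanted files, then distribute ascending strings
    alternately over two working tapes (each a list of strings)."""
    tapes = ([], [])
    run, target = [], 0
    for rec in records:
        if rec.dump_name not in wanted:
            continue  # compression: file no longer in any directory
        if run and sort_key(rec) < sort_key(run[-1]):
            tapes[target].append(run)  # current string ends here
            run, target = [], 1 - target
        run.append(rec)
    if run:
        tapes[target].append(run)
    return tapes

records = [
    Rec("AB", "SMITH", "s1"),
    Rec("AB", "BROWN", "b1"),
    Rec("CD", "JONES", "j1"),
    Rec("AB", "BROWN", "b2"),
]
tape_a, tape_b = compress_and_split(records, {"s1", "b1", "j1"})
print(len(tape_a), len(tape_b))
```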

(c) Merge sort

A 2 x 2 merge sort uses working tapes to sort first on the major key and then on user title. Within one user the order is then arbitrary. The final pass of the merge sort will leave two strings; one on each of two working tapes.

(d) Final merge

The final merge will output the sorted information onto a compressed dump tape using the standard format.
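
Phases (c) and (d) can be sketched as a balanced two-way merge. Each working tape is modelled as a list of sorted strings; a pass merges one string from each input tape and writes the results alternately to two output tapes, until one string remains on each. The function names and the in-memory model are assumptions made for illustration.

```python
import heapq
from itertools import zip_longest

def merge_pass(a, b, key):
    """Merge strings pairwise from tapes a and b onto two output tapes."""
    out = ([], [])
    for i, (ra, rb) in enumerate(zip_longest(a, b, fillvalue=[])):
        out[i % 2].append(list(heapq.merge(ra, rb, key=key)))
    return out

def balanced_merge_sort(a, b, key=lambda r: r):
    # Phase (c): repeat passes until at most one string is left per tape.
    while len(a) > 1 or len(b) > 1:
        a, b = merge_pass(a, b, key)
    # Phase (d): final merge of the two remaining strings.
    first = a[0] if a else []
    second = b[0] if b else []
    return list(heapq.merge(first, second, key=key))

print(balanced_merge_sort([[3, 5], [4]], [[1, 2, 9]]))
```

For the run described here the key would be the pair (major key, user title), leaving the order within one user arbitrary.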

7. Backing Store Update

The backing store tapes are updated by merging with the compressed dump tape. During the merge the file directory on the compressed dump tape is used and that on the backing store tape is dropped. If there is no directory on the compressed dump tape then that file owner is dropped altogether.

On both tapes the directory appears first and a list of current dump names is constructed before merging the files for that user. Any file not in the list is dropped and all others are copied over in no particular sequence.
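
The update rules can be summarised in a short sketch for a single backing store tape. Each tape is modelled as a mapping from owner to a (directory, files) pair, where the directory is a list of current dump names and files maps dump name to contents. This representation and the function name are assumptions.

```python
def update_backing_store(backing, compressed):
    """Merge one backing store tape with the compressed dump tape."""
    updated = {}
    for owner in sorted(compressed):  # owners in alphabetic order
        directory, new_files = compressed[owner]
        _, old_files = backing.get(owner, ([], {}))
        current = set(directory)  # list of current dump names
        candidates = {**old_files, **new_files}
        kept = {n: d for n, d in candidates.items() if n in current}
        updated[owner] = (directory, kept)
    # Owners with no directory on the compressed tape are never visited,
    # so they are dropped altogether.
    return updated

backing = {
    "A": (["a1", "a2"], {"a1": "x", "a2": "y"}),
    "B": (["b1"], {"b1": "z"}),
}
compressed = {"A": (["a2", "a3"], {"a3": "w"})}
print(update_backing_store(backing, compressed))
```

In the example, file a1 is dropped because it is absent from the new directory, and owner B is dropped because no directory for B appears on the compressed dump tape.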

31.3.67


Copyright © 1967 University of Cambridge Computer Laboratory. Distributed by permission. Thanks to Barry Landy, Roger Needham and David Hartley for giving permission to distribute these documents. Thanks to Barry Landy for lending me the paper document from which this was scanned. Any typographical errors probably arose in the course of OCR.

