
Understanding Record Size and Record Delimiters
The seemingly simple concept
of record size often causes confusion when records are converted
from one computer to another. This discussion covers the differences
in fixed-length records between mainframes and PCs. Variable-length
records are not considered here.
Fixed-Length Records Defined
A file that has "Fixed-Length
Records" is a file where the records are of a fixed (unchanging) size.
All records in the file are the same size. Such a file will also
have fixed-length fields. Each field will have a predetermined and
unchanging size, set when the record layout is designed, and the sum of
the field sizes will add up to the record size. If the data stored in a given field contains
fewer characters than the defined size, the rest of the field will be filled
with spaces, or some other character. For example, if the "LASTNAME"
field in a file is set to 15 characters, and the last name is "Smith" there
will be 10 spaces after the name, to fill out the field.
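As a minimal sketch of the padding rule just described, the Python below fills a value out to its field size. The helper name pad_field is hypothetical, and the choice to reject values that are too long (rather than truncate them) is an assumption, not something the article specifies.

```python
def pad_field(value: str, width: int, fill: str = " ") -> str:
    """Left-justify `value` in a field of exactly `width` characters."""
    if len(value) > width:
        # Assumption: an oversized value is an error rather than truncated.
        raise ValueError(f"{value!r} does not fit in {width} characters")
    return value.ljust(width, fill)


# The LASTNAME example from the text: 5 letters plus 10 spaces.
lastname = pad_field("Smith", 15)
print(len(lastname))  # 15
```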
Field Delimiters
Fixed-length records don't have
field delimiters. This is true on both mainframe and PC platforms.
Since the fields are always the same size, they are always in the same
location in the record, and no delimiter is needed to locate any field.
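Locating fields by position alone might look like the following sketch. The three-field layout (LASTNAME, FIRSTNAME, ZIP) and the helper split_fields are made up for illustration; only the 15-character LASTNAME comes from the example above.

```python
# Hypothetical record layout: (field name, size in characters).
LAYOUT = [("LASTNAME", 15), ("FIRSTNAME", 10), ("ZIP", 5)]


def split_fields(record: str, layout=LAYOUT) -> dict:
    """Cut a fixed-length record into fields using only the field sizes."""
    expected = sum(size for _, size in layout)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    fields = {}
    pos = 0
    for name, size in layout:
        # Each field starts where the previous one ended; strip the padding.
        fields[name] = record[pos:pos + size].rstrip(" ")
        pos += size
    return fields


record = "Smith".ljust(15) + "John".ljust(10) + "01886"
fields = split_fields(record)
print(fields["LASTNAME"])  # Smith
```

The field sizes add up to the record size (30 here), which is why the length check can catch a record that was cut with the wrong layout.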
Record Delimiters
As we will see below, fixed-length
records sometimes have record delimiters. A "record delimiter"
is a character or sequence of characters used to mark the end of a
record. If records vary in size, as they do in a variable-length
file, a delimiter is needed to separate the records. A record
delimiter must be a code that is not found in the data; the only place
it will be found is at the end of a record, so every time the computer
finds that code, it knows it has reached the end of the record. The most
common record delimiters are the carriage-return (CR), line-feed (LF),
or carriage-return line-feed (CR-LF) combination.
Accessing Fixed-Length Records
Because the size of each fixed-length
record is known in advance, you don't really need a record delimiter
to locate any record in a fixed-length file. For example, if your
records are 100 bytes in size, then the first record begins at byte 1 of
the file, record 2 begins at 101, record 3 begins at 201, etc., and each
record is always 100 bytes in size.
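The arithmetic behind those positions can be sketched in a few lines of Python. The function name record_start is hypothetical; it counts bytes from 1, as the text does, and assumes records with no delimiters.

```python
def record_start(n: int, record_size: int) -> int:
    """Return the 1-based byte position where record n begins."""
    if n < 1:
        raise ValueError("record numbers start at 1")
    return (n - 1) * record_size + 1


for n in (1, 2, 3):
    print(n, record_start(n, 100))  # 1 1, then 2 101, then 3 201
```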
Technically, fixed-length
records can be accessed perfectly well without delimiters. However,
many PC programs expect record delimiters, even on fixed-length records,
and won't work properly without them.
Fixed-length records take
more disk space than variable-length records, but they have advantages.
For example, there is no need to read the record byte-by-byte, searching
for a record delimiter, and there is no danger of a rogue CR falsely indicating
the end of the record. But more importantly, you can locate any record
in the file by a simple calculation, which makes random access possible
and efficient, as you can jump to any record in the file without reading
through the previous records. That is not practical with variable-length
records unless a separate index of record positions is maintained.
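As an illustration of random access, the sketch below jumps straight to a record with a single seek. It assumes 100-byte records with no delimiters and uses an in-memory file built just for the example; read_record is a hypothetical helper. Note that seek offsets count from 0, while the text counts bytes from 1.

```python
import io

RECORD_SIZE = 100  # assumed record size, as in the example above


def read_record(f, n: int, record_size: int = RECORD_SIZE) -> bytes:
    """Read record n (numbered from 1) without reading earlier records."""
    f.seek((n - 1) * record_size)  # file offsets are 0-based
    data = f.read(record_size)
    if len(data) != record_size:
        raise EOFError(f"record {n} is past the end of the file")
    return data


# Build a sample file of five space-padded records in memory.
sample = b"".join(
    f"RECORD {i}".encode("ascii").ljust(RECORD_SIZE) for i in range(1, 6)
)
f = io.BytesIO(sample)
third = read_record(f, 3)
print(third.rstrip())  # b'RECORD 3'
```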
Fixed-Length Records on Different Computer Systems
Fixed-length records are stored
differently on mainframe computers and PCs.
Fixed-Length Mainframe Records
Data on a mainframe computer
is almost always stored as fixed-length records with no record or field
delimiters. (This article does not deal with indexed or other database
files.) When these records are written to tape, the same is true:
there are no record delimiters on the tape.
Fixed-Length PC (MS-DOS and Windows) Records
Although fixed-length records
can be accessed perfectly well without delimiters, many PC applications
require record delimiters, even on fixed-length records. So record
delimiters are standard practice for fixed-length PC files. The standard
record delimiter is the two byte carriage-return and line-feed (CR-LF)
pair, 0D, 0A hex.
Fixed-Length UNIX Records
Many UNIX applications work
with fixed-length records with no delimiters. Those that use a delimiter
usually use the UNIX "newline", which is the LF character, 0A hex.
Fixed-Length Macintosh Records
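The classic Macintosh (Mac OS 9 and earlier) record delimiter
is a single CR, which is 0D hex. Mac OS X and later follow the UNIX
convention and use LF. You are more likely to find variable-length
records on Macintosh than fixed-length, but when fixed-length records
are used on classic Macintosh systems they are usually delimited with a CR.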
Converting Between Mainframe and PC Records
Now that we have the necessary
background, let's discuss converting fixed-length records between a mainframe
and a PC.
When fixed-length mainframe
records are written to a tape, they are written as fixed-length with no
delimiters, just like they are stored on disk. Since most PC applications
have trouble with that type of file, DISC normally adds a record delimiter
to the end of each record. For Microsoft operating systems we add
a carriage-return and a line-feed, CR-LF. For UNIX applications we
add a "newline", which is the LF character, and for Macintosh we add a
carriage-return, CR. The CR is 0D hex and the LF is 0A hex.
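The delimiter-adding step could be sketched as follows. This is only an illustration, not DISC's actual conversion code: the names DELIMITERS and add_delimiters are hypothetical, and the sketch assumes the data has already been translated to the target character set.

```python
# Delimiters named in the text: CR is 0D hex, LF is 0A hex.
DELIMITERS = {
    "dos": b"\r\n",  # Microsoft operating systems: CR-LF
    "unix": b"\n",   # UNIX "newline": LF
    "mac": b"\r",    # Macintosh: CR
}


def add_delimiters(data: bytes, record_size: int, target: str) -> bytes:
    """Append the target system's delimiter to every fixed-length record."""
    if len(data) % record_size != 0:
        raise ValueError("data is not a whole number of records")
    delim = DELIMITERS[target]
    records = (
        data[i:i + record_size] for i in range(0, len(data), record_size)
    )
    return b"".join(rec + delim for rec in records)


# Two 4-byte records, converted for a Microsoft system.
print(add_delimiters(b"AAAABBBB", 4, "dos"))  # b'AAAA\r\nBBBB\r\n'
```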
When we convert a file from
a PC to a mainframe tape, we remove the record delimiter, as the mainframe
neither needs nor wants a delimiter. Writing the PC record delimiter
to a mainframe tape would cause the mainframe programmer quite a bit of
grief. Mainframe languages generally have no provision for handling
a record delimiter automatically, so the programmer would have to treat
it as junk data at the end of the record, define a 2-byte filler field
to hold it, and increase the record length defined in the JCL accordingly.
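Going the other way, removing the delimiter before writing to a mainframe tape might look like this sketch. Again it is an illustration under assumptions rather than DISC's actual method; strip_delimiters is a hypothetical name. It finds each delimiter by position instead of searching for it, so a stray CR inside the data is left alone.

```python
def strip_delimiters(
    data: bytes, record_size: int, delimiter: bytes = b"\r\n"
) -> bytes:
    """Remove the delimiter that follows every fixed-length record."""
    physical = record_size + len(delimiter)
    if len(data) % physical != 0:
        raise ValueError("data is not a whole number of delimited records")
    out = bytearray()
    for i in range(0, len(data), physical):
        chunk = data[i:i + physical]
        if chunk[record_size:] != delimiter:
            number = i // physical + 1
            raise ValueError(f"record {number} lacks the expected delimiter")
        out += chunk[:record_size]  # keep the data, drop the delimiter
    return bytes(out)


print(strip_delimiters(b"AAAA\r\nBBBB\r\n", 4))  # b'AAAABBBB'
```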
Measuring Record Length
If a mainframe record is 100
bytes long, then it's clear the size is 100 bytes, period. There
is no ambiguity to the size. But that same record, when transferred
to a PC, is 100 bytes of data plus a CR-LF, for a total of 102 bytes.
And on a Macintosh or a UNIX system, it's 100 bytes plus a 1 byte delimiter.
So is it a 100 byte record, a 101 byte record, or a 102 byte record?
In each case the record contains 100 bytes of actual data, and such a record
is commonly referred to as a "100 byte record" on all the computers.
For common references to record length, the record size is considered to
be the amount of data the record holds, and any record delimiters are part
of the file structure, not the data. This makes for consistency between
systems.
But clearly the physical
space occupied by the PC record is 102 bytes, not 100. Many times
you will have to use the physical size, such as when calculating where
in a file the 89th record starts, or the disk space required to store a
million records. So it becomes necessary to use both values in different
situations.
When you need to make the
distinction between the two numbers, you usually call the sum of the data
(100) the "Logical record size", and the data plus delimiter(s) the "Physical
record size".
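The two sizes and the calculations that depend on them can be sketched as below. The function names are hypothetical, and offsets here count from 0 (the first byte of the file is offset 0).

```python
def physical_size(logical_size: int, delimiter: bytes = b"\r\n") -> int:
    """Bytes one record occupies on disk: data plus delimiter."""
    return logical_size + len(delimiter)


def record_offset(n: int, logical_size: int, delimiter: bytes = b"\r\n") -> int:
    """0-based file offset where record n (numbered from 1) starts."""
    return (n - 1) * physical_size(logical_size, delimiter)


def file_bytes(count: int, logical_size: int, delimiter: bytes = b"\r\n") -> int:
    """Disk space needed to store `count` records."""
    return count * physical_size(logical_size, delimiter)


# A 100-byte logical record on a PC, delimited with CR-LF.
print(physical_size(100))          # 102
print(record_offset(89, 100))      # 8976
print(file_bytes(1_000_000, 100))  # 102000000
```

Passing an empty delimiter (b"") gives the mainframe case, where the logical and physical sizes are the same.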
Summary
Mainframe computers do not use
record delimiters on fixed-length records. Many MS-DOS and Windows applications require
record delimiters, and the standard delimiter is the two-byte CR-LF (carriage
return - line feed) pair. The standard Macintosh delimiter is a single
CR, and the standard UNIX delimiter is an LF, called "newline" in UNIX.
The CR is 0D hex and the LF is 0A hex.
When converting files from
a mainframe to a PC, DISC adds a delimiter, and when converting PC files
to a mainframe, we remove the delimiter.
The "record size" is the
number of data bytes a record holds, not counting any delimiter. The number of bytes
occupied by a record on a PC disk will be two bytes greater than the record
size, to account for the CR-LF record delimiter.
If you need to make the distinction
between these two values, the number of characters in the record is called
the "Logical record size", and the number of bytes including any delimiters
is called the "Physical record size".
Additional Information
For more articles on data conversion,
see our TechTalk Index.
Disc Interchange Service
Company, Inc.
Media Conversion Specialists
15 Stony Brook Road
Westford, MA 01886
(978) 692-0050
Copyright © 1997 - 2007 by Disc Interchange
All rights reserved. See our copyright
page.