
Dealing with Binary Junk in a File:
For a number of reasons, data from mainframes will often contain "junk".
That junk is usually random binary values that cause problems when the
data is brought into your PC database. There are four primary reasons for
this junk:
- Databases that are not properly initialized when created can have
  literally any value in a byte.
- Unused fields are often initialized to nulls (hex 00), and if never
  populated they remain as nulls.
- It's common practice to reserve spare space in "filler" fields. Filler
  fields are commonly not initialized, and can therefore contain anything.
- Sometimes when you get a file there will be fields in the file for
  "internal use" that are not specified on the layout. Since these are not
  specified, they could be anything, and are often binary values.
This binary junk can cause a
number of problems, from funny characters in your data to crashing your
database. One of the most serious is a control-Z (1A hex) in a file;
this signifies end-of-file to many PC applications, so the database will
stop importing the file when it sees a control-Z.
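
To make the control-Z problem concrete, here is a minimal Python sketch
that reports where each control-Z sits in a block of data and shows how
much of the data an application that treats 1A hex as end-of-file would
actually read. The function name and the sample data are made up for
illustration; this is not one of DISC's programs.

```python
CONTROL_Z = 0x1A  # the byte many PC applications treat as end-of-file


def find_control_z(data: bytes) -> list[int]:
    """Return the byte offset of every control-Z (0x1A) in data."""
    return [offset for offset, byte in enumerate(data) if byte == CONTROL_Z]


# Hypothetical sample: three records, with a stray control-Z in the second.
sample = b"REC1\r\nRE\x1aC2\r\nREC3\r\n"
offsets = find_control_z(sample)

# An importer that honors control-Z sees only the bytes before the first one.
visible = sample[:offsets[0]] if offsets else sample
print(offsets)   # [8]
print(visible)   # b'REC1\r\nRE'
```

Everything after offset 8, including the whole third record, would be
silently dropped by such an importer.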
DISC has written several
programs to scan your files to catch these problems and fix them before
they cause you any grief. We routinely scan all jobs for control
codes, bytes with the high bit set, irregular records (short or long records,
a CR or LF in the middle of a record), control-Z, and other problems.
We don't just blindly convert your file.
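
As a rough illustration of the kind of scan described above, the following
Python sketch counts each category of problem in a file. It is only a
sketch under stated assumptions, not DISC's actual software: it assumes
records are terminated by CR LF, that every record should be exactly
record_len bytes long, and that "control code" means any byte below 20 hex
or equal to 7F hex. The function and category names are invented for this
example.

```python
from collections import Counter


def scan_records(data: bytes, record_len: int) -> Counter:
    """Count suspicious bytes and irregular records in CR LF delimited data."""
    report = Counter()
    records = data.split(b"\r\n")
    if records and records[-1] == b"":
        records.pop()  # a trailing CR LF leaves one empty piece at the end

    for record in records:
        # Irregular records: shorter or longer than the layout says.
        if len(record) < record_len:
            report["short_record"] += 1
        elif len(record) > record_len:
            report["long_record"] += 1

        for byte in record:
            if byte == 0x1A:
                report["control_z"] += 1
            elif byte in (0x0A, 0x0D):
                # A lone CR or LF that was not part of a CR LF terminator.
                report["embedded_cr_or_lf"] += 1
            elif byte < 0x20 or byte == 0x7F:
                report["control_code"] += 1
            elif byte >= 0x80:
                report["high_bit"] += 1
    return report


# Hypothetical 4-byte records, several of them deliberately damaged.
sample = (b"ABCD\r\n" b"A\x00\x1aD\r\n" b"\xe9B\nD\r\n"
          b"AB\r\n" b"ABCDE\r\n")
report = scan_records(sample, record_len=4)
```

A report with all counts at zero suggests the file is safe to import as-is;
any non-zero count points at the kind of cleanup that is needed.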
When we encounter "junk" in a file, how we deal with it depends on what it
is. If it's caused by binary fields, and they contain data you need, then
it's not junk at all, and must be converted. But assuming it's not data
you want, there are three ways to fix the problem:
- Remove the field from the record and shift the remaining fields up.
- Replace the field with one containing spaces or something clean.
- Replace any binary values anywhere in the record with a space.
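
The first two options can be sketched in Python as follows, assuming
fixed-width records where the unwanted field's starting offset and length
are known from the layout. The function names, the offsets, and the sample
record are hypothetical, chosen only to illustrate the idea.

```python
def _check_field(record: bytes, start: int, length: int) -> None:
    """Reject a field position that does not lie inside the record."""
    if start < 0 or length < 0 or start + length > len(record):
        raise ValueError("field lies outside the record")


def remove_field(record: bytes, start: int, length: int) -> bytes:
    """Drop the field and shift the remaining fields up (record gets shorter)."""
    _check_field(record, start, length)
    return record[:start] + record[start + length:]


def blank_field(record: bytes, start: int, length: int,
                fill: bytes = b" ") -> bytes:
    """Overwrite the field with a fill byte; the record length is unchanged."""
    _check_field(record, start, length)
    return record[:start] + fill * length + record[start + length:]


# Hypothetical layout: 5-byte name, 5-byte binary field, 6-byte city.
record = b"SMITH\x00\x1a\x03\x04\x05BOSTON"
print(remove_field(record, 5, 5))  # b'SMITHBOSTON'
print(blank_field(record, 5, 5))   # b'SMITH     BOSTON'
```

Removing the field changes the offsets of every later field, so the layout
used for import has to change with it; blanking the field keeps the
original layout valid.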
The first two are the cleanest approaches, but they require programming,
so they are usually the most expensive. The third approach is an
economical compromise. It simply scans the entire record (so it doesn't
require programming), replacing any binary value it finds with a space (or
sometimes an "*" so you can distinguish a replacement from a normal
space). It can't be used in all cases, but when it works it's fairly
inexpensive; we commonly do it for nothing on repeat jobs.
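
A minimal Python sketch of the third approach is shown below. The text
does not define exactly which byte values count as "binary", so this
example assumes it means anything outside printable ASCII (20 hex through
7E hex); the function name is also made up for illustration. The record is
assumed to be passed in without its line terminator, since a CR or LF
would otherwise be replaced too.

```python
def scrub_record(record: bytes, replacement: bytes = b" ") -> bytes:
    """Replace every byte outside printable ASCII with the replacement byte."""
    if len(replacement) != 1:
        raise ValueError("replacement must be exactly one byte")
    # Build a 256-entry translation table: printable bytes map to
    # themselves, everything else maps to the replacement byte.
    table = bytes(b if 0x20 <= b <= 0x7E else replacement[0]
                  for b in range(256))
    return record.translate(table)


print(scrub_record(b"AB\x00\x1a\xffCD"))        # b'AB   CD'
print(scrub_record(b"AB\x00\x1a\xffCD", b"*"))  # b'AB***CD'
```

Because the scan needs no knowledge of the record layout, the same routine
works on any file, which is why this approach is the inexpensive one.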
Part of the compromise is that it may leave some strange-looking stuff
behind. Say you have a binary field of 5 bytes. Two of them may be binary
values, but three may actually be valid characters (binary data can be any
value, so sometimes it takes on the value of a character). Since they are
valid character codes, this method doesn't remove them, and they are left
in your file. Frequently these are punctuation, so you may end up with a
field that looks like ")# !". Since these are valid characters the
database doesn't usually mind, and the file imports okay.
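
The leftover effect can be reproduced with a short Python example. The
five byte values below are invented to match the ")# !" case described
above, and "binary" is again assumed to mean anything outside printable
ASCII (20 hex through 7E hex).

```python
# A hypothetical 5-byte binary field: 29, 23 and 21 hex happen to be the
# character codes for ")", "#" and "!", while 00 and 9C hex are not
# printable characters.
field = bytes([0x29, 0x23, 0x00, 0x21, 0x9C])

# Replace only the non-printable bytes with a space (20 hex).
cleaned = bytes(b if 0x20 <= b <= 0x7E else 0x20 for b in field)

print(cleaned)  # b')# ! '
```

The two unprintable bytes become spaces, but the three bytes that look
like punctuation pass through untouched, leaving ")# !" in the field.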
Disc Interchange Service
Company, Inc.
Media Conversion Specialists
15 Stony Brook Road
Westford, MA 01886
(978) 692-0050
Copyright © 1997 - 2007 by Disc Interchange
All rights reserved. See our copyright
page.