(DISC has published two data conversion tutorials. This is the simple overview of data conversion. For a more detailed look, please see our Intermediate Level Data Conversion Tutorial.)
Data Conversion is the generic term given to the process of converting computer data between different applications and/or between different computers.
Data conversion involves up to four different issues. A conversion may involve any combination of:
- Media
- Tape Format
- File Type
- File Content
That's our business!
Furthermore, the type of tape does not always indicate the physical recording format, and therefore the drive you need. A DLT IV tape can be used in DLT 4000, DLT 7000, DLT 8000, DLT-1, and VS-80 drives, and they write to tape differently. You simply can't tell by the type of tape what physical format has been recorded on it.
There is more information about tape formats, and examples, in our Intermediate Data Conversion Tutorial.
The "File Type" we are discussing here is the file type on disk, before it is written to tape, or after it is restored from tape.
The File Type and File Content are closely related, with overlapping issues and interactions. What "File Type" and "File Content" refer to depends on both the operating system and the kind of file. Furthermore, the issues are different for mainframes and PCs, and for different kinds of files, so it's difficult to make global statements about either. What follows is a simple overview. Our Intermediate Data Conversion Tutorial contains a more complete description.
File type refers to how the file is stored on disk. In the case of mainframe computers, the "how" is handled by the operating system, while on Windows, UNIX, and Macintosh computers it's handled by the application program. So "file type" has very different meanings and implications on mainframes than on PCs.
Regardless of where it is handled, it refers to the kind of file. Under an operating system that uses structured files, such as a mainframe, "file type" describes, for example, an indexed or sequential file, with fixed-length or variable-length records, and likely other file parameters such as the record length or type of indexing. Under operating systems that don't use structured files, such as UNIX or Windows, "file type" commonly refers to the application that created the file, such as "a Microsoft Access file", or to some common file type used by many applications, such as a comma-delimited file.
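To make the difference between a structured file and a common PC file type concrete, here is a minimal Python sketch that turns fixed-length records into a comma-delimited file. The record length, the field positions, and the names `RECORD_LENGTH`, `FIELDS`, and `fixed_to_csv` are illustrative assumptions, not a layout taken from any real system.

```python
import csv
import io

# Hypothetical layout: each record is exactly 20 bytes, with no line terminators.
RECORD_LENGTH = 20
FIELDS = [("name", 0, 12), ("city", 12, 20)]  # (field name, start, end)


def fixed_to_csv(data: bytes) -> str:
    """Split fixed-length records into fields and return comma-delimited text."""
    if len(data) % RECORD_LENGTH != 0:
        raise ValueError("data length is not a multiple of the record length")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")
    # Header row with the field names.
    writer.writerow([name for name, _, _ in FIELDS])
    for offset in range(0, len(data), RECORD_LENGTH):
        record = data[offset:offset + RECORD_LENGTH].decode("ascii")
        # Fixed-length fields are padded with spaces; strip the padding.
        writer.writerow([record[start:end].rstrip() for _, start, end in FIELDS])
    return out.getvalue()


sample = b"John Smith  Boston  " + b"Jane Doe    Westford"
print(fixed_to_csv(sample))
```

In the fixed-length file the structure lives in the layout (record length and field positions), while in the comma-delimited file it lives in the delimiters, which is why the layout must be known before such a conversion can be done.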
This is discussed in much greater detail in our Intermediate Data Conversion Tutorial.
File content refers to what is stored in the file, and what is stored in the file depends on what the file is -- text, word processing, database, spreadsheet, binary data, object file, executable, etc.
So File Content encompasses many concepts, and takes on different meanings for different types of files. For database files it may mean character fields versus binary fields, EBCDIC versus ASCII, etc. It may also include issues such as redefined fields or redefined records (multiple record types in one file). When used to describe a data conversion, "file content" generally does not refer to the specific data in individual fields or records in the file, such as "John Smith" or "Jane Doe", but to the method or data type used to store that data.
File Content may also be dictated by the application. For example, the content of an Access file is controlled by Access and the layout you specify when you create the file.
The issues are numerous, and are discussed in greater detail in our Intermediate Data Conversion Tutorial.
Before we can convert your data we will need to identify all these issues for the source tape, and determine what you want back on the destination tape or disk. Let's look at a very simple conversion of a UNIX text file to a PC text file. Let's say you receive a DLT-IV tape in tar format, containing a plain-text file created on UNIX. So far we know the following:
- Media: DLT-IV media
- Tape Format: tar file
- File type: UNIX text file
- File Content: Plain text (no word processor codes).
DLT-IV tapes can be recorded in different physical recording formats, but the recording format was not specified and will have to be determined. A tar file was specified, but the exact type of tar file and the block size were not specified and will have to be determined. The file type is a UNIX text file, so it will use standard ASCII characters, and each line of the file will end with a UNIX Newline. The File Content will be text only, with no word processor codes.
Now that we know what we have, it's time to specify what you want back. While we have primarily discussed tapes in this article, DISC commonly delivers PC files on CD or DVD, so let's specify that:
- Media: 74-minute CD-R
- Format: Windows Joliet format
- File type: PC text file
- File content: Plain text.
After determining which tape drive to use, we would inspect the tape to determine the tar block size and tar type. We would then extract the text file from the tar file and convert each UNIX Newline to a carriage-return line-feed pair for a PC. The converted file would be written to a 74-minute CD-R in Joliet format, for use on a Windows computer.
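The extraction and newline steps can be sketched in Python, assuming the tar archive has already been read off the tape into memory. This is a simplified illustration, not a description of DISC's tools: it does not deal with tape drives or tar block sizes, and the file name `report.txt` and the helper names are hypothetical.

```python
import io
import tarfile


def unix_to_pc_text(data: bytes) -> bytes:
    """Convert UNIX newlines (LF) to PC carriage-return line-feed pairs (CRLF)."""
    # Normalize any existing CRLF first so the result never contains CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def extract_text_member(archive: bytes, member_name: str) -> bytes:
    """Read one file out of an uncompressed tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file")
        return member.read()


def build_demo_archive() -> bytes:
    """Create a small tar archive in memory to stand in for the tape contents."""
    payload = b"first line\nsecond line\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="report.txt")  # hypothetical file name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


unix_text = extract_text_member(build_demo_archive(), "report.txt")
pc_text = unix_to_pc_text(unix_text)
print(pc_text)
```

The conversion works on bytes rather than decoded text so that nothing other than the line endings is altered.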
Our Intermediate Data Conversion Tutorial presents an example of converting a mainframe data file to a PC file for Access.
Some important issues may not be explicitly given. For example, most mainframe tapes will be in EBCDIC, but that may not be specified, just as most UNIX, PC, and Mac tapes will be in ASCII, but that may not be specified. You will have to deduce it from knowing what computer the tape originated on, or by inspecting the tape. A PC will not understand an EBCDIC file, so it needs to be converted before the PC can use it.
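As a rough illustration of inspecting and converting character data, the Python sketch below guesses whether a block of text is EBCDIC and translates it to ASCII. It assumes the data is pure character data in EBCDIC code page 037 (Python's `cp037` codec); other EBCDIC code pages exist, and the space-counting heuristic is only a hint, not a reliable test. The function names are illustrative.

```python
def looks_like_ebcdic(data: bytes) -> bool:
    """Rough heuristic: a space is 0x40 in EBCDIC and 0x20 in ASCII."""
    return data.count(0x40) > data.count(0x20)


def ebcdic_to_ascii(data: bytes, codec: str = "cp037") -> bytes:
    """Translate EBCDIC character data to ASCII.

    The default code page is an assumption; characters with no ASCII
    equivalent are replaced with '?'.
    """
    return data.decode(codec).encode("ascii", errors="replace")


# Build a sample EBCDIC field from known text so the bytes are not hand-typed.
sample = "JOHN SMITH".encode("cp037")

if looks_like_ebcdic(sample):
    print(ebcdic_to_ascii(sample))
```

A whole-file translation like this is only safe for text. If the file contains binary fields, translating every byte would corrupt them, so the conversion must follow the record layout field by field.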
If you are getting a data file, you will need a record layout that specifies the fields in the file. If some of those fields are in binary format, they will probably need to be converted by us, as binary data types are generally not compatible across platforms (computers and CPUs).
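Byte order is one example of why binary fields do not carry across platforms. The Python sketch below reads a record using a hypothetical layout (a 10-byte ASCII name followed by a 4-byte signed integer stored most-significant byte first) and shows that the same bytes give a different number when read with the opposite byte order. The layout and the names `BIG_ENDIAN_LAYOUT` and `read_record` are assumptions made for illustration.

```python
import struct

# Hypothetical record layout: 10-byte name, then a 4-byte signed integer.
# ">" means big-endian (most-significant byte first) with no padding.
BIG_ENDIAN_LAYOUT = struct.Struct(">10si")


def read_record(raw: bytes) -> tuple[str, int]:
    """Decode one record according to the layout above."""
    name, balance = BIG_ENDIAN_LAYOUT.unpack(raw)
    return name.decode("ascii").rstrip(), balance


# Build a sample record whose integer field holds 1000 in big-endian order.
raw = b"John Smith" + (1000).to_bytes(4, "big", signed=True)

name, balance = read_record(raw)
print(name, balance)

# The same bytes interpreted as little-endian do not give 1000.
wrong_balance = struct.unpack("<10si", raw)[1]
print(wrong_balance)
```

Without the record layout there is no way to know which bytes form the integer or how they are ordered, which is why the layout is needed before the file can be converted.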
Before submitting a conversion, you should try to get as much information as you can, and give some thought to what kind of file you want back. The more accurately you specify your conversion, the better job we can do for you.
This has been a simplified overview of Data Conversion. Greater detail, and references to other Disc Interchange articles, are available in our Intermediate Data Conversion Tutorial. DISC has also published numerous other data conversion articles, which are available via the link below.
For more articles on data conversion, see our TechTalk Index.
Disc Interchange Service
Media Conversion Specialists
15 Stony Brook Road
Westford, MA 01886