Using gzipped Sequential SAS Datasets

An important new feature in SAS 6.12 for UNIX is the ability to work with gzipped sequential SAS datasets. Prior to version 6.12, SAS datasets could only be used in their uncompressed form. Now they can be created in a form suitable for use with gzip and gzcat. If you are working with large SAS datasets, you should read and understand this bulletin. You should also know how to use gzipped raw data files with SAS; see the bulletin SAS_PIPES. Three examples are given below.
The key to this new feature is the use of sequential SAS datasets. By default, SAS creates datasets that are designed for non-sequential access: a program can jump around the dataset, reading cases in any order. Most statistical applications process cases in order, however, so they can use sequential SAS datasets without any special consideration. The non-sequential access feature is not needed.

The gzip and gzcat programs process files sequentially. To compress a file, gzip reads it from beginning to end, writing compressed records one after another. To uncompress a file, gzcat reverses the process, writing uncompressed records so that the file is re-created from beginning to end. Neither program can skip around in a file, compressing or decompressing arbitrary parts of it.

Convert a gzipped SAS Dataset into a gzipped Sequential SAS Dataset

Most people keep their SAS datasets in gzipped form, gunzip them to work on them, and gzip them again when finished. This example shows how to convert those original gzipped SAS datasets into new sequential SAS datasets that are always stored in gzipped form. A key step is the construction of the FILENAME statement. Your gzip command MUST END with an ampersand, or your SAS job will hang, doing nothing. Adapt the following SAS commands to your needs. The non-sequential SAS dataset ~/revenue/prindata.ssd01.gz is converted into the sequential ~/revenue/prndata.ssd.gz.

/* SAS Example to CONVERT a gzipped dataset */

Creating a gzipped Sequential SAS Dataset from Raw Data

This example shows how to create a sequential SAS dataset in gzipped form, starting with raw data in the file named "cancer.dat". The sequential SAS dataset will be named "newds.gz" at the end of the SAS job. A key step is the construction of the FILENAME statement. Your gzip command MUST END with an ampersand, or your SAS job will hang, doing nothing. A slightly more complicated variation of this example would read the raw data in gzip-compressed form. That exercise is left to the reader. More information can be found in the bulletin "sas_pipes". Adapt the following SAS commands to your needs:

/* SAS Example to CREATE a gzipped Sequential Dataset */

Reading a gzipped Sequential SAS Dataset

Once you have converted your SAS datasets into gzipped sequential form, you will have to continue to use named pipes and gzcat to access them. This example is a variation on the two previous examples. Again, the key is to terminate your gzcat command with an ampersand, so that it runs in the background. If you do not, your SAS job will hang, doing nothing. Adapt the following SAS commands to your needs:

/* SAS Example to READ a gzipped Sequential SAS Dataset */

For further information

See the "SAS Companion for the UNIX Environment and Derivatives", in the section "Reading from and Writing to UNIX Commands" on page 119 of the Version 6 First Edition. See also the man page for gzip, gunzip and gzcat ("man gzip").
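
The named-pipe mechanism that all three examples depend on can be tried at the shell level, without SAS. The following is only a minimal sketch, not the bulletin's SAS code: the pipe name "seqpipe" and the temporary directory are hypothetical, the printf command stands in for the SAS step that would write a sequential dataset, and "gzip -dc" is used as the portable equivalent of gzcat. It illustrates why the gzip or gzcat command has to run in the background: opening a named pipe blocks until a process is attached to the other end.

```shell
#!/bin/sh
# Sketch only: stream data through a named pipe with gzip in the background.
set -e

workdir=$(mktemp -d)          # hypothetical scratch directory
pipe="$workdir/seqpipe"       # hypothetical pipe name
out="$workdir/newds.gz"       # compressed output file

mkfifo "$pipe"

# Start gzip FIRST and in the background (note the trailing ampersand).
# Without the ampersand the script would stop here, waiting for a writer.
gzip -c < "$pipe" > "$out" &

# Stand-in for the SAS step that writes records sequentially to the pipe.
printf 'record 1\nrecord 2\nrecord 3\n' > "$pipe"

wait                          # let the background gzip finish

# Reading back: decompress into the pipe in the background,
# then consume the pipe from beginning to end.
gzip -dc "$out" > "$pipe" &
result=$(cat "$pipe")
wait

echo "$result"
rm -rf "$workdir"
```

In a SAS job the roles are the same: the background gzip or gzcat command sits on one end of the pipe, and SAS reads or writes the sequential dataset on the other end.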