Migration Information

Overview

The DCE Cell for Social Science Research was closed permanently on Monday, June 13, 2005, at 5 pm Central Daylight Time.

All files, backup tapes and disk drives remaining on the DCE Cell were subsequently destroyed.

Migration from the DCE Cell for Social Science Research to the Social Sciences Computing Cluster is a change from an older UNIX system (HP-UX 10.20) to a newer UNIX system (Red Hat Enterprise Linux for Workstations).

The statistical software suites are substantially the same, although many programs are more up-to-date in the SSCC.

You will have to transfer your own files to the SSCC. Most file formats are unchanged, and very little file conversion is involved. SAS Version 6.12 datasets are an exception.

This document is designed to help you plan and perform your move to the SSCC. Please read it carefully.

Your comments are welcome! Send them via email to Bruce Foster.

 

Planning

Please review the files in your DCE home directory and clean them up. You can find out how much storage is in use with the dfsquota command. Look under the column named "Used" for your read/write dataset. Units are KB. Your home directory is common to all compute servers you use in the DCE cell.

You may also have files stored locally on each DCE compute server you've used. Check your scratch directory and email on those hosts. Remember to login to every compute server you've been using to check your email and scratch directories.

If you make use of special groups to share files, arrange for those groups to be created in the SSCC before you transfer your files.

Your files may have been compressed by a system administrator if they were inactive for a period of time. Compression shrinks typical data files by 75% to 95%, so decompressing those files will expand them by a factor of 4 to 20. You may be limited by your disk quota if you try to decompress all of your files at once.

There is no need to decompress your files before transferring them to the SSCC. Indeed, transfers will be much speedier if you leave them compressed.

Determine how large a decompressed .gz file will be with gzip -l filename.gz

Decompress all of your files with the command gunzip -vr *

Decompress individual files with the command gunzip -v filename.gz

You can compress files similarly, with gzip -vr * and gzip -v filename
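
To judge whether a full decompression would fit within your disk quota, the per-file figures from gzip -l can be added up. The following is a minimal sketch, assuming GNU gzip and a POSIX shell. The function name total_uncompressed_bytes is hypothetical and is not part of any DCE or SSCC tooling.

```shell
# Sum the uncompressed sizes reported by "gzip -l" for every .gz file
# under a directory. Hypothetical helper name; assumes GNU gzip output,
# where each data line starts with the compressed size and the second
# column is the uncompressed size in bytes.
total_uncompressed_bytes() {
    find "$1" -type f -name '*.gz' -exec gzip -l {} \; |
        awk '$1 ~ /^[0-9]+$/ { sum += $2 } END { printf "%d\n", sum }'
}
```

For example, total_uncompressed_bytes ~/Demo prints a single number of bytes. Divide it by 1024 to compare it with the KB figures shown by your quota report.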

SPSS is not available in the SSCC. See Statistical Software Changes.

SAS 6.12 datasets (files with .ssd01 extensions) must undergo conversion before transfer to another computing system or they will be useless. See SAS Version 6.12 Datasets.

SPSS data files (files with .sav extensions) created in the DCE cell are quite old and may not be recognized by more modern programs. Convert them to SPSS portable files before migration. See SPSS Data Files.

Object files from Fortran and C programs are not portable. Plan to recompile your programs after transferring the source files to the SSCC. Allow adequate time to compensate for slight variations in compiler implementations and library versions and locations.

After all preparations are complete, you must transfer your files to the SSCC yourself, using sftp. See Transferring Your Files.

Be sure to allow yourself enough time to verify that everything's working on the SSCC after your file transfer!
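
One way to check that a transfer is complete is to build a checksum listing on each host and compare the two listings. This is only a sketch under stated assumptions: the function name make_manifest is hypothetical, and it relies on the standard cksum, find and sort utilities being available on both systems. Run it before decompressing anything, since a decompressed file no longer matches its compressed original.

```shell
# Print "checksum size path" for every regular file below a directory
# whose name does not begin with a dot, sorted by path, so that listings
# made on two hosts can be compared. Hypothetical helper name.
make_manifest() {
    ( cd "$1" && find . -type f ! -name '.*' -exec cksum {} \; | sort -k 3 )
}
```

Save the output on each host, for example make_manifest ~/Demo > Demo.manifest, and compare the two files with diff. No output from diff means the copies match.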

 

Statistical Software Changes

  • The IMSL Fortran library is not available in the SSCC.
  • SAS version 6.12 datasets (files with .ssd01 extensions) must undergo conversion before transfer to the SSCC. See below for more information.
  • SPSS is not available in the SSCC. SAS can read SPSS portable files, and Stat/Transfer can convert SPSS portable files to formats that other programs can read.
  • R Version 2.1.0 is newly available.
  • The AMD Core Math Library is newly available.
  • The Intel® C++ and Fortran compilers are newly available.
  • GAUSS 6.0 has many more licensed applications available. See the Manuals for the details.
  • Many statistical programs have undergone new releases. See the Manuals for the details. In particular, look at the manuals for Amelia, DSTPLAN, GAUSS, MATLAB, Ox, SAS, S-PLUS, Stat/Transfer and STATTAB.

    SSH Secure Shell Configuration

    SSH Secure Shell is required to establish secure connections to the SSCC over the network. Terminal emulation is provided with ssh, and secure file transfer is provided with sftp and scp.

    Incoming connections from telnet and ftp are not accepted because they are insecure. Outgoing connections are permitted.

    You should obtain the most recent SSH Secure Shell Client for Windows and install it on your Windows system before connecting to the SSCC.

    Authentication should be configured to use the Keyboard Interactive method, which is a relatively recent addition. Simple Password authentication no longer works. See the SSCC document How to Apply for an Account for detailed configuration information.

    Host Names

    The host names for the SSCC were chosen as successors to the DCE hosts, so their names are quite similar. To further confuse matters, the DCE hosts still reside in the nwu.edu domain, which will disappear June 20, 2005.

    Use fully qualified host names when setting up your connections:

    SSCC Hosts                       DCE Hosts
    
    seldon.it.northwestern.edu       seldon.acns.nwu.edu
    hardin2.it.northwestern.edu      hardin.it.nwu.edu
    mule2.ipr.northwestern.edu       mule.ipr.nwu.edu
    

    ASCII Transfers

    Sometimes, ASCII text files will not be recognized, and they'll be transferred as binary files without end of line translation. They will have the wrong appearance in a program like Notepad. The end of line translation is performed on the basis of the file extension of the name of the file being transferred.

    You can configure SSH Secure Shell to do the right thing. Choose the menu Edit, and then Settings. Under Global Settings in the left window, choose Mode. Click ASCII in the File transfer mode box. Look at the box to the right, on the line ASCII extensions. You'll see a square graphic on the right side of that line. Click that graphic to insert an extension. Type the name of the file extension without a "." in the insertion box and press Enter. You might want to add file extensions like sas, mat, ado, do, por, log, and lst.

     

    Scratch Directory Policy Change

    On the SSCC hosts, files in the scratch directories will be removed after ten days. See Policies.

    On the DCE hosts, files remained untouched in the scratch directories unless there was a shortage of available space.

    The path to your scratch directory remains the same on the SSCC hosts: /scr01/NetID.
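
Because scratch files are now removed on a schedule, it can help to list the files that are getting old before they disappear. The sketch below is an illustration only. The function name list_old_files is hypothetical, and it assumes the cleanup is based on modification time, which the policy statement above does not confirm.

```shell
# List regular files under a directory that were last modified more than
# a given number of days ago. Hypothetical helper name; whether the SSCC
# cleanup is based on modification time is an assumption.
list_old_files() {
    find "$1" -type f -mtime +"$2" -print
}
```

For example, list_old_files /scr01/NetID 7 shows files that have not been modified for more than a week, where NetID stands for your own NetID.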

     

    Transferring Your Files

    It is your responsibility to transfer your files to the SSCC. While most files can be copied without requiring additional attention, SAS and SPSS files may require processing on the DCE machines before the transfer.

    Files that should NOT be Transferred

    Do not transfer the hidden files (with names beginning with a ".") from your DCE home directory into your SSCC home directory. Those "dot-files" are system-specific, and are incompatible with the SSCC "dot-files" of the same name.

    File Transfers with sftp

    The sftp secure ftp client is used for secure file transfer over the network. The entire session is encrypted, starting with the login and including the transfer of your files. The best way to transfer your files is to first login to an SSCC host and then initiate the transfer session from there.

    Note that sftp copies the files from your DCE account. It does not move them.

    Login with ssh to hardin2.it.northwestern.edu or seldon.it.northwestern.edu.

    Make sure that you use your NetID password to login.

    You should transfer your files directly into your SSCC home directory to preserve your DCE file paths. The SSCC home directory structure mirrors the DCE home directory structure to make it easier to run programs transferred from DCE.

    Start sftp to one of the HP DCE systems. In the following example, I have first logged in to the new SSCC system hardin2.it.northwestern.edu, and then I sftp to the old HP DCE system on hardin.it.nwu.edu. I entered my DCE password at the prompt.

    [bef@hardin2 bef]$ sftp bef@hardin.it.nwu.edu
    bef@hardin.it.nwu.edu's password:
    
    Next, I transfer the directory Demo and all of its contents from my DCE account to my new SSCC account:
    sftp> get -p Demo
    demo.dat.gz       | 294B   | 294B/s  | TOC: 00:00:01 | 100%
    demo.log          | 4.6kB  | 4.6kB/s | TOC: 00:00:01 | 100%
    demo.lst          | 20kB   | 20kB/s  | TOC: 00:00:01 | 100%
    demo.por          | 5.8kB  | 5.8kB/s | TOC: 00:00:01 | 100%
    demo.por.gz       | 3.0kB  | 3.0kB/s | TOC: 00:00:01 | 100%
    demo.sas          | 2.3kB  | 2.3kB/s | TOC: 00:00:01 | 100%
    demo.sps          | 111B   | 111B/s  | TOC: 00:00:01 | 100%
    demo.sps.lst      | 4.2kB  | 4.2kB/s | TOC: 00:00:01 | 100%
    demobase.lst.gz   | 17kB   | 17kB/s  | TOC: 00:00:01 | 100%
    demobase.sps      | 6.6kB  | 6.6kB/s | TOC: 00:00:01 | 100%
    sftp> quit
    [bef@hardin2 bef]$ 
    

    By using the -p option in the command "get -p Demo" I preserve the modification dates and times of my files in the copies put on the SSCC disk. The directory Demo and all of its files were copied by the single sftp command.

    You can copy your entire DCE directory with the sftp command

    sftp> get -p *

    Transfers of 500 MB or more will take significant time and memory, as well as disk space. Please break up large transfers by using sftp to get directory trees one at a time — do not use the "get -p *" command when you have a large transfer to make.
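
To decide how to break up a large transfer, it helps to see how much space each top-level directory occupies. The following is a minimal sketch, intended to be run in your home directory on the host where the files currently live. The function name list_tree_sizes is hypothetical, and it assumes that du accepts the -s and -k options there.

```shell
# Print the size in KB of each top-level, non-hidden entry in a
# directory, largest first. The "*" glob skips dot-files by default,
# which matches the advice not to transfer them.
list_tree_sizes() {
    ( cd "$1" && du -sk ./* | sort -rn )
}
```

For example, list_tree_sizes ~ prints one line per directory or file. Entries approaching 500 MB (about 500000 KB) are candidates for their own sftp get command.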

     

    SSDS Data Library

    The Social Science Data Services data library known as /datalib has been completely transferred to the SSCC. Directory paths have been duplicated exactly, so that programs written for DCE machines will continue to access /datalib without modification. See Social Science Data Services.

     

    Directory Paths

    Symbolic links have been installed on the SSCC hosts to preserve the behavior of programs written for the DCE systems using absolute directory paths. Paths to home directories that start with /.../nwu.edu/fs/home will continue to work. Paths from /datalib will continue to work.

    The new SSCC paths to home directories start with /sscc/home. As an example, the path to user bef is ~bef or /sscc/home/b/bef. The DCE-style path /.../nwu.edu/fs/home/b/bef also works. Type echo $HOME to see the path to your home directory.

    Scratch directory paths of the form /scr01/NetID are provided, preserving the DCE scratch directory structure. Note, however, that files in the SSCC scratch directories are automatically removed after ten days. See Policies.

    The online backup directory paths starting with /.../nwu.edu/fs/backup will not work. This online backup feature is not available under NFS in the SSCC.
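
Programs that still refer to the online backup paths will fail on the SSCC, so it is worth searching your transferred files for them. This is a sketch only. The function name find_backup_path_refs is hypothetical, and it assumes a grep that supports recursive search with -r, as GNU grep on Linux does.

```shell
# List files under a directory that mention the DCE online backup path,
# which is not available in the SSCC. Hypothetical helper name; -F makes
# grep treat the path as a fixed string, so the dots are literal.
find_backup_path_refs() {
    grep -rlF '/.../nwu.edu/fs/backup' "$1"
}
```

For example, find_backup_path_refs ~/Demo prints the name of each file that needs editing, and prints nothing if none do.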

    Applications software is stored in a variety of places in the SSCC, and applications may have moved. Likely places to look are:

    /sscc/opt
    /sscc/opt/local
    /opt
    /usr
    /usr/local
    

    Use the whereis and which commands to find program locations.
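
To check several programs at once, a small loop can report which ones are on your search path. The sketch below uses the shell built-in command -v instead of which or whereis. The function name locate_programs is hypothetical.

```shell
# Report where each named program is found on the current search path,
# or say that it is not found. Hypothetical helper built on the shell
# built-in "command -v" rather than on which or whereis.
locate_programs() {
    for prog in "$@"; do
        if path=$(command -v "$prog"); then
            printf '%s %s\n' "$prog" "$path"
        else
            printf '%s not found\n' "$prog"
        fi
    done
}
```

For example, locate_programs sas matlab g77 prints one line per program. A program reported as not found may still be installed in one of the directories listed above without being on your search path.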

     

    SAS Version 6.12 Datasets

    SAS Version 6.12 datasets are files with .ssd01 filename extensions. These datasets, created with SAS internal compression, cannot be used on the SSCC Linux systems.

    They must be converted to a portable format before transferring them elsewhere.

    If you plan to transfer your datasets to the SSCC, you can copy those datasets to SAS Version 8 datasets on a DCE system, and then transfer the v8 datasets to the SSCC. Once in the SSCC, you can make a similar conversion from v8 to v9 datasets to take advantage of even newer features. See the SAS Version 9 Migration site for more information about the v9 dataset features.

    If you plan to transfer your datasets to some other operating system environment, you might consider transforming your dataset libraries into transport datasets on a DCE system. An entire library of SAS datasets is put into a single transport dataset file. The SAS transport datasets are designed to be easy to use with SAS of many versions, on any supported operating system.
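
Before converting, it helps to know which of your directories hold Version 6.12 datasets. The following is a minimal sketch. The function name find_v6_libraries is hypothetical, and the extra .ssd01.gz pattern is an assumption that compressed datasets kept their original names with .gz appended.

```shell
# List the directories under a starting point that contain SAS Version
# 6.12 datasets (files ending in .ssd01), including copies that were
# compressed with gzip. Hypothetical helper name.
find_v6_libraries() {
    find "$1" -type f \( -name '*.ssd01' -o -name '*.ssd01.gz' \) \
        -exec dirname {} \; | sort -u
}
```

For example, find_v6_libraries ~ prints each directory once. Each directory listed is a v6 library that needs one of the conversions described here.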

    Copy to SAS version 8 Datasets

    A SAS library is merely a directory containing SAS datasets. Libraries can be designated as Version 6 libraries (v6) or Version 8 libraries (v8). The recommended approach is to create a new (empty) directory for your v8 library. Then use the SAS PROC COPY to copy datasets from your v6 library to your v8 library. The resulting datasets in your v8 library will be SAS Version 8 datasets with the file extension .sas7bdat. Transfer only the datasets in your v8 library to the SSCC.

    In this example, we have an existing SAS v6 library named Labor. We created a new library named NewLabor for the SAS v8 datasets with the UNIX command mkdir NewLabor.

    The SAS program to copy the datasets is quite simple:

    libname old v6 '~/Labor';
    libname new v8 '~/NewLabor';
    	  
    proc copy in=old out=new memtype=data;
    run;

    You can learn lots more about converting from Version 6 to Version 8 datasets from the SAS Technical Support Document TS-628, Version 6 and Version 8: A Peaceful Co-Existence.

    Copy to SAS Transport File

    SAS Transport files are used to transport a library of SAS datasets to other computers without knowing the operating system or version of SAS in advance. Because only one file contains many datasets, it is convenient to send over the network. On the other hand, since social science datasets can be large, a transport file containing many of those datasets can be huge.

    In this example, we have an existing SAS v6 library named Labor. We will create a SAS Transport file named TransLabor.xpt in the following SAS program:

    libname old v6 '~/Labor';
    libname tran xport '~/TransLabor.xpt';
    	  
    proc copy in=old out=tran memtype=data;
    run;

    Then use sftp to copy the transport file named TransLabor.xpt somewhere else.

    You can specify individual datasets to place in the transport file with SELECT statements:

    proc copy in=old out=tran memtype=data;
    select dsname;       /* the name of a SAS data set */
    run;

    Once you have transferred your transport file to another computer, you will have to "unpack" the file into individual SAS datasets within an existing library, with a variation on the following program (designed in this case for SAS Version 9 on a UNIX system). The directory Labor has to be created before running SAS:

    libname new v9 '~/Labor';
    libname tran xport '~/TransLabor.xpt';
    	  
    proc copy in=tran out=new;
    run;

     

    SPSS Data Files

    We have SPSS Version 6.1.4 on the DCE UNIX systems. It dates back to 1999, and parts are much older than that. You should convert your old SPSS data files (with .sav file extensions) to SPSS portable files (with .por file extensions) before the DCE systems are shut down. SPSS portable files have a much longer expected lifetime than the old SPSS data files from SPSS 6.1.4.

    SPSS is not available in the SSCC, for budgetary reasons.

    You can read SPSS portable files directly with SAS, and you can also use Stat/Transfer to convert SPSS portable files into files that can be used by Stata and other programs.

    If you transfer SPSS portable files between Windows and UNIX systems, you may run afoul of the different end of line conventions used by the two systems. On the SSCC systems, the programs dos2unix and unix2dos will perform the end of line conversions.
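
If you are not sure whether a file picked up Windows line endings along the way, you can test for them before converting. This is a sketch under stated assumptions: the function names has_crlf and strip_cr are hypothetical, and strip_cr uses tr as a stand-in with a similar effect to dos2unix on ordinary text files.

```shell
# Succeed if any line of the file ends with a carriage return, that is,
# if the file uses Windows (CRLF) line endings. Hypothetical helper.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Write a copy of the file with all carriage returns removed.
# Hypothetical helper; the original file is left untouched.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, has_crlf demo.por && strip_cr demo.por demo_unix.por converts only when conversion is needed and keeps the original file.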

    Here is an example SAS program that reads an SPSS portable file using PROC CONVERT. The additional DATA STEP eliminates problem formats. See SAS Support Note SN-011764.

    filename myfile 'spss_file.por';
    proc convert spss=myfile out=new;
    		
    data new;
      set new;
      format _all_;
    run;

     

    Printing

    Spooled laserprinter queues have been configured for stable printers that are attached to the network and are always powered on. Obtain a list of these printers with the UNIX command lpstat -v. You'll see a list of familiar names, including cresap115, duncanhp, marx and stats. These "device names" are the names to use when you specify a destination to the lp command with the -d option. Additional laserprinters can be configured upon request.

    If you always use the same printer, you need not always specify the -d option to the lp command. Edit the file ~/.bash_profile (one of those dot-files in your home directory), and look for the following line at the end of the file:

    export LPDEST=not_defined

    Change "not_defined" to specify the laserprinter of your choice (e.g. cresap115). This change will take effect the next time you login. Keeping your current session open, test your change by opening a second session. Type the UNIX command echo $LPDEST and you should see the name of the printer you specified displayed on your screen. Note that this procedure is case-sensitive, so type the echo command exactly as it's specified above, all on one line.

    Laserprinter    Location
    	  
    cresap115       Cresap Hall rm 115
    marx            Economics Department computer lab
    stats           Statistics Department
    duncanhp        IPR
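
The LPDEST edit described above can also be made from the command line instead of in an editor. The following is a minimal sketch. The function name set_default_printer is hypothetical, and it assumes the profile contains a line beginning with export LPDEST= as shown above. Keep a backup copy of the file before trying it.

```shell
# Replace the "export LPDEST=..." line in a profile file with the chosen
# printer name. Hypothetical helper; it writes a temporary copy next to
# the original and then moves it into place.
set_default_printer() {
    profile=$1
    printer=$2
    sed "s/^export LPDEST=.*/export LPDEST=$printer/" "$profile" > "$profile.new" &&
        mv "$profile.new" "$profile"
}
```

For example, set_default_printer ~/.bash_profile cresap115 sets the default to the Cresap Hall printer. As with a manual edit, the change takes effect at your next login.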
    

    Printing on your Windows computer

    You can also use SSH Secure Shell to transfer your files to your PC for printing.

    The procedure is to use your terminal window and UNIX cd commands to navigate to the subdirectory desired.

    Then pull down the Window menu and choose New File Transfer in Current Directory. A three-part window will open. The files listed in the box on the left are your PC files, and the box on the right are your UNIX files. The box at the bottom shows file transfer information.

    Next, pull down the Operation menu, and find File Transfer Mode at the bottom. Choose Auto select. This choice will convert UNIX end of line characters to Windows end of line characters for files with ASCII extensions. Note that conversion is based on a pre-defined list of file extensions, which you can configure.

    Look at your UNIX files, and double-click the file you want to print. The file will be transferred to your PC and Notepad will open with the contents of your file. Choose the Notepad menu File and then Print to print the file to your PC printer.

    Certain file extensions may correspond to programs installed on your PC. If your PC opens the wrong program when you use this method to print, you'll have to manually transfer the file to your PC (drag and drop) and then manually start Notepad to open the file and print it.

    Sometimes an ASCII text file will not be recognized by the transfer. It is then transferred as a binary file without end of line translation, and it will have the wrong appearance in a program like Notepad. See SSH Secure Shell Configuration (ASCII Transfers) to fix this problem.

     

    Changes Related to the Shell

    The SSCC default shell is the bash shell, which is an sh-compatible command interpreter that incorporates features from the Korn and C shells (ksh and csh). The default shell in the DCE cell was the C shell.

    The first thing you'll notice when you login to an SSCC system is that your command prompt has changed. The command number is no longer printed. Your working directory is printed instead.

    You can now edit commands using the arrow keys on your keyboard, and the up- and down-arrows navigate through your command history. You can change the editing interface to vi or emacs with the commands set -o vi and set -o emacs. It's easier to search your command history file with vi or emacs commands than it is to scroll backward endlessly with the arrow keys.

    Filename expansion is accomplished with the TAB key, not the Escape key. The TAB key also performs command name expansion. Type the command new immediately followed by tapping the TAB key twice (with no spaces), and you'll get a list of five commands starting with new.

    The help command will give you hints about various shell commands. Try typing help alias to see the syntax of the alias command.

    The file ~/.bash_profile is read once per job, at login time. Put global shell variable definitions in your .bash_profile. The global shell variable LPDEST is defined there, to set your default printer. Define your shell aliases here.

    The file ~/.bashrc is read each time a bash shell is started (which is frequent). Use this file to define shell variables and options that are not persistent between shell invocations.

    The file ~/.bash_logout is executed when you logout. As provided, it clears your screen. You can change that behavior by editing that file.

    The ls command uses colors to distinguish file types. Normal files are uncolored. Here's a short list of the colors employed:

    File           color
    
    normal         none
    directory      blue
    symlink        cyan
    pipe           yellow
    orphan         red with highlighting (a symbolic link to a nonexistent file)
    x permission   green
    .exe files     green   (file extensions of .exe, .cmd, .com, .bat)
    tar archives   red     (file extensions of .tar, .tgz, .zip, .gz, .. .Z)
    image files    magenta (file extensions of .jpg, .jpeg, .gif .. .wav, .mp3)
    

    Colors may be problematic if you have a nonstandard window background color, or if you're colorblind. You can disable the use of colors with an alias for your ls command: alias ls='ls --color=never'.

     

    X Windows

    CDE desktop access is not available. XDMCP connections are ignored.

    Desktop access of any kind is not provided. Use UNIX command-line access to run single-window applications, such as xstata, sas, Splus -g, matlab -desktop, mozilla and acroread.

     

    Fortran

    Two Fortran compilers are available: g77, the GNU Project Fortran 77 compiler, and ifort, the Intel® Fortran compiler.

    See the Manuals page for Intel® Fortran Compiler documentation.

    The g77 documentation may be viewed online, with the commands man g77 and info g77.

     

    UNIX Applications

    Some traditional UNIX applications have been replaced:

    HP-UX     Linux
    
    dtterm    xterm
    elm       mutt
    lynx      elinks
    mailx     mail
    netscape  mozilla
    pico      nano
    pine      mutt
    qedit     joe

     
