Staging Data
From WikIBEST
Staging Files to and From the Central File Servers
As the number of servers and compute nodes in the bioinformatics core grows, so does the load on our central file servers, and we have reached the physical limits of our current systems. Because our operating budget is limited and a high-performance SAN is expensive, we are moving away from relying on the central file servers on our clusters. This means that you will need to stage your files onto the cluster before a run and stage the results back out after the run has completed.
Things to remember about the cluster file storage
Automatic cleanup
- To conserve disk space on the cluster file system, files that have not been accessed and/or modified for 2 weeks will be removed automatically. If you need to keep files there longer, let us know and we will add your home directory to the list of exemptions.
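To see which files might be candidates for cleanup, something like the following sketch could help. It assumes the two-week rule means that both the last access and the last modification are at least 14 days old, which is only one reading of the policy. The function name list_stale_files is made up for illustration and is not a tool installed on the cluster.

```shell
# Hypothetical helper: list regular files under a directory that have been
# neither accessed nor modified in the last 14 days.
# Assumption: the cleanup job looks at both timestamps; the real job may differ.
list_stale_files() {
    dir=${1:-$HOME}
    # -atime +13 / -mtime +13 match files whose age is at least 14 full days.
    find "$dir" -type f -atime +13 -mtime +13
}
```

Calling list_stale_files with no argument checks your whole home directory; passing a path such as ~/myrun (a placeholder name) limits the check to that directory.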
No backups
- Although the production clusters will use RAID technology to survive disk failures, data can still be lost in two main ways: automatic cleanup, and users who inadvertently remove files. To guard against this, be sure to copy any modified files or output files back to the central file system.
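Before deleting anything from the cluster, it may be worth confirming that the copy on the central file system is complete. The sketch below compares two directory trees with diff -r; the function name verify_copy is hypothetical, and the paths passed to it are whatever you used when copying.

```shell
# Hypothetical helper: check that a staged-out copy matches the cluster copy.
# Returns 0 if the two trees are identical, 1 otherwise.
verify_copy() {
    local_copy=$1
    central_copy=$2
    if diff -r "$local_copy" "$central_copy" >/dev/null 2>&1; then
        echo "OK: $central_copy matches $local_copy"
    else
        echo "MISMATCH: check $central_copy before deleting $local_copy" >&2
        return 1
    fi
}
```

For example, verify_copy myfolder ~/central/mydir/myfolder could be run on the headnode after copying myfolder back.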
How to access the central file servers from the cluster headnode
We have tried to make it as easy as possible to read files from the central file servers and write files back to them.
- Each headnode will network mount the central file system.
- Each home directory will contain a link to this mount, named "central", located at the root of your home directory.
- This link will not work when accessed from the compute nodes; you must copy data in and out of the central file system while logged on to the headnode.
- To move data in and out, you only need the copy command cp. If you want to get fancy, you can also rsync specific directories (just make sure you know what you are doing).
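For the rsync option mentioned in the last point, a minimal sketch might look like the following. It assumes rsync is installed on the headnode, which this page does not confirm, and the function name sync_dir is made up for illustration.

```shell
# Hypothetical helper: mirror the contents of one directory into another.
# Assumption: rsync is available on the headnode.
sync_dir() {
    src=$1
    dest=$2
    # Create the destination first; rsync does not create missing parent directories.
    mkdir -p "$dest" || return 1
    # -a preserves timestamps and permissions and recurses into subdirectories.
    # The trailing slash on the source copies its contents, not the folder itself.
    rsync -a "$src"/ "$dest"/
}
```

For example, sync_dir ~/central/mydir/myfolder ./myfolder would bring myfolder onto the cluster, and running it again later would transfer only files that have changed. Swapping the two arguments stages the directory back out.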
Quick examples
Moving files from the central file server to the cluster
Copy a single file from the central file server to your current working directory on the cluster:
cp ~/central/mydir/myfile .
Copy a folder of files from the central file server to your current working directory on the cluster:
cp -r ~/central/mydir/myfolder .
In these commands, the . stands for your current working directory; you can replace it with any other destination path.
Moving files back to the central file server
Copy a single file from the cluster to the central file system:
cp myfile ~/central/mydir
Copy a folder of files from the cluster to the central file system:
cp -r myfolder ~/central/mydir
(Note: if mydir does not already exist, cp creates it as a copy of myfolder, so the contents of myfolder end up in a directory named mydir rather than in mydir/myfolder.)
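One way to avoid the renaming surprise described in the note is to create the destination directory before copying. The sketch below wraps that in a small function; the name stage_out is hypothetical, and the paths are the same placeholders used in the examples above.

```shell
# Hypothetical helper: copy a file or folder into a destination directory,
# creating that directory first so cp always copies *into* it.
stage_out() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # The trailing slash makes it explicit that the destination is a directory.
    cp -r "$src" "$dest_dir"/
}
```

With this, stage_out myfolder ~/central/mydir always produces ~/central/mydir/myfolder, whether or not mydir existed beforehand. Remember that it must be run on the headnode, since the central link does not work from the compute nodes.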
