FAQs

Access

Why is my access to Shaheen blocked with a message about not being in authorised IP range?

When you first applied for an account on Shaheen, you submitted on the application a list of IPs or host names that you wished to connect to Shaheen from. These IPs were assigned to your Shaheen account when it was created. You are seeing the error message because the IP you are currently connecting from is not included in the list of IPs associated with your Shaheen account.

If at any time you change work location or sign up with another Internet provider, you will need to inform us of the new location that you need to login from. This can be a host name, a domain name, an IP address or an IP address range. Please contact help@hpc.kaust.edu.sa, letting us know the new location(s) that you wish to login from.

I am a new user. Where can I find my OTP seed for Shaheen?

Your Shaheen two-factor OTP seed is stored on the website and must be protected. Once you pass the security validation process, the seed is made temporarily visible.

Log in to our website, where you will be asked to select a security question and then type in an answer to it. Once this is done, you can continue to the rest of the website. Go to the My KSL menu and select View OTP Seed. You will then be asked to supply the answer to the security question that you just chose.

On answering the question correctly, the QR code for the OTP seed will be displayed.

I've changed my phone. How can I scan the OTP seed again?

Your Shaheen two-factor OTP seed is stored on the website and must be protected. Once you pass the security validation process, the seed is made temporarily visible.

Log in to our website, go to the My KSL menu, and select View OTP Seed. You will then be asked to supply the answer to the security question that you chose when you logged into the website for the first time.

On answering the question correctly, the QR code for the OTP seed will be displayed.

Connecting

Why can't I login to Shaheen?

There are many stages in the login process that can fail. SSH login to Shaheen uses two-factor authentication, meaning you are verified using something you know (password or SSH key) and something you have (mobile device with OTP). In addition, the following checks must pass before access is granted:

  • Your Active Directory account must not be locked (usually caused by too many incorrect password entries).
  • Your Active Directory account must not be disabled (usually because the AD account has reached its expiry date).
  • You must be logging in from an IP range or domain that was approved either as part of your Shaheen application or requested at a later date.

If your login attempt fails, the reason may be one of the following:

  • When we asked you for an up-to-date ID in order to extend your Shaheen account, did you fail to provide one? If so, your account may now be locked.
  • Did your login failure message indicate that you were trying to login from an unauthorised IP address? If so, you need to contact us to have that IP address added.
  • Is Shaheen under maintenance? If so, you might still pass the login process, but your access will be blocked until Shaheen is back in production. You should have received downtime notifications by email. If in doubt, check the recent Newsletter on this website.

Please note that if you entered a password, and then entered your OTP and then your login failed, this does not mean that your password was correct and your OTP was incorrect. Either or both could be incorrect.

You can check the status of your Active Directory account by selecting Check AD Account from the My KSL menu. Note, however, that if your account is locked, you will not be able to use this link, because it requires you to log in to this website first. In that case, you can ask a colleague to check for you.

If you are not sure why your login attempt was rejected, please contact us and we will investigate.
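If you do contact us, it may help to include a verbose log from your SSH client, which records each authentication step on your side. A minimal sketch, assuming OpenSSH on your workstation; the helper name `shaheen_debug_login` and the default log file name are hypothetical, and `bloggs` stands for your username:

```bash
# Hypothetical helper: attempt a login with maximum OpenSSH client verbosity
# and save the debug output (which ssh writes to stderr) to a log file.
shaheen_debug_login() {
  local user="$1"
  local log="${2:-ssh-debug.log}"
  # -vvv prints each step: host key check, keys offered, auth methods tried
  ssh -vvv "${user}@shaheen.hpc.kaust.edu.sa" 2> "$log"
  echo "Debug output saved to $log" >&2
}
```

For example, run shaheen_debug_login bloggs and answer the prompts as usual. Because stderr is redirected, some messages (such as a login banner) may appear only in the log rather than on screen.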

Data Transfer

What is the fastest way to transfer files to/from remote filesystems?

File Transfer Guidelines for Shaheen

  1. VPN Limitations
     VPNs are not designed for large file transfers. However, cybersecurity compliance is mandatory and beyond KSL’s control.
  2. Recommended Method: Globus
     For large file transfers, including terabytes of data, use Globus:
       • Easy setup with Globus Connect Personal allows transfers between Shaheen and other systems, even behind firewalls or NAT.
       • Supports transfers without administrative privileges.

     Get started:

       • Download and install Globus Connect Personal: https://www.globus.org/globus-connect-personal
       • After installation, log in to your Globus account.
           • Off-campus users: Search for dtn5 to connect to dtn5.hpc.kaust.edu.sa (Data Mover Server).
               • Available endpoints: shaheen dtn5:project and shaheen dtn5:scratch.
           • On-campus users: Search for dtn6 to connect to dtn6.hpc.kaust.edu.sa.
               • Available endpoints: shaheen dtn6:project and shaheen dtn6:scratch.
       • Authenticate and log in with your credentials to begin transferring files.
  3. Other Methods (SCP, SFTP)
       • Use SCP or SFTP only for small file transfers over the local network.
       • Not recommended for large files due to high resource usage on login nodes and poor performance.
       • Note: Initiate SCP/SFTP transfers from your machine, as the KAUST firewall blocks outbound port 22.

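Because the KAUST firewall blocks outbound port 22, scp has to be run on your own machine, pushing files to or pulling them from Shaheen. A minimal sketch; the helper names, the username `bloggs` and the paths are placeholders, and the host is the Shaheen login address used elsewhere on this page:

```bash
SHAHEEN_HOST="shaheen.hpc.kaust.edu.sa"

# Hypothetical helper: copy a local file or directory TO Shaheen.
# Usage: push_to_shaheen <user> <local-path> <remote-dir>
push_to_shaheen() {
  scp -r "$2" "$1@${SHAHEEN_HOST}:$3"
}

# Hypothetical helper: copy a file or directory FROM Shaheen to this machine.
# Usage: pull_from_shaheen <user> <remote-path> <local-dir>
pull_from_shaheen() {
  scp -r "$1@${SHAHEEN_HOST}:$2" "$3"
}
```

For example, push_to_shaheen bloggs results.tar.gz /scratch/bloggs/ copies a local archive into a scratch directory on Shaheen.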
What is the fastest way to copy files between local filesystems?

Distributed Copy

dcp (distributed copy) is an MPI-based copy tool developed by Lawrence Livermore National Laboratory (LLNL) as part of their mpifileutils suite. We have installed it on Shaheen. Here is an example job script that launches a data-moving job with dcp:

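#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --hint=nomultithread

# Make dcp and the other mpifileutils tools available
module load mpifileutils

# Run dcp on all allocated MPI tasks and time the whole copy
time srun -n "${SLURM_NTASKS}" dcp --verbose --progress 60 --preserve /path/to/source/directory /path/to/destination/directory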

The script above runs dcp in parallel with 4 MPI processes.

--progress 60 reports the progress of the operation every 60 seconds.
--preserve keeps the permissions (including ACLs), group ownership, timestamps and extended attributes, so that files in the destination directory have the same attributes as they had in the source directory.
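Once the job has finished, a quick (if coarse) sanity check is to compare the number of regular files in the source directory and in its copy. A minimal sketch; `compare_file_counts` is a hypothetical helper that relies on GNU find's -printf, and on very large Lustre trees the walk itself adds metadata load, so run it sparingly:

```bash
# Hypothetical helper: compare regular-file counts of two directory trees.
# Returns 0 if the counts match, 1 otherwise.
compare_file_counts() {
  local src="$1" dst="$2" n_src n_dst
  # Print one character per regular file and count the characters; this stays
  # correct even for file names that contain newlines
  n_src=$(find "$src" -type f -printf '.' | wc -c)
  n_dst=$(find "$dst" -type f -printf '.' | wc -c)
  echo "source: $n_src files, destination: $n_dst files"
  [ "$n_src" -eq "$n_dst" ]
}
```

Matching counts do not prove the contents are identical; they only catch missing or extra files.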

What is the most efficient way to delete a lot of files?

Using the standard Linux rm command to delete large numbers of files on a Lustre filesystem is not recommended. Deleting huge numbers of files with rm is very slow and puts a heavy load on the metadata server, which can destabilize the filesystem and therefore affect all users.

It is recommended to use munlink, an optimized Lustre-specific command, as in the following example:

find ./my_testrun1 -type f -print0 | xargs -0 munlink
  • find ./my_testrun1 -type f -print0 : searches for regular files (-type f) in the directory my_testrun1 and all its subdirectories, printing each path terminated by a NUL character (-print0) so that names containing spaces or newlines are handled safely.
  • | xargs -0 munlink : xargs reads the NUL-separated list of files and passes them, in batches, as arguments to munlink. The -0 flag tells xargs to expect NUL-separated input; if you use -print0 in the find command, you must use -0 in the xargs command.

Once all of the files are deleted, the directory and its subdirectories can be deleted as follows:

find ./my_testrun1 -type d -empty -delete
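The two steps above can be wrapped in one function. A minimal sketch; `purge_tree` is a hypothetical helper, and it uses GNU xargs -r so that munlink is not run at all when no files are found. Note that -type f matches only regular files, so directories that still contain symbolic links or other special files are left in place:

```bash
# Hypothetical helper: delete a directory tree on Lustre using munlink
# for the files, then remove the emptied directories.
purge_tree() {
  local dir="$1"
  # Refuse to run on an empty argument or something that is not a directory
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "purge_tree: '$dir' is not a directory" >&2
    return 1
  fi
  # Step 1: unlink regular files (NUL-delimited to survive odd file names)
  find "$dir" -type f -print0 | xargs -0 -r munlink
  # Step 2: remove empty directories; -delete implies -depth, so the
  # deepest directories go first and the top directory is removed last
  find "$dir" -type d -empty -delete
}
```

For example: purge_tree ./my_testrun1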

Filesystem

Why can't I read my Shaheen II project data on Shaheen III?

The Shaheen II project filesystem continues on Shaheen III. All Shaheen II project directories will remain on the project filesystem until November 28th 2024. After this date, KSL will remove directories that it identifies as non-essential.

Although Shaheen II project directories are available on Shaheen III, by design the Unix project groups are not. This means that you will not be able to access Shaheen II project data from Shaheen III without help from KSL.

Can data be copied from my Shaheen II project directory to my new Shaheen III project directory? Yes, provided that you are the owner of the data. Please log a ticket with us clearly stating the source and destination directories.

Is there a deadline for copying the project data? We can do this for you before or after the decommissioning of Shaheen II, up until November 28th 2024.

Can you change the ownership of the Shaheen II project directory to my new project group, so that I can copy the data myself? We will evaluate this on a case by case basis.


Quotas

It’s important to realise that your data in Shaheen II Project directories will affect your personal inode (number of files) quota on Shaheen III project directories. It is the same physical filesystem. Therefore, it is advisable to delete the files from the Shaheen II project directory as soon as you have migrated the necessary files to the Shaheen III Project directory. Otherwise, you will quickly exceed your allocated quota.
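To see how close you are to your inode quota, Lustre's lfs quota command reports both space and file counts; the "files" column is the inode usage. A minimal sketch, assuming the project filesystem is mounted at /project and that the quota is tracked per user (both assumptions; contact us if the numbers look unexpected); `show_lustre_quota` is a hypothetical helper:

```bash
# Hypothetical helper: show your space and inode (file-count) usage on a
# Lustre filesystem. The default mount point /project is an assumption.
show_lustre_quota() {
  # -h prints sizes in human-readable units; -u selects a per-user quota
  lfs quota -h -u "$USER" "${1:-/project}"
}
```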

Where has my data in /scratch gone?

Any files under /scratch/<user-name> or /scratch/project/<project-name> that have not been accessed in the last 60 days will be deleted automatically without any warning. These files are not backed up and therefore cannot be recovered from tape.

Please remember to back up any important data to your /project/<project-name> folders, /home/<username> directories, or download to your workstations.
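To see which of your scratch files are approaching the 60-day limit, find's -atime test can list files by last access time. A minimal sketch; `list_purge_candidates`, its default directory /scratch/$USER and the 50-day threshold are assumptions chosen for illustration:

```bash
# Hypothetical helper: list regular files whose last access time is older
# than a threshold. find ignores fractional days, so with -atime +50 a file
# must have been last accessed at least 51 full days ago to match.
list_purge_candidates() {
  local dir="${1:-/scratch/$USER}"
  local days="${2:-50}"
  find "$dir" -type f -atime +"$days"
}
```

For example, list_purge_candidates /scratch/$USER 50 | head shows the first few files that have not been read for at least 51 days, so you can copy any important ones to /project or /home.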

General

How do I book a tour of the Supercomputing facilities?

All tours of the Core Labs facilities must be booked centrally through the Core Labs booking service. Please do not contact the Supercomputing Lab directly to organise a tour. Use the Tour Booking System or email corelabstour@kaust.edu.sa.

For further information, see https://corelabs.kaust.edu.sa/services/tours.

Why am I seeing a "can't set the locale" error message when I login to Shaheen?

If you see the following message when connecting into Shaheen from a Mac: 

/usr/bin/manpath: can’t set locale; make sure $LC_* and $LANG are correct 

... this is most likely due to a setting in the Terminal application. Go to Terminal -> Preferences -> Profiles -> Advanced and uncheck “Set locale environment variables on startup”. You will then need to restart the Terminal application.

Job Scheduler

How do I check core hour usage for my project(s)?

There are a couple of ways to do this.

Allocations for the computational resources on KSL systems are managed using a tool called "sbank." Users have access to this system via the "sb" command:

$ sb <project-id>

From the website, you can view your project allocation and usage as graphs in the My KSL menu (you must log in first):

Project Monthly Usage - displays the number of core hours consumed in each month of the project.

Project Remaining Core Hours - shows the total core hours per project member, plus the core hours remaining.

Please note that the website graph data is only collected once per day, so will not be absolutely up to date. Also, the data is generated in a different way from the "sb" command, so the numbers don't always match exactly.
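If you belong to several projects, a small shell loop saves typing one sb command per project. A minimal sketch; `sb_all` is a hypothetical helper and the project IDs in the usage line are placeholders:

```bash
# Hypothetical helper: print the sbank summary for each project given,
# with a header line so the outputs are easy to tell apart.
sb_all() {
  local p
  for p in "$@"; do
    echo "=== $p ==="
    sb "$p"
  done
}
```

For example: sb_all k01 k02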

How do I cancel a running job?

If you need to cancel a running job, you typically use the scancel command followed by the job ID, which you can find with squeue:

$ squeue -u $USER

         JOBID PARTITION         NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
      14185711     workq     my_jobName  bloggs  RUNNING      43:53   2:00:00      1 nid00312

Alternatively, you can cancel the job by its name instead of its ID, using the --name option:

$ scancel --name my_jobName

This is also a convenient way to cancel a set of jobs sharing the same name, and it avoids cancelling the wrong job because of a misread ID in the squeue output.
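To cancel a single job by ID, pass the number from the JOBID column, for example scancel 14185711. scancel can also filter by user and job state; the sketch below, using the hypothetical helper name `cancel_my_pending`, cancels only your pending jobs and leaves running ones alone. Any extra scancel options (such as --name) are passed through:

```bash
# Hypothetical helper: cancel only YOUR jobs that are still pending.
# Extra arguments are passed straight to scancel to narrow the selection.
cancel_my_pending() {
  scancel --user="$USER" --state=PENDING "$@"
}
```

For example, cancel_my_pending --name=my_jobName cancels only the pending jobs with that name.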

Why is my job not running?

When the estimated start time of your pending job is not available, you can get more details, including the reason your job is not running.

Type squeue --job <jobid> -l to get output like the following, with the reason shown in the last column:

           JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           110000   workq 8-tuned_   user1  PENDING       0:00 3-00:00:00      1 (AssocGrpCPUMinutesLimit)
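Two other standard Slurm commands can add detail: squeue --start shows the scheduler's estimated start time, and scontrol show job prints the full job record. A minimal sketch that combines them; `job_wait_info` is a hypothetical helper name:

```bash
# Hypothetical helper: show the estimated start time and the state/reason
# fields for one job.
job_wait_info() {
  local jobid="$1"
  # Estimated start time as computed by the scheduler (may be N/A)
  squeue --start --job "$jobid"
  # Full job record; print only the JobState, Reason and StartTime fields
  scontrol show job "$jobid" | grep -Eo '(JobState|Reason|StartTime)=[^ ]+'
}
```

For example: job_wait_info 110000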

 

Here are the most common reasons. These codes identify why a job is waiting for execution. A job may be waiting for more than one reason, in which case only one of those reasons is displayed.

AssocGrpCPUMinutesLimit

The job's association (typically your project account) has reached its aggregate CPU-minutes limit, which usually means the project's core-hour allocation is exhausted.

Cleaning

The job is being requeued and is still cleaning up from its previous execution.

Dependency

This job is waiting for a dependent job to complete.

JobHeldAdmin

The job is held by a system administrator.

JobHeldUser

The job is held by the user.

NodeDown

A node required by the job is down.

Priority

One or more higher-priority jobs exist for this partition or advanced reservation, so they will be scheduled before yours.

QOSGrpNodeLimit

The maximum number of nodes available to the partition is already in use.

QOSUsageThreshold

The required QOS threshold has been breached.

ReqNodeNotAvail

No nodes can be found that satisfy your job's limits. For example, maintenance may be scheduled and the job cannot finish before it starts.

Reservation

The job is waiting for its advanced reservation to become available.

Resources

The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes.

SystemFailure

Failure of the Slurm system, a file system, the network, etc.
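To see these reason codes for all of your pending jobs at once, squeue's --format option can print the reason column alongside the job details. A minimal sketch; `my_pending_reasons` is a hypothetical helper and the column widths are arbitrary:

```bash
# Hypothetical helper: list your pending jobs with their reason codes.
# Format fields: %i job ID, %P partition, %j job name, %T state, %r reason.
my_pending_reasons() {
  squeue --user="$USER" --states=PENDING --format="%.12i %.9P %.20j %.8T %r"
}
```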

SSH

Is there a way to avoid entering the OTP every time I login?

Once you have logged in with your OTP, you can keep that connection open and have subsequent sessions "piggy-back" on it, without entering your OTP again. This requires a few extra lines in your $HOME/.ssh/config file.

This solution works when using ssh from the command line. If you are using a GUI ssh tool, there may be an equivalent solution within the tool connection preferences.

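Host shaheen.hpc.kaust.edu.sa
    # Share one authenticated connection (%r user, %h host, %p port)
    ControlMaster auto
    ControlPath ~/.ssh/master-%r@%h:%p
    # Close the shared connection after it has been idle for 4 hours
    ControlPersist 4h
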
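With connection sharing configured, OpenSSH's -O control commands can query or stop the shared connection. A minimal sketch; the helper names are hypothetical and `bloggs` stands for your Shaheen username (it is part of the socket name via %r, so it must match the user you logged in as):

```bash
SHAHEEN_HOST="shaheen.hpc.kaust.edu.sa"

# Hypothetical helper: report whether a shared (master) connection is running.
shaheen_mux_check() {
  ssh -O check "${1:?usage: shaheen_mux_check <username>}@${SHAHEEN_HOST}"
}

# Hypothetical helper: ask the shared connection to shut down.
shaheen_mux_exit() {
  ssh -O exit "${1:?usage: shaheen_mux_exit <username>}@${SHAHEEN_HOST}"
}
```

For example, shaheen_mux_check bloggs before opening a new session tells you whether you will be asked for your OTP again.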
How do I set up key-based SSH access to Shaheen?

SSH keys establish a unique communications path between a user's workstation and the server that s/he is logging into (Shaheen/Ibex). The public-private key pair is generated on the user's workstation:

  • The private key should exist only on the laptop/workstation of the user and NEVER be shared.
  • The public key should be copied to the $HOME/.ssh/authorized_keys file on the server(s) that you are logging into (Shaheen/Ibex).

This allows password-less SSH authentication from client to server; however, we strongly recommend that you protect your private key with a passphrase, which should be different from your Active Directory password. The private key must then be "unlocked" with the passphrase each time you make an SSH connection. If you log in frequently, you may also wish to use an ssh agent to avoid repeatedly entering your passphrase.
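A minimal sketch of the ssh agent approach for a shell session on your workstation; `ensure_agent` is a hypothetical helper, and ~/.ssh/id_rsa is assumed to be your key (the default produced by the ssh-keygen command in the next section):

```bash
# Hypothetical helper: make sure an ssh-agent is running in this shell and
# holds your key, so the passphrase is typed once per agent lifetime.
ensure_agent() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  # Start an agent only if this shell is not already connected to one
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    eval "$(ssh-agent -s)" >/dev/null
  fi
  # "ssh-add -l" exits non-zero when the agent holds no keys or cannot be
  # reached; in that case load the key (this prompts for its passphrase)
  if ! ssh-add -l >/dev/null 2>&1; then
    ssh-add "$key"
  fi
}
```

Run ensure_agent in each new terminal; later ssh connections from that terminal then use the unlocked key held by the agent.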

Creating the private-public key pair

On your laptop/workstation you can use the following command:

ssh-keygen -t rsa -b 4096

This will create a public key (id_rsa.pub) and a private key (id_rsa) in your workstation/laptop $HOME/.ssh directory. For example:

$ ls -lh $HOME/.ssh/
-rw------- 1 user user 3.4K Jan 25 23:20 id_rsa
-rw-r--r-- 1 user user  738 Jan 25 23:20 id_rsa.pub

Uploading/copying the key to the Server (Shaheen/Ibex)

There are two options to do this:

  1. Manually upload the PUBLIC key (id_rsa.pub).
  2. Use the ssh-copy-id command

Manually uploading the PUBLIC key (id_rsa.pub)

Open a terminal on your laptop/workstation.

Type:

cat ~/.ssh/id_rsa.pub

And copy the output.

Log into Shaheen or Ibex.

On Shaheen/Ibex, edit the authorized_keys file in your .ssh directory (don't worry if it doesn't exist):

vim ~/.ssh/authorized_keys

And paste what you copied previously.

Save and exit.

Change the permissions for that new file:

chmod 0600 ~/.ssh/authorized_keys

You should now be able to SSH into Shaheen/Ibex without being prompted for your password (on Shaheen you will still be asked for your OTP).
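If the ~/.ssh directory did not exist, or you are still asked for your password, permissions may be the cause: sshd typically ignores authorized_keys when the file or its directory is writable by other users. A minimal sketch to run on Shaheen/Ibex; `fix_ssh_perms` is a hypothetical helper:

```bash
# Hypothetical helper: create ~/.ssh and authorized_keys if missing and give
# them owner-only permissions.
fix_ssh_perms() {
  local dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir"
  chmod 700 "$dir"                   # only you may list or enter the directory
  touch "$dir/authorized_keys"       # creates an empty file if it is absent
  chmod 600 "$dir/authorized_keys"   # only you may read or write the key list
}
```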

Using the ssh-copy-id command:

This one is much easier.

Open a terminal on your laptop/workstation and type:

ssh-copy-id -i ~/.ssh/id_rsa.pub user@shaheen.hpc.kaust.edu.sa

Why does my SSH connection to Shaheen keep dropping?

The most common cause is the client end: your SSH client may drop the connection once it has been idle for a certain period. You can configure the client to send keep-alive messages to the server at specified intervals during idle periods, so that the session stays alive. This can be done on the ssh command line or set permanently in ${HOME}/.ssh/config:

ServerAliveInterval 120

For X11 timeout issues:

ForwardX11Timeout 596h
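The same options can be given once on the command line with -o, or scoped to Shaheen in ${HOME}/.ssh/config. A sketch; the ServerAliveCountMax line is an addition (3 is OpenSSH's default), and ForwardX11Timeout applies only to untrusted X11 forwarding (ssh -X), not to trusted forwarding (ssh -Y):

```text
# One-off equivalent on the command line:
#   ssh -o ServerAliveInterval=120 user@shaheen.hpc.kaust.edu.sa
Host shaheen.hpc.kaust.edu.sa
    # Send a keep-alive request after 120 s without data from the server
    ServerAliveInterval 120
    # Disconnect only after 3 consecutive unanswered requests (OpenSSH default)
    ServerAliveCountMax 3
    # Keep accepting new untrusted X11 (ssh -X) connections for up to 596 hours
    ForwardX11Timeout 596h
```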