Spin Up Models on Alps | Swiss AI Research Platform

Quick Start

Download the script and install it into ~/.local/bin:

mkdir -p ~/.local/bin && wget https://raw.githubusercontent.com/swiss-ai/model-spinning/refs/heads/main/spin-model.py -O spin-model && chmod 755 spin-model && mv spin-model ~/.local/bin/
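You can only run the installed script by typing spin-model if ~/.local/bin is on your PATH. The snippet below is a minimal sketch of that check. in_path is a hypothetical helper name and is not provided by spin-model.

```bash
#!/usr/bin/env bash
# in_path DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# Hypothetical helper for illustration; not part of spin-model.
in_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if ! in_path "$HOME/.local/bin"; then
  # Print the usual fix; add the line to ~/.bashrc or your shell's profile.
  echo 'Not on PATH. Add this line to your shell profile:'
  echo '  export PATH="$HOME/.local/bin:$PATH"'
fi
```

The colons added around $PATH let the pattern match the first and last entries as well as the middle ones.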

Check your available SLURM accounts:

sacctmgr show associations user=$USER format=user,account%20

Launch a model:

# Launch Mistral 7B (a vLLM-only architecture, so --vllm is required) with tensor parallelism 2 for 30 minutes
spin-model --model mistralai/Mistral-7B-Instruct-v0.3 --tensor-parallel-size 2 --time 30m --account YOUR_ACCOUNT --vllm

Usage

usage: spin-model [-h] [--model MODEL] [--time TIME] [--vllm] [--vllm-help]
                     [--sp-help] [--account ACCOUNT] [--env ENV]
                     [--environment ENVIRONMENT]

Launch a model on SLURM

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Name of the model to launch
  --time TIME           Time duration for the job. Examples: 2h, 1h30m, 90m,
                        1:30:00
  --vllm                Use vllm instead of sp to serve the model
  --vllm-help           Show available options for **vllm** model server
  --sp-help             Show available options for **sp** model server
  --account ACCOUNT     Slurm account to use for job submission
  --env ENV             Specify environment variables in format KEY=VALUE
  --environment ENVIRONMENT
                        Specify a custom environment file path

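Additional model-server arguments that are not listed above can be passed along with the main arguments, as the examples below show. These include --tp-size (sp), --tensor-parallel-size (vLLM) and --served-model-name.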

Important Parameters

Model Serving Options

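Model Server: By default, the script uses the sp model server. Some architectures are supported only by vLLM, and for these you must pass --vllm to switch to the vLLM server. The two servers name the tensor-parallelism option differently: sp uses --tp-size and vLLM uses --tensor-parallel-size.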

Documentation:

Scratchpad (sp) documentation
vLLM documentation

View these docs directly with:

spin-model --sp-help    # For sp server options
spin-model --vllm-help  # For vllm server options

Tensor Parallelism

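The --tp-size parameter sets the tensor parallelism size, which is the number of GPUs the model is split across. Use it when a model is too large to fit on a single GPU. With --vllm, use --tensor-parallel-size instead. Suggested values: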

Models < 2B parameters: --tp-size 1
Models < 14B parameters: --tp-size 2
Models < 45B parameters: --tp-size 3
Models < 90B parameters: --tp-size 4
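As a rough sketch, the guideline table above can be turned into a helper. It suggests a tensor-parallel size from a parameter count in billions and prints the flag for the chosen server: --tp-size for sp, or --tensor-parallel-size for vLLM, matching the launch examples below. suggest_tp_size and tp_args are hypothetical names. The thresholds simply copy the table, so check the result against your model's actual memory needs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of spin-model): map a parameter count in
# billions to a TP size using the guideline table. Fails above 90B, where
# the table gives no guideline.
suggest_tp_size() {
  local params_b="$1"
  # Accept plain numbers such as 12 or 3.8 only.
  [[ $params_b =~ ^[0-9]+([.][0-9]+)?$ ]] || return 1
  awk -v p="$params_b" 'BEGIN {
    if (p < 2)       print 1
    else if (p < 14) print 2
    else if (p < 45) print 3
    else if (p < 90) print 4
    else             exit 1
  }'
}

# tp_args PARAMS_B [sp|vllm]: print the TP flag and value for the server.
tp_args() {
  local params_b="$1" server="${2:-sp}" size
  size=$(suggest_tp_size "$params_b") || return 1
  if [ "$server" = vllm ]; then
    printf '%s %s\n' --tensor-parallel-size "$size"
  else
    printf '%s %s\n' --tp-size "$size"
  fi
}
```

For example, in spin-model --model google/gemma-3-12b-it $(tp_args 12) --time 4h --account YOUR_ACCOUNT, the command substitution is left unquoted on purpose. This makes the shell pass the flag and its value as two separate arguments.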

Time Allocation

The --time parameter accepts various formats:

2h (2 hours)
1h30m (1 hour and 30 minutes)
90m (90 minutes)
1:30:00 (1 hour and 30 minutes in SLURM format)

Note: Jobs on Bristen nodes are limited to 1 hour, while Clariden nodes allow up to 24 hours.
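To catch an over-long request before you submit it, you can convert the time formats above to minutes and compare them with the per-cluster limits in the note. This is a sketch built on several assumptions. time_to_minutes, max_minutes_for and check_time are hypothetical helpers. spin-model's own parser may accept other spellings. The limits are hard-coded from the note rather than queried from SLURM.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not spin-model's own parser: convert the --time
# formats listed above (2h, 1h30m, 90m, 1:30:00) to whole minutes.
time_to_minutes() {
  local t="$1" h=0 m=0 s=0
  if [[ $t =~ ^([0-9]+):([0-9]{1,2}):([0-9]{1,2})$ ]]; then
    h=${BASH_REMATCH[1]} m=${BASH_REMATCH[2]} s=${BASH_REMATCH[3]}
  elif [[ -n $t && $t =~ ^(([0-9]+)h)?(([0-9]+)m)?$ ]]; then
    h=${BASH_REMATCH[2]:-0} m=${BASH_REMATCH[4]:-0}
  else
    return 1
  fi
  # 10# forces base 10 so "08" is not read as octal; leftover seconds
  # round up to the next minute.
  echo $(( 10#$h * 60 + 10#$m + (10#$s > 0 ? 1 : 0) ))
}

# Maximum job length per vCluster in minutes, copied from the note above.
max_minutes_for() {
  case "$1" in
    bristen)  echo 60 ;;
    clariden) echo 1440 ;;
    *)        return 1 ;;
  esac
}

# check_time CLUSTER TIME: succeed if TIME fits within the cluster's limit.
check_time() {
  local cluster="$1" t="$2" mins max
  mins=$(time_to_minutes "$t") || { echo "unrecognised time: $t" >&2; return 2; }
  max=$(max_minutes_for "$cluster") || { echo "unknown cluster: $cluster" >&2; return 2; }
  if (( mins > max )); then
    echo "$t ($mins min) exceeds the $max min limit on $cluster" >&2
    return 1
  fi
}
```

Usage: check_time bristen 30m && spin-model ... --time 30m. The submission runs only if the check passes.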

Environment Variables

The --env parameter allows you to specify custom environment variables for your model server. This is useful for:

Setting API keys (e.g., Hugging Face tokens)
Configuring model-specific parameters
Passing authentication credentials

You can specify multiple environment variables by using --env multiple times.

spin-model --model CohereLabs/aya-expanse-8b --tensor-parallel-size 2 --time 4h --account YOUR_ACCOUNT --vllm \
    --env HF_TOKEN=YOUR_HF_TOKEN \
    --env OPENAI_API_KEY=YOUR_OPENAI_API_KEY \
    --env GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
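If you type tokens directly on the command line, they end up in your shell history. One alternative, sketched here, is to export the secrets in your shell session and let a small function build the repeated --env KEY=VALUE arguments. build_env_args and the ENV_ARGS array are hypothetical names. Variables that are unset or empty are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global ENV_ARGS array with
# "--env NAME=VALUE" pairs for each named variable that is set and non-empty.
build_env_args() {
  local name
  ENV_ARGS=()
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name.
    if [ -n "${!name:-}" ]; then
      ENV_ARGS+=(--env "$name=${!name}")
    fi
  done
}
```

Usage: run build_env_args HF_TOKEN OPENAI_API_KEY GOOGLE_API_KEY, then launch with spin-model ... --vllm "${ENV_ARGS[@]}". Quoting the expansion keeps each KEY=VALUE pair as a single argument. The values are still passed to spin-model as command-line arguments, so this only keeps them out of your history.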

Model launch examples

# Apertus 70B - SwissAI model
spin-model --model /a10/swiapertus3ss-alignment/checkpoints/apertus3-70B-iter_90000-tulu3-sft/checkpoint-14000 \
    --served-model-name swissai/apertus3-70b-0425 \
    --account YOUR_ACCOUNT \
    --tp-size 4

# Standard model launches (using sp server)
# Gemma 3 12B - instruction-tuned model from Google's Gemma 3 family
spin-model --model google/gemma-3-12b-it --tp-size 2 --time 4h --account YOUR_ACCOUNT

# Qwen 2.5 7B 
spin-model --model Qwen/Qwen2.5-7B-Instruct --tp-size 2 --time 4h --account YOUR_ACCOUNT

# DeepSeek-R1-Distill-Qwen 14B - Qwen2.5 14B fine-tuned on reasoning data generated by DeepSeek-R1
spin-model --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --tp-size 2 --time 4h --account YOUR_ACCOUNT

# vLLM-only architectures (must use --vllm flag)
# Mistral 7B 
spin-model --model mistralai/Mistral-7B-Instruct-v0.3 --tensor-parallel-size 2 --time 4h --account YOUR_ACCOUNT --vllm

# Phi-3 Mini 
spin-model --model microsoft/Phi-3-mini-4k-instruct --tensor-parallel-size 2 --time 4h --account YOUR_ACCOUNT --vllm

# Gemma 2 9B - Previous generation Google model
spin-model --model google/gemma-2-9b-it --tensor-parallel-size 2 --time 4h --account YOUR_ACCOUNT --vllm

# Aya Expanse 8B with the vLLM server and a custom environment variable
spin-model --model CohereLabs/aya-expanse-8b --tensor-parallel-size 2 --time 4h --account YOUR_ACCOUNT --vllm --env HF_TOKEN=YOUR_HF_TOKEN

Local Model Launch and Apertus

spin-model --model /a10/swiapertus3ss-alignment/checkpoints/apertus3-70B-iter_90000-tulu3-sft/checkpoint-14000 \
    --served-model-name swissai/apertus3-70b-0425 \
    --account YOUR_ACCOUNT \
    --tp-size 4

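For a local model, --model takes the path to the model checkpoint instead of a Hugging Face model ID. Make sure the environment file you use (the .toml file, which can be set with --environment) mounts /a10, so that the checkpoint path is accessible to the job.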
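A quick pre-flight check can confirm that the environment file mounts /a10 before you submit. This sketch is a plain-text heuristic. It assumes the .toml lists bind mounts as quoted "source:destination" strings (for example "/a10:/a10") in a mounts array. has_a10_mount is a hypothetical helper, and a real TOML parser would be more robust.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed if the environment file contains a
# quoted mount entry whose destination is /a10, e.g. "/a10:/a10".
# Returns 2 if the file cannot be read, 1 if no such entry is found.
has_a10_mount() {
  local toml="$1"
  [ -r "$toml" ] || { echo "cannot read $toml" >&2; return 2; }
  grep -Eq '"[^"]*:/a10/?"' "$toml"
}
```

Usage: has_a10_mount path/to/your-env.toml || echo "add a /a10 mount before launching". Mount entries that carry extra options after the destination are not recognised by this simple pattern.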

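The --served-model-name parameter sets the name under which the server exposes the model to clients. Requests can then use a short, readable name such as swissai/apertus3-70b-0425 instead of the checkpoint path.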

After Submission

Once your job is submitted, you’ll see:

Job ID
Commands to check job status and logs