Quick Start

This guide walks you through a complete Chiltepin workflow, from setting up an endpoint to submitting tasks.

Overview

Chiltepin is a collection of tools for implementing distributed exascale numerical weather prediction workflows using Parsl and Globus Compute.

Warning

This collection of resources is for research purposes only and is not intended for use in operational production environments.

Prerequisites

Before starting, ensure you have:

Installed Chiltepin (see Installation)
Access to an HPC system (or use local execution for testing)
A Globus account and a web browser for Globus authentication

Complete Workflow Example

This example demonstrates the full workflow: configure an endpoint, start it, and submit tasks.

Step 1: Authenticate

First, log in to Globus services. This should be done on the machine where you want to run tasks:

$ chiltepin login

This opens a browser for authentication or, if one is not available, provides a URL to complete the authentication manually. Follow the prompts to authorize Chiltepin.

Step 2: Configure an Endpoint

Create a new Globus Compute endpoint to which you will submit tasks. This should be done on the machine where you want to run tasks:

$ chiltepin endpoint configure my-endpoint

This creates the endpoint configuration in ~/.globus_compute/my-endpoint/.

Step 3: Start the Endpoint

Launch the endpoint:

$ chiltepin endpoint start my-endpoint

The endpoint will register with Globus Compute and begin accepting tasks.

Step 4: Get the Endpoint UUID

Retrieve your endpoint’s UUID:

$ chiltepin endpoint list

Example output:

my-endpoint  a1b2c3d4-1234-5678-90ab-cdef12345678  Running

Note the UUID (a1b2c3d4-1234-5678-90ab-cdef12345678) for the next step.
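
If you would rather not copy the UUID by hand, the listing can be parsed in a script. The following is a minimal sketch that assumes the output has the column layout shown in the example above (name, UUID, status); find_endpoint_uuid is a hypothetical helper, not part of Chiltepin.

```python
import re
import uuid
from typing import Optional

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def find_endpoint_uuid(listing: str, name: str) -> Optional[str]:
    """Return the UUID on the line whose first column equals `name`, if any."""
    for line in listing.splitlines():
        columns = line.split()
        if not columns or columns[0] != name:
            continue
        match = UUID_PATTERN.search(line)
        if match:
            # Round-trip through uuid.UUID to validate and normalize the value.
            return str(uuid.UUID(match.group(0)))
    return None


# Sample text copied from the example output above (assumed format).
listing = "my-endpoint  a1b2c3d4-1234-5678-90ab-cdef12345678  Running"
print(find_endpoint_uuid(listing, "my-endpoint"))
```

In practice you would pass the captured output of chiltepin endpoint list as the listing argument.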

Step 5: Create a Configuration File

Create my_config.yaml with your endpoint UUID:

# Laptop/workstation resource for small tasks
laptop:
  provider: "localhost"
  init_blocks: 1
  max_blocks: 1

# Remote endpoint for HPC tasks
remote:
  endpoint: "a1b2c3d4-1234-5678-90ab-cdef12345678"  # Use your UUID
  provider: "slurm"
  cores_per_node: 4
  nodes_per_block: 1
  partition: "compute"
  account: "myproject"
  walltime: "00:30:00"
  environment:
    - "module load python/3.11"

Replace the endpoint UUID with your actual UUID from Step 4.
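
A forgotten placeholder UUID is an easy mistake to make at this step. The sketch below is a hypothetical pre-flight check, not a Chiltepin feature: it takes one resource entry as a plain dictionary and reports an endpoint value that is not a well-formed UUID, or a walltime that does not follow the HH:MM:SS form used in this guide. Your scheduler may accept other walltime formats.

```python
import re
import uuid


def check_remote_resource(resource: dict) -> list:
    """Return a list of human-readable problems found in one resource entry."""
    problems = []

    endpoint = resource.get("endpoint")
    if endpoint is not None:
        try:
            uuid.UUID(str(endpoint))
        except ValueError:
            problems.append(f"endpoint is not a valid UUID: {endpoint!r}")

    walltime = resource.get("walltime")
    # Assumption: walltime uses the HH:MM:SS form shown in this guide.
    if walltime is not None and not re.fullmatch(r"\d{2}:[0-5]\d:[0-5]\d", str(walltime)):
        problems.append(f"walltime is not HH:MM:SS: {walltime!r}")

    return problems


# The "remote" entry from the configuration above, written as a dictionary.
remote = {
    "endpoint": "a1b2c3d4-1234-5678-90ab-cdef12345678",
    "provider": "slurm",
    "walltime": "00:30:00",
}
print(check_remote_resource(remote))
```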

Step 6: Write Your Workflow

Create my_workflow.py:

from chiltepin import run_workflow
from chiltepin.tasks import bash_task, python_task

# Define tasks
@python_task
def hello_local():
    import platform
    return f"Hello from {platform.node()}"

@bash_task
def hello_remote():
    return "hostname"

@python_task
def compute_task(n):
    """Simple computation task"""
    result = sum(i**2 for i in range(n))
    return result

if __name__ == "__main__":
    # Load configuration and run workflow
    with run_workflow("my_config.yaml", include=["laptop", "remote"], run_dir="./runinfo"):
        # Run local task on "laptop" resource
        local_future = hello_local(executor=["laptop"])

        # Run remote bash task on "remote" resource (returns exit code: 0 = success)
        remote_future = hello_remote(executor=["remote"])

        # Run multiple compute tasks on "remote" resource
        futures = [compute_task(i, executor=["remote"]) for i in range(1, 5)]

        # Get the results
        print(f"Local: {local_future.result()}")
        print(f"Remote exit code: {remote_future.result()}")
        print(f"Computation results: {[f.result() for f in futures]}")

        print("All tasks completed!")
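
The workflow above submits every task first and only then calls result() on the returned futures, so the tasks can run concurrently. The same pattern can be tried without any endpoint by using the standard library's concurrent.futures as a stand-in. This is only an analogy for how futures behave; it is not how Chiltepin or Parsl execute tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_task(n):
    """Same computation as in the workflow: sum of squares below n."""
    return sum(i**2 for i in range(n))


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a future immediately; the work runs in the background.
    futures = [pool.submit(compute_task, n) for n in range(1, 5)]
    # result() blocks until that particular task has finished.
    results = [f.result() for f in futures]

print(results)  # [0, 1, 5, 14]
```

Calling result() directly after each submission would instead wait for every task before starting the next one.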

Step 7: Run Your Workflow

Execute the workflow:

$ python my_workflow.py

Expected output:

Local: Hello from my-laptop.local
Remote exit code: 0
Computation results: [0, 1, 5, 14]
All tasks completed!

Step 8: Stop the Endpoint

When you are finished, stop the endpoint:

$ chiltepin endpoint stop my-endpoint

Note

Endpoints automatically scale down resources after idle periods, so manual stopping is optional.

Local-Only Quick Start

To test without an HPC system, use local execution with the following configuration and workflow.

Configuration File (`local_config.yaml`)

laptop:
  provider: "localhost"
  init_blocks: 1
  max_blocks: 1

Simple Workflow (`simple_workflow.py`)

from chiltepin import run_workflow
from chiltepin.tasks import bash_task, python_task

# Define tasks
@python_task
def multiply(a, b):
    return a * b

@bash_task
def system_info():
    return "echo 'Task completed successfully'"

if __name__ == "__main__":
    # Load configuration and run workflow
    with run_workflow("local_config.yaml", run_dir="./runinfo"):
        result = multiply(6, 7, executor=["laptop"]).result()
        print(f"6 * 7 = {result}")

        exit_code = system_info(executor=["laptop"]).result()
        print(f"Bash task exit code: {exit_code}")

Run it:

$ python simple_workflow.py

Working with MPI Tasks

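Chiltepin supports MPI applications on HPC systems. The configuration and workflow below compile one application and run it with three different task geometries.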

Configuration (`mpi_config.yaml`)

mpi-resource-name:
  endpoint: "your-endpoint-uuid"
  mpi: True
  max_mpi_apps: 2
  mpi_launcher: "srun"
  provider: "slurm"
  cores_per_node: 128
  nodes_per_block: 4
  partition: "compute"
  account: "myproject"
  walltime: "01:00:00"
  environment:
    - "module load openmpi/4.1"
    - "export MPIF90=$MPIF90"

MPI Workflow

from chiltepin import run_workflow
from chiltepin.tasks import bash_task

@bash_task
def compile_mpi():
    return "$MPIF90 -o mpi_app mpi_app.f90"

@bash_task
def run_mpi():
    return "$PARSL_MPI_PREFIX ./mpi_app"

if __name__ == "__main__":
    with run_workflow("mpi_config.yaml", run_dir="./runinfo"):
        # Compile MPI application on the MPI resource (returns exit code)
        compile_result = compile_mpi(executor=["mpi-resource-name"]).result()
        print(f"Compilation exit code: {compile_result}")

        # Run with different geometries on the MPI resource
        # Small run: 1 node, 4 ranks
        result1 = run_mpi(
            executor=["mpi-resource-name"],
            chiltepin_task_geometry={"num_nodes": 1, "num_ranks": 4, "ranks_per_node": 4}
        ).result()
        print(f"Small run exit code: {result1}")

        # Medium run: 2 nodes, 16 ranks
        result2 = run_mpi(
            executor=["mpi-resource-name"],
            chiltepin_task_geometry={"num_nodes": 2, "num_ranks": 16, "ranks_per_node": 8}
        ).result()
        print(f"Medium run exit code: {result2}")

        # Large run: 4 nodes, 64 ranks
        result3 = run_mpi(
            executor=["mpi-resource-name"],
            chiltepin_task_geometry={"num_nodes": 4, "num_ranks": 64, "ranks_per_node": 16}
        ).result()
        print(f"Large run exit code: {result3}")
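
In all three runs above, num_ranks equals num_nodes times ranks_per_node, and the node count fits within nodes_per_block. This guide does not say whether Chiltepin or Parsl enforces those relationships, so the sketch below is a hypothetical sanity check you could run yourself before submitting. The one-rank-per-core limit is also an assumption.

```python
def check_geometry(geometry, nodes_per_block, cores_per_node):
    """Return a list of problems with one chiltepin_task_geometry dict."""
    nodes = geometry["num_nodes"]
    ranks = geometry["num_ranks"]
    per_node = geometry["ranks_per_node"]

    problems = []
    if nodes * per_node != ranks:
        problems.append(f"{nodes} nodes x {per_node} ranks per node != {ranks} ranks")
    if nodes > nodes_per_block:
        problems.append(f"{nodes} nodes requested, block has {nodes_per_block}")
    # Assumption: at most one rank per core.
    if per_node > cores_per_node:
        problems.append(f"{per_node} ranks per node exceeds {cores_per_node} cores")
    return problems


# Geometries and block size taken from the MPI example above.
geometries = [
    {"num_nodes": 1, "num_ranks": 4, "ranks_per_node": 4},
    {"num_nodes": 2, "num_ranks": 16, "ranks_per_node": 8},
    {"num_nodes": 4, "num_ranks": 64, "ranks_per_node": 16},
]
for geometry in geometries:
    print(geometry, check_geometry(geometry, nodes_per_block=4, cores_per_node=128))
```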

Key Concepts

Resources

Resources define where and how tasks run:

Local: Runs on the current machine
HPC: Submits jobs to schedulers (Slurm, PBS Pro)
Globus Compute: Runs on remote endpoints

See Configuration for detailed resource configuration options.

Task Decorators

Chiltepin provides three task decorators to define workflow tasks:

@python_task: Execute Python functions
@bash_task: Execute shell commands (returns exit code)
@join_task: Coordinate multiple tasks without blocking

When calling a task, use the executor parameter to specify which resource to use:

@python_task
def my_task():
    return "result"

# Specify which resource to use
result = my_task(executor=["compute"]).result()

The executor value must match a resource name from your configuration file.
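
A misspelled executor name is easy to catch before the workflow starts. The following sketch uses a hypothetical helper, unknown_executors, which compares the names you plan to use against the keys of a configuration dictionary and suggests close matches with the standard library's difflib. It does not call Chiltepin.

```python
from difflib import get_close_matches


def unknown_executors(requested, config):
    """Map each requested name missing from `config` to close-matching keys."""
    return {
        name: get_close_matches(name, list(config))
        for name in requested
        if name not in config
    }


# Resource names as they appear at the top level of the configuration.
config = {"laptop": {}, "remote": {}}
print(unknown_executors(["laptop", "remot"], config))
```

An empty dictionary means every requested name matches a configured resource.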

For MPI tasks, you can also use the chiltepin_task_geometry parameter to specify parallel resource requirements:

@bash_task
def run_mpi_app():
    return "$PARSL_MPI_PREFIX ./my_app"

# Specify MPI geometry (nodes, ranks, ranks per node)
result = run_mpi_app(
    executor=["mpi-resource"],
    chiltepin_task_geometry={"num_nodes": 2, "num_ranks": 16, "ranks_per_node": 8}
).result()

Configuration Loading

The include parameter selects specific resources to load from the configuration.

Loading from a file:

# Load only specific resources from YAML file
with run_workflow(
    "my_config.yaml",
    include=["laptop", "compute"],  # Only these resources
    run_dir="./runinfo"
):
    # Run tasks using selected resources
    result = my_task(executor=["compute"]).result()

Loading from a dict:

# Define configuration as a dictionary
config = {
    "laptop": {
        "provider": "localhost",
        "cores_per_node": 4,
    },
    "compute": {
        "provider": "slurm",
        "partition": "compute",
        "nodes_per_block": 1,
    }
}

# Load only specific resources from dict
with run_workflow(
    config,
    include=["laptop", "compute"],  # Only these resources
    run_dir="./runinfo"
):
    # Run tasks using selected resources
    result = my_task(executor=["compute"]).result()

If include is omitted, all resources in the configuration are loaded.
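
The selection behavior described above can be pictured as a filter over the top-level keys of the configuration. This is a sketch of the documented semantics only, not Chiltepin's implementation. In particular, raising an error for an unknown name is an assumption, since this guide does not say how that case is handled.

```python
def select_resources(config, include=None):
    """Return the resources named in `include`, or all of them if it is None."""
    if include is None:
        return dict(config)
    missing = [name for name in include if name not in config]
    if missing:
        # Assumption: unknown names are treated as an error.
        raise KeyError(f"resources not found in configuration: {missing}")
    return {name: config[name] for name in include}


config = {
    "laptop": {"provider": "localhost"},
    "compute": {"provider": "slurm"},
    "mpi": {"provider": "slurm", "mpi": True},
}
print(list(select_resources(config)))
print(list(select_resources(config, include=["laptop", "compute"])))
```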

Directory Structure

After you run a workflow, the working directory looks like this:

.
├── my_config.yaml              # Configuration file
├── my_workflow.py              # Workflow script
└── runinfo/                    # Parsl runtime directory
    ├── 000/                     # Run directory
    │   ├── local/               # Local resource files
    │   ├── remote/              # Remote resource files
    │   └── submit_scripts/      # Job submission scripts
    └── parsl.log                # Parsl log file

The runinfo directory contains execution logs, job scripts, and task outputs.
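
When a workflow has been run several times, it helps to jump straight to the newest run. The sketch below assumes run directories have numeric names such as 000, as in the tree above. latest_run_dir is a hypothetical helper, and the demonstration builds a throwaway directory so it can run anywhere.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_run_dir(runinfo: Path) -> Optional[Path]:
    """Return the highest-numbered run directory under `runinfo`, if any."""
    if not runinfo.is_dir():
        return None
    numbered = [p for p in runinfo.iterdir() if p.is_dir() and p.name.isdecimal()]
    return max(numbered, key=lambda p: int(p.name), default=None)


# Demonstration against a temporary stand-in for ./runinfo.
with tempfile.TemporaryDirectory() as tmp:
    runinfo = Path(tmp) / "runinfo"
    for name in ("000", "001", "002"):
        (runinfo / name).mkdir(parents=True)
    newest = latest_run_dir(runinfo)
    print(newest.name)
```

For a real run, call latest_run_dir(Path("./runinfo")).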

Troubleshooting

Tasks Not Running

Verify that the endpoint is running: chiltepin endpoint list
Check that you’re using the correct endpoint UUID
Review the logs in the runinfo/ directory
Check the endpoint logs: ~/.globus_compute/my-endpoint/endpoint.log
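
The most recent log entries are usually the relevant ones. The following sketch shows one way to read only the end of a log file with the standard library. tail is a hypothetical helper, and the demonstration uses a temporary file rather than a real endpoint log.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail(path: Path, count: int = 20) -> list:
    """Return the last `count` lines of a text file, or [] if it is missing."""
    if not path.is_file():
        return []
    with path.open(errors="replace") as handle:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Demonstration on a temporary stand-in for endpoint.log.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "endpoint.log"
    sample.write_text("\n".join(f"line {i}" for i in range(30)) + "\n")
    last_three = tail(sample, count=3)

print(last_three)
```

To inspect a real endpoint, pass Path("~/.globus_compute/my-endpoint/endpoint.log").expanduser() instead of the sample file.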

Authentication Expired

$ chiltepin logout
$ chiltepin login

Configuration Errors

Validate your YAML syntax:

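import yaml

# safe_load raises yaml.YAMLError if the file is not valid YAML
with open("my_config.yaml") as f:
    config = yaml.safe_load(f)

# Print the parsed result to confirm the structure is what you expect
print(config)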

Resource Limits

If jobs fail to start:

Check partition/queue names
Verify account/project is valid
Confirm node/core requests are within limits
Check whether the machine is busy: the resource pool job may still be pending in the queue, or the queue may be full

Next Steps

Comprehensive task documentation: Tasks
Detailed configuration options: Configuration
Endpoint management: Endpoint Management
Run the test suite: Testing
Set up Docker environment: Docker Container
Explore the API: API Reference