Quick Start
===========

This guide walks you through a complete Chiltepin workflow, from setting up an endpoint to submitting tasks.

Overview
--------

Chiltepin is a collection of tools for implementing distributed exascale numerical weather prediction workflows using Parsl and Globus Compute.

.. warning::

   This collection of resources is not intended for use in operational production environments, and is for research purposes only.

Prerequisites
-------------

Before starting, ensure you have:

1. Installed Chiltepin (see :doc:`installation`)
2. Access to an HPC system (or use local execution for testing)
3. A Globus account and a web browser for Globus authentication

Complete Workflow Example
--------------------------

This example demonstrates the full workflow: configure an endpoint, start it, and submit tasks.

Step 1: Authenticate
^^^^^^^^^^^^^^^^^^^^

First, log in to Globus services. This should be done on the machine where you want to run tasks:

.. code-block:: bash

   $ chiltepin login

This opens a browser for authentication or, if one is not available, provides a URL to complete the authentication manually. Follow the prompts to authorize Chiltepin.

Step 2: Configure an Endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a new Globus Compute endpoint to which you will submit tasks. This should be done on the machine where you want to run tasks:

.. code-block:: bash

   $ chiltepin endpoint configure my-endpoint

This creates the endpoint configuration in ``~/.globus_compute/my-endpoint/``.

Step 3: Start the Endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Launch the endpoint:

.. code-block:: bash

   $ chiltepin endpoint start my-endpoint

The endpoint will register with Globus Compute and begin accepting tasks.

Step 4: Get the Endpoint UUID
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Retrieve your endpoint's UUID:

.. code-block:: bash

   $ chiltepin endpoint list

Example output:

.. code-block:: text

   my-endpoint a1b2c3d4-1234-5678-90ab-cdef12345678 Running

Note the UUID (``a1b2c3d4-1234-5678-90ab-cdef12345678``) for the next step.

Step 5: Create a Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create ``my_config.yaml`` with your endpoint UUID:

.. code-block:: yaml

   # Laptop/workstation resource for small tasks
   laptop:
     provider: "localhost"
     init_blocks: 1
     max_blocks: 1

   # Remote endpoint for HPC tasks
   remote:
     endpoint: "a1b2c3d4-1234-5678-90ab-cdef12345678"  # Use your UUID
     provider: "slurm"
     cores_per_node: 4
     nodes_per_block: 1
     partition: "compute"
     account: "myproject"
     walltime: "00:30:00"
     environment:
       - "module load python/3.11"

Replace the endpoint UUID with your actual UUID from Step 4.

Step 6: Write Your Workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create ``my_workflow.py``:

.. code-block:: python

   from chiltepin import run_workflow
   from chiltepin.tasks import bash_task, python_task

   # Define tasks
   @python_task
   def hello_local():
       import platform
       return f"Hello from {platform.node()}"

   @bash_task
   def hello_remote():
       return "hostname"

   @python_task
   def compute_task(n):
       """Simple computation task"""
       result = sum(i**2 for i in range(n))
       return result

   if __name__ == "__main__":
       # Load configuration and run workflow
       with run_workflow("my_config.yaml", include=["laptop", "remote"], run_dir="./runinfo"):
           # Run local task on "laptop" resource
           local_future = hello_local(executor=["laptop"])

           # Run remote bash task on "remote" resource (returns exit code: 0 = success)
           remote_future = hello_remote(executor=["remote"])

           # Run multiple compute tasks on "remote" resource
           futures = [compute_task(i, executor=["remote"]) for i in range(1, 5)]

           # Get the results
           print(f"Local: {local_future.result()}")
           print(f"Remote exit code: {remote_future.result()}")
           print(f"Computation results: {[f.result() for f in futures]}")

           print("All tasks completed!")

Step 7: Run Your Workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^

Execute the workflow:

.. code-block:: bash

   $ python my_workflow.py

Expected output:

.. code-block:: text

   Local: Hello from my-laptop.local
   Remote exit code: 0
   Computation results: [0, 1, 5, 14]
   All tasks completed!

Step 8: Stop the Endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^

When finished:

.. code-block:: bash

   $ chiltepin endpoint stop my-endpoint

.. note::

   Endpoints automatically scale down resources after idle periods, so manual stopping is optional.

Local-Only Quickstart
---------------------

For testing without an HPC system, use local execution:

Configuration File (``local_config.yaml``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   laptop:
     provider: "localhost"
     init_blocks: 1
     max_blocks: 1

Simple Workflow (``simple_workflow.py``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   from chiltepin import run_workflow
   from chiltepin.tasks import bash_task, python_task

   # Define tasks
   @python_task
   def multiply(a, b):
       return a * b

   @bash_task
   def system_info():
       return "echo 'Task completed successfully'"

   if __name__ == "__main__":
       # Load configuration and run workflow
       with run_workflow("local_config.yaml", run_dir="./runinfo"):
           result = multiply(6, 7, executor=["laptop"]).result()
           print(f"6 * 7 = {result}")

           exit_code = system_info(executor=["laptop"]).result()
           print(f"Bash task exit code: {exit_code}")

Run it:

.. code-block:: bash

   $ python simple_workflow.py

Working with MPI Tasks
----------------------

Chiltepin supports MPI applications on HPC systems:

Configuration (``mpi_config.yaml``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   mpi-resource-name:
     endpoint: "your-endpoint-uuid"
     mpi: True
     max_mpi_apps: 2
     mpi_launcher: "srun"
     provider: "slurm"
     cores_per_node: 128
     nodes_per_block: 4
     partition: "compute"
     account: "myproject"
     walltime: "01:00:00"
     environment:
       - "module load openmpi/4.1"
       - "export MPIF90=$MPIF90"

MPI Workflow
^^^^^^^^^^^^

.. code-block:: python

   from chiltepin import run_workflow
   from chiltepin.tasks import bash_task

   @bash_task
   def compile_mpi():
       return "$MPIF90 -o mpi_app mpi_app.f90"

   @bash_task
   def run_mpi():
       return "$PARSL_MPI_PREFIX ./mpi_app"

   if __name__ == "__main__":
       with run_workflow("mpi_config.yaml", run_dir="./runinfo"):
           # Compile MPI application on the MPI resource (returns exit code)
           compile_result = compile_mpi(executor=["mpi-resource-name"]).result()
           print(f"Compilation exit code: {compile_result}")

           # Run with different geometries on the MPI resource
           # Small run: 1 node, 4 ranks
           result1 = run_mpi(
               executor=["mpi-resource-name"],
               chiltepin_task_geometry={"num_nodes": 1, "num_ranks": 4, "ranks_per_node": 4}
           ).result()
           print(f"Small run exit code: {result1}")

           # Medium run: 2 nodes, 16 ranks
           result2 = run_mpi(
               executor=["mpi-resource-name"],
               chiltepin_task_geometry={"num_nodes": 2, "num_ranks": 16, "ranks_per_node": 8}
           ).result()
           print(f"Medium run exit code: {result2}")

           # Large run: 4 nodes, 64 ranks
           result3 = run_mpi(
               executor=["mpi-resource-name"],
               chiltepin_task_geometry={"num_nodes": 4, "num_ranks": 64, "ranks_per_node": 16}
           ).result()
           print(f"Large run exit code: {result3}")

Key Concepts
------------

Resources
^^^^^^^^^

Resources define where and how tasks run:

- **Local**: Runs on the current machine
- **HPC**: Submits jobs to schedulers (Slurm, PBS Pro)
- **Globus Compute**: Runs on remote endpoints

See :doc:`configuration` for detailed resource configuration options.

Task Decorators
^^^^^^^^^^^^^^^

Chiltepin provides three task decorators to define workflow tasks:

- ``@python_task``: Execute Python functions
- ``@bash_task``: Execute shell commands (returns exit code)
- ``@join_task``: Coordinate multiple tasks without blocking

When calling a task, use the ``executor`` parameter to specify which resource to use:

.. code-block:: python

   @python_task
   def my_task():
       return "result"

   # Specify which resource to use
   result = my_task(executor=["compute"]).result()

The ``executor`` value must match a resource name from your configuration file.

For MPI tasks, you can also use the ``chiltepin_task_geometry`` parameter to specify parallel resource requirements:

.. code-block:: python

   @bash_task
   def run_mpi_app():
       return "$PARSL_MPI_PREFIX ./my_app"

   # Specify MPI geometry (nodes, ranks, ranks per node)
   result = run_mpi_app(
       executor=["mpi-resource"],
       chiltepin_task_geometry={"num_nodes": 2, "num_ranks": 16, "ranks_per_node": 8}
   ).result()

.. seealso::

   For comprehensive documentation on defining and using tasks, including advanced patterns, error handling, and best practices, see :doc:`tasks`.

Configuration Loading
^^^^^^^^^^^^^^^^^^^^^

The ``include`` parameter selects specific resources to load from the configuration.

**Loading from a file:**

.. code-block:: python

   # Load only specific resources from YAML file
   with run_workflow(
       "my_config.yaml",
       include=["laptop", "compute"],  # Only these resources
       run_dir="./runinfo"
   ):
       # Run tasks using selected resources
       result = my_task(executor=["compute"]).result()

**Loading from a dict:**

.. code-block:: python

   # Define configuration as a dictionary
   config = {
       "laptop": {
           "provider": "localhost",
           "cores_per_node": 4,
       },
       "compute": {
           "provider": "slurm",
           "partition": "compute",
           "nodes_per_block": 1,
       }
   }

   # Load only specific resources from dict
   with run_workflow(
       config,
       include=["laptop", "compute"],  # Only these resources
       run_dir="./runinfo"
   ):
       # Run tasks using selected resources
       result = my_task(executor=["compute"]).result()

If ``include`` is omitted, all resources in the configuration are loaded.

Directory Structure
-------------------

After running workflows, you'll see:

.. code-block:: text

   .
   ├── my_config.yaml           # Configuration file
   ├── my_workflow.py           # Workflow script
   └── runinfo/                 # Parsl runtime directory
       ├── 000/                 # Run directory
       │   ├── local/           # Local resource files
       │   ├── remote/          # Remote resource files
       │   └── submit_scripts/  # Job submission scripts
       └── parsl.log            # Parsl log file

The ``runinfo`` directory contains execution logs, job scripts, and task outputs.

Troubleshooting
---------------

Tasks Not Running
^^^^^^^^^^^^^^^^^

1. Verify that the endpoint is running: ``chiltepin endpoint list``
2. Check that you're using the correct endpoint UUID
3. Review the logs in the ``runinfo/`` directory
4. Check the endpoint logs: ``~/.globus_compute/my-endpoint/endpoint.log``

Authentication Expired
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   $ chiltepin logout
   $ chiltepin login

Configuration Errors
^^^^^^^^^^^^^^^^^^^^

Validate your YAML syntax:

.. code-block:: python

   import yaml

   with open("my_config.yaml") as f:
       config = yaml.safe_load(f)
       print(config)

Resource Limits
^^^^^^^^^^^^^^^

If jobs fail to start:

- Check partition/queue names
- Verify that the account/project is valid
- Confirm that node/core requests are within limits
- The machine may be busy, so the resource pool job may still be pending, or the queue may be full

Next Steps
----------

* Comprehensive task documentation: :doc:`tasks`
* Detailed configuration options: :doc:`configuration`
* Endpoint management: :doc:`endpoints`
* Run the test suite: :doc:`testing`
* Set up Docker environment: :doc:`container`
* Explore the API: :doc:`api`
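
Before moving on, recall that every ``executor`` and ``include`` value must match a resource name in your configuration. The sketch below shows one way to check this before submitting any tasks. It is only an illustration: ``find_unknown_resources`` is a hypothetical helper written for this guide, not part of Chiltepin, and it assumes the configuration has already been loaded into a dictionary keyed by resource name.

.. code-block:: python

   def find_unknown_resources(config, requested):
       """Return the requested resource names that are missing from ``config``.

       ``config`` is a configuration dictionary keyed by resource name;
       ``requested`` lists the names a workflow intends to pass to
       ``include`` or ``executor``. (Hypothetical helper, not a Chiltepin API.)
       """
       known = set(config)
       return [name for name in requested if name not in known]


   # Illustrative configuration mirroring the examples in this guide
   config = {
       "laptop": {"provider": "localhost"},
       "remote": {"provider": "slurm", "partition": "compute"},
   }

   # "compute" is not defined above, so it is reported as unknown
   missing = find_unknown_resources(config, ["laptop", "compute"])
   if missing:
       print(f"Unknown resources: {missing}")

Running a check like this first can surface a mistyped resource name locally, before any job is queued on a remote system.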