Documentation for AI-Flux - LLM Batch Processing Pipeline for HPC Systems
This document explains the organization of the AI-Flux codebase to help you understand and navigate the project.
aiflux/
├── src/
│   └── aiflux/
│       ├── core/
│       │   ├── processor.py       # Base processor interface
│       │   ├── config.py          # Configuration management
│       │   ├── config_manager.py  # Configuration priority system
│       │   └── client.py          # LLM client interface
│       ├── processors/            # Built-in processors
│       │   └── batch.py           # JSONL batch processor
│       ├── slurm/                 # SLURM integration
│       │   ├── runner.py          # SLURM job management
│       │   └── scripts/           # SLURM scripts
│       ├── converters/            # Format converters (utilities)
│       │   ├── csv.py             # CSV to JSONL converter
│       │   ├── json.py            # JSON to JSONL converter
│       │   ├── directory.py       # Directory to JSONL converter
│       │   ├── vision.py          # Vision to JSONL converter
│       │   └── utils.py           # JSONL utilities
│       ├── io/                    # Input/Output handling
│       │   ├── base.py            # Base output classes
│       │   └── output/            # Output handlers
│       │       └── json_output.py # JSON output handler
│       ├── templates/             # Model templates
│       │   ├── llama3.2/
│       │   ├── llama3.3/
│       │   └── qwen2.5/
│       └── utils/
│           └── env.py             # Environment utilities
├── examples/                      # Example implementations
├── tests/
└── pyproject.toml
The `core` module contains the foundational components of the system:

- `processor.py`: Base class for all processors
- `config.py`: Configuration management and models
- `config_manager.py`: Manages configuration priority
- `client.py`: Interface for communicating with language models
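As a rough illustration of the pattern, the sketch below defines a stand-in base class and a trivial subclass. The class and method names here are assumptions made for this example only; the actual interface is defined in `processor.py`.

```python
# Conceptual sketch only: the real base class lives in
# src/aiflux/core/processor.py and its interface may differ.
from abc import ABC, abstractmethod
from typing import Any


class BaseProcessor(ABC):
    """Stand-in for the processor interface described above."""

    @abstractmethod
    def process(self, item: dict[str, Any]) -> dict[str, Any]:
        """Turn one input record into one output record."""


class EchoProcessor(BaseProcessor):
    """Trivial subclass that only shows the shape of an implementation."""

    def process(self, item: dict[str, Any]) -> dict[str, Any]:
        # A real processor would call the LLM client here.
        return {"input": item, "output": item.get("prompt", "")}
```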
The `processors` module contains implementations of batch processors:

- `batch.py`: The main JSONL batch processor implementation
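For reference, JSONL stores one JSON object per line. The snippet below shows a minimal round trip using only the standard library; the `id`/`prompt` fields are illustrative and not necessarily the schema `batch.py` expects.

```python
# Minimal JSONL round trip; the record schema shown here is illustrative.
import json
from pathlib import Path

records = [
    {"id": "1", "prompt": "Summarize the abstract."},
    {"id": "2", "prompt": "List three keywords."},
]

path = Path("prompts.jsonl")
with path.open("w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

with path.open(encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```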
The `slurm` module handles integration with SLURM for HPC systems:

- `runner.py`: SLURM job submission and management
- `scripts/`: SLURM batch scripts for job execution
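On most HPC systems a batch job is submitted with the `sbatch` command. The sketch below shows one generic way to do that from Python; the script name and flags are placeholders, not the exact behaviour of `runner.py`.

```python
# Generic example of submitting a SLURM job from Python; the script path and
# job name are placeholders invented for this illustration.
import subprocess

result = subprocess.run(
    ["sbatch", "--job-name=aiflux-batch", "scripts/run_batch.sh"],
    capture_output=True,
    text=True,
    check=True,
)
# sbatch prints a line such as "Submitted batch job 123456"
print(result.stdout.strip())
```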
The `converters` module contains utilities for converting data to JSONL format:

- `csv.py`: Convert CSV files to JSONL
- `json.py`: Convert JSON files to JSONL
- `directory.py`: Convert directory contents to JSONL
- `vision.py`: Prepare vision data for JSONL processing
- `utils.py`: Utility functions for JSONL handling
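A minimal CSV-to-JSONL conversion can be done with the standard library alone, as sketched below; the real `csv.py` converter may map columns or add prompt fields differently.

```python
# Sketch of a CSV-to-JSONL conversion using only the standard library.
import csv
import json


def csv_to_jsonl(csv_path: str, jsonl_path: str) -> None:
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Each CSV row becomes one JSON object on its own line.
            dst.write(json.dumps(row) + "\n")


csv_to_jsonl("inputs.csv", "inputs.jsonl")
```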
The `io` module handles input and output operations:

- `base.py`: Base classes for input/output handling
- `output/json_output.py`: JSON output formatter
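The idea of an output handler can be shown in a few lines of standard-library code; this is only a sketch of the concept, not the interface of `json_output.py`.

```python
# Illustrative output writer; function name and result fields are invented
# for this example and do not mirror io/output/json_output.py.
import json
from pathlib import Path


def write_results(results: list[dict], path: str) -> None:
    Path(path).write_text(json.dumps(results, indent=2), encoding="utf-8")


write_results([{"id": "1", "output": "..."}], "results.json")
```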
The `templates` module contains YAML configuration files for supported models. Templates are organized first by model family (e.g., `llama3.2/`) and then by size (e.g., `7b.yaml`).
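Assuming the family/size layout above (e.g., `templates/llama3.2/7b.yaml`), a template could be loaded with PyYAML as sketched below; the keys inside the file are not shown here because they depend on the model configuration.

```python
# Sketch of loading a model template by family and size; requires PyYAML.
# The helper name and default root path are assumptions for illustration.
from pathlib import Path

import yaml


def load_template(family: str, size: str,
                  root: str = "src/aiflux/templates") -> dict:
    template_path = Path(root) / family / f"{size}.yaml"
    with template_path.open(encoding="utf-8") as f:
        return yaml.safe_load(f)


config = load_template("llama3.2", "7b")
```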
The `utils` module contains utility functions used throughout the codebase:

- `env.py`: Environment variable utilities
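A common pattern for such utilities is reading settings from the environment with sensible defaults, as in the sketch below; the variable names are invented for illustration and are not necessarily the ones `env.py` or `.env.example` define.

```python
# Generic pattern for environment-driven settings with defaults.
# Variable names here are hypothetical, not the project's actual keys.
import os

data_dir = os.environ.get("AIFLUX_DATA_DIR", "data")
models_dir = os.environ.get("AIFLUX_MODELS_DIR", "models")
logs_dir = os.environ.get("AIFLUX_LOGS_DIR", "logs")
```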
Other top-level directories and files:

- `examples/`: Example scripts demonstrating usage of the library
- `tests/`: Unit and integration tests
- `docs/`: Documentation files
- `data/`: Default directory for input and output data
- `models/`: Default directory for model cache
- `logs/`: Default directory for log files
- `containers/`: Container definitions and scripts
- `pyproject.toml`: Package configuration and dependencies
- `.env.example`: Example environment configuration
- `README.md`: Main project documentation