Batch Processing
Efficiently process many spectra with the CLI batch command.
Command
# Process multiple spectra; write outputs under results/
sage batch "data/*.dat" --output-dir results/
# Parallel: use all CPU cores (Windows, macOS, Linux)
sage batch "data/*.dat" --output-dir results/ --workers -1
# Parallel: specify number of workers
sage batch "data/*.dat" --output-dir results/ --workers 4
List-based input (CSV)
# Use a CSV listing spectra with optional per-row redshift
sage batch --list-csv "data/spectra_list.csv" --output-dir results/
# If your CSV uses different column names
sage batch --list-csv input.csv --path-column "Spectrum Path" --redshift-column "Host Redshift" --output-dir results/
Notes:
- The CSV must include a path column; redshift is optional.
- Relative paths in the CSV are resolved relative to the CSV file's directory.
- When a row has a redshift value, analysis is performed at that fixed redshift for that spectrum; otherwise the global search range is used.
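The rules in the notes above can be illustrated with a small Python sketch that writes a list CSV and reads it back. This is not the tool's own code: the helper names `write_list_csv`, `read_list_csv`, and `redshift_plan` are hypothetical, and the column names are simply the ones from the `--path-column` / `--redshift-column` example, not necessarily the defaults.

```python
import csv
from pathlib import Path

# Illustrative column names taken from the --path-column / --redshift-column
# example; they are NOT necessarily the tool's default column names.
PATH_COL = "Spectrum Path"
Z_COL = "Host Redshift"


def write_list_csv(csv_path, rows):
    """Write (relative_path, redshift_or_None) pairs to a CSV list."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([PATH_COL, Z_COL])
        for rel_path, z in rows:
            # An empty cell means "no redshift given for this row".
            writer.writerow([rel_path, "" if z is None else z])


def read_list_csv(csv_path):
    """Return (resolved_path, redshift_or_None) for each row.

    Relative paths are joined to the CSV file's directory, mirroring the
    rule described in the notes (a sketch, not the tool's implementation).
    """
    csv_path = Path(csv_path)
    base = csv_path.parent
    entries = []
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            path = Path(row[PATH_COL])
            if not path.is_absolute():
                path = base / path
            z_text = (row.get(Z_COL) or "").strip()
            entries.append((path, float(z_text) if z_text else None))
    return entries


def redshift_plan(z, zmin=-0.01, zmax=1.0):
    """Fixed redshift when the row has one, else the global search range."""
    return ("fixed", z) if z is not None else ("search", zmin, zmax)
```

A file written this way would be passed to the CLI with `--list-csv`, together with `--path-column "Spectrum Path"` and `--redshift-column "Host Redshift"` so the custom headers are recognized.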
Modes
| Mode | Description | 
|---|---|
| Default | Main outputs per spectrum plus a summary | 
| --minimal | Summary only (fastest, least disk) | 
| --complete | Full outputs and plots (largest disk) | 
Common options
| Option | Description | 
|---|---|
| --zmin FLOAT/--zmax FLOAT | Redshift search range (default: -0.01 to 1.0) | 
| --forced-redshift FLOAT | Force a fixed redshift for all spectra | 
| --type-filter TYPE... | Restrict templates by type (e.g., Ia Ib Ic) | 
| --template-filter NAME... | Only use specific templates by name | 
| --rlapmin FLOAT/--lapmin FLOAT | Quality/overlap thresholds (defaults: 4.0 / 0.3) | 
| --rlap-ccc-threshold FLOAT | Clustering quality threshold (default: 1.8) | 
| --output-dir DIR | Output directory for results | 
| --stop-on-error | Stop processing upon first error | 
| --verbose | Verbose console output | 
| --brief/--full | Toggle concise vs detailed console output | 
| --no-progress | Disable progress output | 
| --workers INT | Parallel workers: 0=sequential (default), -1=all cores, N=fixed | 
# Redshift search range
sage batch "data/*.dat" --zmin 0.0 --zmax 0.5 --output-dir results/
# Force a fixed redshift for all spectra
sage batch "data/*.dat" --forced-redshift 0.023 --output-dir results/
# Use per-row redshifts from a CSV list (rows without a value use the global search range)
sage batch --list-csv "path/to/list.csv" --output-dir results/
# Restrict to specific types
sage batch "data/*.dat" --type-filter Ia Ib Ic --output-dir results/
# Stop on first error; increase verbosity
sage batch "data/*.dat" --stop-on-error --verbose --output-dir results/
# Parallel on all cores (cross-platform)
sage batch "data/*.dat" --output-dir results/ --workers -1
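To make the `--workers` values from the options table concrete, here is a minimal Python sketch of how such a value could map to a process count. `resolve_workers` is a hypothetical helper written for illustration, not the CLI's actual logic. Rejecting negative values other than -1 is an assumption of this sketch.

```python
import os


def resolve_workers(workers):
    """Map a --workers value to (mode, process_count).

    0 -> sequential, -1 -> one worker per CPU core, N > 0 -> exactly N.
    Hypothetical helper for illustration only.
    """
    if workers == 0:
        return ("sequential", 1)
    if workers == -1:
        # os.cpu_count() can return None when the count is undetermined.
        return ("parallel", os.cpu_count() or 1)
    if workers > 0:
        return ("parallel", workers)
    # Assumption: other negative values are treated as invalid here.
    raise ValueError(f"unsupported --workers value: {workers}")
```

On a memory-constrained machine, passing a small fixed N instead of -1 caps how many spectra are processed at once.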
Output structure (default)
| Per spectrum | Summary | 
|---|---|
| .output, .fluxed, .flattened | batch_summary.txt (includes a zFixed column) | 
With --complete, additional plots are generated (comparison, clustering, redshift–age, subtype proportions).
Tips
- Start with a small subset to validate parameters
- Use --type-filter to reduce runtime on large template sets
- Prefer --minimal for quick surveys; rerun interesting cases with --complete
- When using CSV lists, keep spectrum paths relative to the CSV folder for portability
- Parallelism uses only Python standard library; no extra packages required
- On multi-core systems, start with --workers -1; reduce if memory is limited
Troubleshooting
- “Out of memory” on large batches: restrict template types with --type-filter, reduce --workers, or run smaller batches
- Windows, macOS, and Linux are supported: run the command from a regular shell; the CLI handles process spawning safely
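One way to "run smaller batches" is to split a long file list into several CSV lists and run `sage batch` once per list. The sketch below only plans those runs and executes nothing. `plan_small_batches`, the chunk file names, and the per-chunk output directories are all hypothetical choices made for this example. The column name is illustrative and is passed explicitly through `--path-column`.

```python
import csv
from pathlib import Path

# Illustrative column name; it is passed explicitly via --path-column below.
PATH_COL = "Spectrum Path"


def plan_small_batches(spectra, chunk_size, work_dir, workers=2):
    """Split `spectra` into CSV lists of at most `chunk_size` rows.

    Returns one argv list per chunk; no command is executed here.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for start in range(0, len(spectra), chunk_size):
        n = start // chunk_size
        list_csv = work_dir / f"chunk_{n:03d}.csv"
        with list_csv.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([PATH_COL])
            for path in spectra[start:start + chunk_size]:
                # Absolute paths, because the chunk CSVs live in work_dir
                # rather than next to the spectra.
                writer.writerow([str(Path(path).resolve())])
        commands.append([
            "sage", "batch",
            "--list-csv", str(list_csv),
            "--path-column", PATH_COL,
            # Assumption: one output directory per chunk, so that summaries
            # from different chunks cannot collide.
            "--output-dir", str(work_dir / f"results_{n:03d}"),
            "--workers", str(workers),
        ])
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in turn, so only one chunk is in memory-heavy processing at a time.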