Batch Processing

Efficiently process many spectra with the CLI batch command.

Command

# Process multiple spectra; write outputs under results/
sage batch "data/*.dat" --output-dir results/

# Parallel: use all CPU cores (Windows, macOS, Linux)
sage batch "data/*.dat" --output-dir results/ --workers -1

# Parallel: specify number of workers
sage batch "data/*.dat" --output-dir results/ --workers 4

List-based input (CSV)

# Use a CSV listing spectra with optional per-row redshift
sage batch --list-csv "data/spectra_list.csv" --output-dir results/

# If your CSV uses different column names
sage batch --list-csv input.csv --path-column "Spectrum Path" --redshift-column "Host Redshift" --output-dir results/

Notes: - The CSV must include a path column; redshift is optional. - Relative paths in the CSV are resolved relative to the CSV file's directory. - When a row has a redshift value, analysis is performed at that fixed redshift for that spectrum; otherwise the global search range is used.

Modes

Mode Description
Default Main outputs per spectrum plus a summary
--minimal Summary only (fastest, least disk)
--complete Full outputs and plots (largest disk)

Common options

Option Description
--zmin FLOAT / --zmax FLOAT Redshift search range (default: -0.01 to 1.0)
--forced-redshift FLOAT Force a fixed redshift for all spectra
--type-filter TYPE... Restrict templates by type (e.g., Ia Ib Ic)
--template-filter NAME... Only use specific templates by name
--rlapmin FLOAT / --lapmin FLOAT Quality/overlap thresholds (defaults: 4.0 / 0.3)
--rlap-ccc-threshold FLOAT Clustering quality threshold (default: 1.8)
--output-dir DIR Output directory for results
--stop-on-error Stop processing upon first error
--verbose Verbose console output
--brief / --full Toggle concise vs detailed console output
--no-progress Disable progress output
--workers INT Parallel workers: 0=sequential (default), -1=all cores, N=fixed
# Redshift search range
sage batch "data/*.dat" --zmin 0.0 --zmax 0.5 --output-dir results/

# Force a fixed redshift for all spectra
sage batch "data/*.dat" --forced-redshift 0.023 --output-dir results/

# Force per-row redshift from a CSV list (only where present)
sage batch --list-csv "path/to/list.csv" --output-dir results/

# Restrict to specific types
sage batch "data/*.dat" --type-filter Ia Ib Ic --output-dir results/

# Stop on first error; increase verbosity
sage batch "data/*.dat" --stop-on-error --verbose --output-dir results/

# Parallel on all cores (cross-platform)
sage batch "data/*.dat" --output-dir results/ --workers -1

Output structure (default)

Per spectrum Summary
.output, .fluxed, .flattened batch_summary.txt (includes a zFixed column)

With --complete, additional plots are generated (comparison, clustering, redshift–age, subtype proportions).

Tips

  • Start with a small subset to validate parameters
  • Use --type-filter to reduce runtime on large template sets
  • Prefer --minimal for quick surveys; rerun interesting cases with --complete
  • When using CSV lists, keep spectrum paths relative to the CSV folder for portability
  • Parallelism uses only Python standard library; no extra packages required
  • On multi-core systems, start with --workers -1; reduce if memory is limited

Troubleshooting

  • “Out of memory” on large batches: narrow the type range, reduce --workers, or run smaller batches
  • Windows/macOS/Linux supported: ensure you run from a normal shell; the CLI handles process spawning safely

See also