cclib

cclib is an open-source Python library for parsing and interpreting output files from quantum chemistry codes. It focuses on extracting data from log files, providing a consistent interface for results (coordinates, orbitals, vibrational modes, TD‑DFT, etc.), and enabling interoperability with other open-source chemistry and visualization tools.

MatDaCs Tool Review: cclib

Overview

cclib is a log‑file parsing toolkit for computational chemistry. It targets a practical pain point in materials modeling workflows: extracting reliable, structured data from heterogeneous quantum‑chemistry outputs. cclib offers a unified API for parsed data and bridges to other analysis/visualization tools.

What is cclib?

cclib reads output files from a range of quantum chemistry programs and turns them into a consistent Python object with standardized attributes (geometries, energies, orbitals, vibrational data, TD‑DFT transitions, etc.). Its design focuses on data reusability: parse once, then feed the results into analysis scripts, plotting, or machine‑learning pipelines.
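As an illustration of what standardized attributes buy, the sketch below turns the final geometry into XYZ text using only two attribute names, `atomnos` and `atomcoords`. It assumes cclib's conventions (atomic numbers; one natoms × 3 block per geometry step, in Ångström). The four-element symbol table and the stand-in water object are hypothetical placeholders for a real parse. cclib's own writers can already produce XYZ files; this only shows the attribute layout.

```python
from types import SimpleNamespace

# Hypothetical, deliberately tiny element table; a real script would cover all elements.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}


def final_geometry_xyz(data, comment=""):
    """Return the last geometry on a cclib-style object as XYZ-format text."""
    coords = data.atomcoords[-1]  # last geometry step (the only one for a single point)
    lines = [str(len(data.atomnos)), comment]
    for atomic_number, (x, y, z) in zip(data.atomnos, coords):
        symbol = SYMBOLS.get(int(atomic_number), str(atomic_number))
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


# Stand-in exposing the same attribute names a parsed water calculation would.
water = SimpleNamespace(
    atomnos=[8, 1, 1],
    atomcoords=[[[0.0, 0.0, 0.1173], [0.0, 0.7572, -0.4692], [0.0, -0.7572, -0.4692]]],
)
print(final_geometry_xyz(water, comment="water"), end="")
```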

Key Features

  • Wide parser coverage: supports many common quantum‑chemistry packages, for example Gaussian, ORCA, GAMESS, NWChem, Psi4, and Q‑Chem.
  • Standardized data model: parsed outputs are exposed through a uniform attribute interface to reduce per‑code branching logic.
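  • Interoperability: provides bridges to external tools such as Open Babel and PySCF, usable when those packages are installed.
  • Lightweight CLI: command‑line utilities for quick extraction and conversion (e.g., ccget for extracting attributes and ccwrite for writing parsed data to other formats).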
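The "no per‑code branching" point can be made concrete: the function below handles logs from any supported program through the same attribute names. It is a sketch that assumes `scfenergies` (in eV) and a `metadata` dictionary whose `"package"` key names the program, as exposed by recent cclib versions. The stand‑in objects and their energies are made up.

```python
from collections import defaultdict
from types import SimpleNamespace


def final_energies_by_package(parsed):
    """Map program name -> list of final SCF energies (eV), with no per-program code paths."""
    groups = defaultdict(list)
    for data in parsed:
        energies = getattr(data, "scfenergies", None)
        if energies is None or len(energies) == 0:
            continue  # nothing usable was parsed for this calculation
        package = getattr(data, "metadata", {}).get("package", "unknown")
        groups[package].append(float(energies[-1]))
    return dict(groups)


# Stand-ins for ccread() results from two different programs; the energies are made up.
parsed = [
    SimpleNamespace(metadata={"package": "Gaussian"}, scfenergies=[-100.0, -101.0]),
    SimpleNamespace(metadata={"package": "ORCA"}, scfenergies=[-200.0]),
    SimpleNamespace(scfenergies=[-300.0]),  # no metadata recorded
]
print(final_energies_by_package(parsed))
```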

Installation

python3 -m pip install --user cclib

Example 1: Parsing a log file into a Python object

The canonical workflow uses ccopen/ccread to parse a log file and then accesses standardized attributes on the returned object. This is the key abstraction that makes cross‑code processing viable.

from cclib.io import ccread

# Parse a log file into a ccData object
ccdata = ccread("calc.log")

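# Access standardized properties
energies = ccdata.scfenergies   # one SCF energy per geometry step, in eV
coords = ccdata.atomcoords      # shape (nsteps, natoms, 3), in Angstrom
charges = getattr(ccdata, "atomcharges", None)  # dict keyed by scheme (e.g. "mulliken"), if parsed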

This pattern enables downstream steps such as building training tables, filtering calculations by convergence, or collecting features for ML models without re‑writing parsers for each code.
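A minimal sketch of the convergence filter and a unit conversion, assuming that `ccread` may return `None` for a file it cannot identify, that `scfenergies` is in eV, and that `optdone` signals a finished optimization. Depending on cclib version and options, `optdone` has been a bool or a list of converged step indices, so the helper relies on truthiness. The helper names and stand‑in objects are hypothetical.

```python
from types import SimpleNamespace

# CODATA 2018 value: 1 hartree = 27.211386245988 eV.
EV_PER_HARTREE = 27.211386245988


def final_scf_energy_hartree(data):
    """Return the last SCF energy in hartree, or None if nothing usable was parsed."""
    if data is None:  # ccread may hand back None for a file it cannot identify
        return None
    energies = getattr(data, "scfenergies", None)
    if energies is None or len(energies) == 0:
        return None
    return float(energies[-1]) / EV_PER_HARTREE  # cclib stores scfenergies in eV


def is_converged(data):
    """True if the object carries a truthy `optdone` flag.

    Truthiness covers both the bool and the list form (False or [] mean "not converged").
    Single-point runs usually have no `optdone`, so they are treated as not converged here.
    """
    return data is not None and bool(getattr(data, "optdone", False))


# Stand-in objects with made-up values, in place of ccread("...") results.
runs = {
    "opt_ok.log": SimpleNamespace(scfenergies=[-2000.0, -27.211386245988], optdone=True),
    "opt_failed.log": SimpleNamespace(scfenergies=[-2000.0], optdone=[]),
    "unreadable.log": None,
}
table = {
    name: final_scf_energy_hartree(data)
    for name, data in runs.items()
    if is_converged(data)
}
print(table)
```

For single‑point workflows, drop the `is_converged` filter or replace it with a check that suits the calculation type.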

Example 2: Command‑line extraction

For quick inspection, cclib provides CLI tools like ccget that can extract specific quantities without writing Python code.

ccget scfenergies calc.log

This is useful for batch screening or sanity checks in large directories of calculations.
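For larger directories, the same screening can be scripted in Python instead of repeated ccget calls. This sketch assumes only that `cclib.io.ccread` accepts a filename. The `parse` parameter and the stand‑in parser in the demo are hypothetical, added so the loop can be exercised without cclib or real log files.

```python
import tempfile
from pathlib import Path


def parse_directory(directory, pattern="*.log", parse=None):
    """Parse every file matching `pattern`; return {filename: parsed object or None}."""
    if parse is None:
        from cclib.io import ccread  # imported lazily, so the helper loads without cclib
        parse = ccread
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            results[path.name] = parse(str(path))
        except Exception as exc:  # one truncated or odd log should not stop the batch
            print(f"skipping {path.name}: {exc}")
            results[path.name] = None
    return results


# Demo with a stand-in parser that only reports file size.
# Real use (hypothetical directory): parse_directory("runs/")
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.log", "b.log", "notes.txt"):
        Path(tmp, name).write_text("stand-in content")
    demo = parse_directory(tmp, parse=lambda p: Path(p).stat().st_size)
print(demo)
```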

Application Areas in Materials Informatics

  • High‑throughput post‑processing of DFT calculations across multiple codes.
  • Feature extraction for property prediction models (energies, orbital energies and HOMO–LUMO gaps, vibrational properties).
  • Data curation for datasets where provenance or code heterogeneity must be preserved.
  • Pipeline glue between electronic‑structure workflows and ML/visualization tooling.
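For molecular outputs, a typical orbital feature is the HOMO–LUMO gap, computed from `moenergies` (one array of orbital energies per spin channel, in eV) and `homos` (the HOMO index per channel). The helper below is a sketch under those conventions. For unrestricted runs it takes the highest HOMO and lowest LUMO across both spin channels, which is one common definition but not the only one. The orbital energies in the demo are made up.

```python
def homo_lumo_gap(moenergies, homos):
    """HOMO-LUMO gap in the unit of `moenergies` (eV for cclib output).

    moenergies: one sequence of orbital energies per spin channel.
    homos: HOMO index for each spin channel.
    For unrestricted runs this uses the highest HOMO and lowest LUMO over both channels.
    """
    channels = list(zip(moenergies, homos))
    homo = max(energies[h] for energies, h in channels)
    lumos = [energies[h + 1] for energies, h in channels if h + 1 < len(energies)]
    if not lumos:
        raise ValueError("no virtual orbitals were parsed")
    return min(lumos) - homo


# Made-up orbital energies (eV) standing in for data.moenergies / data.homos.
print(homo_lumo_gap([[-10.0, -5.0, 1.0, 3.0]], [1]))  # closed shell -> 6.0
print(homo_lumo_gap([[-10.0, -6.0, -4.0, 2.0], [-10.0, -5.0, 0.0, 2.0]], [2, 1]))  # open shell -> 4.0
```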

Hands‑on Notes

  • cclib is most valuable when a project mixes outputs from different quantum‑chemistry codes or needs consistent parsing for large‑scale workflows.
  • The standardized data model makes it straightforward to map parsed attributes into pandas tables or downstream featurization steps.

Conclusion

cclib is a focused, dependable parser that reduces friction in computational materials workflows. It shines when you need a single, stable interface over many quantum‑chemistry outputs, and it integrates cleanly with data‑science tooling. For MatDaCs authors, cclib is a foundational utility for turning raw calculation logs into reproducible datasets.

References