Matminer

Matminer is an open-source Python library that accelerates data-driven materials discovery. It consolidates ready-made datasets, automated data-retrieval utilities, and a comprehensive catalog of featurizers so researchers can move from raw compositions or structures to machine-learning-ready tables with minimal boilerplate code.

Information

Official site https://hackingmaterials.lbl.gov/matminer
Openness ★★★
License

Modified BSD License

Core Developers

Hacking Materials research group, Lawrence Berkeley National Laboratory (LBNL), with contributions from the Materials Project and the broader materials informatics community.

Related Papers
  1. Logan Ward et al., “Matminer: An open source toolkit for materials data mining,” Computational Materials Science 152, 60–69 (2018). doi:10.1016/j.commatsci.2018.05.018
  2. Anubhav Jain et al., “Commentary: The Materials Project: A materials genome approach to accelerating materials innovation,” APL Materials 1, 011002 (2013). doi:10.1063/1.4812323
Other

Description: Matminer organizes the typical materials informatics workflow into modular subpackages. matminer.datasets gives one-line access to 40+ curated datasets covering thermoelectrics, elasticity, dielectric constants, metallic glasses, and more, each bundled with provenance metadata. matminer.data_retrieval wraps external repositories such as the Materials Project, Citrination, and MPDS, so custom datasets can be built through their authenticated APIs. The matminer.featurizers namespace contains 70+ featurizers spanning composition statistics (the Magpie elemental-property set), structural fingerprints (SiteStatsFingerprint, BondFractions), electronic-structure summaries (BandFeaturizer, DOSFeaturizer), and conversion utilities that translate between ASE, pymatgen, and pandas representations.

By standardizing these tasks, Matminer lowers the barrier for MatDaCs users who want to benchmark algorithms, reproduce published datasets, or compare feature families (including DScribe’s local-environment representations) within a single pandas-centric workflow.
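To make the idea of composition statistics concrete, the following is a minimal standard-library sketch of what a Magpie-style featurizer computes: fraction-weighted summaries of an elemental property over a chemical formula. It is not Matminer's implementation. The helper names (parse_formula, composition_stats) and the five-element electronegativity table are assumptions introduced for illustration only. In Matminer itself, the comparable functionality is provided by the ElementProperty featurizer with its "magpie" preset, which covers many more properties and statistics.

```python
import re
from collections import defaultdict

# Pauling electronegativities for a few elements. This is an illustrative
# subset chosen for the example, not the property table shipped with Matminer.
ELECTRONEGATIVITY = {"Fe": 1.83, "O": 3.44, "Na": 0.93, "Cl": 3.16, "Si": 1.90}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Hypothetical helper: parentheses, charges, and hydrates are not handled.
    """
    amounts = defaultdict(float)
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*(?:\.\d+)?)", formula):
        amounts[symbol] += float(count) if count else 1.0
    return dict(amounts)


def composition_stats(formula, table=ELECTRONEGATIVITY):
    """Return fraction-weighted mean, min, max, and range of one property."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    # Convert raw amounts to atomic fractions so the weights sum to 1.
    fractions = {element: n / total for element, n in amounts.items()}
    values = [table[element] for element in fractions]
    mean = sum(fractions[element] * table[element] for element in fractions)
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Build one feature row per formula, the same shape a featurizer would
# append as new columns to a table of compositions.
rows = [{"formula": f, **composition_stats(f)} for f in ("Fe2O3", "NaCl", "SiO2")]
for row in rows:
    print(row)
```

Each printed row pairs a formula with its derived features, which is the table layout that a machine-learning model would consume. A real workflow would keep the formulas in a pandas DataFrame and let a Matminer featurizer add the columns.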