MODNet

MODNet (Material Optimal Descriptor Network) is a supervised machine-learning framework for predicting materials properties from either composition or crystal structure. It combines matminer-based feature generation with mutual-information feature selection and joint (multi-target) learning to deliver competitive accuracy on limited datasets.

DaCs Tool Review: MODNet

Overview

MODNet (Material Optimal Descriptor Network) targets small or noisy materials datasets by pairing matminer descriptors with mutual-information feature selection and joint learning. I tested MODNet 0.4.5 on macOS (Apple M4 Pro) using a clean conda environment (modnet39) to evaluate day-one usability for MatDaCs contributors.

What is MODNet?

MODNet is a supervised machine-learning framework that builds compact, information-rich descriptors before training a neural network. The typical workflow uses:

  • MODData to featurize compositions or structures via matminer and to perform feature selection.
  • MODNetModel to train a feed-forward network that can support single or multiple targets.

This design makes MODNet a natural bridge between featurizer-centric pipelines (Matminer) and model-centric benchmarks (MatBench).

Key Features

  • Feature selection for limited data: mutual-information scoring trims thousands of candidate features down to a compact descriptor set (see the sketch after this list).
  • Joint learning: optional multi-target training helps share signal across correlated properties.
  • Pretrained models: ready-to-use predictors for refractive index and vibrational thermodynamics.
  • MatBench integration: MODNet appears on the MatBench leaderboard and ships benchmarking utilities.
  • Composable featurizers: composition-only or structure-based presets keep workflows consistent with Matminer.
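
To make the selection step concrete, below is a toy greedy relevance-redundancy loop built on scikit-learn's mutual_info_regression. This is my own illustrative sketch, not MODNet's implementation (MODNet scores normalized mutual information over matminer features), and select_features is a hypothetical helper name:

import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_features(X, y, n=5, alpha=0.5):
    # Greedy relevance-minus-redundancy selection (illustration only)
    relevance = mutual_info_regression(X, y, random_state=0)
    chosen = [int(np.argmax(relevance))]
    while len(chosen) < n:
        scores = np.full(X.shape[1], -np.inf)
        for j in range(X.shape[1]):
            if j not in chosen:
                # Penalize features that mostly repeat already-chosen ones
                redundancy = mutual_info_regression(
                    X[:, chosen], X[:, j], random_state=0).mean()
                scores[j] = relevance[j] - alpha * redundancy
        chosen.append(int(np.argmax(scores)))
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(select_features(X, y, n=3))  # picks informative, non-redundant columns

The same relevance-versus-redundancy trade-off is how MODNet shrinks thousands of matminer columns down to a few dozen informative ones.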

Installation

The README recommends Python 3.8+ and a pinned environment. I followed the same pattern:

conda create -n modnet39 python=3.9
conda activate modnet39
pip install modnet

This pulled in TensorFlow, matminer, pymatgen, scikit-learn, and their dependencies without manual builds.
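
A quick smoke test (my own, not from the MODNet docs) confirms the environment resolved correctly:

from importlib.metadata import version

# Print the resolved versions of the main dependencies (run inside modnet39)
for pkg in ("modnet", "tensorflow", "matminer", "pymatgen"):
    print(pkg, version(pkg))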

Example workflow and local test

I adapted the official MODNet workflow into a small, reproducible demo that uses the matminer dielectric_constant dataset. The script is stored at modnet_demo.py and does the following:

  • Load 200 samples with formulas and dielectric constants.
  • Featurize with CompositionOnlyMatminer2023Featurizer.
  • Select 64 features with mutual information.
  • Train a compact MODNet model for 40 epochs.
  • Export predictions and a parity plot.

The core of the script:

from matminer.datasets import load_dataset
from pymatgen.core import Composition
from sklearn.model_selection import train_test_split
from modnet.preprocessing import MODData
from modnet.models import MODNetModel
from modnet.featurizers.presets import CompositionOnlyMatminer2023Featurizer

# Load 200 formulas and their polycrystalline dielectric constants
df = load_dataset("dielectric_constant").head(200)
materials = [Composition(f) for f in df["formula"]]
targets = df["poly_total"].values

moddata = MODData(materials=materials, targets=targets, target_names=['poly_total'],
                  featurizer=CompositionOnlyMatminer2023Featurizer())
moddata.featurize(n_jobs=1)
moddata.feature_selection(n=64, n_jobs=1, random_state=42)

# 80/20 split into training and held-out MODData objects
train_data, test_data = moddata.split(
    train_test_split(range(len(df)), test_size=0.2, random_state=42))

model = MODNetModel(targets=[[['poly_total']]], weights={'poly_total': 1.0},
                    num_neurons=[[128], [64], [32], [16]], n_feat=64)
model.fit(train_data, val_data=test_data, epochs=40, batch_size=16, verbose=0)
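
The export step is not shown above; here is a minimal sketch of how the demo writes its outputs, assuming model.predict returns a DataFrame keyed by target name (the documented behavior) and that the held-out targets live in test_data.df_targets:

import matplotlib.pyplot as plt

# Predict on the held-out split and save a predictions table
preds = model.predict(test_data)['poly_total']
truth = test_data.df_targets['poly_total']
preds.to_frame('predicted').assign(actual=truth).to_csv('modnet_predictions.csv')

# Parity plot: predicted vs. reference dielectric constant
lims = [truth.min(), truth.max()]
plt.scatter(truth, preds, s=12)
plt.plot(lims, lims, 'k--', lw=1)
plt.xlabel('poly_total (reference)')
plt.ylabel('poly_total (predicted)')
plt.savefig('modnet_val_scatter.png', dpi=150)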

Running conda run -n modnet39 python modnet_demo.py produced:

  • Samples: 200
  • Selected features: 64
  • Validation MAE: 4.12 on the held-out split
  • Outputs saved to modnet_predictions.csv and modnet_val_scatter.png

The parity plot (modnet_val_scatter.png) compares predicted and reference poly_total values on the validation split.

Comparison with Matminer and DScribe

  • Matminer vs. MODNet: Matminer handles dataset access and featurization; MODNet builds on those features to select a compact descriptor set and train a neural model.
  • DScribe vs. MODNet: DScribe focuses on local atomic-environment descriptors (SOAP, MBTR), while MODNet targets global composition or structure descriptors optimized for tabular ML. Use DScribe when local structure detail is critical; use MODNet when you want compact descriptors and joint learning (see the sketch after this list).
  • Complementarity: MODNet can sit downstream of Matminer and side-by-side with DScribe features in benchmarking workflows.
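
To make the contrast tangible, the sketch below (mine, not from either project's docs; parameter names follow DScribe 2.x) computes one averaged SOAP vector for a toy rock-salt cell. Globally pooled local descriptors like this are what you would place alongside MODNet's features in a combined benchmark:

from dscribe.descriptors import SOAP
from pymatgen.core import Lattice, Structure
from pymatgen.io.ase import AseAtomsAdaptor

# Toy rock-salt NaCl cell
nacl = Structure.from_spacegroup("Fm-3m", Lattice.cubic(5.64),
                                 ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]])

# average="inner" pools per-atom SOAP environments into one global vector
soap = SOAP(species=["Na", "Cl"], r_cut=5.0, n_max=8, l_max=6,
            periodic=True, average="inner")
vec = soap.create(AseAtomsAdaptor.get_atoms(nacl))
print(vec.shape)  # one fixed-length vector per structure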

Hands-on notes

  • fast=True in some MODNet examples downloads precomputed features from figshare; on this machine the download failed with an MD5 mismatch (likely interference from a web application firewall). Local featurization (fast=False or an explicit featurizer) is more reliable.
  • Matminer featurizers warn when impute_nan=False; consider enabling imputation if your dataset includes elements with missing tabulated properties (see the sketch after this list).
  • On this tiny dataset, the end-to-end pipeline (feature selection plus training) completes in seconds on CPU.
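
For the imputation note above, a minimal example of the opt-in flag, assuming a matminer version recent enough to expose impute_nan on its composition featurizers:

from matminer.featurizers.composition import ElementProperty
from pymatgen.core import Composition

# With impute_nan=True, missing tabulated elemental properties are imputed
# rather than propagated as NaNs into the feature matrix
ep = ElementProperty.from_preset("magpie", impute_nan=True)
features = ep.featurize(Composition("Fe2O3"))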

Conclusion

MODNet is a practical middle ground between feature engineering and neural modeling. It keeps the Matminer-friendly workflow while adding feature selection and multi-target learning, making it especially suitable for small materials datasets. For MatDaCs content, MODNet pairs well with Matminer baselines and DScribe descriptors when you want a compact, explainable descriptor set plus a neural model.

References

  • MODNet GitHub: <https://github.com/ppdebreuck/modnet>
  • MODNet documentation: <https://modnet.readthedocs.io/en/latest/>
  • De Breuck et al., npj Comput. Mater. 7, 83 (2021)
  • De Breuck et al., J. Phys.: Condens. Matter 33, 404002 (2021)
  • MatBench leaderboard: <https://matbench.materialsproject.org/>