An open-source hub for all your molecular featurizers

Discover an unparalleled diversity of molecular featurizers and deploy them directly in your machine learning workflows.

getting-started.py
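import datamol as dm
from molfeat.calc import RDKitDescriptors2D

# Sample 500 SMILES strings from the FreeSolv dataset
data = dm.data.freesolv().sample(500).smiles.values
# Pick one molecule (a SMILES string) to featurize
mol2d = data[83]
# Compute the RDKit 2D descriptors for that molecule
calc = RDKitDescriptors2D()
calc(mol2d)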

What is molfeat?

molfeat is an open-source hub that makes it easy for ML scientists to evaluate and implement a wide range of molecular featurizers. Find the right featurizer for your workflow today.

MolT5

MolT5 is a self-supervised learning framework that pretrains transformer-based models on large amounts of unlabeled natural language text and molecule strings, enabling high-quality molecule captioning and text-based molecule generation.

desc3D

3D molecular descriptors are numerical representations of a molecule's chemical and physical properties, computed from its 3D structure.

desc2D

2D molecular descriptors are numerical representations of a molecule's chemical and physical properties, computed from its 2D structure. We augment the RDKit 2D descriptors with additional optional properties.

mordred

Mordred calculates over 1,800 molecular descriptors, including constitutional, topological, electronic, and geometrical descriptors. Both 2D and 3D descriptors are supported, and each set is optional.

ecfp-count

ECFP-Count (Extended Connectivity Fingerprints-Count) is essentially the same as ECFP. However, instead of hashing the features into a binary vector, it skips the hashing step and simply returns a count vector.
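To make the binary-versus-count distinction concrete, here is a minimal pure-Python sketch. It does not use molfeat or RDKit: the `feature_positions` list is a hypothetical stand-in for the substructure features found in one molecule, and the helper names `binary_vector` and `count_vector` are illustrative only, not part of any library.

```python
def binary_vector(feature_positions, length):
    """Mark each position that occurs at least once (presence only)."""
    vec = [0] * length
    for pos in feature_positions:
        vec[pos] = 1
    return vec


def count_vector(feature_positions, length):
    """Record how many times each position occurs."""
    vec = [0] * length
    for pos in feature_positions:
        vec[pos] += 1
    return vec


# Hypothetical feature positions for one molecule; repeated values mean
# the same substructure appears several times.
feature_positions = [1, 4, 4, 6, 4, 1]

print(binary_vector(feature_positions, 8))  # [0, 1, 0, 0, 1, 0, 1, 0]
print(count_vector(feature_positions, 8))   # [0, 2, 0, 0, 3, 0, 1, 0]
```

The count form keeps how often each substructure appears, which the binary form discards.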

pcqm4mv2_graphormer_base

Graph Transformer pretrained on PCQM4Mv2 HOMO-LUMO energy gap prediction using 2D molecular graphs.
