An open-source hub for all your molecular featurizers
Discover an unparalleled diversity of molecular featurizers and deploy them directly in your machine learning workflows.
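import datamol as dm
from molfeat.calc import RDKitDescriptors2D

# Sample 500 SMILES strings from the FreeSolv dataset
data = dm.data.freesolv().sample(500).smiles.values

# Compute RDKit 2D descriptors for a single molecule
mol2d = data[0]
calc = RDKitDescriptors2D()
calc(mol2d)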
What is molfeat?
molfeat is an open-source hub that makes it easy for ML scientists to evaluate and implement a wide range of molecular featurizers. Find the right featurizer for your workflow today.
MolT5 is a self-supervised learning framework that pretrains transformer-based models on vast amounts of unlabeled natural language text and molecule strings, enabling high-quality molecule captioning and text-based molecule generation.
3D molecular descriptors are numerical representations of chemical and physical properties of molecules that are based on 3D structures of molecules.
2D molecular descriptors are numerical representations of chemical and physical properties of molecules that are based on 2D structures of molecules. We augment the RDKit 2D descriptors with additional optional properties.
Mordred calculates over 1800 molecular descriptors, including constitutional, topological, electronic, and geometrical descriptors, among others. Both 2D and 3D descriptors are supported and optional.
ECFP-Count (Extended Connectivity Fingerprints-Count) is essentially the same as ECFP. However, instead of hashing the features into a binary vector, it skips the hashing step and returns a count vector.
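To make the difference between a binary and a count representation concrete, here is a toy sketch in plain Python. It does not compute real fingerprints: the substructure identifiers and the vocabulary are hypothetical placeholders standing in for the circular atom environments an ECFP would enumerate.

```python
from collections import Counter

# Hypothetical substructure identifiers found in one molecule.
# In a real ECFP these would be derived from circular atom environments.
substructure_ids = [101, 205, 101, 333, 205, 101]

# Hypothetical fixed vocabulary of identifiers, one vector position each
vocabulary = [101, 205, 333, 404]

counts = Counter(substructure_ids)

# Binary representation: only records whether a substructure is present
binary_vector = [1 if sid in counts else 0 for sid in vocabulary]

# Count representation: records how many times each substructure occurs
count_vector = [counts.get(sid, 0) for sid in vocabulary]

print(binary_vector)  # [1, 1, 1, 0]
print(count_vector)   # [3, 2, 1, 0]
```

The count vector keeps information about repeated substructures that the binary vector discards, which may matter for properties that scale with the number of functional groups.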
A Graph Transformer pretrained on PCQM4Mv2 HOMO-LUMO energy gap prediction using 2D molecular graphs.