Featurizers tagged with #rdkit
desc3D
3D molecular descriptors are numerical representations of chemical and physical properties of molecules that are based on 3D structures of molecules.
Updated on
desc2D
2D molecular descriptors are numerical representations of chemical and physical properties of molecules that are based on 2D structures of molecules. We augment the RDKit 2D descriptors with additional optional properties.
Updated on
pharm2D-default
3D version of the pharmacophores computed with the default rdkit feature definition: https://github.com/rdkit/rdkit/blob/master/Data/BaseFeatures.fdef
Updated on
atompair-count
The Atompair-Count fingerprint is essentially the same as the atompair fingerprint. However, instead of being hashed into a binary vector, there is no hashing process and simply a count vector is returned
Updated on
topological-count
The Topological-Count fingerprint is essentially the same as the Topological fingerprint. However, instead of being hashed into a binary vector, there is no hashing process and simply a count vector is returned
Updated on
fcfp-count
The FCFP-Count (Functional Class Fingerprints-Count) is essentially the same as the FCFP. However, instead of being hashed into a binary vector, there is no hashing process and simply a count vector is returned
Updated on
estate
Electrotopological state (Estate) indices are numerical values computed for each atom in a molecule, and which encode information about both the topological environment of that atom and the electronic interactions due to all other atoms in the molecule.
Updated on
erg
Extended Reduced Graph approach (ErG) describes a molecular structure by defining its pharmacophoric points and the topological distance between them. It uses a pairwise combination of pharmacophores and their distance to set a corresponding bit in a vector. The ErG fingerprint implements fuzzy incrementation, which favours retrieval of actives with different core structures (scaffold hopping).
Updated on
pattern
Pattern fingerprints were designed to be used in substructure screening. The algorithm identifies features in the molecule by doing substructure searches using a small number of very generic SMARTS patterns and then hashing each occurrence of a pattern based on the atom and bond types involved. The fact that a particular pattern matched the molecule at all is also stored by hashing the pattern ID and size.
Updated on
rdkit
This is an RDKit-specific fingerprint that is inspired by (though it differs significantly from) public descriptions of the Daylight fingerprint. The fingerprinting algorithm identifies all subgraphs in the molecule within a particular range of sizes, hashes each subgraph to generate a raw bit ID, that is then folded into the requested fingerprint size as binary vectors. Options are available to generate count-based forms of the fingerprint or “non-folded” forms (using a sparse representation).
Updated on
topological
Topological torsion fingerprints are a type of molecular fingerprint that represents the topological features of a molecule based on its graph representation. They are generated by computing the frequencies of all possible molecular torsions in a molecule and then encoding them as a binary vector.
Updated on
fcfp
Functional-class fingerprints (FCFPs) are an extension of ECFPs which incorporate information about the functional classes of atoms in a molecule. FCFPs are intended to capture more abstract property-based substructural features and leverage atomic characteristics that relate more to pharmacophoric features (e.g. hydrogen donor/acceptor, polarity, aromaticity, etc.).
Updated on
avalon
Similar to Daylight fingerprints, Avalon uses a fingerprint generator that enumerates certain paths and feature classes of the molecular graph. The fingerprint bit positions are hashed from the description of the feature; however, the hash codes for all the path-style features are computed implicitly while they are enumerated.
Updated on