Featurizers tagged with #rdkit

desc3D

3D molecular descriptors are numerical representations of a molecule's chemical and physical properties, computed from its 3D structure.
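
As a rough illustration of what "based on 3D structures" implies in practice, the sketch below embeds a conformer with RDKit and then computes a few shape descriptors from it. The helper name `shape_descriptors`, the choice of descriptors, and the random seed are assumptions made for this example; the desc3D featurizer itself may compute a different set.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors3D


def shape_descriptors(smiles: str, seed: int = 42) -> dict:
    """Hypothetical helper: embed one conformer and compute 3D shape descriptors."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    # EmbedMolecule returns -1 when no 3D coordinates could be generated.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise ValueError(f"could not embed a conformer for {smiles!r}")
    return {
        "asphericity": Descriptors3D.Asphericity(mol),
        "radius_of_gyration": Descriptors3D.RadiusOfGyration(mol),
        "npr1": Descriptors3D.NPR1(mol),
    }


desc = shape_descriptors("CCO")
```

Unlike 2D descriptors, these values depend on the conformer, so a fixed seed is used here to keep the result reproducible.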

desc2D

2D molecular descriptors are numerical representations of a molecule's chemical and physical properties, computed from its 2D structure. We augment the RDKit 2D descriptors with additional optional properties.
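
For orientation, a minimal sketch of computing a few RDKit 2D descriptors directly from a SMILES string. The three descriptors chosen here are only examples; the additional optional properties mentioned above are not shown.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol

# Each descriptor is a plain function of the molecular graph; no conformer is needed.
values = {
    "MolWt": Descriptors.MolWt(mol),
    "TPSA": Descriptors.TPSA(mol),
    "MolLogP": Descriptors.MolLogP(mol),
}
```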

pharm2D-default

2D version of the pharmacophore fingerprint, computed with the default RDKit feature definitions: https://github.com/rdkit/rdkit/blob/master/Data/BaseFeatures.fdef
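
The sketch below shows one way to build a 2D pharmacophore fingerprint from `BaseFeatures.fdef` with RDKit's `Pharm2D` module. The point counts and distance bins are assumptions taken for illustration and are not necessarily the settings this featurizer uses.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem.Pharm2D import Generate
from rdkit.Chem.Pharm2D.SigFactory import SigFactory

# Load the default feature definitions shipped with RDKit.
fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
feature_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

# Assumed settings: 2- and 3-point pharmacophores, three topological distance bins.
sig_factory = SigFactory(feature_factory, minPointCount=2, maxPointCount=3)
sig_factory.SetBins([(0, 2), (2, 5), (5, 8)])
sig_factory.Init()

mol = Chem.MolFromSmiles("OCC(=O)O")
fp = Generate.Gen2DFingerprint(mol, sig_factory)
n_bits = fp.GetNumBits()
n_on = fp.GetNumOnBits()
```

Distances here are topological (bond counts), which is what makes this the 2D variant.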

atompair-count

The Atompair-Count fingerprint is essentially the same as the atompair fingerprint. However, rather than being hashed into a binary vector, it skips the hashing step and returns a count vector.

topological-count

The Topological-Count fingerprint is essentially the same as the topological fingerprint. However, rather than being hashed into a binary vector, it skips the hashing step and returns a count vector.

fcfp-count

The FCFP-Count (Functional Class Fingerprints-Count) is essentially the same as the FCFP. However, rather than being hashed into a binary vector, it skips the hashing step and returns a count vector.

ecfp-count

The ECFP-Count (Extended Connectivity Fingerprints-Count) is essentially the same as the ECFP. However, rather than being hashed into a binary vector, it skips the hashing step and returns a count vector.
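
To make the difference between the binary and count forms concrete, here is a small sketch using RDKit's Morgan fingerprint generator. Note the assumption: the count vector in this sketch is still folded to `fpSize`, which may differ from how the ecfp-count featurizer builds its output; the radius and size are also illustrative.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles("CCCCCC")  # hexane: several identical CH2 environments

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# Binary form: a bit only records that an environment occurs at least once.
bit_fp = gen.GetFingerprint(mol)
n_on_bits = bit_fp.GetNumOnBits()

# Count form: each position records how many times the environment occurs.
counts = gen.GetCountFingerprint(mol).GetNonzeroElements()
total_count = sum(counts.values())
```

For a molecule with repeated environments such as hexane, the count form keeps multiplicity information that the binary form discards.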

estate

Electrotopological state (EState) indices are numerical values computed for each atom in a molecule. They encode information about both the topological environment of that atom and the electronic interactions due to all other atoms in the molecule.
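
As a small illustration of the per-atom nature of these indices, the sketch below computes them for ethanol with RDKit. The labelling scheme in `per_atom` is just a convenience introduced for this example.

```python
from rdkit import Chem
from rdkit.Chem.EState.EState import EStateIndices

mol = Chem.MolFromSmiles("CCO")  # ethanol; atoms are indexed C0, C1, O2

# One value per heavy atom, in atom-index order.
indices = EStateIndices(mol)

# Hypothetical labelling for readability: element symbol plus atom index.
per_atom = {
    f"{atom.GetSymbol()}{atom.GetIdx()}": float(value)
    for atom, value in zip(mol.GetAtoms(), indices)
}
```

The electronegative oxygen receives the largest index here, reflecting the electronic contribution described above.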

erg

Extended Reduced Graph approach (ErG) describes a molecular structure by defining its pharmacophoric points and the topological distance between them. It uses a pairwise combination of pharmacophores and their distance to set a corresponding bit in a vector. The ErG fingerprint implements fuzzy incrementation, which favours retrieval of actives with different core structures (scaffold hopping).
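
A minimal sketch of computing an ErG fingerprint with RDKit's default settings. The example molecule is arbitrary, and the featurizer may be configured differently.

```python
from rdkit import Chem
from rdkit.Chem import rdReducedGraphs

mol = Chem.MolFromSmiles("OCCc1ccccc1N")

# Returns a NumPy array of floats; fuzzy incrementation means neighbouring
# distance bins also receive a fractional contribution, so values are not just 0/1.
erg = rdReducedGraphs.GetErGFingerprint(mol)
n_values = len(erg)
```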

pattern

Pattern fingerprints were designed to be used in substructure screening. The algorithm identifies features in the molecule by doing substructure searches using a small number of very generic SMARTS patterns and then hashing each occurrence of a pattern based on the atom and bond types involved. The fact that a particular pattern matched the molecule at all is also stored by hashing the pattern ID and size.
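
The substructure-screening use can be sketched as follows: if every bit set for the query is also set for the candidate, the candidate may contain the query and is passed to the full (more expensive) substructure search; otherwise it can be skipped. The helper `may_contain` and the candidate set are hypothetical names for this example.

```python
from rdkit import Chem

query = Chem.MolFromSmiles("c1ccccc1")  # benzene ring as the substructure query
candidates = {"phenol": "Oc1ccccc1", "cyclohexane": "C1CCCCC1"}

query_bits = set(Chem.PatternFingerprint(query, fpSize=2048).GetOnBits())


def may_contain(mol) -> bool:
    """Cheap pre-filter: all query bits must be present in the candidate."""
    mol_bits = set(Chem.PatternFingerprint(mol, fpSize=2048).GetOnBits())
    return query_bits <= mol_bits


results = {}
for name, smiles in candidates.items():
    mol = Chem.MolFromSmiles(smiles)
    passed = may_contain(mol)
    # Only run the exact substructure match when the screen passes.
    results[name] = (passed, mol.HasSubstructMatch(query) if passed else False)
```

The screen can produce false positives, which is why the exact match is still run afterwards, but it should not reject a molecule that truly contains the query.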

rdkit

This is an RDKit-specific fingerprint that is inspired by (though it differs significantly from) public descriptions of the Daylight fingerprint. The fingerprinting algorithm identifies all subgraphs in the molecule within a particular range of sizes, hashes each subgraph to generate a raw bit ID, and then folds that ID into a binary vector of the requested fingerprint size. Options are available to generate count-based forms of the fingerprint or “non-folded” forms (using a sparse representation).
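
In RDKit, the range of subgraph sizes and the folded length correspond to the `minPath`, `maxPath`, and `fpSize` arguments of `Chem.RDKFingerprint`. The values below are an illustrative choice, not necessarily what this featurizer uses.

```python
from rdkit import Chem, DataStructs

mol_a = Chem.MolFromSmiles("CCOC(=O)c1ccccc1")  # ethyl benzoate
mol_b = Chem.MolFromSmiles("COC(=O)c1ccccc1")   # methyl benzoate

# Subgraphs of 1 to 7 bonds are hashed and folded into 2048 bits.
fp_a = Chem.RDKFingerprint(mol_a, minPath=1, maxPath=7, fpSize=2048)
fp_b = Chem.RDKFingerprint(mol_b, minPath=1, maxPath=7, fpSize=2048)

similarity = DataStructs.TanimotoSimilarity(fp_a, fp_b)
```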

topological

Topological torsion fingerprints represent the topological features of a molecule based on its graph representation. They are generated by computing the frequencies of all topological torsions (paths of four consecutively bonded atoms) in a molecule and then encoding them as a binary vector.

fcfp

Functional-class fingerprints (FCFPs) are an extension of ECFPs that incorporate information about the functional classes of atoms in a molecule. FCFPs are intended to capture more abstract, property-based substructural features and rely on atomic characteristics that relate more closely to pharmacophoric features (e.g. hydrogen-bond donor/acceptor, polarity, aromaticity).
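
One way to see the effect of functional classes is to compare two molecules that differ only by halogen. In the sketch below, which assumes RDKit's Morgan generator with its feature-based atom invariants as a stand-in for FCFP, chlorobenzene and bromobenzene get different ECFP-style fingerprints but identical FCFP-style ones. The helper `on_bits` is a name introduced for this example.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# ECFP-style: atom invariants include the element, so Cl and Br differ.
ecfp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# FCFP-style: atoms are described by functional class (donor, acceptor,
# aromatic, halogen, basic, acidic) instead of by element.
fcfp_gen = rdFingerprintGenerator.GetMorganGenerator(
    radius=2,
    fpSize=2048,
    atomInvariantsGenerator=rdFingerprintGenerator.GetMorganFeatureAtomInvGen(),
)

chloro = Chem.MolFromSmiles("Clc1ccccc1")
bromo = Chem.MolFromSmiles("Brc1ccccc1")


def on_bits(gen, mol) -> set:
    """Set of bit positions turned on for this molecule."""
    return set(gen.GetFingerprint(mol).GetOnBits())


ecfp_same = on_bits(ecfp_gen, chloro) == on_bits(ecfp_gen, bromo)
fcfp_same = on_bits(fcfp_gen, chloro) == on_bits(fcfp_gen, bromo)
```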

ecfp

Extended-connectivity fingerprints (ECFPs) are a family of circular fingerprints that are commonly used to measure molecular similarity. They are based on the connectivity of atoms in molecular graphs.

avalon

Similar to Daylight fingerprints, Avalon uses a fingerprint generator that enumerates certain paths and feature classes of the molecular graph. The fingerprint bit positions are hashed from the description of the feature; however, the hash codes for all the path-style features are computed implicitly while they are enumerated.

maccs

MACCS keys are 166-bit 2D structure fingerprints that are commonly used to measure molecular similarity. They describe the presence of key features in molecular graphs.
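
A short sketch of generating MACCS keys with RDKit. One practical detail: RDKit returns a 167-bit vector, because bit 0 is left unused so that bit positions line up with the 1-based key numbers.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

fp = MACCSkeys.GenMACCSKeys(mol)
n_bits = fp.GetNumBits()
on_keys = list(fp.GetOnBits())  # key numbers that matched this molecule
```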
