Molecular encoder/featurizer using rdkit and OCaml

Chemical fingerprints are lossy encodings of molecules. molenc encodes molecules as unfolded, counted fingerprints, i.e. potentially very long but sparse integer vectors.
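
To make "unfolded and counted" concrete, here is a minimal sketch in plain Python. It is not molenc's implementation: the function name `encode_unfolded`, the feature strings, and the shared `dictionary` are all hypothetical and chosen only for illustration. Each distinct feature gets its own integer index (no hashing or folding into a fixed length), and the value stored at that index is the number of times the feature occurs.

```python
from collections import Counter

def encode_unfolded(features, dictionary):
    """Return a sparse {feature_index: count} mapping (hypothetical helper).

    `dictionary` maps feature -> integer index and is shared across
    molecules, so the same feature always lands on the same index.
    """
    counts = Counter()
    for feat in features:
        # Assign the next free index the first time a feature is seen;
        # nothing is folded, so the index space grows with the data.
        index = dictionary.setdefault(feat, len(dictionary))
        counts[index] += 1
    return dict(counts)

dictionary = {}
# Made-up feature labels standing in for real fingerprint features
fp_a = encode_unfolded(["f1", "f1", "f2"], dictionary)
fp_b = encode_unfolded(["f2", "f3"], dictionary)
print(fp_a, fp_b, len(dictionary))
```

Only non-zero entries are stored, which is why such vectors can be very long in principle while staying cheap to keep in memory.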

Currently, Faulon fingerprints are supported, and atom types are the quadruplet (#pi-electrons, element symbol, #HA neighbors, formal charge), where HA stands for heavy atom. Possible future additions are atom pair fingerprints, pharmacophore features (a more abstract/fuzzy atom typing scheme), and the stereochemistry information that can be encoded in SMILES strings.
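
The atom-type quadruplet can be pictured as a small record. The sketch below is only an illustration in plain Python: the class name `AtomType`, its field names, and the comma-separated string form are assumptions, not molenc's actual data structures or output format, and the values are filled in by hand rather than computed from a molecule.

```python
from typing import NamedTuple

class AtomType(NamedTuple):
    """Hypothetical record for the atom-type quadruplet."""
    pi_electrons: int     # number of pi electrons
    symbol: str           # element symbol
    heavy_neighbors: int  # number of heavy-atom (non-hydrogen) neighbors
    formal_charge: int    # formal charge

    def to_string(self) -> str:
        # Illustrative text form only; the real format may differ.
        return (f"{self.pi_electrons},{self.symbol},"
                f"{self.heavy_neighbors},{self.formal_charge}")

# Hand-assigned examples: a carbon in ethane (CC) and the nitrogen
# in methylammonium (C[NH3+])
ethane_carbon = AtomType(0, "C", 1, 0)
methylammonium_nitrogen = AtomType(0, "N", 1, 1)
print(ethane_carbon.to_string(), methylammonium_nitrogen.to_string())
```

Two atoms receive the same type only if all four fields match, so this typing distinguishes, for example, a neutral atom from a charged one of the same element.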


