Glycan

class Glycan(iupac, root_orientation='n', start=100, tree_only=False, full=True)

This class serves as the interface to the parser for the IUPAC representation of a glycan. The grammar for glycans is defined using ANTLR (https://www.antlr.org/), which generates a lexer and a parser that fit the defined grammar. Do not edit those generated files: they are auto-generated and therefore mostly uncommented.

The defined grammar discards the last glycan, which is used to define the root of the glycan tree. Therefore, the resulting abstract syntax trees (ASTs) are not intuitive.

summary()

Aggregate some statistics of the glycan. These are, in order (with the key in the output dictionary in brackets): molecular formula [formula], number of atoms [atoms], number of bonds [bonds], number of rings [rings], number of monomers [monomers], maximum depth of the tree [depth], the root monomer [root], list of all leaf monomers [leaves], molecular weight [weight].

Returns

The statistics named above are returned as a dictionary with the given keys.
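The dictionary described above can be consumed in the documented key order. The following is a minimal sketch, not part of the library: `format_summary` and `SUMMARY_KEYS` are hypothetical helper names, and the example dictionary contains placeholder values rather than real output of summary().

```python
# Key order as listed in the description of summary().
SUMMARY_KEYS = ["formula", "atoms", "bonds", "rings", "monomers",
                "depth", "root", "leaves", "weight"]


def format_summary(stats):
    """Render a summary()-style dictionary as 'key: value' lines in documented order."""
    lines = []
    for key in SUMMARY_KEYS:
        if key in stats:  # tolerate missing keys instead of raising KeyError
            lines.append(f"{key}: {stats[key]}")
    return "\n".join(lines)


# Placeholder dictionary shaped like the documented return value.
# The values are invented for illustration and are NOT library output.
example_stats = {
    "weight": 0.0,
    "formula": "<formula>",
    "monomers": 2,
    "depth": 1,
}

print(format_summary(example_stats))
```

With a real instance, the argument would be the return value of summary() instead of the placeholder dictionary.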

count(glycan, match_all_fg=False, match_some_fg=False, match_edges=False, match_nodes=False, match_leaves=False, match_root=False)

Match a glycan against a query molecule and return the number of hits. The matching can be restricted by setting flags that impose additional conditions on the matches.

This matching does not include the configuration (alpha/beta/undefined) of the root monomer of the query. So the query “Gal” will result in a hit in “GalNAc6S b”, whereas neither “Gal a” nor “Gal b” does.

Parameters
  • glycan (Union[str, 'Glycan']) – query glycan to be matched against the monomers of this glycan

  • match_all_fg (bool) – flag indicating to match all functional groups (fgs) of the query glycan to all functional groups of a monomer

  • match_some_fg (bool) – flag indicating to match all functional groups (fgs) of the query glycan to some functional groups of a monomer

  • match_edges (bool) – flag indicating to also match edges

  • match_nodes (bool) – flag indicating to match against all nodes

  • match_leaves (bool) – flag indicating to match against the leaf monomers only

  • match_root (bool) – flag indicating to match against the root monomer only

Returns

The number of matches of the query in this glycan under the given conditions
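The position flags listed above can be combined into a small helper that reports hits per position. This is a sketch under assumptions: `count_by_position` is a hypothetical helper, and `StubGlycan` is a stand-in that only mimics the documented signature of count() so the example runs without the library. The numbers it returns are made up.

```python
def count_by_position(glycan, query):
    """Count hits of `query` in `glycan`, split by where the match may occur.

    `glycan` is any object exposing the documented count() method.
    """
    return {
        "nodes": glycan.count(query, match_nodes=True),
        "leaves": glycan.count(query, match_leaves=True),
        "root": glycan.count(query, match_root=True),
    }


class StubGlycan:
    """Stand-in for a real Glycan instance (assumption, for illustration only).

    It records the flags it was called with; the returned counts are invented.
    """

    def __init__(self):
        self.calls = []

    def count(self, glycan, match_all_fg=False, match_some_fg=False,
              match_edges=False, match_nodes=False, match_leaves=False,
              match_root=False):
        self.calls.append((glycan, match_nodes, match_leaves, match_root))
        if match_root:
            return 1
        if match_leaves:
            return 2
        return 3


stub = StubGlycan()
result = count_by_position(stub, "Gal")
print(result)
```

Passing a real Glycan instance instead of the stub would issue the same three calls to count().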

count_protonation(grouping)

Count the possible deprotonation sites in the final molecule.

Parameters

grouping (bool) – If True, count functional groups based on their common atom, so an SO2 group will count as 1. Otherwise, count groups based on the protonatable oxygen atoms, so an SO2 group will count as 2.

Returns

The number of possible deprotonations in the molecule.
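The difference between the two counting modes can be illustrated on a toy data structure. This sketch is not the library's implementation: `count_sites` and the mapping from oxygen atoms to their central atom are hypothetical, chosen only to mirror the rule stated for the grouping parameter.

```python
def count_sites(oxygen_to_center, grouping):
    """Count deprotonation sites from a mapping {oxygen_atom_id: central_atom_id}.

    grouping=True  -> one count per distinct central atom
    grouping=False -> one count per protonatable oxygen
    """
    if grouping:
        return len(set(oxygen_to_center.values()))
    return len(oxygen_to_center)


# Hypothetical atom ids: two oxygens (10, 11) bound to one sulfur (5),
# plus one oxygen (20) bound to a separate carbon (7).
sites = {10: 5, 11: 5, 20: 7}

print(count_sites(sites, grouping=True))   # 2
print(count_sites(sites, grouping=False))  # 3
```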

count_functional_groups(groups)

Count the number of the provided functional group in the final molecule.

Parameters

groups (Union[str, List[str]]) – each string has to be a valid functional group representable by this tool, a valid SMILES string or a valid SMARTS string

Returns

The number of matches of all functional groups. This count might overlap in the matched atoms.
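To make the note about overlapping atoms concrete, the sketch below sums matches over several patterns and reports which atoms were matched more than once. It is an illustration only: `total_and_overlap` is a hypothetical helper, the atom indices are invented, and the library may compute its count differently.

```python
def total_and_overlap(matches_per_group):
    """Sum matches over all patterns and collect atoms hit by more than one match.

    matches_per_group: {pattern: [tuple_of_atom_indices, ...]}
    Returns (total_number_of_matches, set_of_shared_atom_indices).
    """
    total = sum(len(matches) for matches in matches_per_group.values())
    seen, shared = set(), set()
    for matches in matches_per_group.values():
        for match in matches:
            for atom in match:
                if atom in seen:
                    shared.add(atom)
                seen.add(atom)
    return total, shared


# Invented matches: a hydroxyl pattern hits atoms 3 and 8,
# a carboxylic-acid pattern hits atoms (6, 7, 8), so atom 8 is counted twice.
example = {
    "[OX2H]": [(3,), (8,)],
    "C(=O)[OX2H]": [(6, 7, 8)],
}

total, shared = total_and_overlap(example)
print(total, shared)
```

Here the total is 3 matches even though atom 8 contributes to two of them, which is the kind of overlap the return description warns about.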

get_smiles()

Request the SMILES string of the parsed molecule.

Returns

Generated SMILES string

get_tree()

Request the tree parsed from the IUPAC in this instance.

Returns

The parsed tree with the single monomers in the nodes.

save_dot(output, horizontal=False)

Save the tree structure of the encoded glycan molecule into a dot file visualizing the graph of monomers.

Parameters
  • output (str) – path to store the DOT file in

  • horizontal (bool) – Show graph in horizontal orientation from left to right

Returns

pydot graph object containing the graph

create_snfg_img(filepath, **kwargs)

Create an image representation for a glycan using the SNFG symbols. The final image does not have a fixed size; its size adapts to the shape of the glycan. The width will be (max_depth + 1) * kwargs[‘width’], and the height will depend on the branching structure of the glycan.

Parameters
  • filepath (str) – path where to store the image

  • **kwargs

    • width (int): scaling factor for the width in the image generation

    • height (int): scaling factor for the height in the image generation

    • stroke (int): stroke size to be used when drawing the lines of the SNFG symbols

    • line (int): width of a line that connects two monomer-representing symbols.

Returns

PIL image representation using the SNFG symbols
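The width rule stated in the description, (max_depth + 1) * width, can be checked ahead of rendering. The helper below is a hypothetical sketch, not a library function. It assumes that max_depth corresponds to the [depth] value reported by summary(), which the documentation does not state explicitly.

```python
def expected_snfg_width(max_depth, width):
    """Image width according to the documented rule (max_depth + 1) * width."""
    if max_depth < 0 or width <= 0:
        raise ValueError("max_depth must be >= 0 and width must be > 0")
    return (max_depth + 1) * width


# A tree of depth 2 drawn with a width scaling factor of 100.
print(expected_snfg_width(max_depth=2, width=100))  # 300
```

The height has no comparable closed form in the description, since it depends on the branching structure.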