What data structure would you use to represent an organic compound?

data structures

Are there any good data structures out there that can be used to represent a molecule?

I was thinking of representing it as a graph, making every atom a vertex. However, organic compounds commonly contain lots of carbons and hydrogens, so how would you number the atoms? Is there a good way to represent molecules that also supports an efficient .contains() method?

One of the most basic uses for this would be checking whether a compound contains a carbonyl group, a benzylic hydrogen, or even a benzene ring.
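The graph idea from the question can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive design: atoms are vertices "numbered" simply by insertion order, each edge stores a bond order, hydrogens are explicit, and contains is a naive backtracking subgraph match. The class and method names are hypothetical. It compares only element symbols and bond orders, so detecting aromaticity or a benzylic position would need extra atom properties.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical undirected graph: vertices are atoms, edges carry a bond order."""
    elements: list = field(default_factory=list)  # atom index -> element symbol
    bonds: dict = field(default_factory=dict)     # atom index -> {neighbour index: order}

    def add_atom(self, element: str) -> int:
        # Atoms are numbered by the order in which they are added.
        self.elements.append(element)
        index = len(self.elements) - 1
        self.bonds[index] = {}
        return index

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        self.bonds[i][j] = order
        self.bonds[j][i] = order

    def contains(self, pattern: "Molecule") -> bool:
        """Naive backtracking subgraph match (exponential in the worst case)."""
        n = len(pattern.elements)

        def extend(mapping: dict) -> bool:
            if len(mapping) == n:
                return True
            p = len(mapping)  # pattern atoms are placed in order 0, 1, 2, ...
            used = set(mapping.values())
            for m, element in enumerate(self.elements):
                if m in used or element != pattern.elements[p]:
                    continue
                # Every pattern bond from p to an already-placed atom must exist
                # in the molecule with the same bond order.
                if all(self.bonds[m].get(mapping[q]) == order
                       for q, order in pattern.bonds[p].items() if q in mapping):
                    mapping[p] = m
                    if extend(mapping):
                        return True
                    del mapping[p]
            return False

        return extend({})


# Formaldehyde, H2C=O, with explicit hydrogens.
formaldehyde = Molecule()
c = formaldehyde.add_atom("C")
o = formaldehyde.add_atom("O")
formaldehyde.add_bond(c, o, order=2)
formaldehyde.add_bond(c, formaldehyde.add_atom("H"))
formaldehyde.add_bond(c, formaldehyde.add_atom("H"))

# Carbonyl pattern: C=O.
carbonyl = Molecule()
pc = carbonyl.add_atom("C")
po = carbonyl.add_atom("O")
carbonyl.add_bond(pc, po, order=2)

print(formaldehyde.contains(carbonyl))  # True
```

The brute-force search is only meant to show the idea. Established cheminformatics toolkits such as RDKit expose substructure search directly (for example Mol.HasSubstructMatch with a pattern built by Chem.MolFromSmarts) and use more refined matching algorithms.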

Best Answer

(Biochemistry graduate with 30 years software development experience)

Inorganic molecules are "relatively" simple. The interesting molecules are built from elements that can bond to themselves, e.g. C, N, O and Si, because you can get some really funky combinations. The benzene ring is a very simple example; some variations (pyridine, for instance) substitute a nitrogen for one of the carbons, and it gets weird fast.

I'd start with an "atom" object with the various types of atom inheriting from it.
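A minimal sketch of that hierarchy, assuming each element subclass only needs to fix its symbol and a typical valence (the class names and the single fixed valence per element are simplifications for illustration):

```python
class Atom:
    """Common base type for every atom in a molecule."""
    symbol: str = "?"
    valence: int = 0  # typical number of covalent bonds, assumed fixed in this sketch

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.symbol})"


# Each element type inherits from Atom and overrides the class attributes.
class Hydrogen(Atom):
    symbol, valence = "H", 1


class Carbon(Atom):
    symbol, valence = "C", 4


class Nitrogen(Atom):
    symbol, valence = "N", 3


class Oxygen(Atom):
    symbol, valence = "O", 2


atoms = [Carbon(), Hydrogen(), Nitrogen(), Oxygen()]
print([a.valence for a in atoms])               # [4, 1, 3, 2]
print(all(isinstance(a, Atom) for a in atoms))  # True
```

Code that walks a molecule can then treat every atom uniformly through the Atom interface while each subclass supplies its own element-specific data.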

Each "atom" object would contain a list of atom objects to represent its bonds, so a nitrogen would have a list of fixed size 3, letting it store links to three other atoms. A double bond could be represented as a duplicate entry.
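Here is one way the fixed-size bond list could look, as a sketch that assumes the list length equals the atom's valence and that None marks a free slot. The bond helper and the method names are hypothetical. A double bond is stored exactly as described: the same neighbour appears in two slots.

```python
class Atom:
    """An atom whose bonds live in a fixed-size list of slots, one per valence."""

    def __init__(self, symbol: str, valence: int) -> None:
        self.symbol = symbol
        # Each slot holds a reference to the bonded atom, or None while free.
        self.slots = [None] * valence

    def free_slots(self) -> int:
        return sum(1 for s in self.slots if s is None)

    def bond_order(self, other: "Atom") -> int:
        # A double bond is just the same neighbour appearing in two slots.
        return sum(1 for s in self.slots if s is other)


def bond(a: Atom, b: Atom, order: int = 1) -> None:
    """Join a and b with `order` bonds, filling one free slot per bond on each atom."""
    if a.free_slots() < order or b.free_slots() < order:
        raise ValueError(f"not enough free slots for {a.symbol}-{b.symbol}")
    for _ in range(order):
        a.slots[a.slots.index(None)] = b
        b.slots[b.slots.index(None)] = a


nitrogen = Atom("N", 3)       # a list of fixed size 3
print(len(nitrogen.slots))    # 3

c, o = Atom("C", 4), Atom("O", 2)
bond(c, o, order=2)           # C=O: o now appears twice in c.slots
bond(c, Atom("H", 1))
bond(c, Atom("H", 1))
print(c.bond_order(o), c.free_slots(), o.free_slots())  # 2 0 0
```

The duplicate-entry trick keeps every bond a plain object reference, at the cost of having to count entries to recover the bond order.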

Each atom would have rules embedded about what it can legally bond to and how.
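The embedded rules could be as simple as a per-element table consulted before any bond is made. The table below is a hypothetical, simplified one (neutral atoms only; charges, radicals and hypervalent cases are ignored), and for brevity this sketch stores a bond order per neighbour in a dict instead of duplicate list entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BondingRules:
    valence: int    # bond slots available to the neutral atom (simplified)
    max_order: int  # highest bond order allowed to any single neighbour


# Hypothetical, simplified rule table: ignores charges, radicals and hypervalence.
RULES = {
    "H": BondingRules(valence=1, max_order=1),
    "C": BondingRules(valence=4, max_order=3),
    "N": BondingRules(valence=3, max_order=3),
    "O": BondingRules(valence=2, max_order=2),
}


class Atom:
    def __init__(self, symbol: str) -> None:
        self.symbol = symbol
        self.rules = RULES[symbol]
        self.neighbours = {}  # other Atom -> bond order

    def used(self) -> int:
        return sum(self.neighbours.values())

    def can_bond(self, other: "Atom", order: int) -> bool:
        """Both atoms' rules must allow the bond before it is created."""
        if other is self or other in self.neighbours:
            return False
        return all(
            order <= atom.rules.max_order and atom.used() + order <= atom.rules.valence
            for atom in (self, other)
        )

    def bond(self, other: "Atom", order: int = 1) -> None:
        if not self.can_bond(other, order):
            raise ValueError(f"illegal bond {self.symbol}-{other.symbol} (order {order})")
        self.neighbours[other] = order
        other.neighbours[self] = order


c, o = Atom("C"), Atom("O")
c.bond(o, 2)                             # C=O is legal for both atoms
print(Atom("H").can_bond(Atom("C"), 2))  # False: H only forms single bonds
print(o.can_bond(Atom("H"), 1))          # False: both O slots are already used
```

Keeping the rules in data rather than scattering them through the subclasses makes it easy to extend the table later, for example with formal charges.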

So you can build reasonably complicated molecules unambiguously, because every bond is pinned down exactly: bond 3 on Carbon #1 is linked to bond 1 on Hydrogen #2, and so on.
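To show how numbered atoms and numbered bond slots pin a structure down, here is a sketch that records, for each slot, which slot on which atom it joins. It assumes one global 1-based numbering for atoms and the simplified valences in the hypothetical VALENCE table; connecting the same pair twice gives a double bond, matching the duplicate-entry idea.

```python
from typing import List, Optional, Tuple

# Simplified valences for neutral atoms (an assumption for this sketch).
VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


class Molecule:
    def __init__(self) -> None:
        self.symbols: List[str] = []
        # self.slots[a - 1][k - 1] == (b, m) means bond k of atom a is joined
        # to bond m of atom b (all numbers 1-based); None marks a free slot.
        self.slots: List[List[Optional[Tuple[int, int]]]] = []

    def add_atom(self, symbol: str) -> int:
        """Append an atom and return its 1-based number."""
        self.symbols.append(symbol)
        self.slots.append([None] * VALENCE[symbol])
        return len(self.symbols)

    def connect(self, a: int, b: int) -> None:
        """Join the first free slot of atom a to the first free slot of atom b.
        Raises ValueError if either atom has no free slot left."""
        sa = self.slots[a - 1].index(None)
        sb = self.slots[b - 1].index(None)
        self.slots[a - 1][sa] = (b, sb + 1)
        self.slots[b - 1][sb] = (a, sa + 1)

    def table(self) -> List[str]:
        """List every bond once, as 'atom bond - atom bond'."""
        rows = []
        for i, atom_slots in enumerate(self.slots, start=1):
            for k, link in enumerate(atom_slots, start=1):
                if link is not None and (i, k) < link:  # skip the mirrored entry
                    j, m = link
                    rows.append(f"{self.symbols[i - 1]}{i} bond {k} - "
                                f"{self.symbols[j - 1]}{j} bond {m}")
        return rows


methane = Molecule()
c = methane.add_atom("C")
for _ in range(4):
    methane.connect(c, methane.add_atom("H"))
for row in methane.table():
    print(row)
# C1 bond 1 - H2 bond 1
# C1 bond 2 - H3 bond 1
# C1 bond 3 - H4 bond 1
# C1 bond 4 - H5 bond 1
```

The resulting connection table is essentially what the answer describes: each row names an exact slot on each of two numbered atoms, so the structure cannot be read two ways.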

Hope that makes sense...
