Universe

If you wish to make an apple pie from scratch, you must first invent the universe.

—Carl Sagan, Cosmos

MDAnalysis is structured around two fundamental classes: the Universe and the AtomGroup. Almost all code in MDAnalysis begins with Universe, which contains all the information describing a molecular dynamics system.

It has two key properties:

  • atoms: an AtomGroup of the system’s atoms, providing access to important analysis methods (described below)

  • trajectory: the currently loaded trajectory reader

A Universe ties the static information from the “topology” (e.g. atom identities) to dynamically updating information from the “trajectory” (e.g. coordinates). A key feature of MDAnalysis is that an entire trajectory is not loaded into memory (unless the user explicitly does so with MemoryReader). Instead, the trajectory attribute provides a view on a specific frame of the trajectory. This allows the analysis of arbitrarily long trajectories without a significant impact on memory.

Creating a Universe

Loading from files

A Universe is typically created from a “topology” file, with optional “trajectory” file/s. Trajectory files must have the coordinates in the same order as atoms in the topology. See Formats for the topology and trajectory formats supported by MDAnalysis, and how to load each specific format.

u = Universe(topology, trajectory)
u = Universe(pdbfile)                       # read atoms and coordinates from PDB or GRO
u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories
u = Universe(topology, traj1, traj2, ...)   # read from multiple trajectories

The line between topology and trajectory files is quite blurry. For example, a PDB or GRO file is considered both a topology and a trajectory file. The difference is that a topology file provides static information, such as atom identities (name, mass, etc.), charges, and bond connectivity. A trajectory file provides dynamic information, such as coordinates, velocities, forces, and box dimensions.

If only a single file is provided, MDAnalysis tries to read both topology and trajectory information from it. When multiple trajectory files are provided, coordinates are loaded in the order given.

The default arguments should create a Universe suited for most analysis applications. However, the Universe constructor also takes optional arguments.

The following options specify how to treat the input:

  • format: the file format of the trajectory file/s. (default: None, formats are guessed)

  • topology_format: the file format of the topology file. (default: None, formats are guessed)

  • all_coordinates: whether to read coordinate information from the first file (default: False. Ignored when only one file is provided)

  • continuous: whether to give multiple trajectory files continuous time steps. This is currently only supported for XTC/TRR trajectories with a GRO/TPR topology, following the behaviour of gmx trjcat (default: False.)

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PDB, GRO, XTC

In [3]: u1 = mda.Universe(GRO, XTC, XTC, all_coordinates=True)

In [4]: u1.trajectory
Out[4]: <ChainReader containing adk_oplsaa.gro, adk_oplsaa.xtc, adk_oplsaa.xtc with 21 frames of 47681 atoms>

In [5]: u2 = mda.Universe(GRO, XTC, XTC, all_coordinates=False, continuous=False)

In [6]: print([int(ts.time) for ts in u2.trajectory])
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

The following options modify the created Universe:

  • guess_bonds: whether to guess connectivity between atoms. (default: False)

  • vdwradii: a dictionary of {element: radius} of van der Waals’ radii for use in guessing bonds.

  • transformations: a function or list of functions for on-the-fly trajectory transformation.

  • in_memory: whether to load coordinates into memory (default: False)

  • in_memory_step: only read every nth frame into an in-memory representation. (default: 1)

  • is_anchor: whether to consider this Universe when unpickling AtomGroups (default: True)

  • anchor_name: the name of this Universe when unpickling AtomGroups (default: None, automatically generated)

You can also pass in keywords for parsing the topology or coordinates. For example, many file formats do not specify the timestep for their trajectory. In these cases, MDAnalysis assumes that the default timestep is 1 ps. If this is incorrect, you can pass in a dt argument to modify the timestep. This does not modify timesteps for formats that include time information.

In [7]: from MDAnalysis.tests.datafiles import PRM, TRJ

In [8]: default_timestep = mda.Universe(PRM, TRJ)

In [9]: default_timestep.trajectory.dt
Out[9]: 1.0

In [10]: user_timestep = mda.Universe(PRM, TRJ, dt=5)  # ps

In [11]: user_timestep.trajectory.dt
Out[11]: 5

Constructing from AtomGroups

A new Universe can be created from one or more AtomGroup instances with Merge(). The AtomGroup instances can come from different Universes, meaning that this is one way to concatenate selections from different datasets.

For example, to combine a protein, ligand, and solvent from separate PDB files:

u1 = mda.Universe("protein.pdb")
u2 = mda.Universe("ligand.pdb")
u3 = mda.Universe("solvent.pdb")
u = Merge(u1.select_atoms("protein"), u2.atoms, u3.atoms)
u.atoms.write("system.pdb")

Constructing from scratch

A Universe can be constructed from scratch with Universe.empty. There are three stages to this process:

  1. Create the blank Universe with specified number of atoms. If coordinates, set trajectory=True.

  2. Add topology attributes such as atom names.

  3. (Optional) Load coordinates.

For example, to construct a universe with 6 atoms in 2 residues:

In [12]: u = mda.Universe.empty(6, 2, atom_resindex=[0, 0, 0, 1, 1, 1], trajectory=True)

In [13]: u.add_TopologyAttr('masses')

In [14]: coordinates = np.empty((1000,  # number of frames
   ....:                         u.atoms.n_atoms,
   ....:                         3))
   ....: 

In [15]: u.load_new(coordinates, order='fac')
Out[15]: <Universe with 6 atoms>

See this notebook tutorial for more information.

Guessing topology attributes

MDAnalysis has a guesser library that hold various guesser classes. Each guesser class is tailored to be context-specific. For example, PDBGuesser is specific for guessing attributes for PDB file format. See Guessing for more details about the available context-aware guessers. The Universe has guess_TopologyAttributes() API, which have ability to guess an attribute within a specific context either at the universe creation or by using the API directly. For example, to guess element attribute for a PDB file by either of two ways:

In [16]: u = mda.Universe(PDB, context='PDB', to_guess=['elements'])

or

In [17]: u = mda.Universe(PDB)

In [18]: u.guess_TopologyAttributes(context='PDB', to_guess=['elements'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-395663fbf5d9> in <module>
----> 1 u.guess_TopologyAttributes(context='PDB', to_guess=['elements'])

AttributeError: 'Universe' object has no attribute 'guess_TopologyAttributes'

The following options modify how to guess attribute(s):

  • context: the context of the guesser to be used in guessing the attribute. You can pass either a string representing the context (see Guessing for more detail about available guessers and their context), or as an object of a guesser class. The default value of the context is default, which corresponds to a generic DefaultGuesser, that is not specific to any context. You can pass a context once, and whenever you call guess_TopologyAttributes() again it will assume that you still using the same context. N.B.: If you didn’t pass any context to the API, it will use the DefaultGuesser

  • to_guess: list of the attributes to be guessed (these attributes will be either guessed if they don’t exist in the universe or partially guessed by only filling its empty values if universe has the attribute). This has to be the plural name of the attributes (masses not mass).

  • force_guess: a list of attributes to be forced guessed (these attributes will be either guessed if they don’t exist in the universe or their values will be completely overwritten by guessed ones if the universe has the attribute). This has to be the plural name of the attributes (masses not mass).

  • **kwargs: to pass any supplemental data to the guess_TopologyAttributes() API that can be useful in guessing some attributes (eg. passing vdwradii for bond guessing).

For now, MDAnalysis automatically guess types and * masses at the universe creation by having a default value of the to_guess parameter to be * :code`[‘types’, ‘masses’]`. This is done using the DefaultGuesser. you can stop this by passing () to the to_guess parameter.

Universe properties and methods

A Universe holds master groups of atoms and topology objects:

  • atoms: all Atoms in the system, in an AtomGroup.

  • residues: all Residues in the system

  • segments: all Segments in the system

  • bonds: all bond TopologyObjects in the system

  • angles: all angle TopologyObjects in the system

  • dihedrals: all dihedral TopologyObjects in the system

  • impropers: all improper TopologyObjects in the system

Residues and Segments are chemically meaningful groups of Atoms.

Modifying a topology is typically done through the Universe, which contains several methods for adding properties:

  • add_TopologyAttr()

  • add_Residue()

  • add_Segment()

See Topology attributes for more information on which topology attributes can be added, and examples/constructing_universe.ipynb for examples on adding attributes and Segments.