From 6dacb8d4203fc8972bbd7a8663997e78ffecb5cd Mon Sep 17 00:00:00 2001 From: Lawson Woods Date: Mon, 7 Oct 2024 23:27:47 -0700 Subject: [PATCH 01/24] new protocol writeup --- docs/source/protocol_v3.rst | 460 +++++++++++++++++++++++++++++++++--- 1 file changed, 423 insertions(+), 37 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index cb60295..0b322af 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -1,51 +1,437 @@ -Proposed Protocol v3 -==================== +IMDv3 Protocol +============== -The suggested changes to the protocol are as follows: +IMDv3 is a protocol for communicating molecular simulation data through a socket. +It allows for two-way communication: via IMDv3, a simulation engine sends data to a receiver +and the receiver can send forces and certain requests back to the simulation engine. -1. Via an ``.mdp`` file setting, the user should be able to specify which of simulation box, positions, forces, and velocities are sent. - These should be rate-adjustable like with xtc and trr output settings but should be set to every step by default. +Terms +----- +**Simulation engine**: The producer of simulation data which listens for a receiver connection and sends it simulation data -2. The IMD_HANDSHAKE packet should have a 1-byte body which contains the configuration settings the simulation was setup with. - This will allow the client to be choose appropriate packets to send and receive without redundant configuration. +**Receiver**: The receiver of simulation data, connects to listening simulation engine and receives simulation data through a socket - The modified handshake packet would look like this: +**IMD frame**: A simulation frame in which IMD data will be sent from the simulation engine to the receiver + +**IMD system**: The subset of all atoms in the simulation for which the simulation engine can output IMD data +and for which the receiver can send forces to the simulation engine to apply to. + +Example +------- + +For a quick understanding of how IMDv3 works, this scenario describes the flow of a typical IMDv3 session: + +1. GROMACS, a simulation engine, starts listening for IMDv3 connections on port 8888 +2. IMDClient, an IMDv3 receiver, connects to the listening port +3. Through the connected socket, GROMACS sends IMDClient a handshake packet containing its running machine's endianness and the version of the session (v3) +4. GROMACS sends IMDClient a session info packet (introduced in IMDv3) informing it that it should expect time, box, coordinate, and velocity data in this session +5. After parsing the handshake and session info packets, IMDClient sends a go signal to GROMACS and begins waiting for data packets +6. GROMACS repeatedly sends time, box, coordinate, and velocity packets for each simulation frame in a fixed order +7. IMDClient continuously parses the stream of bytes, making the data available for processing by analysis code +8. GROMACS receives a termination signal from the environment it is running in and closes the socket +9. IMDClient recognizes that the session has ended, raising an EOF error after the last available frame of data has been processed + + +Packet Types +------------ + +In IMD, there are two kinds of packets: header packets and body packets. + +A header packet is composed of 8 bytes. The first 4 bytes +are always the header type and the next 4 bytes +serve as a flexible slot for holding other information. +All bytes in header packets are provided in network (big) +endian order except for the second 4 +bytes of the handshake packet. This is described in greater detail +in the `Handshake`_ section. + +In this document, header packets are described like this: .. code-block:: none - Header: - 4 (int32) (IMD_HANDSHAKE) - 2 (int32) (Protocol version, must be unswapped so client can determine endianness) - Body: - (bit) (imdpull: true or false. if true, the client can input forces to the simulation) - (bit) (imdwait: true or false. if true, the simulation will wait for the client to send a go signal before starting) - (bit) (imdterm: true or false. if true, the client can terminate the simulation) - (bit) (wrapped positions: true or false. if positions rate is 0, this is a placeholder value) + Header: + (int32) Header type + (dtype) Data slot description - (uint32) (energies rate: number of steps that elapse between steps that contain energy data. 0 means never) - (uint32) (dimensions rate: number of steps that elapse between steps that contain dimension data. 0 means never) - (uint32) (positions rate: number of steps that elapse between steps that contain position data. 0 means never) - (uint32) (velocities rate: number of steps that elapse between steps that contain velocity data. 0 means never) - (uint32) (forces rate: number of steps that elapse between steps that contain force data. 0 means never) +Some header packets sent by the simulation engine have associated body packets that contain additional data, +like atomic coordinates. These body packets vary in length and composition, +but their bytes are always encoded in the native endianness of the machine running the simulation engine. - "wrapped positions" will be a new ``.mdp`` setting which specifies whether the atoms' positions - should be adjusted to fit within the simulation box before sending. This is useful for visualization purposes. +This table lists all header types available in IMDv3. Below, each header type +and its associated body packet (if present) is described in detail. -3. The server should wait longer than 1 second (possibly up to 60s) for the go signal so that the client - has plenty of time to allocate memory buffers based on the endianness and information on included data types - it received in the handshake packet. +.. list-table:: + :widths: 10 10 + :header-rows: 1 -4. In the simulation loop, the server will send the client data in this order (if the configuration says to send it) - - i. Energy data (IMD_ENERGIES) unchanged - - ii. Dimension data (IMD_BOX) in triclinic vectors + * - Header type + - 32-bit integer enum value + * - Disconnect + - 0 + * - Energies + - 1 + * - Coordinates + - 2 + * - Go + - 3 + * - Handshake + - 4 + * - Kill + - 5 + * - MD Communication + - 6 + * - Pause + - 7 + * - Transmission rate + - 8 + * - IO Error + - 9 + * - Session info + - 10 + * - Resume + - 11 + * - Time + - 12 + * - Box + - 13 + * - Velocities + - 14 + * - Forces + - 15 - iii. Position data (IMD_FCOORDS) unchanged except box adjustments (see 5) - - iv. Velocity data (IMD_VELS) in the same manner as positions +Disconnect +^^^^^^^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to indicate that the simulation engine should +close the connected socket. Whether the simulation engine blocks execution until another connection is +made is an implementation decision. + +.. code-block:: none + + Header: + 0 (int32) Disconnect + (no type) Unused slot, any value acceptable + + +Energies +^^^^^^^^ + +Sent from the simulation engine to the receiver each IMD frame if +energies were previously specified for this session in `Session info`_. + +.. note:: + While the integration step is included in this + packet, this is a result of inheriting the IMD energy block from IMDv2. It is reccomended + to make use of the 64-bit integer integration step value from the time packet + in analysis code instead. + +.. code-block:: none + + Header: + 1 (int32) Energies + 1 (int32) Number of IMD energy blocks being sent + + Body: + (int32) Current integration step of the simulation + (float32) Absolute temperature + (float32) Total energy + (float32) Potential energy + (float32) Van der Waals energy + (float32) Coulomb interaction energy + (float32) Bonds energy + (float32) Angles energy + (float32) Dihedrals energy + (float32) Improper dihedrals energy + + +Coordinates +^^^^^^^^^^^ + +Sent from the simulation engine to the receiver each IMD frame if +coordinates were previously specified for this session in `Session info`_. + +.. code-block:: none + + Header: + 2 (int32) Coordinates + (int32) Number of atoms in the IMD system + + Body: + (float32[n_atoms * 3]) X, Y, and Z coordinates of each atom in the + IMD system encoded in the order + [X1, Y1, Z1, ..., Xn, Yn, Zn] + +Go +^^ + +Sent from the receiver to the simulation engine after the receiver receives +the `Handshake`_ and `Session info`_ packets. + +If the simulation engine does not +receive this packet within 1 second of sending the handshake and session info +packets, it should assume the receiver is incompatible. Whether the simulation engine +exits or accepts another connection after this is an implementation decision. + +.. code-block:: none + + Header: + 3 (int32) Go + (no type) Unused slot, any value acceptable + + +Handshake +^^^^^^^^^ + +Sent from the simulation engine to the receiver after a socket connection +is established. Unlike other header packets, the last four bytes of this packet are provided in +the native endianness of the sending simulation engine's hardware. + +The receiver can use this packet to determine both the IMD version +of the session and the endianness of the simulation engine. By providing +the endianness of the machine running the simulation engine, the bulk of the +data being sent in the session, i.e. the body packets, do not have to be swapped +by the simulation engine before being sent, speeding up execution. + +.. code-block:: none + + Header: + 4 (int32) Handshake + 3 (int32, unswapped byte order) IMD version used in session + +Kill +^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to request that the simulation engine +stops execution of the simulation and exits. Whether or not the simulation engine +honors this request is an implementation decision. + +.. code-block:: none + + Header: + 5 (int32) Kill + (no type) Unused slot, any value acceptable + +MD Communication +^^^^^^^^^^^^^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to request that the forces +in the body packet are applied to the atoms specified in the body packet. +Whether or not the simulation engine honors this request is an implementation decision. + +.. code-block:: none + + Header: + 6 (int32) MD Communication + (int32) Number of forces to be applied to the simulation engine + + Body: + (int32[n_forces]) Indices of atoms in the IMD system to apply forces to + (float32[n_forces * 3]) The X, Y, and Z components of forces to be applied to + the atoms at the indices specified in the above array + +The array of IMD system indices does not need to be monotonically increasing, meaning +the indices can be "out of order". However, indices cannot be duplicated, i.e. +the index array cannot contain index "2" twice. In this case, the force vectors should +be combined before being sent to the simulation engine to be applied. + +Pause +^^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to request that the simulation +engine blocks execution until a `Resume`_ packet is sent. Unlike in IMDv2, pause is idempotent, +meaning sending a pause packet multiple times in a row will not toggle the +state of the simulation. Instead, subsequent pause packets sent after the first one will have no effect. + +.. code-block:: none + + Header: + 7 (int32) Pause + (no type) Unused slot, any value acceptable + + +Transmission rate +^^^^^^^^^^^^^^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to change the number of steps between each IMD frame. For example, +if the transmission rate is 1, every integration step in the simulation will +be an IMD frame. + +.. code-block:: none + + Header: + 8 (int32) Transmission rate + (int32) New transmission rate. Any value less than 1 will reset + the transmission rate to its default value (configured + by the simulation engine) + +IO Error +^^^^^^^^ + +Never sent from one party to another during an IMD session. Can be used internally +by the simulation engine or receiver to indicate an error has occurred. + +.. code-block:: none + + Header: + 9 (int32) IO Error + (no type) Unused slot, any value acceptable + + +Session info +^^^^^^^^^^^^ + +Sent by the simulation engine to the receiver immediately after +the `Handshake`_ is sent to indicate to the receiver which data it +should expect for each IMD frame during the session along with +whether coordinates will be wrapped into the simulation box if present. + +.. code-block:: none + + Header: + 10 (int32) Session info + 7 (int32) Number of 1-byte configuration options in the body packet - v. Force data (IMD_FORCES) in the same manner as positions + Body: + (int8) Nonzero if time packets sent in each IMD frame + (int8) Nonzero if IMD energy block packets sent in each IMD frame + (int8) Nonzero if box packets sent in each IMD frame + (int8) Nonzero if coordinate packets sent in each IMD frame + (int8) Nonzero if coordinates wrapped into the simulation box. + Meaningless if coordinates not sent in the session + (int8) Nonzero if velocity packets sent in each IMD frame + (int8) Nonzero if force packets sent in each IMD frame + +Resume +^^^^^^ + +Sent from the receiver to the simulation engine any time after the `Session info`_ +has been sent to request that the simulation resumes execution +if it is in a paused state. Like `Pause`_, resume is idempotent. + +.. code-block:: none + + Header: + 11 (int32) Resume + (no type) Unused slot, any value acceptable + +Time +^^^^ + +Sent from the simulation engine to the receiver each IMD frame if +time packets were previously specified for this session in `Session info`_. + +.. code-block:: none + + Header: + 12 (int32) Time + 1 (int32) Number of time packets being sent + + Body: + (unsigned float64) dt for the simulation + (unsigned float64) Current time of the simulation + (unsigned int64) Current integration step of the simulation + +Box +^^^ + +Sent from the simulation engine to the receiver each IMD frame if +box packets were previously specified for this session in `Session info`_. + +.. code-block:: none + + Header: + 13 (int32) Box + 1 (int32) Number of simulation boxes being sent + Body: + (float32[9]) Triclinic box vectors for the simulation encoded in + in the order [ABC] where A = (aX,aY,aZ), B = (bX,bY,bZ), + and C = (cX,cY,cZ) + + +Velocities +^^^^^^^^^^ + +Sent from the simulation engine to the receiver each IMD frame if +velocities were previously specified for this session in `Session info`_. + +.. code-block:: none + + Header: + 14 (int32) Velocities + (int32) Number of atoms in the IMD system + + Body: + (float32[n_atoms * 3]) X, Y, and Z components of the velocities + of each atom in the + IMD system encoded in the order + [X1, Y1, Z1, ..., Xn, Yn, Zn] + +Forces +^^^^^^ + +Sent from the simulation engine to the receiver each IMD frame if +forces were previously specified for this session in `Session info`_. + +.. code-block:: none + + Header: + 14 (int32) Forces + (int32) Number of atoms in the IMD system + + Body: + (float32[n_atoms * 3]) X, Y, and Z components of the forces + of each atom in the + IMD system encoded in the order + [X1, Y1, Z1, ..., Xn, Yn, Zn] + +Packet order +------------ + +After the simulation engine sends the `Handshake`_ and `Session info`_ +to the receiver and gets back a `Go`_ signal, it beings sending +IMD frames. The data within each IMD frame is always sent in the same, fixed order +in IMDv3: + +1. Time +2. Energy block +3. Box +4. Coordinates +5. Velocities +6. Forces + +If the simulation engine is configured to send only a strict subset of all +available data packets, the fixed order of the list still applies to the +remaining packets in the session. + +Units +----- + +The units in IMDv3 are fixed. The simulation engine must convert +values into these units before sending them through the socket. +The receiver must also convert forces it sends back to the simulation +engine into these units. + + +.. list-table:: + :widths: 10 10 + :header-rows: 1 + + * - Measurement + - Unit + * - Length + - angstrom + * - Velocity + - angstrom/picosecond + * - Force + - kilojoules/(mol*angstrom) + * - Time + - picosecond + * - Energy + - kilojoules/mol + -5. The server will send a new IMD_EOS (end of stream) packet after the last frame is sent unless the client initiates the disconnection with - IMD_DISCONNECT. From 6d31caafc0455597af5dc01ae3f46c93a449fbd4 Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Tue, 8 Oct 2024 20:17:42 -0700 Subject: [PATCH 02/24] revisions --- docs/source/protocol_v3.rst | 64 ++++++++++++++++++++++++++++++------- 1 file changed, 53 insertions(+), 11 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 0b322af..76272ce 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -1,17 +1,38 @@ IMDv3 Protocol ============== -IMDv3 is a protocol for communicating molecular simulation data through a socket. -It allows for two-way communication: via IMDv3, a simulation engine sends data to a receiver +IMD (Interactive Molecular Dynamics) is a protocol for communicating molecular simulation +data through a socket. +It allows for two-way communication: via IMD, a simulation engine sends data to a receiver and the receiver can send forces and certain requests back to the simulation engine. +Version numbers in IMD are monotonically increasing integers. +IMDv3, the protocol version described in this document, builds upon IMDv2, the protocol version +implemented at the time of writing in NAMD, VMD, HOOMD, GROMACS, and LAMMPS. IMDv2 is itself +a modification of the original IMD protocol published in 2001 [IMDv1]_. + +.. list-table:: + :widths: 10 30 + :header-rows: 1 + + * - IMD Version + - Protocol specification + * - 1 + - [IMDv1]_ + * - 2 + - No official specification, but sample protocol and API implementation available [IMDv2]_. + Some characteristics of the IMDv2 protocol mentioned in this document are inferred from commonalities in existing IMDv2 implementations. + * - 3 + - This specification + Terms ----- **Simulation engine**: The producer of simulation data which listens for a receiver connection and sends it simulation data **Receiver**: The receiver of simulation data, connects to listening simulation engine and receives simulation data through a socket -**IMD frame**: A simulation frame in which IMD data will be sent from the simulation engine to the receiver +**IMD frame**: A simulation integration step in which IMD data will be sent from the simulation engine to the receiver. The IMD +transmission rate defines which integration steps are IMD frames. **IMD system**: The subset of all atoms in the simulation for which the simulation engine can output IMD data and for which the receiver can send forces to the simulation engine to apply to. @@ -61,43 +82,60 @@ This table lists all header types available in IMDv3. Below, each header type and its associated body packet (if present) is described in detail. .. list-table:: - :widths: 10 10 + :widths: 10 10 10 :header-rows: 1 * - Header type - 32-bit integer enum value + - Present in IMDv2 * - Disconnect - 0 + - Yes * - Energies - 1 + - Yes * - Coordinates - 2 + - Yes * - Go - 3 + - Yes * - Handshake - 4 + - Yes * - Kill - 5 + - Yes * - MD Communication - 6 + - Yes * - Pause - 7 + - Yes, but acts as toggle in IMDv2. See `Pause`_ * - Transmission rate - 8 + - Yes * - IO Error - 9 + - Yes * - Session info - 10 + - No * - Resume - 11 + - No * - Time - 12 + - No * - Box - 13 + - No * - Velocities - 14 + - No * - Forces - 15 + - No Disconnect ^^^^^^^^^^ @@ -242,7 +280,7 @@ Pause Sent from the receiver to the simulation engine any time after the `Session info`_ has been sent to request that the simulation -engine blocks execution until a `Resume`_ packet is sent. Unlike in IMDv2, pause is idempotent, +engine pauses execution of the simulation until a `Resume`_ packet is sent. Unlike in IMDv2, pause is idempotent, meaning sending a pause packet multiple times in a row will not toggle the state of the simulation. Instead, subsequent pause packets sent after the first one will have no effect. @@ -369,7 +407,7 @@ velocities were previously specified for this session in `Session info`_. (float32[n_atoms * 3]) X, Y, and Z components of the velocities of each atom in the IMD system encoded in the order - [X1, Y1, Z1, ..., Xn, Yn, Zn] + [Vx1, Vy1, Vz1, ..., Vxn, Vyn, Vzn] Forces ^^^^^^ @@ -387,15 +425,15 @@ forces were previously specified for this session in `Session info`_. (float32[n_atoms * 3]) X, Y, and Z components of the forces of each atom in the IMD system encoded in the order - [X1, Y1, Z1, ..., Xn, Yn, Zn] + [Fx1, Fy1, Fz1, ..., Fxn, Fyn, Fzn] Packet order ------------ After the simulation engine sends the `Handshake`_ and `Session info`_ -to the receiver and gets back a `Go`_ signal, it beings sending -IMD frames. The data within each IMD frame is always sent in the same, fixed order -in IMDv3: +to the receiver and gets back a `Go`_ signal, it begins sending simulation data via +IMD. The data within each IMD frame is always sent in the same, fixed order +in IMDv3. This is a break from IMDv2 in which any packet order is acceptable. 1. Time 2. Energy block @@ -411,7 +449,7 @@ remaining packets in the session. Units ----- -The units in IMDv3 are fixed. The simulation engine must convert +Like in IMDv2, the units in IMDv3 are fixed. The simulation engine must convert values into these units before sending them through the socket. The receiver must also convert forces it sends back to the simulation engine into these units. @@ -434,4 +472,8 @@ engine into these units. * - Energy - kilojoules/mol +References +---------- +.. [IMDv1] https://doi.org/10.1145/364338.364398 +.. [IMDv2] https://www.ks.uiuc.edu/Research/vmd/imd/ \ No newline at end of file From 7535d00d181e01131c00a87a3a83ed831a875dc0 Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Tue, 8 Oct 2024 20:51:05 -0700 Subject: [PATCH 03/24] revisions --- docs/source/protocol_v3.rst | 119 +++++++++++++++++++++++++----------- 1 file changed, 85 insertions(+), 34 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 76272ce..5943786 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -7,29 +7,33 @@ It allows for two-way communication: via IMD, a simulation engine sends data to and the receiver can send forces and certain requests back to the simulation engine. Version numbers in IMD are monotonically increasing integers. -IMDv3, the protocol version described in this document, builds upon IMDv2, the protocol version -implemented at the time of writing in NAMD, VMD, HOOMD, GROMACS, and LAMMPS. IMDv2 is itself +IMDv3, the protocol version described in this document, builds upon IMDv2, which is implemented +at the time of writing in NAMD, VMD, HOOMD, GROMACS, and LAMMPS. IMDv2 is itself a modification of the original IMD protocol published in 2001 [IMDv1]_. .. list-table:: :widths: 10 30 :header-rows: 1 - * - IMD Version + * - IMD version - Protocol specification * - 1 - [IMDv1]_ * - 2 - No official specification, but sample protocol and API implementation available [IMDv2]_. - Some characteristics of the IMDv2 protocol mentioned in this document are inferred from commonalities in existing IMDv2 implementations. * - 3 - - This specification + - This document + +.. note:: + + Because IMDv2 has no official specification, characteristics of the IMDv2 protocol mentioned in this document are inferred from + the sample implementation and commonalities in existing IMDv2 implementations. Terms ----- -**Simulation engine**: The producer of simulation data which listens for a receiver connection and sends it simulation data +**Simulation engine**: The producer of simulation data which listens for a receiver connection and sends it simulation data. -**Receiver**: The receiver of simulation data, connects to listening simulation engine and receives simulation data through a socket +**Receiver**: The receiver of simulation data, connects to listening simulation engine and receives simulation data through a socket. **IMD frame**: A simulation integration step in which IMD data will be sent from the simulation engine to the receiver. The IMD transmission rate defines which integration steps are IMD frames. @@ -37,6 +41,9 @@ transmission rate defines which integration steps are IMD frames. **IMD system**: The subset of all atoms in the simulation for which the simulation engine can output IMD data and for which the receiver can send forces to the simulation engine to apply to. +**IMD energy block**: A data structure specific to IMD which contains the simulation integration step and information +on the energy of the simulated system (not just the IMD system). + Example ------- @@ -88,55 +95,57 @@ and its associated body packet (if present) is described in detail. * - Header type - 32-bit integer enum value - Present in IMDv2 - * - Disconnect + * - :ref:`disconnect` - 0 - Yes - * - Energies + * - :ref:`energies` - 1 - Yes - * - Coordinates + * - :ref:`coordinates` - 2 - Yes - * - Go + * - :ref:`go` - 3 - Yes - * - Handshake + * - :ref:`handshake` - 4 - Yes - * - Kill + * - :ref:`kill` - 5 - Yes - * - MD Communication + * - :ref:`md-communication` - 6 - Yes - * - Pause + * - :ref:`pause` - 7 - - Yes, but acts as toggle in IMDv2. See `Pause`_ - * - Transmission rate + - Yes + * - :ref:`transmission-rate` - 8 - Yes - * - IO Error + * - :ref:`io-error` - 9 - Yes - * - Session info + * - :ref:`session-info` - 10 - No - * - Resume + * - :ref:`resume` - 11 - No - * - Time + * - :ref:`time` - 12 - No - * - Box + * - :ref:`box` - 13 - No - * - Velocities + * - :ref:`velocities` - 14 - No - * - Forces + * - :ref:`forces` - 15 - No +.. _disconnect: + Disconnect ^^^^^^^^^^ @@ -151,6 +160,7 @@ made is an implementation decision. 0 (int32) Disconnect (no type) Unused slot, any value acceptable +.. _energies: Energies ^^^^^^^^ @@ -160,7 +170,7 @@ energies were previously specified for this session in `Session info`_. .. note:: While the integration step is included in this - packet, this is a result of inheriting the IMD energy block from IMDv2. It is reccomended + packet, this is a result of inheriting the IMD energy block from IMDv2. It is recommended to make use of the 64-bit integer integration step value from the time packet in analysis code instead. @@ -182,6 +192,7 @@ energies were previously specified for this session in `Session info`_. (float32) Dihedrals energy (float32) Improper dihedrals energy +.. _coordinates: Coordinates ^^^^^^^^^^^ @@ -200,6 +211,8 @@ coordinates were previously specified for this session in `Session info`_. IMD system encoded in the order [X1, Y1, Z1, ..., Xn, Yn, Zn] +.. _go: + Go ^^ @@ -217,6 +230,7 @@ exits or accepts another connection after this is an implementation decision. 3 (int32) Go (no type) Unused slot, any value acceptable +.. _handshake: Handshake ^^^^^^^^^ @@ -237,6 +251,8 @@ by the simulation engine before being sent, speeding up execution. 4 (int32) Handshake 3 (int32, unswapped byte order) IMD version used in session +.. _kill: + Kill ^^^^ @@ -251,6 +267,8 @@ honors this request is an implementation decision. 5 (int32) Kill (no type) Unused slot, any value acceptable +.. _md-communication: + MD Communication ^^^^^^^^^^^^^^^^ @@ -263,26 +281,28 @@ Whether or not the simulation engine honors this request is an implementation de Header: 6 (int32) MD Communication - (int32) Number of forces to be applied to the simulation engine + (int32) Number of atoms in the IMD system to apply forces to Body: - (int32[n_forces]) Indices of atoms in the IMD system to apply forces to - (float32[n_forces * 3]) The X, Y, and Z components of forces to be applied to - the atoms at the indices specified in the above array + (int32[n_atoms]) Indices of atoms in the IMD system to apply forces to + (float32[n_atoms * 3]) The X, Y, and Z components of forces to be applied to + the atoms at the indices specified in the above array The array of IMD system indices does not need to be monotonically increasing, meaning -the indices can be "out of order". However, indices cannot be duplicated, i.e. -the index array cannot contain index "2" twice. In this case, the force vectors should +the indices can be "out of order". However, the index array cannot contain any index twice. +Force vectors acting on the same index should be combined before being sent to the simulation engine to be applied. +.. _pause: + Pause ^^^^^ Sent from the receiver to the simulation engine any time after the `Session info`_ has been sent to request that the simulation -engine pauses execution of the simulation until a `Resume`_ packet is sent. Unlike in IMDv2, pause is idempotent, -meaning sending a pause packet multiple times in a row will not toggle the -state of the simulation. Instead, subsequent pause packets sent after the first one will have no effect. +engine pauses execution of the simulation until a `Resume`_ packet is sent. +Pause is idempotent, meaning subsequent pause packets sent after the first one will have no effect. + .. code-block:: none @@ -290,6 +310,13 @@ state of the simulation. Instead, subsequent pause packets sent after the first 7 (int32) Pause (no type) Unused slot, any value acceptable +.. versionchanged:: 3 + + In IMDv2, pause acted as a toggle, meaning sending a pause packet twice + would pause and then resume the simulation's execution. In IMDv3, the `Resume`_ + packet is required to resume a paused simulation since pausing is idempotent. + +.. _transmission-rate: Transmission rate ^^^^^^^^^^^^^^^^^ @@ -307,6 +334,8 @@ be an IMD frame. the transmission rate to its default value (configured by the simulation engine) +.. _io-error: + IO Error ^^^^^^^^ @@ -319,6 +348,7 @@ by the simulation engine or receiver to indicate an error has occurred. 9 (int32) IO Error (no type) Unused slot, any value acceptable +.. _session-info: Session info ^^^^^^^^^^^^ @@ -344,6 +374,10 @@ whether coordinates will be wrapped into the simulation box if present. (int8) Nonzero if velocity packets sent in each IMD frame (int8) Nonzero if force packets sent in each IMD frame +.. versionadded:: 3 + +.. _resume: + Resume ^^^^^^ @@ -357,6 +391,10 @@ if it is in a paused state. Like `Pause`_, resume is idempotent. 11 (int32) Resume (no type) Unused slot, any value acceptable +.. versionadded:: 3 + +.. _time: + Time ^^^^ @@ -374,6 +412,10 @@ time packets were previously specified for this session in `Session info`_. (unsigned float64) Current time of the simulation (unsigned int64) Current integration step of the simulation +.. versionadded:: 3 + +.. _box: + Box ^^^ @@ -390,6 +432,9 @@ box packets were previously specified for this session in `Session info`_. in the order [ABC] where A = (aX,aY,aZ), B = (bX,bY,bZ), and C = (cX,cY,cZ) +.. versionadded:: 3 + +.. _velocities: Velocities ^^^^^^^^^^ @@ -409,6 +454,10 @@ velocities were previously specified for this session in `Session info`_. IMD system encoded in the order [Vx1, Vy1, Vz1, ..., Vxn, Vyn, Vzn] +.. versionadded:: 3 + +.. _forces: + Forces ^^^^^^ @@ -427,6 +476,8 @@ forces were previously specified for this session in `Session info`_. IMD system encoded in the order [Fx1, Fy1, Fz1, ..., Fxn, Fyn, Fzn] +.. versionadded:: 3 + Packet order ------------ From f1d943513663f2bab46cd68f24aed4f64d5cb8b3 Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Tue, 8 Oct 2024 20:58:12 -0700 Subject: [PATCH 04/24] moving text to versionchanged --- docs/source/protocol_v3.rst | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 5943786..291b98f 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -483,8 +483,7 @@ Packet order After the simulation engine sends the `Handshake`_ and `Session info`_ to the receiver and gets back a `Go`_ signal, it begins sending simulation data via -IMD. The data within each IMD frame is always sent in the same, fixed order -in IMDv3. This is a break from IMDv2 in which any packet order is acceptable. +IMD. The data within each IMD frame is always sent in the same, fixed order: 1. Time 2. Energy block @@ -497,10 +496,18 @@ If the simulation engine is configured to send only a strict subset of all available data packets, the fixed order of the list still applies to the remaining packets in the session. +.. versionchanged:: 3 + + In IMDv2, any packet order sent by the simulation engine is acceptable + and IMD frames in the same session don't have to contain the same data packets. + For example, an IMD frame in which only energies are sent can be followed by + an IMD frame in which only coordinates are sent. In IMDv3, all packets + specified in the session info must be sent in the same order and for every IMD frame. + Units ----- -Like in IMDv2, the units in IMDv3 are fixed. The simulation engine must convert +The units in IMDv3 are fixed. The simulation engine must convert values into these units before sending them through the socket. The receiver must also convert forces it sends back to the simulation engine into these units. From 5316e64796f5450d06a649f40f41e58f5302ee8e Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Tue, 8 Oct 2024 20:59:16 -0700 Subject: [PATCH 05/24] grammar --- docs/source/protocol_v3.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 291b98f..260743f 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -502,7 +502,7 @@ remaining packets in the session. and IMD frames in the same session don't have to contain the same data packets. For example, an IMD frame in which only energies are sent can be followed by an IMD frame in which only coordinates are sent. In IMDv3, all packets - specified in the session info must be sent in the same order and for every IMD frame. + specified in the session info must be sent for every IMD frame and in the same order. Units ----- From 1ba3b9390d456375502e25483d9c69989bd2323c Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Tue, 8 Oct 2024 21:02:31 -0700 Subject: [PATCH 06/24] moving definitions --- docs/source/protocol_v3.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 260743f..d6c4dae 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -38,6 +38,9 @@ Terms **IMD frame**: A simulation integration step in which IMD data will be sent from the simulation engine to the receiver. The IMD transmission rate defines which integration steps are IMD frames. +**IMD transmission rate**: The number of integration steps in between each IMD frame. For example, +if the transmission rate is 1, every integration step in the simulation will be an IMD frame. + **IMD system**: The subset of all atoms in the simulation for which the simulation engine can output IMD data and for which the receiver can send forces to the simulation engine to apply to. @@ -322,9 +325,7 @@ Transmission rate ^^^^^^^^^^^^^^^^^ Sent from the receiver to the simulation engine any time after the `Session info`_ -has been sent to change the number of steps between each IMD frame. For example, -if the transmission rate is 1, every integration step in the simulation will -be an IMD frame. +has been sent to change the IMD transmission rate. .. code-block:: none From ca1a90709357918c19d4566a43428d7af193febe Mon Sep 17 00:00:00 2001 From: ljwoods2 Date: Wed, 9 Oct 2024 12:54:25 -0700 Subject: [PATCH 07/24] revisions from feedback --- docs/source/index.rst | 3 +-- docs/source/protocol_v3.rst | 14 +++++++++++--- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index e17cb28..05d21f1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -12,5 +12,4 @@ Welcome to IMDClient's documentation! getting_started api - protocol_current - protocol_proposed \ No newline at end of file + protocol_v3 \ No newline at end of file diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index d6c4dae..5baca74 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -296,6 +296,14 @@ the indices can be "out of order". However, the index array cannot contain any i Force vectors acting on the same index should be combined before being sent to the simulation engine to be applied. +.. note:: + + Though this packet is sent by the reciever, the rule that all body packets are + sent in the native endianness of the machine running the simulation engine + still applies here. The receiver must use the endianness it gets from + the :ref:`handshake` and swap the endianness of the indices and forces + if necessary before sending. + .. _pause: Pause @@ -409,9 +417,9 @@ time packets were previously specified for this session in `Session info`_. 1 (int32) Number of time packets being sent Body: - (unsigned float64) dt for the simulation - (unsigned float64) Current time of the simulation - (unsigned int64) Current integration step of the simulation + (float64) dt for the simulation + (float64) Current time of the simulation + (int64) Current integration step of the simulation .. versionadded:: 3 From 398898952d90320e1fed274ab7b85f0ad2d1075b Mon Sep 17 00:00:00 2001 From: Lawson Woods Date: Wed, 9 Oct 2024 13:37:28 -0700 Subject: [PATCH 08/24] readthedocs, readme, bibtex --- README.md | 42 ++++++++++++++++++------------------- docs/requirements.yaml | 1 + docs/source/conf.py | 4 ++++ docs/source/protocol_v3.rst | 13 ++++++------ readthedocs.yaml | 16 ++++++++++++++ 5 files changed, 48 insertions(+), 28 deletions(-) create mode 100644 readthedocs.yaml diff --git a/README.md b/README.md index 23611e7..48e49be 100644 --- a/README.md +++ b/README.md @@ -1,33 +1,33 @@ IMDClient ============================== - Build this package from source: @@ -87,8 +87,8 @@ pip install ".[test,doc]" ### Copyright -The IMDReader source code is hosted at https://github.com/becksteinlab/imdreader -and is available under the GNU General Public License, version 2 (see the file [LICENSE](https://github.com/becksteinlab/imdreader/blob/main/LICENSE)). +The IMDClient source code is hosted at https://github.com/becksteinlab/imdclient +and is available under the MIT license (see the file [LICENSE](https://github.com/becksteinlab/imdclient/blob/main/LICENSE)). Copyright (c) 2024, Lawson @@ -97,4 +97,4 @@ Copyright (c) 2024, Lawson Project based on the [MDAnalysis Cookiecutter](https://github.com/MDAnalysis/cookiecutter-mda) version 0.1. -Please cite [MDAnalysis](https://github.com/MDAnalysis/mdanalysis#citation) when using IMDReader in published work. --> + --> diff --git a/docs/requirements.yaml b/docs/requirements.yaml index 88acb21..3e94825 100644 --- a/docs/requirements.yaml +++ b/docs/requirements.yaml @@ -9,5 +9,6 @@ dependencies: - pip - mdanalysis-sphinx-theme >=1.0.1 + - sphinxcontrib-bibtex # Pip-only installs #- pip: diff --git a/docs/source/conf.py b/docs/source/conf.py index 20a3cb0..248f4e1 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -54,6 +54,7 @@ "sphinx.ext.intersphinx", "sphinx.ext.extlinks", "mdanalysis_sphinx_theme", + "sphinxcontrib.bibtex", ] autosummary_generate = True @@ -195,3 +196,6 @@ "python": ("https://docs.python.org/3/", None), "mdanalysis": ("https://docs.mdanalysis.org/stable/", None), } + +bibtex_bibfiles = ["refs.bib"] +bibtex_default_style = "unsrt" diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index 5baca74..d0a5b9b 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -8,8 +8,8 @@ and the receiver can send forces and certain requests back to the simulation eng Version numbers in IMD are monotonically increasing integers. IMDv3, the protocol version described in this document, builds upon IMDv2, which is implemented -at the time of writing in NAMD, VMD, HOOMD, GROMACS, and LAMMPS. IMDv2 is itself -a modification of the original IMD protocol published in 2001 [IMDv1]_. +at the time of writing in NAMD, VMD, GROMACS, and LAMMPS. IMDv2 is itself +a modification of the original IMD protocol published in 2001 :cite:p:`IMDv1:2001`. .. list-table:: :widths: 10 30 @@ -18,9 +18,9 @@ a modification of the original IMD protocol published in 2001 [IMDv1]_. * - IMD version - Protocol specification * - 1 - - [IMDv1]_ + - *A system for interactive molecular dynamics simulation* :cite:p:`IMDv1:2001` * - 2 - - No official specification, but sample protocol and API implementation available [IMDv2]_. + - No official specification, but sample protocol and API implementation available :cite:p:`IMDv2` * - 3 - This document @@ -298,7 +298,7 @@ be combined before being sent to the simulation engine to be applied. .. note:: - Though this packet is sent by the reciever, the rule that all body packets are + Though this packet is sent by the receiver, the rule that all body packets are sent in the native endianness of the machine running the simulation engine still applies here. The receiver must use the endianness it gets from the :ref:`handshake` and swap the endianness of the indices and forces @@ -542,5 +542,4 @@ engine into these units. References ---------- -.. [IMDv1] https://doi.org/10.1145/364338.364398 -.. [IMDv2] https://www.ks.uiuc.edu/Research/vmd/imd/ \ No newline at end of file +.. bibliography:: diff --git a/readthedocs.yaml b/readthedocs.yaml new file mode 100644 index 0000000..8e57773 --- /dev/null +++ b/readthedocs.yaml @@ -0,0 +1,16 @@ +# readthedocs.yaml + +version: 2 + +build: + os: ubuntu-22.04 + tools: + python: "mambaforge-4.10" + +python: + install: + - method: pip + path: . + +conda: + environment: docs/requirements.yaml \ No newline at end of file From 5f246f524b24ef97cc33dc695aa975fe8c065f20 Mon Sep 17 00:00:00 2001 From: Lawson Woods Date: Wed, 9 Oct 2024 13:41:08 -0700 Subject: [PATCH 09/24] typo --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 48e49be..6da6c5e 100644 --- a/README.md +++ b/README.md @@ -97,4 +97,4 @@ Copyright (c) 2024, Lawson Project based on the [MDAnalysis Cookiecutter](https://github.com/MDAnalysis/cookiecutter-mda) version 0.1. - --> + From a809ab495786d05de5ffa48595ea45deaa7f456b Mon Sep 17 00:00:00 2001 From: Lawson Woods Date: Fri, 11 Oct 2024 10:01:05 -0700 Subject: [PATCH 10/24] add citations --- docs/source/refs.bib | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 docs/source/refs.bib diff --git a/docs/source/refs.bib b/docs/source/refs.bib new file mode 100644 index 0000000..b27fdd5 --- /dev/null +++ b/docs/source/refs.bib @@ -0,0 +1,22 @@ + +@inproceedings{IMDv1:2001, +author = {Stone, John E. and Gullingsrud, Justin and Schulten, Klaus}, +title = {A system for interactive molecular dynamics simulation}, +year = {2001}, +isbn = {1581132921}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +url = {https://doi.org/10.1145/364338.364398}, +doi = {10.1145/364338.364398}, +booktitle = {Proceedings of the 2001 Symposium on Interactive 3D Graphics}, +pages = {191–194}, +numpages = {4}, +keywords = {steered molecular dynamics, molecular dynamics, haptic feedback}, +series = {I3D '01} +} + +@misc{IMDv2, +tite = {Interactive Molecular Dynamics Simulation}, +howpublished = {\url{https://www.ks.uiuc.edu/Research/vmd/imd/}}, +note = {Accessed: 2024-10-09} +} \ No newline at end of file From 17d4707ac338f763f5182c67cc203f616eeb32db Mon Sep 17 00:00:00 2001 From: Lawson Woods Date: Fri, 11 Oct 2024 13:13:01 -0700 Subject: [PATCH 11/24] added stackable, streamable analysis --- imdclient/__init__.py | 4 + imdclient/backends.py | 352 ++++++++ imdclient/results.py | 332 +++++++ imdclient/streamanalysis.py | 1056 +++++++++++++++++++++++ imdclient/streambase.py | 35 +- imdclient/tests/test_imdreader.py | 9 + imdclient/tests/test_stream_analysis.py | 78 ++ 7 files changed, 1854 insertions(+), 12 deletions(-) create mode 100644 imdclient/backends.py create mode 100644 imdclient/results.py create mode 100644 imdclient/streamanalysis.py create mode 100644 imdclient/tests/test_stream_analysis.py diff --git a/imdclient/__init__.py b/imdclient/__init__.py index 99c7f0e..0210eb2 100644 --- a/imdclient/__init__.py +++ b/imdclient/__init__.py @@ -6,5 +6,9 @@ from .IMDREADER import IMDReader from importlib.metadata import version +from .streamanalysis import AnalysisBase, StackableAnalysis +from MDAnalysis.analysis import base + +base.AnalysisBase = AnalysisBase __version__ = version("imdclient") diff --git a/imdclient/backends.py b/imdclient/backends.py new file mode 100644 index 0000000..cf9e241 --- /dev/null +++ b/imdclient/backends.py @@ -0,0 +1,352 @@ +# Copy of backends from MDA 2.8.0 +"""Analysis backends --- :mod:`MDAnalysis.analysis.backends` +============================================================ + +.. versionadded:: 2.8.0 + + +The :mod:`backends` module provides :class:`BackendBase` base class to +implement custom execution backends for +:meth:`MDAnalysis.analysis.base.AnalysisBase.run` and its +subclasses. + +.. SeeAlso:: :ref:`parallel-analysis` + +.. _backends: + +Backends +-------- + +Three built-in backend classes are provided: + +* *serial*: :class:`BackendSerial`, that is equivalent to using no + parallelization and is the default + +* *multiprocessing*: :class:`BackendMultiprocessing` that supports + parallelization via standard Python :mod:`multiprocessing` module + and uses default :mod:`pickle` serialization + +* *dask*: :class:`BackendDask`, that uses the same process-based + parallelization as :class:`BackendMultiprocessing`, but different + serialization algorithm via `dask `_ (see `dask + serialization algorithms + `_ for details) + +Classes +------- + +""" +import warnings +from typing import Callable +import importlib.util + + +def is_installed(modulename: str): + """Checks if module is installed + + Parameters + ---------- + modulename : str + name of the module to be tested + + + .. versionadded:: 2.8.0 + """ + return importlib.util.find_spec(modulename) is not None + + +class BackendBase: + """Base class for backend implementation. + + Initializes an instance and performs checks for its validity, such as + ``n_workers`` and possibly other ones. + + Parameters + ---------- + n_workers : int + number of workers (usually, processes) over which the work is split + + Examples + -------- + .. code-block:: python + + from MDAnalysis.analysis.backends import BackendBase + + class ThreadsBackend(BackendBase): + def apply(self, func, computations): + from multiprocessing.dummy import Pool + + with Pool(processes=self.n_workers) as pool: + results = pool.map(func, computations) + return results + + import MDAnalysis as mda + from MDAnalysis.tests.datafiles import PSF, DCD + from MDAnalysis.analysis.rms import RMSD + + u = mda.Universe(PSF, DCD) + ref = mda.Universe(PSF, DCD) + + R = RMSD(u, ref) + + n_workers = 2 + backend = ThreadsBackend(n_workers=n_workers) + R.run(backend=backend, unsupported_backend=True) + + .. warning:: + Using `ThreadsBackend` above will lead to erroneous results, since it + is an educational example. Do not use it for real analysis. + + + .. versionadded:: 2.8.0 + + """ + + def __init__(self, n_workers: int): + self.n_workers = n_workers + self._validate() + + def _get_checks(self): + """Get dictionary with ``condition: error_message`` pairs that ensure the + validity of the backend instance + + Returns + ------- + dict + dictionary with ``condition: error_message`` pairs that will get + checked during ``_validate()`` run + """ + return { + isinstance(self.n_workers, int) + and self.n_workers + > 0: f"n_workers should be positive integer, got {self.n_workers=}", + } + + def _get_warnings(self): + """Get dictionary with ``condition: warning_message`` pairs that ensure + the good usage of the backend instance + + Returns + ------- + dict + dictionary with ``condition: warning_message`` pairs that will get + checked during ``_validate()`` run + """ + return dict() + + def _validate(self): + """Check correctness (e.g. ``dask`` is installed if using ``backend='dask'``) + and good usage (e.g. ``n_workers=1`` if backend is serial) of the backend + + Raises + ------ + ValueError + if one of the conditions in :meth:`_get_checks` is ``True`` + """ + for check, msg in self._get_checks().items(): + if not check: + raise ValueError(msg) + for check, msg in self._get_warnings().items(): + if not check: + warnings.warn(msg) + + def apply(self, func: Callable, computations: list) -> list: + """map function `func` to all tasks in the `computations` list + + Main method that will get called when using an instance of + ``BackendBase``. It is equivalent to running ``[func(item) for item in + computations]`` while using the parallel backend capabilities. + + Parameters + ---------- + func : Callable + function to be called on each of the tasks in computations list + computations : list + computation tasks to apply function to + + Returns + ------- + list + list of results of the function + + """ + raise NotImplementedError + + +class BackendSerial(BackendBase): + """A built-in backend that does serial execution of the function, without any + parallelization. + + Parameters + ---------- + n_workers : int + Is ignored in this class, and if ``n_workers`` > 1, a warning will be + given. + + + .. versionadded:: 2.8.0 + """ + + def _get_warnings(self): + """Get dictionary with ``condition: warning_message`` pairs that ensure + the good usage of the backend instance. Here, it checks if the number + of workers is not 1, otherwise gives warning. + + Returns + ------- + dict + dictionary with ``condition: warning_message`` pairs that will get + checked during ``_validate()`` run + """ + return { + self.n_workers + == 1: "n_workers is ignored when executing with backend='serial'" + } + + def apply(self, func: Callable, computations: list) -> list: + """ + Serially applies `func` to each task object in ``computations``. + + Parameters + ---------- + func : Callable + function to be called on each of the tasks in computations list + computations : list + computation tasks to apply function to + + Returns + ------- + list + list of results of the function + """ + return [func(task) for task in computations] + + +class BackendMultiprocessing(BackendBase): + """A built-in backend that executes a given function using the + :meth:`multiprocessing.Pool.map ` method. + + Parameters + ---------- + n_workers : int + number of processes in :class:`multiprocessing.Pool + ` to distribute the workload + between. Must be a positive integer. + + Examples + -------- + + .. code-block:: python + + from MDAnalysis.analysis.backends import BackendMultiprocessing + import multiprocessing as mp + + backend_obj = BackendMultiprocessing(n_workers=mp.cpu_count()) + + + .. versionadded:: 2.8.0 + + """ + + def apply(self, func: Callable, computations: list) -> list: + """Applies `func` to each object in ``computations`` using `multiprocessing`'s `Pool.map`. + + Parameters + ---------- + func : Callable + function to be called on each of the tasks in computations list + computations : list + computation tasks to apply function to + + Returns + ------- + list + list of results of the function + """ + from multiprocessing import Pool + + with Pool(processes=self.n_workers) as pool: + results = pool.map(func, computations) + return results + + +class BackendDask(BackendBase): + """A built-in backend that executes a given function with *dask*. + + Execution is performed with the :func:`dask.compute` function of + :class:`dask.delayed.Delayed` object (created with + :func:`dask.delayed.delayed`) with ``scheduler='processes'`` and + ``chunksize=1`` (this ensures uniform distribution of tasks among + processes). Requires the `dask package `_ + to be `installed `_. + + Parameters + ---------- + n_workers : int + number of processes in to distribute the workload + between. Must be a positive integer. Workers are actually + :class:`multiprocessing.pool.Pool` processes, but they use a different and + more flexible `serialization protocol + `_. + + Examples + -------- + + .. code-block:: python + + from MDAnalysis.analysis.backends import BackendDask + import multiprocessing as mp + + backend_obj = BackendDask(n_workers=mp.cpu_count()) + + + .. versionadded:: 2.8.0 + + """ + + def apply(self, func: Callable, computations: list) -> list: + """Applies `func` to each object in ``computations``. + + Parameters + ---------- + func : Callable + function to be called on each of the tasks in computations list + computations : list + computation tasks to apply function to + + Returns + ------- + list + list of results of the function + """ + from dask.delayed import delayed + import dask + + computations = [delayed(func)(task) for task in computations] + results = dask.compute( + computations, + scheduler="processes", + chunksize=1, + num_workers=self.n_workers, + )[0] + return results + + def _get_checks(self): + """Get dictionary with ``condition: error_message`` pairs that ensure the + validity of the backend instance. Here checks if ``dask`` module is + installed in the environment. + + Returns + ------- + dict + dictionary with ``condition: error_message`` pairs that will get + checked during ``_validate()`` run + """ + base_checks = super()._get_checks() + checks = { + is_installed("dask"): ( + "module 'dask' is missing. Please install 'dask': " + "https://docs.dask.org/en/stable/install.html" + ) + } + return base_checks | checks diff --git a/imdclient/results.py b/imdclient/results.py new file mode 100644 index 0000000..949bc4e --- /dev/null +++ b/imdclient/results.py @@ -0,0 +1,332 @@ +# Copy of MDAnalysis.analysis.results from 2.8.0 + +"""Analysis results and their aggregation --- :mod:`MDAnalysis.analysis.results` +================================================================================ + +Module introduces two classes, :class:`Results` and :class:`ResultsGroup`, +used for storing and aggregating data in +:meth:`MDAnalysis.analysis.base.AnalysisBase.run()`, respectively. + + +Classes +------- + +The :class:`Results` class is an extension of a built-in dictionary +type, that holds all assigned attributes in :attr:`self.data` and +allows for access either via dict-like syntax, or via class-like syntax: + +.. code-block:: python + + from MDAnalysis.analysis.results import Results + r = Results() + r.array = [1, 2, 3, 4] + assert r['array'] == r.array == [1, 2, 3, 4] + + +The :class:`ResultsGroup` can merge multiple :class:`Results` objects. +It is mainly used by :class:`MDAnalysis.analysis.base.AnalysisBase` class, +that uses :meth:`ResultsGroup.merge()` method to aggregate results from +multiple workers, initialized during a parallel run: + +.. code-block:: python + + from MDAnalysis.analysis.results import Results, ResultsGroup + import numpy as np + + r1, r2 = Results(), Results() + r1.masses = [1, 2, 3, 4, 5] + r2.masses = [0, 0, 0, 0] + r1.vectors = np.arange(10).reshape(5, 2) + r2.vectors = np.arange(8).reshape(4, 2) + + group = ResultsGroup( + lookup = { + 'masses': ResultsGroup.flatten_sequence, + 'vectors': ResultsGroup.ndarray_vstack + } + ) + + r = group.merge([r1, r2]) + assert r.masses == list((*r1.masses, *r2.masses)) + assert (r.vectors == np.vstack([r1.vectors, r2.vectors])).all() +""" + +from collections import UserDict +import numpy as np +from typing import Callable, Sequence + + +class Results(UserDict): + r"""Container object for storing results. + + :class:`Results` are dictionaries that provide two ways by which values + can be accessed: by dictionary key ``results["value_key"]`` or by object + attribute, ``results.value_key``. :class:`Results` stores all results + obtained from an analysis after calling :meth:`~AnalysisBase.run()`. + + The implementation is similar to the :class:`sklearn.utils.Bunch` + class in `scikit-learn`_. + + .. _`scikit-learn`: https://scikit-learn.org/ + .. _`sklearn.utils.Bunch`: https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html + + Raises + ------ + AttributeError + If an assigned attribute has the same name as a default attribute. + + ValueError + If a key is not of type ``str`` and therefore is not able to be + accessed by attribute. + + Examples + -------- + >>> from MDAnalysis.analysis.base import Results + >>> results = Results(a=1, b=2) + >>> results['b'] + 2 + >>> results.b + 2 + >>> results.a = 3 + >>> results['a'] + 3 + >>> results.c = [1, 2, 3, 4] + >>> results['c'] + [1, 2, 3, 4] + + + .. versionadded:: 2.0.0 + + .. versionchanged:: 2.8.0 + Moved :class:`Results` to :mod:`MDAnalysis.analysis.results` + """ + + def _validate_key(self, key): + if key in dir(self): + raise AttributeError( + f"'{key}' is a protected dictionary attribute" + ) + elif isinstance(key, str) and not key.isidentifier(): + raise ValueError(f"'{key}' is not a valid attribute") + + def __init__(self, *args, **kwargs): + kwargs = dict(*args, **kwargs) + if "data" in kwargs.keys(): + raise AttributeError(f"'data' is a protected dictionary attribute") + self.__dict__["data"] = {} + self.update(kwargs) + + def __setitem__(self, key, item): + self._validate_key(key) + super().__setitem__(key, item) + + def __setattr__(self, attr, val): + if attr == "data": + super().__setattr__(attr, val) + else: + self.__setitem__(attr, val) + + def __getattr__(self, attr): + try: + return self[attr] + except KeyError as err: + raise AttributeError( + f"'Results' object has no attribute '{attr}'" + ) from err + + def __delattr__(self, attr): + try: + del self[attr] + except KeyError as err: + raise AttributeError( + f"'Results' object has no attribute '{attr}'" + ) from err + + def __getstate__(self): + return self.data + + def __setstate__(self, state): + self.data = state + + +class ResultsGroup: + """ + Management and aggregation of results stored in :class:`Results` instances. + + A :class:`ResultsGroup` is an optional description for :class:`Result` "dictionaries" + that are used in analysis classes based on :class:`AnalysisBase`. For each *key* in a + :class:`Result` it describes how multiple pieces of the data held under the key are + to be aggregated. This approach is necessary when parts of a trajectory are analyzed + independently (e.g., in parallel) and then need to me merged (with :meth:`merge`) to + obtain a complete data set. + + Parameters + ---------- + lookup : dict[str, Callable], optional + aggregation functions lookup dict, by default None + + Examples + -------- + + .. code-block:: python + + from MDAnalysis.analysis.results import ResultsGroup, Results + group = ResultsGroup(lookup={'mass': ResultsGroup.float_mean}) + obj1 = Results(mass=1) + obj2 = Results(mass=3) + assert {'mass': 2.0} == group.merge([obj1, obj2]) + + + .. code-block:: python + + # you can also set `None` for those attributes that you want to skip + lookup = {'mass': ResultsGroup.float_mean, 'trajectory': None} + group = ResultsGroup(lookup) + objects = [Results(mass=1, skip=None), Results(mass=3, skip=object)] + assert group.merge(objects, require_all_aggregators=False) == {'mass': 2.0} + + .. versionadded:: 2.8.0 + """ + + def __init__(self, lookup: dict[str, Callable] = None): + self._lookup = lookup + + def merge( + self, objects: Sequence[Results], require_all_aggregators: bool = True + ) -> Results: + """Merge multiple Results into a single Results instance. + + Merge multiple :class:`Results` instances into a single one, using the + `lookup` dictionary to determine the appropriate aggregator functions for + each named results attribute. If the resulting object only contains a single + element, it just returns it without using any aggregators. + + Parameters + ---------- + objects : Sequence[Results] + Multiple :class:`Results` instances with the same data attributes. + require_all_aggregators : bool, optional + if True, raise an exception when no aggregation function for a + particular argument is found. Allows to skip aggregation for the + parameters that aren't needed in the final object -- + see :class:`ResultsGroup`. + + Returns + ------- + Results + merged :class:`Results` + + Raises + ------ + ValueError + if no aggregation function for a key is found and ``require_all_aggregators=True`` + """ + if len(objects) == 1: + merged_results = objects[0] + return merged_results + + merged_results = Results() + for key in objects[0].keys(): + agg_function = self._lookup.get(key, None) + if agg_function is not None: + results_of_t = [obj[key] for obj in objects] + merged_results[key] = agg_function(results_of_t) + elif require_all_aggregators: + raise ValueError(f"No aggregation function for {key=}") + return merged_results + + @staticmethod + def flatten_sequence(arrs: list[list]): + """Flatten a list of lists into a list + + Parameters + ---------- + arrs : list[list] + list of lists + + Returns + ------- + list + flattened list + """ + return [item for sublist in arrs for item in sublist] + + @staticmethod + def ndarray_sum(arrs: list[np.ndarray]): + """sums an ndarray along ``axis=0`` + + Parameters + ---------- + arrs : list[np.ndarray] + list of input arrays. Must have the same shape. + + Returns + ------- + np.ndarray + sum of input arrays + """ + return np.array(arrs).sum(axis=0) + + @staticmethod + def ndarray_mean(arrs: list[np.ndarray]): + """calculates mean of input ndarrays along ``axis=0`` + + Parameters + ---------- + arrs : list[np.ndarray] + list of input arrays. Must have the same shape. + + Returns + ------- + np.ndarray + mean of input arrays + """ + return np.array(arrs).mean(axis=0) + + @staticmethod + def float_mean(floats: list[float]): + """calculates mean of input float values + + Parameters + ---------- + floats : list[float] + list of float values + + Returns + ------- + float + mean value + """ + return np.array(floats).mean() + + @staticmethod + def ndarray_hstack(arrs: list[np.ndarray]): + """Performs horizontal stack of input arrays + + Parameters + ---------- + arrs : list[np.ndarray] + input numpy arrays + + Returns + ------- + np.ndarray + result of stacking + """ + return np.hstack(arrs) + + @staticmethod + def ndarray_vstack(arrs: list[np.ndarray]): + """Performs vertical stack of input arrays + + Parameters + ---------- + arrs : list[np.ndarray] + input numpy arrays + + Returns + ------- + np.ndarray + result of stacking + """ + return np.vstack(arrs) diff --git a/imdclient/streamanalysis.py b/imdclient/streamanalysis.py new file mode 100644 index 0000000..281e733 --- /dev/null +++ b/imdclient/streamanalysis.py @@ -0,0 +1,1056 @@ +# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- +# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 +# +# MDAnalysis --- https://www.mdanalysis.org +# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors +# (see the file AUTHORS for the full list of names) +# +# Released under the GNU Public Licence, v2 or any higher version +# +# Please cite your use of MDAnalysis in published work: +# +# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, +# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. +# MDAnalysis: A Python package for the rapid analysis of molecular dynamics +# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th +# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy. +# doi: 10.25080/majora-629e541a-00e +# +# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. +# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. +# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787 +# +"""Analysis building blocks --- :mod:`MDAnalysis.analysis.base` +============================================================ + +MDAnalysis provides building blocks for creating analysis classes. One can +think of each analysis class as a "tool" that performs a specific analysis over +the trajectory frames and stores the results in the tool. + +Analysis classes are derived from :class:`AnalysisBase` by subclassing. This +inheritance provides a common workflow and API for users and makes many +additional features automatically available (such as frame selections and a +verbose progressbar). The important points for analysis classes are: + +#. Analysis tools are Python classes derived from :class:`AnalysisBase`. +#. When instantiating an analysis, the :class:`Universe` or :class:`AtomGroup` + that the analysis operates on is provided together with any other parameters + that are kept fixed for the specific analysis. +#. The analysis is performed with :meth:`~AnalysisBase.run` method. It has a + common set of arguments such as being able to select the frames the analysis + is performed on. The `verbose` keyword argument enables additional output. A + progressbar is shown by default that also shows an estimate for the + remaining time until the end of the analysis. +#. Results are always stored in the attribute :attr:`AnalysisBase.results`, + which is an instance of :class:`Results`, a kind of dictionary that allows + allows item access via attributes. Each analysis class decides what and how + to store in :class:`Results` and needs to document it. For time series, the + :attr:`AnalysisBase.times` contains the time stamps of the analyzed frames. + + +Example of using a standard analysis tool +----------------------------------------- + +For example, the :class:`MDAnalysis.analysis.rms.RMSD` performs a +root-mean-square distance analysis in the following way: + +.. code-block:: python + + import MDAnalysis as mda + from MDAnalysisTests.datafiles import TPR, XTC + + from MDAnalysis.analysis import rms + + u = mda.Universe(TPR, XTC) + + # (2) instantiate analysis + rmsd = rms.RMSD(u, select='name CA') + + # (3) the run() method can select frames in different ways + # run on all frames (with progressbar) + rmsd.run(verbose=True) + + # or start, stop, and step can be used + rmsd.run(start=2, stop=8, step=2) + + # a list of frames to run the analysis on can be passed + rmsd.run(frames=[0,2,3,6,9]) + + # a list of booleans the same length of the trajectory can be used + rmsd.run(frames=[True, False, True, True, False, False, True, False, + False, True]) + + # (4) analyze the results, e.g., plot + t = rmsd.times + y = rmsd.results.rmsd[:, 2] # RMSD at column index 2, see docs + + import matplotlib.pyplot as plt + plt.plot(t, y) + plt.xlabel("time (ps)") + plt.ylabel("RMSD (Å)") + + +Writing new analysis tools +-------------------------- + +In order to write new analysis tools, derive a class from :class:`AnalysisBase` +and define at least the :meth:`_single_frame` method, as described in +:class:`AnalysisBase`. + +.. SeeAlso:: + + The chapter `Writing your own trajectory analysis`_ in the *User Guide* + contains a step-by-step example for writing analysis tools with + :class:`AnalysisBase`. + +.. _`Writing your own trajectory analysis`: + https://userguide.mdanalysis.org/stable/examples/analysis/custom_trajectory_analysis.html + + +If your analysis is operating independently on each frame, you might consider +making it **parallelizable** via adding a :meth:`get_supported_backends` method, +and appropriate aggregation function for each of its results. For example, if +you have your :meth:`_single_frame` method storing important values under +:attr:`self.results.timeseries`, you will write: + +.. code-block:: python + + class MyAnalysis(AnalysisBase): + _analysis_algorithm_is_parallelizable = True + + @classmethod + def get_supported_backends(cls): + return ('serial', 'multiprocessing', 'dask',) + + + def _get_aggregator(self): + return ResultsGroup(lookup={'timeseries': ResultsGroup.ndarray_vstack}) + +See :mod:`MDAnalysis.analysis.results` for more on aggregating results. + +.. SeeAlso:: + + :ref:`parallel-analysis` + + + +Classes +------- + +The :class:`MDAnalysis.results.Results` and :class:`AnalysisBase` classes +are the essential building blocks for almost all MDAnalysis tools in the +:mod:`MDAnalysis.analysis` module. They aim to be easily useable and +extendable. + +:class:`AnalysisFromFunction` and the :func:`analysis_class` functions are +simple wrappers that make it even easier to create fully-featured analysis +tools if only the single-frame analysis function needs to be written. + +""" +import inspect +import itertools +import logging +import warnings +from functools import partial +from typing import Iterable, Union + +import numpy as np +from MDAnalysis import coordinates +from MDAnalysis.core.groups import AtomGroup +from MDAnalysis.lib.log import ProgressBar + +from .backends import ( + BackendDask, + BackendMultiprocessing, + BackendSerial, + BackendBase, +) +from .results import Results, ResultsGroup + +from .streambase import StreamReaderBase + +logger = logging.getLogger(__name__) + + +class AnalysisBase(object): + r"""Base class for defining multi-frame analysis + + The class is designed as a template for creating multi-frame analyses. + This class will automatically take care of setting up the trajectory + reader for iterating, and it offers to show a progress meter. + Computed results are stored inside the :attr:`results` attribute. + + To define a new Analysis, :class:`AnalysisBase` needs to be subclassed + and :meth:`_single_frame` must be defined. It is also possible to define + :meth:`_prepare` and :meth:`_conclude` for pre- and post-processing. + All results should be stored as attributes of the + :class:`MDAnalysis.analysis.results.Results` container. + + Parameters + ---------- + trajectory : MDAnalysis.coordinates.base.ReaderBase + A trajectory Reader + verbose : bool, optional + Turn on more logging and debugging + + Attributes + ---------- + times: numpy.ndarray + array of Timestep times. Only exists after calling + :meth:`AnalysisBase.run` + frames: numpy.ndarray + array of Timestep frame indices. Only exists after calling + :meth:`AnalysisBase.run` + results: :class:`Results` + results of calculation are stored after call + to :meth:`AnalysisBase.run` + + + Example + ------- + .. code-block:: python + + from MDAnalysis.analysis.base import AnalysisBase + + class NewAnalysis(AnalysisBase): + def __init__(self, atomgroup, parameter, **kwargs): + super(NewAnalysis, self).__init__(atomgroup.universe.trajectory, + **kwargs) + self._parameter = parameter + self._ag = atomgroup + + def _prepare(self): + # OPTIONAL + # Called before iteration on the trajectory has begun. + # Data structures can be set up at this time + self.results.example_result = [] + + def _single_frame(self): + # REQUIRED + # Called after the trajectory is moved onto each new frame. + # store an example_result of `some_function` for a single frame + self.results.example_result.append(some_function(self._ag, + self._parameter)) + + def _conclude(self): + # OPTIONAL + # Called once iteration on the trajectory is finished. + # Apply normalisation and averaging to results here. + self.results.example_result = np.asarray(self.example_result) + self.results.example_result /= np.sum(self.result) + + Afterwards the new analysis can be run like this + + .. code-block:: python + + import MDAnalysis as mda + from MDAnalysisTests.datafiles import PSF, DCD + + u = mda.Universe(PSF, DCD) + + na = NewAnalysis(u.select_atoms('name CA'), 35) + na.run(start=10, stop=20) + print(na.results.example_result) + # results can also be accessed by key + print(na.results["example_result"]) + + + .. versionchanged:: 1.0.0 + Support for setting `start`, `stop`, and `step` has been removed. These + should now be directly passed to :meth:`AnalysisBase.run`. + + .. versionchanged:: 2.0.0 + Added :attr:`results` + + .. versionchanged:: 2.8.0 + Added ability to run analysis in parallel using either a + built-in backend (`multiprocessing` or `dask`) or a custom + `backends.BackendBase` instance with an implemented `apply` method + that is used to run the computations. + """ + + @classmethod + def get_supported_backends(cls): + """Tuple with backends supported by the core library for a given class. + User can pass either one of these values as ``backend=...`` to + :meth:`run()` method, or a custom object that has ``apply`` method + (see documentation for :meth:`run()`): + + - 'serial': no parallelization + - 'multiprocessing': parallelization using `multiprocessing.Pool` + - 'dask': parallelization using `dask.delayed.compute()`. Requires + installation of `mdanalysis[dask]` + + If you want to add your own backend to an existing class, pass a + :class:`backends.BackendBase` subclass (see its documentation to learn + how to implement it properly), and specify ``unsupported_backend=True``. + + Returns + ------- + tuple + names of built-in backends that can be used in :meth:`run(backend=...)` + + + .. versionadded:: 2.8.0 + """ + return ("serial",) + + # class authors: override _analysis_algorithm_is_parallelizable + # in derived classes and only set to True if you have confirmed + # that your algorithm works reliably when parallelized with + # the split-apply-combine approach (see docs) + _analysis_algorithm_is_parallelizable = False + _analysis_algorithm_is_streamable = True + + @property + def parallelizable(self): + """Boolean mark showing that a given class can be parallelizable with + split-apply-combine procedure. Namely, if we can safely distribute + :meth:`_single_frame` to multiple workers and then combine them with a + proper :meth:`_conclude` call. If set to ``False``, no backends except + for ``serial`` are supported. + + .. note:: If you want to check parallelizability of the whole class, without + explicitly creating an instance of the class, see + :attr:`_analysis_algorithm_is_parallelizable`. Note that you + setting it to other value will break things if the algorithm + behind the analysis is not trivially parallelizable. + + + Returns + ------- + bool + if a given ``AnalysisBase`` subclass instance + is parallelizable with split-apply-combine, or not + + + .. versionadded:: 2.8.0 + """ + return self._analysis_algorithm_is_parallelizable + + def __init__(self, trajectory, verbose=False, **kwargs): + self._streamed = False + if isinstance(trajectory, StreamReaderBase): + self._streamed = True + self._trajectory = trajectory + self._verbose = verbose + self.results = Results() + + def _define_run_frames( + self, trajectory, start=None, stop=None, step=None, frames=None + ) -> Union[slice, np.ndarray]: + """Defines limits for the whole run, as passed by self.run() arguments + + Parameters + ---------- + trajectory : mda.Reader + a trajectory Reader + start : int, optional + start frame of analysis, by default None + stop : int, optional + stop frame of analysis, by default None + step : int, optional + number of frames to skip between each analysed frame, by default None + frames : array_like, optional + array of integers or booleans to slice trajectory; cannot be + combined with ``start``, ``stop``, ``step``; by default None + + Returns + ------- + Union[slice, np.ndarray] + Appropriate slicer for the trajectory that would give correct iteraction + order via trajectory[slicer] + + Raises + ------ + ValueError + if *both* `frames` and at least one of ``start``, ``stop``, or ``step`` + is provided (i.e. set to not ``None`` value). + + + .. versionadded:: 2.8.0 + """ + self._trajectory = trajectory + if frames is not None: + if not all(opt is None for opt in [start, stop, step]): + raise ValueError( + "start/stop/step cannot be combined with frames" + ) + slicer = frames + else: + start, stop, step = trajectory.check_slice_indices( + start, stop, step + ) + slicer = slice(start, stop, step) + self.start, self.stop, self.step = start, stop, step + return slicer + + def _prepare_sliced_trajectory(self, slicer: Union[slice, np.ndarray]): + """Prepares sliced trajectory for use in subsequent parallel computations: + namely, assigns self._sliced_trajectory and its appropriate attributes, + self.n_frames, self.frames and self.times. + + Parameters + ---------- + slicer : Union[slice, np.ndarray] + appropriate slicer for the trajectory + + + .. versionadded:: 2.8.0 + """ + self._sliced_trajectory = self._trajectory[slicer] + + self.n_frames = len(self._sliced_trajectory) + self.frames = np.zeros(self.n_frames, dtype=int) + self.times = np.zeros(self.n_frames) + + def _setup_frames( + self, trajectory, start=None, stop=None, step=None, frames=None + ): + """Pass a Reader object and define the desired iteration pattern + through the trajectory + + Parameters + ---------- + trajectory : mda.Reader + A trajectory Reader + start : int, optional + start frame of analysis + stop : int, optional + stop frame of analysis + step : int, optional + number of frames to skip between each analysed frame + frames : array_like, optional + array of integers or booleans to slice trajectory; cannot be + combined with ``start``, ``stop``, ``step`` + + .. versionadded:: 2.2.0 + + Raises + ------ + ValueError + if *both* `frames` and at least one of ``start``, ``stop``, or + ``frames`` is provided (i.e., set to another value than ``None``) + + + .. versionchanged:: 1.0.0 + Added .frames and .times arrays as attributes + + .. versionchanged:: 2.2.0 + Added ability to iterate through trajectory by passing a list of + frame indices in the `frames` keyword argument + + .. versionchanged:: 2.8.0 + Split function into two: :meth:`_define_run_frames` and + :meth:`_prepare_sliced_trajectory`: first one defines the limits + for the whole run and is executed once during :meth:`run` in + :meth:`_setup_frames`, second one prepares sliced trajectory for + each of the workers and gets executed twice: one time in + :meth:`_setup_frames` for the whole trajectory, second time in + :meth:`_compute` for each of the computation groups. + """ + slicer = self._define_run_frames(trajectory, start, stop, step, frames) + self._prepare_sliced_trajectory(slicer) + + def _single_frame(self): + """Calculate data from a single frame of trajectory + + Don't worry about normalising, just deal with a single frame. + Attributes accessible during your calculations: + + - ``self._frame_index``: index of the frame in results array + - ``self._ts`` -- Timestep instance + - ``self._sliced_trajectory`` -- trajectory that you're iterating over + - ``self.results`` -- :class:`MDAnalysis.analysis.results.Results` instance + holding run results initialized in :meth:`_prepare`. + """ + raise NotImplementedError("Only implemented in child classes") + + def _prepare(self): + """ + Set things up before the analysis loop begins. + + Notes + ----- + ``self.results`` is initialized already in :meth:`self.__init__` with an + empty instance of :class:`MDAnalysis.analysis.results.Results` object. + You can still call your attributes as if they were usual ones, + ``Results`` just keeps track of that to be able to run a proper + aggregation after a parallel run, if necessary. + """ + pass # pylint: disable=unnecessary-pass + + def _conclude(self): + """Finalize the results you've gathered. + + Called at the end of the :meth:`run` method to finish everything up. + + Notes + ----- + Aggregation of results from individual workers happens in + :meth:`self.run()`, so here you have to implement everything as if you + had a non-parallel run. If you want to enable proper aggregation for + parallel runs for you analysis class, implement ``self._get_aggregator`` + and check :mod:`MDAnalysis.analysis.results` for how to use it. + """ + pass # pylint: disable=unnecessary-pass + + def _compute( + self, + indexed_frames: np.ndarray, + verbose: bool = None, + *, + progressbar_kwargs={}, + ) -> "AnalysisBase": + """Perform the calculation on a balanced slice of frames + that have been setup prior to that using _setup_computation_groups() + + Parameters + ---------- + indexed_frames : np.ndarray + np.ndarray of (n, 2) shape, where first column is frame iteration + indices and second is frame numbers + + verbose : bool, optional + Turn on verbosity + + progressbar_kwargs : dict, optional + ProgressBar keywords with custom parameters regarding progress bar + position, etc; see :class:`MDAnalysis.lib.log.ProgressBar` + for full list. + + + .. versionadded:: 2.8.0 + """ + logger.info("Choosing frames to analyze") + # if verbose unchanged, use class default + verbose = ( + getattr(self, "_verbose", False) if verbose is None else verbose + ) + + frames = indexed_frames[:, 1] + + logger.info("Starting preparation") + self._prepare_sliced_trajectory(slicer=frames) + self._prepare() + if len(frames) == 0: # if `frames` were empty in `run` or `stop=0` + return self + + for idx, ts in enumerate( + ProgressBar( + self._sliced_trajectory, verbose=verbose, **progressbar_kwargs + ) + ): + self._frame_index = idx # accessed later by subclasses + self._ts = ts + self.frames[idx] = ts.frame + self.times[idx] = ts.time + self._single_frame() + logger.info("Finishing up") + return self + + def _setup_computation_groups( + self, + n_parts: int, + start: int = None, + stop: int = None, + step: int = None, + frames: Union[slice, np.ndarray] = None, + ) -> list[np.ndarray]: + """ + Splits the trajectory frames, defined by ``start/stop/step`` or + ``frames``, into ``n_parts`` even groups, preserving their indices. + + Parameters + ---------- + n_parts : int + number of parts to split the workload into + start : int, optional + start frame + stop : int, optional + stop frame + step : int, optional + step size for analysis (1 means to read every frame) + frames : array_like, optional + array of integers or booleans to slice trajectory; ``frames`` can + only be used *instead* of ``start``, ``stop``, and ``step``. Setting + *both* ``frames`` and at least one of ``start``, ``stop``, ``step`` + to a non-default value will raise a :exc:`ValueError`. + + Raises + ------ + ValueError + if *both* ``frames`` and at least one of ``start``, ``stop``, or + ``frames`` is provided (i.e., set to another value than ``None``) + + Returns + ------- + computation_groups : list[np.ndarray] + list of (n, 2) shaped np.ndarrays with frame indices and numbers + + + .. versionadded:: 2.8.0 + """ + if frames is None: + start, stop, step = self._trajectory.check_slice_indices( + start, stop, step + ) + used_frames = np.arange(start, stop, step) + elif not all(opt is None for opt in [start, stop, step]): + raise ValueError("start/stop/step cannot be combined with frames") + else: + used_frames = frames + + if all(isinstance(obj, bool) for obj in used_frames): + arange = np.arange(len(used_frames)) + used_frames = arange[used_frames] + + # similar to list(enumerate(frames)) + enumerated_frames = np.vstack( + [np.arange(len(used_frames)), used_frames] + ).T + if len(enumerated_frames) == 0: + return [np.empty((0, 2), dtype=np.int64)] + elif len(enumerated_frames) < n_parts: + # Issue #4685 + n_parts = len(enumerated_frames) + warnings.warn( + f"Set `n_parts` to {n_parts} to match the total " + "number of frames being analyzed" + ) + + return np.array_split(enumerated_frames, n_parts) + + def _configure_backend( + self, + backend: Union[str, BackendBase], + n_workers: int, + unsupported_backend: bool = False, + ) -> BackendBase: + """Matches a passed backend string value with class attributes + :attr:`parallelizable` and :meth:`get_supported_backends()` + to check if downstream calculations can be performed. + + Parameters + ---------- + backend : Union[str, BackendBase] + backend to be used: + - ``str`` is matched to a builtin backend (one of "serial", + "multiprocessing" and "dask") + - ``BackendBase`` subclass is checked for the presence of + an :meth:`apply` method + n_workers : int + positive integer with number of workers (processes, in case of + built-in backends) to split the work between + unsupported_backend : bool, optional + if you want to run your custom backend on a parallelizable class + that has not been tested by developers, by default ``False`` + + Returns + ------- + BackendBase + instance of a ``BackendBase`` class that will be used for computations + + Raises + ------ + ValueError + if :attr:`parallelizable` is set to ``False`` but backend is + not ``serial`` + ValueError + if :attr:`parallelizable` is ``True`` and custom backend instance is used + without specifying ``unsupported_backend=True`` + ValueError + if your trajectory has associated parallelizable transformations + but backend is not serial + ValueError + if ``n_workers`` was specified twice -- in the run() method and durin + ``__init__`` of a custom backend + ValueError + if your backend object instance doesn't have an ``apply`` method + + + .. versionadded:: 2.8.0 + """ + builtin_backends = { + "serial": BackendSerial, + "multiprocessing": BackendMultiprocessing, + "dask": BackendDask, + } + + backend_class = builtin_backends.get(backend, backend) + supported_backend_classes = [ + builtin_backends.get(b) for b in self.get_supported_backends() + ] + + # check for serial-only classes + if not self.parallelizable and backend_class is not BackendSerial: + raise ValueError(f"Can not parallelize class {self.__class__}") + + # make sure user enabled 'unsupported_backend=True' for custom classes + if ( + not unsupported_backend + and self.parallelizable + and backend_class not in supported_backend_classes + ): + raise ValueError( + ( + f"Must specify 'unsupported_backend=True'" + f"if you want to use a custom {backend_class=} for {self.__class__}" + ) + ) + + # check for the presence of parallelizable transformations + if backend_class is not BackendSerial and any( + not t.parallelizable for t in self._trajectory.transformations + ): + raise ValueError( + ( + "Trajectory should not have " + "associated unparallelizable transformations" + ) + ) + + # conclude mapping from string to backend class if it's a builtin backend + if isinstance(backend, str): + return backend_class(n_workers=n_workers) + + # make sure we haven't specified n_workers twice + if ( + isinstance(backend, BackendBase) + and n_workers is not None + and hasattr(backend, "n_workers") + and backend.n_workers != n_workers + ): + raise ValueError( + ( + f"n_workers specified twice: in {backend.n_workers=}" + f"and in run({n_workers=}). Remove it from run()" + ) + ) + + # or pass along an instance of the class itself + # after ensuring it has apply method + if not isinstance(backend, BackendBase) or not hasattr( + backend, "apply" + ): + raise ValueError( + ( + f"{backend=} is invalid: should have 'apply' method " + "and be instance of MDAnalysis.analysis.backends.BackendBase" + ) + ) + return backend + + def run( + self, + start: int = None, + stop: int = None, + step: int = None, + frames: Iterable = None, + verbose: bool = None, + n_workers: int = None, + n_parts: int = None, + backend: Union[str, BackendBase] = None, + *, + unsupported_backend: bool = False, + progressbar_kwargs=None, + ): + """Perform the calculation + + Parameters + ---------- + start : int, optional + start frame of analysis + stop : int, optional + stop frame of analysis + step : int, optional + number of frames to skip between each analysed frame + frames : array_like, optional + array of integers or booleans to slice trajectory; ``frames`` can + only be used *instead* of ``start``, ``stop``, and ``step``. Setting + *both* ``frames`` and at least one of ``start``, ``stop``, ``step`` + to a non-default value will raise a :exc:`ValueError`. + + .. versionadded:: 2.2.0 + verbose : bool, optional + Turn on verbosity + + progressbar_kwargs : dict, optional + ProgressBar keywords with custom parameters regarding progress bar + position, etc; see :class:`MDAnalysis.lib.log.ProgressBar` + for full list. Available only for ``backend='serial'`` + backend : Union[str, BackendBase], optional + By default, performs calculations in a serial fashion. + Otherwise, user can choose a backend: ``str`` is matched to a + builtin backend (one of ``serial``, ``multiprocessing`` and + ``dask``), or a :class:`MDAnalysis.analysis.results.BackendBase` + subclass. + + .. versionadded:: 2.8.0 + n_workers : int + positive integer with number of workers (processes, in case of + built-in backends) to split the work between + + .. versionadded:: 2.8.0 + n_parts : int, optional + number of parts to split computations across. Can be more than + number of workers. + + .. versionadded:: 2.8.0 + unsupported_backend : bool, optional + if you want to run your custom backend on a parallelizable class + that has not been tested by developers, by default False + + .. versionadded:: 2.8.0 + + + .. versionchanged:: 2.2.0 + Added ability to analyze arbitrary frames by passing a list of + frame indices in the `frames` keyword argument. + + .. versionchanged:: 2.5.0 + Add `progressbar_kwargs` parameter, + allowing to modify description, position etc of tqdm progressbars + + .. versionchanged:: 2.8.0 + Introduced ``backend``, ``n_workers``, ``n_parts`` and + ``unsupported_backend`` keywords, and refactored the method logic to + support parallelizable execution. + """ + # default to serial execution + backend = "serial" if backend is None else backend + + progressbar_kwargs = ( + {} if progressbar_kwargs is None else progressbar_kwargs + ) + if (progressbar_kwargs or verbose) and not ( + backend == "serial" or isinstance(backend, BackendSerial) + ): + raise ValueError( + "Can not display progressbar with non-serial backend" + ) + + if self._streamed: + if backend != "serial": + raise ValueError( + "Can not run streamed analysis with non-serial backend" + ) + if frames is not None: + raise ValueError( + "Can not run streamed analysis with frames argument" + ) + if start is not None or stop is not None: + raise ValueError( + "Can not run streamed analysis with start/stop arguments" + ) + self._streamed_run( + step=step, + verbose=verbose, + progressbar_kwargs=progressbar_kwargs, + ) + return self + + # if number of workers not specified, try getting the number from + # the backend instance if possible, or set to 1 + if n_workers is None: + n_workers = ( + backend.n_workers + if isinstance(backend, BackendBase) + and hasattr(backend, "n_workers") + else 1 + ) + + # set n_parts and check that is has a reasonable value + n_parts = n_workers if n_parts is None else n_parts + + # do this as early as possible to check client parameters + # before any computations occur + executor = self._configure_backend( + backend=backend, + n_workers=n_workers, + unsupported_backend=unsupported_backend, + ) + if ( + hasattr(executor, "n_workers") and n_parts < executor.n_workers + ): # using executor's value here for non-default executors + warnings.warn( + ( + f"Analysis not making use of all workers: " + f"{executor.n_workers=} is greater than {n_parts=}" + ) + ) + + # start preparing the run + worker_func = partial( + self._compute, + progressbar_kwargs=progressbar_kwargs, + verbose=verbose, + ) + self._setup_frames( + trajectory=self._trajectory, + start=start, + stop=stop, + step=step, + frames=frames, + ) + + computation_groups = self._setup_computation_groups( + start=start, stop=stop, step=step, frames=frames, n_parts=n_parts + ) + + # get all results from workers in other processes. + # we need `AnalysisBase` classes + # since they hold `frames`, `times` and `results` attributes + remote_objects: list["AnalysisBase"] = executor.apply( + worker_func, computation_groups + ) + self.frames = np.hstack([obj.frames for obj in remote_objects]) + self.times = np.hstack([obj.times for obj in remote_objects]) + + # aggregate results from results obtained in remote workers + remote_results = [obj.results for obj in remote_objects] + results_aggregator = self._get_aggregator() + self.results = results_aggregator.merge(remote_results) + + self._conclude() + return self + + def _get_aggregator(self) -> ResultsGroup: + """Returns a default aggregator that takes entire results + if there is a single object, and raises ValueError otherwise + + Returns + ------- + ResultsGroup + aggregating object + + + .. versionadded:: 2.8.0 + """ + return ResultsGroup(lookup=None) + + def _streamed_run(self, step=None, verbose=False, progressbar_kwargs={}): + + self._sliced_trajectory = ( + self._trajectory[::step] if step is not None else self._trajectory + ) + self._prepare() + self.frames = [] + self.times = [] + + for idx, ts in enumerate( + ProgressBar( + self._sliced_trajectory, + verbose=verbose, + total=float("inf"), + **progressbar_kwargs, + ) + ): + self._frame_index = idx # accessed later by subclasses + self._ts = ts + self.frames.append(ts.frame) + self.times.append(ts.time) + self._single_frame() + + logger.info("Finishing up") + self.frames = np.array(self.frames) + self.times = np.array(self.times) + self._conclude() + + +class StackableAnalysis(AnalysisBase): + + def __init__(self, trajectory, analyses, verbose=False, **kwargs): + super().__init__(trajectory, verbose=verbose, **kwargs) + self._analyses = analyses + if len(self._analyses) == 0: + raise ValueError("No analyses provided") + + for analysis in self._analyses: + if analysis._trajectory is not self._trajectory: + raise ValueError("All analyses must use the same trajectory") + + def _compute(self, indexed_frames, verbose=None, progressbar_kwargs={}): + logger.info("Choosing frames to analyze") + # if verbose unchanged, use class default + verbose = ( + getattr(self, "_verbose", False) if verbose is None else verbose + ) + + frames = indexed_frames[:, 1] + + logger.info("Starting preparation") + self._prepare_sliced_trajectory(slicer=frames) + for analysis in self._analyses: + analysis.frames = self.frames + analysis.times = self.times + self._prepare() + + if len(frames) == 0: # if `frames` were empty in `run` or `stop=0` + return self + + for idx, ts in enumerate( + ProgressBar( + self._sliced_trajectory, verbose=verbose, **progressbar_kwargs + ) + ): + + self._frame_index = idx # accessed later by subclasses + self._ts = ts + self.frames[idx] = ts.frame + self.times[idx] = ts.time + for analysis in self._analyses: + analysis._ts = ts + analysis._frame_index = self._frame_index + self._single_frame() + + self._conclude() + + logger.info("Finishing up") + return self + + def _single_frame(self): + for analysis in self._analyses: + analysis._single_frame() + + def _prepare(self): + for analysis in self._analyses: + analysis._prepare() + + def _conclude(self): + for analysis in self._analyses: + analysis._conclude() + + def _streamed_run(self, step=None, verbose=False, progressbar_kwargs={}): + self._sliced_trajectory = ( + self._trajectory[::step] if step is not None else self._trajectory + ) + self._prepare() + self.frames = [] + self.times = [] + + for analysis in self._analyses: + analysis.frames = self.frames + analysis.times = self.times + + for idx, ts in enumerate( + ProgressBar( + self._sliced_trajectory, + verbose=verbose, + total=float("inf"), + **progressbar_kwargs, + ) + ): + self._frame_index = idx # accessed later by subclasses + self._ts = ts + self.frames.append(ts.frame) + self.times.append(ts.time) + for analysis in self._analyses: + analysis._ts = ts + analysis._frame_index = self._frame_index + self._single_frame() + + logger.info("Finishing up") + self.frames = np.array(self.frames) + self.times = np.array(self.times) + self._conclude() diff --git a/imdclient/streambase.py b/imdclient/streambase.py index a8e1a7a..ddc04f9 100644 --- a/imdclient/streambase.py +++ b/imdclient/streambase.py @@ -151,27 +151,38 @@ def __setstate__(self, state: object): ) -class StreamFrameIteratorSliced: +class StreamFrameIteratorSliced(FrameIteratorBase): - def __init__(self, reader, step): - self._reader = reader + def __init__(self, trajectory, step): + super().__init__(trajectory) self._step = step - self._idx = 0 def __iter__(self): # Calling reopen tells reader # it can't be reopened again - self._reader._reopen() + self.trajectory._reopen() return self def __next__(self): try: # Burn the timesteps until we reach the desired step - while self._idx % self._step != 0: - self._idx += 1 - self._reader._read_next_timestep() + # Don't use next() to avoid unnecessary transformations + while self.trajectory._frame % self.step != 0: + self.trajectory_read_next_timestep() except (EOFError, IOError): - raise StopIteration - else: - self._idx += 1 - return self._reader.next() + # Don't rewind here like we normally would + raise StopIteration from None + + return self.trajectory.next() + + def __len__(self): + raise RuntimeError( + "{} has unknown length".format(self.__class__.__name__) + ) + + def __getitem__(self, frame): + raise RuntimeError("Sliced iterator does not support indexing") + + @property + def step(self): + return self._step diff --git a/imdclient/tests/test_imdreader.py b/imdclient/tests/test_imdreader.py index 60ac64b..15a0fc9 100644 --- a/imdclient/tests/test_imdreader.py +++ b/imdclient/tests/test_imdreader.py @@ -635,3 +635,12 @@ def test_iterate_start_stop_raises_error(self, reader): with pytest.raises(ValueError): for ts in reader[1:3]: pass + + def test_subslice_fi_all_after_iteration_raises_error(self, reader): + sliced_reader = reader[:] + for ts in sliced_reader: + pass + sub_sliced_reader = sliced_reader[::1] + with pytest.raises(RuntimeError): + for ts in sub_sliced_reader: + pass diff --git a/imdclient/tests/test_stream_analysis.py b/imdclient/tests/test_stream_analysis.py new file mode 100644 index 0000000..365c45b --- /dev/null +++ b/imdclient/tests/test_stream_analysis.py @@ -0,0 +1,78 @@ +import imdclient +from MDAnalysisTests.datafiles import ( + COORDINATES_TOPOLOGY, + COORDINATES_TRR, + COORDINATES_H5MD, +) +import MDAnalysis as mda +from .utils import ( + get_free_port, + create_default_imdsinfo_v3, +) + +# NOTE: replaceme with imdclient.tests.server +from .server import InThreadIMDServer +from MDAnalysisTests.coordinates.base import ( + MultiframeReaderTest, + BaseReference, + BaseWriterTest, + assert_timestep_almost_equal, +) +from MDAnalysis.analysis.rms import RMSF +from numpy.testing import ( + assert_almost_equal, + assert_array_almost_equal, + assert_equal, + assert_allclose, +) +import numpy as np +import logging +import pytest +from MDAnalysis.transformations import translate +import pickle + +# NOTE: removeme +from imdclient.IMDREADER import IMDReader + + +class TestStreamAnalysis: + + @pytest.fixture + def port(self): + return get_free_port() + + @pytest.fixture + def universe(self): + return mda.Universe(COORDINATES_TOPOLOGY, COORDINATES_H5MD) + + @pytest.fixture + def imdsinfo(self): + return create_default_imdsinfo_v3() + + @pytest.fixture + def imd_universe(self, universe, imdsinfo, port): + server = InThreadIMDServer(universe.trajectory) + server.set_imdsessioninfo(imdsinfo) + server.handshake_sequence("localhost", port, first_frame=True) + + imd_universe = mda.Universe(COORDINATES_TOPOLOGY, f"localhost:{port}") + server.send_frames(1, 5) + + yield imd_universe + server.cleanup() + + def test_rmsf(self, imd_universe, universe): + imd_rmsf = RMSF(imd_universe.atoms).run() + rmsf = RMSF(universe.atoms).run() + + assert_almost_equal(imd_rmsf.results.rmsf, rmsf.results.rmsf) + + def test_stack_rmsf(self, imd_universe, universe): + r1 = RMSF(imd_universe.atoms) + r2 = RMSF(imd_universe.atoms) + imdclient.StackableAnalysis(imd_universe.trajectory, [r1, r2]).run() + + rmsf = RMSF(universe.atoms).run() + + assert_almost_equal(r1.results.rmsf, rmsf.results.rmsf) + assert_almost_equal(r2.results.rmsf, rmsf.results.rmsf) From d7f4acd7854bea3254503a7b40adfe937ef23dd7 Mon Sep 17 00:00:00 2001 From: Oliver Beckstein Date: Fri, 11 Oct 2024 14:36:58 -0700 Subject: [PATCH 12/24] reST fixes mainly using :ref: throughout --- docs/source/protocol_v3.rst | 38 ++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/source/protocol_v3.rst b/docs/source/protocol_v3.rst index d0a5b9b..5ef12f6 100644 --- a/docs/source/protocol_v3.rst +++ b/docs/source/protocol_v3.rst @@ -154,7 +154,7 @@ Disconnect Sent from the receiver to the simulation engine any time after the `Session info`_ has been sent to indicate that the simulation engine should -close the connected socket. Whether the simulation engine blocks execution until another connection is +close the connected socket. Whether the simulation engine pauses execution until another connection is made is an implementation decision. .. code-block:: none @@ -169,12 +169,12 @@ Energies ^^^^^^^^ Sent from the simulation engine to the receiver each IMD frame if -energies were previously specified for this session in `Session info`_. +energies were previously specified for this session in :ref:`session-info`. .. note:: While the integration step is included in this packet, this is a result of inheriting the IMD energy block from IMDv2. It is recommended - to make use of the 64-bit integer integration step value from the time packet + to make use of the 64-bit integer integration step value from the :ref:`time packet