Consider a physical system consisting of $k$ massive particles at positions in $\mathbb{R}^3$ which we can assemble in a vector in $\mathbf{q}\in\mathbb{R}^m$ where $m=3k$. Likewise, the velocities of these particles at a given instant of time can be described by a vector $\mathbf{v}\in\mathbb{R}^m$. The corresponding linear momentum then is the vector
$$\mathbf{p}=M\mathbf{v}$$
where $M\in \mathbb{R}^{m\times m}$ is a diagonal matrix with entries (partitioned into triples of equal consecutive entries) that record the masses of the particles. At each instant of time the state of the system is then given by a vector
$$\left(\begin{array}{c}\mathbf{p}\\\mathbf{q}\end{array}\right)\in \mathbf{R}^{2m}.$$
The evolution of the system is described by a smooth map
$$\mathbf{q}:\mathbb{R}\to\mathbb{R}^m.$$
Suppose that the only forces acting on the particles come from a fixed potential energy function $U:\mathbb{R}^m\to \mathbb{R}$, so the potential energy of the system in position $\mathbf{q}$ is $U(\mathbf{q})$. Then Newton’s laws tell us that the position $\mathbf{q}(t), t\in [a,b]$ evolves according to the equation of motion
$$M\ddot{\mathbf{q}} = -\mbox{grad}_{\mathbf{q}}\,U$$
For the state vector $\left(\begin{array}{c}\mathbf{p}\\\mathbf{q}\end{array}\right)$ this amounts to the first order differential equation
$$\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right)^{\!\small\bullet}=\left(\begin{array}{c}-\mbox{grad}_{\mathbf{q}}\,U \\ M^{-1}\,\mathbf{p}\end{array}\right).$$
We view the right-hand side as a vector field $V:\mathbb{R}^{2m}\to \mathbb{R}^{2m}$ with
$$V\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right)=\left(\begin{array}{c}-\mbox{grad}_{\mathbf{q}}\,U \\ M^{-1}\,\mathbf{p}\end{array}\right).$$
This particular vector field has striking special properties, but before discussing these let us switch to a more general context: Suppose $n\in \mathbb{N}$ and let $V:\mathbb{R}^n\to\mathbb{R}^n$ be a smooth vector field. Suppose that for all $\mathbb{x}\in \mathbb{R}^n$ the initial value problem
\begin{align*}\gamma(0) &=x \\\\ \dot{\gamma}(t)&=V_{\gamma(t)}\end{align*}
has a solution $\gamma:\mathbb{R}\to\mathbb{R}^n$. Then for all $t\in \mathbb{R}$ we can define the flow maps
$$g_t:\mathbb{R}^n\to \mathbb{R}^n$$
by the property that
$$g_0 =\textrm{id}_{\mathbb{R}^n}$$
and for all $x\in \mathbb{R}^n$ and $t\in \mathbb{R}$ we have
$$\frac{d}{d\tau}{\large|_{\tau=t}}\,\,g_\tau(x)=V_{g_t(x)}.$$
Now we are ready to give the definitions needed for understanding what is special about the vector field $V$ that we started with. We begin with some Linear Algebra:
Let $J$ denote the following $2m\times 2m$ matrix:
$$J=\left(\begin{array}{c} 0 & -I \\ I & 0\end{array}\right).$$
Definition 1: The skew-symmetric bilinear form
\begin{align*}\sigma:\mathbb{R}^{2m} \times \mathbb{R}^{2m}\to \mathbb{R} \\\\ \sigma(\mathbf{u},\mathbf{v})=\langle J\mathbf{u},\mathbf{v}\rangle = -\mathbf{u}^t J \mathbf{v}\end{align*}
is called the standard symplectic form on $\mathbb{R}^{2m}$.
Definition 2: A matrix $A\in \mathbb{R}^{2m\times 2m}$ is called symplectic if for all $\mathbf{u},\mathbf{v}\in\mathbb{R}^{2m}$ we have
$$\sigma(A\mathbf{u},A\mathbf{v})=\sigma(\mathbf{u},\mathbf{v})$$
or equivalently, if
$$A^t JA=J.$$
Definition 3: A matrix $Y\in \mathbb{R}^{2m\times 2m}$ is called infinitesimally symplectic if for all $\mathbf{u},\mathbf{v}\in\mathbb{R}^{2m}$ we have
$$\sigma(Y\mathbf{u},\mathbf{v})+\sigma(\mathbf{u},Y\mathbf{v})=0$$
or equivalently, if
$$Y^tJ+JY=0.$$
The symplectic $2m\times 2m$ matrices form a group $\mbox{SP}(m,\mathbb{R})$ whose Lie algebra is the vector space $\mbox{sp}(m,\mathbb{R})$ of all infinitesimally symplectic matrices. The only thing we will need is the following
Theorem 1:
(a) Let $A:(-\epsilon,\epsilon)\to\mbox{SP}(m,\mathbb{R})$ be a smooth map. Then for all $t\in (-\epsilon,\epsilon)$ the matrix $A(t)$ is invertible and the matrix-valued function $Y:=\dot{A}A^{-1}$ takes values in $\mbox{sp}(m,\mathbb{R})$.
(b) Let $Y:(-\epsilon,\epsilon)\to\mbox{sp}(m,\mathbb{R})$ be a smooth map. Then there is a unique smooth map $A:(-\epsilon,\epsilon)\to\mbox{SP}(m,\mathbb{R})$ such that $A(0)=I$ and $\dot{A}A^{-1}=Y$.
$$\dot{A}=YA.$$
Proof: If $A^tJA=J$ for all $t$ then $\det A(t) \neq 0$ for all $t$ and differentiation gives us
$$0=\dot{A}^t J A +A^t J \dot{A}=A^t (Y^tJ+JY)A.$$
This proves (a). For (b) let $A:(-\epsilon,\epsilon)\to \mathbb{R}^{2m \times 2m}$ be the unique solution of $\dot{A}=YA$ with the initial condition $A(0)=A_0$. Then the matrix-valued function $B=A^tJA-A$ satisfies $B(0)=0$ and
$$\dot{B}=A^t(Y^tJ+JY)A=0.$$
$\blacksquare$
Equipped with the necessary Linear Algebra we now can move on to Differential Geometry. Let $M\subset \mathbb{R}^{2m}$ be a compact domain with smooth boundary.
Definition 2: A diffeomorphism map $g:\mathbb{R}^{2m} \to \mathbb{R}^{2m}$ is called symplectic if for all $x\in M$ its derivative of $g'(x)$ is symplectic.
Definition 3: A vector field $V:\mathbb{R}^{2m}\to \mathbb{R}^{2m}$ is called infinitesimally symplectic if for all $x\in M$ its derivative §V'(x) is infinitesimally symplectic.
Theorem 2:
(a) Let $g_t:\mathbb{R}^{2m}\to\mathbb{R}^{2m}$ for $t\in (-\epsilon,\epsilon)$ be a smooth family of symplectic diffeomorphisms . Then for all $t\in (-\epsilon,\epsilon)$ the vector field $V_t:\mathbb{R}^{2m}\to\mathbb{R}^{2m}$ given by
$$V_t(g_t(x))=\frac{\partial}{\partial t}g_t(x)$$
is infinitesimally symplectic.
(b) Let $V_t:\mathbb{R}^{2m}\to\mathbb{R}^{2m}$ for $t\in (-\epsilon,\epsilon)$ be a smooth family of infinitesimally symplectic Lipschitz vector fields . Then there is a unique smooth family of symplectic diffeomorphisms $g_t:\mathbb{R}^{2m}\to\mathbb{R}^{2m}$ such that $g_0=\mbox{id}_{\mathbb{R}^{2m}}$ and $g_t$ is related to $V_t$ by in the same way as in (a).
Proof: Let $g_t$ and $V_t$ be as described under (a). Chose $\mathbf{x} \in \mathbb{R}^{2m}$. Define $A: (-\epsilon,\epsilon)\to \mbox{SP}(m,\mathbb{R})$ by
$$A(t)=g_t'(x).$$
and $Y: (-\epsilon,\epsilon)\to\mathbb{R}^{2m\times 2m}$ by
$$Y(t)=V'(g_t(x)).$$
Then for all $\mathbf{u}\in \mathbb{R}^n$
\begin{align*}Y(t)A(t)\mathbf{u} &=V_t'(g_t(x))g_t'(x)\mathbf{u} \\\\&=\frac{\partial}{\partial s}{\large|_{s=0}}\,\,V_t(g_t(\mathbf{x}+s\mathbf{u})) \\\\ &=\frac{\partial}{\partial s}{\large|_{s=0}}\,\,\frac{\partial}{\partial t}{\large|_{\tau=t}}\,\,g_\tau(\mathbf{x}+s\mathbf{u})\\\\ &=\frac{\partial}{\partial t}{\large|_{\tau=t}}\,\,\frac{\partial}{\partial s}{\large|_{s=0}}\,\,g_\tau(\mathbf{x}+s\mathbf{u}) \\\\&=\frac{\partial}{\partial t}{\large|_{\tau=t}}\,\,g_\tau'(\mathbf{x})\mathbf{u}\\\\&=\dot{A}(t)\mathbf{u}.\end{align*}
Our present claim (a) now follows from part (a) of Theorem 1. For (b) note that the existence and uniqueness of a smooth family of diffeomorphisms $g_t$ is a consequence of the Picard-Lindelöf theorem for ordinary differential equations. We have only to show that all $g_t$ are symplectic. This in turn follows from the same calculation as above and part (a) of Theorem 1.
$\blacksquare$
Let us now come back to Newton’s equations of motion from the beginning of this post.
Theorem 3: Let $U:\mathbb{R}^m \to \mathbb{R}$ be a smooth function and $M$ an invertible symmetric $m\times m$ – matrix. Then the vector field $V:\mathbb{R}^{2m}\to \mathbb{R}^{2m}$ defines as
$$V\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right)=\left(\begin{array}{c}-\mbox{grad}_{\mathbf{q}}\,U \\ M^{-1}\,\mathbf{p}\end{array}\right)$$
is infinitesimally symplectic.
Proof: We have
$$V’\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right)=\left(\begin{array}{cc}0 & -\mbox{Hess}_{\mathbf{q}}\,U \\ M^{-1} & 0\end{array}\right).$$
Since both $\mbox{Hess}_{\mathbf{q}}\,U$ and $M^{-1}$ are symmetric matrices, we obtain our desired conclusion:
$$\left(V’\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right) \right)^t J+JV’\left(\begin{array}{c}\mathbf{p} \\ \mathbf{q}\end{array}\right)=0.$$
$\blacksquare$