{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Handling a poorly conditioned covariance matrix\n",
    "\n",
    "Sometimes, you may end up with a very clearly incorrect result from `kinisi`. \n",
    "For example, consider the MSD plot below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import scipp as sc\n",
    "import kinisi\n",
    "from kinisi.analyze import DiffusionAnalyzer\n",
    "from ase.io import read\n",
    "from MDAnalysis import Universe\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "credible_intervals = [[16, 84], [2.5, 97.5], [0.15, 99.85]]\n",
    "alpha = [0.6, 0.4, 0.2]\n",
    "\n",
    "path_to_file = os.path.join(os.path.dirname(kinisi.__file__), \n",
    "                            'tests/inputs/LiPS.exyz')\n",
    "atoms = read(path_to_file, format='extxyz', index=':')\n",
    "\n",
    "cell_dimensions = []\n",
    "for frame in atoms:\n",
    "    lengths = frame.cell.lengths()\n",
    "    angles = frame.cell.angles()\n",
    "    cell = [*lengths, *angles]\n",
    "    cell_dimensions.append(cell)\n",
    "\n",
    "u = Universe(path_to_file, \n",
    "             path_to_file, \n",
    "             format='XYZ', \n",
    "             topology_format='XYZ', \n",
    "             dt=20.0/1000)\n",
    "for ts, dims in zip(u.trajectory, cell_dimensions):\n",
    "    ts.dimensions = dims\n",
    "\n",
    "params = {'specie': 'LI',\n",
    "          'time_step': 0.001 * sc.Unit('ps'),\n",
    "          'step_skip': 20 * sc.units.dimensionless,\n",
    "          'progress': False\n",
    "          }\n",
    "\n",
    "diff = DiffusionAnalyzer.from_universe(u, **params)\n",
    "diff.diffusion(1.5 * sc.Unit('ps'))\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(diff.dt.values, diff.msd.values, 'k-')\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    ax.fill_between(diff.dt.values,\n",
    "                      *np.percentile(diff.distributions, ci, axis=1),\n",
    "                      alpha=alpha[i],\n",
    "                      color='#0173B2',\n",
    "                      lw=0)\n",
    "ax.set_xlabel(f'Time / {diff.dt.unit}')\n",
    "ax.set_ylabel(f'MSD / {diff.msd.unit}')\n",
    "ax.set_xlim(0, None)\n",
    "ax.set_ylim(0, None)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Obviously something has gone wrong in this analysis. \n",
    "Unfortunately, this issue arises due to the numerical precision of computers, and there really isn't much that can be done to stop it. \n",
    "However, we can work around this issue. \n",
    "\n",
    "This problem arises due to the [condition number](https://en.wikipedia.org/wiki/Condition_number) of the covariance matrix being very high."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.linalg.cond(diff.diff.covariance_matrix.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This means that when the matrix is inverted, the result is very sensitive to small changes, on the scale of machine precision. \n",
    "\n",
    "`kinisi` tries its best to make sure that the matrix has a low enough condition number that this doesn't happen. \n",
    "But unfortunately, it is not always possible to achieve a low enough condition number. \n",
    "Therefore, we need to find a different solution. \n",
    "\n",
    "So far, the best solution that we have found has been to recondition the matrix in an effort to reduce the noise. \n",
    "We are working on a full assessment of the effectiveness of this approach and hope to have a preprint on the subject soon. \n",
    "To use the reconditioning, run. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.diffusion(1.5 * sc.Unit('ps'), recondition=True)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(diff.dt.values, diff.msd.values, 'k-')\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    ax.fill_between(diff.dt.values,\n",
    "                      *np.percentile(diff.distributions, ci, axis=1),\n",
    "                      alpha=alpha[i],\n",
    "                      color='#0173B2',\n",
    "                      lw=0)\n",
    "ax.set_xlabel(f'Time / {diff.dt.unit}')\n",
    "ax.set_ylabel(f'MSD / {diff.msd.unit}')\n",
    "ax.set_xlim(0, None)\n",
    "ax.set_ylim(0, None)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You will notice that this can significantly reduce the condition number. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.linalg.cond(diff.diff.covariance_matrix.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, this is not always successful. \n",
    "The next approach to resolve this issue is to change the time intervals, dt, that is used. \n",
    "Because we estimate the full covariance matrix, within reason, computing the MSD at every possible time interval is usually overkill. \n",
    "Instead, we can use a longer time interval sets, currently the time intervals are 0.02 ps apart."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.dt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will reduce that so there is 0.1 ps between the time intervals, which we can do by setting a custom set of time intervals.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_dt = sc.arange(dim='time interval', start=0.1 * sc.Unit('ps'),\n",
    "                   stop=4.1 * sc.Unit('ps'),\n",
    "                   step=0.1 * sc.Unit('ps'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The new `dt` must be a subset of the old `dt` values, the cell below checks this. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(np.all(np.any(np.isclose(new_dt.values[:, None], \n",
    "                               diff.dt.values[None, :]), \n",
    "                    axis=1)))\n",
    "\n",
    "params['dt'] = new_dt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running this through again, we can see that we get a much more reasonable value. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff = DiffusionAnalyzer.from_universe(u, **params)\n",
    "diff.diffusion(1.5 * sc.Unit('ps'))\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(diff.dt.values, diff.msd.values, 'k-')\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    ax.fill_between(diff.dt.values,\n",
    "                      *np.percentile(diff.distributions, ci, axis=1),\n",
    "                      alpha=alpha[i],\n",
    "                      color='#0173B2',\n",
    "                      lw=0)\n",
    "ax.set_xlabel(f'Time / {diff.dt.unit}')\n",
    "ax.set_ylabel(f'MSD / {diff.msd.unit}')\n",
    "ax.set_xlim(0, None)\n",
    "ax.set_ylim(0, None)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We welcome any user feedback on this problem, i.e., if you have a better way to mitigate it or remove it completely.  "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "kinisi-dev",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}