{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Diffusion coefficient from a VASP file\n",
    "\n",
    "[Previously](./vasp_msd.html), we looked at obtaining accurate estimates for the mean-squared displacement with `kinisi`. \n",
    "Here, we show using the same class to evaluate the diffusion coefficient, using the `kinisi` [methodology](./methodology.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from kinisi.analyze import DiffusionAnalyzer\n",
    "from pymatgen.io.vasp import Xdatcar\n",
    "np.random.seed(42)\n",
    "rng = np.random.RandomState(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There the `p_params` dictionary describes details about the simulation, and are documented in the [parser module](./parser.html) (also see the [MSD Notebook](./vasp_msd.html))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_params = {'specie': 'Li',\n",
    "            'time_step': 2.0,\n",
    "            'step_skip': 50}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While the `b_params` dictionary describes the details of the bootstrapping process, these are documented in the [diffusion module](./diffusion.html#kinisi.diffusion.MSDBootstrap). Here, we indicate that we only want to investigate diffusion in the *xy*-plane of the system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u_params = {'dimension': 'xy'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xd = Xdatcar('./example_XDATCAR.gz')\n",
    "diff = DiffusionAnalyzer.from_Xdatcar(xd, parser_params=p_params, uncertainty_params=u_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above cells, we parse and determine the uncertainty on the mean-squared displacement as a function of the timestep. \n",
    "We should visualise this, to check that we are observing diffusion in our material and to determine the timescale at which this diffusion begins. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.errorbar(diff.dt, diff.msd, diff.msd_std)\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel('$\\Delta t$/ps')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can visualise this on a log-log scale, which helps to reveal the diffusive regime (the region where the gradient stops changing)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.errorbar(diff.dt, diff.msd, diff.msd_std)\n",
    "plt.axvline(2, color='g')\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel('$\\Delta t$/ps')\n",
    "plt.yscale('log')\n",
    "plt.xscale('log')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The green line at 2 ps appears to be a reasonable estimate of the start of the diffusive regime. \n",
    "Therefore, we want to pass `2` as the argument to the diffusion analysis below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.diffusion(2, diffusion_params={'random_state': rng})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This method estimates the correlation matrix between the timesteps and uses likelihood sampling to find the diffusion coefficient, $D$ and intercept (the units are cm\\ :sup:`2`\\ /s and Å\\ :sup:`2`  respectively).\n",
    "Now we can probe these objects.\n",
    "\n",
    "We can get the median and 95 % confidence interval using, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.D.n, diff.D.ci()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.intercept.n, diff.intercept.ci()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we can get all of the sampled values from one of these objects. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.D.samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "Having completed the analysis, we can save the object for use later (such as downstream by a plotting script). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "source": [
    "These hidden cells exist to remove any existing `example.hdf` file on builiding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [],
   "source": [
    "!rm example.hdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff.save('example.hdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is saved in a HDF5 file format which helps us efficiently store our data. \n",
    "We can then load the data with the following class method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loaded_diff = DiffusionAnalyzer.load('example.hdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can plot the data with the credible intervals from the $D$ and $D_{\\text{offset}}$ distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "credible_intervals = [[16, 84], [2.5, 97.5], [0.15, 99.85]]\n",
    "alpha = [0.6, 0.4, 0.2]\n",
    "\n",
    "plt.plot(loaded_diff.dt, loaded_diff.msd, 'k-')\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    plt.fill_between(loaded_diff.dt,\n",
    "                     *np.percentile(loaded_diff.distribution, ci, axis=1),\n",
    "                     alpha=alpha[i],\n",
    "                     color='#0173B2',\n",
    "                     lw=0)\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel('$\\Delta t$/ps')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, we can visualise the distribution of the diffusion coefficient that has been determined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.hist(loaded_diff.D.samples, density=True)\n",
    "plt.axvline(loaded_diff.D.n, c='k')\n",
    "plt.xlabel('$D$/cm$^2$s$^{-1}$')\n",
    "plt.ylabel('$p(D$/cm$^2$s$^{-1})$')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or the joint probability distribution for the diffusion coefficient and intercept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from corner import corner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "corner(loaded_diff.flatchain, labels=['$D$/cm$^2$s$^{-1}$', 'intercept/Å$^2$'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above, the posterior distribution is plotted as a function of $\\Delta t$.\n",
    "`kinisi` can also compute the [posterior predictive distribution](https://en.wikipedia.org/wiki/Posterior_predictive_distribution) of mean-squared displacement data that would be expected from simulations, given the set of linear models for the _ensemble_ MSD described by the posterior distribution, above.\n",
    "Calculating this posterior predictive distribution involves an additional sampling procedure that samples probable simulated MSDs for each of the probable linear models in the (previously sampled) posterior distribution.\n",
    "The posterior predictive distribution captures uncertainty in the true ensemble MSD and uncertainty in the simulated MSD data expected for each ensemble MSD model, and so has a larger variance than the ensemble MSD posterior distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ppd = loaded_diff.posterior_predictive({'n_posterior_samples': 128,\n",
    "                                        'progress': False})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(loaded_diff.dt, loaded_diff.msd, 'k-')\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    plt.fill_between(loaded_diff.dt[14:],\n",
    "                     *np.percentile(ppd.T, ci, axis=1),\n",
    "                     alpha=alpha[i],\n",
    "                     color='#0173B2',\n",
    "                     lw=0)\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel('$\\Delta t$/ps')\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}