{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bfc0c1b1-3769-4945-adaf-8336fc7117ba",
   "metadata": {},
   "source": [
    "# Comparison with pymatgen\n",
    "\n",
    "The `pymatgen` project also has [tools capable of calculating the mean-squared displacement and diffusion coefficient](https://pymatgen.org/addons#add-ons-for-analysis) from a relevant input. \n",
    "So why should you use `kinisi` over `pymatgen`?\n",
    "\n",
    "The simple answer is that the approach taken by `kinisi`, which is outlined in the [methodology](./methodology.html), uses a higher precision approach to estimate the diffusion coefficent and offers an accurate estimate in the variance of the mean-squared displacements and diffusion coefficient from a single simulation. \n",
    "\n",
    "In this notebook, we will compare the results from `pymatgen` and `kinisi`. \n",
    "First we will import the `kinisi` and `pymatgen` `DiffusionAnalyzer` classes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a4f9657-1ac8-4e7c-90d2-3970bce9b318",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from kinisi.analyze import DiffusionAnalyzer as KinisiDiffusionAnalyzer\n",
    "from pymatgen.analysis.diffusion.analyzer import DiffusionAnalyzer as PymatgenDiffusionAnalyzer\n",
    "from pymatgen.io.vasp import Xdatcar\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29e4e68d-8032-4e41-bd6c-86624f663ba9",
   "metadata": {},
   "source": [
    "The `kinisi.DiffusionAnalyzer` API was based on the `pymatgen` equivalent, therefore, the two take the same inputs and can parse the `Xdatcar.structures`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff13aa9f-fc85-484b-8b09-37932b45aec1",
   "metadata": {},
   "outputs": [],
   "source": [
    "p_params = {'specie': 'Li',\n",
    "            'time_step': 2.0,\n",
    "            'step_skip': 50\n",
    "            }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "759490f5-26ed-47d2-9704-9d32465efbc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "xd = Xdatcar('./example_XDATCAR.gz')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fa3cfb3-bd45-407a-a0d7-6ad4ffe3e5a8",
   "metadata": {},
   "source": [
    "We can then run both the `pymagten` analysis and the `kinisi` analysis (the `pymatgen` requires and additional `temperature` keyword which is not used in this example). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdb0a4aa-6ba4-48ab-9f65-b0ab2b8e15a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "pymatgen_diff = PymatgenDiffusionAnalyzer.from_structures(\n",
    "    xd.structures, temperature=300, **p_params)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d8cd6e2-11d4-4926-9119-a7d81da4f3f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "p_params['progress'] = False\n",
    "u_params = {'progress': False}\n",
    "\n",
    "kinisi_diff = KinisiDiffusionAnalyzer.from_pymatgen(\n",
    "    xd.structures, parser_params=p_params, uncertainty_params=u_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc65f185-3e64-40e9-8200-696f9366715a",
   "metadata": {},
   "source": [
    "Now we can plot the mean-squared displacement from each to check agreement, the `pymatgen` time units are femtoseconds so these are adjusted. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3aa84e8-0876-42e3-92c0-28164ac2dd00",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c34728b-6aa1-495f-a2c5-31cba27b0b30",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(pymatgen_diff.dt / 1000, pymatgen_diff.msd, label='pymatgen')\n",
    "plt.plot(kinisi_diff.dt, kinisi_diff.msd, label='kinisi')\n",
    "plt.legend()\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel(r'$\\Delta t$/ps')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3af71036-f616-416f-9a11-082135f00b9e",
   "metadata": {},
   "source": [
    "We can see that the results overlap almost entirely.\n",
    "\n",
    "However, this doesn't show the benefits for using `kinisi` over `pymatgen`. \n",
    "The first benefit is that `kinisi` will accurately estimate the variance in the observed mean-squared displacements, giving error bars for the above plot. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82b2379b-5dbd-4d57-937c-1b84e318bf6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.errorbar(kinisi_diff.dt, kinisi_diff.msd, kinisi_diff.msd_std, c='#ff7f0e')\n",
    "plt.ylabel('MSD/Å$^2$')\n",
    "plt.xlabel(r'$\\Delta t$/ps')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50fa7a3d-1171-4896-a2ce-36370a7a9620",
   "metadata": {},
   "source": [
    "The second benefit is that `kinisi` will estimate the diffusion coefficient with an accurate uncertainty. \n",
    "`pymatgen` also estimates this uncertainty, however, `pymatgen` assumes that the data is independent and applies [weighted least squares](https://en.wikipedia.org/wiki/Weighted_least_squares). \n",
    "However, mean-squared displacement observations are inherently dependent (as discussed in the [thought experiment in the methodology](https://kinisi.readthedocs.io/en/latest/methodology.html#Understanding-the-correlation-between-measurements)), so `kinisi` accounts for this and applied a [generalised least squares](https://en.wikipedia.org/wiki/Generalized_least_squares) style approach. \n",
    "This means that the estimated variance in the diffusion coefficient from `kinisi` is accurate (while, `pymatgen` will heavily underestimate the value) and given the [BLUE](https://en.wikipedia.org/wiki/Gauss–Markov_theorem#Generalized_least_squares_estimator) nature of the GLS approach, `kinisi` has a higher probability of determining a value for the diffusion coefficient closer to the true diffusion coefficient. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b53004e8-4550-4a7d-9b7f-42c478529496",
   "metadata": {},
   "outputs": [],
   "source": [
    "kinisi_diff.diffusion(kinisi_diff.ngp_max, {'progress': False})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f40d9b74-15d8-4971-90b2-4a8771fb7698",
   "metadata": {},
   "outputs": [],
   "source": [
    "from uncertainties import ufloat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65bdd80a-9396-40e3-b45b-a09862a79534",
   "metadata": {},
   "outputs": [],
   "source": [
    "print('D from pymatgen:', \n",
    "      ufloat(pymatgen_diff.diffusivity, pymatgen_diff.diffusivity_std_dev))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4933f40b-1b26-4de5-be07-f34e7c108b27",
   "metadata": {},
   "outputs": [],
   "source": [
    "print('D from kinisi:', \n",
    "      ufloat(np.mean(kinisi_diff.D), np.std(kinisi_diff.D, ddof=1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "555bc479-d641-47d7-8421-5dea4ab14555",
   "metadata": {},
   "source": [
    "The comparison between weighted and generalised least squared estimators will be discussed in full in a future publication covering the methodology of `kinisi`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "kinisi",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}