{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "yeh-hummer-intro",
   "metadata": {},
   "source": [
    "# Finite-size corrections with Yeh-Hummer\n",
    "\n",
    "When performing molecular dynamics simulations with periodic boundary conditions, the calculated diffusion coefficients are affected by finite-size effects. The `kinisi` package includes tools to apply the Yeh-Hummer finite-size correction, which extrapolates the diffusion coefficient to the infinite system size limit.\n",
    "\n",
    "In this tutorial, we will demonstrate how to use the `YehHummer` class to correct diffusion coefficients obtained from simulations with different box sizes.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "\n",
    "The Yeh-Hummer correction is based on the work by Yeh & Hummer (J. Phys. Chem. B 2004, 108, 15873-15879). The method assumes that the finite-size effects follow a linear relationship with the inverse box length.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "imports",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import scipp as sc\n",
    "\n",
    "from kinisi.yeh_hummer import YehHummer\n",
    "\n",
    "np.random.seed(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "data-setup",
   "metadata": {},
   "source": [
    "## Setting up the data\n",
    "\n",
    "For this tutorial, we will use the TIP3P water diffusion data from the original Yeh & Hummer paper. This data represents diffusion coefficients calculated from molecular dynamics simulations of water at different box sizes.\n",
    "\n",
    "For obtaining diffusion coefficients from your simulations, you can use the `kinisi.analyze.DiffusionAnalyzer` (from `DiffusionAnalyzer.D`) by following the steps outlined in the [Diffusion coefficient from a VASP file](./vasp_d.html) tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "tip4p-data",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TIP3P water data from Yeh & Hummer paper\n",
    "box_lengths = np.array([18.58, 23.42, 29.51, 37.19, 46.86])  # Angstroms\n",
    "D_values = np.array([4.884e-5, 5.123e-5, 5.315e-5, 5.466e-5, 5.590e-5])  # cm^2/s\n",
    "D_errors = np.array([0.032e-5, 0.027e-5, 0.014e-5, 0.011e-5, 0.013e-5])  # cm^2/s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dataarray-creation",
   "metadata": {},
   "source": [
    "We need to organize this data into a `scipp.DataArray` with the appropriate coordinates and variances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "create-dataarray",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = sc.DataArray(\n",
    "    data=sc.array(dims=['system'], values=D_values, variances=D_errors**2, unit='cm^2/s'),\n",
    "    coords={'box_length': sc.Variable(dims=['system'], values=box_lengths, unit='angstrom')},\n",
    ")\n",
    "td"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "yeh-hummer-object",
   "metadata": {},
   "source": [
    "## Creating the YehHummer object\n",
    "\n",
    "With our data prepared, we can now create a `YehHummer` object. The temperature is required for the viscosity calculation. You can optionally provide bounds for the fitting parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "create-yeh-hummer",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create YehHummer object with temperature 298 K\n",
    "yh = YehHummer(td, temperature=sc.scalar(value=298, unit='K'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "mcmc-sampling",
   "metadata": {},
   "source": [
    "## MCMC sampling for uncertainty estimation\n",
    "\n",
    "To obtain full probability distributions for the parameters, we can perform Markov chain Monte Carlo (MCMC) sampling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "run-mcmc",
   "metadata": {},
   "outputs": [],
   "source": [
    "yh.mcmc(n_samples=500, n_walkers=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "parameter-distributions",
   "metadata": {},
   "source": [
    "After MCMC sampling, we can visualize the probability distributions for the fitted parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "corner-plot",
   "metadata": {},
   "outputs": [],
   "source": [
    "from corner import corner\n",
    "\n",
    "corner(\n",
    "    np.array([i.values for i in yh.flatchain.values()]).T,\n",
    "    labels=[' / '.join([k, str(v.unit)]) for k, v in yh.flatchain.items()],\n",
    ")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "results-access",
   "metadata": {},
   "source": [
    "## Accessing the results\n",
    "\n",
    "The infinite-system diffusion coefficient and shear viscosity are now available as probability distributions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "show-d-infinite",
   "metadata": {},
   "outputs": [],
   "source": [
    "yh.D_infinite"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "show-viscosity",
   "metadata": {},
   "outputs": [],
   "source": [
    "yh.shear_viscosity"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "plotting-results",
   "metadata": {},
   "source": [
    "## Plotting the results\n",
    "\n",
    "We can now plot the linear relationship between the diffusion coefficient and the inverse box length with credible intervals.\n",
    "For the estimation of the diffusion coefficient $D_\\infty$ we extrapolate to infinite box size $\\frac{1}{L} \\to 0$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "plot-results",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "\n",
    "credible_intervals = [[16, 84], [2.5, 97.5], [0.15, 99.85]]\n",
    "alpha = [0.6, 0.4, 0.2]\n",
    "\n",
    "inv_L_data = 1 / yh.box_lengths.values\n",
    "max_inv_L = np.max(inv_L_data)\n",
    "inv_L_extended = np.linspace(0, max_inv_L * 1.1, 50)\n",
    "\n",
    "ax.errorbar(\n",
    "    inv_L_data,\n",
    "    yh.diffusion.values,\n",
    "    np.sqrt(yh.diffusion.variances),\n",
    "    marker='o',\n",
    "    ls='',\n",
    "    color='k',\n",
    "    zorder=10,\n",
    ")\n",
    "\n",
    "D_0_samples = yh.data_group['D_0'].values\n",
    "slope_samples = yh.data_group['slope'].values\n",
    "n_samples = len(D_0_samples)\n",
    "predictions_extended = np.zeros((len(inv_L_extended), n_samples))\n",
    "\n",
    "for i in range(n_samples):\n",
    "    predictions_extended[:, i] = D_0_samples[i] - slope_samples[i] * inv_L_extended\n",
    "\n",
    "for i, ci in enumerate(credible_intervals):\n",
    "    ax.fill_between(\n",
    "        inv_L_extended,\n",
    "        *np.percentile(predictions_extended, ci, axis=1),\n",
    "        alpha=alpha[i],\n",
    "        color='#0173B2',\n",
    "        lw=0,\n",
    "    )\n",
    "\n",
    "ax.set_xlabel(f'1 / L / {yh.box_lengths.unit}')\n",
    "ax.set_ylabel(f'D  / {yh.diffusion.unit}')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "understanding-results",
   "metadata": {},
   "source": [
    "## Understanding the results\n",
    "\n",
    "The Yeh-Hummer correction provides two important quantities:\n",
    "\n",
    "1. **Infinite-system diffusion coefficient ($D_{\\infty}$)**: This is the extrapolated diffusion coefficient for an infinite system.\n",
    "\n",
    "2. **Shear viscosity ($\\eta$)**: This is estimated from the slope of the linear relationship:\n",
    "\n",
    "$$D_{\\text{PBC}} = D_\\infty - \\frac{k_B T \\xi}{6\\pi\\eta L}$$\n",
    "\n",
    "where:\n",
    "- $D_{\\text{PBC}}$ is the diffusion coefficient from the periodic simulation\n",
    "- $D_\\infty$ is the infinite-system diffusion coefficient\n",
    "- $k_B$ is the Boltzmann constant\n",
    "- $T$ is the temperature\n",
    "- $\\xi$ constant from Ewald summation (2.837297 for cubic boxes)\n",
    "- $\\eta$ is the shear viscosity\n",
    "- $L$ is the box length"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "kinisi",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}