Comparison with pymatgen#

The pymatgen project also has tools capable of calculating the mean-squared displacement and diffusion coefficient from a relevant input. So why should you use kinisi over pymatgen?

The simple answer is that the approach taken by kinisi, which is outlined in the methodology, uses a higher precision approach to estimate the diffusion coefficent and offers an accurate estimate in the variance of the mean-squared displacements and diffusion coefficient from a single simulation.

In this notebook, we will compare the results from pymatgen and kinisi. First we will import the kinisi and pymatgen DiffusionAnalyzer classes.

[1]:
import numpy as np
from kinisi.analyze import DiffusionAnalyzer as KinisiDiffusionAnalyzer
from pymatgen.analysis.diffusion.analyzer import DiffusionAnalyzer as PymatgenDiffusionAnalyzer
from pymatgen.io.vasp import Xdatcar
np.random.seed(42)

The kinisi.DiffusionAnalyzer API was based on the pymatgen equivalent, therefore, the two take the same inputs and can parse the Xdatcar.structures.

[2]:
p_params = {'specie': 'Li',
            'time_step': 2.0,
            'step_skip': 50
            }
[3]:
xd = Xdatcar('./example_XDATCAR.gz')
/home/docs/checkouts/readthedocs.org/user_builds/kinisi/envs/latest/lib/python3.12/site-packages/pymatgen/io/vasp/outputs.py:4496: EncodingWarning: We strongly encourage explicit `encoding`, and we would use UTF-8 by default as per PEP 686
  file_len = sum(1 for _ in zopen(filename, mode="rt"))
/home/docs/checkouts/readthedocs.org/user_builds/kinisi/envs/latest/lib/python3.12/site-packages/pymatgen/io/vasp/outputs.py:4499: EncodingWarning: We strongly encourage explicit `encoding`, and we would use UTF-8 by default as per PEP 686
  with zopen(filename, mode="rt") as file:

We can then run both the pymagten analysis and the kinisi analysis (the pymatgen requires and additional temperature keyword which is not used in this example).

[4]:
pymatgen_diff = PymatgenDiffusionAnalyzer.from_structures(
    xd.structures, temperature=300, **p_params)
[5]:
p_params['progress'] = False
u_params = {'progress': False}

kinisi_diff = KinisiDiffusionAnalyzer.from_pymatgen(
    xd.structures, parser_params=p_params, uncertainty_params=u_params)
/home/docs/checkouts/readthedocs.org/user_builds/kinisi/envs/latest/lib/python3.12/site-packages/kinisi/parser.py:116: UserWarning: Converting triclinic cell to orthorhombic this may have unexpected results. Triclinic a, b, c are not equilivalent to orthorhombic x, y, z.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/kinisi/envs/latest/lib/python3.12/site-packages/kinisi/parser.py:437: UserWarning: UserWarning: Be aware that the expected input unit for 'time_step' is femtoseconds, not picoseconds.
  warnings.warn("UserWarning: Be aware that the expected input unit for 'time_step' is femtoseconds, not picoseconds.")

Now we can plot the mean-squared displacement from each to check agreement, the pymatgen time units are femtoseconds so these are adjusted.

[6]:
import matplotlib.pyplot as plt
[7]:
plt.plot(pymatgen_diff.dt / 1000, pymatgen_diff.msd, label='pymatgen')
plt.plot(kinisi_diff.dt, kinisi_diff.msd, label='kinisi')
plt.legend()
plt.ylabel('MSD/Å$^2$')
plt.xlabel(r'$\Delta t$/ps')
plt.show()
_images/pymatgen_10_0.png

We can see that the results overlap almost entirely.

However, this doesn’t show the benefits for using kinisi over pymatgen. The first benefit is that kinisi will accurately estimate the variance in the observed mean-squared displacements, giving error bars for the above plot.

[8]:
plt.errorbar(kinisi_diff.dt, kinisi_diff.msd, kinisi_diff.msd_std, c='#ff7f0e')
plt.ylabel('MSD/Å$^2$')
plt.xlabel(r'$\Delta t$/ps')
plt.show()
_images/pymatgen_12_0.png

The second benefit is that kinisi will estimate the diffusion coefficient with an accurate uncertainty. pymatgen also estimates this uncertainty, however, pymatgen assumes that the data is independent and applies weighted least squares. However, mean-squared displacement observations are inherently dependent (as discussed in the thought experiment in the methodology), so kinisi accounts for this and applied a generalised least squares style approach. This means that the estimated variance in the diffusion coefficient from kinisi is accurate (while, pymatgen will heavily underestimate the value) and given the BLUE nature of the GLS approach, kinisi has a higher probability of determining a value for the diffusion coefficient closer to the true diffusion coefficient.

[9]:
kinisi_diff.diffusion(kinisi_diff.ngp_max, {'progress': False})
[10]:
from uncertainties import ufloat
[11]:
print('D from pymatgen:',
      ufloat(pymatgen_diff.diffusivity, pymatgen_diff.diffusivity_std_dev))
D from pymatgen: (1.33307+/-0.00008)e-05
[12]:
print('D from kinisi:',
      ufloat(np.mean(kinisi_diff.D), np.std(kinisi_diff.D, ddof=1)))
D from kinisi: (1.53+/-0.07)e-05

The comparison between weighted and generalised least squared estimators will be discussed in full in a future publication covering the methodology of kinisi.