Jump diffusion coefficient from a VASP file#
Previously, we looked at obtaining accurate estimates for the self-diffusion coefficient with kinisi. Here, we look at how we can use kinisi to compute the jump diffusion coefficient.
[1]:
import numpy as np
import matplotlib.pyplot as plt
import scipp as sc
from kinisi.analyze import JumpDiffusionAnalyzer
from pymatgen.io.vasp import Xdatcar
np.random.seed(42)
rng = np.random.RandomState(42)
As with the previous example, params dictionary will describe the details about the simulation.
[2]:
params = {'specie': 'Li',
'time_step': 2.0 * sc.Unit('fs'),
'step_skip': 50 * sc.Unit('dimensionless'),
'progress': False
}
Again, we will add an additional key-value pair to the dictionary. This time, it is the “number of system particles”.
The computation of the jump diffusion coefficient requires the mean-squared total displacement (MSTD) to be found. The MSTD is the displacement of the centre of mass of all selected particles (the specie) in the simulation. This means that for a given simulation, there is only a single “atom” or system particle observed. Just a single system particle means that the statistics for the jump diffusion coefficient can be quite poor.
We can improve the statistics in the computation of the jump diffusion coefficient by splitting the simulation up into some number of individual simulations (in space) and therefore system particles. Then the MSTD over each of these system particles can be found, leading to reduced noise in the uncertainty estimation. The system_particles parameter defined how many times the simulation should be split, here we use 32 system particles.
[3]:
params['system_particles']= 32
[4]:
xd = Xdatcar('./example_XDATCAR.gz')
diff = JumpDiffusionAnalyzer.from_xdatcar(xd, **params)
In the above cells, we parse and determine the uncertainty on the mean-squared displacement as a function of the timestep. We should visualise this, to check that we are observing diffusion in our material and to determine the timescale at which this diffusion begins.
[5]:
fig, ax = plt.subplots()
ax.errorbar(diff.dt.values, diff.mstd.values, np.sqrt(diff.mstd.values))
ax.set_xlabel(f'Time / {diff.dt.unit}')
ax.set_ylabel(f'MSD / {diff.mstd.unit}')
ax.set_xlim(0, None)
ax.set_ylim(0, None)
plt.show()
We compare the variance with when there are fewer system particles used below. Note, the logarithmic y-axis.
[6]:
params['system_particles']= 4
diff_fewer = JumpDiffusionAnalyzer.from_xdatcar(xd, **params)
fig, ax = plt.subplots()
ax.plot(diff.dt.values, diff.mstd.variances, label='16 system particles')
ax.plot(diff_fewer.dt.values, diff_fewer.mstd.variances, label='8 system particles')
ax.set_xlabel(f'Time / {diff.dt.unit}')
ax.set_ylabel(f'MSTD variance / {(1 * diff.mstd.unit ** 2).unit}')
ax.set_xlim(0, None)
# ax.set_ylim(0, None)
ax.set_yscale('log')
ax.legend()
[6]:
<matplotlib.legend.Legend at 0x7f5510a47110>
This same process is relevant to the computation of the conductivity analyser, as the mean-squared charge distribution (MSCD) is the displacement of charge in the system and, therefore, suffers similarly as there is only a single “system particle” in the simulation.
We can visualise the data with 16 system particles on a log-log scale, which helps to reveal the diffusive regime (the region where the gradient stops changing).
[7]:
fix, ax = plt.subplots()
ax.errorbar(diff.dt.values, diff.mstd.values, np.sqrt(diff.mstd.values))
ax.axvline(4000, color='g')
ax.set_xlabel(f'Time / {diff.dt.unit}')
ax.set_ylabel(f'MSTD / {diff.mstd.unit}')
ax.set_yscale('log')
ax.set_xscale('log')
plt.show()
The green line at 4000 fs appears to be a reasonable estimate of the start of the diffusive regime. Therefore, we want to pass 4000 * sc.Units('fs') as the argument to the diffusion analysis below. At this stage, we pass the random_state argument to ensure reproducibility.
[8]:
start_of_diffusion = 4000 * sc.Unit('fs')
diff.jump_diffusion(start_of_diffusion, progress=False, random_state=rng)
This method estimates the correlation matrix between the timesteps and uses posterior sampling to find the jump diffusion coefficient, \(D*\) and intercept. We can find the mean of the marginal posterior samples of \(D*\):
[9]:
diff.D_J
[9]:
- (samples: 3200)float64cm^2/s(9.1+/-2.1)e-06
Values:
array([8.53122505e-06, 8.60297807e-06, 1.00684311e-05, ..., 7.87586184e-06, 1.16698080e-05, 7.95596515e-06], shape=(3200,))
Or we can get all of the sampled values from one of these objects.
[10]:
diff.D_J.values
[10]:
array([8.53122505e-06, 8.60297807e-06, 1.00684311e-05, ...,
7.87586184e-06, 1.16698080e-05, 7.95596515e-06], shape=(3200,))
The same for the intercept can be found for diff.intercept. A histogram of the marginal posterior probability distribution for \(D*\) can be plotted as shown below.
[11]:
fig, ax = plt.subplots()
ax.hist(diff.D_J.values, density=True)
ax.axvline(sc.mean(diff.D_J).value, c='k')
ax.set_xlabel(f'D* / {diff.D_J.unit}')
ax.set_ylabel(f'p(D*) / {(1 / diff.D_J.unit).unit}')
plt.show()
Having completed the analysis, we can save the object for use later (such as downstream by a plotting script).
[13]:
diff.to_hdf5('example.hdf')
The data is saved in a HDF5 file format which helps us efficiently store our data. We can then load the data with the following class method.
[14]:
loaded_diff = JumpDiffusionAnalyzer.from_hdf5('example.hdf')
We can plot the data with the credible intervals from the \(D\) and \(D_{\text{offset}}\) distribution.
[15]:
credible_intervals = [[16, 84], [2.5, 97.5], [0.15, 99.85]]
alpha = [0.6, 0.4, 0.2]
fig, ax = plt.subplots()
ax.plot(loaded_diff.dt.values, loaded_diff.mstd.values, 'k-')
for i, ci in enumerate(credible_intervals):
ax.fill_between(loaded_diff.dt.values,
*np.percentile(loaded_diff.distributions, ci, axis=1),
alpha=alpha[i],
color='#0173B2',
lw=0)
ax.set_xlabel(f'Time / {diff.dt.unit}')
ax.set_ylabel(f'MSD / {diff.mstd.unit}')
ax.set_xlim(0, None)
ax.set_ylim(0, None)
plt.show()
We can then do things like plot the joint probability distribution for the diffusion coefficient and intercept.
[16]:
from corner import corner
[17]:
corner(np.array([i.values for i in loaded_diff.flatchain.values()]).T,
labels=[' / '.join([k, str(v.unit)])
for k, v in loaded_diff.flatchain.items()])
plt.show()