Serious problem: incomplete calculations labeled as final


#1

It is well known that the final energy of a VASP relaxation run is meaningless due to basis set changes during relaxation. The Materials Project has become so popular and relied upon precisely because all of its data was supposed to follow best practices strictly: multi-task runs (at minimum a relaxation followed by a static calculation) and strict error-checking procedures. It looks like there are now entries in MP that do NOT have a static run, yet are labeled as properly calculated materials and used for further analysis. An example is mp-1079094, which has a single GGA relaxation task (involving a HUGE cell shape change!) yet is selected as a ground state.

The issue appears to be very serious. In particular, mp-1079094 shows up as the ground state for Sb2Te3, likely because of a ~1 eV/atom (!!!) error in the final energy. Specifically, mp-1079094 is a rare high-pressure phase (a single ICSD entry) which is nevertheless calculated to be >1 eV/atom lower in energy than the common phase (mp-1201, a few dozen ICSD entries). Plotting a phase diagram along GeTe - Sb2Te3 makes it very clear that mp-1079094 is a problematic entry: there is a clear line connecting mp-938 (GeTe) and mp-1201 (common Sb2Te3), and the energies of most Ge-Sb-Te polymorphs lie within 0.2 eV/atom of this line (with a few higher-energy structures, forming a very typical distribution). Yet there is a single entry 1 eV/atom below that line, and that entry is mp-1079094, which we know should be wrong because it uses the energy from a relaxation run…
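The tie-line check described above can be sketched in a few lines of plain Python. This is only an illustration: the function names, the entry labels, the threshold, and all energies below are made-up placeholders, not Materials Project data. It assumes each entry is given as an atomic fraction x along the pseudo-binary line (0 = left endpoint, 1 = right endpoint) and an energy per atom.

```python
def energy_relative_to_tieline(x, energy, left, right):
    """Energy (eV/atom) of a point relative to the straight line joining
    the two endpoint phases.

    x           -- atomic fraction along the line (0 = left, 1 = right)
    energy      -- energy per atom of the entry
    left, right -- energies per atom of the endpoint phases
    """
    return energy - ((1.0 - x) * left + x * right)


def flag_outliers(entries, left, right, threshold=-0.1):
    """Return ids of entries lying more than |threshold| eV/atom BELOW the
    tie-line. The threshold is an arbitrary illustrative choice."""
    return [
        entry_id
        for entry_id, x, energy in entries
        if energy_relative_to_tieline(x, energy, left, right) < threshold
    ]


# Placeholder numbers only (NOT real MP energies).
left_energy, right_energy = -0.10, -0.20
entries = [
    ("polymorph-a", 0.25, -0.05),  # slightly above the line
    ("polymorph-b", 0.50, -0.10),  # slightly above the line
    ("suspect", 1.00, -1.20),      # about 1 eV/atom below the line
]

flagged = flag_outliers(entries, left_energy, right_energy)
print(flagged)
```

An entry far below a line that every other polymorph sits on or above is not proof of an error, but it is a cheap signal that the entry deserves a closer look.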

It is very concerning, as it looks like the basic data integrity principles may have been compromised…


#2

Hi @SVB,

This is not a response to the overall issue raised here, since we’re still investigating. However, I wanted to clarify the following point:

An example is mp-1079094, which has a single GGA relaxation task (involving a HUGE cell shape change!)

Every “optimization” task on Materials Project is actually performed as a double optimization run, precisely to account for cases such as this where there is a huge cell shape change or huge volume change. So in the case of mp-1079094, although there was a dramatic overall change between the initial and final volume, the change over the course of the second optimization step was actually much smaller (~0.05%). This alone shouldn’t account for the discrepancy you’ve flagged.

I or another member of the MP team will update this thread as we investigate this issue further. Thanks for bringing this to our attention.

Best,

Matt


#3

Thanks Matt, this is a relief. I wasn’t able to tell from looking at the task that there were two relaxation runs (BTW, is there a way I can tell?), but that should indeed have taken care of most of the basis set error. I would guess the issue is more subtle, then (a problem with a VASP run? maybe k-mesh symmetrization failing?). Thanks a lot for investigating this!


#4

Yes, each task has its own task details page, which for this particular task is available here: https://materialsproject.org/tasks/mp-1079094/

You can see there are two graphs underneath, “iterative steps in first relaxation” and “iterative steps in second relaxation”.

As for the issue you reported, unfortunately there definitely is a problem here, mostly related to a subset of the new materials added in the most recent database release (v2019.02). We’re aware of the cause and hope to fix it in the next database release, and will include an explanation of what happened along with a list of the tasks affected.


#5

Thanks Matt, I hadn’t realized that the two plots implied two runs. Pretty obvious in retrospect, though.

Is there a way to mask those “new” materials in the queried data, for the sake of querying e_above_hull and doing other phase-diagram-related analysis? The only solution I have come up with so far is to look for the specific materials known to cause problems (e.g. mp-1079094) in PhaseDiagram.qhull_data, exclude them from there, and then manually pass the remaining data to qhull and redo the entire phase diagram analysis… This is inconvenient, error-prone, and requires major reworking of all the existing scripts and workflows… Are there any simpler solutions, maybe?

Thanks!
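One way to picture the masking asked about here is to drop known-bad entries before any hull is built, rather than editing the hull data afterwards. The sketch below does this in plain Python for a binary (or pseudo-binary) line. Everything in it is an assumption for illustration: the function names, the entry labels, and the energies are placeholders, not MP data or MP/pymatgen API. With pymatgen, the analogous step would presumably be filtering the list of entries by id before passing it to PhaseDiagram.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points (Andrew's monotone chain).
    Assumes at most one point per x value."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:  # hull[-1] lies on or above the segment hull[-2] -> p
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the composition range of the hull")


def e_above_hull(entries, exclude=()):
    """Map entry id -> energy above hull (eV/atom), ignoring excluded ids
    both as hull vertices and as results."""
    excluded = set(exclude)
    kept = [(eid, x, e) for eid, x, e in entries if eid not in excluded]
    lowest = {}  # lowest energy at each composition
    for _, x, e in kept:
        lowest[x] = min(e, lowest.get(x, e))
    hull = lower_hull(lowest.items())
    return {eid: e - hull_energy(hull, x) for eid, x, e in kept}


# Placeholder numbers only (NOT real MP energies).
entries = [
    ("endpoint-a", 0.0, -0.10),
    ("mixed", 0.5, -0.12),
    ("common", 1.0, -0.20),
    ("suspect", 1.0, -1.20),  # the known-bad entry to mask
]

raw = e_above_hull(entries)
masked = e_above_hull(entries, exclude={"suspect"})
print(raw["common"], masked["common"])
```

Because the filter runs before hull construction, every downstream quantity (energy above hull, decomposition products) is computed as if the masked entry never existed, and existing analysis code only needs the one extra filtering step.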


#6

Just wanted to check if there are any updates on, or solutions to, the issue with the wrong entries / compromised data? Phase diagrams involving Sb2Te3 are still all totally wrong… I just downloaded the MP files and reran the calculations for a few Sb2Te3 polymorphs: for mp-1201, I reproduce the MP energy (and the downloaded geometry is indeed fully relaxed), whereas for mp-1079094, the downloaded (presumably fully relaxed?) structure undergoes a huge relaxation, after which its energy is ~0.2 eV/atom higher than that of mp-1201, rather than 1 eV/atom lower as MP shows.

I don’t quite understand why such entries are not taken offline, rolling back to the older database version if necessary. The data presented on the MP site (and API) are just terribly wrong, and each individual incorrect compound potentially spoils ALL phase diagrams, reactions, etc., in the chemical systems involving it as a subsystem! So far mp-1079094 is the only one where I have seen a 1 eV/atom error, but clearly there are other cases: e.g. plotting the Mg-Al phase diagram suggests a problem with mp-1185596, and I take it from your earlier reply that there are also other cases you’ve encountered. Why keep compromised data online? It makes the entire database unreliable (read “useless”)…

Thanks,
–Sergey


#7

Hi @SVB,

We’re still discussing this internally. Rolling back to a previous database version until this issue is resolved is definitely an option, but would also mean that certain mp-ids would go temporarily “missing”, which is problematic for other reasons. This might however be the best option for the time being.

Beyond this specific issue, we’re also discussing ways to better share historical versions of the database, and to allow users to opt-in to preview new versions of the database, so that we can better handle this in future if any similar issues occur.

Resolving this is absolutely a priority for us. Thanks for your patience while we get it fixed.

Matt