Error querying for a specific material ID


#1

Querying for mp-868324 using the pymatgen API reproducibly produces an error today. It works fine for other mpIDs, and I believe it worked for this specific mpID as well about a week ago. The material also looks fine on the website, but not via a pymatgen query:

IndexError Traceback (most recent call last)
in ()
----> 1 trystr=MPRester().get_structure_by_material_id("mp-868324")

…/anaconda/lib/python2.7/site-packages/pymatgen/matproj/rest.pyc in get_structure_by_material_id(self, material_id, final)
379 prop = "final_structure" if final else "initial_structure"
380 data = self.get_data(material_id, prop=prop)
--> 381 return data[0][prop]
382
383 def get_entry_by_material_id(self, material_id, compatible_only=True,

IndexError: list index out of range


UPDATE: a few other materials produce the same problem:
mp-868324, mvc-6622, mvc-6101, mvc-5897, mvc-1832, mvc-4908, mvc-5559, mp-867731
All of them could be queried fine just a couple of weeks ago, and in fact were returned as ground states for specific chemical systems. It would be nice if there were a way to rely on the data returned by the API for more than just a week… :)


#2

Hi @SVB,

These materials and tasks are still present in the database.

Each material is a group of task ids, of which one is chosen to be the canonical material id.

We make every effort to keep the canonical material id the same throughout database updates, even as new tasks are added to that individual material. However, a bug that affected some tasks in particular (including mvc tasks) led to material ids changing. As far as I know this bug has been corrected, but the material ids you accessed a few weeks ago are likely among those affected.

The good news is this is very easy to fix. You can either:

  1. Find the new canonical material id using get_materials_id_from_task_id, e.g.:
mpr.get_materials_id_from_task_id('mp-868324')  # returns mp-861852

or

  2. Change your query to look in the task_ids list (advanced use):
docs = mpr.query({'task_ids':'mp-868324'}, ['structure'])

This returns the structure for the material which contains the mp-868324 task. To explain how this works, you can see that for this material, the database entry contains:

{'material_id': 'mp-861852', 'task_ids': ['mp-861852', 'mp-863934', 'mp-868324', 'mp-869053', 'mp-869224', 'mp-863263', 'mp-869019']}

(A single material id has multiple tasks associated with it because each time we add a calculation for a new property, e.g. an elastic tensor or band structure, a new calculation (task) has to be performed. That task is given its own unique task_id and is then grouped together with the other tasks for the same material.)
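
To make the grouping concrete, here is a minimal sketch that inverts the entry above into a lookup table from task id to canonical material id. The helper name build_task_index is hypothetical (it is not part of pymatgen), and the docs list is simply the entry quoted above used as stand-in data rather than a live query result:

```python
def build_task_index(docs):
    """Map every task id to the canonical material id of its group."""
    index = {}
    for doc in docs:
        for task_id in doc["task_ids"]:
            index[task_id] = doc["material_id"]
    return index


# The database entry quoted above, used as stand-in data.
docs = [{
    "material_id": "mp-861852",
    "task_ids": ["mp-861852", "mp-863934", "mp-868324", "mp-869053",
                 "mp-869224", "mp-863263", "mp-869019"],
}]

task_index = build_task_index(docs)
print(task_index["mp-868324"])  # prints mp-861852
```

Any id in the group, including the canonical one, resolves to the same material id, which is why a query on task_ids still finds the material after its canonical id has changed.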

I hope this helps, and I apologize for the confusion! You are definitely not the only person confused by this, and we’re working to make sure material ids do not change wherever possible.

Best,

Matt


#3

Thanks Matt,
A try-except handler with your workaround seems to work fine. (I assume there is a reason you don’t want to implement it on the server side for backward compatibility?)
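
For anyone else hitting this, a minimal sketch of such a handler follows. It assumes mpr is an MPRester instance and that a stale id shows up as the IndexError from the traceback above; the function name get_structure_for_any_id is hypothetical and not part of pymatgen, and the check for an empty lookup result is a defensive assumption rather than documented behaviour:

```python
def get_structure_for_any_id(mpr, mp_id, final=True):
    """Fetch a structure by material id, falling back to a task-id lookup."""
    try:
        return mpr.get_structure_by_material_id(mp_id, final=final)
    except IndexError:
        # Empty result: mp_id is probably a task id that is no longer
        # the canonical material id, so look up the current one.
        material_id = mpr.get_materials_id_from_task_id(mp_id)
        if not material_id:
            raise LookupError("no material found for id %r" % (mp_id,))
        return mpr.get_structure_by_material_id(material_id, final=final)


# Usage (needs pymatgen, network access and an API key):
# structure = get_structure_for_any_id(MPRester(), "mp-868324")
```

Passing the rester in as an argument keeps the helper independent of how the MPRester is configured.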

BTW, is this issue maybe related to the other issue I just reported?

Thanks,
–Sergey


#4

Matt, if I can ask one more question:

Could you please clarify if I could also pull the data specifically for a particular task? Specifically, there seems to be an issue with mp-5986, which is supposed to be the tetragonal phase of BaTiO3 but returns a different structure after the database upgrade (I complained about this issue in a separate thread).

However, if I check the web page for task mp-5986 it correctly shows the tetragonal phase. So presumably I could just pull the data for that specific task as a workaround to get the desired tetragonal structure. However, as you mentioned, mpr.query({'task_ids': ...}, ['structure']) returns the structure assigned to the material_id, not the task_id. I tried mpr.query({'task_ids':'mp-5986'}, ['final_structure']) instead, but it also returns the final structure for the material, NOT the one for the task (the latter should have a=b=4.002, c=4.216, alpha=beta=gamma=90 according to the web page). How can I pull the results of a particular task, not overridden by the material_id results?

Thanks a lot!
–Sergey

P.S. Also, is there maybe a way to access the old database version, or has it been taken offline permanently?


#5

Hi Sergey,

Could you please clarify if I could also pull the data specifically for a particular task?

mpr.get_task_data() returns some information on individual tasks. We are also planning to expand our capabilities so that you can run a query on the tasks directly (e.g. mpr.query_task()), but this is not yet available.

Specifically, there seems to be an issue with mp-5986 that is supposed to be the tetragonal phase of BaTiO3 but returns a different structure after the database upgrade

Due to issues with numerical precision and other related factors, we perform symmetry analysis and structure matching subject to certain tolerances when grouping tasks together (the code that does this is all open source; I can find the specific lines if this is useful). However, there’s no one perfect cut-off. I haven’t examined this specific material, but if I had to guess, I’d say that it’s being grouped with the tetragonal tasks at this step.
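
As a toy illustration of why the cut-off matters (this is not the actual grouping code), the sketch below compares cell lengths under a fractional tolerance. The function name, the cubic reference cell and both tolerance values are made up for illustration; only the tetragonal lengths are the ones quoted for task mp-5986 earlier in this thread:

```python
def lengths_match(cell_a, cell_b, ltol):
    """True if every pair of cell lengths differs by at most a fraction ltol."""
    return all(abs(a - b) <= ltol * max(a, b) for a, b in zip(cell_a, cell_b))


tetragonal = (4.002, 4.002, 4.216)  # a, b, c quoted for task mp-5986
cubic_like = (4.002, 4.002, 4.002)  # hypothetical cubic cell, illustration only

print(lengths_match(tetragonal, cubic_like, ltol=0.2))   # loose tolerance: True
print(lengths_match(tetragonal, cubic_like, ltol=0.01))  # tight tolerance: False
```

The c axis differs by about 5%, so whether the two cells count as "the same" depends entirely on where the tolerance is set.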

P.S. Also, is there maybe a way to access the old database version, or has it been taken offline permanently?

We have recently started keeping previous database versions stored and available live, however I’m not sure if these are publicly accessible in any way yet – maybe @dwinston can comment further?

Hope this helps,

Matt


#6

We have recently started keeping previous database versions stored and available live, however I’m not sure if these are publicly accessible in any way yet – maybe @dwinston can comment further?

This is in the works, but we’re still trying to figure out how to make this accessible for everyone. Initially, we’ll likely just distribute links to the database dumps, but there might be a live look-back in the future.


#7

but there might be a live look-back in the future.

There’s no reason we couldn’t add a new API option or MPRester.query() argument to specify which materials collection to query, with a default to the latest database release, and a new API route to return which past releases are available for live querying.