Downloading all piezoelectric materials



I’m writing to inquire about downloading the structure-property metadata for the 941 piezoelectric materials analyzed in de Jong et al.’s 2015 paper in Nature Scientific Data. Rather than querying each structure based on specific properties, we are interested in downloading and analyzing the entire set of 941. Is there an efficient way for us to do this?

Thank you!


Yes, you can use our API to get the data you need. Using pymatgen.MPRester,

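from pymatgen import MPRester
# With no argument, MPRester reads your API key from the pymatgen settings (PMG_MAPI_KEY).
mpr = MPRester()
# Match every material that has a 'piezo' field; return only its ID and piezoelectric data.
data = mpr.query({'piezo': {'$exists': True}}, ['material_id', 'piezo'])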

You can read more about use of our API, in particular the powerful /rest/v2/query endpoint in conjunction with MongoDB query syntax, here.


Hi @dwinston, is it possible to do this directly from the Materials Project website, without writing code? I understand that one can use the API to obtain a JSON file containing all the information. Can I also download the structure files (.cif) directly to a specified location, as is usually the case when one downloads from the website? For example, I want to download all the materials whose band gap lies between 0.5 and 1.2 eV, or all the materials that belong to the Zintl phase compounds. How would I approach this problem? Is there a general way to handle such requirements?

[This is probably an extension of the original question.]


Hi @George_Yumnam, there is currently no quick way via the website to obtain all CIFs for a specified set of structures. This is most flexibly and reproducibly done via the API.

I have written a small IPython notebook that walks through building a query for Zintl compounds, fetching data (including CIF strings) for those compounds, and using pymatgen to write out CIF files in a zip archive.

To ease issues with running the code locally on your computer, I have registered a Jupyter notebook binder that you can launch in your browser. Click on the get_zintl_cifs.ipynb link after launching to open the notebook and walk through it. If you’re unfamiliar with the interface, you can click on Help at top once the notebook is open, and then User Interface Tour.
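For readers who only want the last step of that workflow, here is a minimal sketch of writing CIF strings into a zip archive using only the Python standard library. It is not the notebook's code: the `docs` list, the material IDs, and the `write_cifs_to_zip` helper are hypothetical stand-ins, and it assumes each returned document carries its CIF text in a `cif` field alongside `material_id` and `pretty_formula`.

```python
import io
from zipfile import ZipFile

# Placeholder documents standing in for an API query result.
# The IDs and CIF bodies are dummy text, not real structures.
docs = [
    {"material_id": "mp-0000001", "pretty_formula": "AB",
     "cif": "data_AB\n# placeholder\n"},
    {"material_id": "mp-0000002", "pretty_formula": "A2B",
     "cif": "data_A2B\n# placeholder\n"},
]


def write_cifs_to_zip(docs, target):
    """Write one .cif entry per document into a zip archive at `target`.

    `target` may be a file path or a writable binary file-like object.
    """
    with ZipFile(target, "w") as archive:
        for doc in docs:
            name = "{}_{}.cif".format(doc["material_id"], doc["pretty_formula"])
            archive.writestr(name, doc["cif"])


# An in-memory buffer keeps the example self-contained;
# pass a path such as "cifs.zip" instead to write to disk.
buffer = io.BytesIO()
write_cifs_to_zip(docs, buffer)

names = ZipFile(buffer).namelist()
print(names)
```

Naming each entry by material ID and formula keeps the files unique and easy to trace back to the database.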

For your band gap query, replacing

{'chemsys': {'$in': zintl_systems()}}

with

{'band_gap': {'$gte': 0.5, '$lte': 1.2}}

in the notebook will do the trick. The query syntax is that of MongoDB, and hierarchical documentation on the data we have available for querying is at our API reference repository.
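To make the meaning of the range criteria concrete, the following sketch applies the same `$gte`/`$lte` conditions to a handful of in-memory records. It is only an illustration of the operator semantics: the `records`, their band gap values, and the `matches` helper are made up for this example, and the real filtering happens server-side in the database.

```python
# Hypothetical records; the band gaps are made-up numbers for illustration.
records = [
    {"material_id": "mp-A", "band_gap": 0.0},
    {"material_id": "mp-B", "band_gap": 0.5},
    {"material_id": "mp-C", "band_gap": 0.9},
    {"material_id": "mp-D", "band_gap": 1.2},
    {"material_id": "mp-E", "band_gap": 3.4},
]

# Same shape as the criteria passed to the query in the thread.
criteria = {"band_gap": {"$gte": 0.5, "$lte": 1.2}}

# Only the two operators used above are modeled here.
OPERATORS = {
    "$gte": lambda value, bound: value >= bound,
    "$lte": lambda value, bound: value <= bound,
}


def matches(record, criteria):
    """Return True if `record` satisfies every operator on every field."""
    for field, conditions in criteria.items():
        if field not in record:
            return False
        for op, bound in conditions.items():
            if not OPERATORS[op](record[field], bound):
                return False
    return True


selected = [r["material_id"] for r in records if matches(r, criteria)]
print(selected)
```

Both bounds are inclusive, so materials with a band gap of exactly 0.5 or 1.2 eV are selected.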

This approach is the most general way to get nearly anything you want from our database. The website offers convenient shortcuts for typical use cases. I do see your use case as common enough to warrant additional features on the website. We’ll try to work in those features soon.


Hey @dwinston, thank you very much for this wonderful code. It works fine when I run it from the notebook binder launched from the link you provided. However, when I run it from my own local Jupyter notebook, the query that builds docs fails with the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-7-4ed7a36bbf0b> in <module>()
      1 mpr=MPRester("----")
----> 3 docs = mpr.query({'chemsys': {'$in': zintl_systems()}}, 
                           ['material_id', 'pretty_formula','cif'])
      5 with ZipFile('', 'w') as f:

<ipython-input-6-c7ebf31168bc> in zintl_systems()
      5         of the form [...,"Na-Si",...,"Na-Tl",...]
      6     """
----> 7     first_el = {el.symbol for el in Element
      8                 if el.is_alkali or el.is_alkaline}
      9     second_el = {el.symbol for el in Element

TypeError: 'type' object is not iterable


My guess is that the discrepancy between the binder environment and your local environment is your version of pymatgen. Please try upgrading (pip install -U pymatgen). I think that Element was not iterable in past versions.
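The error message itself can be reproduced without pymatgen. The sketch below assumes, without having checked the specific versions involved in this thread, that the newer `Element` is an Enum-style class while the older one was an ordinary class; `PlainElement` and `EnumElement` are hypothetical stand-ins, not pymatgen classes.

```python
from enum import Enum


class PlainElement:
    """Ordinary class: the class object itself cannot be iterated."""

    def __init__(self, symbol):
        self.symbol = symbol


class EnumElement(Enum):
    """Enum class: iterating the class yields its members in order."""

    H = "H"
    Li = "Li"
    Na = "Na"


def try_iterate(cls):
    """Return the members of `cls`, or the error text if it is not iterable."""
    try:
        return [member for member in cls]
    except TypeError as exc:
        return str(exc)


plain_result = try_iterate(PlainElement)  # error text, as in the traceback
enum_result = [member.value for member in EnumElement]

print(plain_result)
print(enum_result)
```

If a set comprehension like the one in `zintl_systems()` raises this TypeError, checking the installed pymatgen version is a reasonable first step.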


Thanks a lot! The script worked fine with the new version of pymatgen (4.3.0).

However, I get the following error while running the code:

('Connection broken: IncompleteRead(6990 bytes read, 3250 more expected)',
    IncompleteRead(6990 bytes read, 3250 more expected))

Is this normal, or is it due to a slow connection? I need to run the script two or three times before it completes.


It’s not normal. It’s a low-level network error due to an unstable connection.
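Rather than rerunning the whole script by hand, one option is to wrap the network call in a small retry loop. The sketch below is a generic pattern, not part of pymatgen: `retry` and `flaky_fetch` are hypothetical names, and the simulated failure only stands in for the connection error reported above.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Call `func()` until it succeeds or `attempts` tries have failed.

    Sleeps `delay` seconds after a failure, multiplying the delay by
    `backoff` each time. Re-raises the last error if every try fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
                delay *= backoff
    raise last_error


# Simulated unstable download: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection broken: IncompleteRead")
    return ["doc-1", "doc-2"]


result = retry(flaky_fetch, attempts=5, delay=0.01, exceptions=(ConnectionError,))
print(result, "after", calls["count"], "calls")
```

With a real query, `func` could be a lambda wrapping the `mpr.query(...)` call. Which exception class to catch depends on the HTTP library in use; the thread does not show it, so treat that choice as an assumption to verify.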


Thanks a lot once again @dwinston


This discussion is very useful for querying data sets from the Materials Project. Also, those notebooks from dwinston are awesome.