Download single property for all compounds in the database

The current API seems to be mostly tailored toward downloading all information about a single compound. Would it be possible to download a single piece of information for all the compounds?

In particular, the request from this site

https://www.materialsproject.org/rest/v1/materials//vasp/density?API_KEY=XXX

with an empty material parameter should return all materials in the entire database, but it fails because the request is too large. How would I go about making this request? Is it possible to download the backend database in its entirety?

My application is machine learning, and I need a rather large data set to build my models.

For future reference, I did find a workable solution:

```python
from pymatgen import MPRester
import urllib.request
import json

if __name__ == "__main__":
    MAPI_KEY = "XXXXX"  # You must change this to your Materials API key! (or set MAPI_KEY env variable)

    # fetch a list of all available material IDs
    with urllib.request.urlopen('https://www.materialsproject.org/rest/v1/materials//mids') as myurl:
        data = json.loads(myurl.read().decode())
        material_ids = data['response']  # roughly 75,000 material IDs are returned

    with MPRester(MAPI_KEY) as m:  # object for connecting to the MP REST interface
        criteria = {'material_id': {'$in': material_ids[:4]}}  # to avoid straining the servers, this only uses the first 4 materials
        properties = ['energy', 'pretty_formula']  # a few quantities of interest
        data = m.query(criteria, properties)
        print(data)
```
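Since the end goal is a machine-learning data set, it may help to save what `m.query` returns (a list of dicts, one per material) to disk instead of re-downloading it. A minimal sketch using the standard-library `csv.DictWriter`; the helper name `write_records_csv` and the sample record are placeholders for illustration, not real Materials Project data:

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def write_records_csv(records: List[Dict[str, Any]], fields: List[str], out: TextIO) -> None:
    """Write query results (a list of dicts) as CSV rows with the given columns (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Keys not listed in `fields` are ignored; missing keys are written as "".
        writer.writerow(record)


# Placeholder record shaped like the output of m.query(criteria, ["energy", "pretty_formula"]).
sample = [{"pretty_formula": "X", "energy": -1.0}]
buf = io.StringIO()
write_records_csv(sample, ["pretty_formula", "energy"], buf)
print(buf.getvalue())
```

To write a real file, pass `open("materials.csv", "w", newline="")` instead of the `StringIO` buffer (the file name is arbitrary).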

Hi Vikingscientist,

You were hitting the size limit on returned results, which keeps the API from getting overloaded. The API is well suited to returning the information you're looking for, but you have to break your query up into smaller batches to stay under this limit.
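The batching itself does not depend on the API. A minimal plain-Python sketch of splitting a list of IDs into fixed-size batches; the helper name `chunked`, the batch size of 1000, and the `mp-0`, `mp-1`, ... IDs are illustrative placeholders:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        # Slicing past the end is safe in Python; the last batch is simply shorter.
        yield list(items[start:start + size])


# Example: 2,500 placeholder IDs split into batches of 1,000.
ids = [f"mp-{i}" for i in range(2500)]
print([len(batch) for batch in chunked(ids, 1000)])  # [1000, 1000, 500]
```

A generator avoids building every sublist up front, although for tens of thousands of short ID strings a list comprehension works just as well.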

Whenever I need to do something similar, I first query for all the mp-ids using MPRester and store them in a Python list. After that, I iterate through the list of mp-ids and query for the properties of interest about 1000 materials at a time; the right batch size depends on the property.

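```python
from pymatgen import MPRester

r = MPRester()  # with no argument, the API key is taken from your pymatgen settings
# query() returns a list of dicts, so extract the ID strings
mp_ids = [d["material_id"] for d in r.query({}, ["material_id"])]
chunk_size = 1000
sublists = [mp_ids[i:i + chunk_size] for i in range(0, len(mp_ids), chunk_size)]
```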

Then you can query for each sublist:

```python
results = []
for sublist in sublists:
    results.extend(r.query({"material_id": {"$in": sublist}}, ["pretty_formula", "structure"]))
```
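The same loop can be wrapped in a reusable function that takes the query function as an argument, which also makes it easy to add a pause between requests to go easy on the server. A minimal sketch, assuming only the `query(criteria, properties)` call shape used in this thread; `fetch_in_batches` and `fake_query` are hypothetical names, and `fake_query` merely stands in for `MPRester.query` so the example runs offline:

```python
import time
from typing import Any, Callable, Dict, List, Sequence

# Signature of a query function shaped like MPRester.query(criteria, properties).
QueryFn = Callable[[Dict[str, Any], List[str]], List[Dict[str, Any]]]


def fetch_in_batches(
    query: QueryFn,
    material_ids: Sequence[str],
    properties: List[str],
    chunk_size: int = 1000,
    pause_s: float = 0.0,
) -> List[Dict[str, Any]]:
    """Run one query per batch of IDs and concatenate the results (hypothetical helper)."""
    results: List[Dict[str, Any]] = []
    for start in range(0, len(material_ids), chunk_size):
        batch = list(material_ids[start:start + chunk_size])
        # MongoDB-style "$in" criteria, as in the snippets above.
        criteria = {"material_id": {"$in": batch}}
        results.extend(query(criteria, properties))
        if pause_s:
            # Optional delay between requests.
            time.sleep(pause_s)
    return results


# Offline stand-in for MPRester.query: echoes each requested ID with empty properties.
def fake_query(criteria: Dict[str, Any], properties: List[str]) -> List[Dict[str, Any]]:
    return [{"material_id": mid, **{p: None for p in properties}} for mid in criteria["material_id"]["$in"]]


ids = [f"mp-{i}" for i in range(2500)]
rows = fetch_in_batches(fake_query, ids, ["pretty_formula"], chunk_size=1000)
print(len(rows))  # 2500
```

With a real connection, the bound method would be passed in place of the fake, e.g. `fetch_in_batches(r.query, mp_ids, ["pretty_formula", "structure"], pause_s=1.0)`; the one-second pause is an arbitrary example value.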


@Vikingscientist Hello, sorry if this question is a bit late. How do I properly use the code you posted to query the database? Any help would be greatly appreciated.