Trying to use the pymatgen MPRester query gives me an error about “504 Gateway Time-out”.
This is the code:
rmeur ~ $ cat a.py
from pymatgen import MPRester
api = MPRester('XXXXXXXXXXX')
res = api.query({"elasticity": {"$exists": True}},
                properties=['task_id', 'pretty_formula', 'elasticity'])
print(len(res))
rmeur ~ $ python3 ./a.py
Traceback (most recent call last):
File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 152, in _make_request
.format(response.status_code))
pymatgen.ext.matproj.MPRestError: REST query returned with error status code 504
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./a.py", line 5, in <module>
properties=['task_id', 'pretty_formula', 'elasticity'])
File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 737, in query
mp_decode=mp_decode)
File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 157, in _make_request
raise MPRestError(msg)
pymatgen.ext.matproj.MPRestError: REST query returned with error status code 504. Content: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>504 Gateway Time-out</title>\n</head><body>\n<h1>Gateway Time-out</h1>\n<p>The gateway did not receive a timely response\nfrom the upstream server or application.</p>\n<hr>\n<address>Apache/2.2.15 (CentOS) Server at www.materialsproject.org Port 443</address>\n</body></html>\n'
You are getting this error because too much data is being requested in a single query. The example notebook that @ongsp linked on that issue thread was written when there were only 4,430 elastic tensors, whereas now there are nearly 14,000. I am working on a pull request (pymatgen#1324) to adapt MPRester.query to better handle large requests.
Note that we impose no query size or rate limits: this is a limitation of the way we serve the data dynamically. Our API server queries the database, fetches the data, and returns it to you. Because the API server has limited RAM, it simply cannot serve an overly large request in a single query. If, for example, we buffered and streamed the results to a temporary file, we could give you a link to download a giant file, but we favor the simplicity of in-RAM dynamic queries. I am adding functionality to MPRester.query to chunk queries.
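In the meantime, a possible workaround is to do the chunking yourself: first fetch only the lightweight task_ids, then request the heavy elasticity data in small batches so each response stays within the server's limits. This is a sketch, not the official API behavior; the chunk size of 100 is an arbitrary choice, and the MongoDB-style "$in" criterion is assumed to be accepted by MPRester.query as in other Mongo-syntax criteria:

```python
def chunks(items, n):
    """Yield successive n-sized chunks from a list."""
    for i in range(0, len(items), n):
        yield items[i:i + n]

# Usage sketch (requires a valid API key):
# from pymatgen import MPRester
# api = MPRester('XXXXXXXXXXX')
# # Step 1: cheap query for the ids only.
# ids = [d['task_id'] for d in api.query(
#     {"elasticity": {"$exists": True}}, properties=['task_id'])]
# # Step 2: fetch the heavy properties in batches of 100.
# results = []
# for batch in chunks(ids, 100):
#     results.extend(api.query({"task_id": {"$in": batch}},
#                              properties=['task_id', 'pretty_formula',
#                                          'elasticity']))
# print(len(results))
```

Each batched query returns a small payload, so no single request should hit the 504 timeout.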
Given that requests this large over the web take too long and time out, is there a way to clone/download the whole database, to reduce query issues (and network time)?