Error 504 Gateway Time-out


#1

Trying to run a pymatgen MPRester query gives me a “504 Gateway Time-out” error.
This is the code:

rmeur ~ $ cat a.py 
from pymatgen import MPRester
api = MPRester('XXXXXXXXXXX')

res = api.query({"elasticity": {"$exists": True}},
                properties=['task_id', 'pretty_formula', 'elasticity'])
print(len(res))
rmeur ~ $ python3 ./a.py
Traceback (most recent call last):
  File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 152, in _make_request
    .format(response.status_code))
pymatgen.ext.matproj.MPRestError: REST query returned with error status code 504

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./a.py", line 5, in <module>
    properties=['task_id', 'pretty_formula', 'elasticity'])
  File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 737, in query
    mp_decode=mp_decode)
  File "/Users/fx/anaconda3/lib/python3.7/site-packages/pymatgen/ext/matproj.py", line 157, in _make_request
    raise MPRestError(msg)
pymatgen.ext.matproj.MPRestError: REST query returned with error status code 504. Content: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>504 Gateway Time-out</title>\n</head><body>\n<h1>Gateway Time-out</h1>\n<p>The gateway did not receive a timely response\nfrom the upstream server or application.</p>\n<hr>\n<address>Apache/2.2.15 (CentOS) Server at www.materialsproject.org Port 443</address>\n</body></html>\n'

I originally reported this to pymatgen, whose maintainers said “This is a Materials Project issue.” https://github.com/materialsproject/pymatgen/issues/1331


#2

Hi @fxcoudert,

Too much data is being requested in one query - that is why you are getting the error. The example notebook to which @ongsp linked on that issue thread was written when there were only 4,430 elastic tensors, whereas now there are nearly 14,000. I am working on a pull request (pymatgen#1324) to adapt MPRester.query to better handle large requests.

Note that we impose no query size limits or rate limits: this is a limitation of the way we serve the data dynamically. Our API server queries the database, fetches the data, and returns it to you. Because the API server has limited RAM, it simply cannot serve overly large requests in a single query. If we, for example, buffered and streamed the results to a temporary file, we could give you a link to download a giant file, but we favor the simplicity of in-RAM dynamic queries. I am adding functionality to MPRester.query to chunk queries.
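The chunking idea can be sketched client-side in plain Python (a minimal illustration of the approach, not the actual pymatgen implementation; the `chunked` helper and the ID list are made up here):

```python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# With ~1200 hypothetical material IDs and chunks of 500,
# each request stays small enough for the server to handle.
ids = [f"mp-{n}" for n in range(1200)]
chunks = list(chunked(ids, 500))
print(len(chunks))       # 3 chunks (500 + 500 + 200)
print(len(chunks[-1]))   # 200
```

Each chunk would then be sent as its own query, and the partial results concatenated on the client.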


#3

Given that web requests are too big, take too long, and time out… is there a way to clone/download the whole database, to reduce query issues (and network time)?


#4

If anybody is interested, manually chunking the requests using the $in operator is a workaround:

$ PYTHONUNBUFFERED=1 python3 ./a.py
Found 13934 entries
++++++++++++++++++++++++++++
Confirmed: 13934 entries
$ cat a.py 
from pymatgen import MPRester
api = MPRester('XXXXXXXXXXXXX')

# First fetch only the IDs of all entries with elasticity data (a small query).
res = api.query({"elasticity": {"$exists": True}}, properties=['material_id'])
entries = [x['material_id'] for x in res]
print(f'Found {len(entries)} entries')

# Then request the full records in chunks of 500 IDs via the $in operator,
# so each individual query stays small enough for the server.
size = 500
chunks = [entries[i:i+size] for i in range(0, len(entries), size)]
res = []

for chunk in chunks:
    print('+', end='')  # progress marker, one per chunk
    x = api.query({"material_id": {"$in": chunk}}, properties=['task_id', 'pretty_formula', 'elasticity'])
    res.extend(x)

print('')
print(f'Confirmed: {len(res)} entries')
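To also cut down on network time across runs, the combined results could be cached on disk so re-runs skip the queries entirely. A hypothetical sketch (not a pymatgen feature; `cached_query` and the file path are made up, and the lambda stands in for the real `api.query` call):

```python
import json
import os
import tempfile

def cached_query(run_query, cache_path):
    """Return cached results from disk if present; otherwise run the query and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    results = run_query()
    with open(cache_path, "w") as f:
        json.dump(results, f)
    return results

path = os.path.join(tempfile.mkdtemp(), "elasticity_cache.json")
first = cached_query(lambda: [{"material_id": "mp-1"}], path)   # runs the "query", writes cache
second = cached_query(lambda: [], path)                         # served from cache, lambda not called
print(first == second)  # True
```

This only helps when the data does not change between runs; the cache file should be deleted to force a refresh.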

#5

I didn’t know about Python 3.6+ f-strings! Thanks @fxcoudert!
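For reference, f-strings (PEP 498, Python 3.6+) evaluate expressions inside string literals, with optional format specs:

```python
n = 13934
size = 500
print(f'Found {n} entries')                    # Found 13934 entries
print(f'{-(-n // size)} chunks of {size}')     # 28 chunks of 500 (ceiling division)
```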