mirror of
https://github.com/simon987/nyaa.git
synced 2025-12-13 15:19:03 +00:00
Pad info_hash in ElasticSearch sync scripts
python-mysql-replication (or PyMySQL) would return less than 20 bytes for info-hashes that had null bytes near the end, leaving incomplete hashes in the ES index. Without delving too deep into the real issue (be it lack of understanding MySQL storing binary data or a bug in the libraries), thankfully we can just pad the fixed-size info-hashes to be 20 bytes. Padding in import_to_es.py may be erring on the side of caution, but safe is established to be better than sorry. (SQLAlchemy is unaffected by this bug) Fixes #456
This commit is contained in:
@@ -21,6 +21,9 @@ app = create_app('config')
|
||||
es = Elasticsearch(timeout=30)
|
||||
ic = IndicesClient(es)
|
||||
|
||||
def pad_bytes(in_bytes, size):
|
||||
return in_bytes + (b'\x00' * max(0, size - len(in_bytes)))
|
||||
|
||||
# turn into thing that elasticsearch indexes. We flatten in
|
||||
# the stats (seeders/leechers) so we can order by them in es naturally.
|
||||
# we _don't_ dereference uploader_id to the user's display name however,
|
||||
@@ -42,7 +45,7 @@ def mk_es(t, index_name):
|
||||
"created_time": t.created_time,
|
||||
# not analyzed but included so we can render magnet links
|
||||
# without querying sql again.
|
||||
"info_hash": t.info_hash.hex(),
|
||||
"info_hash": pad_bytes(t.info_hash, 20).hex(),
|
||||
"filesize": t.filesize,
|
||||
"uploader_id": t.uploader_id,
|
||||
"main_category_id": t.main_category_id,
|
||||
|
||||
Reference in New Issue
Block a user