48 Commits

Author SHA1 Message Date
e8f0f96148 More bug fixes... 2020-01-25 12:54:26 -05:00
ae0fb9b1a6 Docker tweaking & bug fixes 2020-01-25 10:16:24 -05:00
simon
31877283b3 Bug fixes, ES7 2019-06-14 14:25:41 -04:00
simon987
06ae89f4d2 Only queue http tasks (temp) 2019-04-06 09:07:17 -04:00
simon987
2046b36f9a Bug fixes 2019-03-28 20:29:34 -04:00
simon987
d69ed65a0c Rewrite export.py, add diagram 2019-03-27 22:09:08 -04:00
simon987
b9f25630b4 Switch to postgresql, finish minimum viable task_tracker/ws_bucket integration 2019-03-27 19:34:05 -04:00
simon987
4ffe805b8d Use task_tracker for task tracking 2019-03-24 20:23:05 -04:00
simon987
00e3fd7340 Remove task tracking 2019-03-09 13:26:05 -05:00
simon987
32c1c861ad hotfix attempt 3 pt. 2 2019-02-02 11:52:27 -05:00
simon987
59b1d249ba hotfix attempt 3 2019-02-02 11:45:46 -05:00
simon987
0ff6ea1682 hotfix attempt 2 2019-02-02 10:58:09 -05:00
simon987
7f857d641f Change ES settings, big refactor, removed recaptcha 2019-01-13 12:48:39 -05:00
terorie
1ac3b97d7e Crawl stats: time format + sorting (#10) 2018-12-14 09:30:06 -05:00
* Nicer stats
* Fix right align
* No leading day zeros
* Fix right-align padding
Simon
e89eb6e3e0 Fixes #9 2018-12-06 10:05:35 -05:00
Simon
edf1849bac Create new rescan task when no queued tasks pt2 2018-11-16 23:01:28 -05:00
Simon
4c51598441 Get queued tasks temporarily returns only non-ftp websites 2018-11-16 22:59:26 -05:00
Simon
6e80791264 Search filter 2018-11-16 16:49:23 -05:00
Simon
db26e851a4 API endpoint to cancel task 2018-10-26 18:13:47 -04:00
Simon
1d3318f6e2 Create new rescan task when no queued tasks 2018-09-29 12:01:02 -04:00
Simon
42d858b62a Queue can be emptied more easily pt.2 2018-08-09 17:14:17 -04:00
Simon
5a084cb857 Queue can be emptied more easily 2018-08-09 17:12:43 -04:00
Simon
cf96d1697d Fixed bug when submitting 2018-07-16 20:34:42 -04:00
Simon
a8a658f55b Crawl server names that are numeric now show up in stats page 2018-07-15 21:33:37 -04:00
Simon
fe1d29aaea Crawl tasks are now fetched by the crawlers instead of pushed by the server 2018-07-14 17:31:18 -04:00
Simon
711e8282ef 'Go to random website' button, and navigation in the website list 2018-07-08 10:42:14 -04:00
Simon
5383ad6aea Searches are not saved to database 2018-06-27 15:29:50 -04:00
Simon
5fd00f22af Task logs now stored on main server 2018-06-24 20:32:02 -04:00
Simon
1ac510ff53 Slots can be updated without removing & adding 2018-06-24 09:39:44 -04:00
Simon
14d384e366 Decentralised crawling should work in theory + temporary fix for going further than the maximum 10k results elasticsearch allows by default 2018-06-21 19:44:27 -04:00
Simon
cf51bb381c Added top websites scatter graph 2018-06-20 12:21:34 -04:00
Simon
7400bdc2a9 Added admin blacklist control in dashboard 2018-06-20 11:28:06 -04:00
Simon
35837463cd Added admin clear & delete buttons for websites 2018-06-20 10:48:51 -04:00
Simon
e54609972c Overwrite document on re-index, update website last_modified on task complete, delete website files on index complete 2018-06-19 11:24:28 -04:00
Simon
e5e38a6faf Elasticsearch export to csv 2018-06-19 09:48:44 -04:00
Simon
83f4b8def9 Enhanced search results page 2018-06-18 15:01:49 -04:00
Simon
9bde8cb629 uWSGI config and bugfix with file extensions 2018-06-13 14:11:27 -04:00
Simon
e91572a06f Homepage stats now work with elasticsearch 2018-06-12 23:19:57 -04:00
Simon
4b60ac62fc Added website url & date in search results & fixed threading problem 2018-06-12 17:48:15 -04:00
Simon
d61fd75890 Tasks can now be queued from the web interface. Tasks are dispatched to the crawl server(s) 2018-06-12 13:44:03 -04:00
Simon
a25976d24a Generate and delete API tokens 2018-06-09 12:41:28 -04:00
Simon
dc0cde61a0 Basic admin page 2018-06-08 11:40:54 -04:00
Simon
306b0ed0fe Added option to choose results per page 2018-06-07 13:19:41 -04:00
Simon
06d3a09e11 Quick hack for search order options 2018-06-07 11:22:35 -04:00
Simon
0b1d76f478 Added blacklist feature (untested) 2018-06-06 10:17:30 -04:00
Simon
270ab1335a Added reply to comments option, fixed some bugs 2018-06-02 17:26:15 -04:00
Simon
bb872a9248 Changed from mime to extension for graph and added script to clear invalid websites 2018-05-31 10:51:59 -04:00
Simon
ad645490f6 Initial commit 2018-05-28 20:35:04 -04:00