od-database

mirror of https://github.com/simon987/od-database.git synced 2025-04-18 18:06:44 +00:00

Author	SHA1	Message	Date
Simon	14d384e366	Decentralised crawling should work in theory + temporary fix for retrieving more than the 10k results Elasticsearch allows by default	2018-06-21 19:44:27 -04:00
Simon	e54609972c	Overwrite document on re-index, update website last_modified on task complete, delete website files on index complete	2018-06-19 11:24:28 -04:00
Simon	8768e39f08	Added stats page	2018-06-18 19:56:25 -04:00
Simon	400abc9a3c	Added crawl logs page	2018-06-18 11:41:26 -04:00
Simon	adb94cf326	Should fix memory usage problem when crawling	2018-06-14 23:36:54 -04:00
Simon	81fde6cc30	Bug fixes with html parsing	2018-06-14 20:02:06 -04:00
Simon	f3c7b551d2	Some adjustments to make it work on Stretch server	2018-06-14 17:09:05 -04:00
Simon	dffd032659	Indexing after crawling is a bit more efficient	2018-06-14 16:41:43 -04:00
Simon	83ca579ec7	Started working on post-crawl callbacks and basic auth for crawl servers	2018-06-14 15:05:56 -04:00
Simon	2fe81e4b06	Crawl server now holds at most max_workers + 1 tasks in pool to minimize waiting time and to avoid loss of too many tasks in case of crash/restart	2018-06-12 22:28:36 -04:00
Simon	24ef493245	Websites being indexed now show up on the homepage	2018-06-12 21:51:02 -04:00
Simon	e266a50197	Website stats now work with Elasticsearch	2018-06-12 20:17:30 -04:00
Simon	1718bb91ca	Files are indexed into ES when task is complete	2018-06-12 15:45:00 -04:00
Simon	d61fd75890	Tasks can now be queued from the web interface. Tasks are dispatched to the crawl server(s)	2018-06-12 13:44:03 -04:00
Simon	6d48f1f780	Task crawl result now logged in a database	2018-06-12 11:03:45 -04:00
Simon	d849227798	barebones crawl_server microservice	2018-06-11 19:00:43 -04:00

16 Commits
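
Commit 2fe81e4b06 describes a crawl server that holds at most max_workers + 1 tasks locally, so that a worker rarely waits for work and a crash loses only a few tasks. The following is a minimal sketch of that idea, not the repository's actual code: the class `BoundedTaskPool`, its methods, and the `upstream` queue are hypothetical names chosen for illustration, and tasks are plain strings rather than real crawl jobs.

```python
from collections import deque


class BoundedTaskPool:
    """Hypothetical local pool that holds at most max_workers + 1 tasks."""

    def __init__(self, max_workers):
        # One spare slot beyond the worker count, so a finished worker can
        # pick up the next task without waiting for a new fetch.
        self.capacity = max_workers + 1
        self.held = []

    def refill(self, upstream):
        """Pull tasks from the upstream queue until the local pool is full."""
        while len(self.held) < self.capacity and upstream:
            self.held.append(upstream.popleft())

    def complete(self, task):
        """Drop a finished task, freeing one slot in the local pool."""
        self.held.remove(task)


# Ten pending tasks remain in the upstream queue (assumed to be durable).
upstream = deque(f"site-{i}" for i in range(10))

pool = BoundedTaskPool(max_workers=3)
pool.refill(upstream)      # the pool takes 4 tasks, 6 stay upstream

pool.complete("site-0")    # one task finishes
pool.refill(upstream)      # the pool tops itself back up to 4
```

With this arrangement, a crash of the crawl server would lose at most `capacity` tasks, since everything else is still waiting in the upstream queue.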