Simon | 4f5f0f76be | Small adjustments for csv export | 2018-06-19 10:01:15 -04:00
Simon | e5e38a6faf | Elasticsearch export to csv | 2018-06-19 09:48:44 -04:00
Simon | 81d52a4551 | Changed UI to fit the-eye.eu | 2018-06-18 22:37:05 -04:00
Simon | 677bfa03ea | Another fix for encoding problems | 2018-06-18 20:30:18 -04:00
Simon | 788d3749d4 | Homepage now compatible with new stats | 2018-06-18 20:04:49 -04:00
Simon | 8768e39f08 | Added stats page | 2018-06-18 19:56:25 -04:00
Simon | 7923647ea3 | Made the ftp crawler work with the latest changes | 2018-06-18 15:46:03 -04:00
Simon | 83f4b8def9 | Enhanced search results page | 2018-06-18 15:01:49 -04:00
Simon | 8a73142ff8 | Support for more than just utf-8 and removed some debug info | 2018-06-18 13:44:19 -04:00
Simon | 7c47b0f00c | Added delta column in crawl logs | 2018-06-18 12:21:00 -04:00
Simon | b63c7190c3 | Improved external link detection | 2018-06-18 12:14:05 -04:00
Simon | 400abc9a3c | Added crawl logs page | 2018-06-18 11:41:26 -04:00
Simon | 99d64b658b | Disabled thread pool for headers requests in listing | 2018-06-18 10:33:33 -04:00
Simon | b97b8f6784 | Temporary fix for decoding errors | 2018-06-17 22:17:21 -04:00
Simon | 344e7274d7 | Simplified url joining and splitting, switched from lxml to html.parser, various memory usage optimizations | 2018-06-17 22:10:46 -04:00
Simon | 07d51a75cc | Increased queue.get() timeouts | 2018-06-17 10:07:06 -04:00
Simon | e6175c84c9 | Re-added timeout that was accidentally deleted | 2018-06-16 22:20:15 -04:00
Simon | 1283cc9599 | Should fix memory usage problem when crawling (part three) | 2018-06-16 20:32:50 -04:00
Simon | 86144935e3 | Attempt to fix Unicode errors part two | 2018-06-16 15:30:44 -04:00
Simon | c309aa25c8 | Attempt to fix unicode decode errors | 2018-06-16 15:20:23 -04:00
Simon | 9d0a0a8b42 | Should fix memory usage problem when crawling (part two) | 2018-06-16 14:53:48 -04:00
Simon | adb94cf326 | Should fix memory usage problem when crawling | 2018-06-14 23:36:54 -04:00
Simon | 9aed18c2d2 | Should fix timeout error when indexing | 2018-06-14 20:07:50 -04:00
Simon | 81fde6cc30 | Bug fixes with html parsing | 2018-06-14 20:02:06 -04:00
Simon | f3c7b551d2 | Some adjustments to make it work on Stretch server | 2018-06-14 17:09:05 -04:00
Simon | dffd032659 | Indexing after crawling is a bit more efficient | 2018-06-14 16:41:43 -04:00
Simon | 83ca579ec7 | Started working on post-crawl callbacks and basic auth for crawl servers | 2018-06-14 15:05:56 -04:00
Simon | 1bd58468eb | Bug fixes for FTP crawler | 2018-06-13 15:54:45 -04:00
Simon | 9bde8cb629 | uWSGI config and bugfix with file extensions | 2018-06-13 14:11:27 -04:00
Simon | e91572a06f | Homepage stats now work with elasticsearch | 2018-06-12 23:19:57 -04:00
Simon | 2fe81e4b06 | Crawl server now holds at most max_workers + 1 tasks in pool to minimize waiting time and to avoid loss of too many tasks in case of crash/restart | 2018-06-12 22:28:36 -04:00
Simon | 24ef493245 | Websites being indexed now show up on the homepage | 2018-06-12 21:51:02 -04:00
Simon | bccb1d0dfd | Website link list works with elasticsearch | 2018-06-12 21:26:44 -04:00
Simon | e266a50197 | Website stats now works with elasticsearch | 2018-06-12 20:17:30 -04:00
Simon | 4b60ac62fc | Added website url & date in search results & fixed threading problem | 2018-06-12 17:48:15 -04:00
Simon | 0127b3a51d | Basic searching integrated with elasticsearch + highlighting | 2018-06-12 16:29:05 -04:00
Simon | af2601ee70 | Fixed file duplication problem | 2018-06-12 15:55:52 -04:00
Simon | 1718bb91ca | Files are indexed into ES when task is complete | 2018-06-12 15:45:00 -04:00
Simon | 6c912ea8c5 | Completed tasks are now fetched by the TaskDispatcher | 2018-06-12 14:16:05 -04:00
Simon | d61fd75890 | Tasks can now be queued from the web interface. Tasks are dispatched to the crawl server(s) | 2018-06-12 13:44:03 -04:00
Simon | 6d48f1f780 | Task crawl result now logged in a database | 2018-06-12 11:03:45 -04:00
Simon | 011b8455a7 | Elasticsearch search engine (search & scroll) | 2018-06-11 23:06:41 -04:00
Simon | 72495275b0 | Elasticsearch search engine (import from json) | 2018-06-11 22:35:49 -04:00
Simon | fcfd7d4acc | Bug fixes + export to json | 2018-06-11 20:02:30 -04:00
Simon | d849227798 | barebones crawl_server microservice | 2018-06-11 19:00:43 -04:00
Simon | 8421cc0885 | Refactoring on http crawler | 2018-06-11 16:06:56 -04:00
Simon | 7f496ce7a8 | Slowly losing my sanity part 1: Removed scrapy dependency and moved to custom solution. Added multi-threaded ftp crawler | 2018-06-11 15:46:55 -04:00
Simon | b649b82854 | Cleanup of custom crawler | 2018-06-10 21:32:08 -04:00
Simon | f2d914060b | Removed unsuitable scrapy spider and implemented custom crawler | 2018-06-10 20:08:59 -04:00
Simon | d8c16d53e6 | FTP url validation | 2018-06-10 14:32:19 -04:00