13 Commits

Author         SHA1        Message                                                 Date
Richard Patel  ffde1a9e5d  Timeout and results saving                              2018-11-15 20:14:31 +01:00
Richard Patel  a268c6dbcf  Reduce WaitQueue usage                                  2018-11-12 00:38:22 +01:00
Richard Patel  4c071171eb  Exclude dups in dir instead of keeping hashes of links  2018-11-11 23:11:30 +01:00
Richard Patel  a8c27b2d21  Hash links                                              2018-11-06 02:01:53 +01:00
Richard Patel  ed5e35f005  Performance improvements                                2018-11-06 00:34:22 +01:00
Richard Patel  77cb45dbec  Detect directory symlinks                               2018-10-28 18:37:18 +01:00
Richard Patel  b1c40767e0  Remember scanned URLs                                   2018-10-28 17:07:30 +01:00
Richard Patel  ab5874129f  Don't retry on 401/403                                  2018-10-28 03:47:29 +01:00
Richard Patel  faad19f121  more stuff                                              2018-10-28 03:41:16 +01:00
Richard Patel  4ea5f8a410  Handle HTTP statuses                                    2018-10-28 03:22:25 +01:00
Richard Patel  1c33346f45  Fix crawl descent                                       2018-10-28 03:06:18 +01:00
Richard Patel  a507110787  Add stats interval parameter                            2018-10-28 02:47:20 +02:00
Richard Patel  79f540bf29  Scheduler                                               2018-10-28 02:40:12 +02:00