26 Commits

Author | SHA1 | Message | Date
Richard Patel | 326e29e5e4 | Reset to stable branch | 2019-02-22 05:37:45 +01:00
terorie | 885af5bb3b | Beta task resuming | 2019-02-03 16:50:08 +01:00
Richard Patel | 46c0e0bd32 | Smarter HTTP error handling | 2019-02-03 03:35:09 +01:00
Richard Patel | b244cdae80 | Minor cleanup | 2018-12-18 15:31:33 +01:00
Richard Patel | 85d2aac9d4 | Performance patch | 2018-11-20 02:33:50 +01:00
Richard Patel | b6c0a45900 | Job queue disk offloading | 2018-11-20 02:03:10 +01:00
Richard Patel | 6e6a4edd27 | Ignore all HTTP errors | 2018-11-18 14:25:06 +01:00
Richard Patel | 73ba848e17 | Grammar | 2018-11-17 13:35:29 +01:00
Richard Patel | 115983f70e | Silent HTTP errors | 2018-11-17 13:22:46 +01:00
Richard Patel | bfb18d62b2 | mini fix | 2018-11-17 05:27:09 +01:00
Richard Patel | d596882b40 | Fix ton of bugs | 2018-11-17 04:18:22 +01:00
Richard Patel | f1687679ab | Unescape results & don't recrawl 404 | 2018-11-17 01:21:20 +01:00
Richard Patel | 145d37f84a | Fix wait, add back crawl command | 2018-11-17 00:49:09 +01:00
Richard Patel | ffde1a9e5d | Timeout and results saving | 2018-11-15 20:14:31 +01:00
Richard Patel | a268c6dbcf | Reduce WaitQueue usage | 2018-11-12 00:38:22 +01:00
Richard Patel | 4c071171eb | Exclude dups in dir instead of keeping hashes of links | 2018-11-11 23:11:30 +01:00
Richard Patel | a8c27b2d21 | Hash links | 2018-11-06 02:01:53 +01:00
Richard Patel | ed5e35f005 | Performance improvements | 2018-11-06 00:34:22 +01:00
Richard Patel | 77cb45dbec | Detect directory symlinks | 2018-10-28 18:37:18 +01:00
Richard Patel | b1c40767e0 | Remember scanned URLs | 2018-10-28 17:07:30 +01:00
Richard Patel | ab5874129f | Don't retry on 401/403 | 2018-10-28 03:47:29 +01:00
Richard Patel | faad19f121 | more stuff | 2018-10-28 03:41:16 +01:00
Richard Patel | 4ea5f8a410 | Handle HTTP statuses | 2018-10-28 03:22:25 +01:00
Richard Patel | 1c33346f45 | Fix crawl descent | 2018-10-28 03:06:18 +01:00
Richard Patel | a507110787 | Add stats interval parameter | 2018-10-28 02:47:20 +02:00
Richard Patel | 79f540bf29 | Scheduler | 2018-10-28 02:40:12 +02:00