32 Commits

Author SHA1 Message Date
Simon
2d72ff3402 Micro optimization pt. 2 2018-11-05 17:38:33 -05:00
Simon
1b5e6bb7f4 Micro optimization 2018-11-05 17:08:54 -05:00
Richard Patel
add6581804 Add resource stats logging 2018-11-05 22:41:17 +01:00
Richard Patel
395a6f30b2 Fix pprof 2018-11-05 21:55:07 +01:00
Richard Patel
a4e53053b9 Add LICENSE 2018-11-05 21:42:59 +01:00
  oi m8 got a loicense for that
Richard Patel
e39565377e Add pprof debug server 2018-11-05 21:39:15 +01:00
Richard Patel
77cb45dbec Detect directory symlinks 2018-10-28 18:37:18 +01:00
Richard Patel
fa37d45378 Remove too many crawler block 2018-10-28 18:17:04 +01:00
  More logging
Richard Patel
bfd7302be8 Add urfave/cli app 2018-10-28 17:59:46 +01:00
Richard Patel
b1c40767e0 Remember scanned URLs 2018-10-28 17:07:30 +01:00
Richard Patel
c196b6f20d Better config 2018-10-28 14:19:09 +01:00
Richard Patel
ddfdce9d0f Refactor a bit 2018-10-28 13:43:45 +01:00
Richard Patel
7c4ed9d41e Remove WIP disclaimer 2018-10-28 03:48:33 +01:00
Richard Patel
ab5874129f Don't retry on 401/403 2018-10-28 03:47:29 +01:00
Richard Patel
faad19f121 more stuff 2018-10-28 03:41:16 +01:00
Richard Patel
4ea5f8a410 Handle HTTP statuses 2018-10-28 03:22:25 +01:00
Richard Patel
1c33346f45 Fix crawl descent 2018-10-28 03:06:18 +01:00
Richard Patel
a507110787 Add stats interval parameter 2018-10-28 02:47:20 +02:00
Richard Patel
79f540bf29 Scheduler 2018-10-28 02:40:12 +02:00
Richard Patel
5ac9fc10a1 Merge branch 'rewrite' 2018-10-27 17:27:52 +02:00
Richard Patel
941899e304 Merge branch 'queue' 2018-10-27 17:27:36 +02:00
Richard Patel
3fb4d4bde9 More logs 2018-10-27 17:25:32 +02:00
Richard Patel
76c8c13d49 Use finite state machine 2018-10-27 16:55:00 +02:00
Richard Patel
442a2cf8a7 Compare finite state machine and Regex 2018-10-27 16:53:45 +02:00
Richard Patel
9e090d109d Header state machine 2018-10-27 16:29:10 +02:00
Richard Patel
d748be72cd File HEAD requests 2018-10-27 16:22:01 +02:00
Richard Patel
2844d344ec Working listing 2018-10-27 15:00:20 +02:00
Richard Patel
6b73cf75b8 Add .gitignore 2018-10-27 04:13:05 +02:00
Richard Patel
7b98db9e78 WIP disclaimer 2018-10-27 04:12:33 +02:00
Richard Patel
abf069f946 Bits of ODDB API 2018-10-27 04:10:08 +02:00
Richard Patel
f2d2b620fa Simple queue crawler 2018-10-27 04:08:32 +02:00
Richard Patel
dc816146cc Initial commit 2018-10-27 04:07:42 +02:00