118 Commits

Author SHA1 Message Date
Richard Patel
24ee6fcba2
Quickfix: Revert FTP give back 2018-11-17 12:43:30 +01:00
Richard Patel
bfb18d62b2
mini fix 2018-11-17 05:27:09 +01:00
Richard Patel
f4054441ab
Return FTP tasks 2018-11-17 05:07:52 +01:00
Richard Patel
f8d2bf386d
Fix FTP error ignore 2018-11-17 04:57:19 +01:00
Richard Patel
f41198b00c
Ignore FTP URLs 2018-11-17 04:50:59 +01:00
Richard Patel
7fdffff58f
Update config.yml 2018-11-17 04:19:04 +01:00
Richard Patel
d596882b40
Fix ton of bugs 2018-11-17 04:18:22 +01:00
Richard Patel
0fe97a8058
Update README.md 2018-11-17 01:36:07 +01:00
Richard Patel
718f9d7fbc
Rename project 2018-11-17 01:33:15 +01:00
Richard Patel
f1687679ab
Unescape results & don't recrawl 404 2018-11-17 01:21:20 +01:00
Richard Patel
145d37f84a
Fix wait, add back crawl command 2018-11-17 00:49:09 +01:00
Richard Patel
cc777bcaeb
redblackhash: Use bytes.Compare 2018-11-16 21:17:39 +01:00
Simon
1e78cea7e7
Saved path should not contain file name 2018-11-16 13:58:12 -05:00
Richard Patel
3f85cf679b
Getting tasks 2018-11-16 04:47:08 +01:00
Richard Patel
3c39f0d621
Random hacks 2018-11-16 03:22:51 +01:00
Richard Patel
50952791c5
Almost done 2018-11-16 03:12:26 +01:00
Richard Patel
30bf98ad34
Fix tests 2018-11-16 03:02:10 +01:00
Richard Patel
ccaf758e90
Remove URL.Opaque 2018-11-16 01:53:16 +01:00
Richard Patel
f668365edb
Add tests 2018-11-16 01:51:34 +01:00
Richard Patel
1db8ff43bb
Bump version 2018-11-16 00:25:11 +01:00
Richard Patel
82234f949e
Less tokenizer allocations 2018-11-16 00:22:40 +01:00
Richard Patel
084b3a5903
Optimizing with hexa :P 2018-11-15 23:51:31 +01:00
Richard Patel
ac0b8d2d0b
Blacklist all paths with a query parameter 2018-11-15 23:36:41 +01:00
Richard Patel
ffde1a9e5d
Timeout and results saving 2018-11-15 20:14:31 +01:00
Richard Patel
a268c6dbcf
Reduce WaitQueue usage 2018-11-12 00:38:22 +01:00
Richard Patel
4c071171eb
Exclude dups in dir instead of keeping hashes of links 2018-11-11 23:11:30 +01:00
Richard Patel
9c8174dd8d
Fix header parsing 2018-11-11 18:53:17 +01:00
Richard Patel
93272e1da1
Update README.md 2018-11-06 02:41:20 +01:00
Richard Patel
0344a120ff
fasturl: Remove path escape 2018-11-06 02:15:09 +01:00
Richard Patel
6e6afd771e
fasturl: Remove query 2018-11-06 02:11:22 +01:00
Richard Patel
a8c27b2d21
Hash links 2018-11-06 02:01:53 +01:00
Richard Patel
ed5e35f005
Performance improvements 2018-11-06 00:34:22 +01:00
Richard Patel
a12bca01c8
fasturl: Discard UserInfo 2018-11-06 00:33:57 +01:00
Richard Patel
ba9c818461
fasturl: Don't parse username and password 2018-11-06 00:28:42 +01:00
Richard Patel
9cf31b1d81
fasturl: Remove fragment 2018-11-06 00:17:10 +01:00
Richard Patel
ed0d9c681f
fasturl: Replace scheme with enum 2018-11-06 00:15:12 +01:00
Richard Patel
b88d45fc21
fasturl: Remove allocs from Parse 2018-11-05 23:05:21 +01:00
Richard Patel
4989adff9f
Add net/url package 2018-11-05 22:57:57 +01:00
Richard Patel
add6581804
Add resource stats logging 2018-11-05 22:41:17 +01:00
Richard Patel
395a6f30b2
Fix pprof 2018-11-05 21:55:07 +01:00
Richard Patel
a4e53053b9
Add LICENSE
oi m8 got a loicense for that
2018-11-05 21:42:59 +01:00
Richard Patel
e39565377e
Add pprof debug server 2018-11-05 21:39:15 +01:00
Richard Patel
77cb45dbec
Detect directory symlinks 2018-10-28 18:37:18 +01:00
Richard Patel
fa37d45378
Remove too many crawler block
More logging
2018-10-28 18:17:04 +01:00
Richard Patel
bfd7302be8
Add urfave/cli app 2018-10-28 17:59:46 +01:00
Richard Patel
b1c40767e0
Remember scanned URLs 2018-10-28 17:07:30 +01:00
Richard Patel
c196b6f20d
Better config 2018-10-28 14:19:09 +01:00
Richard Patel
ddfdce9d0f
Refactor a bit 2018-10-28 13:43:45 +01:00
Richard Patel
7c4ed9d41e
Remove WIP disclaimer 2018-10-28 03:48:33 +01:00
Richard Patel
ab5874129f
Don't retry on 401/403 2018-10-28 03:47:29 +01:00