Richard Patel | 4b8275c7bf | Add parser tests | 2018-12-18 15:31:09 +01:00
Richard Patel | 86ec78cae1 | Add TCP timeout option | 2018-11-20 03:29:10 +01:00
Richard Patel | 03a487f393 | Fix crawl loop | 2018-11-18 18:45:06 +01:00
Richard Patel | a71157b4d8 | Add User-Agent parameter | 2018-11-18 14:24:04 +01:00
Richard Patel | 6793086c22 | Ignore HTTPS errors | 2018-11-18 00:37:30 +01:00
Richard Patel | d596882b40 | Fix ton of bugs | 2018-11-17 04:18:22 +01:00
Richard Patel | 718f9d7fbc | Rename project | 2018-11-17 01:33:15 +01:00
Richard Patel | f1687679ab | Unescape results & don't recrawl 404 | 2018-11-17 01:21:20 +01:00
Simon | 1e78cea7e7 | Saved path should not contain file name | 2018-11-16 13:58:12 -05:00
Richard Patel | 82234f949e | Less tokenizer allocations | 2018-11-16 00:22:40 +01:00
Richard Patel | 084b3a5903 | Optimizing with hexa :P | 2018-11-15 23:51:31 +01:00
Richard Patel | ac0b8d2d0b | Blacklist all paths with a query parameter | 2018-11-15 23:36:41 +01:00
Richard Patel | ffde1a9e5d | Timeout and results saving | 2018-11-15 20:14:31 +01:00
Richard Patel | 4c071171eb | Exclude dups in dir instead of keeping hashes of links | 2018-11-11 23:11:30 +01:00
Richard Patel | 9c8174dd8d | Fix header parsing | 2018-11-11 18:53:17 +01:00
Richard Patel | a8c27b2d21 | Hash links | 2018-11-06 02:01:53 +01:00
Richard Patel | ed5e35f005 | Performance improvements | 2018-11-06 00:34:22 +01:00
Richard Patel | 77cb45dbec | Detect directory symlinks | 2018-10-28 18:37:18 +01:00
Richard Patel | bfd7302be8 | Add urfave/cli app | 2018-10-28 17:59:46 +01:00
Richard Patel | b1c40767e0 | Remember scanned URLs | 2018-10-28 17:07:30 +01:00
Richard Patel | ddfdce9d0f | Refactor a bit | 2018-10-28 13:43:45 +01:00
Richard Patel | 79f540bf29 | Scheduler | 2018-10-28 02:40:12 +02:00
Richard Patel | 3fb4d4bde9 | More logs | 2018-10-27 17:25:32 +02:00
Richard Patel | 76c8c13d49 | Use finite state machine | 2018-10-27 16:55:00 +02:00
Richard Patel | 442a2cf8a7 | Compare finite state machine and Regex | 2018-10-27 16:53:45 +02:00
Richard Patel | 9e090d109d | Header state machine | 2018-10-27 16:29:10 +02:00
Richard Patel | d748be72cd | File HEAD requests | 2018-10-27 16:22:01 +02:00
Richard Patel | 2844d344ec | Working listing | 2018-10-27 15:00:20 +02:00
Richard Patel | f2d2b620fa | Simple queue crawler | 2018-10-27 04:08:32 +02:00