Commit Graph

  • 3fc8837dd7
    Add output files to .gitignore Richard Patel 2018-11-20 03:51:42 +01:00
  • f9a0d6bffe
    Bump to v1.1.0 v1.1.0 Richard Patel 2018-11-20 03:46:36 +01:00
  • 4dbe2aef2b
    Add job buffer size parameter Richard Patel 2018-11-20 03:42:32 +01:00
  • 86ec78cae1
    Add TCP timeout option Richard Patel 2018-11-20 03:29:10 +01:00
  • b846498030
    Delete URL queues after crawling Richard Patel 2018-11-20 03:05:43 +01:00
  • 4f3140a39f
    Fix queue_count in log Richard Patel 2018-11-20 02:49:03 +01:00
  • 85d2aac9d4
    Performance patch Richard Patel 2018-11-20 02:33:50 +01:00
  • b6c0a45900
    Job queue disk offloading Richard Patel 2018-11-20 02:03:10 +01:00
  • d332f06659
    Limit retries to 10 Richard Patel 2018-11-18 21:05:26 +01:00
  • 1625d6c888
    Bump to v1.0.2 v1.0.2 Richard Patel 2018-11-18 18:53:57 +01:00
  • 03a487f393
    Fix crawl loop Richard Patel 2018-11-18 18:45:06 +01:00
  • ac8221b109
    Retry /task/upload Richard Patel 2018-11-18 18:33:26 +01:00
  • 8ed2cf3b93
    Bump to v1.0.1 v1.0.1 Richard Patel 2018-11-18 14:49:07 +01:00
  • f3620262fc
    Add log file support Richard Patel 2018-11-18 14:46:52 +01:00
  • dc4e4212a0
    Add freebsd to release.sh Richard Patel 2018-11-18 14:38:18 +01:00
  • 6e6a4edd27
    Ignore all HTTP errors Richard Patel 2018-11-18 14:25:06 +01:00
  • a71157b4d8
    Add User-Agent parameter Richard Patel 2018-11-18 14:24:04 +01:00
  • 6dbec8c789
    Add release script v1.0 Richard Patel 2018-11-18 02:36:22 +01:00
  • 605f6db5a5
    Don't call /task/upload for websites with no results Richard Patel 2018-11-18 01:42:57 +01:00
  • d593ba2d0b
    Bump to 1.0 Richard Patel 2018-11-18 00:54:58 +01:00
  • 6793086c22
    Ignore HTTPS errors Richard Patel 2018-11-18 00:37:30 +01:00
  • 4464f34779
    Add recheck and timeout parameters Richard Patel 2018-11-18 00:29:29 +01:00
  • 339175220d
    Refactor uploading & chunk size parameter Richard Patel 2018-11-18 00:15:08 +01:00
  • 1e6687c519
    Upload result ignoring errors Richard Patel 2018-11-17 15:04:20 +01:00
  • 8060556089
    Fix: make crawled dir Richard Patel 2018-11-17 13:36:35 +01:00
  • 73ba848e17
    Grammar Richard Patel 2018-11-17 13:35:29 +01:00
  • 115983f70e
    Silent HTTP errors Richard Patel 2018-11-17 13:18:08 +01:00
  • 9210996b4c
    Fix multiple part file upload Richard Patel 2018-11-17 12:51:30 +01:00
  • 7b29da9340
    Fix file uploads Richard Patel 2018-11-17 12:47:16 +01:00
  • 24ee6fcba2
    Quickfix: Revert FTP give back Richard Patel 2018-11-17 12:43:30 +01:00
  • bfb18d62b2
    mini fix Richard Patel 2018-11-17 05:27:09 +01:00
  • f4054441ab
    Return FTP tasks Richard Patel 2018-11-17 05:07:52 +01:00
  • f8d2bf386d
    Fix FTP error ignore Richard Patel 2018-11-17 04:54:29 +01:00
  • f41198b00c
    Ignore FTP URLs Richard Patel 2018-11-17 04:50:59 +01:00
  • 7fdffff58f
    Update config.yml Richard Patel 2018-11-17 04:19:04 +01:00
  • d596882b40
    Fix ton of bugs Richard Patel 2018-11-17 04:18:22 +01:00
  • 0fe97a8058
    Update README.md Richard Patel 2018-11-17 01:36:07 +01:00
  • 718f9d7fbc
    Rename project Richard Patel 2018-11-17 01:33:15 +01:00
  • f1687679ab
    Unescape results & don't recrawl 404 Richard Patel 2018-11-17 01:21:20 +01:00
  • 145d37f84a
    Fix wait, add back crawl command Richard Patel 2018-11-17 00:49:09 +01:00
  • cc777bcaeb
    redblackhash: Use bytes.Compare Richard Patel 2018-11-16 21:17:39 +01:00
  • 1e78cea7e7
    Saved path should not contain file name Simon 2018-11-16 13:58:12 -05:00
  • 3f85cf679b
    Getting tasks Richard Patel 2018-11-16 04:47:08 +01:00
  • 8f6f8fd17f
    fasthttp uri fasthttpuri Richard Patel 2018-11-16 04:10:45 +01:00
  • 3c39f0d621
    Random hacks Richard Patel 2018-11-16 03:22:51 +01:00
  • 50952791c5
    Almost done Richard Patel 2018-11-16 03:12:26 +01:00
  • 30bf98ad34
    Fix tests Richard Patel 2018-11-16 03:02:10 +01:00
  • ccaf758e90
    Remove URL.Opaque Richard Patel 2018-11-16 01:53:16 +01:00
  • f668365edb
    Add tests Richard Patel 2018-11-16 01:51:34 +01:00
  • 1db8ff43bb
    Bump version Richard Patel 2018-11-16 00:25:11 +01:00
  • 82234f949e
    Less tokenizer allocations Richard Patel 2018-11-16 00:22:40 +01:00
  • 084b3a5903
    Optimizing with hexa :P Richard Patel 2018-11-15 23:51:31 +01:00
  • ac0b8d2d0b
    Blacklist all paths with a query parameter Richard Patel 2018-11-15 23:36:41 +01:00
  • ffde1a9e5d
    Timeout and results saving Richard Patel 2018-11-15 20:14:31 +01:00
  • a268c6dbcf
    Reduce WaitQueue usage Richard Patel 2018-11-12 00:38:22 +01:00
  • 4c071171eb
    Exclude dups in dir instead of keeping hashes of links Richard Patel 2018-11-11 23:11:30 +01:00
  • 9c8174dd8d
    Fix header parsing Richard Patel 2018-11-11 18:53:17 +01:00
  • 93272e1da1
    Update README.md Richard Patel 2018-11-06 02:41:20 +01:00
  • 0344a120ff
    fasturl: Remove path escape Richard Patel 2018-11-06 02:15:09 +01:00
  • 6e6afd771e
    fasturl: Remove query Richard Patel 2018-11-06 02:11:22 +01:00
  • a8c27b2d21
    Hash links Richard Patel 2018-11-06 02:01:53 +01:00
  • 8cfada7904
    kill perf rip Richard Patel 2018-11-06 01:44:09 +01:00
  • ed5e35f005
    Performance improvements Richard Patel 2018-11-06 00:34:22 +01:00
  • a12bca01c8
    fasturl: Discard UserInfo Richard Patel 2018-11-06 00:33:57 +01:00
  • ba9c818461
    fasturl: Don't parse username and password Richard Patel 2018-11-06 00:28:42 +01:00
  • 9cf31b1d81
    fasturl: Remove fragment Richard Patel 2018-11-06 00:17:10 +01:00
  • ed0d9c681f
    fasturl: Replace scheme with enum Richard Patel 2018-11-06 00:15:12 +01:00
  • 2d72ff3402
    Micro optimization pt. 2 hexa Simon 2018-11-05 17:38:33 -05:00
  • 1b5e6bb7f4
    Micro optimization Simon 2018-11-05 17:08:54 -05:00
  • b88d45fc21
    fasturl: Remove allocs from Parse Richard Patel 2018-11-05 23:05:21 +01:00
  • 4989adff9f
    Add net/url package Richard Patel 2018-11-05 22:57:57 +01:00
  • add6581804
    Add resource stats logging Richard Patel 2018-11-05 22:41:17 +01:00
  • 395a6f30b2
    Fix pprof Richard Patel 2018-11-05 21:55:07 +01:00
  • a4e53053b9
    Add LICENSE Richard Patel 2018-11-05 21:42:59 +01:00
  • e39565377e
    Add pprof debug server Richard Patel 2018-11-05 21:39:15 +01:00
  • 77cb45dbec
    Detect directory symlinks Richard Patel 2018-10-28 18:37:18 +01:00
  • fa37d45378
    Remove too many crawler block Richard Patel 2018-10-28 18:17:04 +01:00
  • bfd7302be8
    Add urfave/cli app Richard Patel 2018-10-28 17:59:46 +01:00
  • b1c40767e0
    Remember scanned URLs Richard Patel 2018-10-28 17:07:30 +01:00
  • c196b6f20d
    Better config Richard Patel 2018-10-28 14:19:09 +01:00
  • ddfdce9d0f
    Refactor a bit Richard Patel 2018-10-28 13:43:45 +01:00
  • 7c4ed9d41e
    Remove WIP disclaimer Richard Patel 2018-10-28 03:48:33 +01:00
  • ab5874129f
    Don't retry on 401/403 Richard Patel 2018-10-28 03:47:29 +01:00
  • faad19f121
    more stuff Richard Patel 2018-10-28 03:41:16 +01:00
  • 4ea5f8a410
    Handle HTTP statuses Richard Patel 2018-10-28 03:22:25 +01:00
  • 1c33346f45
    Fix crawl descent Richard Patel 2018-10-28 03:06:18 +01:00
  • a507110787
    Add stats interval parameter Richard Patel 2018-10-28 02:47:20 +02:00
  • 79f540bf29
    Scheduler Richard Patel 2018-10-28 02:40:12 +02:00
  • 5ac9fc10a1
    Merge branch 'rewrite' Richard Patel 2018-10-27 17:27:52 +02:00
  • 941899e304
    Merge branch 'queue' Richard Patel 2018-10-27 17:27:36 +02:00
  • 3fb4d4bde9
    More logs Richard Patel 2018-10-27 17:25:32 +02:00
  • 76c8c13d49
    Use finite state machine Richard Patel 2018-10-27 16:55:00 +02:00
  • 442a2cf8a7
    Compare finite state machine and Regex Richard Patel 2018-10-27 16:53:45 +02:00
  • 9e090d109d
    Header state machine Richard Patel 2018-10-27 16:29:10 +02:00
  • d748be72cd
    File HEAD requests Richard Patel 2018-10-27 16:22:01 +02:00
  • 2844d344ec
    Working listing Richard Patel 2018-10-27 15:00:20 +02:00
  • 6b73cf75b8
    Add .gitignore Richard Patel 2018-10-27 04:13:05 +02:00
  • 7b98db9e78
    WIP disclaimer Richard Patel 2018-10-27 04:12:33 +02:00
  • abf069f946
    Bits of ODDB API Richard Patel 2018-10-27 04:10:08 +02:00
  • f2d2b620fa
    Simple queue crawler Richard Patel 2018-10-27 04:08:32 +02:00