Commit Graph

  • 24ee6fcba2 Quickfix: Revert FTP give back Richard Patel 2018-11-17 12:43:30 +01:00
  • bfb18d62b2 mini fix Richard Patel 2018-11-17 05:27:09 +01:00
  • f4054441ab Return FTP tasks Richard Patel 2018-11-17 05:07:52 +01:00
  • f8d2bf386d Fix FTP error ignore Richard Patel 2018-11-17 04:54:29 +01:00
  • f41198b00c Ignore FTP URLs Richard Patel 2018-11-17 04:50:59 +01:00
  • 7fdffff58f Update config.yml Richard Patel 2018-11-17 04:19:04 +01:00
  • d596882b40 Fix ton of bugs Richard Patel 2018-11-17 04:18:22 +01:00
  • 0fe97a8058 Update README.md Richard Patel 2018-11-17 01:36:07 +01:00
  • 718f9d7fbc Rename project Richard Patel 2018-11-17 01:33:15 +01:00
  • f1687679ab Unescape results & don't recrawl 404 Richard Patel 2018-11-17 01:21:20 +01:00
  • 145d37f84a Fix wait, add back crawl command Richard Patel 2018-11-17 00:49:09 +01:00
  • cc777bcaeb redblackhash: Use bytes.Compare Richard Patel 2018-11-16 21:17:39 +01:00
  • 1e78cea7e7 Saved path should not contain file name Simon 2018-11-16 13:58:12 -05:00
  • 3f85cf679b Getting tasks Richard Patel 2018-11-16 04:47:08 +01:00
  • 8f6f8fd17f fasthttp uri fasthttpuri Richard Patel 2018-11-16 04:10:45 +01:00
  • 3c39f0d621 Random hacks Richard Patel 2018-11-16 03:22:51 +01:00
  • 50952791c5 Almost done Richard Patel 2018-11-16 03:12:26 +01:00
  • 30bf98ad34 Fix tests Richard Patel 2018-11-16 03:02:10 +01:00
  • ccaf758e90 Remove URL.Opaque Richard Patel 2018-11-16 01:53:16 +01:00
  • f668365edb Add tests Richard Patel 2018-11-16 01:51:34 +01:00
  • 1db8ff43bb Bump version Richard Patel 2018-11-16 00:25:11 +01:00
  • 82234f949e Less tokenizer allocations Richard Patel 2018-11-16 00:22:40 +01:00
  • 084b3a5903 Optimizing with hexa :P Richard Patel 2018-11-15 23:51:31 +01:00
  • ac0b8d2d0b Blacklist all paths with a query parameter Richard Patel 2018-11-15 23:36:41 +01:00
  • ffde1a9e5d Timeout and results saving Richard Patel 2018-11-15 20:14:31 +01:00
  • a268c6dbcf Reduce WaitQueue usage Richard Patel 2018-11-12 00:38:22 +01:00
  • 4c071171eb Exclude dups in dir instead of keeping hashes of links Richard Patel 2018-11-11 23:11:30 +01:00
  • 9c8174dd8d Fix header parsing Richard Patel 2018-11-11 18:53:17 +01:00
  • 93272e1da1 Update README.md Richard Patel 2018-11-06 02:41:20 +01:00
  • 0344a120ff fasturl: Remove path escape Richard Patel 2018-11-06 02:15:09 +01:00
  • 6e6afd771e fasturl: Remove query Richard Patel 2018-11-06 02:11:22 +01:00
  • a8c27b2d21 Hash links Richard Patel 2018-11-06 02:01:53 +01:00
  • 8cfada7904 kill perf rip Richard Patel 2018-11-06 01:44:09 +01:00
  • ed5e35f005 Performance improvements Richard Patel 2018-11-06 00:34:22 +01:00
  • a12bca01c8 fasturl: Discard UserInfo Richard Patel 2018-11-06 00:33:57 +01:00
  • ba9c818461 fasturl: Don't parse username and password Richard Patel 2018-11-06 00:28:42 +01:00
  • 9cf31b1d81 fasturl: Remove fragment Richard Patel 2018-11-06 00:17:10 +01:00
  • ed0d9c681f fasturl: Replace scheme with enum Richard Patel 2018-11-06 00:15:12 +01:00
  • 2d72ff3402 Micro optimization pt. 2 hexa Simon 2018-11-05 17:38:33 -05:00
  • 1b5e6bb7f4 Micro optimization Simon 2018-11-05 17:08:54 -05:00
  • b88d45fc21 fasturl: Remove allocs from Parse Richard Patel 2018-11-05 23:05:21 +01:00
  • 4989adff9f Add net/url package Richard Patel 2018-11-05 22:57:57 +01:00
  • add6581804 Add resource stats logging Richard Patel 2018-11-05 22:41:17 +01:00
  • 395a6f30b2 Fix pprof Richard Patel 2018-11-05 21:55:07 +01:00
  • a4e53053b9 Add LICENSE Richard Patel 2018-11-05 21:42:59 +01:00
  • e39565377e Add pprof debug server Richard Patel 2018-11-05 21:39:15 +01:00
  • 77cb45dbec Detect directory symlinks Richard Patel 2018-10-28 18:37:18 +01:00
  • fa37d45378 Remove too many crawler block Richard Patel 2018-10-28 18:17:04 +01:00
  • bfd7302be8 Add urfave/cli app Richard Patel 2018-10-28 17:59:46 +01:00
  • b1c40767e0 Remember scanned URLs Richard Patel 2018-10-28 17:07:30 +01:00
  • c196b6f20d Better config Richard Patel 2018-10-28 14:19:09 +01:00
  • ddfdce9d0f Refactor a bit Richard Patel 2018-10-28 13:43:45 +01:00
  • 7c4ed9d41e Remove WIP disclaimer Richard Patel 2018-10-28 03:48:33 +01:00
  • ab5874129f Don't retry on 401/403 Richard Patel 2018-10-28 03:47:29 +01:00
  • faad19f121 more stuff Richard Patel 2018-10-28 03:41:16 +01:00
  • 4ea5f8a410 Handle HTTP statuses Richard Patel 2018-10-28 03:22:25 +01:00
  • 1c33346f45 Fix crawl descent Richard Patel 2018-10-28 03:06:18 +01:00
  • a507110787 Add stats interval parameter Richard Patel 2018-10-28 02:47:20 +02:00
  • 79f540bf29 Scheduler Richard Patel 2018-10-28 02:40:12 +02:00
  • 5ac9fc10a1 Merge branch 'rewrite' Richard Patel 2018-10-27 17:27:52 +02:00
  • 941899e304 Merge branch 'queue' Richard Patel 2018-10-27 17:27:36 +02:00
  • 3fb4d4bde9 More logs Richard Patel 2018-10-27 17:25:32 +02:00
  • 76c8c13d49 Use finite state machine Richard Patel 2018-10-27 16:55:00 +02:00
  • 442a2cf8a7 Compare finite state machine and Regex Richard Patel 2018-10-27 16:53:45 +02:00
  • 9e090d109d Header state machine Richard Patel 2018-10-27 16:29:10 +02:00
  • d748be72cd File HEAD requests Richard Patel 2018-10-27 16:22:01 +02:00
  • 2844d344ec Working listing Richard Patel 2018-10-27 15:00:20 +02:00
  • 6b73cf75b8 Add .gitignore Richard Patel 2018-10-27 04:13:05 +02:00
  • 7b98db9e78 WIP disclaimer Richard Patel 2018-10-27 04:12:33 +02:00
  • abf069f946 Bits of ODDB API Richard Patel 2018-10-27 04:10:08 +02:00
  • f2d2b620fa Simple queue crawler Richard Patel 2018-10-27 04:08:32 +02:00
  • dc816146cc Initial commit Richard Patel 2018-10-27 04:07:42 +02:00