90 Commits

Author SHA1 Message Date
c295b5d30b Add Web.post 2021-09-28 15:08:51 -04:00
da0e117550 Fix fake-useragent (for real this time?) 2021-09-25 16:03:45 -04:00
c3fef7e7f8 Fix fake-useragent 2021-09-25 15:53:52 -04:00
9bd1f4b799 unbreak statefulstreamworker 2021-09-23 19:14:11 -04:00
c560cc2010 tweak StatefulStreamWorker interface 2021-09-19 14:19:17 -04:00
f4a5e6cf53 queue_iter fix 2021-09-19 12:44:49 -04:00
71cd00c063 Add StatfulStreamProcessor 2021-09-19 12:39:57 -04:00
7349c9a5f1 Quick optimisation 2021-09-19 10:57:07 -04:00
d19442b00e Update preprocess: now returns generator objects 2021-09-19 09:35:35 -04:00
4711cd1b66 Add trigrams 2021-09-10 17:35:19 -04:00
7e0ffafb8c Update fix_single_quotes 2021-08-28 20:48:00 -04:00
60273fb6bd Use imap instead of map 2021-08-28 20:09:00 -04:00
67c09cc10c Add remove_numbers 2021-08-28 20:06:53 -04:00
a7bf5b2d15 Fix clean html (again!) 2021-08-28 19:59:04 -04:00
31b35e3a32 Fix clean html (again) 2021-08-28 19:44:10 -04:00
4cff343370 version bump 2021-08-28 19:34:09 -04:00
4d6c8018df Fix clean_html 2021-08-28 19:33:11 -04:00
db3e191983 Add plot_confusion_matrix 2021-06-29 14:13:07 -04:00
33e9734991 Add plot_freq_bar 2021-06-23 19:37:22 -04:00
3238f92e4d Revert "get_soup() decode utf8" (this reverts commit f8e93354) 2021-05-14 10:09:07 -04:00
f8e93354a4 get_soup() decode utf8 2021-05-14 09:59:59 -04:00
75bf2c2d85 Rename test.clean to text.preprocess, add QS util func, more debug logging 2021-04-25 12:10:03 -04:00
9002ae7506 Add debug info 2021-04-24 10:21:54 -04:00
88f3124f85 add bigram option for clean function 2021-04-21 21:34:49 -04:00
8edad0255b add retries arg in get_web() 2021-04-21 19:50:59 -04:00
32119535ae improve text cleaning 2021-04-18 21:27:12 -04:00
2ffaa4a5b3 improve text cleaning 2021-04-18 21:10:07 -04:00
067a20f7a8 improve text cleaning 2021-04-18 20:32:34 -04:00
00323ea576 improve text cleaning 2021-04-18 18:50:39 -04:00
45b5803c40 improve text cleaning 2021-04-18 15:40:30 -04:00
18cd59fc4a ignore log in text 2021-04-18 12:20:22 -04:00
d895ac837e ignore log in text 2021-04-18 12:18:27 -04:00
ae59522b27 Add customstderr 2021-04-18 12:17:00 -04:00
765f6f59b7 Add text cleaning function 2021-04-18 12:12:31 -04:00
30902c8235 Add sep option in volatile state 2021-04-16 19:10:45 -04:00
53a262a138 Add retry_sleep to retry 2021-04-06 21:24:23 -04:00
6e1aa53455 Add retry_sleep to retry 2021-04-06 21:23:28 -04:00
53ac0c37e8 Fix redis_publish 2021-04-06 21:08:43 -04:00
8378ed6526 Add session arg to get_web 2021-04-06 20:32:34 -04:00
c79b3bfafd Add get_soup() 2021-04-05 19:17:27 -04:00
5ee1629c79 oops 2021-03-28 09:53:16 -04:00
00f5aef721 Update redis_publish to add subproject 2021-03-28 09:42:52 -04:00
021da84433 add redis_publish 2021-03-25 18:25:46 -04:00
53a03baaa4 Add useragent option in Web 2021-03-07 11:02:26 -05:00
66d37e0be2 Add way to manually flush @buffered 2021-02-28 12:15:44 -05:00
43cb6c4a7b web retry_codes fix 2021-02-27 12:07:48 -05:00
4278b0f89e web retry_codes fix 2021-02-27 08:58:21 -05:00
9738819428 web logging fix 2021-02-25 22:00:44 -05:00
ce2e5b2af6 load redis from env if not specified 2021-02-25 21:35:18 -05:00
9cadce62ac add Web helper & logger 2021-02-25 21:26:27 -05:00