64 Commits

SHA1 Message Date
ed9d148411 Fix tests 2021-10-21 19:52:42 -04:00
7ecd55a1c6 Add cookiejar_filter_name 2021-10-03 11:02:39 -04:00
b746a91281 Fix default value for retry codes 2021-10-03 10:43:19 -04:00
333083e8b9 Add 520 in default retry codes 2021-10-03 10:33:49 -04:00
c295b5d30b Add Web.post 2021-09-28 15:08:51 -04:00
da0e117550 Fix fake-useragent (for real this time?) 2021-09-25 16:03:45 -04:00
c3fef7e7f8 Fix fake-useragent 2021-09-25 15:53:52 -04:00
9bd1f4b799 unbreak statefulstreamworker 2021-09-23 19:14:11 -04:00
c560cc2010 tweak StatefulStreamWorker interface 2021-09-19 14:19:17 -04:00
f4a5e6cf53 queue_iter fix 2021-09-19 12:44:49 -04:00
71cd00c063 Add StatfulStreamProcessor 2021-09-19 12:39:57 -04:00
7349c9a5f1 Quick optimisation 2021-09-19 10:57:07 -04:00
d19442b00e Update preprocess: now returns generator objects 2021-09-19 09:35:35 -04:00
4711cd1b66 Add trigrams 2021-09-10 17:35:19 -04:00
7e0ffafb8c Update fix_single_quotes 2021-08-28 20:48:00 -04:00
67c09cc10c Add remove_numbers 2021-08-28 20:06:53 -04:00
a7bf5b2d15 Fix clean html (again!) 2021-08-28 19:59:04 -04:00
31b35e3a32 Fix clean html (again) 2021-08-28 19:44:10 -04:00
4cff343370 version bump 2021-08-28 19:34:09 -04:00
db3e191983 Add plot_confusion_matrix 2021-06-29 14:13:07 -04:00
33e9734991 Add plot_freq_bar 2021-06-23 19:37:22 -04:00
75bf2c2d85 Rename test.clean to text.preprocess, add QS util func, more debug logging 2021-04-25 12:10:03 -04:00
8edad0255b add retries arg in get_web() 2021-04-21 19:50:59 -04:00
ae59522b27 Add customstderr 2021-04-18 12:17:00 -04:00
765f6f59b7 Add text cleaning function 2021-04-18 12:12:31 -04:00
30902c8235 Add sep option in volatile state 2021-04-16 19:10:45 -04:00
6e1aa53455 Add retry_sleep to retry 2021-04-06 21:23:28 -04:00
53ac0c37e8 Fix redis_publish 2021-04-06 21:08:43 -04:00
8378ed6526 Add session arg to get_web 2021-04-06 20:32:34 -04:00
c79b3bfafd Add get_soup() 2021-04-05 19:17:27 -04:00
00f5aef721 Update redis_publish to add subproject 2021-03-28 09:42:52 -04:00
021da84433 add redis_publish 2021-03-25 18:25:46 -04:00
53a03baaa4 Add useragent option in Web 2021-03-07 11:02:26 -05:00
66d37e0be2 Add way to manually flush @buffered 2021-02-28 12:15:44 -05:00
9cadce62ac add Web helper & logger 2021-02-25 21:26:27 -05:00
7d330a0f9f Add pgsql wrapper & delitem for persistent state 2021-02-06 15:41:18 -05:00
a2cfab55bc msgpack for queue 2021-01-20 20:30:57 -05:00
c4fca1b754 Add volatile queue 2021-01-20 20:08:35 -05:00
d615ebdbd9 add custom SQL in persistent state 2021-01-17 10:05:39 -05:00
f914759b71 Fix @retry 2021-01-10 22:46:20 -05:00
58d150279f Handle bool values in state 2021-01-10 21:26:34 -05:00
b2efaa99a4 Handle null values in state 2021-01-10 20:56:46 -05:00
b845d96295 Fix update in persistentstate 2021-01-09 19:57:31 -05:00
89b21884b7 Use hash in volatile state 2020-12-20 20:15:17 -05:00
30c9494daa add download_file, bool volatile state 2020-12-20 19:53:38 -05:00
b0de37d4f9 Fix pg cursor 2020-08-20 21:10:31 -04:00
266f8642fb Fix VolatileState 2020-08-12 18:11:46 -04:00
fe955668eb Add chunks() 2020-08-04 21:52:14 -04:00
52ad2d22b9 Switch to orjson, add ndjson_iter 2020-08-04 21:40:56 -04:00
30854c7f8b Add random_word & random_phrase 2020-08-04 18:45:00 -04:00