119 Commits

SHA1 Message Date
5275c332cc Add drop table 2023-02-25 15:38:40 -05:00
a7b1a6e1ec Fix tests, add pydantic row support for PersistentState 2023-02-25 15:20:17 -05:00
826312115c Fix deserialization in PersistentState again 2022-05-07 09:41:10 -04:00
372abb0076 Fix deserialization in PersistentState 2022-05-07 09:34:50 -04:00
78c04ef6f3 Add option to override Table factory in PersistentState 2022-05-05 15:02:48 -04:00
a51ad2cbb4 Cleanup 2022-05-03 10:59:25 -04:00
4befc3973d Add strip_dashes option in preprocess() 2022-02-26 19:31:22 -05:00
c9fac7151a Split punctuation into punctuation and special_punctuation 2022-02-23 11:01:17 -05:00
084acbe184 Set max_window_size=2147483648 for zstd 2022-01-29 10:44:38 -05:00
d578be3218 Increase timeout 2022-01-29 10:38:23 -05:00
cd5a1ac50c Remove clean_multicore function 2022-01-28 20:11:26 -05:00
62e74ed292 Make sure that _id field is present in redis MQ 2022-01-27 11:06:52 -05:00
428c82bcfd Update retry codes 2021-12-09 19:46:12 -05:00
4b3583358b Update retry codes 2021-12-09 19:44:21 -05:00
90d434ec73 Add more single quotes 2021-11-16 15:32:24 -05:00
55fd4a66d2 Fix strip_quotes 2021-11-16 11:48:23 -05:00
3677815d57 Add more quotes in strip_quotes 2021-11-16 11:39:10 -05:00
1ce795a759 ... 2021-11-16 11:36:17 -05:00
e1537297d7 normalize dashes in preprocess 2021-11-16 11:34:48 -05:00
8d8f9e8751 Fix typo, add stateless stream processor 2021-11-04 13:23:23 -04:00
18ba0024ea Null check on logger 2021-11-03 16:47:21 -04:00
408735a926 Fix git clone 2021-11-02 14:38:12 -04:00
2f6c2822b6 Add functions to handle routing keys 2021-10-22 10:36:05 -04:00
d85ad919b3 Add redis MQ, update influxdb monitoring 2021-10-21 20:19:09 -04:00
ed9d148411 Fix tests 2021-10-21 19:52:42 -04:00
5e00ddccdb Add test file 2021-10-07 15:40:23 -04:00
7ecd55a1c6 Add cookiejar_filter_name 2021-10-03 11:02:39 -04:00
b746a91281 Fix default value for retry codes 2021-10-03 10:43:19 -04:00
333083e8b9 Add 520 in default retry codes 2021-10-03 10:33:49 -04:00
c295b5d30b Add Web.post 2021-09-28 15:08:51 -04:00
da0e117550 Fix fake-useragent (for real time time?) 2021-09-25 16:03:45 -04:00
c3fef7e7f8 Fix fake-useragent 2021-09-25 15:53:52 -04:00
9bd1f4b799 unbreak statefulstreamworker 2021-09-23 19:14:11 -04:00
c560cc2010 tweak StatefulStreamWorker interface 2021-09-19 14:19:17 -04:00
f4a5e6cf53 queue_iter fix 2021-09-19 12:44:49 -04:00
71cd00c063 Add StatfulStreamProcessor 2021-09-19 12:39:57 -04:00
7349c9a5f1 Quick optimisation 2021-09-19 10:57:07 -04:00
d19442b00e Update preprocess: now returns generator objects 2021-09-19 09:35:35 -04:00
4711cd1b66 Add trigrams 2021-09-10 17:35:19 -04:00
7e0ffafb8c Update fix_single_quotes 2021-08-28 20:48:00 -04:00
60273fb6bd Use imap instead of map 2021-08-28 20:09:00 -04:00
67c09cc10c Add remove_numbers 2021-08-28 20:06:53 -04:00
a7bf5b2d15 Fix clean html (again!) 2021-08-28 19:59:04 -04:00
31b35e3a32 Fix clean html (again) 2021-08-28 19:44:10 -04:00
4cff343370 version bump 2021-08-28 19:34:09 -04:00
4d6c8018df Fix clean_html 2021-08-28 19:33:11 -04:00
db3e191983 Add plot_confusion_matrix 2021-06-29 14:13:07 -04:00
33e9734991 Add plot_freq_bar 2021-06-23 19:37:22 -04:00
3238f92e4d Revert "get_soup() decode utf8" (This reverts commit f8e93354) 2021-05-14 10:09:07 -04:00
f8e93354a4 get_soup() decode utf8 2021-05-14 09:59:59 -04:00