| Commit | Message | Date |
|---|---|---|
| 78c04ef6f3 | Add option to override Table factory in PersistentState | 2022-05-05 15:02:48 -04:00 |
| a51ad2cbb4 | Cleanup | 2022-05-03 10:59:25 -04:00 |
| 4befc3973d | Add strip_dashes option in preprocess() | 2022-02-26 19:31:22 -05:00 |
| c9fac7151a | Split punctuation into punctuation and special_punctuation | 2022-02-23 11:01:17 -05:00 |
| 084acbe184 | Set max_window_size=2147483648 for zstd | 2022-01-29 10:44:38 -05:00 |
| d578be3218 | Increase timeout | 2022-01-29 10:38:23 -05:00 |
| cd5a1ac50c | Remove clean_multicore function | 2022-01-28 20:11:26 -05:00 |
| 62e74ed292 | Make sure that _id field is present in redis MQ | 2022-01-27 11:06:52 -05:00 |
| 428c82bcfd | Update retry codes | 2021-12-09 19:46:12 -05:00 |
| 4b3583358b | Update retry codes | 2021-12-09 19:44:21 -05:00 |
| 90d434ec73 | Add more single quotes | 2021-11-16 15:32:24 -05:00 |
| 55fd4a66d2 | Fix strip_quotes | 2021-11-16 11:48:23 -05:00 |
| 3677815d57 | Add more quotes in strip_quotes | 2021-11-16 11:39:10 -05:00 |
| 1ce795a759 | ... | 2021-11-16 11:36:17 -05:00 |
| e1537297d7 | normalize dashes in preprocess | 2021-11-16 11:34:48 -05:00 |
| 8d8f9e8751 | Fix typo, add stateless stream processor | 2021-11-04 13:23:23 -04:00 |
| 18ba0024ea | Null check on logger | 2021-11-03 16:47:21 -04:00 |
| 408735a926 | Fix git clone | 2021-11-02 14:38:12 -04:00 |
| 2f6c2822b6 | Add functions to handle routing keys | 2021-10-22 10:36:05 -04:00 |
| d85ad919b3 | Add redis MQ, update influxdb monitoring | 2021-10-21 20:19:09 -04:00 |
| ed9d148411 | Fix tests | 2021-10-21 19:52:42 -04:00 |
| 5e00ddccdb | Add test file | 2021-10-07 15:40:23 -04:00 |
| 7ecd55a1c6 | Add cookiejar_filter_name | 2021-10-03 11:02:39 -04:00 |
| b746a91281 | Fix default value for retry codes | 2021-10-03 10:43:19 -04:00 |
| 333083e8b9 | Add 520 in default retry codes | 2021-10-03 10:33:49 -04:00 |
| c295b5d30b | Add Web.post | 2021-09-28 15:08:51 -04:00 |
| da0e117550 | Fix fake-useragent (for real time time?) | 2021-09-25 16:03:45 -04:00 |
| c3fef7e7f8 | Fix fake-useragent | 2021-09-25 15:53:52 -04:00 |
| 9bd1f4b799 | unbreak statefulstreamworker | 2021-09-23 19:14:11 -04:00 |
| c560cc2010 | tweak StatefulStreamWorker interface | 2021-09-19 14:19:17 -04:00 |
| f4a5e6cf53 | queue_iter fix | 2021-09-19 12:44:49 -04:00 |
| 71cd00c063 | Add StatfulStreamProcessor | 2021-09-19 12:39:57 -04:00 |
| 7349c9a5f1 | Quick optimisation | 2021-09-19 10:57:07 -04:00 |
| d19442b00e | Update preprocess: now returns generator objects | 2021-09-19 09:35:35 -04:00 |
| 4711cd1b66 | Add trigrams | 2021-09-10 17:35:19 -04:00 |
| 7e0ffafb8c | Update fix_single_quotes | 2021-08-28 20:48:00 -04:00 |
| 60273fb6bd | Use imap instead of map | 2021-08-28 20:09:00 -04:00 |
| 67c09cc10c | Add remove_numbers | 2021-08-28 20:06:53 -04:00 |
| a7bf5b2d15 | Fix clean html (again!) | 2021-08-28 19:59:04 -04:00 |
| 31b35e3a32 | Fix clean html (again) | 2021-08-28 19:44:10 -04:00 |
| 4cff343370 | version bump | 2021-08-28 19:34:09 -04:00 |
| 4d6c8018df | Fix clean_html | 2021-08-28 19:33:11 -04:00 |
| db3e191983 | Add plot_confusion_matrix | 2021-06-29 14:13:07 -04:00 |
| 33e9734991 | Add plot_freq_bar | 2021-06-23 19:37:22 -04:00 |
| 3238f92e4d | Revert "get_soup() decode utf8". This reverts commit f8e93354 | 2021-05-14 10:09:07 -04:00 |
| f8e93354a4 | get_soup() decode utf8 | 2021-05-14 09:59:59 -04:00 |
| 75bf2c2d85 | Rename test.clean to text.preprocess, add QS util func, more debug logging | 2021-04-25 12:10:03 -04:00 |
| 9002ae7506 | Add debug info | 2021-04-24 10:21:54 -04:00 |
| 88f3124f85 | add bigram option for clean function | 2021-04-21 21:34:49 -04:00 |
| 8edad0255b | add retries arg in get_web() | 2021-04-21 19:50:59 -04:00 |