| b1a1da3bac | Add option to use nltk word_tokenize | 2023-09-09 11:11:44 -04:00 |
| 4befc3973d | Add strip_dashes option in preprocess() | 2022-02-26 19:31:22 -05:00 |
| 55fd4a66d2 | Fix strip_quotes | 2021-11-16 11:48:23 -05:00 |
| d19442b00e | Update preprocess: now returns generator objects | 2021-09-19 09:35:35 -04:00 |
| 4711cd1b66 | Add trigrams | 2021-09-10 17:35:19 -04:00 |
| 7e0ffafb8c | Update fix_single_quotes | 2021-08-28 20:48:00 -04:00 |
| 67c09cc10c | Add remove_numbers | 2021-08-28 20:06:53 -04:00 |
| a7bf5b2d15 | Fix clean html (again!) | 2021-08-28 19:59:04 -04:00 |
| 31b35e3a32 | Fix clean html (again) | 2021-08-28 19:44:10 -04:00 |
| 4d6c8018df | Fix clean_html | 2021-08-28 19:33:11 -04:00 |
| 75bf2c2d85 | Rename test.clean to text.preprocess, add QS util func, more debug logging | 2021-04-25 12:10:03 -04:00 |
| 88f3124f85 | add bigram option for clean function | 2021-04-21 21:34:49 -04:00 |
| 32119535ae | improve text cleaning | 2021-04-18 21:27:12 -04:00 |
| 2ffaa4a5b3 | improve text cleaning | 2021-04-18 21:10:07 -04:00 |
| 45b5803c40 | improve text cleaning | 2021-04-18 15:40:30 -04:00 |
| 765f6f59b7 | Add text cleaning function | 2021-04-18 12:12:31 -04:00 |