30 Commits

Author SHA1 Message Date
2596361af5 Use mupdf's OCR methods rather than raw tesseract, various fixes 2023-07-10 21:40:58 -04:00
610882112d Use WEBP to encode thumbnails 2023-05-20 13:12:12 -04:00
e2e0cf260f Skip encrypted files when no passphrase is supplied 2023-05-18 20:09:17 -04:00
75b66b5982 Fix #351 2023-04-15 13:06:13 -04:00
300c70883d Fixes and cleanup 2023-04-10 11:04:16 -04:00
fc36f33d52 use sqlite to save index, major thread pool refactor 2023-04-03 21:39:50 -04:00
f8abffba81 process pool mostly works, still WIP 2023-03-09 22:11:21 -05:00
8c662bb8f8 Adjust some structs 2023-02-27 20:44:25 -05:00
fa14efbeb6 Handle zipbomb files 2023-02-22 22:25:21 -05:00
9e0d7bf992 Add test files as submodule, remove support for msword thumbnails 2023-02-02 19:52:37 -05:00
2e3d648796 Update --thumbnail-quality argument, add documentation 2023-01-29 11:24:34 -05:00
b9f008603a OCR fixes 2023-01-13 20:13:20 -05:00
c18557e360 Fix thumbnail copying for incremental index, fix incremental index when there are no new updates, add option for JSON logs output 2022-11-23 20:45:47 -05:00
901035da15 Build libmobi with cmake, update to 0.10 2022-04-15 16:01:40 -04:00
c575fca91d Do not store duration or bitrate when the value is 0 or for images 2022-03-05 21:24:59 -05:00
e9f92330fd Cleanup macros 2022-03-05 11:18:07 -05:00
16a4fb4874 Rework document IDs 2022-03-05 11:18:06 -05:00
499eb2b2e4 Un-break raw file thumbnails 2022-03-05 11:18:05 -05:00
2882741926 Fix multiple content metadata bug (but without compilation error this time) 2022-02-20 10:52:22 -05:00
edba9b7917 Fix multiple content metadata bug 2022-02-20 10:43:34 -05:00
3d4331b27d Add thumbnail-count option 2022-02-19 13:45:31 -05:00
ad95684771 Update --ocr-* args, enable OCR'ing images 2022-01-08 14:24:50 -05:00
b37e5a4ad4 Fix some warnings in media.c 2022-01-08 11:06:14 -05:00
15ae2190cf Fix tesseract lang validation, update README.md, fix tesseract memory leak 2022-01-08 11:04:52 -05:00
255bc2d689 Tweak MIN_OCR_SIZE behavior, update gitignore 2022-01-08 10:33:02 -05:00
cd2a44e016 Update ocr.h: Fix minimum image size validation in ocr_extract_text 2022-01-08 10:24:57 -05:00
Yatao Li 94a5e0ac59 refactor: split ocr_extract_text from ebook 2022-01-07 23:20:35 +08:00
81008d8936 Add --list-file argument 2021-12-29 18:54:13 -05:00
f2fd7ccf41 Fix raw parsing maybe, fix index picker css 2021-12-25 11:08:52 -05:00
a41b5dcc1f Remove libscan git submodule 2021-11-07 09:30:14 -05:00