Update docs & UI stuff

simon987 2022-02-20 09:13:19 -05:00
parent 2a2664a5cd
commit 329afcbe4f
8 changed files with 230 additions and 56 deletions

@ -52,7 +52,7 @@ sist2 (Simple incremental search tool)
  Select the file corresponding to your CPU architecture and mark the binary as executable with `chmod +x` *
  2. *(or)* Download a [development snapshot](https://files.simon987.net/.gate/sist2/simon987_sist2/) *(Not
  recommended!)*
- 3. *(or)* `docker pull simon987/sist2:2.11.6-x64-linux`
+ 3. *(or)* `docker pull simon987/sist2:2.11.7-x64-linux`
  1. See [Usage guide](docs/USAGE.md)

@ -13,7 +13,6 @@
  * [options](#web-options)
  * [examples](#web-examples)
  * [rewrite_url](#rewrite_url)
- * [link to specific indices](#link-to-specific-indices)
  * [elasticsearch](#elasticsearch)
  * [exec-script](#exec-script)
  * [tagging](#tagging)
@ -26,62 +25,66 @@ Usage: sist2 scan [OPTION]... PATH
  or: sist2 exec-script [OPTION]... INDEX
  Lightning-fast file system indexer and search tool.
  -h, --help show this help message and exit
  -v, --version Show version and exit
  --verbose Turn on logging
  --very-verbose Turn on debug messages
  Scan options
  -t, --threads=<int> Number of threads. DEFAULT=1
- -q, --quality=<flt> Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=3
- --size=<int> Thumbnail size, in pixels. Use negative value to disable. DEFAULT=500
- --content-size=<int> Number of bytes to be extracted from text documents. Use negative value to disable. DEFAULT=32768
- --incremental=<str> Reuse an existing index and only scan modified files.
- -o, --output=<str> Output directory. DEFAULT=index.sist2/
- --rewrite-url=<str> Serve files from this url instead of from disk.
- --name=<str> Index display name. DEFAULT: (name of the directory)
- --depth=<int> Scan up to DEPTH subdirectories deep. Use 0 to only scan files in PATH. DEFAULT: -1
- --archive=<str> Archive file mode (skip|list|shallow|recurse). skip: Don't parse, list: only get file names as text, shallow: Don't parse archives inside archives. DEFAULT: recurse
- --archive-passphrase=<str> Passphrase for encrypted archive files
- --ocr-lang=<str> Tesseract language (use 'tesseract --list-langs' to see which are installed on your machine)
- --ocr-images Enable OCR'ing of image files.
- --ocr-ebooks Enable OCR'ing of ebook files.
- -e, --exclude=<str> Files that match this regex will not be scanned
- --fast Only index file names & mime type
- --treemap-threshold=<str> Relative size threshold for treemap (see USAGE.md). DEFAULT: 0.0005
- --mem-buffer=<int> Maximum memory buffer size per thread in MB for files inside archives (see USAGE.md). DEFAULT: 2000
- --read-subtitles Read subtitles from media files.
- --fast-epub Faster but less accurate EPUB parsing (no thumbnails, metadata)
- --checksums Calculate file checksums when scanning.
- --list-file=<str> Specify a list of newline-delimited paths to be scanned instead of normal directory traversal. Use '-' to read from stdin.
+ --mem-throttle=<int> Total memory threshold in MiB for scan throttling. DEFAULT=0
+ -q, --thumbnail-quality=<flt> Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=1
+ --thumbnail-size=<int> Thumbnail size, in pixels. DEFAULT=500
+ --thumbnail-count=<int> Number of thumbnails to generate. Set a value > 1 to create video previews, set to 0 to disable thumbnails. DEFAULT=1
+ --content-size=<int> Number of bytes to be extracted from text documents. Set to 0 to disable. DEFAULT=32768
+ --incremental=<str> Reuse an existing index and only scan modified files.
+ -o, --output=<str> Output directory. DEFAULT=index.sist2/
+ --rewrite-url=<str> Serve files from this url instead of from disk.
+ --name=<str> Index display name. DEFAULT: (name of the directory)
+ --depth=<int> Scan up to DEPTH subdirectories deep. Use 0 to only scan files in PATH. DEFAULT: -1
+ --archive=<str> Archive file mode (skip|list|shallow|recurse). skip: Don't parse, list: only get file names as text, shallow: Don't parse archives inside archives. DEFAULT: recurse
+ --archive-passphrase=<str> Passphrase for encrypted archive files
+ --ocr-lang=<str> Tesseract language (use 'tesseract --list-langs' to see which are installed on your machine)
+ --ocr-images Enable OCR'ing of image files.
+ --ocr-ebooks Enable OCR'ing of ebook files.
+ -e, --exclude=<str> Files that match this regex will not be scanned
+ --fast Only index file names & mime type
+ --treemap-threshold=<str> Relative size threshold for treemap (see USAGE.md). DEFAULT: 0.0005
+ --mem-buffer=<int> Maximum memory buffer size per thread in MiB for files inside archives (see USAGE.md). DEFAULT: 2000
+ --read-subtitles Read subtitles from media files.
+ --fast-epub Faster but less accurate EPUB parsing (no thumbnails, metadata)
+ --checksums Calculate file checksums when scanning.
+ --list-file=<str> Specify a list of newline-delimited paths to be scanned instead of normal directory traversal. Use '-' to read from stdin.
  Index options
  -t, --threads=<int> Number of threads. DEFAULT=1
  --es-url=<str> Elasticsearch url with port. DEFAULT=http://localhost:9200
  --es-index=<str> Elasticsearch index name. DEFAULT=sist2
  -p, --print Just print JSON documents to stdout.
- --script-file=<str> Path to user script.
- --mappings-file=<str> Path to Elasticsearch mappings.
- --settings-file=<str> Path to Elasticsearch settings.
- --async-script Execute user script asynchronously.
- --batch-size=<int> Index batch size. DEFAULT: 100
- -f, --force-reset Reset Elasticsearch mappings and settings. (You must use this option the first time you use the index command)
+ --incremental-index Conduct incremental indexing, assumes that the old index is already digested by Elasticsearch.
+ --script-file=<str> Path to user script.
+ --mappings-file=<str> Path to Elasticsearch mappings.
+ --settings-file=<str> Path to Elasticsearch settings.
+ --async-script Execute user script asynchronously.
+ --batch-size=<int> Index batch size. DEFAULT: 100
+ -f, --force-reset Reset Elasticsearch mappings and settings. (You must use this option the first time you use the index command)
  Web options
  --es-url=<str> Elasticsearch url. DEFAULT=http://localhost:9200
  --es-index=<str> Elasticsearch index name. DEFAULT=sist2
  --bind=<str> Listen on this address. DEFAULT=localhost:4090
  --auth=<str> Basic auth in user:password format
  --tag-auth=<str> Basic auth in user:password format for tagging
  --tagline=<str> Tagline in navbar
  --dev Serve html & js files from disk (for development)
  --lang=<str> Default UI language. Can be changed by the user
  Exec-script options
  --es-url=<str> Elasticsearch url. DEFAULT=http://localhost:9200
  --es-index=<str> Elasticsearch index name. DEFAULT=sist2
  --script-file=<str> Path to user script.
  --async-script Execute user script asynchronously.
- Made by simon987 <me@simon987.net>. Released under GPL-3.0
  ```
  ## Scan
@ -90,13 +93,21 @@ Exec-script options
  * `-t, --threads`
  Number of threads for file parsing. **Do not set a number higher than `$(nproc)` or `$(Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors` in Windows!**
- * `-q, --quality`
+ * `--mem-throttle`
+ Total memory threshold in MiB for scan throttling. Worker threads will not start a new parse job
+ until the total memory usage of sist2 is below this threshold. Set to 0 to disable. DEFAULT=0
+ * `-q, --thumbnail-quality`
  Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best.
- * `--size`
+ * `--thumbnail-size`
  Thumbnail size in pixels.
+ * `--thumbnail-count`
+ Maximum number of thumbnails to generate. When set to a value >= 2, thumbnails for video previews
+ will be generated. The actual number of thumbnails generated depends on the length of the video (maximum 1 image
+ every ~5s). Set to 0 to completely disable thumbnails.
  * `--content-size`
- Number of bytes of text to be extracted from the content of files (plain text and PDFs).
+ Number of bytes of text to be extracted from the content of files (plain text, PDFs etc.).
  Repeated whitespace and special characters do not count toward this limit.
+ Set to 0 to completely disable content parsing.
  * `--incremental`
  Specify an existing index. Information about files in this index that were not modified (based on *mtime* attribute)
  will be copied to the new index and will not be parsed again.
@ -129,13 +140,13 @@ Exec-script options
  In effect, smaller `treemap-threshold` values will yield a more detailed
  (but also a more cluttered and harder to read) visualization.
- * `--mem-buffer` Maximum memory buffer size in MB (per thread) for files inside archives. Media files
+ * `--mem-buffer` Maximum memory buffer size in MiB (per thread) for files inside archives. Media files
  larger than this number will be read sequentially and no *seek* operations will be supported.
  To check if a media file can be parsed without *seek*, execute `cat file.mp4 | ffprobe -`
  * `--read-subtitles` When enabled, will attempt to read the subtitles stream from media files.
  * `--fast-epub` Much faster but less accurate EPUB parsing. When enabled, sist2 will use a simple HTML parser to read epub files instead of the MuPDF library. No thumbnails are generated and author/title metadata are not parsed.
- * `--checksums` Calculate file checksums (sha1) when scanning files. This option does not cause any additional read
+ * `--checksums` Calculate file checksums (SHA1) when scanning files. This option does not cause any additional read
  operations. Checksums are not calculated for all file types, unless the file is inside an archive. When enabled, duplicate
  files are hidden in the web UI (this behaviour can be toggled in the Configuration page).
@ -205,6 +216,9 @@ and values are raw image bytes.
  Elasticsearch index name. DEFAULT=sist2
  * `-p, --print`
  Print index in JSON format to stdout.
+ * `--incremental-index`
+ Conduct incremental indexing. Assumes that the old index is already ingested in Elasticsearch.
+ Only the new changes since the last scan will be sent.
  * `--script-file`
  Path to user script. See [Scripting](scripting.md).
  * `--mappings-file`

9
sist2-vue/dist/css/chunk-vendors.css vendored Normal file

File diff suppressed because one or more lines are too long

1
sist2-vue/dist/css/index.css vendored Normal file

File diff suppressed because one or more lines are too long

3
sist2-vue/dist/index.html vendored Normal file

@ -0,0 +1,3 @@
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=no"><title>sist2</title><link href="css/chunk-vendors.css" rel="preload" as="style"><link href="css/index.css" rel="preload" as="style"><link href="js/chunk-vendors.js" rel="preload" as="script"><link href="js/index.js" rel="preload" as="script"><link href="css/chunk-vendors.css" rel="stylesheet"><link href="css/index.css" rel="stylesheet"></head><body><noscript><style>body {
height: initial;
}</style><div style="text-align: center; margin-top: 100px"><strong>We're sorry but sist2 doesn't work properly without JavaScript enabled. Please enable it to continue.</strong><br><strong>Nous sommes désolés mais sist2 ne fonctionne pas correctement si JavaScript est activé. Veuillez l'activer pour continuer.</strong></div></noscript><div id="app"></div><script src="js/chunk-vendors.js"></script><script src="js/index.js"></script></body></html>

146
sist2-vue/dist/js/chunk-vendors.js vendored Normal file

File diff suppressed because one or more lines are too long

1
sist2-vue/dist/js/index.js vendored Normal file

File diff suppressed because one or more lines are too long

@ -674,7 +674,7 @@ int main(int argc, const char *argv[]) {
  OPT_STRING(0, "es-index", &common_es_index, "Elasticsearch index name. DEFAULT=sist2"),
  OPT_BOOLEAN('p', "print", &index_args->print, "Just print JSON documents to stdout."),
  OPT_BOOLEAN(0, "incremental-index", &index_args->incremental,
- "Conduct incremental indexing, assumes that the old index is already digested by Elasticsearch."),
+ "Conduct incremental indexing. Assumes that the old index is already ingested in Elasticsearch."),
  OPT_STRING(0, "script-file", &common_script_path, "Path to user script."),
  OPT_STRING(0, "mappings-file", &index_args->es_mappings_path, "Path to Elasticsearch mappings."),
  OPT_STRING(0, "settings-file", &index_args->es_settings_path, "Path to Elasticsearch settings."),