Compare commits

...

9 Commits

Author SHA1 Message Date
ebfd7e03ce User scripts, bug fixes, docker image 2019-11-12 20:58:43 -05:00
6931d320a2 bugfix with invalid/corrupted index path 2019-11-11 20:49:38 -05:00
fc22e52eae Image placeholder 2019-11-09 23:26:49 -05:00
ba81748a74 Update build 2019-11-09 17:15:20 -05:00
e72fa1587b EXIF metadata for images 2019-11-09 15:18:44 -05:00
ea4fb7fa0d Bug fixes 2019-11-09 12:00:07 -05:00
b0a868bb73 remove 'must match' 2019-11-08 21:46:54 -05:00
d761a3b595 update readme 2019-11-08 19:42:36 -05:00
2d7a8a2fdc fuzzy toggle 2019-11-08 16:15:10 -05:00
35 changed files with 610 additions and 118 deletions

6
.gitmodules vendored
View File

@@ -25,3 +25,9 @@
[submodule "lib/harfbuzz"]
path = lib/harfbuzz
url = https://github.com/harfbuzz/harfbuzz
[submodule "lib/libmagic"]
path = lib/libmagic
url = https://github.com/threatstack/libmagic
[submodule "lib/bzip2-1.0.6"]
path = lib/bzip2-1.0.6
url = https://github.com/enthought/bzip2-1.0.6

View File

@@ -122,10 +122,10 @@ if (WITH_SIST2)
target_compile_options(sist2
PRIVATE
-Ofast
# -Ofast
# -march=native
-fno-stack-protector
-fomit-frame-pointer
# -fno-stack-protector
# -fomit-frame-pointer
)
TARGET_LINK_LIBRARIES(

9
Docker/Dockerfile Normal file
View File

@@ -0,0 +1,9 @@
FROM ubuntu:19.10
MAINTAINER simon987 <me@simon987.net>
RUN apt update
RUN apt install -y libglib2.0-0 libcurl4 libmagic1 libharfbuzz-bin libopenjp2-7
ADD sist2 /root/sist2
ENTRYPOINT ["/root/sist2"]

8
Docker/build.sh Executable file
View File

@@ -0,0 +1,8 @@
cp ../sist2 .
version=$(./sist2 --version)
echo "Version ${version}"
docker build . -t simon987/sist2:${version} -t simon987/sist2:latest
docker push simon987/sist2:${version}
docker push simon987/sist2:latest

View File

@@ -14,6 +14,7 @@ sist2 (Simple incremental search tool)
* Extracts text from common file types\*
* Generates thumbnails\*
* Incremental scanning
* Automatic tagging from file attributes via [user scripts](scripting/README.md)
\* See [format support](#format-support)
@@ -21,11 +22,13 @@ sist2 (Simple incremental search tool)
## Getting Started
1. Have an [Elasticsearch](https://www.elastic.co/downloads/elasticsearch) instance running
1. Download the [latest sist2 release](https://github.com/simon987/sist2/releases)
1.
1. Download the [latest sist2 release](https://github.com/simon987/sist2/releases) *
1. *(or)* `docker pull simon987/sist2:latest`
*Windows users*: `sist2` runs under [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
*Mac users*: See [#1](https://github.com/simon987/sist2/issues/1)
\* *Windows users*: **sist2** runs under [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
\* *Mac users*: See [#1](https://github.com/simon987/sist2/issues/1)
## Example usage
@@ -52,14 +55,40 @@ sist2 index --print ./my_idx > raw_documents.ndjson
sist2 web --bind 0.0.0.0 --port 4321 ./my_idx1 ./my_idx2 ./my_idx3
```
### Use sist2 with docker
**scan**
```bash
docker run -it \
-v /path/to/files/:/files \
-v $PWD/out/:/out \
simon987/sist2 scan -t 4 /files -o /out/my_idx1
```
**index**
```bash
docker run -it --network host\
-v $PWD/out/:/out \
simon987/sist2 index /out/my_idx1
```
**web**
```bash
docker run --rm --network host -d --name sist2\
-v $PWD/out/my_idx:/idx \
-v $PWD/my/files:/files
simon987/sist2 web --bind 0.0.0.0 /idx
docker stop sist2
```
## Format support
File type | Library | Content | Thumbnail | Metadata
:---|:---|:---|:---|:---
pdf,xps,cbz,cbr,fb2,epub | MuPDF | yes | yes, `png` | title |
pdf,xps,cbz,fb2,epub | MuPDF | yes | yes, `png` | title |
`audio/*` | ffmpeg | - | yes, `jpeg` | ID3 tags |
`video/*` | ffmpeg | - | yes, `jpeg` | title, comment |
`image/*` | ffmpeg | - | yes, `jpeg` | *planned* |
`video/*` | ffmpeg | - | yes, `jpeg` | title, comment, artist |
`image/*` | ffmpeg | - | yes, `jpeg` | `EXIF:Artist`, `EXIF:ImageDescription` |
ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, `bmp` | Name & style |
`text/plain` | *(none)* | yes | no | - |
docx, xlsx, pptx | | *planned* | no | *planned* |
@@ -79,7 +108,7 @@ binaries.
apt install git cmake pkg-config libglib2.0-dev\
libssl-dev uuid-dev libavformat-dev libswscale-dev \
python3 libmagic-dev libfreetype6-dev libcurl-dev \
libbz2-dev yasm libharfbuzz-dev
libbz2-dev yasm libharfbuzz-dev ragel
```
*(FreeBSD)*
```bash

1
lib/bzip2-1.0.6 Submodule

Submodule lib/bzip2-1.0.6 added at 288acf97a1

1
lib/libmagic Submodule

Submodule lib/libmagic added at 1249b5cd02

View File

@@ -80,6 +80,9 @@
"analyzer": "my_nGram"
}
}
},
"tag": {
"type": "keyword"
}
}
}

117
scripting/README.md Normal file
View File

@@ -0,0 +1,117 @@
## User scripts
*This document is under construction, more in-depth guide coming soon*
During the `index` step, you can use the `--script-file <script>` option to
modify documents or add user tags. This option is mainly used to
implement automatic tagging based on file attributes.
The scripting language used
([Painless Scripting Language](https://www.elastic.co/guide/en/elasticsearch/painless/7.4/index.html))
is very similar to Java, but you should be able to create user scripts
without programming experience at all if you're somewhat familiar with
regex.
This is the base structure of the documents we're working with:
```json
{
"_id": "e171405c-fdb5-4feb-bb32-82637bc32084",
"_index": "sist2",
"_type": "_doc",
"_source": {
"index": "206b3050-e821-421a-891d-12fcf6c2db0d",
"mime": "application/json",
"size": 1799,
"mtime": 1545443685,
"extension": "md",
"name": "README",
"path": "sist2/scripting",
"content": "..."
}
}
```
**Example script**
This script checks if the `genre` attribute exists, if it does
it adds the `genre.<genre>` tag.
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source?.genre != null) {
tags.add("genre." + ctx._source.genre.toLowerCase())
}
```
You can use `.` to create a hierarchical tag tree:
![scripting/genre_example](genre_example.png)
To use regular expressions, you need to add this line in `/etc/elasticsearch/elasticsearch.yml`
```yaml
script.painless.regex.enabled: true
```
Or, if you're using docker add `-e "script.painless.regex.enabled=true"`
### Examples
If `(20XX)` is in the file name, add the `year.<year>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
Matcher m = /[\(\.+](20[0-9]{2})[\)\.+]/.matcher(ctx._source.name);
if (m.find()) {
tags.add("year." + m.group(1))
}
```
Use default *Calibre* folder structure to infer author.
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
// We expect the book path to look like this:
// /path/to/Calibre Library/Author/Title/Title - Author.pdf
if (ctx._source.name.contains("-") && ctx._source.extension == "pdf") {
String[] names = ctx._source.name.splitOnToken('-');
tags.add("author." + names[1].strip());
}
```
If the file matches a specific pattern `AAAA-000 fName1 lName1, <fName2 lName2>...`, add the `actress.<actress>` and
`studio.<studio>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
Matcher m = /([A-Z]{4})-[0-9]{3} (.*)/.matcher(ctx._source.name);
if (m.find()) {
tags.add("studio." + m.group(1));
// Take the matched group (.*), and add a tag for
// each name, separated by comma
for (String name : m.group(2).splitOnToken(',')) {
tags.add("actress." + name);
}
}
```
Set the name of the last folder (`/path/to/<studio>/file.mp4`) to `studio.<studio>` tag
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source.path != "") {
String[] names = ctx._source.path.splitOnToken('/');
tags.add("studio." + names[names.length-1]);
}
```
Set the name of the last folder (`/path/to/<studio>/file.mp4`) to `studio.<studio>` tag
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source.path != "") {
String[] names = ctx._source.path.splitOnToken('/');
tags.add("studio." + names[names.length-1]);
}
```

BIN
scripting/genre_example.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

View File

@@ -54,14 +54,12 @@ cd ../..
mv onion/build/src/onion/libonion_static.a .
#bzip2
git clone https://github.com/enthought/bzip2-1.0.6
cd bzip2-1.0.6
make -j 4
cd ..
mv bzip2-1.0.6/libbz2.a .
# magic
git clone https://github.com/threatstack/libmagic
cd libmagic
./autogen.sh
./configure --enable-static --disable-shared

View File

@@ -42,14 +42,12 @@ mv ffmpeg/libswresample/libswresample.a .
mv ffmpeg/libswscale/libswscale.a .
#bzip2
git clone https://github.com/enthought/bzip2-1.0.6
cd bzip2-1.0.6
make -j 4
cd ..
mv bzip2-1.0.6/libbz2.a .
# magic
git clone https://github.com/threatstack/libmagic
cd libmagic
./autogen.sh
./configure --enable-static --disable-shared

View File

@@ -2,8 +2,8 @@
#define DEFAULT_OUTPUT "index.sist2/"
#define DEFAULT_CONTENT_SIZE 4096
#define DEFAULT_QUALITY 15
#define DEFAULT_SIZE 200
#define DEFAULT_QUALITY 5
#define DEFAULT_SIZE 500
#define DEFAULT_REWRITE_URL ""
#define DEFAULT_ES_URL "http://localhost:9200"
@@ -25,7 +25,7 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
char *abs_path = abspath(argv[1]);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", argv[1]);
fprintf(stderr, "File not found: %s\n", argv[1]);
return 1;
} else {
args->path = abs_path;
@@ -34,7 +34,7 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
if (args->incremental != NULL) {
abs_path = abspath(args->incremental);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", args->incremental);
fprintf(stderr, "File not found: %s\n", args->incremental);
return 1;
}
}
@@ -100,7 +100,7 @@ int index_args_validate(index_args_t *args, int argc, const char **argv) {
char *index_path = abspath(argv[1]);
if (index_path == NULL) {
fprintf(stderr, "File not found: %s", argv[1]);
fprintf(stderr, "File not found: %s\n", argv[1]);
return 1;
} else {
args->index_path = argv[1];
@@ -109,6 +109,27 @@ int index_args_validate(index_args_t *args, int argc, const char **argv) {
if (args->es_url == NULL) {
args->es_url = DEFAULT_ES_URL;
}
if (args->script_path != NULL) {
struct stat info;
int res = stat(args->script_path, &info);
if (res == -1) {
fprintf(stderr, "Error opening script file '%s': %s\n", args->script_path, strerror(errno));
return 1;
}
int fd = open(args->script_path, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "Error opening script file '%s': %s\n", args->script_path, strerror(errno));
return 1;
}
args->script = malloc(info.st_size + 1);
read(fd, args->script, info.st_size);
*(args->script + info.st_size) = '\0';
close(fd);
}
return 0;
}
@@ -137,7 +158,7 @@ int web_args_validate(web_args_t *args, int argc, const char **argv) {
for (int i = 0; i < args->index_count; i++) {
char *abs_path = abspath(args->indices[i]);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", abs_path);
fprintf(stderr, "File not found: %s\n", abs_path);
return 1;
}
}

View File

@@ -22,6 +22,8 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv);
typedef struct index_args {
char *es_url;
const char *index_path;
const char *script_path;
char *script;
int print;
int force_reset;
} index_args_t;

View File

@@ -6,7 +6,6 @@
#include <stdio.h>
#include <string.h>
#include <cJSON/cJSON.h>
#include <src/ctx.h>
#include "static_generated.c"
@@ -54,6 +53,40 @@ void index_json(cJSON *document, const char uuid_str[UUID_STR_LEN]) {
elastic_index_line(bulk_line);
}
void execute_update_script(const char *script, const char index_id[UUID_STR_LEN]) {
cJSON *body = cJSON_CreateObject();
cJSON *script_obj = cJSON_AddObjectToObject(body, "script");
cJSON_AddStringToObject(script_obj, "lang", "painless");
cJSON_AddStringToObject(script_obj, "source", script);
cJSON *query = cJSON_AddObjectToObject(body, "query");
cJSON *term_obj = cJSON_AddObjectToObject(query, "term");
cJSON_AddStringToObject(term_obj, "index", index_id);
char * str = cJSON_Print(body);
char bulk_url[4096];
snprintf(bulk_url, 4096, "%s/sist2/_update_by_query?pretty", Indexer->es_url);
response_t *r = web_post(bulk_url, str, "Content-Type: application/json");
printf("Executed user script <%d>\n", r->status_code);
cJSON *resp = cJSON_Parse(r->body);
cJSON_free(str);
cJSON_Delete(body);
free_response(r);
cJSON *error = cJSON_GetObjectItem(resp, "error");
if (error != NULL) {
char *error_str = cJSON_Print(error);
fprintf(stderr, "User script error: \n%s\n", error_str);
cJSON_free(error_str);
}
cJSON_Delete(resp);
}
void elastic_flush() {
if (Indexer == NULL) {
@@ -115,6 +148,7 @@ void elastic_flush() {
cJSON_Delete(ret_json);
free_response(r);
free(buf);
}
void elastic_index_line(es_bulk_line_t *line) {
@@ -140,8 +174,7 @@ void elastic_index_line(es_bulk_line_t *line) {
es_indexer_t *create_indexer(const char *url) {
size_t url_len = strlen(url);
char *es_url = malloc(url_len);
char *es_url = malloc(strlen(url) + 1);
strcpy(es_url, url);
es_indexer_t *indexer = malloc(sizeof(es_indexer_t));
@@ -154,7 +187,7 @@ es_indexer_t *create_indexer(const char *url) {
return indexer;
}
void destroy_indexer() {
void destroy_indexer(char * script, char index_id[UUID_STR_LEN]) {
char url[4096];
@@ -163,6 +196,15 @@ void destroy_indexer() {
printf("Refresh index <%d>\n", r->status_code);
free_response(r);
if (script != NULL) {
execute_update_script(script, index_id);
}
snprintf(url, sizeof(url), "%s/sist2/_refresh", IndexCtx.es_url);
r = web_post(url, "", NULL);
printf("Refresh index <%d>\n", r->status_code);
free_response(r);
snprintf(url, sizeof(url), "%s/sist2/_forcemerge", IndexCtx.es_url);
r = web_post(url, "", NULL);
printf("Merge index <%d>\n", r->status_code);

View File

@@ -24,7 +24,7 @@ void index_json(cJSON *document, const char uuid_str[UUID_STR_LEN]);
es_indexer_t *create_indexer(const char* es_url);
void destroy_indexer();
void destroy_indexer(char *script, char index_id[UUID_STR_LEN]);
void elastic_init(int force_reset);

File diff suppressed because one or more lines are too long

View File

@@ -54,6 +54,12 @@ index_descriptor_t read_index_descriptor(char *path) {
struct stat info;
stat(path, &info);
int fd = open(path, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "Invalid/corrupt index (Could not find descriptor)\n");
exit(1);
}
char *buf = malloc(info.st_size + 1);
read(fd, buf, info.st_size);
*(buf + info.st_size) = '\0';
@@ -258,8 +264,9 @@ void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func
}
func(document, uuid_str);
cJSON_free(document);
cJSON_Delete(document);
}
dyn_buffer_destroy(&buf);
fclose(file);
}

View File

@@ -15,7 +15,7 @@ store_t *store_create(char *path) {
);
if (open_ret != 0) {
fprintf(stderr, "Error while opening store: %s", mdb_strerror(open_ret));
fprintf(stderr, "Error while opening store: %s (%s)\n", mdb_strerror(open_ret), path);
exit(1);
}

View File

@@ -10,7 +10,7 @@
#define EPILOG "Made by simon987 <me@simon987.net>. Released under GPL-3.0"
static const char *const Version = "1.1.1";
static const char *const Version = "1.1.5";
static const char *const usage[] = {
"sist2 scan [OPTION]... PATH",
"sist2 index [OPTION]... INDEX",
@@ -163,10 +163,11 @@ void sist2_index(index_args_t *args) {
read_index(file_path, desc.uuid, f);
}
}
closedir(dir);
if (!args->print) {
elastic_flush();
destroy_indexer();
destroy_indexer(args->script, desc.uuid);
}
}
@@ -208,16 +209,20 @@ int main(int argc, const char *argv[]) {
web_args_t *web_args = web_args_create();
#endif
int arg_version = 0;
char * common_es_url = NULL;
struct argparse_option options[] = {
OPT_HELP(),
OPT_BOOLEAN('v', "version", &arg_version, "Show version and exit"),
OPT_GROUP("Scan options"),
OPT_INTEGER('t', "threads", &scan_args->threads, "Number of threads. DEFAULT=1"),
OPT_FLOAT('q', "quality", &scan_args->quality,
"Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=15"),
OPT_INTEGER(0, "size", &scan_args->size, "Thumbnail size, in pixels. DEFAULT=200"),
"Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=5"),
OPT_INTEGER(0, "size", &scan_args->size, "Thumbnail size, in pixels. DEFAULT=500"),
OPT_INTEGER(0, "content-size", &scan_args->content_size,
"Number of bytes to be extracted from text documents. DEFAULT=4096"),
OPT_STRING(0, "incremental", &scan_args->incremental,
@@ -230,6 +235,7 @@ int main(int argc, const char *argv[]) {
OPT_GROUP("Index options"),
OPT_STRING(0, "es-url", &common_es_url, "Elasticsearch url. DEFAULT=http://localhost:9200"),
OPT_BOOLEAN('p', "print", &index_args->print, "Just print JSON documents to stdout."),
OPT_STRING(0, "script-file", &index_args->script_path, "Path to user script."),
OPT_BOOLEAN('f', "force-reset", &index_args->force_reset, "Reset Elasticsearch mappings and settings. "
"(You must use this option the first time you use the index command)"),
@@ -247,6 +253,11 @@ int main(int argc, const char *argv[]) {
argparse_describe(&argparse, DESCRIPTION, EPILOG);
argc = argparse_parse(&argparse, argc, argv);
if (arg_version) {
printf(Version);
exit(0);
}
#ifndef SIST_SCAN_ONLY
web_args->es_url = common_es_url;
index_args->es_url = common_es_url;

View File

@@ -142,6 +142,9 @@ void parse_font(const char *buf, size_t buf_len, document_t *doc) {
if (library == NULL) {
FT_Init_FreeType(&library);
}
if (buf == NULL) {
return;
}
FT_Face face;
FT_Error err = FT_New_Memory_Face(library, (unsigned char *) buf, buf_len, 0, &face);

View File

@@ -116,9 +116,9 @@ AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int st
return frame;
}
#define APPEND_TAG_META(doc, tag, keyname) \
#define APPEND_TAG_META(doc, tag_, keyname) \
text_buffer_t tex = text_buffer_create(-1); \
text_buffer_append_string0(&tex, tag->value); \
text_buffer_append_string0(&tex, tag_->value); \
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + tex.dyn_buffer.cur); \
meta_tag->key = keyname; \
strcpy(meta_tag->strval, tex.dyn_buffer.buf); \
@@ -151,30 +151,39 @@ void append_audio_meta(AVFormatContext *pFormatCtx, document_t *doc) {
}
__always_inline
void append_video_meta(AVFormatContext *pFormatCtx, document_t *doc, int include_audio_tags) {
void append_video_meta(AVFormatContext *pFormatCtx, AVFrame *frame, document_t *doc, int include_audio_tags, int is_video) {
meta_line_t *meta_duration = malloc(sizeof(meta_line_t));
meta_duration->key = MetaMediaDuration;
meta_duration->longval = pFormatCtx->duration / AV_TIME_BASE;
APPEND_META(doc, meta_duration)
if (is_video) {
meta_line_t *meta_duration = malloc(sizeof(meta_line_t));
meta_duration->key = MetaMediaDuration;
meta_duration->longval = pFormatCtx->duration / AV_TIME_BASE;
APPEND_META(doc, meta_duration)
meta_line_t *meta_bitrate = malloc(sizeof(meta_line_t));
meta_bitrate->key = MetaMediaBitrate;
meta_bitrate->longval = pFormatCtx->bit_rate;
APPEND_META(doc, meta_bitrate)
meta_line_t *meta_bitrate = malloc(sizeof(meta_line_t));
meta_bitrate->key = MetaMediaBitrate;
meta_bitrate->longval = pFormatCtx->bit_rate;
APPEND_META(doc, meta_bitrate)
}
AVDictionaryEntry *tag = NULL;
while ((tag = av_dict_get(pFormatCtx->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
char key[32];
strncpy(key, tag->key, sizeof(key));
char *ptr = key;
for (; *ptr; ++ptr) *ptr = (char) tolower(*ptr);
if (strcmp(key, "title") == 0 && include_audio_tags) {
APPEND_TAG_META(doc, tag, MetaTitle)
} else if (strcmp(key, "comment") == 0) {
APPEND_TAG_META(doc, tag, MetaContent)
if (is_video) {
while ((tag = av_dict_get(pFormatCtx->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
if (include_audio_tags && strcmp(tag->key, "title") == 0) {
APPEND_TAG_META(doc, tag, MetaTitle)
} else if (strcmp(tag->key, "comment") == 0) {
APPEND_TAG_META(doc, tag, MetaContent)
} else if (include_audio_tags && strcmp(tag->key, "artist") == 0) {
APPEND_TAG_META(doc, tag, MetaArtist)
}
}
} else {
// EXIF metadata
while ((tag = av_dict_get(frame->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
if (include_audio_tags && strcmp(tag->key, "Artist") == 0) {
APPEND_TAG_META(doc, tag, MetaArtist)
} else if (strcmp(tag->key, "ImageDescription") == 0) {
APPEND_TAG_META(doc, tag, MetaContent)
}
}
}
}
@@ -236,11 +245,6 @@ void parse_media(const char *filepath, document_t *doc) {
if (video_stream != -1) {
AVStream *stream = pFormatCtx->streams[video_stream];
if (stream->nb_frames > 1) {
//This is a video (not a still image)
append_video_meta(pFormatCtx, doc, audio_stream == -1);
}
if (stream->codecpar->width <= MIN_SIZE || stream->codecpar->height <= MIN_SIZE) {
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
@@ -273,6 +277,8 @@ void parse_media(const char *filepath, document_t *doc) {
return;
}
append_video_meta(pFormatCtx, frame, doc, audio_stream == -1, stream->nb_frames > 1);
// Scale frame
AVFrame *scaled_frame = scale_frame(decoder, frame, ScanCtx.tn_size);

View File

@@ -16,7 +16,6 @@ void *read_all(parse_job_t *job, const char *buf, int bytes_read, int *fd) {
if (*fd == -1) {
perror("open");
printf("%s\n", job->filepath);
free(job);
return NULL;
}
}
@@ -25,6 +24,7 @@ void *read_all(parse_job_t *job, const char *buf, int bytes_read, int *fd) {
int ret = read(*fd, full_buf + bytes_read, job->info.st_size - bytes_read);
if (ret == -1) {
perror("read");
return NULL;
}
}
@@ -108,7 +108,7 @@ void parse(void *arg) {
void *pdf_buf = read_all(job, (char *) buf, bytes_read, &fd);
parse_pdf(pdf_buf, doc.size, &doc);
if (pdf_buf != buf) {
if (pdf_buf != buf && pdf_buf != NULL) {
free(pdf_buf);
}
@@ -119,7 +119,7 @@ void parse(void *arg) {
void *font_buf = read_all(job, (char *) buf, bytes_read, &fd);
parse_font(font_buf, doc.size, &doc);
if (font_buf != buf) {
if (font_buf != buf && font_buf != NULL) {
free(font_buf);
}
}

View File

@@ -114,6 +114,10 @@ int read_stext_block(fz_stext_block *block, text_buffer_t *tex) {
void parse_pdf(void *buf, size_t buf_len, document_t *doc) {
if (buf == NULL) {
return;
}
static int mu_is_initialized = 0;
if (!mu_is_initialized) {
pthread_mutex_init(&ScanCtx.mupdf_mu, NULL);

View File

@@ -90,7 +90,7 @@ void text_buffer_terminate_string(text_buffer_t *buf) {
}
__always_inline
int utf8_validchr(const char* s) {
int utf8_validchr(const char *s) {
if (0x00 == (0x80 & *s)) {
return TRUE;
} else if (0xf0 == (0xf8 & *s)) {
@@ -130,7 +130,7 @@ int utf8_validchr(const char* s) {
if (0 == (0x1e & s[0])) {
return FALSE;
}
} else {
} else {
return FALSE;
}
@@ -140,12 +140,22 @@ int utf8_validchr(const char* s) {
int text_buffer_append_string(text_buffer_t *buf, char *str, size_t len) {
utf8_int32_t c;
for (void *v = utf8codepoint(str, &c); c != '\0' && ((char*)v - str + 4) < len; v = utf8codepoint(v, &c)) {
if (str == NULL || len < 1 ||
(0xf0 == (0xf8 & str[0]) && len < 4) ||
(0xe0 == (0xf0 & str[0]) && len < 3) ||
(0xc0 == (0xe0 & str[0]) && len == 1) ||
*(str) == 0) {
text_buffer_terminate_string(buf);
return 0;
}
for (void *v = utf8codepoint(str, &c); c != '\0' && ((char *) v - str + 4) < len; v = utf8codepoint(v, &c)) {
if (utf8_validchr(v)) {
text_buffer_append_char(buf, c);
}
}
text_buffer_terminate_string(buf);
return 0;
}
int text_buffer_append_string0(text_buffer_t *buf, char *str) {

File diff suppressed because one or more lines are too long

View File

@@ -1,3 +1,7 @@
*:focus {
outline: 0;
}
a {
color: #00BCD4;
}
@@ -95,6 +99,11 @@ body {
margin-right: 3px;
}
.badge-user {
color: #212529;
background-color: #e0e0e0;
}
.fit {
display: block;
min-width: 64px;
@@ -164,6 +173,7 @@ mark {
margin-top: 1em;
margin-bottom: 1em;
}
.custom-select {
overflow: auto;
background-color: #37474F;
@@ -239,4 +249,38 @@ option {
.btn {
color: #eee;
}
}
.nav-tabs .nav-link {
color: #e0e0e0;
}
.nav-tabs .nav-item.show .nav-link, .nav-tabs .nav-link.active {
background-color: #212121;
border-color: #616161 #616161 #212121;
color: #e0e0e0;
}
.nav-tabs .nav-link:focus, .nav-tabs .nav-link:focus {
border-color: #616161 #616161 #212121;
color: #e0e0e0;
}
.nav-tabs .nav-link:focus, .nav-tabs .nav-link:hover {
border-color: #e0e0e0 #e0e0e0 #212121;
color: #e0e0e0;
}
.nav-tabs {
border-bottom: #616161;
}
.nav {
margin-top: 0.5rem;
}
@media (min-width: 800px) {
.nav {
min-width: 800px;
}
}

View File

@@ -1,3 +1,7 @@
*:focus {
outline: 0;
}
body {overflow-y:scroll;}
.progress {
@@ -47,6 +51,11 @@ body {overflow-y:scroll;}
background-color: #FFC107;
}
.badge-user {
color: #212529;
background-color: #e0e0e0;
}
.badge-text {
color: #FFFFFF;
background-color: #FAAB3C;
@@ -168,4 +177,14 @@ mark {
padding: .1rem .3rem;
font-size: .875rem;
border-radius: .2rem;
}
}
.nav {
margin-top: 0.5rem;
}
@media (min-width: 800px) {
.nav {
min-width: 800px;
}
}

View File

@@ -75,6 +75,18 @@ function shouldPlayVideo(hit) {
return videoc !== "hevc" && videoc !== "mpeg2video" && videoc !== "wmv3";
}
function makePlaceholder(w, h) {
const calc = w > h
? (175 / w / h) >= 272
? (175 * w / h)
: 175
: 175;
const el = document.createElement("div");
el.setAttribute("style", `height: ${calc}px`);
return el;
}
/**
*
* @param hit
@@ -119,14 +131,22 @@ function createDocCard(hit) {
thumbnail = document.createElement("video");
addVidSrc("f/" + hit["_id"], hit["_source"]["mime"], thumbnail);
const placeholder = makePlaceholder(hit["_source"]["width"], hit["_source"]["height"]);
imgWrapper.appendChild(placeholder);
thumbnail.setAttribute("class", "fit");
thumbnail.setAttribute("loop", "");
thumbnail.setAttribute("controls", "");
thumbnail.setAttribute("preload", "none");
thumbnail.setAttribute("poster", `t/${hit["_source"]["index"]}/${hit["_id"]}`);
thumbnail.addEventListener("dblclick", function () {
thumbnail.webkitRequestFullScreen();
});
const poster = new Image();
poster.src = thumbnail.getAttribute('poster');
poster.addEventListener("load", function () {
placeholder.remove();
imgWrapper.appendChild(thumbnail);
});
} else if ((hit["_source"].hasOwnProperty("width") && hit["_source"]["width"] > 20 && hit["_source"]["height"] > 20)
|| hit["_source"]["mime"] === "application/pdf"
|| hit["_source"]["mime"] === "application/epub+zip"
@@ -136,9 +156,17 @@ function createDocCard(hit) {
thumbnail = document.createElement("img");
thumbnail.setAttribute("class", "card-img-top fit");
thumbnail.setAttribute("src", `t/${hit["_source"]["index"]}/${hit["_id"]}`);
const placeholder = makePlaceholder(hit["_source"]["width"], hit["_source"]["height"]);
imgWrapper.appendChild(placeholder);
thumbnail.addEventListener("error", () => {
imgWrapper.remove();
});
thumbnail.addEventListener("load", () => {
placeholder.remove();
imgWrapper.appendChild(thumbnail);
});
}
//Thumbnail overlay
@@ -167,7 +195,7 @@ function createDocCard(hit) {
if (hit["_source"].hasOwnProperty("duration")) {
thumbnailOverlay = document.createElement("div");
thumbnailOverlay.setAttribute("class", "card-img-overlay");
let durationBadge = document.createElement("span");
const durationBadge = document.createElement("span");
durationBadge.setAttribute("class", "badge badge-resolution");
durationBadge.appendChild(document.createTextNode(humanTime(hit["_source"]["duration"])));
thumbnailOverlay.appendChild(durationBadge);
@@ -179,7 +207,7 @@ function createDocCard(hit) {
case "video":
case "image":
if (hit["_source"].hasOwnProperty("videoc")) {
let formatTag = document.createElement("span");
const formatTag = document.createElement("span");
formatTag.setAttribute("class", "badge badge-pill badge-video");
formatTag.appendChild(document.createTextNode(hit["_source"]["videoc"].replace(" ", "")));
tags.push(formatTag);
@@ -199,14 +227,13 @@ function createDocCard(hit) {
//Content
let contentHl = getContentHighlight(hit);
if (contentHl !== undefined) {
let contentDiv = document.createElement("div");
const contentDiv = document.createElement("div");
contentDiv.setAttribute("class", "content-div");
contentDiv.insertAdjacentHTML('afterbegin', contentHl);
docCard.appendChild(contentDiv);
}
if (thumbnail !== null) {
imgWrapper.appendChild(thumbnail);
docCard.appendChild(imgWrapper);
}
@@ -227,6 +254,26 @@ function createDocCard(hit) {
imgWrapper.appendChild(thumbnailOverlay);
}
// User tags
if (hit["_source"].hasOwnProperty("tag")) {
hit["_source"]["tag"].forEach(tag => {
const userTag = document.createElement("span");
userTag.setAttribute("class", "badge badge-pill badge-user");
const tokens = tag.split("#");
if (tokens.length > 1) {
const bg = "#" + tokens[1];
const fg = lum(tokens[1]) > 40 ? "#000" : "#fff";
userTag.setAttribute("style", `background-color: ${bg}; color: ${fg}`);
}
const name = tokens[0].split(".")[tokens[0].split(".").length - 1];
userTag.appendChild(document.createTextNode(name));
tags.push(userTag);
})
}
for (let i = 0; i < tags.length; i++) {
tagContainer.appendChild(tags[i]);
}

View File

@@ -1,6 +1,8 @@
const SIZE = 40;
let mimeMap = [];
let tree;
let tagMap = [];
let mimeTree;
let tagTree;
let searchBar = document.getElementById("searchBar");
let pathBar = document.getElementById("pathBar");
@@ -32,7 +34,7 @@ window.onload = () => {
})
};
function toggleSearchBar() {
function toggleFuzzy() {
searchDebounced();
}
@@ -49,6 +51,23 @@ $.jsonPost("i").then(resp => {
});
});
function handleTreeClick (tree) {
return (event, node, handler) => {
event.preventTreeDefault();
if (node.id === "any") {
if (!node.itree.state.checked) {
tree.deselect();
}
} else {
tree.node("any").deselect();
}
handler();
searchDebounced();
}
}
$.jsonPost("es", {
aggs: {
mimeTypes: {
@@ -85,34 +104,86 @@ $.jsonPost("es", {
});
mimeMap.push({"text": "All", "id": "any"});
tree = new InspireTree({
mimeTree = new InspireTree({
selection: {
mode: 'checkbox'
},
data: mimeMap
});
new InspireTreeDOM(tree, {
target: '.tree'
new InspireTreeDOM(mimeTree, {
target: '#mimeTree'
});
tree.on("node.click", function (event, node, handler) {
event.preventTreeDefault();
mimeTree.on("node.click", handleTreeClick(mimeTree));
mimeTree.select();
mimeTree.node("any").deselect();
});
if (node.id === "any") {
if (!node.itree.state.checked) {
tree.deselect();
function leafTag(tag) {
const tokens = tag.split(".");
return tokens[tokens.length-1]
}
// Tags tree
$.jsonPost("es", {
aggs: {
tags: {
terms: {
field: "tag",
size: 10000
}
} else {
tree.node("any").deselect();
}
handler();
searchDebounced();
},
size: 0,
}).then(resp => {
resp["aggregations"]["tags"]["buckets"]
.sort((a, b) => a["key"].localeCompare(b["key"]))
.forEach(bucket => {
addTag(tagMap, bucket["key"], bucket["key"], bucket["doc_count"])
});
tree.select();
tree.node("any").deselect();
tagMap.push({"text": "All", "id": "any"});
tagTree = new InspireTree({
selection: {
mode: 'checkbox'
},
data: tagMap
});
new InspireTreeDOM(tagTree, {
target: '#tagTree'
});
tagTree.on("node.click", handleTreeClick(tagTree));
tagTree.node("any").select();
searchBusy = false;
});
function addTag(map, tag, id, count) {
let tags = tag.split("#")[0].split(".");
let child = {
id: id,
text: tags.length !== 1 ? tags[0] : `${tags[0]} (${count})`,
children: []
};
let found = false;
map.forEach(node => {
if (node.text === child.text) {
found = true;
if (tags.length !== 1) {
addTag(node.children, tags.slice(1).join("."), id, count);
}
}
});
if (!found) {
if (tags.length !== 1) {
addTag(child.children, tags.slice(1).join("."), id, count);
map.push(child);
} else {
map.push(child);
}
}
}
new autoComplete({
selector: '#pathBar',
minChars: 1,
@@ -181,8 +252,8 @@ function doScroll() {
})
}
function getSelectedMimeTypes() {
let mimeTypes = [];
function getSelectedNodes(tree) {
let selectedNodes = [];
let selected = tree.selected();
@@ -194,11 +265,11 @@ function getSelectedMimeTypes() {
//Only get children
if (selected[i].text.indexOf("(") !== -1) {
mimeTypes.push(selected[i].id);
selectedNodes.push(selected[i].id);
}
}
return mimeTypes
return selectedNodes
}
function search() {
@@ -218,21 +289,37 @@ function search() {
let query = searchBar.value;
let empty = query === "";
let condition = $("#barToggle").prop("checked") && !empty ? "must" : "should";
let condition = empty ? "should" : "must";
let filters = [
{range: {size: {gte: size_min, lte: size_max}}},
{terms: {index: selectedIndices}}
];
let fields = [
"name^8",
"content^3",
"album^8", "artist^8", "title^8", "genre^2", "album_artist^8",
"font_name^6"
];
if ($("#fuzzyToggle").prop("checked")) {
fields.push("content.nGram");
fields.push("name.nGram^3");
}
let path = pathBar.value.replace(/\/$/, "").toLowerCase(); //remove trailing slashes
if (path !== "") {
filters.push([{term: {path: path}}])
}
let mimeTypes = getSelectedMimeTypes();
let mimeTypes = getSelectedNodes(mimeTree);
if (!mimeTypes.includes("any")) {
filters.push([{terms: {"mime": mimeTypes}}]);
}
let tags = getSelectedNodes(tagTree);
if (!tags.includes("any")) {
filters.push([{terms: {"tag": tags}}]);
}
$.jsonPost("es?scroll=1", {
"_source": {
excludes: ["content"]
@@ -243,12 +330,7 @@ function search() {
multi_match: {
query: query,
type: "most_fields",
fields: [
"name^8", "name.nGram^3", "content^3",
"content.nGram",
"album^8", "artist^8", "title^8", "genre^2", "album_artist^8",
"font_name^6"
],
fields: fields,
operator: "and"
}
},
@@ -265,7 +347,7 @@ function search() {
content: {},
name: {},
"name.nGram": {},
// font_name: {},
font_name: {},
}
},
aggs: {

View File

@@ -43,9 +43,9 @@ function humanTime(sec_num) {
function debounce(func, wait) {
let timeout;
return function() {
return function () {
let context = this, args = arguments;
let later = function() {
let later = function () {
timeout = null;
func.apply(context, args);
};
@@ -54,3 +54,13 @@ function debounce(func, wait) {
func.apply(context, args);
};
}
function lum(c) {
c = c.substring(1);
let rgb = parseInt(c, 16);
let r = (rgb >> 16) & 0xff;
let g = (rgb >> 8) & 0xff;
let b = (rgb >> 0) & 0xff;
return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

View File

@@ -24,9 +24,9 @@
<div class="input-group">
<div class="input-group-prepend">
<div class="input-group-text">
<span onclick="document.getElementById('barToggle').click()">Must match&nbsp</span>
<input title="Toggle between 'Should' and 'Must' match mode" type="checkbox" id="barToggle"
onclick="toggleSearchBar()" checked>
<span title="Toggle fuzzy searching" onclick="document.getElementById('fuzzyToggle').click()">Fuzzy&nbsp</span>
<input title="Toggle fuzzy searching" type="checkbox" id="fuzzyToggle"
onclick="toggleFuzzy()" checked>
</div>
</div>
<input id="searchBar" type="search" class="form-control" placeholder="Search">
@@ -42,10 +42,24 @@
</div>
<div class="col">
<label>Mime types</label>
<div class="tree"></div>
<ul class="nav nav-tabs" role="tablist">
<li class="nav-item">
<a class="nav-link active" data-toggle="tab" href="#mime" role="tab" aria-controls="home" aria-selected="true">Mime Types</a>
</li>
<li class="nav-item">
<a class="nav-link" data-toggle="tab" href="#tag" role="tab" aria-controls="profile" aria-selected="false" title="User-defined tags">Tags</a>
</li>
</ul>
<div class="tab-content" id="myTabContent">
<div class="tab-pane fade show active" id="mime" role="tabpanel" aria-labelledby="home-tab">
<div id="mimeTree" class="tree"></div>
</div>
<div class="tab-pane fade" id="tag" role="tabpanel" aria-labelledby="profile-tab">
<div id="tagTree" class="tree"></div>
</div>
</div>
</div>
</div>
</div>
</div>