Compare commits

..

No commits in common. "master" and "v2.1.0" have entirely different histories.

296 changed files with 6482 additions and 65864 deletions

View File

@ -1,43 +0,0 @@
.idea/
*.sist2
docs/
test_i/
test_i_inc/
Testing/
.drone.yml
**/cmake_install.cmake
**/CMakeCache.txt
**/CMakeFiles/
.cmake
LICENSE
Makefile
**/*.md
**/*.cbp
VERSION
**/node_modules/
sist2-*-linux-debug
sist2-*-linux
sist2_debug
sist2
**/libscan-test-files
**/scan_ub_test
**/scan_a_test
**/scan_test
**/ext_ffmpeg
**/ext_libmobi
**/ext_libwpd
**/core
*.a
tmp_scan/
Dockerfile
Dockerfile.arm64
docker-compose.yml
state.db
*-journal
build/
__pycache__/
sist2-vue/dist
sist2-admin/frontend/dist
*.fts
.git
third-party/libscan/third-party/ext_*/*

View File

@ -1,108 +0,0 @@
kind: pipeline
type: docker
name: amd64
platform:
os: linux
arch: amd64
steps:
- name: submodules
image: alpine/git
commands:
- git submodule update --init --recursive
- name: docker
image: plugins/docker
depends_on:
- submodules
settings:
username:
from_secret: DOCKER_USER
password:
from_secret: DOCKER_PASSWORD
repo: sist2app/sist2
context: ./
dockerfile: ./Dockerfile
auto_tag: true
auto_tag_suffix: x64-linux
when:
event:
- tag
- name: build
image: sist2app/sist2-build
depends_on:
- submodules
commands:
- ./scripts/build.sh
- name: scp files
depends_on:
- build
image: appleboy/drone-scp
settings:
host:
from_secret: SSH_HOST
port:
from_secret: SSH_PORT
user:
from_secret: SSH_USER
key:
from_secret: SSH_KEY
target: ~/files/sist2/${DRONE_REPO_OWNER}_${DRONE_REPO_NAME}/${DRONE_BRANCH}_${DRONE_BUILD_NUMBER}_${DRONE_COMMIT}/
source:
- ./VERSION
- ./sist2-x64-linux
- ./sist2-x64-linux-debug
---
kind: pipeline
type: docker
name: arm64
platform:
arch: arm64
steps:
- name: submodules
image: alpine/git
commands:
- git submodule update --init --recursive
- name: docker
image: plugins/docker
depends_on:
- submodules
settings:
username:
from_secret: DOCKER_USER
password:
from_secret: DOCKER_PASSWORD
repo: sist2app/sist2
context: ./
dockerfile: ./Dockerfile.arm64
auto_tag: true
auto_tag_suffix: arm64-linux
when:
event:
- tag
- name: build
image: sist2app/sist2-build-arm64
depends_on:
- submodules
commands:
- ./scripts/build_arm64.sh
- name: scp files
depends_on:
- build
image: appleboy/drone-scp
settings:
host:
from_secret: SSH_HOST
port:
from_secret: SSH_PORT
user:
from_secret: SSH_USER
key:
from_secret: SSH_KEY
target: ~/files/sist2/${DRONE_REPO_OWNER}_${DRONE_REPO_NAME}/arm_${DRONE_BRANCH}_${DRONE_BUILD_NUMBER}_${DRONE_COMMIT}/
source:
- ./sist2-arm64-linux
- ./sist2-arm64-linux-debug

3
.gitattributes vendored Normal file
View File

@ -0,0 +1,3 @@
CMakeModules/* linguist-vendored
web/js/*.min.js linguist-vendored
web/css/*.min.css linguist-vendored

View File

@ -1,40 +0,0 @@
---
name: "🐞 Bug Report"
about: Submit a bug report
title: ''
labels: bug
assignees: ''
---
**Device Information (please complete the following information):**
- OS: `[e.g., Ubuntu 20.04, WSL2]`
- Deployment: `[Linux, Linux ARM64 or Docker]`
- Browser *(if relevant)*: `[e.g., chrome, safari]`
- SIST2 Version: `[e.g., v2.9.0]`
- Elasticsearch Version *(if relevant)* : ``
**Command with arguments**
<!-- `ex: "scan ~/Documents -o ./i2 --threads 3 -q 1.0` -->
**Describe the bug**
<!-- A clear and concise description of what the bug is. -->
**Steps To Reproduce**
Please be specific!
1. Go to '...'
2. Click on '....'
3. etc.
**Expected behavior**
<!-- A clear and concise description of what you expected to happen. -->
**Actual Behavior**
<!-- A clear and concise description of what actually happens. -->
**Screenshots**
<!-- If applicable, add screenshots to help explain your problem. -->
**Additional context**
<!-- Add any other context about the problem here. If applicable, please include why you think the bug is occurring and/or troubleshooting you have already performed. -->
<!-- If the issue is related to the `scan` module, please attach the files necessary to reproduce the error or email them to me[at]simon987.net. -->

View File

@ -1,5 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: SIST2 Documentation
url: https://github.com/simon987/sist2/blob/master/docs/USAGE.md
about: Check out the SIST2 documentation for answers to common questions

View File

@ -1,18 +0,0 @@
---
name: "🚀 Feature Request"
about: Suggest an idea for SIST2
title: ''
assignees: ''
---
**Which SIST2 component is your Feature Request related to?**
<!-- e.g., Scan, Index, or Web? -->
**Is your feature request related to a problem? Please describe.**
<!-- A clear and concise description of what the problem is. e.g., "I'm always frustrated when [...]" -->
**What would you like to see happen?**
<!-- A clear and concise description of what you want to happen. -->
**Additional context**
<!-- Add any other context or screenshots about the feature request here. -->

View File

@ -9,9 +9,7 @@ assignees: ''
sist2 version:
Platform (Linux or Docker, x86-64 or arm64):
Elasticsearch version:
Platform (please indicate if you're using Docker):
Command with arguments: `ex: "scan ~/Documents -o ./i2 --threads 3 -q 1.0`

36
.gitignore vendored
View File

@ -1,49 +1,19 @@
.idea
thumbs
test
*.cbp
CMakeCache.txt
CMakeFiles
cmake-build-default-event-trace
cmake-build-debug
cmake_install.cmake
Makefile
*.out
LOG
sist2*
!sist2-vue/
!sist2-admin
!sist2_admin
!sist2.py
*.sist2/
index.sist2/
bundle*.css
bundle.js
*.a
vgcore.*
build/
third-party/argparse
*.idx/
VERSION
git_hash.h
Testing/
test_i
test_i_inc
node_modules/
.cmake/
i_inc/
state.db
*.pyc
!sist2-admin/frontend/dist
*.js.map
sist2-vue/dist
sist2-admin/frontend/dist
.ninja_deps
.ninja_log
build.ninja
src/web/static_generated.c
src/magic_generated.c
src/index/static_generated.c
*.sist2
*-shm
*-journal
.vscode
*.fts
third-party/

17
.gitmodules vendored
View File

@ -1,15 +1,6 @@
[submodule "third-party/libscan"]
path = third-party/libscan
url = https://github.com/simon987/libscan
[submodule "third-party/argparse"]
path = third-party/argparse
url = https://github.com/simon987/argparse
[submodule "third-party/libscan/third-party/utf8.h"]
path = third-party/libscan/third-party/utf8.h
url = https://github.com/sheredom/utf8.h
[submodule "third-party/libscan/third-party/antiword"]
path = third-party/libscan/third-party/antiword
url = https://github.com/simon987/antiword
[submodule "third-party/libscan/third-party/libmobi"]
path = third-party/libscan/third-party/libmobi
url = https://github.com/bfabiszewski/libmobi
[submodule "third-party/libscan/libscan-test-files"]
path = third-party/libscan/libscan-test-files
url = https://github.com/simon987/libscan-test-files
url = https://github.com/cofyc/argparse

69
.teamcity/settings.kts vendored Normal file
View File

@ -0,0 +1,69 @@
import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.ExecBuildStep
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.exec
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.vcs
import jetbrains.buildServer.configs.kotlin.v2019_2.vcs.GitVcsRoot
/*
The settings script is an entry point for defining a TeamCity
project hierarchy. The script should contain a single call to the
project() function with a Project instance or an init function as
an argument.
VcsRoots, BuildTypes, Templates, and subprojects can be
registered inside the project using the vcsRoot(), buildType(),
template(), and subProject() methods respectively.
To debug settings scripts in command-line, run the
mvnDebug org.jetbrains.teamcity:teamcity-configs-maven-plugin:generate
command and attach your debugger to the port 8000.
To debug in IntelliJ Idea, open the 'Maven Projects' tool window (View
-> Tool Windows -> Maven Projects), find the generate task node
(Plugins -> teamcity-configs -> teamcity-configs:generate), the
'Debug' option is available in the context menu for the task.
*/
version = "2019.2"
project {
vcsRoot(HttpsGithubComSimon987sist2refsHeadsMaster)
buildType(Build)
}
object Build : BuildType({
name = "Build"
artifactRules = """
sist2
sist2_scan
""".trimIndent()
vcs {
root(HttpsGithubComSimon987sist2refsHeadsMaster)
}
steps {
exec {
name = "Build"
path = "./ci/build.sh"
dockerImage = "simon987/general_ci"
dockerImagePlatform = ExecBuildStep.ImagePlatform.Linux
dockerPull = true
}
}
triggers {
vcs {
}
}
})
object HttpsGithubComSimon987sist2refsHeadsMaster : GitVcsRoot({
name = "https://github.com/simon987/sist2#refs/heads/master"
url = "https://github.com/simon987/sist2"
})

View File

@ -1,31 +1,9 @@
cmake_minimum_required(VERSION 3.7)
project(sist2)
set(CMAKE_C_STANDARD 11)
project(sist2 C)
option(SIST_DEBUG "Build a debug executable" on)
option(SIST_FAST "Enable more optimisation flags" off)
option(SIST_DEBUG_INFO "Turn on debug information in web interface" on)
add_compile_definitions(
"SIST_PLATFORM=${SIST_PLATFORM}"
)
if (SIST_DEBUG)
add_compile_definitions(
"SIST_DEBUG=${SIST_DEBUG}"
)
set(VCPKG_BUILD_TYPE debug)
else ()
set(VCPKG_BUILD_TYPE release)
endif ()
if (SIST_DEBUG_INFO)
add_compile_definitions(
"SIST_DEBUG_INFO=${SIST_DEBUG_INFO}"
)
endif ()
add_subdirectory(third-party/libscan)
set(ARGPARSE_SHARED off)
@ -33,49 +11,37 @@ add_subdirectory(third-party/argparse)
add_executable(
sist2
# argparse
third-party/argparse/argparse.h third-party/argparse/argparse.c
src/main.c
src/sist.h
src/io/walk.h src/io/walk.c
src/io/store.h src/io/store.c
src/tpool.h src/tpool.c
src/parsing/parse.h src/parsing/parse.c
src/parsing/magic_util.c src/parsing/magic_util.h
src/io/serialize.h src/io/serialize.c
src/parsing/mime.h src/parsing/mime.c src/parsing/mime_generated.c
src/index/web.c src/index/web.h
src/web/serve.c src/web/serve.h
src/web/web_util.c src/web/web_util.h
src/index/elastic.c src/index/elastic.h
src/util.c src/util.h
src/ctx.c src/ctx.h
src/types.h
src/ctx.h src/types.h
src/log.c src/log.h
# argparse
third-party/argparse/argparse.h third-party/argparse/argparse.c
src/cli.c src/cli.h
src/database/database.c src/database/database.h
src/parsing/fs_util.h
src/auth0/auth0_c_api.h src/auth0/auth0_c_api.cpp
src/database/database_stats.c
src/database/database_schema.c
src/database/database_fts.c
src/web/web_fts.c
src/database/database_embeddings.c)
set_target_properties(sist2 PROPERTIES LINKER_LANGUAGE C)
)
target_link_directories(sist2 PRIVATE BEFORE ${_VCPKG_INSTALLED_DIR}/${VCPKG_TARGET_TRIPLET}/lib/)
set(CMAKE_FIND_LIBRARY_SUFFIXES .a .lib)
find_package(PkgConfig REQUIRED)
find_package(lmdb CONFIG REQUIRED)
find_package(cJSON CONFIG REQUIRED)
find_package(unofficial-glib CONFIG REQUIRED)
find_package(unofficial-mongoose CONFIG REQUIRED)
find_package(CURL CONFIG REQUIRED)
find_library(MAGIC_LIB NAMES libmagic.a REQUIRED)
find_package(unofficial-sqlite3 CONFIG REQUIRED)
find_package(OpenBLAS CONFIG REQUIRED)
find_library(UUID_LIB NAMES uuid)
#find_package(OpenSSL REQUIRED)
target_include_directories(
@ -90,6 +56,7 @@ target_compile_options(
sist2
PRIVATE
-fPIC
-Werror
)
if (SIST_DEBUG)
@ -100,41 +67,25 @@ if (SIST_DEBUG)
-fstack-protector
-fno-omit-frame-pointer
-fsanitize=address
-fno-inline
# -O2
)
target_link_options(
sist2
PRIVATE
-fsanitize=address
-static-libasan
# -static
)
set_target_properties(
sist2
PROPERTIES
OUTPUT_NAME sist2_debug
)
elseif (SIST_FAST)
target_compile_options(
sist2
PRIVATE
-Ofast
-march=native
-fno-stack-protector
-fomit-frame-pointer
-freciprocal-math
)
else ()
target_compile_options(
sist2
PRIVATE
-Ofast
# -g
-fno-stack-protector
-fomit-frame-pointer
-w
)
endif ()
@ -147,19 +98,19 @@ add_dependencies(
target_link_libraries(
sist2
m
z
lmdb
cjson
argparse
unofficial::glib::glib
unofficial::mongoose::mongoose
CURL::libcurl
# OpenSSL::SSL OpenSSL::Crypto
${UUID_LIB}
pthread
magic
scan
${MAGIC_LIB}
unofficial::sqlite3::sqlite3
OpenBLAS::OpenBLAS
)
add_custom_target(

22
Docker/Dockerfile Normal file
View File

@ -0,0 +1,22 @@
FROM ubuntu:19.10
MAINTAINER simon987 <me@simon987.net>
RUN apt update
RUN apt install -y libglib2.0-0 libcurl4 libmagic1 libharfbuzz-bin libopenjp2-7 libarchive13 liblzma5 libzstd1 liblz4-1 \
curl libtiff5 libpng16-16 libpcre3
RUN mkdir -p /usr/share/tessdata && \
cd /usr/share/tessdata/ && \
curl -o /usr/share/tessdata/hin.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/hin.traineddata &&\
curl -o /usr/share/tessdata/jpn.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/jpn.traineddata &&\
curl -o /usr/share/tessdata/eng.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata &&\
curl -o /usr/share/tessdata/fra.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/fra.traineddata &&\
curl -o /usr/share/tessdata/rus.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/rus.traineddata &&\
curl -o /usr/share/tessdata/spa.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/spa.traineddata && ls -lh
ADD sist2 /root/sist2
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENTRYPOINT ["/root/sist2"]

14
Docker/build.sh Executable file
View File

@ -0,0 +1,14 @@
rm ./sist2 sist2_debug
cp ../sist2.gz .
gzip -d sist2.gz
strip sist2
version=$(./sist2 --version)
echo "Version ${version}"
docker build . -t simon987/sist2:${version} -t simon987/sist2:latest
docker push simon987/sist2:${version}
docker push simon987/sist2:latest
docker run --rm simon987/sist2 -v

View File

@ -1,52 +0,0 @@
FROM sist2app/sist2-build as build
WORKDIR /build/
COPY scripts scripts
COPY schema schema
COPY CMakeLists.txt .
COPY third-party third-party
COPY src src
COPY sist2-vue sist2-vue
COPY sist2-admin sist2-admin
RUN cd sist2-vue/ && npm install && npm run build
RUN cd sist2-admin/frontend/ && npm install && npm run build
RUN mkdir build && cd build && cmake -DSIST_PLATFORM=x64_linux_docker -DSIST_DEBUG_INFO=on -DSIST_DEBUG=off -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE=/vcpkg/scripts/buildsystems/vcpkg.cmake ..
RUN cd build && make -j$(nproc)
RUN strip build/sist2 || mv build/sist2_debug build/sist2
FROM --platform="linux/amd64" ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENTRYPOINT ["/root/sist2"]
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y curl libasan5 libmagic1 python3 \
python3-pip git tesseract-ocr && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /usr/share/tessdata && \
cd /usr/share/tessdata/ && \
curl -o /usr/share/tesseract-ocr/4.00/tessdata/hin.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/hin.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/jpn.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/jpn.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/fra.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/rus.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/rus.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/osd.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/osd.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/spa.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/spa.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/deu.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/deu.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/equ.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/equ.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/pol.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/pol.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/chi_sim.traineddata
# sist2
COPY --from=build /build/build/sist2 /root/sist2
# sist2-admin
WORKDIR /root/sist2-admin
COPY sist2-admin/requirements.txt /root/sist2-admin/
RUN ln /usr/bin/python3 /usr/bin/python
RUN python -m pip install --no-cache -r /root/sist2-admin/requirements.txt
COPY --from=build /build/sist2-admin/ /root/sist2-admin/

View File

@ -1,53 +0,0 @@
FROM sist2app/sist2-build-arm64 as build
WORKDIR /build/
COPY scripts scripts
COPY schema schema
COPY CMakeLists.txt .
COPY third-party third-party
COPY src src
COPY sist2-vue sist2-vue
COPY sist2-admin sist2-admin
RUN cd sist2-vue/ && npm install && npm run build
RUN cd sist2-admin/frontend/ && npm install && npm run build
WORKDIR /build/
ADD . /build/
RUN mkdir build && cd build && cmake -DSIST_PLATFORM=arm64_linux_docker -DSIST_DEBUG_INFO=on -DSIST_DEBUG=off -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE=/vcpkg/scripts/buildsystems/vcpkg.cmake ..
RUN cd build && make -j$(nproc)
RUN strip build/sist2 || mv build/sist2_debug build/sist2
FROM --platform=linux/arm64/v8 ubuntu@sha256:537da24818633b45fcb65e5285a68c3ec1f3db25f5ae5476a7757bc8dfae92a3
WORKDIR /root
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENTRYPOINT ["/root/sist2"]
RUN apt update && apt install -y curl libasan5 libmagic1 tesseract-ocr python3-pip python3 git && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /usr/share/tessdata && \
cd /usr/share/tessdata/ && \
curl -o /usr/share/tesseract-ocr/4.00/tessdata/hin.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/hin.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/jpn.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/jpn.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/fra.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/rus.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/rus.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/osd.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/osd.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/spa.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/spa.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/deu.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/deu.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/equ.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/equ.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/pol.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/pol.traineddata &&\
curl -o /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/chi_sim.traineddata
# sist2
COPY --from=build /build/build/sist2 /root/sist2
# sist2-admin
COPY sist2-admin/requirements.txt sist2-admin/
RUN python3 -m pip install --no-cache -r sist2-admin/requirements.txt
COPY --from=build /build/sist2-admin/ sist2-admin/

239
README.md
View File

@ -1,10 +1,6 @@
![GitHub](https://img.shields.io/github/license/sist2app/sist2.svg)
[![CodeFactor](https://www.codefactor.io/repository/github/sist2app/sist2/badge?s=05daa325188aac4eae32c786f3d9cf4e0593f822)](https://www.codefactor.io/repository/github/sist2app/sist2)
[![Development snapshots](https://ci.simon987.net/api/badges/simon987/sist2/status.svg)](https://files.simon987.net/.gate/sist2/simon987_sist2/)
**Demo**: [sist2.simon987.net](https://sist2.simon987.net/)
**Community URL:** [Discord](https://discord.gg/2PEjDy3Rfs)
![GitHub](https://img.shields.io/github/license/simon987/sist2.svg)
[![CodeFactor](https://www.codefactor.io/repository/github/simon987/sist2/badge?s=05daa325188aac4eae32c786f3d9cf4e0593f822)](https://www.codefactor.io/repository/github/simon987/sist2)
[![Development snapshots](https://ci.simon987.net/app/rest/builds/buildType(Sist2_Build)/statusIcon)](https://files.simon987.net/artifacts/Sist2/Build/)
# sist2
@ -12,218 +8,123 @@ sist2 (Simple incremental search tool)
*Warning: sist2 is in early development*
![search panel](docs/sist2.gif)
![sist2.png](docs/sist2.png)
## Features
* Fast, low memory usage, multi-threaded
* Manage & schedule scan jobs with simple web interface (Docker only)
* Mobile-friendly Web interface
* Extracts text and metadata from common file types \*
* Portable (all its features are packaged in a single executable)
* Extracts text from common file types \*
* Generates thumbnails \*
* Incremental scanning
* Manual tagging from the UI and automatic tagging based on file attributes via [user scripts](docs/scripting.md)
* Automatic tagging from file attributes via [user scripts](scripting/README.md)
* Recursive scan inside archive files \*\*
* OCR support with tesseract \*\*\*
* Stats page & disk utilisation visualization
* Named-entity recognition (client-side) \*\*\*\*
\* See [format support](#format-support)
\*\* See [Archive files](#archive-files)
\*\*\* See [OCR](#ocr)
\*\*\*\* See [Named-Entity Recognition](#NER)
## Getting Started
### Using Docker Compose *(Windows/Linux/Mac)*
1. Have an Elasticsearch (>= 6.X.X) instance running
1. Download [from official website](https://www.elastic.co/downloads/elasticsearch)
1. *(or)* Run using docker:
```bash
docker run -d --name es1 --net sist2_net -p 9200:9200 \
-e "discovery.type=single-node" elasticsearch:7.5.2
```
1. *(or)* Run using docker-compose:
```yaml
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
environment:
- discovery.type=single-node
- "ES_JAVA_OPTS=-Xms1G -Xmx2G"
```
1. Download sist2 executable
1. Download the [latest sist2 release](https://github.com/simon987/sist2/releases) *
1. *(or)* Download a [development snapshot](https://files.simon987.net/artifacts/Sist2/Build/) *(Not recommended!)*
1. *(or)* `docker pull simon987/sist2:latest`
```yaml
services:
elasticsearch:
image: elasticsearch:7.17.9
restart: unless-stopped
volumes:
# This directory must have 1000:1000 permissions (or update PUID & PGID below)
- /data/sist2-es-data/:/usr/share/elasticsearch/data
environment:
- "discovery.type=single-node"
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- "PUID=1000"
- "PGID=1000"
sist2-admin:
image: sist2app/sist2:x64-linux
restart: unless-stopped
volumes:
- /data/sist2-admin-data/:/sist2-admin/
- /<path to index>/:/host
ports:
- 4090:4090
# NOTE: Don't expose this port publicly!
- 8080:8080
working_dir: /root/sist2-admin/
entrypoint: python3
command:
- /root/sist2-admin/sist2_admin/app.py
```
1. See [Usage guide](DOCS/USAGE.md)
Navigate to http://localhost:8080/ to configure sist2-admin.
\* *Windows users*: **sist2** runs under [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
### Using the executable file *(Linux/WSL only)*
1. Choose search backend (See [comparison](#search-backends)):
* **Elasticsearch**: have an Elasticsearch (version >= 6.8.X, ideally >=7.14.0) instance running
1. Download [from official website](https://www.elastic.co/downloads/elasticsearch)
2. *(or)* Run using docker:
```bash
docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.17.9
```
* **SQLite**: No installation required
## Example usage
2. Download the [latest sist2 release](https://github.com/sist2app/sist2/releases).
Select the file corresponding to your CPU architecture and mark the binary as executable with `chmod +x`.
3. See [usage guide](docs/USAGE.md) for command line usage.
See [Usage guide](DOCS/USAGE.md) for more details
Example usage:
1. Scan a directory: `sist2 scan ~/Documents -o ./docs_idx`
1. Push index to Elasticsearch: `sist2 index ./docs_idx`
1. Start web interface: `sist2 web ./docs_idx`
1. Scan a directory: `sist2 scan ~/Documents --output ./documents.sist2`
2. Prepare search index:
* **Elasticsearch**: `sist2 index --es-url http://localhost:9200 ./documents.sist2`
* **SQLite**: `sist2 sqlite-index --search-index ./search.sist2 ./documents.sist2`
3. Start web interface:
* **Elasticsearch**: `sist2 web ./documents.sist2`
* **SQLite**: `sist2 web --search-index ./search.sist2 ./documents.sist2`
## Format support
| File type | Library | Content | Thumbnail | Metadata |
|:--------------------------------------------------------------------------|:-----------------------------------------------------------------------------|:---------|:------------|:---------------------------------------------------------------------------------------------------------------------------------------|
| pdf,xps,fb2,epub | MuPDF | text+ocr | yes | author, title |
| cbz,cbr | [libscan](https://github.com/sist2app/sist2/tree/master/third-party/libscan) | - | yes | - |
| `audio/*` | ffmpeg | - | yes | ID3 tags |
| `video/*` | ffmpeg | - | yes | title, comment, artist |
| `image/*` | ffmpeg | ocr | yes | [Common EXIF tags](https://github.com/sist2app/sist2/blob/efdde2734eca9b14a54f84568863b7ffd59bdba3/src/parsing/media.c#L190), GPS tags |
| raw, rw2, dng, cr2, crw, dcr, k25, kdc, mrw, pef, xf3, arw, sr2, srf, erf | LibRaw | no | yes | Common EXIF tags, GPS tags |
| ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, `bmp` | Name & style |
| `text/plain` | [libscan](https://github.com/sist2app/sist2/tree/master/third-party/libscan) | yes | no | - |
| html, xml | [libscan](https://github.com/sist2app/sist2/tree/master/third-party/libscan) | yes | no | - |
| tar, zip, rar, 7z, ar ... | Libarchive | yes\* | - | no |
| docx, xlsx, pptx | [libscan](https://github.com/sist2app/sist2/tree/master/third-party/libscan) | yes | if embedded | creator, modified_by, title |
| doc (MS Word 97-2003) | antiword | yes | no | author, title |
| mobi, azw, azw3 | libmobi | yes | yes | author, title |
| wpd (WordPerfect) | libwpd | yes | no | *planned* |
| json, jsonl, ndjson | [libscan](https://github.com/sist2app/sist2/tree/master/third-party/libscan) | yes | - | - |
File type | Library | Content | Thumbnail | Metadata
:---|:---|:---|:---|:---
pdf,xps,cbz,cbr,fb2,epub | MuPDF | text+ocr | yes, `png` | title |
`audio/*` | ffmpeg | - | yes, `jpeg` | ID3 tags |
`video/*` | ffmpeg | - | yes, `jpeg` | title, comment, artist |
`image/*` | ffmpeg | - | yes, `jpeg` | [Common EXIF tags](https://github.com/simon987/sist2/blob/efdde2734eca9b14a54f84568863b7ffd59bdba3/src/parsing/media.c#L190) |
ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, `bmp` | Name & style |
`text/plain` | *(none)* | yes | no | - |
tar, zip, rar, 7z, ar ... | Libarchive | yes\* | - | no |
docx, xlsx, pptx | *(none)* | yes | no | creator, modified_by, title |
mobi, azw, azw3 | libmobi | yes | no | author, title |
\* *See [Archive files](#archive-files)*
### Archive files
**sist2** will scan files stored into archive files (zip, tar, 7z...) as if they were directly in the file system.
Recursive (archives inside archives)
**sist2** will scan files stored into archive files (zip, tar, 7z...) as if
they were directly in the file system. Recursive (archives inside archives)
scan is also supported.
**Limitations**:
* Support for parsing media files with formats that require *seek* (e.g. `.gif`, `.mp4` w/ fragmented metadata etc.)
is limitted (see `--mem-buffer` option)
* Parsing media files with formats that require
*seek* (e.g. `.gif`, `.mp4` w/ fragmented metadata etc.) is not supported.
* Archive files are scanned sequentially, by a single thread. On systems where
**sist2** is not I/O bound, scans might be faster when larger archives are split into smaller parts.
**sist2** is not I/O bound, scans might be faster when larger archives are split
into smaller parts.
To check if a media file can be parsed without *seek*, execute `cat file.mp4 | ffprobe -`
### OCR
You can enable OCR support for ebook (pdf,xps,fb2,epub) or image file types with the
`--ocr-lang <lang>` option in combination with `--ocr-images` and/or `--ocr-ebooks`.
Download the language data files with your package manager (`apt install tesseract-ocr-eng`) or
directly [from Github](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files).
You can enable OCR support for pdf,xps,cbz,cbr,fb2,epub file types with the
`--ocr <lang>` option. Download the language data files with your
package manager (`apt install tesseract-ocr-eng`) or directly [from Github](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files).
The `sist2app/sist2` image comes with common languages
(hin, jpn, eng, fra, rus, spa, chi_sim, deu, pol) pre-installed.
You can use the `+` separator to specify multiple languages. The language
name must be identical to the `*.traineddata` file installed on your system
(use `chi_sim` rather than `chi-sim`).
Examples:
The `simon987/sist2` image comes with common languages
(hin, jpn, eng, fra, rus, spa) pre-installed.
Examples
```bash
sist2 scan --ocr-ebooks --ocr-lang jpn ~/Books/Manga/
sist2 scan --ocr-images --ocr-lang eng ~/Images/Screenshots/
sist2 scan --ocr-ebooks --ocr-images --ocr-lang eng+chi_sim ~/Chinese-Bilingual/
sist2 scan --ocr jpn ~/Books/Manga/
sist2 scan --ocr eng ~/Books/Textbooks/
```
### Search backends
sist2 v3.0.7+ supports SQLite search backend. The SQLite search backend has
fewer features and generally comparable query performance for medium-size
indices, but it uses much less memory and is easier to set up.
| | SQLite | Elasticsearch |
|----------------------------------------------|:---------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------:|
| Requires separate search engine installation | | ✓ |
| Memory footprint | ~20MB | >500MB |
| Query syntax | [fts5](https://www.sqlite.org/fts5.html) | [query_string](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax) |
| Fuzzy search | | ✓ |
| Media Types tree real-time updating | | ✓ |
| Manual tagging | ✓ | ✓ |
| User scripts | ✓ | ✓ |
| Media Type breakdown for search results | | ✓ |
| Embeddings search | ✓ *O(n)* | ✓ *O(logn)* |
### NER
sist2 v3.0.4+ supports named-entity recognition (NER). Simply add a supported repository URL to
**Configuration** > **Machine learning options** > **Model repositories**
to enable it.
The text processing is done in your browser, no data is sent to any third-party services.
See [sist2app/sist2-ner-models](https://github.com/sist2app/sist2-ner-models) for more details.
#### List of available repositories:
| URL | Maintainer | Purpose |
|---------------------------------------------------------------------------------------------------------|-----------------------------------------|---------|
| [sist2app/sist2-ner-models](https://raw.githubusercontent.com/sist2app/sist2-ner-models/main/repo.json) | [sist2app](https://github.com/sist2app) | General |
<details>
<summary>Screenshot</summary>
![ner](docs/ner.png)
</details>
## Build from source
You can compile **sist2** by yourself if you don't want to use the pre-compiled binaries
### Using docker
```bash
git clone --recursive https://github.com/sist2app/sist2/
cd sist2
docker build . -t my-sist2-image
# Copy sist2 executable from docker image
docker run --rm --entrypoint cat my-sist2-image /root/sist2 > sist2-x64-linux
```
### Using a linux computer
You can compile **sist2** by yourself if you don't want to use the pre-compiled
binaries (GCC 7+ required).
1. Install compile-time dependencies
```bash
apt install gcc g++ python3 yasm ragel automake autotools-dev wget libtool libssl-dev curl zip unzip tar xorg-dev libglu1-mesa-dev libxcursor-dev libxml2-dev libxinerama-dev gettext nasm git nodejs
vcpkg install lmdb cjson glib libarchive[core,bzip2,libxml2,lz4,lzma,lzo] pthread tesseract libxml2 ffmpeg zstd gtest mongoose libuuid libmagic
```
2. Install vcpkg using my fork: https://github.com/sist2app/vcpkg
3. Install vcpkg dependencies
2. Build
```bash
vcpkg install openblas curl[core,openssl] sqlite3[core,fts5,json1] cpp-jwt pcre cjson brotli libarchive[core,bzip2,libxml2,lz4,lzma,lzo] pthread tesseract libxml2 libmupdf[ocr] gtest mongoose libmagic libraw gumbo ffmpeg[core,avcodec,avformat,swscale,swresample,webp,opus,mp3lame,vpx,zlib]
```
4. Build
```bash
git clone --recursive https://github.com/sist2app/sist2/
(cd sist2-vue; npm install; npm run build)
(cd sist2-admin/frontend; npm install; npm run build)
cmake -DSIST_DEBUG=off -DCMAKE_TOOLCHAIN_FILE=<VCPKG_ROOT>/scripts/buildsystems/vcpkg.cmake .
git clone --recursive https://github.com/simon987/sist2/
cmake -D <VCPKG_ROOT>/scripts/buildsystems/vcpkg.cmake .
make
```

16
ci/build.sh Executable file
View File

@ -0,0 +1,16 @@
#!/usr/bin/env bash
rm *.gz
rm -rf CMakeFiles CMakeCache.txt
cmake -DSIST_DEBUG=off -DCMAKE_TOOLCHAIN_FILE=/vcpkg/scripts/buildsystems/vcpkg.cmake .
make
strip sist2
gzip -9 sist2
rm -rf CMakeFiles CMakeCache.txt
cmake -DSIST_DEBUG=on -DCMAKE_TOOLCHAIN_FILE=/vcpkg/scripts/buildsystems/vcpkg.cmake .
make
cp /usr/lib/x86_64-linux-gnu/libasan.so.2.0.0 libasan.so.2
tar -czf sist2_debug.tar.gz sist2_debug libasan.so.2

View File

@ -1,7 +0,0 @@
install:
install sist2-update-all.sh /usr/bin/sist2-update-all.sh
install sist2-update-files.sh /usr/bin/sist2-update-files.sh
install sist2-update-nextcloud.sh /usr/bin/sist2-update-nextcloud.sh
install sist2-update.service /etc/systemd/system/sist2-update.service
install sist2-update.timer /etc/systemd/system/sist2-update.timer
systemctl daemon-reload

View File

@ -1,31 +0,0 @@
# Systemd integration example
This example contains my (yatli) personal configuration for sist2 auto-updating.
The following indices are involved in this configuration:
| Index | Path | Description |
|-----------|------------------|--------------------------------------------|
| files | /zpool/files | Main file repository |
| nextcloud | /zpool/nextcloud | Externally synchronized to a cloud account |
The systemd integration achieves automatic sist2 scanning & indexing everyday at 3:00AM.
### Tailoring the configuration for yourself
`sist2-update-all.sh` calls update scripts for each sist2 index. Add or remove
update scripts accordingly to suit your need. Each update script (e.g.
`sist2-update-files.sh`) has important parameters laid down at the beginning so
make sure to edit them to point to your files and index locations.
### Installation
```bash
# install the services and scripts
sudo make install
# enable & start the timer
sudo systemctl enable sist2-update.timer
sudo systemctl start sist2-update.timer
# verify that the timer has been enabled
systemctl list-timers --all
```

View File

@ -1,9 +0,0 @@
#!/bin/bash
set -e
__dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "Update index: Files"
source ${__dir}/sist2-update-files.sh
echo "Update index: Nextcloud"
source ${__dir}/sist2-update-nextcloud.sh
echo "Done. Restarting sist2."
docker restart sist2-sist2-1

View File

@ -1,34 +0,0 @@
#!/bin/bash
set -e
DATE=$(date +%Y_%m_%d)
CONTENT=/zpool/files
ORIG=/mnt/ssd/sist-index/files.idx
NEW=/mnt/ssd/sist-index/files_$DATE.idx
EXCLUDE='ZArchives|TorrentStore|TorrentDownload|624f0c59-1fef-44f6-95e9-7483296f2833|ubuntu-full-2021-12-07'
NAME=Files
#REWRITE_URL="http://localhost:33333/activate?collection=$NAME&path="
REWRITE_URL=""
sist2 scan \
--threads 14 \
--mem-throttle 32768 \
--thumbnail-quality 2 \
--name $NAME \
--ocr-lang=eng+chi_sim \
--ocr-ebooks \
--ocr-images \
--exclude=$EXCLUDE \
--rewrite-url=$REWRITE_URL \
--incremental=$ORIG \
--output=$NEW \
$CONTENT
echo ">>> Scan complete"
rm -rf $ORIG
mv $NEW $ORIG
unset http_proxy
unset https_proxy
unset HTTP_PROXY
unset HTTPS_PROXY
sist2 index $ORIG --incremental-index
echo ">>> Index complete"

View File

@ -1,33 +0,0 @@
#!/bin/bash
set -e
DATE=$(date +%Y_%m_%d)
CONTENT=/zpool/nextcloud/v-yadli
ORIG=/mnt/ssd/sist-index/nextcloud.idx
NEW=/mnt/ssd/sist-index/nextcloud_$DATE.idx
EXCLUDE='Yatao|.*263418493\\/Image\\/.*'
NAME=NextCloud
# REWRITE_URL="http://localhost:33333/activate?collection=$NAME&path="
REWRITE_URL=""
sist2 scan \
--threads 14 \
--mem-throttle 32768 \
--thumbnail-quality 2 \
--name $NAME \
--ocr-lang=eng+chi_sim \
--ocr-ebooks \
--ocr-images \
--exclude=$EXCLUDE \
--rewrite-url=$REWRITE_URL \
--incremental=$ORIG \
--output=$NEW \
$CONTENT
echo ">>> Scan complete"
rm -rf $ORIG
mv $NEW $ORIG
unset http_proxy
unset https_proxy
unset HTTP_PROXY
unset HTTPS_PROXY
sist2 index $ORIG --incremental-index

View File

@ -1,6 +0,0 @@
[Unit]
Description=sist2-update
[Service]
User=yatli
ExecStart=/bin/bash /usr/bin/sist2-update-all.sh

View File

@ -1,10 +0,0 @@
[Unit]
Description=sist2-update
[Timer]
OnCalendar=*-*-* 3:00:00
Persistent=true
Unit=sist2-update.service
[Install]
WantedBy=timers.target

View File

@ -1,29 +0,0 @@
version: "3"
services:
elasticsearch:
image: elasticsearch:7.17.9
container_name: sist2-es
volumes:
# This directory must have 1000:1000 permissions (or update PUID & PGID below)
- /data/sist2-es-data/:/usr/share/elasticsearch/data
environment:
- "discovery.type=single-node"
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- "PUID=1000"
- "PGID=1000"
sist2-admin:
build:
context: .
container_name: sist2-admin
volumes:
- /data/sist2-admin-data/:/sist2-admin/
- /<path to index>/:/host
ports:
- 4090:4090
# NOTE: Don't export this port publicly!
- 8080:8080
working_dir: /root/sist2-admin/
entrypoint: python3
command:
- /root/sist2-admin/sist2_admin/app.py

View File

@ -1,89 +1,99 @@
# Usage
*More examples (specifically with docker/compose) are in progress*
* [scan](#scan)
* [options](#scan-options)
* [examples](#scan-examples)
* [index format](#index-format)
* [index](#index)
* [options](#index-options)
* [examples](#index-examples)
* [web](#web)
* [options](#web-options)
* [examples](#web-examples)
* [rewrite_url](#rewrite_url)
* [link to specific indices](#link-to-specific-indices)
```
Usage: sist2 scan [OPTION]... PATH
or: sist2 index [OPTION]... INDEX
or: sist2 sqlite-index [OPTION]... INDEX
or: sist2 web [OPTION]... INDEX...
Lightning-fast file system indexer and search tool.
-h, --help show this help message and exit
-v, --version Print version and exit.
--verbose Turn on logging.
--very-verbose Turn on debug messages.
--json-logs Output logs in JSON format.
-h, --help show this help message and exit
-v, --version Show version and exit
--verbose Turn on logging
--very-verbose Turn on debug messages
Scan options
-t, --threads=<int> Number of threads. DEFAULT: 1
-q, --thumbnail_count-quality=<int> Thumbnail quality, on a scale of 0 to 100, 100 being the best. DEFAULT: 50
--thumbnail_count-size=<int> Thumbnail size, in pixels. DEFAULT: 552
--thumbnail_count-count=<int> Number of thumbnails to generate. Set a value > 1 to create video previews, set to 0 to disable thumbnails. DEFAULT: 1
--content-size=<int> Number of bytes to be extracted from text documents. Set to 0 to disable. DEFAULT: 32768
-o, --output=<str> Output index file path. DEFAULT: index.sist2
--incremental If the output file path exists, only scan new or modified files.
--optimize-index Defragment index file after scan to reduce its file size.
--rewrite-url=<str> Serve files from this url instead of from disk.
--name=<str> Index display name. DEFAULT: index
--depth=<int> Scan up to DEPTH subdirectories deep. Use 0 to only scan files in PATH. DEFAULT: -1
--archive=<str> Archive file mode (skip|list|shallow|recurse). skip: don't scan, list: only save file names as text, shallow: don't scan archives inside archives. DEFAULT: recurse
--archive-passphrase=<str> Passphrase for encrypted archive files
--ocr-lang=<str> Tesseract language (use 'tesseract --list-langs' to see which are installed on your machine)
--ocr-images Enable OCR'ing of image files.
--ocr-ebooks Enable OCR'ing of ebook files.
-e, --exclude=<str> Files that match this regex will not be scanned.
--fast Only index file names & mime type.
--treemap-threshold=<str> Relative size threshold for treemap (see USAGE.md). DEFAULT: 0.0005
--mem-buffer=<int> Maximum memory buffer size per thread in MiB for files inside archives (see USAGE.md). DEFAULT: 2000
--read-subtitles Read subtitles from media files.
--fast-epub Faster but less accurate EPUB parsing (no thumbnails, metadata).
--checksums Calculate file checksums when scanning.
--list-file=<str> Specify a list of newline-delimited paths to be scanned instead of normal directory traversal. Use '-' to read from stdin.
-t, --threads=<int> Number of threads. DEFAULT=1
-q, --quality=<flt> Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=5
--size=<int> Thumbnail size, in pixels. Use negative value to disable. DEFAULT=500
--content-size=<int> Number of bytes to be extracted from text documents. Use negative value to disable. DEFAULT=32768
--incremental=<str> Reuse an existing index and only scan modified files.
-o, --output=<str> Output directory. DEFAULT=index.sist2/
--rewrite-url=<str> Serve files from this url instead of from disk.
--name=<str> Index display name. DEFAULT: (name of the directory)
--depth=<int> Scan up to DEPTH subdirectories deep. Use 0 to only scan files in PATH. DEFAULT: -1
--archive=<str> Archive file mode (skip|list|shallow|recurse). skip: Don't parse, list: only get file names as text, shallow: Don't parse archives inside archives. DEFAULT: recurse
--ocr=<str> Tesseract language (use tesseract --list-langs to see which are installed on your machine)
-e, --exclude=<str> Files that match this regex will not be scanned
--fast Only index file names & mime type
Index options
-t, --threads=<int> Number of threads. DEFAULT: 1
--es-url=<str> Elasticsearch url with port. DEFAULT: http://localhost:9200
--es-insecure-ssl Do not verify SSL connections to Elasticsearch.
--es-index=<str> Elasticsearch index name. DEFAULT: sist2
-p, --print Print JSON documents to stdout instead of indexing to elasticsearch.
--incremental-index Conduct incremental indexing. Assumes that the old index is already ingested in Elasticsearch.
--script-file=<str> Path to user script.
--mappings-file=<str> Path to Elasticsearch mappings.
--settings-file=<str> Path to Elasticsearch settings.
--async-script Execute user script asynchronously.
--batch-size=<int> Index batch size. DEFAULT: 70
-f, --force-reset Reset Elasticsearch mappings and settings.
sqlite-index options
--search-index=<str> Path to search index. Will be created if it does not exist yet.
--es-url=<str> Elasticsearch url with port. DEFAULT=http://localhost:9200
-p, --print Just print JSON documents to stdout.
--script-file=<str> Path to user script.
--batch-size=<int> Index batch size. DEFAULT: 100
-f, --force-reset Reset Elasticsearch mappings and settings. (You must use this option the first time you use the index command)
Web options
--es-url=<str> Elasticsearch url. DEFAULT: http://localhost:9200
--es-insecure-ssl Do not verify SSL connections to Elasticsearch.
--search-index=<str> Path to SQLite search index.
--es-index=<str> Elasticsearch index name. DEFAULT: sist2
--bind=<str> Listen for connections on this address. DEFAULT: localhost:4090
--auth=<str> Basic auth in user:password format
--auth0-audience=<str> API audience/identifier
--auth0-domain=<str> Application domain
--auth0-client-id=<str> Application client ID
--auth0-public-key-file=<str> Path to Auth0 public key file extracted from <domain>/pem
--tag-auth=<str> Basic auth in user:password format for tagging
--tagline=<str> Tagline in navbar
--dev Serve html & js files from disk (for development)
--lang=<str> Default UI language. Can be changed by the user
--es-url=<str> Elasticsearch url. DEFAULT=http://localhost:9200
--bind=<str> Listen on this address. DEFAULT=localhost
--port=<str> Listen on this port. DEFAULT=4090
--auth=<str> Basic auth in user:password format
Made by simon987 <me@simon987.net>. Released under GPL-3.0
```
#### Thumbnail database size estimation
## Scan
See chart below for rough estimate of thumbnail_count size vs. thumbnail_count size & quality arguments:
### Scan options
For example, `--thumbnail_count-size=500`, `--thumbnail_count-quality=50` for a directory with 8 million images will create a thumbnail_count database
that is about `8000000 * 11.8kB = 94.4GB`.
![thumbnail_size](thumbnail_size.png)
* `-t, --threads`
Number of threads for file parsing. **Do not set a number higher than `$(nproc)`!**.
* `-q, --quality`
Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. *Does not affect PDF thumbnails quality*
* `--size`
Thumbnail size in pixels.
* `--content-size`
Number of bytes of text to be extracted from the content of files (plain text and PDFs).
Repeated whitespace and special characters do not count toward this limit.
* `--incremental`
Specify an existing index. Information about files in this index that were not modified (based on *mtime* attribute)
will be copied to the new index and will not be parsed again.
* `-o, --output` Output directory.
* `--rewrite-url` Set the `rewrite_url` option for the web module (See [rewrite_url](#rewrite_url))
* `--name` Set the `name` option for the web module
* `--depth` Maximum scan dept. Set to 0 only scan files directly in the root directory, set to -1 for infinite depth
* `--archive` Archive file mode.
* skip: Don't parse
* list: Only get file names as text
* shallow: Don't parse archives inside archives.
* recurse: Scan archives recursively (default)
* `--ocr` See [OCR](../README.md#OCR)
* `-e, --exclude` Regex pattern to exclude files. A file is excluded if the pattern matches any
part of the full absolute path.
Examples:
* `-e ".*\.ttf"`: Ignore ttf files
* `-e ".*\.(ttf|rar)"`: Ignore ttf and rar files
* `-e "^/mnt/backups/"`: Ignore all files in the `/mnt/backups/` directory
* `-e "^/mnt/Data[12]/"`: Ignore all files in the `/mnt/Data1/` and `/mnt/Data2/` directory
* `-e "(^/usr/)|(^/var/)|(^/media/DRIVE-A/tmp/)|(^/media/DRIVE-B/Trash/)"` Exclude the
`/usr`, `/var`, `/media/DRIVE-A/tmp`, `/media/DRIVE-B/Trash` directories
* `--fast` Only index file names and mime type
### Scan examples
@ -92,141 +102,174 @@ Simple scan
sist2 scan ~/Documents
sist2 scan \
--threads 4 --content-size 16000000 --thumbnail_count-quality 2 --archive shallow \
--threads 4 --content-size 16000000 --quality 1.0 --archive shallow \
--name "My Documents" --rewrite-url "http://nas.domain.local/My Documents/" \
~/Documents -o ./documents.sist2
~/Documents -o ./documents.idx/
```
Incremental scan
If the index file does not exist, `--incremental` has no effect.
```bash
sist scan ~/Documents -o ./documents.sist2
sist scan ~/Documents -o ./documents.sist2 --incremental
# or
sist scan ~/Documents -o ./documents.sist2 --incremental
sist scan ~/Documents -o ./documents.sist2 --incremental
```
sist2 scan --incremental ./orig_idx/ -o ./updated_idx/ ~/Documents
```
### Index documents to Elasticsearch search backend
### Index format
```bash
sist2 index --force-reset --batch-size 1000 --es-url http://localhost:9200 ./my_index.sist2
sist2 index ./my_index.sist2
A typical `binary` type index structure looks like this:
```
documents.idx/
├── descriptor.json
├── _index_139965416830720
├── _index_139965425223424
├── _index_139965433616128
├── _index_139965442008832
└── thumbs
├── data.mdb
└── lock.mdb
```
#### Index documents to SQLite search backend
The `_index_*` files contain the raw binary index data and are not meant to be
read by other applications. The format is generally compatible across different
sist2 versions.
The `thumbs/` folder is a [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database)
database containing the thumbnails.
The `descriptor.json` file contains general information about the index. The
following fields are safe to modify manually: `root`, `name`, [rewrite_url](#rewrite_url) and `timestamp`.
*Advanced usage*
Instead of using the `scan` module, you can also import an index generated
by a third party application. The 'external' index must have the following format:
```
my_index/
├── descriptor.json
├── _index_0
└── thumbs
├── data.mdb
└── lock.mdb
```
*descriptor.json*:
```json
{
"uuid": "<valid UUID4>",
"version": "_external_v1",
"root": "(optional)",
"name": "<name>",
"rewrite_url": "(optional)",
"type": "json",
"timestamp": 1578971024
}
```
*_index_0*: NDJSON format (One json object per line)
```json
{
"_id": "unique uuid for the file",
"index": "index uuid4 (same one as descriptor.json!)",
"mime": "application/x-cbz",
"size": 14341204,
"mtime": 1578882996,
"extension": "cbz",
"name": "my_book",
"path": "path/to/books",
"content": "text contents of the book",
"title": "Title of the book",
"tag": ["genre.fiction", "author.someguy", "etc..."],
"_keyword": [
{"k": "ISBN", "v": "ABCD34789231"}
],
"_text": [
{"k": "other", "v": "This will be indexed as text"}
]
}
```
You can find the full list of supported fields [here](../src/io/serialize.c#L90)
The `_keyword.*` items will be indexed and searchable as **keyword** fields (only full matches allowed).
The `_text.*` items will be indexed and searchable as **text** fields (fuzzy searching allowed)
*thumbs/*:
LMDB key-value store. Keys are **binary** 128-bit UUID4s (`_id` field)
and values are raw image bytes.
Importing an external `binary` type index is technically possible but
it is currently unsupported and has no guaranties of back/forward compatibility.
## Index
### Index options
* `--es-url`
Elasticsearch url and port. If you are using docker, make sure that both containers are on the
same network.
* `-p, --print`
Print index in JSON format to stdout.
* `--script-file`
Path to user script. See [Scripting](scripting/README.md).
* `--batch-size=<int>`
Index batch size. Indexing is generally faster with larger batches, but payloads that
are too large will fail and additional overhead for retrying with smaller sizes may slow
down the process.
* `-f, --force-reset`
Reset Elasticsearch mappings and settings.
**(You must use this option the first time you use the index command)**.
### Index examples
**Push to elasticsearch**
```bash
# The search index will be created if it does not exist already
sist2 sqlite-index ./index1.sist2 --search-index search.sist2
sist2 sqlite-index ./index2.sist2 --search-index search.sist2
sist2 index --force-reset --batch-size 1000 --es-url http://localhost:9200 ./my_index/
sist2 index ./my_index/
```
**Save index in JSON format**
```bash
sist2 index --print ./my_index.sist2 > my_index.ndjson
sist2 index --print ./my_index/ > my_index.ndjson
```
**Inspect contents of an index**
```bash
sist2 index --print ./my_index.sist2 | jq | less
sist2 index --print ./my_index/ | jq | less
```
## Web
### Web options
* `--es-url=<str>` Elasticsearch url.
* `--bind=<str>` Listen on this address.
* `--port=<str>` Listen on this port.
* `--auth=<str>` Basic auth in user:password format
### Web examples
**Single index (Elasticsearch backend)**
**Single index**
```bash
sist2 web --auth admin:hunter2 --bind 0.0.0.0:8888 my_index.sist2
sist2 web --auth admin:hunter2 --bind 0.0.0.0 --port 8888 my_index
```
**Multiple indices (Elasticsearch backend)**
**Multiple indices**
```bash
# Indices will be displayed in this order in the web interface
sist2 web index1.sist2 index2.sist2 index3.sist2 index4.sist2
sist2 web index1 index2 index3 index4
```
**SQLite search backend**
```bash
sist2 web --search-index search.sist2 index1.sist2
```
#### Auth0 authentication
See [auth0.md](auth0.md)
### rewrite_url
When the `rewrite_url` field is not empty, the web module ignores the `root`
field and will return a HTTP redirect to `<rewrite_url><path>/<name><extension>`
instead of serving the file from disk.
Both the `root` and `rewrite_url` fields are safe to manually modify from the
Both the `root` and `rewrite_url` fields are safe to manually modify from the
`descriptor.json` file.
# Elasticsearch
### Link to specific indices
Elasticsearch versions >=6.8.0, 7.X.X and 8.X.X are supported by sist2.
Using a version >=7.14.0 is recommended to enable the following features:
- Bug fix for large documents (See #198)
Using a version >=8.0.0 is recommended to enable the following features:
- Approximate KNN search for Embeddings search (faster queries).
When using a legacy version of ES, a notice will be displayed next to the sist2 version in the web UI.
If you don't care about the features above, you can ignore it or disable it in the configuration page.
# Embeddings search
Since v3.2.0, User scripts can be used to generate _embeddings_ (vector of float32 numbers) which are stored in the .sist2 index file
(see [scripting](scripting.md)). Embeddings can be used for:
* Nearest-neighbor queries (e.g. "return the documents most similar to this one")
* Semantic searches (e.g. "return the documents that are most closely related to the given topic")
In theory, embeddings can be created for any type of documents (image, text, audio etc.).
For example, the [clip](https://github.com/sist2app/sist2-script-clip) User Script, generates 512-d embeddings of images
(videos are also supported using the thumbnails generated by sist2). When the user enters a query in the "Embeddings Search"
textbox, the query's embedding is generated in their browser, leveraging the ONNX web runtime.
<details>
<summary>Screenshots</summary>
![embeddings-1](embeddings-1.png)
![embeddings-2](embeddings-2.png)
1. Embeddings search bar. You can select the model using the dropdown on the left.
2. This icon appears for indices with embeddings search enabled.
3. Documents with this icon have embeddings. Click on the icon to perform KNN search.
</details>
# Tagging
### Manual tagging
You can modify tags of individual documents directly from the
`web` interface. Note that you can setup authentication for this feature
with the `--tag-auth` option (See [web options](#web-options))
![manual_tag](manual_tag.png)
Tags that are manually added are saved both in the
index folder (in `/tags/`) and in Elasticsearch*. When re-`index`ing,
they are read from the index and automatically applied.
You can safely copy the `/tags/` database to another index.
See [Automatic tagging](#automatic-tagging) for information about tag
hierarchies and tag colors.
\* *It can take a few seconds to take effect in new search queries.*
### Automatic tagging
See [scripting](scripting.md) documentation.
To link to specific indices, you can add a list of comma-separated index name to
the URL: `?i=<name>,<name>`. By default, indices with `"(nsfw)"` in their name are
not displayed.

View File

@ -1,19 +0,0 @@
# Authentication with Auth0
1. Create a new Auth0 application (Single page app)
2. Create a new Auth0 API:
1. Choose `RS256` signing algorithm
2. Set identifier (audience) to `https://sist2`
3. Download the Auth0 certificate from https://<domain>.auth0.com/pem (you can find the domain Applications->Basic information)
4. Extract the public key from the certificate using `openssl x509 -pubkey -noout -in cert.pem > pubkey.txt`
5. Start the sist2 web server
Example options:
```bash
sist2 web \
--auth0-client-id XXX \
--auth0-audience https://sist2 \
--auth0-domain YYY.auth0.com \
--auth0-public-key-file /ZZZ/pubkey.txt
```

Binary file not shown.

Before

Width:  |  Height:  |  Size: 90 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 996 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 448 KiB

View File

@ -1,47 +1,18 @@
## User scripts
User scripts are used to augment your sist2 index with additional metadata, neural network embeddings, tags etc.
Since version 3.2.0, user scripts are written in Python, and are ran against the sist2 index file. User scripts do not
need a connection to the search backend.
You can create a user script based on a template from the sist2-admin interface:
![sist2-admin-scripts](sist2-admin-scripts.png)
User scripts leverage the [sist2-python](https://github.com/simon987/sist2-python) library to interface with the
index file*. You can find sist2-python documentation and examples
here: [sist2-python.readthedocs.io](https://sist2-python.readthedocs.io/).
If you are not using the sist2-admin interface, you can run user scripts manually from the command line:
```
pip install git+https://github.com/simon987/sist2-python.git
python my_script.py /path/to/my_index.sist2
```
\* It is possible to manually update the index using raw SQL queries, but the database schema is not stable and
can change at any time; it is recommended to use the more stable sist2-python wrapper instead.
<hr>
<details>
<summary>Legacy user scripts (sist2 version < 3.2.0)</summary>
*This document is under construction, more in-depth guide coming soon*
During the `index` step, you can use the `--script-file <script>` option to
modify documents or add user tags. This option is mainly used to
implement automatic tagging based on file attributes.
The scripting language used
([Painless Scripting Language](https://www.elastic.co/guide/en/elasticsearch/painless/7.4/index.html))
The scripting language used
([Painless Scripting Language](https://www.elastic.co/guide/en/elasticsearch/painless/7.4/index.html))
is very similar to Java, but you should be able to create user scripts
without programming experience at all if you're somewhat familiar with
regex.
This is the base structure of the documents we're working with:
```json
{
"_id": "e171405c-fdb5-4feb-bb32-82637bc32084",
@ -63,13 +34,12 @@ This is the base structure of the documents we're working with:
**Example script**
This script checks if the `genre` attribute exists, if it does
it adds the `genre.<genre>` tag.
it adds the `genre.<genre>` tag.
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source?.genre != null) {
tags.add("genre." + ctx._source.genre.toLowerCase());
tags.add("genre." + ctx._source.genre.toLowerCase())
}
```
@ -77,34 +47,31 @@ You can use `.` to create a hierarchical tag tree:
![scripting/genre_example](genre_example.png)
To use regular expressions, you need to add this line in `/etc/elasticsearch/elasticsearch.yml`
To use regular expressions, you need to add this line in `/etc/elasticsearch/elasticsearch.yml`
```yaml
script.painless.regex.enabled: true
```
Or, if you're using docker add `-e "script.painless.regex.enabled=true"`
**Tag color**
You can specify the color for an individual tag by appending an
You can specify the color for an individual tag by appending an
hexadecimal color code (`#RRGGBBAA`) to the tag name.
### Examples
If `(20XX)` is in the file name, add the `year.<year>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
Matcher m = /[\(\.+](20[0-9]{2})[\)\.+]/.matcher(ctx._source.name);
if (m.find()) {
tags.add("year." + m.group(1));
tags.add("year." + m.group(1))
}
```
Use default *Calibre* folder structure to infer author.
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
@ -117,9 +84,8 @@ if (ctx._source.name.contains("-") && ctx._source.extension == "pdf") {
}
```
If the file matches a specific pattern `AAAA-000 fName1 lName1, <fName2 lName2>...`, add the `actress.<actress>` and
If the file matches a specific pattern `AAAA-000 fName1 lName1, <fName2 lName2>...`, add the `actress.<actress>` and
`studio.<studio>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
@ -136,18 +102,26 @@ if (m.find()) {
```
Set the name of the last folder (`/path/to/<studio>/file.mp4`) to `studio.<studio>` tag
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source.path != "") {
String[] names = ctx._source.path.splitOnToken('/');
String[] names = ctx._source.path.splitOnToken('/');
tags.add("studio." + names[names.length-1]);
}
```
Set the name of the last folder (`/path/to/<studio>/file.mp4`) to `studio.<studio>` tag
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source.path != "") {
String[] names = ctx._source.path.splitOnToken('/');
tags.add("studio." + names[names.length-1]);
}
```
Parse `EXIF:F Number` tag
```Java
if (ctx._source?.exif_fnumber != null) {
String[] values = ctx._source.exif_fnumber.splitOnToken(' ');
@ -160,7 +134,6 @@ if (ctx._source?.exif_fnumber != null) {
```
Display year and months from `EXIF:DateTime` tag
```Java
if (ctx._source?.exif_datetime != null) {
SimpleDateFormat parser = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
@ -177,6 +150,3 @@ if (ctx._source?.exif_datetime != null) {
}
```
</details>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.7 MiB

BIN
docs/sist2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 889 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 167 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 169 KiB

View File

@ -4,42 +4,20 @@
"type": "keyword",
"doc_values": true
},
"checksum": {
"type": "keyword",
"index": false
},
"_depth": {
"type": "integer"
},
"path": {
"type": "text",
"analyzer": "path_analyzer",
"copy_to": "suggest-path",
"fielddata": true,
"fields": {
"nGram": {
"type": "text",
"analyzer": "my_nGram"
},
"text": {
"type": "text",
"analyzer": "content_analyzer"
}
}
},
"suggest-path": {
"type": "completion",
"analyzer": "case_insensitive_kw_analyzer"
"index_prefixes": {}
},
"mime": {
"type": "keyword"
},
"parent": {
"type": "keyword",
"index": false
},
"thumbnail": {
"type": "integer",
"type": "keyword",
"index": false
},
"videoc": {
@ -51,7 +29,7 @@
"index": false
},
"duration": {
"type": "integer",
"type": "float",
"index": false
},
"width": {
@ -62,13 +40,8 @@
"type": "integer",
"index": false
},
"pages": {
"type": "integer",
"index": false
},
"mtime": {
"type": "date",
"format": "epoch_second"
"type": "integer"
},
"size": {
"type": "long"
@ -79,7 +52,6 @@
"name": {
"analyzer": "content_analyzer",
"type": "text",
"fielddata": true,
"fields": {
"nGram": {
"type": "text",
@ -111,10 +83,10 @@
"analyzer": "my_nGram",
"type": "text"
},
"_keyword.*": {
"_keyword.*": {
"type": "keyword"
},
"_text.*": {
"_text.*": {
"analyzer": "content_analyzer",
"type": "text",
"fields": {
@ -140,14 +112,7 @@
}
},
"tag": {
"type": "text",
"fielddata": true,
"analyzer": "tag_analyzer",
"copy_to": "suggest-tag"
},
"suggest-tag": {
"type": "completion",
"analyzer": "case_insensitive_kw_analyzer"
"type": "keyword"
},
"exif_make": {
"type": "text"
@ -173,75 +138,11 @@
"exif_user_comment": {
"type": "text"
},
"exif_gps_longitude_ref": {
"type": "keyword",
"index": false
},
"exif_gps_longitude_dms": {
"type": "keyword",
"index": false
},
"exif_gps_longitude_dec": {
"type": "keyword",
"index": false
},
"exif_gps_latitude_ref": {
"type": "keyword",
"index": false
},
"exif_gps_latitude_dms": {
"type": "keyword",
"index": false
},
"exif_gps_latitude_dec": {
"type": "keyword",
"index": false
},
"author": {
"type": "text"
},
"modified_by": {
"type": "text"
},
"emb.384.*": {
"type": "dense_vector",
"dims": 384
},
"emb.idx_384.*": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "cosine"
},
"emb.idx_512.clip": {
"type": "dense_vector",
"dims": 512,
"index": true,
"similarity": "cosine"
},
"emb.512.*": {
"type": "dense_vector",
"dims": 512
},
"emb.idx_768.*": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine"
},
"emb.768.*": {
"type": "dense_vector",
"dims": 768
},
"emb.idx_1024.*": {
"type": "dense_vector",
"dims": 1024,
"index": true,
"similarity": "cosine"
},
"emb.1024.*": {
"type": "dense_vector",
"dims": 1024
}
}
}

View File

@ -1,22 +1,15 @@
{
"index": {
"refresh_interval": "30s",
"codec": "best_compression",
"number_of_replicas": 0,
"highlight.max_analyzed_offset": 1000000
"codec": "best_compression"
},
"analysis": {
"tokenizer": {
"path_tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
},
"tag_tokenizer": {
"type": "path_hierarchy",
"delimiter": "."
"type": "path_hierarchy"
},
"my_nGram_tokenizer": {
"type": "ngram",
"type": "nGram",
"min_gram": 3,
"max_gram": 3
}
@ -28,12 +21,6 @@
"lowercase"
]
},
"tag_analyzer": {
"tokenizer": "tag_tokenizer",
"filter": [
"lowercase"
]
},
"case_insensitive_kw_analyzer": {
"tokenizer": "keyword",
"filter": [

View File

@ -1,58 +0,0 @@
{
"index": {
"refresh_interval": "30s",
"codec": "best_compression",
"number_of_replicas": 0
},
"analysis": {
"tokenizer": {
"path_tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
},
"tag_tokenizer": {
"type": "path_hierarchy",
"delimiter": "."
},
"my_nGram_tokenizer": {
"type": "nGram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"path_analyzer": {
"tokenizer": "path_tokenizer",
"filter": [
"lowercase"
]
},
"tag_analyzer": {
"tokenizer": "tag_tokenizer",
"filter": [
"lowercase"
]
},
"case_insensitive_kw_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
},
"my_nGram": {
"tokenizer": "my_nGram_tokenizer",
"filter": [
"lowercase",
"asciifolding"
]
},
"content_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}

View File

@ -1,13 +1,17 @@
#!/usr/bin/env bash
(
cd ..
rm -rf index.sist2
rm -rf index.sist2/
python3 scripts/mime.py > src/parsing/mime_generated.c
python3 scripts/serve_static.py > src/web/static_generated.c
python3 scripts/index_static.py > src/index/static_generated.c
python3 scripts/magic_static.py > src/magic_generated.c
rm src/static/js/bundle.js 2> /dev/null
cat `ls src/static/js/*.min.js` > src/static/js/bundle.js
cat src/static/js/{util,dom,search}.js >> src/static/js/bundle.js
printf "static const char *const Sist2CommitHash = \"%s\";\n" $(git rev-parse HEAD) > src/git_hash.h
)
rm src/static/css/bundle*.css 2> /dev/null
cat src/static/css/*.min.css > src/static/css/bundle.css
cat src/static/css/light.css >> src/static/css/bundle.css
cat src/static/css/*.min.css > src/static/css/bundle_dark.css
cat src/static/css/dark.css >> src/static/css/bundle_dark.css
python3 scripts/mime.py > src/parsing/mime_generated.c
python3 scripts/serve_static.py > src/web/static_generated.c
python3 scripts/index_static.py > src/index/static_generated.c

View File

@ -1,35 +0,0 @@
#!/usr/bin/env bash
VCPKG_ROOT="/vcpkg"
(
cd sist2-vue/
npm install
npm run build
) &
(
cd sist2-admin/frontend/
npm install
npm run build
) &
wait
mkdir build
(
cd build
cmake -DSIST_PLATFORM=x64_linux -DSIST_DEBUG_INFO=on -DSIST_DEBUG=off -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE="${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" ..
make -j $(nproc)
strip sist2
./sist2 -v > VERSION
)
mv build/sist2 sist2-x64-linux
(
cd build
rm -rf CMakeFiles CMakeCache.txt
cmake -DSIST_PLATFORM=x64_linux -DSIST_DEBUG_INFO=on -DSIST_DEBUG=on -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE="${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" ..
make -j $(nproc)
)
mv build/sist2_debug sist2-x64-linux-debug

View File

@ -1,36 +0,0 @@
#!/usr/bin/env bash
VCPKG_ROOT="/vcpkg"
git submodule update --init --recursive
(
cd sist2-vue/
npm install
npm run build
) &
(
cd sist2-admin/frontend/
npm install
npm run build
) &
wait
mkdir build
(
cd build
cmake -DSIST_PLATFORM=arm64_linux -DSIST_DEBUG_INFO=on -DSIST_DEBUG=off -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE="${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" ..
make -j $(nproc)
strip sist2
)
mv build/sist2 sist2-arm64-linux
rm -rf CMakeFiles CMakeCache.txt
(
cd build
cmake -DSIST_PLATFORM=arm64_linux -DSIST_DEBUG_INFO=on -DSIST_DEBUG=on -DBUILD_TESTS=off -DCMAKE_TOOLCHAIN_FILE="${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" ..
make -j $(nproc)
)
mv build/sist2_debug sist2-arm64-linux-debug

View File

@ -3,7 +3,6 @@ import json
files = [
"schema/mappings.json",
"schema/settings.json",
"schema/settings_legacy.json",
"schema/pipeline.json",
]

View File

@ -1,16 +0,0 @@
MAGIC_PATHS = [
"/vcpkg/installed/x64-linux/share/libmagic/misc/magic.mgc",
"/work/vcpkg/installed/x64-linux/share/libmagic/misc/magic.mgc",
"/usr/lib/file/magic.mgc"
]
for path in MAGIC_PATHS:
try:
with open(path, "rb") as f:
data = f.read()
break
except:
continue
print("char magic_database_buffer[%d] = {%s};" % (len(data), ",".join(str(int(b)) for b in data)))

View File

@ -1,4 +1,3 @@
application/x-matlab-data,mat
application/arj, arj
application/base64, mme
application/binhex, hqx
@ -14,7 +13,7 @@ application/epub+zip, epub
application/freeloader, frl
application/futuresplash, spl
application/groupwise, vew
application/gzip, gz|tgz
application/gzip, gz
application/hta, hta
application/i-deas, unv
application/iges, iges|igs
@ -23,14 +22,13 @@ application/java-archive, jar
application/java, class
application/javascript,
application/json, json
application/ndjson, jsonl|ndjson
application/marc, mrc
application/mbedlet, mbd
application/mime, aps
application/mspowerpoint, ppz
application/msword, doc|dot|w6w|wiz|word
application/netmc, mcp
application/octet-stream, bin|dump|gpg|pack|idx
application/octet-stream, bin|dump|gpg
application/oda, oda
application/ogg, ogv
application/pdf, pdf
@ -80,7 +78,9 @@ application/vocaltec-media-desc, vmd
application/vocaltec-media-file, vmf
application/warc, warc
application/winhelp, hlp
application/wordperfect, wp|wp5|wp6|wpd|w60|w61
application/wordperfect6.0, w60
application/wordperfect6.1, w61
application/wordperfect, wp|wp5|wp6|wpd
application/x-123, wk1
application/x-7z-compressed, 7z
application/x-aim, aim
@ -111,7 +111,7 @@ application/x-dbf, dbf
application/x-dbt,
application/x-debian-package, deb
application/x-deepv, deepv
application/x-director, dir|dxr
application/x-director, dcr|dir|dxr
application/x-dmp, dmp
application/x-dosdriver,
application/x-dosexec, dll
@ -157,6 +157,7 @@ application/x-livescreen, ivy
application/x-lotus, wq1
application/x-lz4+json, jsonlz4
application/x-lz4, lz4
application/x-lz4, lz4
application/x-lzh-compressed,
application/x-lzh, lzh
application/x-lzip, lz
@ -244,7 +245,7 @@ audio/make, funk|my|pfunk
audio/midi, kar
audio/mid, rmi
audio/mp4, m4b
audio/mpeg, m2a|mpa|mpga
audio/mpeg, m2a|mpa
audio/ogg, ogg
audio/s3m, s3m
audio/tsp-audio, tsi
@ -346,10 +347,7 @@ text/javascript, js
text/mcf, mcf
text/pascal, pas
text/PGP,
text/plain, com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt|nfo|sfv|m3u|csv|eml|make|log|markdown|yaml
text/x-script.python, pyx
text/csv,
application/vnd.coffeescript, coffee
text/plain, com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt|nfo|sfv|m3u|csv|eml
text/richtext, rt|rtf|rtx
text/rtf,
text/scriplet, wsc
@ -385,7 +383,7 @@ text/x-pascal, p
text/x-perl, pl
text/x-php, php
text/x-po, po
text/x-python, py|pyi
text/x-python, py
text/x-ruby, rb
text/x-sass, sass
text/x-scss, scss
@ -431,22 +429,3 @@ video/x-qtc, qtc
video/x-sgi-movie, movie|mv
x-epoc/x-sisx-app,
application/x-zstd-dictionary,
application/vnd.ms-outlook, msg
image/x-olympus-orf, orf
image/x-nikon-nef, nef
image/x-fuji-raf, raf
image/x-panasonic-raw, rw2|raw
image/x-adobe-dng, dng
image/x-canon-cr2, cr2
image/x-canon-crw, crw
image/x-dcraw,
image/x-kodak-dcr, dcr
image/x-kodak-k25, k25
image/x-kodak-kdc, kdc
image/x-minolta-mrw, mrw
image/x-pentax-pef, pef
image/x-sigma-x3f, xf3
image/x-sony-arw, arw
image/x-sony-sr2, sr2
image/x-sony-srf, srf
image/x-epson-erf, erf
1 application/x-matlab-data application/arj mat arj
application/x-matlab-data mat
1 application/arj application/arj arj arj
2 application/base64 application/base64 mme mme
3 application/binhex application/binhex hqx hqx
13 application/freeloader application/freeloader frl frl
14 application/futuresplash application/futuresplash spl spl
15 application/groupwise application/groupwise vew vew
16 application/gzip application/gzip gz|tgz gz
17 application/hta application/hta hta hta
18 application/i-deas application/i-deas unv unv
19 application/iges application/iges iges|igs iges|igs
22 application/java application/java class class
23 application/javascript application/javascript
24 application/json application/json json json
application/ndjson jsonl|ndjson
25 application/marc application/marc mrc mrc
26 application/mbedlet application/mbedlet mbd mbd
27 application/mime application/mime aps aps
28 application/mspowerpoint application/mspowerpoint ppz ppz
29 application/msword application/msword doc|dot|w6w|wiz|word doc|dot|w6w|wiz|word
30 application/netmc application/netmc mcp mcp
31 application/octet-stream application/octet-stream bin|dump|gpg|pack|idx bin|dump|gpg
32 application/oda application/oda oda oda
33 application/ogg application/ogg ogv ogv
34 application/pdf application/pdf pdf pdf
78 application/vocaltec-media-file application/vocaltec-media-file vmf vmf
79 application/warc application/warc warc warc
80 application/winhelp application/winhelp hlp hlp
81 application/wordperfect application/wordperfect6.0 wp|wp5|wp6|wpd|w60|w61 w60
82 application/wordperfect6.1 w61
83 application/wordperfect wp|wp5|wp6|wpd
84 application/x-123 application/x-123 wk1 wk1
85 application/x-7z-compressed application/x-7z-compressed 7z 7z
86 application/x-aim application/x-aim aim aim
111 application/x-dbt application/x-dbt
112 application/x-debian-package application/x-debian-package deb deb
113 application/x-deepv application/x-deepv deepv deepv
114 application/x-director application/x-director dir|dxr dcr|dir|dxr
115 application/x-dmp application/x-dmp dmp dmp
116 application/x-dosdriver application/x-dosdriver
117 application/x-dosexec application/x-dosexec dll dll
157 application/x-lotus application/x-lotus wq1 wq1
158 application/x-lz4+json application/x-lz4+json jsonlz4 jsonlz4
159 application/x-lz4 application/x-lz4 lz4 lz4
160 application/x-lz4 lz4
161 application/x-lzh-compressed application/x-lzh-compressed
162 application/x-lzh application/x-lzh lzh lzh
163 application/x-lzip application/x-lzip lz lz
245 audio/midi audio/midi kar kar
246 audio/mid audio/mid rmi rmi
247 audio/mp4 audio/mp4 m4b m4b
248 audio/mpeg audio/mpeg m2a|mpa|mpga m2a|mpa
249 audio/ogg audio/ogg ogg ogg
250 audio/s3m audio/s3m s3m s3m
251 audio/tsp-audio audio/tsp-audio tsi tsi
347 text/mcf text/mcf mcf mcf
348 text/pascal text/pascal pas pas
349 text/PGP text/PGP
350 text/plain text/plain com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt|nfo|sfv|m3u|csv|eml|make|log|markdown|yaml com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt|nfo|sfv|m3u|csv|eml
text/x-script.python pyx
text/csv
application/vnd.coffeescript coffee
351 text/richtext text/richtext rt|rtf|rtx rt|rtf|rtx
352 text/rtf text/rtf
353 text/scriplet text/scriplet wsc wsc
383 text/x-perl text/x-perl pl pl
384 text/x-php text/x-php php php
385 text/x-po text/x-po po po
386 text/x-python text/x-python py|pyi py
387 text/x-ruby text/x-ruby rb rb
388 text/x-sass text/x-sass sass sass
389 text/x-scss text/x-scss scss scss
429 video/x-sgi-movie video/x-sgi-movie movie|mv movie|mv
430 x-epoc/x-sisx-app x-epoc/x-sisx-app
431 application/x-zstd-dictionary application/x-zstd-dictionary
application/vnd.ms-outlook msg
image/x-olympus-orf orf
image/x-nikon-nef nef
image/x-fuji-raf raf
image/x-panasonic-raw rw2|raw
image/x-adobe-dng dng
image/x-canon-cr2 cr2
image/x-canon-crw crw
image/x-dcraw
image/x-kodak-dcr dcr
image/x-kodak-k25 k25
image/x-kodak-kdc kdc
image/x-minolta-mrw mrw
image/x-pentax-pef pef
image/x-sigma-x3f xf3
image/x-sony-arw arw
image/x-sony-sr2 sr2
image/x-sony-srf srf
image/x-epson-erf erf

View File

@ -1,12 +1,8 @@
import zlib
mimes = {}
noparse = set()
ext_in_hash = set()
mime_ids = {}
major_mime = {
"sist2": 0,
"model": 1,
"example": 2,
"message": 3,
@ -22,6 +18,7 @@ major_mime = {
pdf = (
"application/pdf",
"application/x-cbz",
"application/epub+zip",
"application/vnd.ms-xpsdocument",
)
@ -76,36 +73,10 @@ markup = (
"text/x-sgml"
)
raw = (
"image/x-olympus-orf",
"image/x-nikon-nef",
"image/x-fuji-raf",
"image/x-panasonic-raw",
"image/x-adobe-dng",
"image/x-canon-cr2",
"image/x-canon-crw",
"image/x-dcraw",
"image/x-kodak-dcr",
"image/x-kodak-k25",
"image/x-kodak-kdc",
"image/x-minolta-mrw",
"image/x-pentax-pef",
"image/x-sigma-x3f",
"image/x-sony-arw",
"image/x-sony-sr2",
"image/x-sony-srf",
"image/x-minolta-mrw",
"image/x-pentax-pef",
"image/x-epson-erf",
)
cnt = 1
def mime_id(mime):
if mime in mime_ids:
return mime_ids[mime]
global cnt
major = mime.split("/")[0]
mime_id = str((major_mime[major] << 16) + cnt)
@ -126,12 +97,8 @@ def mime_id(mime):
mime_id += " | 0x02000000"
elif mime in markup:
mime_id += " | 0x01000000"
elif mime in raw:
mime_id += " | 0x00800000"
elif mime == "application/x-empty":
cnt -= 1
return "1"
mime_ids[mime] = mime_id
return mime_id
@ -139,40 +106,24 @@ def clean(t):
return t.replace("/", "_").replace(".", "_").replace("+", "_").replace("-", "_")
def crc(s):
return zlib.crc32(s.encode()) & 0xffffffff
with open("scripts/mime.csv") as f:
for l in f:
mime, ext_list = l.split(",")
if l.startswith("!"):
mime = mime[1:]
noparse.add(mime)
ext = [x.strip() for x in ext_list.split("|") if x.strip() != ""]
ext = [x.strip() for x in ext_list.split("|")]
mimes[mime] = ext
seen_crc = set()
for ext in mimes.values():
for e in ext:
if crc(e) in seen_crc:
raise Exception("CRC32 collision")
seen_crc.add(crc(e))
seen_crc = set()
for mime in mimes.keys():
if crc(mime) in seen_crc:
raise Exception("CRC32 collision")
seen_crc.add(crc(mime))
print("// **Generated by mime.py**")
print("#ifndef MIME_GENERATED_C")
print("#define MIME_GENERATED_C")
print("#include <glib.h>\n")
print("#include <stdlib.h>\n")
# Enum
print("enum mime {")
for mime, ext in sorted(mimes.items()):
print(f"{clean(mime)}={mime_id(mime)},")
print(" " + clean(mime) + "=" + mime_id(mime) + ",")
print("};")
# Enum -> string
@ -183,28 +134,20 @@ with open("scripts/mime.csv") as f:
print("default: return NULL;}}")
# Ext -> Enum
print("unsigned int mime_extension_lookup(unsigned long extension_crc32) {"
"switch (extension_crc32) {")
print("GHashTable *mime_get_ext_table() {"
"GHashTable *ext_table = g_hash_table_new(g_str_hash, g_str_equal);")
for mime, ext in mimes.items():
if len(ext) > 0:
for e in ext:
print(f"case {crc(e)}:", end="")
print(f"return {clean(mime)};")
print("default: return 0;}}")
for e in [e for e in ext if e]:
print("g_hash_table_insert(ext_table, \"" + e + "\", (gpointer)" + clean(mime) + ");")
if e in ext_in_hash:
raise Exception("extension already in hash: " + e)
ext_in_hash.add(e)
print("return ext_table;}")
# string -> Enum
print("unsigned int mime_name_lookup(unsigned long mime_crc32) {"
"switch (mime_crc32) {")
for mime in mimes.keys():
print(f"case {crc(mime)}: return {clean(mime)};")
print("default: return 0;}}")
# mime list
mime_list = ",".join(mime_id(x) for x in mimes.keys()) + ",0"
print(f"unsigned int mime_ids[] = {{{mime_list}}};")
print("unsigned int* get_mime_ids() { return mime_ids; }")
print("GHashTable *mime_get_mime_table() {"
"GHashTable *mime_table = g_hash_table_new(g_str_hash, g_str_equal);")
for mime, ext in mimes.items():
print("g_hash_table_insert(mime_table, \"" + mime + "\", (gpointer)" + clean(mime) + ");")
print("return mime_table;}")
print("#endif")

View File

@ -1,10 +1,10 @@
files = [
"sist2-vue/src/assets/favicon.ico",
"sist2-vue/dist/css/chunk-vendors.css",
"sist2-vue/dist/css/index.css",
"sist2-vue/dist/js/chunk-vendors.js",
"sist2-vue/dist/js/index.js",
"sist2-vue/dist/index.html",
"src/static/css/bundle.css",
"src/static/css/bundle_dark.css",
"src/static/js/bundle.js",
"src/static/img/sprite-skin-flat.png",
"src/static/img/sprite-skin-flat-dark.png",
"src/static/search.html",
]
@ -13,10 +13,6 @@ def clean(filepath):
for file in files:
try:
with open(file, "rb") as f:
data = f.read()
except:
data = bytes([])
with open(file, "rb") as f:
data = f.read()
print("char %s[%d] = {%s};" % (clean(file), len(data), ",".join(str(int(b)) for b in data)))

View File

@ -1,84 +0,0 @@
#include <sqlite3ext.h>
#include <string.h>
#include <stdlib.h>
SQLITE_EXTENSION_INIT1
static int sep_rfind(const char *str) {
for (int i = (int) strlen(str); i >= 0; i--) {
if (str[i] == '/') {
return i;
}
}
return -1;
}
void path_parent_func(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
if (argc != 1 || sqlite3_value_type(argv[0]) != SQLITE_TEXT) {
sqlite3_result_error(ctx, "Invalid parameters", -1);
}
const char *value = (const char *) sqlite3_value_text(argv[0]);
int stop = sep_rfind(value);
if (stop == -1) {
sqlite3_result_null(ctx);
return;
}
char parent[4096 * 3];
strncpy(parent, value, stop);
sqlite3_result_text(ctx, parent, stop, SQLITE_TRANSIENT);
}
void random_func(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
if (argc != 1 || sqlite3_value_type(argv[0]) != SQLITE_INTEGER) {
sqlite3_result_error(ctx, "Invalid parameters", -1);
}
char state_buf[32] = {0,};
struct random_data buf;
int result;
long seed = sqlite3_value_int64(argv[0]);
initstate_r((int) seed, state_buf, sizeof(state_buf), &buf);
random_r(&buf, &result);
sqlite3_result_int(ctx, result);
}
int sqlite3_extension_init(
sqlite3 *db,
char **pzErrMsg,
const sqlite3_api_routines *pApi
) {
SQLITE_EXTENSION_INIT2(pApi);
sqlite3_create_function(
db,
"path_parent",
1,
SQLITE_UTF8,
NULL,
path_parent_func,
NULL,
NULL
);
sqlite3_create_function(
db,
"random_seeded",
1,
SQLITE_UTF8,
NULL,
random_func,
NULL,
NULL
);
return SQLITE_OK;
}

View File

@ -1 +0,0 @@
gcc -I/mnt/work/vcpkg/installed/x64-linux/include -g -fPIC -shared sqlite_extension.c -o sist2funcs.so

View File

@ -1,3 +0,0 @@
docker run --rm -it --name "sist2-dev-es3"\
-p 9200:9200 -e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms8g -Xmx8g" elasticsearch:7.17.9

View File

@ -1,3 +0,0 @@
docker run --rm -it --name "sist2-dev-es-6"\
-p 9202:9200 -e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms8g -Xmx8g" elasticsearch:6.8.0

View File

@ -1,3 +0,0 @@
docker run --rm -it --name "sist2-dev-es3"\
-p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms8g -Xmx8g" elasticsearch:8.7.0

View File

@ -1,7 +0,0 @@
docker build . -t tmp
docker run --rm -it\
-v $(pwd):/host \
tmp \
scan --ocr-lang eng --ocr-ebooks -t6 --incremental --very-verbose \
-o /host/docker.sist2 /host/third-party/libscan/libscan-test-files/test_files/

View File

@ -1,5 +0,0 @@
module.exports = {
presets: [
'@vue/cli-plugin-babel/preset'
]
}

File diff suppressed because it is too large Load Diff

View File

@ -1,49 +0,0 @@
{
"name": "sist2-admin-vue",
"version": "0.1.0",
"private": true,
"scripts": {
"serve": "vue-cli-service serve",
"build": "vue-cli-service build",
"watch": "vue-cli-service build --watch"
},
"dependencies": {
"axios": "^1.6.0",
"bootstrap-vue": "^2.21.2",
"core-js": "^3.6.5",
"moment": "^2.29.3",
"socket.io-client": "^4.5.1",
"vue": "^2.6.14",
"vue-i18n": "^8.24.4",
"vue-router": "^3.5.4",
"vuex": "^3.4.0"
},
"devDependencies": {
"@vue/cli-plugin-babel": "~5.0.8",
"@vue/cli-plugin-router": "~5.0.8",
"@vue/cli-plugin-vuex": "~5.0.8",
"@vue/cli-service": "~5.0.8",
"babel-eslint": "^10.1.0",
"bootstrap": "^4.5.2",
"vue-template-compiler": "^2.6.11"
},
"eslintConfig": {
"root": true,
"env": {
"node": true
},
"extends": [
"plugin:vue/essential",
"eslint:recommended"
],
"parserOptions": {
"parser": "babel-eslint"
},
"rules": {}
},
"browserslist": [
"> 1%",
"last 2 versions",
"not dead"
]
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 15 KiB

View File

@ -1,17 +0,0 @@
<!DOCTYPE html>
<html lang="">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width,initial-scale=1.0">
<link rel="icon" href="<%= BASE_URL %>serve_favicon_ico.ico">
<title>sist2-admin</title>
</head>
<body>
<noscript>
<strong>We're sorry but <%= htmlWebpackPlugin.options.title %> doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
</noscript>
<div id="app"></div>
<!-- built files will be auto injected -->
</body>
</html>

View File

@ -1,105 +0,0 @@
<template>
<div id="app">
<NavBar></NavBar>
<b-container class="pt-4">
<b-alert show dismissible variant="info">
This is a beta version of sist2-admin. Please submit bug reports, usability issues and feature requests
to the <a href="https://github.com/sist2app/sist2/issues/new/choose" target="_blank">issue tracker on
Github</a>. Thank you!
</b-alert>
<router-view v-if="$store.state.sist2AdminInfo"/>
</b-container>
</div>
</template>
<script>
import NavBar from "@/components/NavBar";
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
components: {NavBar},
data() {
return {
socket: null
}
},
mounted() {
Sist2AdminApi.getSist2AdminInfo()
.then(resp => this.$store.commit("setSist2AdminInfo", resp.data));
this.$store.dispatch("loadBrowserSettings");
this.connectNotifications();
// this.socket.onclose = this.connectNotifications;
},
methods: {
connectNotifications() {
if (window.location.protocol === "https:") {
this.socket = new WebSocket(`wss://${window.location.host}/notifications`);
} else {
this.socket = new WebSocket(`ws://${window.location.host}/notifications`);
}
this.socket.onopen = () => {
this.socket.send("Hello from client");
}
this.socket.onmessage = e => {
const notification = JSON.parse(e.data);
if (notification.message) {
notification.messageString = this.$t(notification.message).toString();
}
this.$store.dispatch("notify", notification)
}
}
}
}
</script>
<style>
html, body {
height: 100%;
}
#app {
/*font-family: Avenir, Helvetica, Arial, sans-serif;*/
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
/*text-align: center;*/
color: #2c3e50;
padding-bottom: 1em;
min-height: 100%;
}
.info-icon {
width: 1rem;
min-width: 1rem;
margin-right: 0.2rem;
cursor: pointer;
line-height: 1rem;
height: 1rem;
min-height: 1rem;
background-image: url();
filter: brightness(45%);
display: block;
}
.tabs {
margin-top: 10px;
}
.modal-title {
text-overflow: ellipsis;
overflow: hidden;
white-space: nowrap;
}
@media screen and (min-width: 1500px) {
.container {
max-width: 1440px;
}
}
label {
margin-top: 0.5rem;
margin-bottom: 0;
}
</style>

View File

@ -1,179 +0,0 @@
import axios from "axios";
class Sist2AdminApi {
constructor() {
this.baseUrl = window.location.protocol + "//" + window.location.host;
}
getJobs() {
return axios.get(`${this.baseUrl}/api/job`);
}
getFrontends() {
return axios.get(`${this.baseUrl}/api/frontend`);
}
getTasks() {
return axios.get(`${this.baseUrl}/api/task`);
}
killTask(taskId) {
return axios.post(`${this.baseUrl}/api/task/${taskId}/kill`)
}
getTaskHistory() {
return axios.get(`${this.baseUrl}/api/task/history`);
}
/**
* @param {string} name
*/
getJob(name) {
return axios.get(`${this.baseUrl}/api/job/${name}`);
}
getSearchBackend(name) {
return axios.get(`${this.baseUrl}/api/search_backend/${name}`);
}
updateSearchBackend(name, data) {
return axios.put(`${this.baseUrl}/api/search_backend/${name}`, data);
}
getSearchBackends() {
return axios.get(`${this.baseUrl}/api/search_backend`);
}
deleteBackend(name) {
return axios.delete(`${this.baseUrl}/api/search_backend/${name}`)
}
createBackend(name) {
return axios.post(`${this.baseUrl}/api/search_backend/${name}`);
}
getFrontend(name) {
return axios.get(`${this.baseUrl}/api/frontend/${name}`);
}
/**
* @param {string} name
*/
startFrontend(name) {
return axios.post(`${this.baseUrl}/api/frontend/${name}/start`);
}
/**
* @param {string} name
*/
stopFrontend(name) {
return axios.post(`${this.baseUrl}/api/frontend/${name}/stop`);
}
/**
* @param {string} name
* @param job
*/
updateJob(name, job) {
return axios.put(`${this.baseUrl}/api/job/${name}`, job);
}
/**
* @param {string} name
* @param frontend
*/
updateFrontend(name, frontend) {
return axios.put(`${this.baseUrl}/api/frontend/${name}`, frontend);
}
/**
* @param {string} name
* @param {bool} full
*/
runJob(name, full) {
return axios.get(`${this.baseUrl}/api/job/${name}/run`, {
params: {full}
});
}
/**
* @param {string} name
*/
deleteJob(name) {
return axios.delete(`${this.baseUrl}/api/job/${name}`);
}
/**
* @param {string} name
*/
deleteFrontend(name) {
return axios.delete(`${this.baseUrl}/api/frontend/${name}`);
}
/**
* @param {string} name
*/
createJob(name) {
return axios.post(`${this.baseUrl}/api/job/${name}`);
}
/**
* @param {string} name
*/
createFrontend(name) {
return axios.post(`${this.baseUrl}/api/frontend/${name}`);
}
pingEs(url, insecure) {
return axios.get(`${this.baseUrl}/api/ping_es`, {params: {url, insecure}});
}
getSist2AdminInfo() {
return axios.get(`${this.baseUrl}/api`);
}
getLogsToDelete(jobName, n) {
return axios.get(`${this.baseUrl}/api/job/${jobName}/logs_to_delete`, {
params: {n: n}
});
}
deleteTaskLogs(taskId) {
return axios.post(`${this.baseUrl}/api/task/${taskId}/delete_logs`);
}
getUserScripts() {
return axios.get(`${this.baseUrl}/api/user_script`);
}
getUserScript(name) {
return axios.get(`${this.baseUrl}/api/user_script/${name}`);
}
createUserScript(name, template) {
return axios.post(`${this.baseUrl}/api/user_script/${name}`, null, {
params: {
template: template
}
});
}
updateUserScript(name, data) {
return axios.put(`${this.baseUrl}/api/user_script/${name}`, data);
}
deleteUserScript(name) {
return axios.delete(`${this.baseUrl}/api/user_script/${name}`);
}
testUserScript(name, job) {
return axios.get(`${this.baseUrl}/api/user_script/${name}/run`, {
params: {
job: job
}
});
}
}
export default new Sist2AdminApi()

View File

@ -1,31 +0,0 @@
<template>
<b-list-group-item action :to="`/frontend/${frontend.name}`">
<div class="d-flex w-100 justify-content-between">
<h5 class="mb-1" style="display: block">
{{ frontend.name }}
<b-badge variant="light">{{ formatBindAddress(frontend.web_options.bind) }}</b-badge>
</h5>
<div>
<b-badge v-if="frontend.running" variant="success">{{$t("online")}}</b-badge>
<b-badge v-else variant="secondary">{{$t("offline")}}</b-badge>
</div>
</div>
</b-list-group-item>
</template>
<script>
import {formatBindAddress} from "@/util";
export default {
name: "FrontendListItem",
props: ["frontend"],
data() {
return {
formatBindAddress
}
}
}
</script>

View File

@ -1,54 +0,0 @@
<template>
<div>
<h5>{{ $t("selectJobs") }}</h5>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-form-group v-else>
<b-form-checkbox-group
v-if="jobs.length > 0"
:checked="frontend.jobs"
@input="frontend.jobs = $event; $emit('input')"
>
<div v-for="job in jobs" :key="job.name">
<b-form-checkbox :disabled="job.status !== 'indexed'"
:value="job.name">
<template #default><span
:title="job.status !== 'indexed' ? $t('jobOptions.notIndexed') : ''"
>[{{ job.name }}]</span></template>
</b-form-checkbox>
<br/>
</div>
</b-form-checkbox-group>
<div v-else>
<span class="text-muted">{{ $t('jobOptions.noJobAvailable') }}</span>
<router-link to="/">{{ $t("create") }}</router-link>
</div>
</b-form-group>
</div>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "JobCheckboxGroup",
props: ["frontend"],
mounted() {
Sist2AdminApi.getJobs().then(resp => {
this._jobs = resp.data;
this.loading = false;
});
},
computed: {
jobs() {
return this._jobs
.filter(job => job.index_options.search_backend === this.frontend.web_options.search_backend)
}
},
data() {
return {
loading: true,
_jobs: null
}
}
}
</script>

View File

@ -1,55 +0,0 @@
<template>
<b-list-group-item class="flex-column align-items-start" action :to="`job/${job.name}`">
<div class="d-flex w-100 justify-content-between">
<div>
<h5 class="mb-1">
{{ job.name }}
</h5>
</div>
<div>
<b-row>
<b-col>
<small v-if="job.last_index_date">
{{ $t("scanned") }} {{ formatLastIndexDate(job.last_index_date) }}</small>
<div v-else>&nbsp;</div>
</b-col>
</b-row>
<b-row v-if="job.schedule_enabled">
<b-col>
<small><code>{{job.cron_expression }}</code></small>
</b-col>
</b-row>
<b-row v-else>
<b-col>
&nbsp;
</b-col>
</b-row>
</div>
</div>
</b-list-group-item>
</template>
<script>
import moment from "moment";
export default {
name: "JobListItem",
props: ["job"],
methods: {
formatLastIndexDate(dateString) {
if (dateString === null) {
return "";
}
return moment.utc(dateString).local().fromNow();
}
}
}
</script>
<style scoped>
</style>

View File

@ -1,94 +0,0 @@
<template>
<div>
<b-form-checkbox :checked="desktopNotificationsEnabled" @change="updateNotifications($event)">
{{ $t("jobOptions.desktopNotifications") }}
</b-form-checkbox>
<b-form-checkbox v-model="job.schedule_enabled" @change="update()">
{{ $t("jobOptions.scheduleEnabled") }}
</b-form-checkbox>
<label>{{ $t("jobOptions.cron") }}</label>
<b-form-input class="text-monospace" :state="cronValid" v-model="job.cron_expression"
:disabled="!job.schedule_enabled" @change="update()"></b-form-input>
<label>{{ $t("jobOptions.keepNLogs") }}</label>
<b-input-group>
<b-form-input type="number" v-model="job.keep_last_n_logs" @change="update()"></b-form-input>
<b-input-group-append>
<b-button variant="danger" @click="onDeleteNowClick()">{{ $t("jobOptions.deleteNow") }}</b-button>
</b-input-group-append>
</b-input-group>
</div>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "JobOptions",
props: ["job"],
data() {
return {
cronValid: undefined,
logsToDelete: null
}
},
computed: {
desktopNotificationsEnabled() {
return this.$store.state.jobDesktopNotificationMap[this.job.name];
}
},
mounted() {
this.cronValid = this.checkCron(this.job.cron_expression)
},
methods: {
checkCron(expression) {
return /((((\d+,)+\d+|(\d+([/-])\d+)|\d+|\*) ?){5,7})/.test(expression);
},
updateNotifications(value) {
this.$store.dispatch("setJobDesktopNotification", {
job: this.job.name,
enabled: value
});
},
update() {
if (this.job.schedule_enabled) {
this.cronValid = this.checkCron(this.job.cron_expression);
} else {
this.cronValid = undefined;
}
if (this.cronValid !== false) {
this.$emit("change", this.job);
}
},
onDeleteNowClick() {
Sist2AdminApi.getLogsToDelete(this.job.name, this.job.keep_last_n_logs).then(resp => {
const toDelete = resp.data;
const message = `Delete ${toDelete.length} log files?`;
this.$bvModal.msgBoxConfirm(message, {
title: this.$t("confirmation"),
size: "sm",
buttonSize: "sm",
okVariant: "danger",
okTitle: this.$t("delete"),
cancelTitle: this.$t("cancel"),
footerClass: "p-2",
hideHeaderClose: false,
centered: true
}).then(value => {
if (value) {
toDelete.forEach(row => {
Sist2AdminApi.deleteTaskLogs(row["id"]);
});
}
});
})
}
},
}
</script>

View File

@ -1,34 +0,0 @@
<template>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<span v-else-if="jobs.length === 0"></span>
<b-form-select v-else :options="jobs" text-field="name" value-field="name"
@change="$emit('change', $event)" :value="$t('selectJob')"></b-form-select>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "JobSelect",
mounted() {
Sist2AdminApi.getJobs().then(resp => {
this._jobs = resp.data;
this.loading = false;
});
},
computed: {
jobs() {
return [
{name: this.$t("selectJob"), disabled: true},
...this._jobs.filter(job => job.index_path)
]
}
},
data() {
return {
loading: true,
_jobs: null
}
}
}
</script>

View File

@ -1,69 +0,0 @@
<template>
<b-navbar>
<b-navbar-brand to="/">
<Sist2Icon></Sist2Icon>
</b-navbar-brand>
<b-button class="ml-auto" to="/task" variant="link">{{ $t("tasks") }}</b-button>
</b-navbar>
</template>
<script>
import Sist2Icon from "@/components/icons/Sist2Icon";
export default {
name: "NavBar",
components: {Sist2Icon},
methods: {
tagline() {
return this.$store.state.sist2Info.tagline;
},
sist2Version() {
return this.$store.state.sist2Info.version;
},
isDebug() {
return this.$store.state.sist2Info.debug;
},
isLegacy() {
return this.$store.state.sist2Info.esVersionLegacy;
},
hideLegacy() {
return this.$store.state.optHideLegacy;
}
}
}
</script>
<style scoped>
.navbar {
box-shadow: 0 0.125rem 0.25rem rgb(0 0 0 / 8%) !important;
border-radius: 0;
}
.theme-black .navbar {
background: #546b7a30;
border-bottom: none;
}
.navbar-brand {
color: #222 !important;
font-size: 1.75rem;
padding: 0;
}
.navbar-brand:hover {
color: #000 !important;
}
.version {
color: #222 !important;
margin-left: -18px;
margin-top: -14px;
font-size: 11px;
font-family: monospace;
}
.btn-link {
color: #222;
}
</style>

View File

@ -1,110 +0,0 @@
<template>
<div>
<label>{{ $t("scanOptions.path") }}</label>
<b-form-input v-model="options.path" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.threads") }}</label>
<b-form-input type="number" min="1" v-model="options.threads" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.thumbnailQuality") }}</label>
<b-form-input type="number" min="0" max="100" v-model="options.thumbnail_quality" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.thumbnailCount") }}</label>
<b-form-input type="number" min="0" max="1000" v-model="options.thumbnail_count" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.thumbnailSize") }}</label>
<b-form-input type="number" min="100" v-model="options.thumbnail_size" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.contentSize") }}</label>
<b-form-input type="number" min="0" v-model="options.content_size" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.rewriteUrl") }}</label>
<b-form-input v-model="options.rewrite_url" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.depth") }}</label>
<b-form-input type="number" min="0" v-model="options.depth" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.archive") }}</label>
<b-form-select :options="['skip', 'list', 'shallow', 'recurse']" v-model="options.archive"
@change="update()"></b-form-select>
<label>{{ $t("scanOptions.archivePassphrase") }}</label>
<b-form-input v-model="options.archive_passphrase" @change="update()"></b-form-input>
<label>{{ $t("scanOptions.ocrLang") }}</label>
<b-alert variant="danger" show v-if="selectedOcrLangs.length === 0 && !disableOcrLang">{{ $t("scanOptions.ocrLangAlert") }}</b-alert>
<b-checkbox-group :disabled="disableOcrLang" v-model="selectedOcrLangs" @input="onOcrLangChange">
<b-checkbox v-for="lang in ocrLangs" :key="lang" :value="lang">{{ lang }}</b-checkbox>
</b-checkbox-group>
<!-- <b-form-input readonly v-model="options.ocr_lang" @change="update()"></b-form-input>-->
<div style="height: 10px"></div>
<b-form-checkbox v-model="options.ocr_images" @change="update()">
{{ $t("scanOptions.ocrImages") }}
</b-form-checkbox>
<b-form-checkbox v-model="options.ocr_ebooks" @change="update()">
{{ $t("scanOptions.ocrEbooks") }}
</b-form-checkbox>
<label>{{ $t("scanOptions.exclude") }}</label>
<b-form-input v-model="options.exclude" @change="update()"
:placeholder="$t('scanOptions.excludePlaceholder')"></b-form-input>
<div style="height: 10px"></div>
<b-form-checkbox v-model="options.fast" @change="update()">
{{ $t("scanOptions.fast") }}
</b-form-checkbox>
<b-form-checkbox v-model="options.checksums" @change="update()">
{{ $t("scanOptions.checksums") }}
</b-form-checkbox>
<b-form-checkbox v-model="options.read_subtitles" @change="update()">
{{ $t("scanOptions.readSubtitles") }}
</b-form-checkbox>
<b-form-checkbox v-model="options.optimize_index" @change="update()">
{{ $t("scanOptions.optimizeIndex") }}
</b-form-checkbox>
<label>{{ $t("scanOptions.treemapThreshold") }}</label>
<b-form-input type="number" min="0" v-model="options.treemap_threshold" @change="update()"></b-form-input>
</div>
</template>
<script>
export default {
name: "ScanOptions",
props: ["options"],
data() {
return {
disableOcrLang: false,
selectedOcrLangs: []
}
},
computed: {
ocrLangs() {
return this.$store.state.sist2AdminInfo?.tesseract_langs || [];
}
},
methods: {
onOcrLangChange() {
this.options.ocr_lang = this.selectedOcrLangs.join("+");
this.update();
},
update() {
this.disableOcrLang = this.options.ocr_images === false && this.options.ocr_ebooks === false;
this.$emit("change", this.options);
},
},
mounted() {
this.disableOcrLang = this.options.ocr_images === false && this.options.ocr_ebooks === false;
this.selectedOcrLangs = this.options.ocr_lang ? this.options.ocr_lang.split("+") : [];
}
}
</script>

View File

@ -1,24 +0,0 @@
<template>
<b-list-group-item action :to="`/searchBackend/${backend.name}`">
<div class="d-flex w-100 justify-content-between">
<h5 class="mb-1">
{{ backend.name }}
</h5>
<div>
<b-badge v-if="backend.backend_type === 'sqlite'" variant="info">SQLite</b-badge>
<b-badge v-else variant="info">Elasticsearch</b-badge>
</div>
</div>
</b-list-group-item>
</template>
<script>
export default {
name: "SearchBackendListItem",
props: ["backend"],
}
</script>

View File

@ -1,37 +0,0 @@
<template>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<div v-else>
<label>{{$t("backendOptions.searchBackend")}}</label>
<b-select :options="options" :value="value" @change="$emit('change', $event)"></b-select>
</div>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "SearchBackendSelect",
props: ["value"],
data() {
return {
loading: true,
backends: null,
}
},
computed: {
options() {
return this.backends.map(backend => backend.name)
}
},
mounted() {
Sist2AdminApi.getSearchBackends().then(resp => {
this.loading = false;
this.backends = resp.data
})
}
}
</script>
<style scoped>
</style>

View File

@ -1,57 +0,0 @@
<template>
<b-list-group-item>
<b-row style="height: 50px">
<b-col><h5>{{ task.display_name }}</h5></b-col>
<b-col class="shrink">
<router-link class="btn btn-link" :to="`/log/${task.id}`">{{ $t("logs") }}</router-link>
</b-col>
<b-col class="shrink">
<b-btn variant="link" @click="killTask(task.id)">{{ $t("kill") }}</b-btn>
</b-col>
</b-row>
<b-row>
<b-col>
<b-progress :max="task.progress.count">
<b-progress-bar :value="task.progress.done" :label-html="label" :striped="!task.progress.waiting"/>
</b-progress>
</b-col>
</b-row>
</b-list-group-item>
</template>
<script>
import sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "TaskListItem",
props: ["task"],
computed: {
label() {
const count = this.task.progress.count;
const done = this.task.progress.done;
return `<span>${done}/${count}</span>`
}
},
methods: {
killTask(taskId) {
sist2AdminApi.killTask(taskId).then(() => {
this.$bvToast.toast(this.$t("killConfirmation"), {
title: this.$t("killConfirmationTitle"),
variant: "success",
toaster: "b-toaster-bottom-right"
});
});
}
}
}
</script>
<style scoped>
.shrink {
flex-grow: inherit;
}
</style>

View File

@ -1,18 +0,0 @@
<template>
<b-list-group-item action :to="`/userScript/${script.name}`">
<div class="d-flex w-100 justify-content-between">
<h5 class="mb-1">
{{ script.name }}
</h5>
</div>
</b-list-group-item>
</template>
<script>
export default {
name: "UserScriptListItem",
props: ["script"],
}
</script>

View File

@ -1,88 +0,0 @@
<template>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-row v-else>
<b-col cols="6">
<h5>Selected scripts</h5>
<b-list-group>
<b-list-group-item v-for="script in selectedScripts" :key="script"
button
@click="onRemoveScript(script)"
class="d-flex justify-content-between align-items-center">
{{ script }}
<b-button-group>
<b-button variant="light" @click.stop="moveUpScript(script)"></b-button>
<b-button variant="light" @click.stop="moveDownScript(script)"></b-button>
</b-button-group>
</b-list-group-item>
</b-list-group>
</b-col>
<b-col cols="6">
<h5>Available scripts</h5>
<b-list-group>
<b-list-group-item v-for="script in availableScripts" :key="script" button
@click="onSelectScript(script)">
{{ script }}
</b-list-group-item>
</b-list-group>
</b-col>
</b-row>
<!-- <b-checkbox-group v-else :options="scripts" stacked :checked="selectedScripts"-->
<!-- @input="$emit('change', $event)"></b-checkbox-group>-->
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "UserScriptPicker",
props: ["selectedScripts"],
data() {
return {
loading: true,
scripts: []
}
},
computed: {
availableScripts() {
return this.scripts.filter(script => !this.selectedScripts.includes(script))
}
},
mounted() {
Sist2AdminApi.getUserScripts().then(resp => {
this.scripts = resp.data.map(script => script.name);
this.loading = false;
});
},
methods: {
onSelectScript(name) {
this.selectedScripts.push(name);
this.$emit("change", this.selectedScripts)
},
onRemoveScript(name) {
this.selectedScripts.splice(this.selectedScripts.indexOf(name), 1);
this.$emit("change", this.selectedScripts);
},
moveUpScript(name) {
const index = this.selectedScripts.indexOf(name);
if (index > 0) {
this.selectedScripts.splice(index, 1);
this.selectedScripts.splice(index - 1, 0, name);
}
this.$emit("change", this.selectedScripts);
},
moveDownScript(name) {
const index = this.selectedScripts.indexOf(name);
if (index < this.selectedScripts.length - 1) {
this.selectedScripts.splice(index, 1);
this.selectedScripts.splice(index + 1, 0, name);
}
this.$emit("change", this.selectedScripts);
}
}
}
</script>
<style scoped>
</style>

View File

@ -1,73 +0,0 @@
<template>
<div>
<h4>{{ $t("webOptions.title") }}</h4>
<b-card>
<label>{{ $t("webOptions.lang") }}</label>
<b-form-select v-model="options.lang" :options="['en', 'fr', 'zh-CN', 'pl', 'de']"
@change="update()"></b-form-select>
<label>{{ $t("webOptions.bind") }}</label>
<b-form-input v-model="options.bind" @change="update()"></b-form-input>
<label>{{ $t("webOptions.tagline") }}</label>
<b-form-textarea v-model="options.tagline" @change="update()"></b-form-textarea>
<label>{{ $t("webOptions.auth") }}</label>
<b-form-input v-model="options.auth" @change="update()"></b-form-input>
<label>{{ $t("webOptions.tagAuth") }}</label>
<b-form-input v-model="options.tag_auth" @change="update()" :disabled="Boolean(options.auth)"></b-form-input>
<b-form-checkbox v-model="options.verbose" @change="update()">
{{$t("webOptions.verbose")}}
</b-form-checkbox>
</b-card>
<br>
<h4>Auth0 options</h4>
<b-card>
<label>{{ $t("webOptions.auth0Audience") }}</label>
<b-form-input v-model="options.auth0_audience" @change="update()"></b-form-input>
<label>{{ $t("webOptions.auth0Domain") }}</label>
<b-form-input v-model="options.auth0_domain" @change="update()"></b-form-input>
<label>{{ $t("webOptions.auth0ClientId") }}</label>
<b-form-input v-model="options.auth0_client_id" @change="update()"></b-form-input>
<label>{{ $t("webOptions.auth0PublicKey") }}</label>
<b-textarea rows="10" v-model="options.auth0_public_key" @change="update()"></b-textarea>
</b-card>
</div>
</template>
<script>
export default {
name: "WebOptions",
props: ["options", "frontendName"],
data() {
return {
showEsTestAlert: false,
esTestOk: false,
esTestMessage: ""
}
},
methods: {
update() {
console.log(this.options)
if (this.options.auth && this.options.tag_auth) {
// If both are set, remove tagAuth
this.options.tag_auth = "";
}
this.$emit("change", this.options);
},
}
}
</script>
<style scoped>
</style>

View File

@ -1,40 +0,0 @@
<template>
<svg
xmlns="http://www.w3.org/2000/svg"
width="27.868069mm"
height="7.6446671mm"
viewBox="0 0 27.868069 7.6446671"
>
<g transform="translate(-4.5018313,-4.1849793)">
<g
style="fill: currentColor;fill-opacity:1;stroke:none;stroke-width:0.26458332">
<path
d="m 6.3153296,11.829646 q -0.7717014,0 -1.8134983,-0.337619 v -0.916395 q 1.0128581,0.511252 1.803852,0.511252 0.5643067,0 0.901926,-0.236334 0.3376194,-0.236333 0.3376194,-0.63183 0,-0.3424428 -0.2845649,-0.5498376 Q 6.980922,9.4566645 6.3635609,9.3264399 L 5.9921796,9.2492698 Q 5.2301245,9.0949295 4.8732126,8.7428407 4.5211238,8.3859288 4.5211238,7.7733908 q 0,-0.7765245 0.5305447,-1.1961372 0.5305447,-0.4196126 1.5096409,-0.4196126 0.829579,0 1.6061036,0.3183268 V 7.3441319 Q 7.4101809,6.9004036 6.5854251,6.9004036 q -1.1671984,0 -1.1671984,0.7958171 0,0.2604492 0.1012858,0.4147895 0.1012858,0.1495171 0.3858507,0.2556261 0.2845649,0.1012858 0.8392253,0.2122179 l 0.3569119,0.067524 q 1.3408312,0.2652724 1.3408312,1.4614098 0,0.80064 -0.5691298,1.263661 -0.5691298,0.458197 -1.5578722,0.458197 z"
style="stroke-width:0.26458332"
/>
<path
d="m 11.943927,5.3087694 q -0.144694,0 -0.144694,-0.144694 V 4.3296733 q 0,-0.144694 0.144694,-0.144694 h 0.694531 q 0.144694,0 0.144694,0.144694 v 0.8344021 q 0,0.144694 -0.144694,0.144694 z M 13.5645,11.728361 q -0.795817,0 -1.234722,-0.511253 -0.434082,-0.516075 -0.434082,-1.4469398 V 6.9823969 H 10.714028 V 6.2878656 h 2.069124 v 3.4823026 q 0,0.5884228 0.221864,0.8971028 0.221865,0.308681 0.6463,0.308681 h 1.036974 v 0.752409 z"
style="stroke-width:0.26458332"
/>
<path
d="m 18.209178,11.829646 q -0.771701,0 -1.813498,-0.337619 v -0.916395 q 1.012858,0.511252 1.803852,0.511252 0.564306,0 0.901926,-0.236334 0.337619,-0.236333 0.337619,-0.63183 0,-0.3424428 -0.284565,-0.5498376 Q 18.87477,9.4566645 18.257409,9.3264399 l -0.371381,-0.07717 Q 17.123973,9.0949295 16.767061,8.7428407 16.414972,8.3859288 16.414972,7.7733908 q 0,-0.7765245 0.530545,-1.1961372 0.530545,-0.4196126 1.509641,-0.4196126 0.829579,0 1.606103,0.3183268 v 0.8681641 q -0.757232,-0.4437283 -1.581988,-0.4437283 -1.167198,0 -1.167198,0.7958171 0,0.2604492 0.101286,0.4147895 0.101286,0.1495171 0.385851,0.2556261 0.284565,0.1012858 0.839225,0.2122179 l 0.356912,0.067524 q 1.340831,0.2652724 1.340831,1.4614098 0,0.80064 -0.56913,1.263661 -0.56913,0.458197 -1.557872,0.458197 z"
style="stroke-width:0.26458332"
/>
<path
d="m 25.207545,11.709068 q -0.993565,0 -1.408355,-0.40032 -0.409966,-0.405143 -0.409966,-1.3794164 V 6.9775737 H 21.947107 V 6.2878656 h 1.442117 V 4.8746874 l 0.887457,-0.3858507 v 1.7990289 h 2.016069 v 0.6897081 h -2.016069 v 2.9517579 q 0,0.5932454 0.226687,0.8344024 0.226687,0.236333 0.790994,0.236333 h 0.998388 v 0.709001 z"
style="stroke-width:0.26458332"
/>
<path
d="m 27.995317,11.043476 q 0,-0.178456 0.120578,-0.299035 0.274919,-0.289388 0.651123,-0.684885 0.376205,-0.4003199 0.805464,-0.8681638 0.327973,-0.356912 0.491959,-0.5353679 0.16881,-0.1832791 0.255626,-0.2845649 0.09164,-0.1012858 0.178456,-0.2073948 0.255626,-0.3086805 0.405144,-0.5257215 0.15434,-0.2170411 0.250803,-0.4292589 0.168809,-0.3762045 0.168809,-0.7524089 0,-0.5980686 -0.352089,-0.935688 -0.356911,-0.3424425 -0.979096,-0.3424425 -0.863341,0 -1.938899,0.6414768 V 4.8361023 q 0.491959,-0.2363335 0.979096,-0.3569119 0.47749,-0.1205783 0.945334,-0.1205783 0.501606,0 0.940511,0.1350477 0.438905,0.1350478 0.766878,0.4244358 0.289388,0.2556261 0.463021,0.6270074 0.173633,0.3665582 0.173633,0.829579 0,0.4726671 -0.212218,0.9501574 -0.106109,0.2411567 -0.274919,0.4726671 -0.163986,0.2266873 -0.424435,0.540191 Q 31.270225,8.501684 31.077299,8.718725 30.884374,8.9357661 30.628748,9.2106847 30.445469,9.4084332 30.286305,9.5675966 30.131965,9.72676 29.958332,9.9003928 29.7847,10.069203 29.558012,10.300713 29.336148,10.5274 29.012998,10.869843 h 3.356901 v 0.819932 h -4.374582 z"
style="stroke-width:0.26458332"
/>
</g>
</g>
</svg>
</template>
<script>
export default {
name: "Sist2Icon"
}
</script>

View File

@ -1,143 +0,0 @@
export default {
en: {
start: "Start",
stop: "Stop",
go: "Go",
online: "online",
offline: "offline",
view: "View",
delete: "Delete",
runNow: "Index now",
runNowFull: "Full re-index",
create: "Create",
cancel: "Cancel",
test: "Test",
confirmation: "Confirmation",
jobTitle: "job configuration",
tasks: "Tasks",
runningTasks: "Running tasks",
frontends: "Frontends",
jobDisabled: "There is no valid index for this job",
status: "Status",
taskHistory: "Task history",
taskName: "Task name",
taskStarted: "Started",
taskDuration: "Duration",
taskStatus: "Status",
logs: "Logs",
kill: "Kill",
killConfirmation: "SIGTERM signal sent to sist2 process",
killConfirmationTitle: "Confirmation",
follow: "Follow",
wholeFile: "Whole file",
logLevel: "Log level",
logMode: "Follow mode",
logFile: "Reading log file",
jobs: "Jobs",
newJobName: "New job name",
newJobHelp: "Create a new job to get started!",
newFrontendName: "New frontend name",
scanned: "last scan",
autoStart: "Start automatically",
runJobConfirmationTitle: "Task queued",
runJobConfirmation: "Check the Tasks page to monitor the status.",
extraQueryArgs: "Extra query arguments when launching from sist2-admin",
customUrl: "Custom URL when launching from sist2-admin",
searchBackends: "Search backends",
searchBackendTitle: "search backend configuration",
newBackendName: "New search backend name",
frontendTab: "Frontend",
backendTab: "Backend",
scripts: "User Scripts",
script: "User Script",
testScript: "Test/debug User Script",
newScriptName: "New script name",
scriptType: "Script type",
scriptCode: "Script code (Python)",
scriptOptions: "User scripts",
gitRepository: "Git repository URL",
extraArgs: "Extra command line arguments",
couldNotStartFrontend: "Could not start frontend",
couldNotStartFrontendBody: "Unable to start the frontend, check server logs for more details.",
selectJobs: "Available jobs",
selectJob: "Select a job",
webOptions: {
title: "Web options",
lang: "UI Language",
bind: "Listen address",
tagline: "Tagline in navbar",
auth: "Basic auth in user:password format",
tagAuth: "Basic auth in user:password format for tagging",
auth0Audience: "Auth0 audience",
auth0Domain: "Auth0 domain",
auth0ClientId: "Auth0 client ID",
auth0PublicKey: "Auth0 public key",
verbose: "Verbose logs"
},
backendOptions: {
title: "Search backend options",
searchBackend: "Search backend",
type: "Search backend type",
esUrl: "Elasticsearch URL",
esIndex: "Elasticsearch index name",
esInsecure: "Do not verify SSL connections to Elasticsearch.",
threads: "Number of threads",
batchSize: "Index batch size",
script: "User script",
searchIndex: "Search index file location"
},
scanOptions: {
title: "Scanning options",
path: "Path",
threads: "Number of threads",
memThrottle: "Total memory threshold in MiB for scan throttling",
thumbnailQuality: "Thumbnail quality, on a scale of 0 to 100, 100 being the best",
thumbnailCount: "Number of thumbnails to generate. Set a value > 1 to create video previews, set to 0 to disable thumbnails.",
thumbnailSize: "Thumbnail size, in pixels",
contentSize: "Number of bytes to be extracted from text documents. Set to 0 to disable",
rewriteUrl: "Serve files from this url instead of from disk",
depth: "Scan up to this many subdirectories deep",
archive: "Archive file mode",
archivePassphrase: "Passphrase for encrypted archive files",
ocrLang: "Tesseract language",
ocrLangAlert: "You must select at least one language",
ocrEbooks: "Enable OCR'ing of ebook files",
ocrImages: "Enable OCR'ing of image files",
exclude: "Files that match this regex will not be scanned",
excludePlaceholder: "Exclude",
fast: "Only index file names & mime type",
checksums: "Calculate file checksums when scanning",
readSubtitles: "Read subtitles from media files",
memBuffer: "Maximum memory buffer size per thread in MiB for files inside archives",
treemapThreshold: "Relative size threshold for treemap",
optimizeIndex: "Defragment index file after scan to reduce its file size."
},
jobOptions: {
title: "Job options",
cron: "Job schedule",
keepNLogs: "Keep last N log files. Set to -1 to keep all logs.",
deleteNow: "Delete now",
scheduleEnabled: "Enable scheduled re-scan",
noJobAvailable: "No jobs available for this search backend.",
notIndexed: "Has not been indexed yet",
noBackendError: "You must select a search backend to run this job",
desktopNotifications: "Desktop notifications"
},
frontendOptions: {
title: "Advanced options",
noJobSelectedWarning: "You must select at least one job to start this frontend"
},
notifications: {
indexCompleted: "Task completed for [$JOB$]"
}
}
}

View File

@ -1,31 +0,0 @@
import Vue from 'vue'
import { BootstrapVue, IconsPlugin } from 'bootstrap-vue'
import "bootstrap/dist/css/bootstrap.min.css"
import "bootstrap-vue/dist/bootstrap-vue.min.css"
Vue.use(BootstrapVue);
Vue.use(IconsPlugin);
import App from './App.vue';
import router from './router';
import store from './store';
import VueI18n from "vue-i18n";
import messages from "@/i18n/messages";
Vue.use(VueI18n);
const i18n = new VueI18n({
locale: "en",
messages: messages
});
Vue.config.productionTip = false
new Vue({
router,
store,
i18n,
render: h => h(App)
}).$mount('#app')

View File

@ -1,57 +0,0 @@
import Vue from 'vue'
import VueRouter from 'vue-router'
import Home from '../views/Home.vue'
import Job from "@/views/Job";
import Tasks from "@/views/Tasks";
import Frontend from "@/views/Frontend";
import Tail from "@/views/Tail";
import SearchBackend from "@/views/SearchBackend.vue";
import UserScript from "@/views/UserScript.vue";
Vue.use(VueRouter);
const routes = [
{
path: "/task",
name: "Tasks",
component: Tasks
},
{
path: "/:tab?",
name: "Home",
component: Home
},
{
path: "/job/:name",
name: "Job",
component: Job
},
{
path: "/frontend/:name",
name: "Frontend",
component: Frontend
},
{
path: "/searchBackend/:name",
name: "SearchBackend",
component: SearchBackend
},
{
path: "/userScript/:name",
name: "UserScript",
component: UserScript
},
{
path: "/log/:taskId",
name: "Tail",
component: Tail
},
]
const router = new VueRouter({
mode: "hash",
base: process.env.BASE_URL,
routes
})
export default router

View File

@ -1,63 +0,0 @@
import Vue from "vue";
import Vuex from "vuex";
Vue.use(Vuex);
function saveBrowserSettings(state) {
const settings = {
jobDesktopNotificationMap: state.jobDesktopNotificationMap
};
localStorage.setItem("sist2-admin-settings", JSON.stringify(settings));
console.log("SAVED");
console.log(settings);
}
export default new Vuex.Store({
state: {
sist2AdminInfo: null,
jobDesktopNotificationMap: {}
},
mutations: {
setSist2AdminInfo: (state, payload) => state.sist2AdminInfo = payload,
setJobDesktopNotificationMap: (state, payload) => state.jobDesktopNotificationMap = payload,
},
actions: {
notify: async ({state}, notification) => {
if (!state.jobDesktopNotificationMap[notification.job]) {
console.log("pass");
return;
}
new Notification(notification.messageString.replace("$JOB$", notification.job));
},
setJobDesktopNotification: async ({state}, {job, enabled}) => {
if (enabled === true) {
const permission = await Notification.requestPermission()
if (permission !== "granted") {
return false;
}
}
state.jobDesktopNotificationMap[job] = enabled;
saveBrowserSettings(state);
return true;
},
loadBrowserSettings({commit}) {
const settingString = localStorage.getItem("sist2-admin-settings");
if (!settingString) {
return;
}
const settings = JSON.parse(settingString);
commit("setJobDesktopNotificationMap", settings["jobDesktopNotificationMap"]);
}
},
modules: {}
})

View File

@ -1,8 +0,0 @@
export function formatBindAddress(address) {
if (address.startsWith("0.0.0.0")) {
return address.slice("0.0.0.0".length)
}
return address
}

View File

@ -1,145 +0,0 @@
<template>
<b-card>
<b-card-title>
{{ name }}
<small style="vertical-align: top">
<b-badge v-if="!loading && frontend.running" variant="success">{{ $t("online") }}</b-badge>
<b-badge v-else-if="!loading" variant="secondary">{{ $t("offline") }}</b-badge>
</small>
</b-card-title>
<!-- Action buttons-->
<div class="mb-3" v-if="!loading">
<b-button class="mr-1" :disabled="frontend.running || !valid" variant="success" @click="start()">{{
$t("start")
}}
</b-button>
<b-button class="mr-1" :disabled="!frontend.running" variant="danger" @click="stop()">{{
$t("stop")
}}
</b-button>
<b-button class="mr-1" :disabled="!frontend.running" variant="primary" :href="frontendUrl" target="_blank">
{{ $t("go") }}
</b-button>
<b-button variant="danger" @click="deleteFrontend()">{{ $t("delete") }}</b-button>
</div>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-card-body v-else>
<h4>{{ $t("backendOptions.title") }}</h4>
<b-card>
<b-alert v-if="!valid" variant="warning" show>{{ $t("frontendOptions.noJobSelectedWarning") }}</b-alert>
<SearchBackendSelect :value="frontend.web_options.search_backend"
@change="onBackendSelect($event)"></SearchBackendSelect>
<br>
<JobCheckboxGroup :frontend="frontend" @input="update()"></JobCheckboxGroup>
</b-card>
<br/>
<WebOptions :options="frontend.web_options" :frontend-name="$route.params.name"
@change="update()"></WebOptions>
<br/>
<h4>{{ $t("frontendOptions.title") }}</h4>
<b-card>
<b-form-checkbox v-model="frontend.auto_start" @change="update()">
{{ $t("autoStart") }}
</b-form-checkbox>
<label>{{ $t("extraQueryArgs") }}</label>
<b-form-input v-model="frontend.extra_query_args" @change="update()"></b-form-input>
<label>{{ $t("customUrl") }}</label>
<b-form-input v-model="frontend.custom_url" @change="update()" placeholder="http://"></b-form-input>
</b-card>
</b-card-body>
</b-card>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
import JobCheckboxGroup from "@/components/JobCheckboxGroup";
import WebOptions from "@/components/WebOptions";
import SearchBackendSelect from "@/components/SearchBackendSelect.vue";
export default {
name: 'Frontend',
components: {SearchBackendSelect, JobCheckboxGroup, WebOptions},
data() {
return {
loading: true,
frontend: null,
}
},
computed: {
valid() {
return !this.loading && this.frontend.jobs.length > 0;
},
frontendUrl() {
if (this.frontend.custom_url) {
return this.frontend.custom_url + this.args;
}
if (this.frontend.web_options.bind.startsWith("0.0.0.0")) {
return window.location.protocol + "//" + window.location.hostname + ":" + this.port + this.args;
}
return window.location.protocol + "//" + this.frontend.web_options.bind + this.args;
},
name() {
return this.$route.params.name;
},
port() {
return this.frontend.web_options.bind.split(":")[1]
},
args() {
const args = this.frontend.extra_query_args;
if (args !== "") {
return "#" + (args.startsWith("?") ? (args) : ("?" + args));
}
return "";
}
},
mounted() {
Sist2AdminApi.getFrontend(this.name).then(resp => {
this.frontend = resp.data;
this.loading = false;
});
},
methods: {
start() {
Sist2AdminApi.startFrontend(this.name).then(() => {
this.frontend.running = true;
}).catch(() => {
this.$bvToast.toast(this.$t("couldNotStartFrontendBody"), {
title: this.$t("couldNotStartFrontend"),
variant: "danger",
toaster: "b-toaster-bottom-right"
});
});
},
stop() {
this.frontend.running = false;
Sist2AdminApi.stopFrontend(this.name)
},
deleteFrontend() {
Sist2AdminApi.deleteFrontend(this.name).then(() => {
this.$router.push("/");
});
},
update() {
Sist2AdminApi.updateFrontend(this.name, this.frontend);
},
onBackendSelect(backend) {
this.frontend.web_options.search_backend = backend;
this.frontend.jobs = [];
this.update();
}
}
}
</script>

View File

@ -1,240 +0,0 @@
<template>
<div>
<b-tabs content-class="mt-3" v-model="tab" @input="onTabChange($event)">
<b-tab :title="$t('backendTab')">
<b-card>
<b-card-title>{{ $t("searchBackends") }}</b-card-title>
<b-row>
<b-col>
<b-input v-model="newBackendName" :placeholder="$t('newBackendName')"></b-input>
</b-col>
<b-col>
<b-button variant="primary" @click="createBackend()"
:disabled="!backendNameValid(newBackendName)">
{{ $t("create") }}
</b-button>
</b-col>
</b-row>
<hr/>
<b-progress v-if="backendsLoading" striped animated value="100"></b-progress>
<b-list-group v-else>
<SearchBackendListItem v-for="backend in backends"
:key="backend.name" :backend="backend"></SearchBackendListItem>
</b-list-group>
</b-card>
<br/>
<b-card>
<b-card-title>{{ $t("jobs") }}</b-card-title>
<b-row>
<b-col>
<b-input id="new-job" v-model="newJobName" :placeholder="$t('newJobName')"></b-input>
<b-popover
:show.sync="showHelp"
target="new-job"
placement="top"
triggers="manual"
variant="primary"
:content="$t('newJobHelp')"
></b-popover>
</b-col>
<b-col>
<b-button variant="primary" @click="createJob()" :disabled="!jobNameValid(newJobName)">
{{ $t("create") }}
</b-button>
</b-col>
</b-row>
<hr/>
<b-progress v-if="jobsLoading" striped animated value="100"></b-progress>
<b-list-group v-else>
<JobListItem v-for="job in jobs" :key="job.name" :job="job"></JobListItem>
</b-list-group>
</b-card>
</b-tab>
<b-tab :title="$t('scripts')">
<b-progress v-if="scriptsLoading" striped animated value="100"></b-progress>
<b-card v-else>
<b-card-title>{{ $t("scripts") }}</b-card-title>
<label>Select template</label>
<b-form-radio-group stacked :options="scriptTemplates" v-model="scriptTemplate"></b-form-radio-group>
<br>
<b-row>
<b-col>
<b-form-input v-model="newScriptName" :disabled="!scriptTemplate" :placeholder="$t('newScriptName')"></b-form-input>
</b-col>
<b-col>
<b-button variant="primary" @click="createScript()"
:disabled="!scriptNameValid(newScriptName)">
{{ $t("create") }}
</b-button>
</b-col>
</b-row>
<hr/>
<b-list-group>
<UserScriptListItem v-for="script in scripts"
:key="script.name" :script="script"></UserScriptListItem>
</b-list-group>
</b-card>
</b-tab>
<b-tab :title="$t('frontendTab')">
<b-card>
<b-card-title>{{ $t("frontends") }}</b-card-title>
<b-row>
<b-col>
<b-input v-model="newFrontendName" :placeholder="$t('newFrontendName')"></b-input>
</b-col>
<b-col>
<b-button variant="primary" @click="createFrontend()"
:disabled="!frontendNameValid(newFrontendName)">
{{ $t("create") }}
</b-button>
</b-col>
</b-row>
<hr/>
<b-progress v-if="frontendsLoading" striped animated value="100"></b-progress>
<b-list-group v-else>
<FrontendListItem v-for="frontend in frontends"
:key="frontend.name" :frontend="frontend"></FrontendListItem>
</b-list-group>
</b-card>
</b-tab>
</b-tabs>
</div>
</template>
<script>
import JobListItem from "@/components/JobListItem";
import {formatBindAddress} from "@/util";
import Sist2AdminApi from "@/Sist2AdminApi";
import FrontendListItem from "@/components/FrontendListItem";
import SearchBackendListItem from "@/components/SearchBackendListItem.vue";
import UserScriptListItem from "@/components/UserScriptListItem.vue";
export default {
name: "Jobs",
components: {UserScriptListItem, SearchBackendListItem, JobListItem, FrontendListItem},
data() {
return {
jobsLoading: true,
newJobName: "",
jobs: [],
frontendsLoading: true,
frontends: [],
formatBindAddress,
newFrontendName: "",
backends: [],
backendsLoading: true,
newBackendName: "",
scripts: [],
scriptTemplates: [],
newScriptName: "",
scriptTemplate: null,
scriptsLoading: true,
showHelp: false,
tab: 0
}
},
mounted() {
this.loading = true;
if (this.$route.params.tab) {
console.log("mounted " + this.$route.params.tab)
window.setTimeout(() => {
this.tab = Math.round(Number(this.$route.params.tab));
}, 1)
}
this.reload();
},
methods: {
jobNameValid(name) {
if (this.jobs.some(job => job.name === name)) {
return false;
}
return /^[a-zA-Z0-9-_,.; ]+$/.test(name);
},
frontendNameValid(name) {
if (this.frontends.some(frontend => frontend.name === name)) {
return false;
}
return /^[a-zA-Z0-9-_,.; ]+$/.test(name);
},
backendNameValid(name) {
if (this.backends.some(backend => backend.name === name)) {
return false;
}
return /^[a-zA-Z0-9-_,.; ]+$/.test(name);
},
scriptNameValid(name) {
if (this.scripts.some(script => script.name === name)) {
return false;
}
if (name.length > 16) {
return false;
}
return /^[a-zA-Z0-9-_,.; ]+$/.test(name);
},
reload() {
Sist2AdminApi.getJobs().then(resp => {
this.jobs = resp.data;
this.jobsLoading = false;
this.showHelp = this.jobs.length === 0;
});
Sist2AdminApi.getFrontends().then(resp => {
this.frontends = resp.data;
this.frontendsLoading = false;
});
Sist2AdminApi.getSearchBackends().then(resp => {
this.backends = resp.data;
this.backendsLoading = false;
})
Sist2AdminApi.getUserScripts().then(resp => {
this.scripts = resp.data;
this.scriptTemplates = this.$store.state.sist2AdminInfo.user_script_templates;
this.scriptsLoading = false;
})
},
createJob() {
Sist2AdminApi.createJob(this.newJobName).then(this.reload);
},
createFrontend() {
Sist2AdminApi.createFrontend(this.newFrontendName).then(this.reload)
},
createBackend() {
Sist2AdminApi.createBackend(this.newBackendName).then(this.reload);
},
createScript() {
Sist2AdminApi.createUserScript(this.newScriptName, this.scriptTemplate).then(this.reload)
},
onTabChange(tab) {
if (this.$route.params.tab != tab) {
this.$router.push({params: {tab: tab}})
}
}
}
}
</script>

View File

@ -1,138 +0,0 @@
<template>
<b-card>
<b-card-title>
[{{ getName() }}]
{{ $t("jobTitle") }}
</b-card-title>
<div class="mb-3">
<b-dropdown
split
split-variant="primary"
variant="primary"
:text="$t('runNow')"
class="mr-1"
:disabled="!valid"
@click="runJob()"
>
<b-dropdown-item href="#" @click="runJob(true)">{{ $t("runNowFull") }}</b-dropdown-item>
</b-dropdown>
<b-button variant="danger" @click="deleteJob()">{{ $t("delete") }}</b-button>
</div>
<div v-if="job">
{{ $t("status") }}: <code>{{ job.status }}</code>
</div>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-card-body v-else>
<h4>{{ $t("jobOptions.title") }}</h4>
<b-card>
<JobOptions :job="job" @change="update"></JobOptions>
</b-card>
<br/>
<h4>{{ $t("backendOptions.title") }}</h4>
<b-card>
<b-alert v-if="!valid" variant="warning" show>{{ $t("jobOptions.noBackendError") }}</b-alert>
<SearchBackendSelect :value="job.index_options.search_backend"
@change="onBackendSelect($event)"></SearchBackendSelect>
</b-card>
<br/>
<h4>{{ $t("scriptOptions") }}</h4>
<b-card>
<UserScriptPicker :selected-scripts="job.user_scripts"
@change="onScriptChange($event)"></UserScriptPicker>
</b-card>
<br/>
<h4>{{ $t("scanOptions.title") }}</h4>
<b-card>
<ScanOptions :options="job.scan_options" @change="update()"></ScanOptions>
</b-card>
</b-card-body>
</b-card>
</template>
<script>
import ScanOptions from "@/components/ScanOptions";
import Sist2AdminApi from "@/Sist2AdminApi";
import JobOptions from "@/components/JobOptions";
import SearchBackendSelect from "@/components/SearchBackendSelect.vue";
import UserScriptPicker from "@/components/UserScriptPicker.vue";
export default {
name: "Job",
components: {
UserScriptPicker,
SearchBackendSelect,
ScanOptions,
JobOptions
},
data() {
return {
loading: true,
job: null,
console: console
}
},
methods: {
getName() {
return this.$route.params.name;
},
update() {
Sist2AdminApi.updateJob(this.getName(), this.job);
},
runJob(full = false) {
Sist2AdminApi.runJob(this.getName(), full).then(() => {
this.$bvToast.toast(this.$t("runJobConfirmation"), {
title: this.$t("runJobConfirmationTitle"),
variant: "success",
toaster: "b-toaster-bottom-right"
});
});
},
deleteJob() {
Sist2AdminApi.deleteJob(this.getName())
.then(() => {
this.$router.push("/");
})
.catch(err => {
this.$bvToast.toast("Cannot delete job " +
"because it is referenced by a frontend", {
title: "Error",
variant: "danger",
toaster: "b-toaster-bottom-right"
});
})
},
onBackendSelect(backend) {
this.job.index_options.search_backend = backend;
this.update();
},
onScriptChange(scripts) {
this.job.user_scripts = scripts;
this.update();
}
},
mounted() {
Sist2AdminApi.getJob(this.getName()).then(resp => {
this.loading = false;
this.job = resp.data;
})
},
computed: {
valid() {
return this.job?.index_options.search_backend != null;
}
}
}
</script>

View File

@ -1,123 +0,0 @@
<template>
<b-card>
<b-card-title>
<span class="text-monospace">{{ getName() }}</span>
{{ $t("searchBackendTitle") }}
</b-card-title>
<div class="mb-3">
<b-button variant="danger" @click="deleteBackend()">{{ $t("delete") }}</b-button>
</div>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-card-body v-else>
<label>{{ $t("backendOptions.type") }}</label>
<b-select :options="backendTypeOptions" v-model="backend.backend_type" @change="update()"></b-select>
<hr/>
<template v-if="backend.backend_type === 'elasticsearch'">
<b-alert :variant="esTestOk ? 'success' : 'danger'" :show="showEsTestAlert" class="mt-1">
{{ esTestMessage }}
</b-alert>
<label>{{ $t("backendOptions.esUrl") }}</label>
<b-input-group>
<b-form-input v-model="backend.es_url" @change="update()"></b-form-input>
<b-input-group-append>
<b-button variant="outline-primary" @click="testEs()">{{ $t("test") }}</b-button>
</b-input-group-append>
</b-input-group>
<b-form-checkbox v-model="backend.es_insecure_ssl" :disabled="!this.backend.es_url.startsWith('https')"
@change="update()">
{{ $t("backendOptions.esInsecure") }}
</b-form-checkbox>
<label>{{ $t("backendOptions.esIndex") }}</label>
<b-form-input v-model="backend.es_index" @change="update()"></b-form-input>
<label>{{ $t("backendOptions.threads") }}</label>
<b-form-input v-model="backend.threads" type="number" min="1" @change="update()"></b-form-input>
<label>{{ $t("backendOptions.batchSize") }}</label>
<b-form-input v-model="backend.batch_size" type="number" min="1" @change="update()"></b-form-input>
</template>
<template v-else>
<label>{{ $t("backendOptions.searchIndex") }}</label>
<b-form-input v-model="backend.search_index" disabled></b-form-input>
</template>
</b-card-body>
</b-card>
</template>
<script>
import sist2AdminApi from "@/Sist2AdminApi";
import Sist2AdminApi from "@/Sist2AdminApi";
export default {
name: "SearchBackend",
data() {
return {
showEsTestAlert: false,
esTestOk: false,
esTestMessage: "",
loading: true,
backend: null,
backendTypeOptions: [
{
text: "Elasticsearch",
value: "elasticsearch"
},
{
text: "SQLite",
value: "sqlite"
}
]
}
},
mounted() {
Sist2AdminApi.getSearchBackend(this.getName()).then(resp => {
this.backend = resp.data;
this.loading = false;
});
},
methods: {
getName() {
return this.$route.params.name;
},
testEs() {
sist2AdminApi.pingEs(this.backend.es_url, this.backend.es_insecure_ssl)
.then((resp) => {
this.showEsTestAlert = true;
this.esTestOk = resp.data.ok;
this.esTestMessage = resp.data.message;
});
},
update() {
Sist2AdminApi.updateSearchBackend(this.getName(), this.backend);
},
deleteBackend() {
Sist2AdminApi.deleteBackend(this.getName())
.then(() => {
this.$router.push("/");
})
.catch(err => {
this.$bvToast.toast("Cannot delete search backend " +
"because it is referenced by a job or frontend", {
title: "Error",
variant: "danger",
toaster: "b-toaster-bottom-right"
});
})
}
}
}
</script>
<style scoped>
</style>

View File

@ -1,175 +0,0 @@
<template>
<b-card>
<b-card-body>
<h4 class="mb-3">{{ taskId }} {{ $t("logs") }}</h4>
<div v-if="$store.state.sist2AdminInfo">
{{ $t("logFile") }}
<code>{{ $store.state.sist2AdminInfo.logs_folder }}/sist2-{{ taskId }}.log</code>
<br/>
<br/>
</div>
<b-row>
<b-col>
<span>{{ $t("logLevel") }}</span>
<b-select :options="levels.slice(0, -1)" v-model="logLevel" @input="connect()"></b-select>
</b-col>
<b-col>
<span>{{ $t("logMode") }}</span>
<b-select :options="modeOptions" v-model="mode" @input="connect()"></b-select>
</b-col>
</b-row>
<div id="log-tail-output" class="mt-3 ml-1"></div>
</b-card-body>
</b-card>
</template>
<script>
export default {
name: "Tail",
data() {
return {
logLevel: "DEBUG",
levels: ["DEBUG", "INFO", "WARNING", "ERROR", "ADMIN", "FATAL"],
socket: null,
mode: "follow",
modeOptions: [
{
"text": this.$t('follow'),
"value": "follow"
},
{
"text": this.$t('wholeFile'),
"value": "wholeFile"
}
]
}
},
computed: {
taskId: function () {
return this.$route.params.taskId;
}
},
methods: {
connect() {
let lineCount = 0;
const outputElem = document.getElementById("log-tail-output")
outputElem.replaceChildren();
if (this.socket !== null) {
this.socket.close();
}
const n = this.mode === "follow" ? 32 : 9999999999;
if (window.location.protocol === "https:") {
this.socket = new WebSocket(`wss://${window.location.host}/log/${this.taskId}?n=${n}`);
} else {
this.socket = new WebSocket(`ws://${window.location.host}/log/${this.taskId}?n=${n}`);
}
this.socket.onopen = () => {
this.socket.send("Hello from client");
}
this.socket.onmessage = e => {
let message;
try {
message = JSON.parse(e.data);
} catch {
console.error(e.data)
return;
}
if ("ping" in message) {
return;
}
if (message.level === undefined) {
if ("stderr" in message) {
message.level = "ERROR";
message.message = message["stderr"];
} else if ("stdout" in message) {
message.level = "INFO";
message.message = message["stdout"];
} else {
message.level = "ADMIN";
message.message = message["sist2-admin"];
}
message.datetime = ""
message.filepath = ""
}
if (this.levels.indexOf(message.level) < this.levels.indexOf(this.logLevel)) {
return;
}
const logLine = `${message.datetime} [${message.level} ${message.filepath}] ${message.message}`;
const span = document.createElement("span");
span.setAttribute("class", message.level);
span.appendChild(document.createTextNode(logLine));
outputElem.appendChild(span);
lineCount += 1;
if (this.mode === "follow" && lineCount >= n) {
outputElem.firstChild.remove();
}
}
}
},
mounted() {
this.connect()
}
}
</script>
<style>
#log-tail-output span {
display: block;
}
span.DEBUG {
color: #9E9E9E;
}
span.WARNING {
color: #FFB300;
}
span.INFO {
color: #039BE5;
}
span.ERROR {
color: #F4511E;
}
span.FATAL {
color: #F4511E;
}
span.ADMIN {
color: #ee05ff;
}
#log-tail-output {
font-size: 13px;
font-family: monospace;
padding: 6px;
background-color: #f5f5f5;
border: 1px solid #ccc;
border-radius: 4px;
margin: 3px;
white-space: pre;
color: #000;
overflow-y: hidden;
}
</style>

View File

@ -1,171 +0,0 @@
<template>
<div>
<b-card v-if="tasks.length > 0">
<h2>{{ $t("runningTasks") }}</h2>
<b-list-group>
<TaskListItem v-for="task in tasks" :key="task.id" :task="task"></TaskListItem>
</b-list-group>
</b-card>
<b-card class="mt-4">
<b-card-title>{{ $t("taskHistory") }}</b-card-title>
<br/>
<b-table
id="task-history"
:items="historyItems"
:fields="historyFields"
:current-page="historyCurrentPage"
:tbody-tr-class="rowClass"
:per-page="10"
>
<template #cell(logs)="data">
<template v-if="data.item._row.has_logs">
<b-button variant="link" size="sm" :to="`/log/${data.item.id}`">
{{ $t("view") }}
</b-button>
/
<b-button variant="link" size="sm" @click="deleteLogs(data.item.id)">
{{ $t("delete") }}
</b-button>
</template>
</template>
<template #cell(delete)="data">
</template>
</b-table>
<b-pagination limit="20" v-model="historyCurrentPage" :total-rows="historyItems.length"
:per-page="10"></b-pagination>
</b-card>
</div>
</template>
<script>
import TaskListItem from "@/components/TaskListItem";
import Sist2AdminApi from "@/Sist2AdminApi";
import moment from "moment";
const DAY = 3600 * 24;
const HOUR = 3600;
const MINUTE = 60;
function humanDuration(sec_num) {
sec_num = sec_num / 1000;
const days = Math.floor(sec_num / DAY);
sec_num -= days * DAY;
const hours = Math.floor(sec_num / HOUR);
sec_num -= hours * HOUR;
const minutes = Math.floor(sec_num / MINUTE);
sec_num -= minutes * MINUTE;
const seconds = Math.floor(sec_num);
if (days > 0) {
return `${days} days ${hours}h ${minutes}m ${seconds}s`;
}
if (hours > 0) {
return `${hours}h ${minutes}m ${seconds}s`;
}
if (minutes > 0) {
return `${minutes}m ${seconds}s`;
}
if (seconds > 0) {
return `${seconds}s`;
}
return "<1s";
}
export default {
name: 'Tasks',
components: {TaskListItem},
data() {
return {
loading: true,
tasks: [],
taskHistory: [],
timerId: null,
historyFields: [
{key: "name", label: this.$t("taskName")},
{key: "time", label: this.$t("taskStarted")},
{key: "duration", label: this.$t("taskDuration")},
{key: "status", label: this.$t("taskStatus")},
{key: "logs", label: this.$t("logs")},
],
historyCurrentPage: 1,
historyItems: []
}
},
props: {
msg: String
},
mounted() {
this.loading = true;
this.update().then(() => this.loading = false);
this.timerId = window.setInterval(this.update, 1000);
this.updateHistory();
},
destroyed() {
if (this.timerId) {
window.clearInterval(this.timerId);
}
},
methods: {
rowClass(row) {
if (row.status === "failed") {
return "table-danger";
}
return null;
},
updateHistory() {
Sist2AdminApi.getTaskHistory().then(resp => {
this.historyItems = resp.data.map(row => ({
id: row.id,
name: row.name,
duration: this.taskDuration(row),
time: moment.utc(row.started).local().format("dd, MMM Do YYYY, HH:mm:ss"),
logs: null,
status: row.return_code === 0 ? "ok" : "failed",
_row: row
}));
});
},
update() {
return Sist2AdminApi.getTasks().then(resp => {
this.tasks = resp.data;
})
},
taskDuration(task) {
const start = moment.utc(task.started);
const end = moment.utc(task.ended);
return humanDuration(end.diff(start))
},
deleteLogs(taskId) {
Sist2AdminApi.deleteTaskLogs(taskId).then(() => {
this.updateHistory();
})
}
}
}
</script>
<style scoped>
#task-history {
font-family: monospace;
font-size: 12px;
}
.btn-link {
padding: 0;
}
</style>

View File

@ -1,117 +0,0 @@
<template>
<b-progress v-if="loading" striped animated value="100"></b-progress>
<b-card v-else>
<b-card-title>
{{ $route.params.name }}
{{ $t("script") }}
</b-card-title>
<div class="mb-3">
<b-button variant="danger" @click="deleteScript()">{{ $t("delete") }}</b-button>
</div>
<b-card>
<h5>{{ $t("testScript") }}</h5>
<b-row>
<b-col cols="11">
<JobSelect @change="onJobSelect($event)"></JobSelect>
</b-col>
<b-col cols="1">
<b-button :disabled="!selectedTestJob" variant="primary" @click="testScript()">{{ $t("test") }}
</b-button>
</b-col>
</b-row>
</b-card>
<br/>
<label>{{ $t("scriptType") }}</label>
<b-form-select :options="['git', 'simple']" v-model="script.type" @change="update()"></b-form-select>
<template v-if="script.type === 'git'">
<label>{{ $t("gitRepository") }}</label>
<b-form-input v-model="script.git_repository" placeholder="https://github.com/example/example.git"
@change="update()"></b-form-input>
<label>{{ $t("extraArgs") }}</label>
<b-form-input v-model="script.extra_args" @change="update()" class="text-monospace"></b-form-input>
</template>
<template v-if="script.type === 'simple'">
<label>{{ $t("scriptCode") }}</label>
<p>Find sist2-python documentation <a href="https://sist2-python.readthedocs.io/" target="_blank">here</a></p>
<b-textarea rows="15" class="text-monospace" v-model="script.script" @change="update()" spellcheck="false"></b-textarea>
</template>
<template v-if="script.type === 'local'">
<!-- TODO-->
</template>
</b-card>
</template>
<script>
import Sist2AdminApi from "@/Sist2AdminApi";
import JobOptions from "@/components/JobOptions.vue";
import JobCheckboxGroup from "@/components/JobCheckboxGroup.vue";
import JobSelect from "@/components/JobSelect.vue";
export default {
name: "UserScript",
components: {JobSelect, JobCheckboxGroup, JobOptions},
data() {
return {
loading: true,
script: null,
selectedTestJob: null
}
},
methods: {
update() {
Sist2AdminApi.updateUserScript(this.name, this.script);
},
onJobSelect(job) {
this.selectedTestJob = job;
},
deleteScript() {
Sist2AdminApi.deleteUserScript(this.name)
.then(() => {
this.$router.push("/");
})
.catch(err => {
this.$bvToast.toast("Cannot delete user script " +
"because it is referenced by a job", {
title: "Error",
variant: "danger",
toaster: "b-toaster-bottom-right"
});
})
},
testScript() {
Sist2AdminApi.testUserScript(this.name, this.selectedTestJob)
.then(() => {
this.$bvToast.toast(this.$t("runJobConfirmation"), {
title: this.$t("runJobConfirmationTitle"),
variant: "success",
toaster: "b-toaster-bottom-right"
});
})
}
},
mounted() {
Sist2AdminApi.getUserScript(this.name).then(resp => {
this.script = resp.data;
this.loading = false;
});
},
computed: {
name() {
return this.$route.params.name;
},
},
}
</script>

View File

@ -1,5 +0,0 @@
module.exports = {
publicPath: "",
filenameHashing: false,
productionSourceMap: false,
};

File diff suppressed because it is too large Load Diff

View File

@ -1,7 +0,0 @@
fastapi
git+https://github.com/simon987/hexlib.git
uvicorn
websockets
pycron
GitPython
git+https://github.com/sist2app/sist2-python.git@2.1

View File

@ -1,595 +0,0 @@
import asyncio
import os
import signal
from datetime import datetime
from time import sleep
from urllib.parse import urlparse
import requests
import uvicorn
from fastapi import FastAPI, HTTPException
from hexlib.db import PersistentState
from requests import ConnectionError
from requests.exceptions import SSLError
from starlette.middleware.cors import CORSMiddleware
from starlette.responses import RedirectResponse
from starlette.staticfiles import StaticFiles
from starlette.websockets import WebSocket
from websockets.exceptions import ConnectionClosed
import cron
from config import LOG_FOLDER, logger, WEBSERVER_PORT, DATA_FOLDER, SIST2_BINARY
from jobs import Sist2Job, Sist2ScanTask, TaskQueue, Sist2IndexTask, JobStatus, Sist2UserScriptTask
from notifications import Subscribe, Notifications
from sist2 import Sist2, Sist2SearchBackend
from state import migrate_v1_to_v2, RUNNING_FRONTENDS, TESSERACT_LANGS, DB_SCHEMA_VERSION, migrate_v3_to_v4, \
get_log_files_to_remove, delete_log_file, create_default_search_backends
from web import Sist2Frontend
from script import UserScript, SCRIPT_TEMPLATES
from util import tail_sync, pid_is_running
sist2 = Sist2(SIST2_BINARY, DATA_FOLDER)
db = PersistentState(dbfile=os.path.join(DATA_FOLDER, "state.db"))
notifications = Notifications()
task_queue = TaskQueue(sist2, db, notifications)
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_credentials=True,
allow_origins=["*"],
allow_methods=["*"],
allow_headers=["*"],
)
app.mount("/ui/", StaticFiles(directory="./frontend/dist", html=True), name="static")
@app.get("/")
async def home():
return RedirectResponse("ui")
@app.get("/api")
async def api():
return {
"tesseract_langs": TESSERACT_LANGS,
"logs_folder": LOG_FOLDER,
"user_script_templates": list(SCRIPT_TEMPLATES.keys())
}
@app.get("/api/job/{name:str}")
async def get_job(name: str):
job = db["jobs"][name]
if not job:
raise HTTPException(status_code=404)
return job
@app.get("/api/frontend/{name:str}")
async def get_frontend(name: str):
frontend = db["frontends"][name]
frontend: Sist2Frontend
if frontend:
frontend.running = frontend.name in RUNNING_FRONTENDS
return frontend
raise HTTPException(status_code=404)
@app.get("/api/job")
async def get_jobs():
return list(db["jobs"])
@app.put("/api/job/{name:str}")
async def update_job(name: str, new_job: Sist2Job):
new_job.last_modified = datetime.utcnow()
job = db["jobs"][name]
if not job:
raise HTTPException(status_code=404)
args_that_trigger_full_scan = [
"path",
"thumbnail_count",
"thumbnail_quality",
"thumbnail_size",
"content_size",
"depth",
"archive",
"archive_passphrase",
"ocr_lang",
"ocr_images",
"ocr_ebooks",
"fast",
"checksums",
"read_subtitles",
]
for arg in args_that_trigger_full_scan:
if getattr(new_job.scan_options, arg) != getattr(job.scan_options, arg):
new_job.do_full_scan = True
db["jobs"][name] = new_job
@app.put("/api/frontend/{name:str}")
async def update_frontend(name: str, frontend: Sist2Frontend):
db["frontends"][name] = frontend
return "ok"
@app.get("/api/task")
async def get_tasks():
return list(map(lambda t: t.json(), task_queue.tasks()))
@app.get("/api/task/history")
async def task_history():
return list(db["task_done"].sql("ORDER BY started DESC"))
@app.post("/api/task/{task_id:str}/kill")
async def kill_job(task_id: str):
return task_queue.kill_task(task_id)
@app.post("/api/task/{task_id:str}/delete_logs")
async def delete_task_logs(task_id: str):
if not db["task_done"][task_id]:
raise HTTPException(status_code=404)
delete_log_file(db, task_id)
return "ok"
def _run_job(job: Sist2Job):
job.last_modified = datetime.utcnow()
if job.status == JobStatus("created"):
job.status = JobStatus("started")
db["jobs"][job.name] = job
scan_task = Sist2ScanTask(job, f"Scan [{job.name}]")
index_depends_on = scan_task
script_tasks = []
for script_name in job.user_scripts:
script = db["user_scripts"][script_name]
task = Sist2UserScriptTask(script, job, f"Script <{script_name}> [{job.name}]", depends_on=scan_task)
script_tasks.append(task)
index_depends_on = task
index_task = Sist2IndexTask(job, f"Index [{job.name}]", depends_on=index_depends_on)
task_queue.submit(scan_task)
for task in script_tasks:
task_queue.submit(task)
task_queue.submit(index_task)
@app.get("/api/job/{name:str}/run")
async def run_job(name: str, full: bool = False):
job: Sist2Job = db["jobs"][name]
if not job:
raise HTTPException(status_code=404)
if full:
job.do_full_scan = True
_run_job(job)
return "ok"
@app.get("/api/user_script/{name:str}/run")
def run_user_script(name: str, job: str):
script = db["user_scripts"][name]
if not script:
raise HTTPException(status_code=404)
job = db["jobs"][job]
if not job:
raise HTTPException(status_code=404)
script_task = Sist2UserScriptTask(script, job, f"Script <{name}> [{job.name}]")
task_queue.submit(script_task)
return "ok"
@app.get("/api/job/{name:str}/logs_to_delete")
async def task_history(n: int, name: str):
return get_log_files_to_remove(db, name, n)
@app.delete("/api/job/{name:str}")
async def delete_job(name: str):
job: Sist2Job = db["jobs"][name]
if not job:
raise HTTPException(status_code=404)
if any(name in frontend.jobs for frontend in db["frontends"]):
raise HTTPException(status_code=400, detail="in use (frontend)")
try:
os.remove(job.previous_index)
except:
pass
del db["jobs"][name]
return "ok"
@app.delete("/api/frontend/{name:str}")
async def delete_frontend(name: str):
if name in RUNNING_FRONTENDS:
try:
os.kill(RUNNING_FRONTENDS[name], signal.SIGTERM)
except ProcessLookupError:
pass
del RUNNING_FRONTENDS[name]
frontend = db["frontends"][name]
if frontend:
del db["frontends"][name]
else:
raise HTTPException(status_code=404)
@app.post("/api/job/{name:str}")
async def create_job(name: str):
if db["jobs"][name]:
raise ValueError("Job with the same name already exists")
job = Sist2Job.create_default(name)
db["jobs"][name] = job
return job
@app.post("/api/frontend/{name:str}")
async def create_frontend(name: str):
if db["frontends"][name]:
raise ValueError("Frontend with the same name already exists")
frontend = Sist2Frontend.create_default(name)
db["frontends"][name] = frontend
return frontend
@app.get("/api/ping_es")
async def ping_es(url: str, insecure: bool):
return check_es_version(url, insecure)
def check_es_version(es_url: str, insecure: bool):
try:
url = urlparse(es_url)
if url.username:
auth = (url.username, url.password)
es_url = f"{url.scheme}://{url.hostname}:{url.port}"
else:
auth = None
r = requests.get(es_url, verify=not insecure, auth=auth)
except SSLError:
return {
"ok": False,
"message": "Invalid SSL certificate"
}
except ConnectionError as e:
return {
"ok": False,
"message": "Connection refused"
}
except ValueError as e:
return {
"ok": False,
"message": str(e)
}
if r.status_code == 401:
return {
"ok": False,
"message": "Authentication failure"
}
try:
return {
"ok": True,
"message": "Elasticsearch version " + r.json()["version"]["number"]
}
except:
return {
"ok": False,
"message": "Could not read version"
}
def start_frontend_(frontend: Sist2Frontend):
frontend.web_options.indices = [
os.path.join(DATA_FOLDER, db["jobs"][j].index_path)
for j in frontend.jobs
]
backend_name = frontend.web_options.search_backend
search_backend = db["search_backends"][backend_name]
if search_backend is None:
logger.error(
f"Error while running task: search backend not found: {backend_name}")
return -1
logger.debug(f"Fetched search backend options for {backend_name}")
pid = sist2.web(frontend.web_options, search_backend, frontend.name)
sleep(0.2)
if not pid_is_running(pid):
frontend_log = frontend.get_log_path(LOG_FOLDER)
logger.error(f"Frontend exited too quickly, check {frontend_log} for more details:")
for line in tail_sync(frontend.get_log_path(LOG_FOLDER), 3):
logger.error(line.strip())
return False
RUNNING_FRONTENDS[frontend.name] = pid
return True
@app.post("/api/frontend/{name:str}/start")
async def start_frontend(name: str):
frontend = db["frontends"][name]
if not frontend:
raise HTTPException(status_code=404)
ok = start_frontend_(frontend)
if not ok:
raise HTTPException(status_code=500)
return "ok"
@app.post("/api/frontend/{name:str}/stop")
async def stop_frontend(name: str):
if name in RUNNING_FRONTENDS:
os.kill(RUNNING_FRONTENDS[name], signal.SIGTERM)
del RUNNING_FRONTENDS[name]
@app.get("/api/frontend")
async def get_frontends():
res = []
for frontend in db["frontends"]:
frontend: Sist2Frontend
frontend.running = frontend.name in RUNNING_FRONTENDS
res.append(frontend)
return res
@app.get("/api/search_backend")
async def get_search_backends():
return list(db["search_backends"])
@app.put("/api/search_backend/{name:str}")
async def update_search_backend(name: str, backend: Sist2SearchBackend):
if not db["search_backends"][name]:
raise HTTPException(status_code=404)
db["search_backends"][name] = backend
return "ok"
@app.get("/api/search_backend/{name:str}")
def get_search_backend(name: str):
backend = db["search_backends"][name]
if not backend:
raise HTTPException(status_code=404)
return backend
@app.delete("/api/search_backend/{name:str}")
def delete_search_backend(name: str):
backend: Sist2SearchBackend = db["search_backends"][name]
if not backend:
raise HTTPException(status_code=404)
if any(frontend.web_options.search_backend == name for frontend in db["frontends"]):
raise HTTPException(status_code=400, detail="in use (frontend)")
if any(job.index_options.search_backend == name for job in db["jobs"]):
raise HTTPException(status_code=400, detail="in use (job)")
del db["search_backends"][name]
try:
os.remove(os.path.join(DATA_FOLDER, backend.search_index))
except:
pass
return "ok"
@app.post("/api/search_backend/{name:str}")
def create_search_backend(name: str):
if db["search_backends"][name] is not None:
return HTTPException(status_code=400, detail="already exists")
backend = Sist2SearchBackend.create_default(name)
db["search_backends"][name] = backend
return backend
@app.delete("/api/user_script/{name:str}")
def delete_user_script(name: str):
if db["user_scripts"][name] is None:
return HTTPException(status_code=404)
if any(name in job.user_scripts for job in db["jobs"]):
raise HTTPException(status_code=400, detail="in use (job)")
script: UserScript = db["user_scripts"][name]
script.delete_dir()
del db["user_scripts"][name]
return "ok"
@app.post("/api/user_script/{name:str}")
def create_user_script(name: str, template: str):
if db["user_scripts"][name] is not None:
return HTTPException(status_code=400, detail="already exists")
script = SCRIPT_TEMPLATES[template](name)
db["user_scripts"][name] = script
return script
@app.get("/api/user_script")
async def get_user_scripts():
return list(db["user_scripts"])
@app.get("/api/user_script/{name:str}")
async def get_user_script(name: str):
backend = db["user_scripts"][name]
if not backend:
raise HTTPException(status_code=404)
return backend
@app.put("/api/user_script/{name:str}")
async def update_user_script(name: str, script: UserScript):
previous_version: UserScript = db["user_scripts"][name]
if previous_version and previous_version.git_repository != script.git_repository:
script.force_clone = True
db["user_scripts"][name] = script
return "ok"
def tail(filepath: str, n: int):
with open(filepath) as file:
reached_eof = False
buffer = []
line = ""
while True:
tmp = file.readline()
if tmp:
line += tmp
if line.endswith("\n"):
if reached_eof:
yield line
else:
if len(buffer) > n:
buffer.pop(0)
buffer.append(line)
line = ""
else:
if not reached_eof:
reached_eof = True
yield from buffer
yield None
@app.websocket("/notifications")
async def ws_tail_log(websocket: WebSocket):
await websocket.accept()
try:
await websocket.receive_text()
async with Subscribe(notifications) as ob:
async for notification in ob.notifications():
await websocket.send_json(notification)
except ConnectionClosed:
return
@app.websocket("/log/{task_id}")
async def ws_tail_log(websocket: WebSocket, task_id: str, n: int):
log_file = os.path.join(LOG_FOLDER, f"sist2-{task_id}.log")
await websocket.accept()
try:
await websocket.receive_text()
except ConnectionClosed:
return
while True:
for line in tail(log_file, n):
try:
if line:
await websocket.send_text(line)
else:
await websocket.send_json({"ping": ""})
await asyncio.sleep(0.1)
except ConnectionClosed:
return
def main():
uvicorn.run(app, port=WEBSERVER_PORT, host="0.0.0.0", timeout_graceful_shutdown=0)
def initialize_db():
db["sist2_admin"]["info"] = {"version": DB_SCHEMA_VERSION}
frontend = Sist2Frontend.create_default("default")
db["frontends"]["default"] = frontend
create_default_search_backends(db)
logger.info("Initialized database.")
def start_frontends():
for frontend in db["frontends"]:
frontend: Sist2Frontend
if frontend.auto_start and len(frontend.jobs) > 0:
start_frontend_(frontend)
if __name__ == '__main__':
if not db["sist2_admin"]["info"]:
initialize_db()
if db["sist2_admin"]["info"]["version"] == "1":
logger.info("Migrating to v2 database schema")
migrate_v1_to_v2(db)
if db["sist2_admin"]["info"]["version"] == "2":
logger.error("Cannot migrate database from v2 to v3. Delete state.db to proceed.")
exit(-1)
if db["sist2_admin"]["info"]["version"] == "3":
logger.info("Migrating to v4 database schema")
migrate_v3_to_v4(db)
if db["sist2_admin"]["info"]["version"] != DB_SCHEMA_VERSION:
raise Exception(f"Incompatible database {db.dbfile}. "
f"Automatic migration is not available, please delete the database file to continue.")
start_frontends()
cron.initialize(db, _run_job)
logger.info("Started sist2-admin. Hello!")
main()

View File

@ -1,32 +0,0 @@
import os
import logging
import sys
from logging import StreamHandler
from logging.handlers import RotatingFileHandler
MAX_LOG_SIZE = 1 * 1024 * 1024
SIST2_BINARY = os.environ.get("SIST2_BINARY", "/root/sist2")
DATA_FOLDER = os.environ.get("DATA_FOLDER", "/sist2-admin/")
LOG_FOLDER = os.path.join(DATA_FOLDER, "logs")
SCRIPT_FOLDER = os.path.join(DATA_FOLDER, "scripts")
WEBSERVER_PORT = 8080
os.makedirs(LOG_FOLDER, exist_ok=True)
os.makedirs(SCRIPT_FOLDER, exist_ok=True)
os.makedirs(DATA_FOLDER, exist_ok=True)
logger = logging.Logger("sist2-admin")
_log_file = os.path.join(LOG_FOLDER, "sist2-admin.log")
_log_fmt = "%(asctime)s [%(levelname)s] %(message)s"
_log_formatter = logging.Formatter(_log_fmt, datefmt='%Y-%m-%d %H:%M:%S')
console_handler = StreamHandler(sys.stdout)
console_handler.setFormatter(_log_formatter)
file_handler = RotatingFileHandler(_log_file, mode="a", maxBytes=MAX_LOG_SIZE, backupCount=1)
file_handler.setFormatter(_log_formatter)
logger.addHandler(console_handler)
logger.addHandler(file_handler)

View File

@ -1,35 +0,0 @@
from threading import Thread
import pycron
import time
from hexlib.db import PersistentState
from config import logger
from jobs import Sist2Job
def _check_schedule(db: PersistentState, run_job):
jobs = list(db["jobs"])
for job in jobs:
job: Sist2Job
if job.schedule_enabled:
if pycron.is_now(job.cron_expression):
logger.info(f"Submit scan task to queue for [{job.name}]")
run_job(job)
def _cron_thread(db, run_job):
time.sleep(60 - (time.time() % 60))
start = time.time()
while True:
_check_schedule(db, run_job)
time.sleep(60 - ((time.time() - start) % 60))
def initialize(db, run_job):
t = Thread(target=_cron_thread, args=(db, run_job), daemon=True, name="timer")
t.start()

View File

@ -1,411 +0,0 @@
import json
import logging
import os.path
import shlex
import signal
import uuid
from datetime import datetime
from enum import Enum
from io import TextIOWrapper
from logging import FileHandler
from subprocess import Popen
import subprocess
from threading import Lock, Thread
from time import sleep
from typing import List
from uuid import uuid4, UUID
from hexlib.db import PersistentState
from pydantic import BaseModel
from config import logger, LOG_FOLDER, DATA_FOLDER
from notifications import Notifications
from sist2 import ScanOptions, IndexOptions, Sist2
from state import RUNNING_FRONTENDS, get_log_files_to_remove, delete_log_file
from web import Sist2Frontend
from script import UserScript
class JobStatus(Enum):
CREATED = "created"
STARTED = "started"
INDEXED = "indexed"
FAILED = "failed"
class Sist2Job(BaseModel):
name: str
scan_options: ScanOptions
index_options: IndexOptions
user_scripts: List[str] = []
cron_expression: str
schedule_enabled: bool = False
keep_last_n_logs: int = -1
previous_index: str = None
index_path: str = None
previous_index_path: str = None
last_index_date: datetime = None
status: JobStatus = JobStatus("created")
last_modified: datetime
etag: str = None
do_full_scan: bool = False
def __init__(self, **kwargs):
super().__init__(**kwargs)
@staticmethod
def create_default(name: str):
return Sist2Job(
name=name,
scan_options=ScanOptions(path="/"),
index_options=IndexOptions(),
last_modified=datetime.utcnow(),
cron_expression="0 0 * * *"
)
class Sist2TaskProgress:
def __init__(self, done: int = 0, count: int = 0, index_size: int = 0, tn_size: int = 0, waiting: bool = False):
self.done = done
self.count = count
self.index_size = index_size
self.store_size = tn_size
self.waiting = waiting
def percent(self):
return (self.done / self.count) if self.count else 0
class Sist2Task:
def __init__(self, job: Sist2Job, display_name: str, depends_on: uuid.UUID = None):
self.job = job
self.display_name = display_name
self.progress = Sist2TaskProgress()
self.id = uuid4()
self.pid = None
self.started = None
self.ended = None
self.depends_on = depends_on
self._logger = logging.Logger(name=f"{self.id}")
self._logger.addHandler(FileHandler(os.path.join(LOG_FOLDER, f"sist2-{self.id}.log")))
def json(self):
return {
"id": self.id,
"job": self.job,
"display_name": self.display_name,
"progress": self.progress,
"started": self.started,
"ended": self.ended,
"depends_on": self.depends_on,
}
def log_callback(self, log_json):
if "progress" in log_json:
self.progress = Sist2TaskProgress(**log_json["progress"])
elif self._logger:
self._logger.info(json.dumps(log_json))
def run(self, sist2: Sist2, db: PersistentState):
self.started = datetime.utcnow()
logger.info(f"Started task {self.display_name}")
def set_pid(self, pid):
self.pid = pid
class Sist2ScanTask(Sist2Task):
def run(self, sist2: Sist2, db: PersistentState):
super().run(sist2, db)
self.job.scan_options.name = self.job.name
if self.job.index_path is not None and not self.job.do_full_scan:
self.job.scan_options.output = self.job.index_path
else:
self.job.scan_options.output = None
return_code = sist2.scan(self.job.scan_options, logs_cb=self.log_callback, set_pid_cb=self.set_pid)
self.ended = datetime.utcnow()
is_ok = (return_code in (0, 1)) if "debug" in sist2.bin_path else (return_code == 0)
if not is_ok:
self._logger.error(json.dumps({"sist2-admin": f"Process returned non-zero exit code ({return_code})"}))
logger.info(f"Task {self.display_name} failed ({return_code})")
else:
self.job.index_path = self.job.scan_options.output
self.job.last_index_date = datetime.utcnow()
self.job.do_full_scan = False
db["jobs"][self.job.name] = self.job
self._logger.info(json.dumps({"sist2-admin": f"Save last_index_date={self.job.last_index_date}"}))
logger.info(f"Completed {self.display_name} ({return_code=})")
# Remove old index
if is_ok:
if self.job.previous_index_path is not None and self.job.previous_index_path != self.job.index_path:
self._logger.info(json.dumps({"sist2-admin": f"Remove {self.job.previous_index_path=}"}))
try:
os.remove(self.job.previous_index_path)
except FileNotFoundError:
pass
self.job.previous_index_path = self.job.index_path
db["jobs"][self.job.name] = self.job
if is_ok:
return 0
return return_code
class Sist2IndexTask(Sist2Task):
def __init__(self, job: Sist2Job, display_name: str, depends_on: Sist2Task):
super().__init__(job, display_name, depends_on=depends_on.id)
def run(self, sist2: Sist2, db: PersistentState):
super().run(sist2, db)
self.job.index_options.path = self.job.scan_options.output
search_backend = db["search_backends"][self.job.index_options.search_backend]
if search_backend is None:
logger.error(f"Error while running task: search backend not found: {self.job.index_options.search_backend}")
return -1
logger.debug(f"Fetched search backend options for {self.job.index_options.search_backend}")
return_code = sist2.index(self.job.index_options, search_backend, logs_cb=self.log_callback, set_pid_cb=self.set_pid)
self.ended = datetime.utcnow()
duration = self.ended - self.started
ok = return_code in (0, 1)
if ok:
self.restart_running_frontends(db, sist2)
# Update status
self.job.status = JobStatus("indexed") if ok else JobStatus("failed")
self.job.previous_index_path = self.job.index_path
db["jobs"][self.job.name] = self.job
self._logger.info(json.dumps({"sist2-admin": f"Sist2Scan task finished {return_code=}, {duration=}, {ok=}"}))
logger.info(f"Completed {self.display_name} ({return_code=})")
return return_code
def restart_running_frontends(self, db: PersistentState, sist2: Sist2):
for frontend_name, pid in RUNNING_FRONTENDS.items():
frontend = db["frontends"][frontend_name]
frontend: Sist2Frontend
try:
os.kill(pid, signal.SIGTERM)
except ProcessLookupError:
pass
try:
os.wait()
except ChildProcessError:
pass
backend_name = frontend.web_options.search_backend
search_backend = db["search_backends"][backend_name]
if search_backend is None:
logger.error(f"Error while running task: search backend not found: {backend_name}")
return -1
logger.debug(f"Fetched search backend options for {backend_name}")
frontend.web_options.indices = [
os.path.join(DATA_FOLDER, db["jobs"][j].index_path)
for j in frontend.jobs
]
pid = sist2.web(frontend.web_options, search_backend, frontend.name)
RUNNING_FRONTENDS[frontend_name] = pid
self._logger.info(json.dumps({"sist2-admin": f"Restart frontend {pid=} {frontend_name=}"}))
class Sist2UserScriptTask(Sist2Task):
def __init__(self, user_script: UserScript, job: Sist2Job, display_name: str, depends_on: Sist2Task = None):
super().__init__(job, display_name, depends_on=depends_on.id if depends_on else None)
self.user_script = user_script
def run(self, sist2: Sist2, db: PersistentState):
super().run(sist2, db)
try:
self.user_script.setup(self.log_callback, self.set_pid)
except Exception as e:
logger.error(f"Setup for {self.user_script.name} failed: ")
logger.exception(e)
self.log_callback({"sist2-admin": f"Setup for {self.user_script.name} failed: {e}"})
return -1
executable = self.user_script.get_executable()
index_path = os.path.join(DATA_FOLDER, self.job.index_path)
extra_args = self.user_script.extra_args
args = [
executable,
index_path,
*shlex.split(extra_args)
]
self.log_callback({"sist2-admin": f"Starting user script with {executable=}, {index_path=}, {extra_args=}"})
proc = Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=self.user_script.script_dir())
self.set_pid(proc.pid)
t_stderr = Thread(target=self._consume_logs, args=(self.log_callback, proc, "stderr", False))
t_stderr.start()
self._consume_logs(self.log_callback, proc, "stdout", True)
self.ended = datetime.utcnow()
return 0
@staticmethod
def _consume_logs(logs_cb, proc, stream, wait):
pipe_wrapper = TextIOWrapper(getattr(proc, stream), encoding="utf8", errors="ignore")
try:
for line in pipe_wrapper:
if line.strip() == "":
continue
if line.startswith("$PROGRESS"):
progress = json.loads(line[len("$PROGRESS "):])
logs_cb({"progress": progress})
continue
logs_cb({stream: line})
finally:
if wait:
proc.wait()
pipe_wrapper.close()
class TaskQueue:
def __init__(self, sist2: Sist2, db: PersistentState, notifications: Notifications):
self._lock = Lock()
self._sist2 = sist2
self._db = db
self._notifications = notifications
self._tasks = {}
self._queue = []
self._sem = 0
self._thread = Thread(target=self._check_new_task, daemon=True)
self._thread.start()
def _tasks_failed(self):
done = set()
for row in self._db["task_done"].sql("WHERE return_code != 0"):
done.add(uuid.UUID(row["id"]))
return done
def _tasks_done(self):
done = set()
for row in self._db["task_done"]:
done.add(uuid.UUID(row["id"]))
return done
def _check_new_task(self):
while True:
with self._lock:
for task in list(self._queue):
task: Sist2Task
if self._sem >= 1:
break
if not task.depends_on or task.depends_on in self._tasks_done():
self._queue.remove(task)
if task.depends_on in self._tasks_failed():
# The task which we depend on failed, continue
continue
self._sem += 1
t = Thread(target=self._run_task, args=(task,))
self._tasks[task.id] = {
"task": task,
"thread": t,
}
t.start()
break
sleep(1)
def tasks(self):
return list(map(lambda t: t["task"], self._tasks.values()))
def kill_task(self, task_id):
task = self._tasks.get(UUID(task_id))
if task:
pid = task["task"].pid
logger.info(f"Killing task {task_id} (pid={pid})")
os.kill(pid, signal.SIGTERM)
return True
return False
def _run_task(self, task: Sist2Task):
task_result = task.run(self._sist2, self._db)
with self._lock:
del self._tasks[task.id]
self._sem -= 1
self._db["task_done"][task.id] = {
"ended": task.ended,
"started": task.started,
"name": task.display_name,
"return_code": task_result,
"has_logs": 1
}
logs_to_delete = get_log_files_to_remove(self._db, task.job.name, task.job.keep_last_n_logs)
for row in logs_to_delete:
delete_log_file(self._db, row["id"])
if isinstance(task, Sist2IndexTask):
self._notifications.notify({
"message": "notifications.indexCompleted",
"job": task.job.name
})
def submit(self, task: Sist2Task):
logger.info(f"Submitted task to queue {task.display_name}")
with self._lock:
self._queue.append(task)

View File

@ -1,40 +0,0 @@
import asyncio
from typing import List
class Notifications:
def __init__(self):
self._subscribers: List[Subscribe] = []
def subscribe(self, ob):
self._subscribers.append(ob)
def unsubscribe(self, ob):
self._subscribers.remove(ob)
def notify(self, notification: dict):
for ob in self._subscribers:
ob.notify(notification)
class Subscribe:
def __init__(self, notifications: Notifications):
self._queue = []
self._notifications = notifications
async def __aenter__(self):
self._notifications.subscribe(self)
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
self._notifications.unsubscribe(self)
def notify(self, notification: dict):
self._queue.append(notification)
async def notifications(self):
while True:
try:
yield self._queue.pop(0)
except IndexError:
await asyncio.sleep(0.1)

View File

@ -1,130 +0,0 @@
import os
import shutil
import stat
import subprocess
from enum import Enum
from git import Repo
from pydantic import BaseModel
from config import SCRIPT_FOLDER
class ScriptType(Enum):
LOCAL = "local"
SIMPLE = "simple"
GIT = "git"
def set_executable(file):
os.chmod(file, os.stat(file).st_mode | stat.S_IEXEC)
def _initialize_git_repository(url, path, log_cb, force_clone, set_pid_cb):
log_cb({"sist2-admin": f"Cloning {url}"})
if force_clone or not os.path.exists(os.path.join(path, ".git")):
if force_clone:
shutil.rmtree(path, ignore_errors=True)
Repo.clone_from(url, path)
else:
repo = Repo(path)
repo.remote("origin").pull()
setup_script = os.path.join(path, "setup.sh")
if setup_script:
log_cb({"sist2-admin": f"Executing setup script {setup_script}"})
set_executable(setup_script)
proc = subprocess.Popen([setup_script], cwd=path, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
set_pid_cb(proc.pid)
proc.wait()
stdout = proc.stdout.read()
for line in stdout.split(b"\n"):
if line:
log_cb({"stdout": line.decode()})
log_cb({"stdout": f"Executed setup script {setup_script}, return code = {proc.returncode}"})
if proc.returncode != 0:
raise Exception("Error when running setup script!")
log_cb({"sist2-admin": f"Initialized git repository in {path}"})
class UserScript(BaseModel):
name: str
type: ScriptType
git_repository: str = None
force_clone: bool = False
script: str = None
extra_args: str = ""
def script_dir(self):
return os.path.join(SCRIPT_FOLDER, self.name)
def setup(self, log_cb, set_pid_cb):
os.makedirs(self.script_dir(), exist_ok=True)
if self.type == ScriptType.GIT:
_initialize_git_repository(self.git_repository, self.script_dir(), log_cb, self.force_clone, set_pid_cb)
self.force_clone = False
elif self.type == ScriptType.SIMPLE:
self._setup_simple()
set_executable(self.get_executable())
def _setup_simple(self):
with open(self.get_executable(), "w") as f:
f.write(
"#!/bin/bash\n"
"python run.py \"$@\""
)
with open(os.path.join(self.script_dir(), "run.py"), "w") as f:
f.write(self.script)
def get_executable(self):
return os.path.join(self.script_dir(), "run.sh")
def delete_dir(self):
shutil.rmtree(self.script_dir(), ignore_errors=True)
SCRIPT_TEMPLATES = {
"CLIP - Generate embeddings to predict the most relevant image based on the text prompt": lambda name: UserScript(
name=name,
type=ScriptType.GIT,
git_repository="https://github.com/sist2app/sist2-script-clip",
extra_args="--num-tags=1 --tags-file=general.txt --color=#dcd7ff"
),
"Whisper - Speech to text with OpenAI Whisper": lambda name: UserScript(
name=name,
type=ScriptType.GIT,
git_repository="https://github.com/simon987/sist2-script-whisper",
extra_args="--model=base --num-threads=4 --color=#51da4c --tag"
),
"Hamburger - Simple script example": lambda name: UserScript(
name=name,
type=ScriptType.SIMPLE,
script=
'from sist2 import Sist2Index\n'
'import sys\n'
'\n'
'index = Sist2Index(sys.argv[1])\n'
'for doc in index.document_iter():\n'
' doc.json_data["tag"] = ["hamburger.#00FF00"]\n'
' index.update_document(doc)\n'
'\n'
'index.sync_tag_table()\n'
'index.commit()\n'
'\n'
'print("Done!")\n'
),
"(Blank)": lambda name: UserScript(
name=name,
type=ScriptType.SIMPLE,
script=""
)
}

View File

@ -1,363 +0,0 @@
import datetime
import json
import logging
import os.path
import sys
from datetime import datetime
from enum import Enum
from io import TextIOWrapper
from logging import FileHandler, StreamHandler
from subprocess import Popen, PIPE
from tempfile import NamedTemporaryFile
from threading import Thread
from typing import List
from pydantic import BaseModel
from config import logger, LOG_FOLDER, DATA_FOLDER
class Sist2Version:
def __init__(self, version: str):
self._version = version
self.major, self.minor, self.patch = [int(x) for x in version.split(".")]
def __str__(self):
return f"{self.major}.{self.minor}.{self.patch}"
class SearchBackendType(Enum):
SQLITE = "sqlite"
ELASTICSEARCH = "elasticsearch"
class Sist2SearchBackend(BaseModel):
backend_type: SearchBackendType = SearchBackendType("elasticsearch")
name: str
search_index: str = ""
es_url: str = "http://elasticsearch:9200"
es_insecure_ssl: bool = False
es_index: str = "sist2"
threads: int = 1
batch_size: int = 70
@staticmethod
def create_default(name: str, backend_type: SearchBackendType = SearchBackendType("elasticsearch")):
return Sist2SearchBackend(
name=name,
search_index=f"search-index-{name.replace('/', '_')}.sist2",
backend_type=backend_type
)
class IndexOptions(BaseModel):
path: str = None
incremental_index: bool = True
search_backend: str = None
def __init__(self, **kwargs):
super().__init__(**kwargs)
def args(self, search_backend):
absolute_path = os.path.join(DATA_FOLDER, self.path)
if search_backend.backend_type == SearchBackendType("sqlite"):
search_index_absolute = os.path.join(DATA_FOLDER, search_backend.search_index)
args = ["sqlite-index", absolute_path, "--search-index", search_index_absolute]
else:
args = ["index", absolute_path, f"--threads={search_backend.threads}",
f"--es-url={search_backend.es_url}",
f"--es-index={search_backend.es_index}",
f"--batch-size={search_backend.batch_size}"]
if search_backend.es_insecure_ssl:
args.append(f"--es-insecure-ssl")
if self.incremental_index:
args.append(f"--incremental-index")
return args
ARCHIVE_SKIP = "skip"
ARCHIVE_LIST = "list"
ARCHIVE_SHALLOW = "shallow"
ARCHIVE_RECURSE = "recurse"
class ScanOptions(BaseModel):
path: str
threads: int = 1
thumbnail_quality: int = 50
thumbnail_size: int = 552
thumbnail_count: int = 1
content_size: int = 32768
depth: int = -1
archive: str = ARCHIVE_RECURSE
archive_passphrase: str = None
ocr_lang: str = None
ocr_images: bool = False
ocr_ebooks: bool = False
exclude: str = None
fast: bool = False
treemap_threshold: float = 0.0005
mem_buffer: int = 2000
read_subtitles: bool = False
fast_epub: bool = False
checksums: bool = False
incremental: bool = True
optimize_index: bool = False
output: str = None
name: str = None
rewrite_url: str = None
list_file: str = None
def __init__(self, **kwargs):
super().__init__(**kwargs)
def args(self):
output_path = os.path.join(DATA_FOLDER, self.output)
args = ["scan", self.path, f"--threads={self.threads}", f"--thumbnail-quality={self.thumbnail_quality}",
f"--thumbnail-count={self.thumbnail_count}", f"--thumbnail-size={self.thumbnail_size}",
f"--content-size={self.content_size}", f"--output={output_path}", f"--depth={self.depth}",
f"--archive={self.archive}", f"--mem-buffer={self.mem_buffer}"]
if self.incremental:
args.append(f"--incremental")
if self.optimize_index:
args.append(f"--optimize-index")
if self.rewrite_url:
args.append(f"--rewrite-url={self.rewrite_url}")
if self.name:
args.append(f"--name={self.name}")
if self.archive_passphrase:
args.append(f"--archive-passphrase={self.archive_passphrase}")
if self.ocr_lang:
args.append(f"--ocr-lang={self.ocr_lang}")
if self.ocr_ebooks:
args.append(f"--ocr-ebooks")
if self.ocr_images:
args.append(f"--ocr-images")
if self.exclude:
args.append(f"--exclude={self.exclude}")
if self.fast:
args.append(f"--fast")
if self.treemap_threshold:
args.append(f"--treemap-threshold={self.treemap_threshold}")
if self.read_subtitles:
args.append(f"--read-subtitles")
if self.fast_epub:
args.append(f"--fast-epub")
if self.checksums:
args.append(f"--checksums")
if self.list_file:
args.append(f"--list_file={self.list_file}")
return args
class Sist2Index:
def __init__(self, path):
self.path = path
with open(os.path.join(path, "descriptor.json")) as f:
self._descriptor = json.load(f)
def to_json(self):
return {
"path": self.path,
"version": self.version(),
"timestamp": self.timestamp(),
"name": self.name()
}
def version(self) -> Sist2Version:
return Sist2Version(self._descriptor["version"])
def timestamp(self) -> datetime:
return datetime.fromtimestamp(self._descriptor["timestamp"])
def name(self) -> str:
return self._descriptor["name"]
class WebOptions(BaseModel):
indices: List[str] = []
search_backend: str = "elasticsearch"
bind: str = "0.0.0.0:4090"
auth: str = None
tag_auth: str = None
tagline: str = "Lightning-fast file system indexer and search tool"
dev: bool = False
lang: str = "en"
auth0_audience: str = None
auth0_domain: str = None
auth0_client_id: str = None
auth0_public_key: str = None
auth0_public_key_file: str = None
verbose: bool = False
def __init__(self, **kwargs):
super().__init__(**kwargs)
def args(self, search_backend: Sist2SearchBackend):
args = ["web", f"--bind={self.bind}", f"--tagline={self.tagline}",
f"--lang={self.lang}"]
if search_backend.backend_type == SearchBackendType("sqlite"):
search_index_absolute = os.path.join(DATA_FOLDER, search_backend.search_index)
args.append(f"--search-index={search_index_absolute}")
else:
args.append(f"--es-url={search_backend.es_url}")
args.append(f"--es-index={search_backend.es_index}")
if search_backend.es_insecure_ssl:
args.append(f"--es-insecure-ssl")
if self.auth0_audience:
args.append(f"--auth0-audience={self.auth0_audience}")
if self.auth0_domain:
args.append(f"--auth0-domain={self.auth0_domain}")
if self.auth0_client_id:
args.append(f"--auth0-client-id={self.auth0_client_id}")
if self.auth0_public_key_file:
args.append(f"--auth0-public-key-file={self.auth0_public_key_file}")
if self.auth:
args.append(f"--auth={self.auth}")
if self.tag_auth:
args.append(f"--tag-auth={self.tag_auth}")
if self.dev:
args.append(f"--dev")
if self.verbose:
args.append(f"--very-verbose")
args.extend(self.indices)
return args
class Sist2:
def __init__(self, bin_path: str, data_directory: str):
self.bin_path = bin_path
self._data_dir = data_directory
def index(self, options: IndexOptions, search_backend: Sist2SearchBackend, logs_cb, set_pid_cb):
args = [
self.bin_path,
*options.args(search_backend),
"--json-logs",
"--very-verbose"
]
logs_cb({"sist2-admin": f"Starting sist2 command with args {args}"})
proc = Popen(args, stdout=PIPE, stderr=PIPE)
set_pid_cb(proc.pid)
t_stderr = Thread(target=self._consume_logs_stderr, args=(logs_cb, None, proc))
t_stderr.start()
self._consume_logs_stdout(logs_cb, proc)
t_stderr.join()
return proc.returncode
def scan(self, options: ScanOptions, logs_cb, set_pid_cb):
if options.output is None:
options.output = f"scan-{options.name.replace('/', '_')}-{datetime.utcnow()}.sist2"
args = [
self.bin_path,
*options.args(),
"--json-logs",
"--very-verbose"
]
logs_cb({"sist2-admin": f"Starting sist2 command with args {args}"})
proc = Popen(args, stdout=PIPE, stderr=PIPE)
set_pid_cb(proc.pid)
t_stderr = Thread(target=self._consume_logs_stderr, args=(logs_cb, None, proc))
t_stderr.start()
self._consume_logs_stdout(logs_cb, proc)
t_stderr.join()
return proc.returncode
@staticmethod
def _consume_logs_stderr(logs_cb, exit_cb, proc):
pipe_wrapper = TextIOWrapper(proc.stderr, encoding="utf8", errors="ignore")
try:
for line in pipe_wrapper:
if line.strip() == "":
continue
logs_cb({"stderr": line})
finally:
return_code = proc.wait()
if exit_cb:
exit_cb(return_code)
pipe_wrapper.close()
@staticmethod
def _consume_logs_stdout(logs_cb, proc):
pipe_wrapper = TextIOWrapper(proc.stdout, encoding="utf8", errors="ignore")
for line in pipe_wrapper:
try:
if line.strip() == "":
continue
log_object = json.loads(line)
logs_cb(log_object)
except Exception as e:
try:
logs_cb({"sist2-admin": f"Could not decode log line: {line}; {e}"})
except NameError:
pass
def web(self, options: WebOptions, search_backend: Sist2SearchBackend, name: str):
if options.auth0_public_key:
with NamedTemporaryFile("w", prefix="sist2-admin", suffix=".txt", delete=False) as f:
f.write(options.auth0_public_key)
options.auth0_public_key_file = f.name
else:
options.auth0_public_key_file = None
args = [
self.bin_path,
*options.args(search_backend)
]
web_logger = logging.Logger(name=f"sist2-frontend-{name}")
web_logger.addHandler(FileHandler(os.path.join(LOG_FOLDER, f"frontend-{name}.log")))
web_logger.addHandler(StreamHandler())
def logs_cb(message):
web_logger.info(json.dumps(message))
def exit_cb(return_code):
logger.info(f"Web frontend exited with return code {return_code}")
logger.info(f"Starting frontend {' '.join(args)}")
proc = Popen(args, stdout=PIPE, stderr=PIPE)
t_stderr = Thread(target=self._consume_logs_stderr, args=(logs_cb, exit_cb, proc))
t_stderr.start()
t_stdout = Thread(target=self._consume_logs_stdout, args=(logs_cb, proc))
t_stdout.start()
return proc.pid

View File

@ -1,136 +0,0 @@
from typing import Dict
import os
import shutil
from hexlib.db import Table, PersistentState
import pickle
from tesseract import get_tesseract_langs
import sqlite3
from config import LOG_FOLDER, logger
from sist2 import SearchBackendType, Sist2SearchBackend
RUNNING_FRONTENDS: Dict[str, int] = {}
TESSERACT_LANGS = get_tesseract_langs()
DB_SCHEMA_VERSION = "5"
from pydantic import BaseModel
def _serialize(item):
if isinstance(item, BaseModel):
return pickle.dumps(item)
if isinstance(item, bytes):
raise Exception("FIXME: bytes in PickleTable")
return item
def _deserialize(item):
if isinstance(item, bytes):
return pickle.loads(item)
return item
class PickleTable(Table):
def __getitem__(self, item):
row = super().__getitem__(item)
if row:
return dict((k, _deserialize(v)) for k, v in row.items())
return row
def __setitem__(self, key, value):
value = dict((k, _serialize(v)) for k, v in value.items())
super().__setitem__(key, value)
def __iter__(self):
for row in super().__iter__():
yield dict((k, _deserialize(v)) for k, v in row.items())
def sql(self, where_clause, *params):
for row in super().sql(where_clause, *params):
yield dict((k, _deserialize(v)) for k, v in row.items())
def get_log_files_to_remove(db: PersistentState, job_name: str, n: int):
if n < 0:
return []
counter = 0
to_remove = []
for row in db["task_done"].sql("WHERE has_logs=1 ORDER BY started DESC"):
if row["name"].endswith(f"[{job_name}]"):
counter += 1
if counter > n:
to_remove.append(row)
return to_remove
def delete_log_file(db: PersistentState, task_id: str):
db["task_done"][task_id] = {
"has_logs": 0
}
try:
os.remove(os.path.join(LOG_FOLDER, f"sist2-{task_id}.log"))
except:
pass
def migrate_v1_to_v2(db: PersistentState):
shutil.copy(db.dbfile, db.dbfile + "-before-migrate-v2.bak")
# Frontends
db._table_factory = PickleTable
frontends = [row["frontend"] for row in db["frontends"]]
del db["frontends"]
db._table_factory = Table
for frontend in frontends:
db["frontends"][frontend.name] = frontend
list(db["frontends"])
# Jobs
db._table_factory = PickleTable
jobs = [row["job"] for row in db["jobs"]]
del db["jobs"]
db._table_factory = Table
for job in jobs:
db["jobs"][job.name] = job
list(db["jobs"])
db["sist2_admin"]["info"] = {
"version": "2"
}
def create_default_search_backends(db: PersistentState):
es_backend = Sist2SearchBackend.create_default(name="elasticsearch",
backend_type=SearchBackendType("elasticsearch"))
db["search_backends"]["elasticsearch"] = es_backend
sqlite_backend = Sist2SearchBackend.create_default(name="sqlite", backend_type=SearchBackendType("sqlite"))
db["search_backends"]["sqlite"] = sqlite_backend
def migrate_v3_to_v4(db: PersistentState):
shutil.copy(db.dbfile, db.dbfile + "-before-migrate-v4.bak")
create_default_search_backends(db)
try:
conn = sqlite3.connect(db.dbfile)
conn.execute("ALTER TABLE task_done ADD COLUMN has_logs INTEGER DEFAULT 1")
conn.commit()
conn.close()
except Exception as e:
logger.exception(e)
db["sist2_admin"]["info"] = {
"version": "4"
}

View File

@ -1,14 +0,0 @@
import subprocess
def get_tesseract_langs():
res = subprocess.check_output([
"tesseract",
"--list-langs"
]).decode()
languages = res.split("\n")[1:]
return list(filter(lambda lang: lang and lang != "osd", languages))

View File

@ -1,41 +0,0 @@
from glob import glob
import os
from config import DATA_FOLDER
def get_old_index_files(name):
files = glob(os.path.join(DATA_FOLDER, f"scan-{name.replace('/', '_')}-*.sist2"))
files = list(sorted(files, key=lambda f: os.stat(f).st_mtime))
files = files[-1:]
return files
def tail_sync(filename, lines=1, _buffer=4098):
with open(filename) as f:
lines_found = []
block_counter = -1
while len(lines_found) < lines:
try:
f.seek(block_counter * _buffer, os.SEEK_END)
except IOError:
f.seek(0)
lines_found = f.readlines()
break
lines_found = f.readlines()
block_counter -= 1
return lines_found[-lines:]
def pid_is_running(pid):
try:
os.kill(pid, 0)
except OSError:
return False
return True

Some files were not shown because too many files have changed in this diff Show More