Compare commits

107 Commits

Author SHA1 Message Date
a8505cb8c1 Fix for #28 2020-02-20 16:42:13 -05:00
ae8652d86e UI tweaks, search syntax (#25) 2020-02-16 15:24:29 -05:00
849beb09d8 hotfix 2020-02-15 19:33:18 -05:00
e1aaaee617 UI tweak 2020-02-15 09:30:14 -05:00
c02b940945 (I forgot to commit this) 2020-02-14 20:58:10 -05:00
2934ddb07f Add image viewer (#2) 2020-02-14 18:28:55 -05:00
7f6f3c02fa OCR tweaks 2020-02-11 21:13:47 -05:00
7f98d5a682 Fix buffer overflow (whoops) 2020-02-09 18:11:29 -05:00
7eb9c5d7d5 Fix web/index issue with NULL mime types 2020-02-09 17:23:49 -05:00
184439aa38 increase minimum image size for OCR 2020-02-09 14:06:59 -05:00
1ce8b298a1 Display EXIF tags on document info panel, remove march=native on openjp 2020-02-09 13:21:19 -05:00
75f99025d9 add exif dateTime, allow some special characters in text meta 2020-02-09 08:47:13 -05:00
ebe852bd5a Fix rewrite-url arg 2020-02-09 08:23:17 -05:00
402b103c49 Fix total count for ES 7.5 2020-02-08 09:25:00 -05:00
e9b6e1cdc2 Turn off auto optimisation in libtesseract build 2020-02-08 08:32:04 -05:00
ed1ce8ab5e Handle XML errors #18 2020-02-07 10:08:01 -05:00
d1fa4febc4 Improve scroll feature, UI fix 2020-02-07 10:08:01 -05:00
048c55df7b Update README.md 2020-02-06 19:56:29 -05:00
f77bc6a025 Update README.md 2020-02-06 19:55:32 -05:00
efdde2734e version bump 2020-02-06 19:28:05 -05:00
66658fa8f7 Remove trailing/leading white space in text meta fields 2020-02-06 19:27:30 -05:00
df41c251e4 (Breaking!) Add some exif tags 2020-02-06 19:21:50 -05:00
3282ab56ba Version bump 2020-02-02 09:26:54 -05:00
8300838d30 Suppress XML parsing errors (#18) 2020-02-02 09:26:03 -05:00
c9870a6d3d Remove -march=native for release build... 2020-02-02 09:03:06 -05:00
a143cc4fcf bundle openssl... 2020-02-02 08:39:20 -05:00
9ef1f3781d fix attempt for #11 2020-02-01 20:04:26 -05:00
bbee8aa721 tesseract ocr path fix 2020-02-01 20:03:59 -05:00
d22f83c797 curl fix 2020-02-01 15:22:43 -05:00
50615486a4 curl fix attempt 2020-02-01 14:42:42 -05:00
ca79e4f797 add /status endpoint 2020-01-28 10:18:37 -05:00
6a9fd08a80 Merge pull request #21 from simon987/wip-20 (Fixes #20) 2020-01-27 09:16:00 -05:00
cab890dc9b #20 wip 2020-01-27 09:09:42 -05:00
b3c4faf2df Update README.md 2020-01-26 12:37:13 -05:00
353937171a Update README.md 2020-01-20 15:54:53 -05:00
c80002bea4 Bundle libcurl attempt 2 2020-01-18 11:53:12 -05:00
56adee9d81 Bundle libcurl, libopc bugfix #18 2020-01-18 10:25:02 -05:00
d6493d6d5f Bundle libpng 2020-01-16 16:21:38 -05:00
0967e9676d remove static build in CI... 2020-01-16 15:45:18 -05:00
487e998ea0 Display error message on /d/ error 2020-01-16 15:04:50 -05:00
919f45c79c Document info modal #19 2020-01-16 14:37:19 -05:00
d42129cfcb CI fix attempt 2020-01-15 20:11:45 -05:00
754983e34a Minor cleanup 2020-01-15 18:16:06 -05:00
7c8a3e2f9d Support for external json indices 2020-01-14 15:44:31 -05:00
3bb24b4453 Use bundled libtiff 2020-01-14 12:21:26 -05:00
9a56b959d3 Fix build problems... 2020-01-14 10:55:02 -05:00
5e3a2dbcc2 Update README 2020-01-14 10:47:00 -05:00
573f94f24e OCR support, remove static build 2020-01-14 10:26:40 -05:00
f5db78a69f Ignore special ascii chars, strip binary in docker build 2020-01-12 10:59:17 -05:00
5a2820d339 UI tweak auto-select based on query args 2020-01-11 17:48:51 -05:00
b7f13f425c Fix memory leaks (whoops) 2020-01-11 17:34:34 -05:00
d1a2f9b1d5 Strip binary (CI) 2020-01-07 14:32:39 -05:00
71f17986db build settings 2020-01-06 21:34:41 -05:00
acdd2fb3c1 Use bundled ffmpeg libraries 2020-01-06 16:25:34 -05:00
0cda6c00e1 CI attempt 2020-01-03 20:21:07 -05:00
14d0e5a1e1 possible fix for #18 2019-12-28 14:32:42 -05:00
0d06d39281 Path in list view #16 2019-12-28 14:32:05 -05:00
80708ca636 Merge pull request #17 from dpieski/patch-1 (maybe a typo in cli.c) 2019-12-23 18:33:28 -05:00
Andrew 43b7b40dc4 maybe a typo in cli.c (possibly corrected a typo) 2019-12-23 13:18:18 -06:00
d051f541e2 Show client error on ES connection failure, fixes #13 2019-12-21 20:52:53 -05:00
0eefbac7b4 Update libopc. should fix #14 2019-12-21 19:43:33 -05:00
663f8e21c1 Better logging, fixes #15 2019-12-21 12:32:08 -05:00
80fbcb2a01 empty docx bugfix 2019-12-19 17:26:11 -05:00
8451109ecd OOXML files support 2019-12-19 16:53:18 -05:00
d6fe61cfdc Clarify help string for es url #12 2019-12-19 16:52:22 -05:00
254094130f Fix submodules 2019-12-13 12:35:39 -05:00
eaaa75c04c Fix submodules 2019-12-13 11:24:17 -05:00
bb87f4270f Update docker script 2019-12-13 11:16:17 -05:00
be23201210 Archive file support 2019-12-13 10:53:51 -05:00
9778acda77 uifix 2019-12-12 19:19:53 -05:00
8d187926d9 Bugfix with incremental comparison 2019-12-12 15:41:31 -05:00
88c37e3523 Update README.md 2019-12-04 20:56:52 -05:00
d816dae8b3 UI fix, disable thumbnail option, batch index size option 2019-12-01 10:57:29 -05:00
4346c3e063 Also use static libraries in sist2 build 2019-11-30 20:02:26 -05:00
1a1032a8a7 Cleaner shutdown 2019-11-30 19:59:11 -05:00
4ab2ba1a02 #8 Skip PDF scan when content-size is 0 2019-11-21 16:06:31 -05:00
d089601dc5 Add sfv & m3u 2019-11-20 12:31:31 -05:00
11df6cc88f Add nfo to ext list 2019-11-20 11:41:50 -05:00
373ac01e4e Fix for #3 and maximum scan depth 2019-11-19 11:23:30 -05:00
893ff145c5 List mode tweak 2019-11-17 16:28:47 -05:00
6111ded77f Merge pull request #6 from simon987/wip (List mode #5) 2019-11-17 16:15:36 -05:00
34cc26b2fd List mode #5 wip 2019-11-17 15:03:24 -05:00
204034d859 Add basic auth. Fixes #4 2019-11-17 10:00:17 -05:00
16ccc6c0d3 Show error message on elasticsearch connection fail 2019-11-17 09:55:16 -05:00
94c617fdc3 Bug fix 2019-11-12 22:11:50 -05:00
ebfd7e03ce User scripts, bug fixes, docker image 2019-11-12 20:58:43 -05:00
6931d320a2 bugfix with invalid/corrupted index path 2019-11-11 20:49:38 -05:00
fc22e52eae Image placeholder 2019-11-09 23:26:49 -05:00
ba81748a74 Update build 2019-11-09 17:15:20 -05:00
e72fa1587b EXIF metadata for images 2019-11-09 15:18:44 -05:00
ea4fb7fa0d Bug fixes 2019-11-09 12:00:07 -05:00
b0a868bb73 remove 'must match' 2019-11-08 21:46:54 -05:00
d761a3b595 update readme 2019-11-08 19:42:36 -05:00
2d7a8a2fdc fuzzy toggle 2019-11-08 16:15:10 -05:00
152d2ddf8a bug fix in deserialize 2019-11-08 09:03:44 -05:00
bc5f22b759 update readme 2019-11-05 18:59:00 -05:00
534b397876 update readme, UI tweak: don't show broken images 2019-11-03 10:39:02 -05:00
7962a994e2 utf8 update + bug fixes 2019-11-03 07:50:31 -05:00
f8f1a27180 video metadata 2019-10-31 11:54:13 -04:00
784c3c9435 Font rendering fixes 2019-10-31 10:15:01 -04:00
f8b081a3f4 UI tweaks, path autocomplete 2019-10-31 08:26:19 -04:00
5661573b06 Dark theme, pdf meta, de-serialize bugfix 2019-10-30 22:20:22 -04:00
130fb78787 Fix some memory leaks 2019-10-27 15:40:48 -04:00
2943ca9365 UI tweak 2019-10-27 14:10:24 -04:00
7234c22d2f epub fix 2019-10-27 14:00:52 -04:00
bdbd7ca7ed cbz fix 2019-10-27 13:33:55 -04:00
9b7c56a608 Static build (scan only) 2019-10-27 12:25:34 -04:00
105 changed files with 7706 additions and 1211 deletions

3
.gitignore vendored

@@ -11,7 +11,8 @@ Makefile
LOG
sist2*
index.sist2/
bundle.css
bundle*.css
bundle.js
*.a
vgcore.*
build/

45
.gitmodules vendored

@@ -4,15 +4,42 @@
[submodule "cJSON"]
path = cJSON
url = https://github.com/DaveGamble/cJSON
[submodule "lib/mupdf"]
path = lib/mupdf
url = git://git.ghostscript.com/mupdf.git
[submodule "lib/onion"]
path = lib/onion
url = https://github.com/davidmoreno/onion
[submodule "lib/ffmpeg"]
path = lib/ffmpeg
url = https://git.ffmpeg.org/ffmpeg.git
[submodule "lmdb"]
path = lmdb
url = https://github.com/LMDB/lmdb
[submodule "utf8.h"]
path = utf8.h
url = https://github.com/sheredom/utf8.h
[submodule "lib/bzip2-1.0.6"]
path = lib/bzip2-1.0.6
url = https://github.com/enthought/bzip2-1.0.6
[submodule "lib/libmagic"]
path = lib/libmagic
url = https://github.com/threatstack/libmagic
[submodule "lib/harfbuzz"]
path = lib/harfbuzz
url = https://github.com/harfbuzz/harfbuzz
[submodule "lib/openjpeg"]
path = lib/openjpeg
url = https://github.com/uclouvain/openjpeg
[submodule "lib/ffmpeg"]
path = lib/ffmpeg
url = https://git.ffmpeg.org/ffmpeg.git
[submodule "lib/onion"]
path = lib/onion
url = https://github.com/davidmoreno/onion
[submodule "lib/mupdf"]
path = lib/mupdf
url = git://git.ghostscript.com/mupdf.git
[submodule "lib/tesseract"]
path = lib/tesseract
url = https://github.com/tesseract-ocr/tesseract
[submodule "lib/leptonica"]
path = lib/leptonica
url = https://github.com/danbloomberg/leptonica
[submodule "lib/libtiff"]
path = lib/libtiff
url = https://gitlab.com/libtiff/libtiff
[submodule "lib/libpng"]
path = lib/libpng
url = https://github.com/glennrp/libpng

69
.teamcity/settings.kts vendored Normal file

@@ -0,0 +1,69 @@
import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.ExecBuildStep
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.exec
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.vcs
import jetbrains.buildServer.configs.kotlin.v2019_2.vcs.GitVcsRoot
/*
The settings script is an entry point for defining a TeamCity
project hierarchy. The script should contain a single call to the
project() function with a Project instance or an init function as
an argument.
VcsRoots, BuildTypes, Templates, and subprojects can be
registered inside the project using the vcsRoot(), buildType(),
template(), and subProject() methods respectively.
To debug settings scripts in command-line, run the
mvnDebug org.jetbrains.teamcity:teamcity-configs-maven-plugin:generate
command and attach your debugger to the port 8000.
To debug in IntelliJ Idea, open the 'Maven Projects' tool window (View
-> Tool Windows -> Maven Projects), find the generate task node
(Plugins -> teamcity-configs -> teamcity-configs:generate), the
'Debug' option is available in the context menu for the task.
*/
version = "2019.2"
project {
vcsRoot(HttpsGithubComSimon987sist2refsHeadsMaster)
buildType(Build)
}
object Build : BuildType({
name = "Build"
artifactRules = """
sist2
sist2_scan
""".trimIndent()
vcs {
root(HttpsGithubComSimon987sist2refsHeadsMaster)
}
steps {
exec {
name = "Build"
path = "./ci/build.sh"
dockerImage = "simon987/general_ci"
dockerImagePlatform = ExecBuildStep.ImagePlatform.Linux
dockerPull = true
}
}
triggers {
vcs {
}
}
})
object HttpsGithubComSimon987sist2refsHeadsMaster : GitVcsRoot({
name = "https://github.com/simon987/sist2#refs/heads/master"
url = "https://github.com/simon987/sist2"
})


@@ -19,9 +19,13 @@ add_executable(
src/parsing/text.h src/parsing/text.c
src/index/web.c src/index/web.h
src/web/serve.c src/web/serve.h
src/web/auth_basic.h src/web/auth_basic.c
src/index/elastic.c src/index/elastic.h
src/util.c src/util.h
src/ctx.h src/types.h src/parsing/font.c src/parsing/font.h
src/parsing/arc.c src/parsing/arc.h
src/parsing/doc.c src/parsing/doc.h
src/log.c src/log.h
# argparse
argparse/argparse.h argparse/argparse.c
@@ -32,60 +36,55 @@ add_executable(
# LMDB
lmdb/libraries/liblmdb/lmdb.h lmdb/libraries/liblmdb/mdb.c
lmdb/libraries/liblmdb/midl.h lmdb/libraries/liblmdb/midl.c
src/cli.c src/cli.h)
src/cli.c src/cli.h
# utf8.h
utf8.h/utf8.h
)
find_package(PkgConfig REQUIRED)
set(ENV{PKG_CONFIG_PATH} "$ENV{PKG_CONFIG_PATH}:/usr/local/lib/pkgconfig/")
find_package(LibMagic REQUIRED)
find_package(FFmpeg REQUIRED)
find_package(OpenSSL REQUIRED)
find_package(Freetype REQUIRED)
pkg_check_modules(GLIB REQUIRED glib-2.0)
pkg_check_modules(GOBJECT REQUIRED gobject-2.0)
pkg_check_modules(UUID REQUIRED uuid)
include_directories(${LIBMAGIC_INCLUDE_DIRS})
link_directories(${LIBMAGIC_LIBRARY_DIRS})
add_definitions(${LIBMAGIC_CFLAGS_OTHER})
link_directories(${UUID_LIBRARY_DIRS})
include_directories(${UUID_INCLUDE_DIRS})
add_definitions(${UUID_CFLAGS_OTHER})
include_directories(${GLIB_INCLUDE_DIRS})
link_directories(${GLIB_LIBRARY_DIRS})
add_definitions(${GLIB_CFLAGS_OTHER})
include_directories(${GOBJECT_INCLUDE_DIRS})
link_directories(${GOBJECT_LIBRARY_DIRS})
add_definitions(${GOBJECT_CFLAGS_OTHER})
link_directories(${FFMPEG_LIBRARY_DIRS})
include_directories(${FFMPEG_INCLUDE_DIRS})
include_directories(${OPENSSL_INCLUDE_DIR})
link_directories(${OPENSSL_CRYPTO_LIBRARY})
add_definitions(${FREETYPE_CFLAGS_OTHER})
list(REMOVE_ITEM GLIB_LIBRARIES pcre)
list(REMOVE_ITEM GOBJECT_LIBRARIES pcre)
list(REMOVE_ITEM UUID_LIBRARIES pcre)
include_directories(${FREETYPE_INCLUDE_DIRS})
add_definitions(${FREETYPE_CFLAGS_OTHER})
include_directories(
target_include_directories(
sist2 PUBLIC
${GOBJECT_INCLUDE_DIRS}
${GLIB_INCLUDE_DIRS}
${PROJECT_SOURCE_DIR}/lib/ffmpeg/
${FREETYPE_INCLUDE_DIRS}
${UUID_INCLUDE_DIRS}
${PROJECT_SOURCE_DIR}/
${PROJECT_SOURCE_DIR}/lmdb/libraries/liblmdb/
${PROJECT_SOURCE_DIR}/lib/onion/src/
${PROJECT_SOURCE_DIR}/lib/mupdf/include/
${PROJECT_SOURCE_DIR}/include/
/usr/include/libxml2/
${PROJECT_SOURCE_DIR}/lib/tesseract/include/
)
target_link_directories(
sist2 PUBLIC
${UUID_LIBRARY_DIRS}
)
target_compile_options(sist2
PRIVATE
-O3
-Ofast
# -march=native
-fPIC
-fno-stack-protector
-fomit-frame-pointer
)
@@ -103,8 +102,6 @@ TARGET_LINK_LIBRARIES(
${PROJECT_SOURCE_DIR}/lib/libavutil.a
${PROJECT_SOURCE_DIR}/lib/libswscale.a
${PROJECT_SOURCE_DIR}/lib/libswresample.a
# ${FFMPEG_LIBRARIES}
# swscale
# mupdf
${PROJECT_SOURCE_DIR}/lib/libmupdf.a
@@ -114,14 +111,36 @@ TARGET_LINK_LIBRARIES(
${PROJECT_SOURCE_DIR}/lib/libonion_static.a
pthread
curl
m
bz2
magic
${PROJECT_SOURCE_DIR}/lib/libmagic.a
${PROJECT_SOURCE_DIR}/lib/libharfbuzz.a
${PROJECT_SOURCE_DIR}/lib/libopenjp2.a
freetype
archive
xml2
${PROJECT_SOURCE_DIR}/lib/libopc/libmce.a
${PROJECT_SOURCE_DIR}/lib/libopc/libopc.a
${PROJECT_SOURCE_DIR}/lib/libopc/libplib.a
${PROJECT_SOURCE_DIR}/lib/libtesseract.a
${PROJECT_SOURCE_DIR}/lib/liblept.a
${PROJECT_SOURCE_DIR}/lib/libtiff.a
${PROJECT_SOURCE_DIR}/lib/libpng16.a
stdc++
# curl
${PROJECT_SOURCE_DIR}/lib/libcurl.a
${PROJECT_SOURCE_DIR}/lib/libcrypto.a
${PROJECT_SOURCE_DIR}/lib/libssl.a
dl
)
add_custom_target(
before_sist2
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/scripts/before_build.sh
)
add_dependencies(sist2 before_sist2)

22
Docker/Dockerfile Normal file

@@ -0,0 +1,22 @@
FROM ubuntu:19.10
MAINTAINER simon987 <me@simon987.net>
RUN apt update
RUN apt install -y libglib2.0-0 libcurl4 libmagic1 libharfbuzz-bin libopenjp2-7 libarchive13 liblzma5 libzstd1 liblz4-1 \
curl libtiff5 libpng16-16
RUN mkdir -p /usr/share/tessdata && \
cd /usr/share/tessdata/ && \
curl -o /usr/share/tessdata/hin.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/hin.traineddata &&\
curl -o /usr/share/tessdata/jpn.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/jpn.traineddata &&\
curl -o /usr/share/tessdata/eng.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata &&\
curl -o /usr/share/tessdata/fra.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/fra.traineddata &&\
curl -o /usr/share/tessdata/rus.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/rus.traineddata &&\
curl -o /usr/share/tessdata/spa.traineddata https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/spa.traineddata && ls -lh
ADD sist2 /root/sist2
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENTRYPOINT ["/root/sist2"]

15
Docker/build.sh Executable file

@@ -0,0 +1,15 @@
rm ./sist2
cp ../sist2 .
strip sist2
version=$(./sist2 --version)
echo "Version ${version}"
docker build . -t simon987/sist2:${version} -t simon987/sist2:latest \
-t docker.pkg.github.com/simon987/sist2/sist2:latest -t docker.pkg.github.com/simon987/sist2/sist2:${version}
docker push simon987/sist2:${version}
docker push simon987/sist2:latest
docker push docker.pkg.github.com/simon987/sist2/sist2:latest
docker push docker.pkg.github.com/simon987/sist2/sist2:${version}
docker run --rm -it simon987/sist2 -v

106
README.md

@@ -1,5 +1,6 @@
![GitHub](https://img.shields.io/github/license/simon987/sist2.svg)
[![CodeFactor](https://www.codefactor.io/repository/github/simon987/sist2/badge?s=05daa325188aac4eae32c786f3d9cf4e0593f822)](https://www.codefactor.io/repository/github/simon987/sist2)
[![Development snapshots](https://ci.simon987.net/app/rest/builds/buildType(Sist2_Build)/statusIcon)](https://files.simon987.net/artifacts/Sist2/Build/)
# sist2
@@ -9,27 +10,36 @@ sist2 (Simple incremental search tool)
## Features
* Fast, low memory usage
* Fast, low memory usage, multi-threaded
* Portable (all its features are packaged in a single executable)
* Extracts text from common file types\*
* Generates thumbnails\*
* Extracts text from common file types \*
* Generates thumbnails \*
* Incremental scanning
* Automatic tagging from file attributes via [user scripts](scripting/README.md)
* Recursive scan inside archive files \*\*
* OCR support with tesseract \*\*\*
\* See [format support](#format-support)
\* See [format support](#format-support)
\*\* See [Archive files](#archive-files)
\*\*\* See [OCR](#ocr)
## Getting Started
1. Have an [Elasticsearch](https://www.elastic.co/downloads/elasticsearch) instance running
1. Download the [latest sist2 release](https://github.com/simon987/sist2/releases)
1.
1. Download the [latest sist2 release](https://github.com/simon987/sist2/releases) *
1. *(or)* Download a [development snapshot](https://files.simon987.net/artifacts/Sist2/Build/) *(Not recommended!)*
1. *(or)* `docker pull simon987/sist2:latest`
*Windows users*: `sist2` runs under [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
*Mac users*: See [#1](https://github.com/simon987/sist2/issues/1)
\* *Windows users*: **sist2** runs under [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
\* *Mac users*: See [#1](https://github.com/simon987/sist2/issues/1)
## Example usage
![demo](demo.gif)
See help page `sist2 --help` for more details.
@@ -52,19 +62,76 @@ sist2 index --print ./my_idx > raw_documents.ndjson
sist2 web --bind 0.0.0.0 --port 4321 ./my_idx1 ./my_idx2 ./my_idx3
```
### Use sist2 with docker
**scan**
```bash
docker run -it \
-v /path/to/files/:/files \
-v $PWD/out/:/out \
simon987/sist2 scan -t 4 /files -o /out/my_idx1
```
**index**
```bash
docker run -it --network host \
-v $PWD/out/:/out \
simon987/sist2 index /out/my_idx1
```
**web**
```bash
docker run --rm --network host -d --name sist2 \
-v $PWD/out/my_idx:/idx \
-v $PWD/my/files:/files \
simon987/sist2 web --bind 0.0.0.0 /idx
docker stop sist2
```
## Format support
File type | Library | Content | Thumbnail | Metadata
:---|:---|:---|:---|:---
pdf,xps,cbz,cbr,fb2,epub | MuPDF | yes | yes, `png` | *planned* |
`audio/*` | libav | - | yes, `jpeg` | ID3 tags |
`video/*` | libav | - | yes, `jpeg` | *planned* |
`image/*` | libav | - | yes, `jpeg` | *planned* |
pdf,xps,cbz,fb2,epub | MuPDF | text+ocr | yes, `png` | title |
`audio/*` | ffmpeg | - | yes, `jpeg` | ID3 tags |
`video/*` | ffmpeg | - | yes, `jpeg` | title, comment, artist |
`image/*` | ffmpeg | - | yes, `jpeg` | [Common EXIF tags](https://github.com/simon987/sist2/blob/efdde2734eca9b14a54f84568863b7ffd59bdba3/src/parsing/media.c#L190) |
ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, `bmp` | Name & style |
`text/plain` | *(none)* | yes | no | - |
docx, xlsx, pptx | | *planned* | no | *planned* |
tar, zip, rar, 7z, ar ... | Libarchive | yes\* | - | no |
docx, xlsx, pptx | libOPC | yes | no | no |
\* *See [Archive files](#archive-files)*
### Archive files
**sist2** will scan files stored in archive files (zip, tar, 7z...) as if
they were directly in the file system. Recursive (archives inside archives)
scan is also supported.
**Limitations**:
* Parsing media files with formats that require
*seek* (e.g. `.gif`, `.mp4` w/ fragmented metadata etc.) is not supported.
* Archive files are scanned sequentially, by a single thread. On systems where
**sist2** is not I/O bound, scans might be faster when larger archives are split
into smaller parts.
To check if a media file can be parsed without *seek*, execute `cat file.mp4 | ffprobe -`
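The same check can be run over a batch of files before packing them into an archive. This is a minimal sketch, not part of sist2: the function name `can_parse_without_seek` is hypothetical, and it assumes `ffprobe` exits with a non-zero status when it cannot parse its standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sist2): succeeds when ffprobe can parse
# the file from a pipe, i.e. without being able to seek in it.
can_parse_without_seek() {
    # Same invocation as the check above, with ffprobe's output discarded.
    # The pipeline's exit status is the exit status of ffprobe.
    cat -- "$1" | ffprobe - >/dev/null 2>&1
}

# Report every file passed on the command line (does nothing without arguments).
for f in "$@"; do
    if can_parse_without_seek "$f"; then
        echo "OK (no seek needed): $f"
    else
        echo "NEEDS SEEK: $f"
    fi
done
```

Files reported as `NEEDS SEEK` are likely candidates to keep outside of archives if their metadata or thumbnails matter.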
### OCR
You can enable OCR support for pdf,xps,cbz,fb2,epub file types with the
`--ocr <lang>` option. Download the language data files with your
package manager (`apt install tesseract-ocr-eng`) or directly [from GitHub](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files).
The `simon987/sist2` Docker image comes with common languages
(hin, jpn, eng, fra, rus, spa) pre-installed.
Examples
```bash
sist2 scan --ocr jpn ~/Books/Manga/
sist2 scan --ocr eng ~/Books/Textbooks/
```
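Before starting a long scan, it can help to confirm that the language data for `--ocr <lang>` is actually installed. This is a minimal sketch, not part of sist2: the function name `have_tessdata` is hypothetical, and the default directory `/usr/share/tessdata` is an assumption taken from the project's Dockerfile (distribution packages may install the data elsewhere).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sist2): checks that <lang>.traineddata
# exists in the tessdata directory.
# $1: language code (e.g. eng), $2: optional tessdata directory.
# Assumption: the default directory matches the project's Dockerfile.
have_tessdata() {
    [ -f "${2:-/usr/share/tessdata}/$1.traineddata" ]
}

# Example: only start the scan when the Japanese data is present.
# if have_tessdata jpn; then sist2 scan --ocr jpn ~/Books/Manga/; fi
```

Passing the directory as a second argument keeps the check usable on systems where the data files live in a different location.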
## Build from source
@@ -76,15 +143,16 @@ binaries.
*(Debian)*
```bash
apt install git cmake pkg-config libglib2.0-dev\
libssl-dev uuid-dev libavformat-dev libswscale-dev \
python3 libmagic-dev libfreetype6-dev libcurl-dev \
libbz2-dev yasm
apt install git cmake pkg-config libglib2.0-dev \
libssl-dev uuid-dev python3 libmagic-dev libfreetype6-dev \
libcurl-dev libbz2-dev yasm libharfbuzz-dev ragel \
libarchive-dev libtiff5 libpng16-16 libpango1.0-dev
```
2. Build
```bash
git clone --recurse-submodules https://github.com/simon987/sist2
./scripts/get_static_libs.sh
cmake .
make
```
```

2
cJSON

Submodule cJSON updated: 2de7d04aaf...2d4ad84192

7
ci/build.sh Normal file

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
./scripts/get_static_libs.sh
cmake .
make
strip sist2

53
include/mce/config.h Normal file

@@ -0,0 +1,53 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/**@file config/mce/config.h
*/
#ifndef MCE_CONFIG_H
#define MCE_CONFIG_H
#include <libxml/xmlstring.h>
#include <stdio.h>
#include <plib/plib.h>
#include <assert.h>
#ifdef __cplusplus
extern "C" {
#endif
#define MCE_NAMESPACE_SUBSUMPTION_ENABLED 0
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* MCE_CONFIG_H */

189
include/mce/helper.h Normal file

@@ -0,0 +1,189 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file mce/helper.h
Helper functions needed by mce/textreader.h and mce/textwriter.h to implement MCE:
- mceQNameLevelAdd(), mceQNameLevelLookup() and mceQNameLevelCleanup() maintain a set of mceQNameLevel_t tuples.
- mceQNameLevelPush() and mceQNameLevelPopIfMatch() maintain a stack of mceQNameLevel_t tuples.
- mceCtxInit(), mceCtxCleanup() and mceCtxUnderstandsNamespace() manage a context which holds all information needed to do MCE preprocessing.
*/
#include <mce/config.h>
#ifndef MCE_HELPER_H
#define MCE_HELPER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Triple (ns, ln, level).
*/
typedef struct MCE_QNAME_LEVEL {
xmlChar *ns;
xmlChar *ln;
puint32_t level;
puint32_t flag; // used by mceTextWriter
} mceQNameLevel_t;
/**
*/
typedef enum MCE_SKIP_STATE_ENUM {
MCE_SKIP_STATE_IGNORE,
MCE_SKIP_STATE_ALTERNATE_CONTENT,
MCE_SKIP_STATE_CHOICE_MATCHED
} mceSkipState_t;
/**
Represents an interval of levels which are "skipped", i.e. ignored.
*/
typedef struct MCE_SKIP_ITEM {
puint32_t level_start;
puint32_t level_end;
mceSkipState_t state;
} mceSkipItem_t;
/**
Represents a set of (ns, ln, level) triples.
*/
typedef struct MCE_QNAME_LEVEL_SET {
mceQNameLevel_t *list_array;
puint32_t list_items;
puint32_t max_level;
} mceQNameLevelSet_t;
/**
The skip stack.
*/
typedef struct MCE_SKIP_STACK {
mceSkipItem_t *stack_array;
puint32_t stack_items;
} mceSkipStack_t;
typedef enum MCE_ERROR_ENUM {
MCE_ERROR_NONE,
MCE_ERROR_XML,
MCE_ERROR_MUST_UNDERSTAND,
MCE_ERROR_VALIDATION,
MCE_ERROR_MEMORY
} mceError_t;
/**
Holds all information to do MCE preprocessing.
*/
typedef struct MCE_CONTEXT {
mceQNameLevelSet_t ignorable_set;
mceQNameLevelSet_t understands_set;
mceQNameLevelSet_t processcontent_set;
mceQNameLevelSet_t suspended_set;
#if (MCE_NAMESPACE_SUBSUMPTION_ENABLED)
mceQNameLevelSet_t subsume_namespace_set;
mceQNameLevelSet_t subsume_exclude_set;
mceQNameLevelSet_t subsume_prefix_set;
#endif
mceSkipStack_t skip_stack;
mceError_t error;
pbool_t mce_disabled;
puint32_t suspended_level;
} mceCtx_t;
/**
Add a new triple (ns, ln, level) to the triple set \c qname_level_set.
The \c ns_sub string is optional and will not be touched.
*/
pbool_t mceQNameLevelAdd(mceQNameLevelSet_t *qname_level_set, const xmlChar *ns, const xmlChar *ln, puint32_t level);
/**
Look up a triple (ns, ln, level) via \c ns and \c ln. If \c ignore_ln is PTRUE then the first triple matching \c ns will be returned.
*/
mceQNameLevel_t* mceQNameLevelLookup(mceQNameLevelSet_t *qname_level_set, const xmlChar *ns, const xmlChar *ln, pbool_t ignore_ln);
/**
Remove all triples (ns, ln, level) where the level is greater than or equal to \c level.
*/
pbool_t mceQNameLevelCleanup(mceQNameLevelSet_t *qname_level_set, puint32_t level);
/**
Push a new skip interval (level_start, level_end, state) on the stack \c skip_stack.
*/
pbool_t mceSkipStackPush(mceSkipStack_t *skip_stack, puint32_t level_start, puint32_t level_end, mceSkipState_t state);
/**
Pop the top skip interval (level_start, level_end, state) from the stack \c skip_stack.
*/
void mceSkipStackPop(mceSkipStack_t *skip_stack);
/**
Returns top item or NULL.
*/
mceSkipItem_t *mceSkipStackTop(mceSkipStack_t *skip_stack);
/**
Returns TRUE if the \c level is in the top skip interval.
*/
pbool_t mceSkipStackSkip(mceSkipStack_t *skip_stack, puint32_t level);
/**
Initialize the mceCtx_t \c ctx.
*/
pbool_t mceCtxInit(mceCtx_t *ctx);
/**
Cleanup, i.e. release all resources from the mceCtx_t \c ctx.
*/
pbool_t mceCtxCleanup(mceCtx_t *ctx);
/**
Register the namespace \c ns in \c ctx.
*/
pbool_t mceCtxUnderstandsNamespace(mceCtx_t *ctx, const xmlChar *ns);
/**
Register the namespace \c ns in \c ctx.
*/
pbool_t mceCtxSuspendProcessing(mceCtx_t *ctx, const xmlChar *ns, const xmlChar *ln);
#if (MCE_NAMESPACE_SUBSUMPTION_ENABLED)
/**
Subsume namespace \c ns_new with \c ns_old.
*/
pbool_t mceCtxSubsumeNamespace(mceCtx_t *ctx, const xmlChar *prefix_new, const xmlChar *ns_new, const xmlChar *ns_old);
#endif
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* MCE_HELPER_H */

464
include/mce/textreader.h Normal file

@@ -0,0 +1,464 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file mce/textreader.h
*/
#ifndef MCE_TEXTREADER_H
#define MCE_TEXTREADER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
A handle to an MCE-aware libxml2 xmlTextReader.
*/
typedef struct MCE_TEXTREADER mceTextReader_t;
#ifdef __cplusplus
} /* extern "C" */
#endif
#include <mce/config.h>
#include <opc/opc.h>
#include <mce/helper.h>
#include <libxml/xmlwriter.h>
#ifdef __cplusplus
extern "C" {
#endif
struct MCE_TEXTREADER {
xmlTextReaderPtr reader;
mceCtx_t mceCtx;
};
/**
Wrapper around a libxml2 xmlTextReaderRead function.
\see http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderRead
*/
int mceTextReaderRead(mceTextReader_t *mceTextReader);
/**
Wrapper around a libxml2 xmlTextReaderNext function.
\see http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderNext
*/
int mceTextReaderNext(mceTextReader_t *mceTextReader);
/**
Creates an mceTextReader from an XmlTextReader.
\code
mceTextReader_t reader;
mceTextReaderInit(&reader, xmlNewTextReaderFilename("sample.xml"));
// reader is ready to use.
mceTextReaderCleanup(&reader);
\endcode
\see http://xmlsoft.org/html/libxml-xmlreader.html#xmlNewTextReaderFilename
*/
int mceTextReaderInit(mceTextReader_t *mceTextReader, xmlTextReaderPtr reader);
/**
Cleanup MCE reader, i.e. free all resources. Also calls xmlTextReaderClose and xmlFreeTextReader.
\see http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderClose
\see http://xmlsoft.org/html/libxml-xmlreader.html#xmlFreeTextReader
*/
int mceTextReaderCleanup(mceTextReader_t *mceTextReader);
/**
Reads all events from \c mceTextReader and pipes them to \c writer.
\code
mceTextReader_t reader;
mceTextReaderInit(&reader, xmlNewTextReaderFilename("sample.xml"));
mceTextReaderUnderstandsNamespace(&reader, _X("http://myextension"));
xmlTextWriterPtr writer=xmlNewTextWriterFilename("out.xml", 0);
mceTextReaderDump(&reader, writer, P_FALSE);
xmlFreeTextWriter(writer);
mceTextReaderCleanup(&reader);
\endcode
*/
int mceTextReaderDump(mceTextReader_t *mceTextReader, xmlTextWriter *writer, pbool_t fragment);
/**
Registers an MCE namespace.
\see mceTextReaderDump()
*/
int mceTextReaderUnderstandsNamespace(mceTextReader_t *mceTextReader, const xmlChar *ns);
/**
Disable MCE processing.
\return Returns old value.
*/
pbool_t mceTextReaderDisableMCE(mceTextReader_t *mceTextReader, pbool_t flag);
/**
Signal an error to the MCE processor.
*/
void mceRaiseError(xmlTextReader *reader, mceCtx_t *ctx, mceError_t error, const xmlChar *str, ...);
/**
Internal function which does the MCE postprocessing. For example, mceTextReaderRead() is implemented as
\code
mceTextReaderPostprocess(mceTextReader->reader, &mceTextReader->mceCtx, xmlTextReaderRead(mceTextReader->reader))
\endcode
This function is exposed so that an existing libxml2 xmlTextReader can be made MCE-aware.
*/
int mceTextReaderPostprocess(xmlTextReader *reader, mceCtx_t *ctx, int ret);
/**
Get the error code.
*/
mceError_t mceTextReaderGetError(mceTextReader_t *mceTextReader);
/**
Helper macro to declare a start/end document block in a declarative way:
\code
mce_start_document(reader) {
} mce_end_document(reader);
\endcode
\hideinitializer
*/
#define mce_start_document(_reader_) \
if (NULL!=(_reader_)) { \
mceTextReaderRead(_reader_); \
if (0)
/**
\see mce_start_document.
\hideinitializer
*/
#define mce_end_document(_reader_) \
} /* if (NULL!=reader) */
/**
Container for mce_start_element and mce_start_attribute declarations.
\see mce_match_element
\see mce_match_attribute
\hideinitializer
*/
#define mce_start_choice(_reader_) \
if (0)
/**
\see mce_start_choice
\hideinitializer
*/
#define mce_end_choice(_reader_)
/**
Skips the attributes.
\see mce_match_element.
\hideinitializer
*/
#define mce_skip_attributes(_reader_) \
mce_start_attributes(_reader_) { \
} mce_end_attributes(_reader_);
/**
Skips the children.
\see mce_match_attribute.
\hideinitializer
*/
#define mce_skip_children(_reader_) \
mce_start_children(_reader_) { \
} mce_end_children(_reader_);
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_start_children(_reader_) \
if (!xmlTextReaderIsEmptyElement((_reader_)->reader)) { \
mceTextReaderRead(_reader_); do { \
if (0)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_end_children(_reader_) \
else { \
if (XML_READER_TYPE_END_ELEMENT!=xmlTextReaderNodeType((_reader_)->reader)) { \
mceTextReaderNext(_reader_); /*skip unhandled element */ \
} \
} \
} while(XML_READER_TYPE_END_ELEMENT!=xmlTextReaderNodeType((_reader_)->reader) && \
XML_READER_TYPE_NONE!=xmlTextReaderNodeType((_reader_)->reader)); \
} /* if (!xmlTextReaderIsEmptyElement(reader->reader)) */
/**
Helper macro to match an element. Useful for calling code in a separate function:
\code
void handleElement(mceTextReader_t *reader) {
mce_start_choice(reader) {
mce_start_element(reader, _X("ns"), _X("element")) {
} mce_end_element(reader);
} mce_end_choice(reader);
}
void parse(mceTextReader_t *reader) {
mce_start_document(reader) {
mce_start_element(reader, _X("ns"), _X("ln")) {
mce_skip_attributes(reader);
mce_start_children(reader) {
mce_match_element(reader, _X("ns"), _X("element")) {
handleElement(reader);
}
} mce_end_children(reader);
} mce_end_element(reader);
} mce_end_document(reader);
}
\endcode
\hideinitializer
*/
#define mce_match_element(_reader_, ns, ln) \
} else if (XML_READER_TYPE_ELEMENT==xmlTextReaderNodeType((_reader_)->reader) \
&& (NULL==ns || 0==xmlStrcmp(ns, xmlTextReaderConstNamespaceUri((_reader_)->reader))) \
&& (NULL==ln || 0==xmlStrcmp(ln, xmlTextReaderConstLocalName((_reader_)->reader)))) {
/**
Helper macro to declare an element block in a declarative way:
\code
mce_start_element(reader, _X("ns"), _X("ln")) {
mce_start_attributes(reader) {
mce_start_attribute(reader, _X("ns"), _X("lnA")) {
// code for handling lnA.
} mce_end_attribute(reader);
mce_start_attribute(reader, _X("ns"), _X("lnB")) {
// code for handling lnB.
} mce_end_attribute(reader);
} mce_end_attributes(reader);
mce_start_children(reader) {
mce_start_element(reader, _X("ns"), _X("lnA")) {
// code for handling lnA.
} mce_end_element(reader);
mce_start_element(reader, _X("ns"), _X("lnB")) {
// code for handling lnB.
} mce_end_element(reader);
mce_start_text(reader) {
// code for handling text.
} mce_end_text(reader);
} mce_end_children(reader);
} mce_end_element(reader);
\endcode
\hideinitializer
*/
#define mce_start_element(_reader_, ns, ln) \
mce_match_element(_reader_, ns, ln)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_end_element(_reader_) \
mceTextReaderNext(_reader_)
/**
Matches #TEXT without consuming it.
\hideinitializer
*/
#define mce_match_text(_reader_) \
} else if (XML_READER_TYPE_TEXT==xmlTextReaderNodeType((_reader_)->reader) \
|| XML_READER_TYPE_SIGNIFICANT_WHITESPACE==xmlTextReaderNodeType((_reader_)->reader)) {
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_start_text(_reader_) \
mce_match_text(_reader_)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_end_text(_reader_) \
mceTextReaderNext(_reader_)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_start_attributes(_reader_) \
if (1==xmlTextReaderMoveToFirstAttribute((_reader_)->reader)) { \
do { \
if (0)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_end_attributes(_reader_) \
else { /* skipped attribute */ } \
} while(1==xmlTextReaderMoveToNextAttribute((_reader_)->reader)); \
xmlTextReaderMoveToElement((_reader_)->reader); }
/**
Helper macro to match an attribute. Useful for calling code in a separate function:
\code
void handleA(mceTextReader_t *reader) {
mce_start_choice(reader) {
mce_start_attribute(reader, _X("ns"), _X("attr")) {
} mce_end_attribute(reader);
} mce_end_choice(reader);
}
void parse(mceTextReader_t *reader) {
mce_start_document(reader) {
mce_start_element(reader, _X("ns"), _X("ln")) {
mce_start_attributes(reader) {
mce_match_attribute(reader, _X("ns"), _X("attr")) {
handleA(reader);
}
} mce_end_attributes(reader);
mce_skip_children(reader);
} mce_end_element(reader);
} mce_end_document(reader);
}
\endcode
\hideinitializer
*/
#define mce_match_attribute(_reader_, ns, ln) \
} else if ((NULL==ns || 0==xmlStrcmp(ns, xmlTextReaderConstNamespaceUri((_reader_)->reader))) \
&& (NULL==ln || 0==xmlStrcmp(ln, xmlTextReaderConstLocalName((_reader_)->reader)))) {
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_start_attribute(_reader_, ns, ln) \
mce_match_attribute(_reader_, ns, ln)
/**
\see mce_start_element.
\hideinitializer
*/
#define mce_end_attribute(_reader_)
/**
Error handling for MCE parsers.
\code
mce_start_element(&reader, NULL, _X("Default")) {
const xmlChar *ext=NULL;
const xmlChar *type=NULL;
mce_start_attributes(&reader) {
mce_start_attribute(&reader, NULL, _X("Extension")) {
ext=xmlTextReaderConstValue(reader.reader);
} mce_end_attribute(&reader);
mce_start_attribute(&reader, NULL, _X("ContentType")) {
type=xmlTextReaderConstValue(reader.reader);
} mce_end_attribute(&reader);
} mce_end_attributes(&reader);
mce_error_guard_start(&reader) {
mce_error(&reader, NULL==ext || ext[0]==0, MCE_ERROR_VALIDATION, "Missing @Extension attribute!");
mce_error(&reader, NULL==type || type[0]==0, MCE_ERROR_VALIDATION, "Missing @ContentType attribute!");
opcContainerType *ct=insertType(c, type, OPC_TRUE);
mce_error(&reader, NULL==ct, MCE_ERROR_MEMORY, NULL);
opcContainerExtension *ce=opcContainerInsertExtension(c, ext, OPC_TRUE);
mce_error(&reader, NULL==ce, MCE_ERROR_MEMORY, NULL);
mce_errorf(&reader, NULL!=ce->type && 0!=xmlStrcmp(ce->type, type), MCE_ERROR_VALIDATION, "Extension \"%s\" is mapped to type \"%s\" as well as \"%s\"", ext, type, ce->type);
ce->type=ct->type;
} mce_error_guard_end(&reader);
mce_skip_children(&reader);
} mce_end_element(&reader);
\endcode
\hideinitializer
*/
#define mce_error_guard_start(_reader_) if (MCE_ERROR_NONE==(_reader_)->mceCtx.error) do {
/**
\see mce_error_guard_start
\hideinitializer
*/
#define mce_error_guard_end(_reader_) } while(0)
/**
Signals an error if \a guard is true.
\hideinitializer
*/
#define mce_error(_reader_, guard, err, msg) if (guard) { (_reader_)->mceCtx.error=(err); fprintf(stderr, (NULL!=msg?msg:#err)); continue; }
/**
Signals a formatted error (printf-style arguments) if \a guard is true.
\hideinitializer
*/
#if defined(__GNUC__)
#define mce_errorf(_reader_, guard, err, msg, ...) if (guard) { mceRaiseError((_reader_)->reader, &(_reader_)->mceCtx, err, _X((NULL!=msg?msg:#err)), ##__VA_ARGS__ ); continue; }
#else
#define mce_errorf(_reader_, guard, err, msg, ...) if (guard) { mceRaiseError((_reader_)->reader, &(_reader_)->mceCtx, err, _X((NULL!=msg?msg:#err)), __VA_ARGS__ ); continue; }
#endif
/**
Only issues the error when in "strict mode".
\hideinitializer
*/
#define mce_error_strict mce_error
/**
\see mce_error_strict
\hideinitializer
*/
#define mce_error_strictf mce_errorf
/**
Marker for an MCE definition.
\hideinitializer
*/
#define mce_def
/**
Marker for a MCE reference.
\hideinitializer
*/
#define mce_ref(r) (r)
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* MCE_TEXTREADER_H */
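
The declarative macros in this header rely on an `if (0) { } else if (...) { }` chain: mce_start_choice opens a branch that is never taken, and each mce_match_* macro closes the previous branch before opening its own. The following is a minimal sketch of that expansion pattern only. The toy_* macros and classify() are hypothetical names that are not part of libopc, and they match on a plain C string rather than on an xmlTextReader node.

```c
#include <string.h>

/* Hypothetical stand-ins (not part of libopc). They mirror the shape of
   mce_start_choice / mce_match_element / mce_end_choice. */
#define toy_start_choice(_name_) \
    if (0)
#define toy_match(_name_, ln) \
    } else if (0==strcmp((ln), (_name_))) {
#define toy_end_choice(_name_)

/* Returns 1 for "Default", 2 for "Override" and 0 for any other name. */
static int classify(const char *name) {
    int kind=0;
    toy_start_choice(name) {          /* expands to: if (0) {             */
        toy_match(name, "Default") {  /* expands to: } else if (...) { {  */
            kind=1;
        }
        toy_match(name, "Override") {
            kind=2;
        }
    } toy_end_choice(name);           /* closes the last else-if branch   */
    return kind;
}
```

Because every branch is an `else if`, at most one handler runs per call, and a name that matches nothing falls through without side effects. In the real macros, mce_end_children supplies a final `else` branch that skips the unhandled node.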

176
include/mce/textwriter.h Normal file

@@ -0,0 +1,176 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file mce/textwriter.h
*/
#include <mce/config.h>
#include <libxml/xmlwriter.h>
#include <mce/helper.h>
#ifndef MCE_TEXTWRITER_H
#define MCE_TEXTWRITER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Default flags for an MCE namespace declaration.
*/
#define MCE_DEFAULT 0x0
/**
Flags an MCE namespace declaration as "ignorable".
*/
#define MCE_IGNORABLE 0x1
/**
Flags an MCE namespace declaration as "must understand".
*/
#define MCE_MUSTUNDERSTAND 0x2
/**
The MCE text writer context.
*/
typedef struct MCE_TEXTWRITER_STRUCT mceTextWriter;
/**
Create a new MCE text writer.
\see http://xmlsoft.org/html/libxml-xmlIO.html#xmlOutputBufferCreateIO
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlNewTextWriter
*/
mceTextWriter *mceTextWriterCreateIO(xmlOutputWriteCallback iowrite, xmlOutputCloseCallback ioclose, void *ioctx, xmlCharEncodingHandlerPtr encoder);
/**
Helper which creates a new MCE text writer for a FILE handle.
*/
mceTextWriter *mceNewTextWriterFile(FILE *file);
/**
Frees all resources held by \a w.
*/
int mceTextWriterFree(mceTextWriter *w);
/**
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterStartDocument
*/
int mceTextWriterStartDocument(mceTextWriter *w);
/**
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterEndDocument
*/
int mceTextWriterEndDocument(mceTextWriter *w);
/**
Starts a new XML element. If \a ns is NULL, the element has no namespace; if \a ns is the empty string, the default namespace is used.
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterStartElement
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterStartElementNS
*/
int mceTextWriterStartElement(mceTextWriter *w, const xmlChar *ns, const xmlChar *ln);
/**
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterEndElement
*/
int mceTextWriterEndElement(mceTextWriter *w, const xmlChar *ns, const xmlChar *ln);
/**
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterWriteString
*/
int mceTextWriterWriteString(mceTextWriter *w, const xmlChar *content);
/**
Register a namespace. Must be called before mceTextWriterStartElement.
\see MCE_DEFAULT
\see MCE_IGNORABLE
\see MCE_MUSTUNDERSTAND
*/
const xmlChar *mceTextWriterRegisterNamespace(mceTextWriter *w, const xmlChar *ns, const xmlChar *prefix, int flags);
/**
Registers the qname (ns, ln) as a "process content" element with respect to MCE. Must be called before mceTextWriterStartElement.
*/
int mceTextWriterProcessContent(mceTextWriter *w, const xmlChar *ns, const xmlChar *ln);
/**
Writes a formatted attribute.
\see http://xmlsoft.org/html/libxml-xmlwriter.html#xmlTextWriterWriteFormatAttribute
*/
int mceTextWriterAttributeF(mceTextWriter *w, const xmlChar *ns, const xmlChar *ln, const char *value, ...);
/**
Starts an MCE alternate content section.
*/
int mceTextWriterStartAlternateContent(mceTextWriter *w);
/**
Ends an MCE alternate content section.
*/
int mceTextWriterEndAlternateContent(mceTextWriter *w);
/**
Start an MCE choice.
*/
int mceTextWriterStartChoice(mceTextWriter *w, const xmlChar *ns);
/**
Ends an MCE choice.
*/
int mceTextWriterEndChoice(mceTextWriter *w);
/**
Start an MCE fallback.
*/
int mceTextWriterStartFallback(mceTextWriter *w);
/**
Ends an MCE fallback.
*/
int mceTextWriterEndFallback(mceTextWriter *w);
/**
Returns the underlying xmlTextWriter.
*/
xmlTextWriterPtr mceTextWriterIntern(mceTextWriter *w);
/**
Helper which creates a new xmlTextWriterPtr for a FILE handle.
*/
xmlTextWriterPtr xmlNewTextWriterFile(FILE *file);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* MCE_TEXTWRITER_H */

189
include/opc/config.h Normal file

@@ -0,0 +1,189 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/**@file config/opc/config.h
*/
#ifndef OPC_CONFIG_H
#define OPC_CONFIG_H
#include <libxml/xmlstring.h>
#include <plib/plib.h>
#include <assert.h>
#ifdef __cplusplus
extern "C" {
#endif
/**
Assert expression e is true. Will be removed entirely in release mode.
\hideinitializer
*/
#define OPC_ASSERT(e) assert(e)
/**
Assert expression e is true. Expression will be executed in release mode too.
\hideinitializer
*/
#ifdef NDEBUG
#define OPC_ENSURE(e) (void)(e)
#else
#define OPC_ENSURE(e) assert(e)
#endif
/**
Constant for boolean true.
\hideinitializer
*/
#define OPC_TRUE (0==0)
/**
Constant for boolean false.
\hideinitializer
*/
#define OPC_FALSE (0==1)
/**
Boolean type.
\hideinitializer
*/
typedef pbool_t opc_bool_t;
/**
Type which represents an offset in e.g. a file.
\hideinitializer
*/
typedef pofs_t opc_ofs_t;
/**
8-bit unsigned integer.
\hideinitializer
*/
typedef puint8_t opc_uint8_t;
/**
16-bit unsigned integer.
\hideinitializer
*/
typedef puint16_t opc_uint16_t;
/**
32-bit unsigned integer.
\hideinitializer
*/
typedef puint32_t opc_uint32_t;
/**
64-bit unsigned integer.
\hideinitializer
*/
typedef puint64_t opc_uint64_t;
/**
8-bit signed integer.
\hideinitializer
*/
typedef pint8_t opc_int8_t;
/**
16-bit signed integer.
\hideinitializer
*/
typedef pint16_t opc_int16_t;
/**
32-bit signed integer.
\hideinitializer
*/
typedef pint32_t opc_int32_t;
/**
64-bit signed integer.
\hideinitializer
*/
typedef pint64_t opc_int64_t;
/**
Default size of the deflate buffer used by zlib.
*/
#define OPC_DEFLATE_BUFFER_SIZE 4096
/**
Maximum system path length.
*/
#define OPC_MAX_PATH 512
/**
Error codes for the OPC module.
*/
typedef enum OPC_ERROR_ENUM {
OPC_ERROR_NONE,
OPC_ERROR_STREAM,
OPC_ERROR_SEEK, // can't seek
OPC_ERROR_UNSUPPORTED_DATA_DESCRIPTOR,
OPC_ERROR_UNSUPPORTED_COMPRESSION,
OPC_ERROR_DEFLATE,
OPC_ERROR_HEADER,
OPC_ERROR_MEMORY,
OPC_ERROR_XML,
OPC_ERROR_USER // user triggered an abort
} opc_error_t;
/**
Compression options for OPC streams.
*/
typedef enum OPC_COMPRESSIONOPTION_ENUM {
OPC_COMPRESSIONOPTION_NONE,
OPC_COMPRESSIONOPTION_NORMAL,
OPC_COMPRESSIONOPTION_MAXIMUM,
OPC_COMPRESSIONOPTION_FAST,
OPC_COMPRESSIONOPTION_SUPERFAST
} opcCompressionOption_t;
/**
Helper for debug logs.
\hideinitializer
*/
#define opc_logf printf
/**
Abstraction for memset(m, 0, s).
\hideinitializer
*/
#define opc_bzero_mem(m,s) memset(m, 0, s)
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_CONFIG_H */
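
The difference between OPC_ASSERT and OPC_ENSURE matters when the checked expression has a side effect. The sketch below repeats the two macro definitions from this header so that it compiles on its own; step() and run_checks() are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>

/* Same definitions as in opc/config.h, repeated so the sketch stands alone. */
#define OPC_ASSERT(e) assert(e)
#ifdef NDEBUG
#define OPC_ENSURE(e) (void)(e)
#else
#define OPC_ENSURE(e) assert(e)
#endif

static int calls=0;

/* Hypothetical helper with a side effect: counts how often it was called. */
static int step(void) {
    calls++;
    return 1;
}

/* Returns the number of times step() ran. */
static int run_checks(void) {
    calls=0;
    OPC_ENSURE(step());     /* evaluated in debug and release builds    */
    OPC_ASSERT(calls>=0);   /* pure check; compiled out under NDEBUG    */
    return calls;
}
```

With or without NDEBUG, run_checks() reports one call, because OPC_ENSURE still evaluates its argument in release mode. Wrapping step() in OPC_ASSERT instead would silently drop the call when NDEBUG is defined.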

300
include/opc/container.h Normal file

@@ -0,0 +1,300 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/container.h
The container.h module provides the fundamental functions for dealing with ZIP-based OPC containers.
OPC containers can be opened in READ-ONLY mode, WRITE-ONLY mode, READ/WRITE mode, TEMPLATE mode and TRANSITION mode.
The most notable mode is the READ/WRITE mode, which gives you concurrent stream-based READ and WRITE access to a
single ZIP-based OPC container. This is achieved without the use of temporary files by taking advantage of the
OPC-specific “interleave” mode. \see http://standards.iso.org/ittf/PubliclyAvailableStandards/c051459_ISOIEC_29500-2_2008(E).zip
The TEMPLATE mode allows very fast customized "cloning" of ZIP-based OPC containers by using "RAW access" to the ZIP streams.
The TRANSITION mode is a special version of the TEMPLATE mode, which allows transition-based READ/WRITE access to the
ZIP-based OPC container using a temporary file.
*/
#include <opc/config.h>
#include <opc/file.h>
#ifndef OPC_CONTAINER_H
#define OPC_CONTAINER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Handle to an OPC container created by \ref opcContainerOpen.
\see opcContainerOpen.
*/
typedef struct OPC_CONTAINER_STRUCT opcContainer;
/**
Modes for opcContainerOpen().
\see opcContainerOpen
*/
typedef enum {
/**
Opens the OPC container denoted by \a fileName in READ-ONLY mode. The \a destName parameter must be \a NULL.
\hideinitializer
*/
OPC_OPEN_READ_ONLY=0,
/**
Opens the OPC container denoted by \a fileName in WRITE-ONLY mode. The \a destName parameter must be \a NULL.
\hideinitializer
*/
OPC_OPEN_WRITE_ONLY=1,
/**
Opens the OPC container denoted by \a fileName in READ/WRITE mode. The \a destName parameter must be \a NULL.
\hideinitializer
*/
OPC_OPEN_READ_WRITE=2,
/**
This mode will open the container denoted by \a fileName in READ-ONLY mode and the container denoted by
\a destName in write-only mode. Any modifications will be written to the container denoted by \a destName
and the unmodified streams from \a fileName will be written to \a destName on closing.
\warning Currently not implemented.
\hideinitializer
*/
OPC_OPEN_TEMPLATE=3,
/**
Like the OPC_OPEN_TEMPLATE mode, but the \a destName will be renamed to the \a fileName on closing. If \a destName
is \a NULL, then the name of the temporary file will be generated automatically.
\warning Currently not implemented.
\hideinitializer
*/
OPC_OPEN_TRANSITION=4
} opcContainerOpenMode;
/** Modes for opcContainerClose.
\see opcContainerClose.
*/
typedef enum {
/**
Close the OPC container without any further postprocessing.
\hideinitializer
*/
OPC_CLOSE_NOW = 0,
/**
Close the OPC container and trim the file by removing unused fragments such as
deleted parts.
\hideinitializer
*/
OPC_CLOSE_TRIM = 1,
/**
Close the OPC container like in \a OPC_CLOSE_TRIM mode, but additionally remove any
"interleaved" parts by reordering them.
\warning Currently not implemented. Same semantics as OPC_CLOSE_TRIM.
\hideinitializer
*/
OPC_CLOSE_DEFRAG = 2
} opcContainerCloseMode;
/**
Opens a ZIP-based OPC container.
@param[in] fileName. For more details see \ref opcContainerOpenMode.
@param[in] mode. For more details see \ref opcContainerOpenMode.
@param[in] userContext. Will not be modified by libopc. Can be used to e.g. store the "this" pointer for C++ bindings.
@param[in] destName. For more details see \ref opcContainerOpenMode.
@return \a NULL if failed.
\see opcContainerOpenMode
\see opcContainerDump
*/
opcContainer* opcContainerOpen(const xmlChar *fileName,
opcContainerOpenMode mode,
void *userContext,
const xmlChar *destName);
/**
Opens a ZIP-based OPC container from memory.
@param[in] data.
@param[in] data_len.
@param[in] userContext. Will not be modified by libopc. Can be used to e.g. store the "this" pointer for C++ bindings.
@param[in] mode. For more details see \ref opcContainerOpenMode.
@return \a NULL if failed.
*/
opcContainer* opcContainerOpenMem(const opc_uint8_t *data, opc_uint32_t data_len,
opcContainerOpenMode mode,
void *userContext);
/**
Opens a ZIP-based OPC container using caller-supplied I/O callbacks.
@param[in] ioread.
@param[in] iowrite.
@param[in] ioclose.
@param[in] ioseek.
@param[in] iotrim.
@param[in] ioflush.
@param[in] iocontext.
@param[in] file_size.
@param[in] userContext. Will not be modified by libopc. Can be used to e.g. store the "this" pointer for C++ bindings.
@param[in] mode. For more details see \ref opcContainerOpenMode.
@return \a NULL if failed.
*/
opcContainer* opcContainerOpenIO(opcFileReadCallback *ioread,
opcFileWriteCallback *iowrite,
opcFileCloseCallback *ioclose,
opcFileSeekCallback *ioseek,
opcFileTrimCallback *iotrim,
opcFileFlushCallback *ioflush,
void *iocontext,
pofs_t file_size,
opcContainerOpenMode mode,
void *userContext);
/**
Close an OPC container.
@param[in] c. \ref opcContainer opened by \ref opcContainerOpen.
@param[in] mode. For more information see \ref opcContainerCloseMode.
@return Non-zero if successful.
\see opcContainerOpen
\see opcContainerCloseMode
*/
opc_error_t opcContainerClose(opcContainer *c, opcContainerCloseMode mode);
/**
Returns the unmodified user context passed to \ref opcContainerOpen.
\see opcContainerOpen
*/
void *opcContainerGetUserContext(opcContainer *c);
/**
List all types, relations and parts of the container \a c to \a out.
\par Sample:
\include opc_dump.c
*/
opc_error_t opcContainerDump(opcContainer *c, FILE *out);
/**
Exports the OPC container to "Flat OPC" (http://blogs.msdn.com/b/ericwhite/archive/2008/09/29/the-flat-opc-format.aspx).
The flat versions of an OPC file are very important when dealing with e.g. XSL(T)-based or JavaScript-based transformations.
\see opcContainerFlatImport.
\todo Implementation needed.
*/
int opcContainerFlatExport(opcContainer *c, const xmlChar *fileName);
/**
Imports the flat version of an OPC container.
\see opcContainerFlatExport.
\todo Implementation needed.
*/
int opcContainerFlatImport(opcContainer *c, const xmlChar *fileName);
/**
Iterate all types.
\code
for(const xmlChar *type=opcContentTypeFirst(c);
NULL!=type;
type=opcContentTypeNext(c, type)) {
printf("%s\n", type);
}
\endcode
*/
const xmlChar *opcContentTypeFirst(opcContainer *container);
/**
\see opcContentTypeFirst()
*/
const xmlChar *opcContentTypeNext(opcContainer *container, const xmlChar *type);
/**
Iterate extensions.
\code
for(const xmlChar *ext=opcExtensionFirst(c);
NULL!=ext;
ext=opcExtensionNext(c, ext)) {
printf("%s\n", ext);
}
\endcode
*/
const xmlChar *opcExtensionFirst(opcContainer *container);
/**
\see opcExtensionFirst()
*/
const xmlChar *opcExtensionNext(opcContainer *container, const xmlChar *ext);
/**
Get registered type for extension.
\see opcExtensionRegister()
*/
const xmlChar *opcExtensionGetType(opcContainer *container, const xmlChar *ext);
/**
Registers a MIME type for an extension.
\see opcExtensionGetType()
*/
const xmlChar *opcExtensionRegister(opcContainer *container, const xmlChar *ext, const xmlChar *type);
/**
Iterates through all relation types of the container:
\code
for(const xmlChar *type=opcRelationTypeFirst(c);
NULL!=type;
type=opcRelationTypeNext(c, type)) {
printf("%s\n", type);
}
\endcode
*/
const xmlChar *opcRelationTypeFirst(opcContainer *container);
/**
\see opcRelationTypeFirst()
*/
const xmlChar *opcRelationTypeNext(opcContainer *container, const xmlChar *type);
/**
Iterates through all external targets of the container:
\code
for(const xmlChar *target=opcExternalTargetFirst(c);
NULL!=target;
target=opcExternalTargetNext(c, target)) {
printf("%s\n", target);
}
\endcode
*/
const xmlChar *opcExternalTargetFirst(opcContainer *container);
/**
\see opcExternalTargetFirst()
*/
const xmlChar *opcExternalTargetNext(opcContainer *container, const xmlChar *target);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_CONTAINER_H */

200
include/opc/file.h Normal file

@@ -0,0 +1,200 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/file.h
The file.h module contains the file I/O library functions.
*/
#include <opc/config.h>
#ifndef OPC_FILE_H
#define OPC_FILE_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Flag for READ access.
\hideinitializer
*/
#define OPC_FILE_READ (1<<0)
/**
Flag for WRITE access.
\hideinitializer
*/
#define OPC_FILE_WRITE (1<<1)
/**
Flag indicates that file will be truncated when opened.
\hideinitializer
*/
#define OPC_FILE_TRUNC (1<<2)
/**
Abstraction for seek modes.
*/
typedef enum OPC_FILESEEKMODE_ENUM {
opcFileSeekSet = SEEK_SET,
opcFileSeekCur = SEEK_CUR,
opcFileSeekEnd = SEEK_END
} opcFileSeekMode;
/**
Callback to read a file. E.g. for a FILE * context this can be implemented as
\code
static int opcFileRead(void *iocontext, char *buffer, int len) {
return fread(buffer, sizeof(char), len, (FILE*)iocontext);
}
\endcode
*/
typedef int opcFileReadCallback(void *iocontext, char *buffer, int len);
/**
Callback to write a file. E.g. for a FILE * context this can be implemented as
\code
static int opcFileWrite(void *iocontext, const char *buffer, int len) {
return fwrite(buffer, sizeof(char), len, (FILE*)iocontext);
}
\endcode
*/
typedef int opcFileWriteCallback(void *iocontext, const char *buffer, int len);
/**
Callback to close a file. E.g. for a FILE * context this can be implemented as
\code
static int opcFileClose(void *iocontext) {
return fclose((FILE*)iocontext);
}
\endcode
*/
typedef int opcFileCloseCallback(void *iocontext);
/**
Callback to seek a file. E.g. for a FILE * context this can be implemented as
\code
static opc_ofs_t opcFileSeek(void *iocontext, opc_ofs_t ofs) {
int ret=fseek((FILE*)iocontext, ofs, SEEK_SET);
if (ret>=0) {
return ftell((FILE*)iocontext);
} else {
return ret;
}
}
\endcode
*/
typedef opc_ofs_t opcFileSeekCallback(void *iocontext, opc_ofs_t ofs);
/**
Callback to trim a file. E.g. for a FILE * context this can be implemented as
\code
static int opcFileTrim(void *iocontext, opc_ofs_t new_size) {
#ifdef WIN32
return _chsize(fileno((FILE*)iocontext), new_size);
#else
return ftruncate(fileno((FILE*)iocontext), new_size);
#endif
}
\endcode
*/
typedef int opcFileTrimCallback(void *iocontext, opc_ofs_t new_size);
/**
Callback to flush a file. E.g. for a FILE * context this can be implemented as
\code
static int opcFileFlush(void *iocontext) {
return fflush((FILE*)iocontext);
}
\endcode
*/
typedef int opcFileFlushCallback(void *iocontext);
/**
Represents a state of a file, i.e. file position (buf_pos) and error status (err).
*/
typedef struct OPC_FILERAWSTATE_STRUCT {
opc_error_t err;
opc_ofs_t buf_pos; // current pos in file
} opcFileRawState;
/**
File IO context.
*/
typedef struct OPC_IO_STRUCT {
opcFileReadCallback *_ioread;
opcFileWriteCallback *_iowrite;
opcFileCloseCallback *_ioclose;
opcFileSeekCallback *_ioseek;
opcFileTrimCallback *_iotrim;
opcFileFlushCallback *_ioflush;
void *iocontext;
int flags;
opcFileRawState state;
opc_ofs_t file_size;
} opcIO_t;
/**
Initialize an IO context.
*/
opc_error_t opcFileInitIO(opcIO_t *io,
opcFileReadCallback *ioread,
opcFileWriteCallback *iowrite,
opcFileCloseCallback *ioclose,
opcFileSeekCallback *ioseek,
opcFileTrimCallback *iotrim,
opcFileFlushCallback *ioflush,
void *iocontext,
pofs_t file_size,
int flags);
/**
Initialize an IO context for a file.
*/
opc_error_t opcFileInitIOFile(opcIO_t *io, const xmlChar *filename, int flags);
/**
Initialize an IO context for a memory buffer.
\warning Currently supports READ-ONLY file access.
*/
opc_error_t opcFileInitIOMemory(opcIO_t *io, const opc_uint8_t *data, opc_uint32_t data_len, int flags);
/**
Cleanup an IO context, i.e. release all system resources.
*/
opc_error_t opcFileCleanupIO(opcIO_t *io);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_FILE_H */
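
A minimal sketch of how a read-only, memory-backed IO context might implement its read and seek callbacks, in the spirit of the FILE * examples in the comments above. The names MemContext, memRead and memSeek are hypothetical, demo_ofs_t is a stand-in for opc_ofs_t, and the read callback shape is assumed to mirror opcFileWriteCallback with a writable buffer. None of this is taken from the libopc sources.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for opc_ofs_t (assumption: a signed offset type). */
typedef long demo_ofs_t;

/* Hypothetical context describing a read-only memory buffer. */
typedef struct {
    const unsigned char *data; /* start of the buffer */
    demo_ofs_t len;            /* total number of bytes */
    demo_ofs_t pos;            /* current read position */
} MemContext;

/* Read callback: copies up to len bytes and returns the number copied (0 at the end). */
static int memRead(void *iocontext, char *buffer, int len) {
    MemContext *ctx = (MemContext *)iocontext;
    assert(ctx != NULL && buffer != NULL && len >= 0);
    demo_ofs_t left = ctx->len - ctx->pos;
    int n = (left < (demo_ofs_t)len) ? (int)left : len;
    memcpy(buffer, ctx->data + ctx->pos, (size_t)n);
    ctx->pos += n;
    return n;
}

/* Seek callback: absolute seek; returns the new position, or -1 if ofs is out of range. */
static demo_ofs_t memSeek(void *iocontext, demo_ofs_t ofs) {
    MemContext *ctx = (MemContext *)iocontext;
    assert(ctx != NULL);
    if (ofs < 0 || ofs > ctx->len) return -1;
    ctx->pos = ofs;
    return ctx->pos;
}
```

Write, trim and flush callbacks are left out because the memory variant is documented as read-only; a real context would presumably reject those operations.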

60
include/opc/helper.h Normal file

@@ -0,0 +1,60 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/helper.h
Contains helper functions for the opc module.
*/
#include <opc/config.h>
#ifndef OPC_HELPER_H
#define OPC_HELPER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Constructs a segment name.
*/
opc_uint16_t opcHelperAssembleSegmentName(char *out, opc_uint16_t out_size, const xmlChar *name, opc_uint32_t segment_number, opc_uint32_t next_segment_id, opc_bool_t rels_segment, opc_uint16_t *out_max);
/**
Splits a filename into its segment information.
*/
opc_error_t opcHelperSplitFilename(opc_uint8_t *filename, opc_uint32_t filename_length, opc_uint32_t *segment_number, opc_bool_t *last_segment, opc_bool_t *rel_segment);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_HELPER_H */

74
include/opc/inputstream.h Normal file

@@ -0,0 +1,74 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/inputstream.h
*/
#include <opc/config.h>
#ifndef OPC_INPUTSTREAM_H
#define OPC_INPUTSTREAM_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Internal type which represents a binary input stream.
*/
typedef struct OPC_CONTAINER_INPUTSTREAM_STRUCT opcContainerInputStream;
/**
Opens the part \c name of the \c container for reading.
*/
opcContainerInputStream* opcContainerOpenInputStream(opcContainer *container, const xmlChar *name);
/**
Reads at most \c buffer_len bytes from the input \c stream into \c buffer.
\return The number of bytes read, or 0 on error or end-of-stream.
*/
opc_uint32_t opcContainerReadInputStream(opcContainerInputStream* stream, opc_uint8_t *buffer, opc_uint32_t buffer_len);
/**
Closes the input stream and releases all system resources.
*/
opc_error_t opcContainerCloseInputStream(opcContainerInputStream* stream);
/**
Returns the type of compression used for the stream.
*/
opcCompressionOption_t opcContainerGetInputStreamCompressionOption(opcContainerInputStream* stream);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_INPUTSTREAM_H */
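
Because the read function is documented to return 0 for both an error and end-of-stream, a caller typically loops until it sees 0. The sketch below illustrates that loop with hypothetical stand-ins: DemoStream and demoRead only imitate the shape of opcContainerInputStream and opcContainerReadInputStream over an in-memory buffer, and demoReadAll is an illustrative helper, not a libopc function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an input stream over a byte buffer. */
typedef struct {
    const uint8_t *data;
    uint32_t len;
    uint32_t pos;
} DemoStream;

/* Same shape as the documented read call: returns bytes read, 0 at the end. */
static uint32_t demoRead(DemoStream *s, uint8_t *buffer, uint32_t buffer_len) {
    uint32_t left = s->len - s->pos;
    uint32_t n = left < buffer_len ? left : buffer_len;
    memcpy(buffer, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Drains the stream into out in chunks of at most 4 bytes.
   Stops when out is full or the read call returns 0.
   Returns the total number of bytes stored in out. */
static uint32_t demoReadAll(DemoStream *s, uint8_t *out, uint32_t out_cap) {
    uint32_t total = 0;
    assert(s != NULL && out != NULL);
    while (total < out_cap) {
        uint32_t room = out_cap - total;
        uint32_t n = demoRead(s, out + total, room < 4 ? room : 4);
        if (n == 0) break; /* end-of-stream or error: the return value does not say which */
        total += n;
    }
    return total;
}
```

The tiny chunk size is only there to force several iterations; a real caller would use a larger buffer.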

73
include/opc/opc.h Normal file

@@ -0,0 +1,73 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/opc.h
The opc module contains the basic library functions.
*/
#include <opc/config.h>
#include <opc/container.h>
#include <opc/part.h>
#include <opc/relation.h>
#include <opc/inputstream.h>
#include <opc/outputstream.h>
#include <opc/zip.h>
#include <opc/xmlreader.h>
#include <opc/xmlwriter.h>
#include <opc/properties.h>
#ifndef OPC_OPC_H
#define OPC_OPC_H
#ifdef __cplusplus
extern "C" {
#endif
/**
* Initialize libopc.
* Sample:
* \include opc_helloworld.c
* @return Non-zero if successful.
*/
opc_error_t opcInitLibrary();
/**
* Free libopc. Clean up all resources.
* @return Non-zero if successful.
* \see opcInitLibrary.
*/
opc_error_t opcFreeLibrary();
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_OPC_H */


@@ -0,0 +1,71 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/outputstream.h
*/
#include <opc/config.h>
#ifndef OPC_OUTPUTSTREAM_H
#define OPC_OUTPUTSTREAM_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Internal type which represents a binary output stream.
*/
typedef struct OPC_CONTAINER_OUTPUTSTREAM_STRUCT opcContainerOutputStream;
/**
Open the part \c name for writing in \c container with compression \c compression_option.
\note Make sure the part exists!
\see opcPartCreate.
*/
opcContainerOutputStream* opcContainerCreateOutputStream(opcContainer *container, const xmlChar *name, opcCompressionOption_t compression_option);
/**
Write \c buffer_len bytes from \c buffer to \c stream.
\return Returns the number of bytes written.
*/
opc_uint32_t opcContainerWriteOutputStream(opcContainerOutputStream* stream, const opc_uint8_t *buffer, opc_uint32_t buffer_len);
/**
Close the \c stream and free all associated resources.
*/
opc_error_t opcContainerCloseOutputStream(opcContainerOutputStream* stream);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_OUTPUTSTREAM_H */

118
include/opc/part.h Normal file

@@ -0,0 +1,118 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/part.h
*/
#include <opc/config.h>
#ifndef OPC_PART_H
#define OPC_PART_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Handle to an OPC part created by \ref opcPartOpen.
\see opcPartOpen.
*/
typedef xmlChar* opcPart;
/**
Represents an invalid (resp. NULL) part.
In relations, OPC_PART_INVALID also represents the root part.
\hideinitializer
*/
#define OPC_PART_INVALID NULL
/**
Find a part in a \c container by \c absolutePath and/or \c type.
Currently no flags are supported.
*/
opcPart opcPartFind(opcContainer *container,
const xmlChar *absolutePath,
const xmlChar *type,
int flags);
/**
Creates a part in a \c container with \c absolutePath and \c type.
Currently no flags are supported.
*/
opcPart opcPartCreate(opcContainer *container,
const xmlChar *absolutePath,
const xmlChar *type,
int flags);
/**
Returns the type of the \c part.
The string is interned and must not be freed.
*/
const xmlChar *opcPartGetType(opcContainer *c, opcPart part);
/**
Returns the type of the \c part.
If \c override_only then the return value will be NULL for parts not having an override type.
The string is interned and must not be freed.
*/
const xmlChar *opcPartGetTypeEx(opcContainer *c, opcPart part, opc_bool_t override_only);
/**
Deletes the part \c absolutePath in the \c container.
*/
opc_error_t opcPartDelete(opcContainer *container, const xmlChar *absolutePath);
/**
Get the first part.
\code
for(opcPart part=opcPartGetFirst(c);OPC_PART_INVALID!=part;part=opcPartGetNext(c, part)) {
printf("%s;%s\n", part, opcPartGetType(c, part));
}
\endcode
*/
opcPart opcPartGetFirst(opcContainer *container);
/**
Get the next part.
\see opcPartGetFirst
*/
opcPart opcPartGetNext(opcContainer *container, opcPart part);
/**
Returns the size in bytes of the \c part.
*/
opc_ofs_t opcPartGetSize(opcContainer *c, opcPart part);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_PART_H */

121
include/opc/properties.h Executable file

@@ -0,0 +1,121 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/properties.h
*/
#include <opc/config.h>
#include <opc/container.h>
#ifndef OPC_PROPERTIES_H
#define OPC_PROPERTIES_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Represents a simple Dublin Core type.
*/
typedef struct OPC_DC_SIMPLE_TYPE {
xmlChar *str;
xmlChar *lang;
} opcDCSimpleType_t;
/**
Represents the core properties of an OPC container.
*/
typedef struct OPC_PROPERTIES_STRUCT {
xmlChar *category; /* xsd:string */
xmlChar *contentStatus; /* xsd:string */
xmlChar *created; /* dc:date */
opcDCSimpleType_t creator; /* dc:any */
opcDCSimpleType_t description; /* dc:any */
opcDCSimpleType_t identifier; /* dc:any */
opcDCSimpleType_t *keyword_array; /* cp:CT_Keywords */
opc_uint32_t keyword_items;
opcDCSimpleType_t language; /* dc:any */
xmlChar *lastModifiedBy; /* xsd:string */
xmlChar *lastPrinted; /* xsd:dateTime */
xmlChar *modified; /* dc:date */
xmlChar *revision; /* xsd:string */
opcDCSimpleType_t subject; /* dc:any */
opcDCSimpleType_t title; /* dc:any */
xmlChar *version; /* xsd:string */
} opcProperties_t;
/**
Initialize the core properties \c cp.
\see opcCorePropertiesSetString
*/
opc_error_t opcCorePropertiesInit(opcProperties_t *cp);
/**
Cleanup the core properties \c cp, i.e. release all resources.
\see opcCorePropertiesSetString
*/
opc_error_t opcCorePropertiesCleanup(opcProperties_t *cp);
/**
Reads the core properties \c cp from the container \c c.
*/
opc_error_t opcCorePropertiesRead(opcProperties_t *cp, opcContainer *c);
/**
Write/Update the core properties \c cp in the container \c c.
*/
opc_error_t opcCorePropertiesWrite(opcProperties_t *cp, opcContainer *c);
/**
Update a string in the core properties the right way.
\code
opcProperties_t cp;
opcCorePropertiesInit(&cp);
opcCorePropertiesSetString(&cp.revision, (const xmlChar *)"1");
opcCorePropertiesSetStringLang(&cp.creator, (const xmlChar *)"Florian Reuter", NULL);
opcCorePropertiesCleanup(&cp);
\endcode
*/
opc_error_t opcCorePropertiesSetString(xmlChar **prop, const xmlChar *str);
/**
Update a core property, including its language, the right way.
\see opcCorePropertiesSetString
*/
opc_error_t opcCorePropertiesSetStringLang(opcDCSimpleType_t *prop, const xmlChar *str, const xmlChar *lang);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_PROPERTIES_H */
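
The comments above do not spell out what updating a string "the right way" involves. One plausible reading, assumed here and not confirmed by the header, is that the setter owns the stored string: it frees the previous value and keeps a private copy of the new one. The sketch uses plain char strings and a hypothetical demoSetString so that it compiles without libxml2 or libopc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical setter: frees the previous value and stores a private copy.
   Passing NULL for str clears the property.
   Returns 0 on success, -1 on allocation failure (the old value is kept). */
static int demoSetString(char **prop, const char *str) {
    char *copy = NULL;
    assert(prop != NULL);
    if (str != NULL) {
        size_t n = strlen(str) + 1; /* include the terminating NUL */
        copy = (char *)malloc(n);
        if (copy == NULL) return -1;
        memcpy(copy, str, n);
    }
    free(*prop); /* free(NULL) is a no-op */
    *prop = copy;
    return 0;
}
```

Under this reading, assigning a string literal directly to a field would be unsafe, since a later cleanup would try to free memory the structure never allocated.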

140
include/opc/relation.h Normal file

@@ -0,0 +1,140 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/relation.h
*/
#include <opc/config.h>
#ifndef OPC_RELATION_H
#define OPC_RELATION_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Identifier for an OPC relation.
*/
typedef opc_uint32_t opcRelation;
/**
Constant which represents an invalid relation.
*/
#define OPC_RELATION_INVALID (-1)
/**
Find a relation originating from \c part in \c container with \c relationId and/or \c mimeType.
If \c part is OPC_PART_INVALID then part represents the root part.
@param[in] relationId The relationId (e.g. "rId1") or NULL.
@param[in] mimeType The mimeType or NULL.
*/
opcRelation opcRelationFind(opcContainer *container, opcPart part, const xmlChar *relationId, const xmlChar *mimeType);
/**
Deletes the relation from the container.
\see opcRelationFind.
*/
opc_error_t opcRelationDelete(opcContainer *container, opcPart part, const xmlChar *relationId, const xmlChar *mimeType);
/**
Returns the first relation.
The following code will dump all relations:
\code
for(opcPart part=opcPartGetFirst(c);OPC_PART_INVALID!=part;part=opcPartGetNext(c, part)) {
for(opcRelation rel=opcRelationFirst(c, part);
OPC_RELATION_INVALID!=rel;
rel=opcRelationNext(c, part, rel)) {
opcPart internal_target=opcRelationGetInternalTarget(c, part, rel);
const xmlChar *external_target=opcRelationGetExternalTarget(c, part, rel);
const xmlChar *target=(NULL!=internal_target?internal_target:external_target);
const xmlChar *prefix=NULL;
opc_uint32_t counter=-1;
const xmlChar *type=NULL;
opcRelationGetInformation(c, part, rel, &prefix, &counter, &type);
if (-1==counter) { // no counter after prefix
printf("%s;%s;%s;%s\n", part, prefix, target, type);
} else {
printf("%s;%s%i;%s;%s\n", part, prefix, counter, target, type);
}
}
}
\endcode
*/
opcRelation opcRelationFirst(opcContainer *container, opcPart part);
/**
\see opcRelationFirst
*/
opcRelation opcRelationNext(opcContainer *container, opcPart part, opcRelation relation);
/**
Returns the internal target.
\note To test for an external target use opcRelationGetExternalTarget.
\see opcRelationGetExternalTarget
*/
opcPart opcRelationGetInternalTarget(opcContainer *container, opcPart part, opcRelation relation);
/**
Returns the external target or NULL if it is an internal target.
The string is interned. Must not be freed.
\see opcRelationGetInternalTarget
*/
const xmlChar *opcRelationGetExternalTarget(opcContainer *container, opcPart part, opcRelation relation);
/**
Returns the relation's type.
The string is interned. Must not be freed.
*/
const xmlChar *opcRelationGetType(opcContainer *container, opcPart part, opcRelation relation);
/**
Get information about a relation.
\see opcRelationFirst
*/
void opcRelationGetInformation(opcContainer *container, opcPart part, opcRelation relation, const xmlChar **prefix, opc_uint32_t *counter, const xmlChar **type);
/**
Add a relation to \c container from \c src part to \c dest part with id \c rid and type \c type.
*/
opc_uint32_t opcRelationAdd(opcContainer *container, opcPart src, const xmlChar *rid, opcPart dest, const xmlChar *type);
/**
Add an external relation to \c container from \c src part to \c target URL with id \c rid and type \c type.
*/
opc_uint32_t opcRelationAddExternal(opcContainer *container, opcPart src, const xmlChar *rid, const xmlChar *target, const xmlChar *type);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_RELATION_H */
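
opcRelation is an unsigned 32-bit identifier while OPC_RELATION_INVALID expands to (-1), so a test against the sentinel relies on -1 being converted to the unsigned type. A small sketch of that comparison follows; demoRelation, DEMO_RELATION_INVALID and demoRelationIsInvalid are hypothetical stand-ins that mirror the declarations above, with uint32_t assumed for opc_uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring the typedef and macro declared in the header. */
typedef uint32_t demoRelation;
#define DEMO_RELATION_INVALID (-1)

/* Returns 1 if rel is the invalid sentinel, 0 otherwise. */
static int demoRelationIsInvalid(demoRelation rel) {
    /* The cast makes the intended unsigned comparison explicit:
       -1 converted to a 32-bit unsigned type is 0xFFFFFFFF. */
    return rel == (demoRelation)DEMO_RELATION_INVALID;
}
```

Writing the cast out avoids signed/unsigned comparison warnings and keeps the result independent of the width of int.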

69
include/opc/xmlreader.h Normal file

@@ -0,0 +1,69 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/xmlreader.h
*/
#ifndef OPC_XMLREADER_H
#define OPC_XMLREADER_H
#include <opc/config.h>
#include <libxml/xmlreader.h>
#include <mce/textreader.h>
#ifdef __cplusplus
extern "C" {
#endif
/**
Open an MCE reader for \c partName. Parameters \c URL, \c encoding and \c options will be passed unmodified to
http://xmlsoft.org/html/libxml-xmlreader.html#xmlReaderForIO and they can be NULL, NULL, 0.
\note Make sure the part exists.
\see opcPartFind
*/
opc_error_t opcXmlReaderOpen(opcContainer *container, mceTextReader_t *mceTextReader, const xmlChar *partName, const char * URL, const char * encoding, int options);
/**
Returns a libxml DOM document. Parameters \c URL, \c encoding and \c options will be passed unmodified to
http://xmlsoft.org/html/libxml-parser.html#xmlReadIO and they can be NULL, NULL, 0.
\note Make sure the part exists.
\see opcPartFind
*/
xmlDocPtr opcXmlReaderReadDoc(opcContainer *container, const xmlChar *partName, const char * URL, const char * encoding, int options);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_XMLREADER_H */

57
include/opc/xmlwriter.h Normal file

@@ -0,0 +1,57 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/xmlwriter.h
*/
#include <opc/config.h>
#include <mce/textwriter.h>
#ifndef OPC_XMLWRITER_H
#define OPC_XMLWRITER_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Create an MCE text writer for \c part in \c container with compression \c compression_option.
\note Make sure the part exists.
\see opcPartFind
*/
mceTextWriter *mceTextWriterOpen(opcContainer *c, opcPart part, opcCompressionOption_t compression_option);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_XMLWRITER_H */

255
include/opc/zip.h Normal file

@@ -0,0 +1,255 @@
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file opc/zip.h
The ZIP file backend of an OPC container.
*/
#include <opc/config.h>
#include <opc/file.h>
#include <opc/container.h>
#ifndef OPC_ZIP_H
#define OPC_ZIP_H
#ifdef __cplusplus
extern "C" {
#endif
/**
Default growth hint of an OPC stream.
*/
#define OPC_DEFAULT_GROWTH_HINT 512
/**
Handle to a ZIP archive.
\see internal.h
*/
typedef struct OPC_ZIP_STRUCT opcZip;
/**
Handle to a raw ZIP input stream.
\see internal.h
*/
typedef struct OPC_ZIPINPUTSTREAM_STRUCT opcZipInputStream;
/**
Handle to a raw ZIP output stream.
\see internal.h
*/
typedef struct OPC_ZIPOUTPUTSTREAM_STRUCT opcZipOutputStream;
/**
Holds all information of a ZIP segment.
*/
typedef struct OPC_ZIP_SEGMENT_INFO_STRUCT {
xmlChar name[OPC_MAX_PATH];
opc_uint32_t name_len;
opc_uint32_t segment_number;
opc_bool_t last_segment;
opc_bool_t rels_segment;
opc_uint32_t header_size;
opc_uint32_t min_header_size;
opc_uint32_t trailing_bytes;
opc_uint32_t compressed_size;
opc_uint32_t uncompressed_size;
opc_uint16_t bit_flag;
opc_uint32_t data_crc;
opc_uint16_t compression_method;
opc_ofs_t stream_ofs;
opc_uint16_t growth_hint;
} opcZipSegmentInfo_t;
/**
\see opcZipLoader
*/
typedef int opcZipLoaderOpenCallback(void *iocontext);
/**
\see opcZipLoader
*/
typedef int opcZipLoaderSkipCallback(void *iocontext);
/**
\see opcZipLoader
*/
typedef int opcZipLoaderReadCallback(void *iocontext, char *buffer, int len);
/**
\see opcZipLoader
*/
typedef int opcZipLoaderCloseCallback(void *iocontext);
/**
\see opcZipLoader
*/
typedef opc_error_t (opcZipLoaderSegmentCallback_t)(void *iocontext, void *userctx, opcZipSegmentInfo_t *info, opcZipLoaderOpenCallback *open, opcZipLoaderReadCallback *read, opcZipLoaderCloseCallback *close, opcZipLoaderSkipCallback *skip);
/**
Walks every segment in a ZIP archive and calls the \c segmentCallback callback for each one.
The implementer's \c segmentCallback must then either use the passed \c open, \c read and \c close methods
to read the stream or the passed \c skip method to skip the stream.
This function can be used, for example, to read a ZIP file in streaming mode.
*/
opc_error_t opcZipLoader(opcIO_t *io, void *userctx, opcZipLoaderSegmentCallback_t *segmentCallback);
/**
\see opcZipClose
*/
typedef opc_error_t (opcZipSegmentReleaseCallback)(opcZip *zip, opc_uint32_t segment_id);
/**
Closes the ZIP archive \c zip and will call \c releaseCallback for every segment to give the implementer a chance
to free user resources.
*/
void opcZipClose(opcZip *zip, opcZipSegmentReleaseCallback* releaseCallback);
/**
Creates an empty ZIP archive with the given \c io.
*/
opcZip *opcZipCreate(opcIO_t *io);
/**
Commits all buffers and writes the ZIP archive's local header directories.
If \c trim is true then padding bytes will be removed, i.e. the ZIP file size will be minimized.
*/
opc_error_t opcZipCommit(opcZip *zip, opc_bool_t trim);
/**
Garbage collection on the passed \c zip archive. This will e.g. make deleted files available as free space.
*/
opc_error_t opcZipGC(opcZip *zip);
/**
Load segment information into \c info.
If \c rels_segment is -1 then load the info for part with name \c partName.
Otherwise load the segment information for the ".rels." segment of \c partName.
\return Returns the segment_id.
*/
opc_uint32_t opcZipLoadSegment(opcZip *zip, const xmlChar *partName, opc_bool_t rels_segment, opcZipSegmentInfo_t *info);
/**
Create a segment with the given parameters.
\return Returns the segment_id.
*/
opc_uint32_t opcZipCreateSegment(opcZip *zip,
const xmlChar *partName,
opc_bool_t relsSegment,
opc_uint32_t segment_size,
opc_uint32_t growth_hint,
opc_uint16_t compression_method,
opc_uint16_t bit_flag);
/**
Creates an input stream for the segment with \c segment_id.
\see opcZipLoadSegment
\see opcZipCreateSegment
*/
opcZipInputStream *opcZipOpenInputStream(opcZip *zip, opc_uint32_t segment_id);
/**
Free all resources of the input stream.
*/
opc_error_t opcZipCloseInputStream(opcZip *zip, opcZipInputStream *stream);
/**
Reads at most \c buf_len bytes from the input stream into \c buf.
\return Returns the number of bytes read.
*/
opc_uint32_t opcZipReadInputStream(opcZip *zip, opcZipInputStream *stream, opc_uint8_t *buf, opc_uint32_t buf_len);
/**
Creates an output stream for the segment with \c segment_id.
If \c *segment_id is -1 then a new segment will be created.
Otherwise the segment with \c *segment_id will be overwritten.
*/
opcZipOutputStream *opcZipCreateOutputStream(opcZip *zip,
opc_uint32_t *segment_id,
const xmlChar *partName,
opc_bool_t relsSegment,
opc_uint32_t segment_size,
opc_uint32_t growth_hint,
opc_uint16_t compression_method,
opc_uint16_t bit_flag);
/**
Opens an existing output stream for reading.
The \c *segment_id will be set to -1 and reset on opcZipCloseOutputStream.
\see opcZipCloseOutputStream
*/
opcZipOutputStream *opcZipOpenOutputStream(opcZip *zip, opc_uint32_t *segment_id);
/**
Will close the stream and free all resources. Additionally the new segment id will be stored in \c *segment_id.
\see opcZipOpenOutputStream
*/
opc_error_t opcZipCloseOutputStream(opcZip *zip, opcZipOutputStream *stream, opc_uint32_t *segment_id);
/**
Writes \c buf_len bytes from \c buf to the output stream.
\return Returns the number of bytes written.
*/
opc_uint32_t opcZipWriteOutputStream(opcZip *zip, opcZipOutputStream *stream, const opc_uint8_t *buf, opc_uint32_t buf_len);
/**
Returns the first segment id or -1.
Use the following code to iterate through all segments.
\code
for(opc_uint32_t segment_id=opcZipGetFirstSegmentId(zip);
-1!=segment_id;
segment_id=opcZipGetNextSegmentId(zip, segment_id)) {
...
}
\endcode
\see opcZipGetNextSegmentId
*/
opc_uint32_t opcZipGetFirstSegmentId(opcZip *zip);
/**
Returns the next segment id or -1.
\see opcZipGetFirstSegmentId
*/
opc_uint32_t opcZipGetNextSegmentId(opcZip *zip, opc_uint32_t segment_id);
/**
Returns info about the given segment id.
*/
opc_error_t opcZipGetSegmentInfo(opcZip *zip, opc_uint32_t segment_id, const xmlChar **name, opc_bool_t *rels_segment, opc_uint32_t *crc);
/**
Marks the given segments as deleted.
\see opcZipGC
*/
opc_bool_t opcZipSegmentDelete(opcZip *zip, opc_uint32_t *first_segment, opc_uint32_t *last_segment, opcZipSegmentReleaseCallback* releaseCallback);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* OPC_ZIP_H */

168
include/plib/plib.h Normal file

@@ -0,0 +1,168 @@
/* include/plib/plib.h. Generated from plib.h by configure. */
/*
Copyright (c) 2010, Florian Reuter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Florian Reuter nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef _PLIB_PLIB_H_
#define _PLIB_PLIB_H_
#ifdef __cplusplus
extern "C" {
#endif
#define HAVE_STDINT_H 1
#define HAVE_STDDEF_H 1
#define HAVE_STDIO_H 1
#define HAVE_STRING_H 1
#define HAVE_LIMITS_H 1
#define HAVE_STDLIB_H 1
/* #undef HAVE_IO_H */
#define HAVE_UNISTD_H 1
#define HAVE_SYS_TYPES_H 1
#define IS_CONFIGURED 1
#if !defined(IS_CONFIGURED)
#if defined(WIN32)
#define HAVE_STRING_H 1
#define HAVE_STDINT_H 1
#define HAVE_LIMITS_H 1
#define HAVE_STDDEF_H 1
#define HAVE_STDIO_H 1
#define HAVE_STDLIB_H 1
#define HAVE_IO_H
#define snprintf _snprintf
#else
#error "configure not executed and we are not on a win32 machine? please run configure or define WIN32 if you are on a WIN32 platform."
#endif
#endif
#ifdef HAVE_STDDEF_H
#include <stddef.h>
typedef size_t pofs_t; // maximum file offset for eg. read write ops
#else
#error "system types can not be determined"
#endif
#ifdef HAVE_STDIO_H
#include <stdio.h>
#else
#error "system io can not be determined"
#endif
#ifdef HAVE_STDINT_H
#include <stdint.h>
typedef int8_t pint8_t;
typedef uint8_t puint8_t;
typedef int16_t pint16_t;
typedef uint16_t puint16_t;
typedef int32_t pint32_t;
typedef uint32_t puint32_t;
typedef int64_t pint64_t;
typedef uint64_t puint64_t;
typedef int pbool_t;
typedef size_t psize_t;
// INTN_MAX, INTN_MIN, UINTN_MAX
#else
#error "system types can not be determined"
#endif
#ifdef HAVE_STRING_H
#include <string.h>
#endif
#ifdef HAVE_LIMITS_H
#include <limits.h>
#define PUINT8_MAX UCHAR_MAX
#define PINT32_MAX INT_MAX
#define PINT32_MIN INT_MIN
#define PUINT32_MAX UINT_MAX
#define PUINT32_MIN 0
#define PUINT16_MAX USHRT_MAX
#define PUINT16_MIN 0
#else
#error "limits can not be determined"
#endif
#ifdef HAVE_STDLIB_H
#include <stdlib.h>
#endif
#ifdef HAVE_IO_H
#include <io.h>
#endif
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_SYS_TYPES_H
#include <sys/types.h>
#endif
/**
Converts an ASCII string to a xmlChar string. This only works for ASCII strings.
*/
#ifndef _X
#define _X(s) BAD_CAST(s)
#endif
/**
Converts an xmlChar string to an ASCII string. This only works for ASCII charsets.
*/
#ifndef _X2C
#define _X2C(s) ((char*)(s))
#endif
#define PASSERT(e) assert(e)
#ifdef NDEBUG
#define PENSURE(e) (void)(e)
#else
#define PENSURE(e) assert(e)
#endif
#define PTRUE (0==0)
#define PFALSE (0==1)
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* _PLIB_PLIB_H_ */

1
lib/bzip2-1.0.6 Submodule

Submodule lib/bzip2-1.0.6 added at 288acf97a1

1
lib/harfbuzz Submodule

Submodule lib/harfbuzz added at b28c282585

1
lib/leptonica Submodule

Submodule lib/leptonica added at cc03be70fd

1
lib/libmagic Submodule

Submodule lib/libmagic added at 1249b5cd02

BIN
lib/libopc/libmce.a Normal file

Binary file not shown.

BIN
lib/libopc/libopc.a Normal file

Binary file not shown.

BIN
lib/libopc/libplib.a Normal file

Binary file not shown.

1
lib/libpng Submodule

Submodule lib/libpng added at 301f7a1429

1
lib/libtiff Submodule

Submodule lib/libtiff added at 3db0ff91bc

1
lib/openjpeg Submodule

Submodule lib/openjpeg added at ac3737372a

1
lib/tesseract Submodule

Submodule lib/tesseract added at f268e6615e


@@ -91,7 +91,7 @@ application/x-esrehber, es
application/x-excel, xla|xld|xlk|xlt|xlv
application/x-executable, exe
application/x-font-sfn,
application/x-font-ttf, ttf
application/x-font-ttf, ttf|ttc
application/x-freelance, pre
application/x-git,
application/x-gsp, gsp
@@ -252,8 +252,9 @@ text/html, acgi|htm|html|htmls|htx|shtml
text/javascript, js
text/mcf, mcf
text/pascal, pas
text/plain, com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt
text/plain, com|cmd|conf|def|g|idc|list|lst|mar|sdml|text|txt|md|groovy|license|properties|desktop|ini|rst|cmake|ipynb|readme|less|lo|go|yml|d|cs|hpp|srt|nfo|sfv|m3u|csv|eml
text/richtext, rt|rtf|rtx
text/rtf,
text/scriplet, wsc
text/x-awk, awk
!video/x-jng, jng
@@ -263,7 +264,7 @@ image/x-xwindowdump, xwd
!image/vnd.adobe.photoshop, psd
text/tab-separated-values, tsv
text/troff, man|me|ms|roff|t|tr
text/uri-list, uni|unis|uri|uris
text/uri-list, uji|unis|uri|uris
text/vnd.abc, abc
text/vnd.fmi.flexstor, flx
text/vnd.wap.wmlscript, wmls
@@ -319,7 +320,7 @@ video/x-dv, dif|dv
video/x-fli, fli
video/x-isvideo, isu
video/x-motion-jpeg, mjpg
video/x-ms-asf, asf|asx
video/x-ms-asf, asf|asx|wmv
video/x-qtc, qtc
video/x-sgi-movie, movie|mv
application/x-7z-compressed, 7z
@@ -356,4 +357,62 @@ text/x-vcard, vcf
application/x-innosetup,
application/winhelp, hlp
image/x-tga,
application/x-wine-extension-ini,
application/x-wine-extension-ini,
application/x-cbz, cbz
application/x-cbr, cbr
application/x-ms-compress-szdd, fon
application/x-atari-7800-rom, a78
application/x-nes-rom, nes
application/x-font-pfm, pfm
application/x-gettext-translation,
image/wmf,
application/pgp-keys,
image/x-3ds, 3ds
application/x-lz4, lz4
application/vnd.openxmlformats-officedocument.presentationml.presentation, pptx
application/vnd.oasis.opendocument.presentation, odp
application/x-msaccess, accdb
application/vnd.oasis.opendocument.spreadsheet, ods
audio/x-aiff, aiff|aif
text/x-ms-regedit, reg
application/x-gamecube-rom,
application/x-nintendo-ds-rom,
text/x-objective-c,
application/x-font-gdos,
application/x-apple-diskimage,
application/x-zstd, zst
video/x-m4v, m4v
message/news,
application/vnd.symbian.install,
application/x-lzh-compressed,
application/x-dosdriver,
application/vnd.tcpdump.pcap, pcap
x-epoc/x-sisx-app,
application/x-avira-qua,
video/MP2T,
application/x-snappy-framed,
application/x-lz4+json, jsonlz4
application/x-dmp, dmp
application/zlib, z
application/x-pgp-keyring,
application/x-gdbm,
application/x-font-pf2, pf2
application/x-zip,
application/x-coredump,
application/x-java-jmod, jmod
application/x-terminfo,
application/x-terminfo2,
application/x-arc,
application/vnd.lotus-1-2-3,
image/x-win-bitmap,
application/x-maxis-dbpf,
text/PGP,
audio/x-hx-aac-adts,
application/x-chrome-extension,
image/heic, heic
image/x-gem,
application/x-lzma, lzma
application/warc, warc
application/x-lz4, lz4
application/x-lzip, lz
application/x-lzop, lzo


@@ -1,5 +1,9 @@
{
"properties": {
"_tie": {
"type": "keyword",
"doc_values": true
},
"path": {
"type": "text",
"analyzer": "path_analyzer",
@@ -7,25 +11,30 @@
},
"suggest-path": {
"type": "completion",
"analyzer": "keyword"
"analyzer": "case_insensitive_kw_analyzer"
},
"mime": {
"type": "keyword"
},
"videoc": {
"type": "keyword"
"type": "keyword",
"index": false
},
"audioc": {
"type": "keyword"
"type": "keyword",
"index": false
},
"duration": {
"type": "float"
"type": "float",
"index": false
},
"width": {
"type": "integer"
"type": "integer",
"index": false
},
"height": {
"type": "integer"
"type": "integer",
"index": false
},
"mtime": {
"type": "integer"
@@ -70,6 +79,23 @@
"analyzer": "my_nGram",
"type": "text"
},
"_keyword.*": {
"type": "keyword"
},
"_text.*": {
"analyzer": "content_analyzer",
"type": "text",
"fields": {
"nGram": {
"type": "text",
"analyzer": "my_nGram"
}
}
},
"_url": {
"type": "keyword",
"index": false
},
"content": {
"analyzer": "content_analyzer",
"type": "text",
@@ -80,6 +106,33 @@
"analyzer": "my_nGram"
}
}
},
"tag": {
"type": "keyword"
},
"exif_make": {
"type": "text"
},
"exif_model": {
"type": "text"
},
"exif:software": {
"type": "text"
},
"exif_exposure_time": {
"type": "keyword"
},
"exif_fnumber": {
"type": "keyword"
},
"exif_iso_speed_ratings": {
"type": "keyword"
},
"exif_focal_length": {
"type": "keyword"
},
"exif_user_comment": {
"type": "text"
}
}
}

10
schema/pipeline.json Normal file

@@ -0,0 +1,10 @@
{
"description": "Copy _id to _tie",
"processors": [
{
"script": {
"source": "ctx._tie = ctx._id;"
}
}
]
}


@@ -21,6 +21,12 @@
"lowercase"
]
},
"case_insensitive_kw_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
},
"my_nGram": {
"tokenizer": "my_nGram_tokenizer",
"filter": [

117
scripting/README.md Normal file

@@ -0,0 +1,117 @@
## User scripts
*This document is under construction; a more in-depth guide is coming soon.*
During the `index` step, you can use the `--script-file <script>` option to
modify documents or add user tags. This option is mainly used to
implement automatic tagging based on file attributes.
The scripting language used
([Painless Scripting Language](https://www.elastic.co/guide/en/elasticsearch/painless/7.4/index.html))
is very similar to Java, but you should be able to create user scripts
without programming experience at all if you're somewhat familiar with
regex.
This is the base structure of the documents we're working with:
```json
{
"_id": "e171405c-fdb5-4feb-bb32-82637bc32084",
"_index": "sist2",
"_type": "_doc",
"_source": {
"index": "206b3050-e821-421a-891d-12fcf6c2db0d",
"mime": "application/json",
"size": 1799,
"mtime": 1545443685,
"extension": "md",
"name": "README",
"path": "sist2/scripting",
"content": "..."
}
}
```
**Example script**
This script checks whether the `genre` attribute exists; if it does,
it adds the `genre.<genre>` tag.
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source?.genre != null) {
tags.add("genre." + ctx._source.genre.toLowerCase())
}
```
You can use `.` to create a hierarchical tag tree:
![scripting/genre_example](genre_example.png)
To use regular expressions, you need to add this line to `/etc/elasticsearch/elasticsearch.yml`:
```yaml
script.painless.regex.enabled: true
```
Or, if you're using Docker, add `-e "script.painless.regex.enabled=true"`.
### Examples
If `(20XX)` is in the file name, add the `year.<year>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
Matcher m = /[\(\.+](20[0-9]{2})[\)\.+]/.matcher(ctx._source.name);
if (m.find()) {
tags.add("year." + m.group(1))
}
```
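To check a pattern before running the `index` step, it can help to try it outside Elasticsearch first. The snippet below is a rough Python equivalent of the Painless example above. It is not part of sist2, and `year_tag` and the sample names are made up for illustration.

```python
import re

# Same pattern as the Painless example: a 20XX year preceded by
# '(', '.' or '+' and followed by ')', '.' or '+'.
YEAR_PATTERN = re.compile(r"[(.+](20[0-9]{2})[).+]")


def year_tag(name):
    """Return the 'year.<year>' tag for a file name, or None if no year is found."""
    match = YEAR_PATTERN.search(name)
    return "year." + match.group(1) if match else None


if __name__ == "__main__":
    for sample in ("Some Movie (2012)", "Some.Movie.2015.1080p", "Holiday 2019"):
        print(sample, "->", year_tag(sample))
```

Note that a year without a delimiter on both sides (such as `Holiday 2019`) is not matched by this pattern.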
Use the default *Calibre* folder structure to infer the author:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
// We expect the book path to look like this:
// /path/to/Calibre Library/Author/Title/Title - Author.pdf
if (ctx._source.name.contains("-") && ctx._source.extension == "pdf") {
String[] names = ctx._source.name.splitOnToken('-');
tags.add("author." + names[1].strip());
}
```
If the file name matches the pattern `AAAA-000 fName1 lName1, <fName2 lName2>...`, add the `actress.<actress>` and
`studio.<studio>` tags:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
Matcher m = /([A-Z]{4})-[0-9]{3} (.*)/.matcher(ctx._source.name);
if (m.find()) {
tags.add("studio." + m.group(1));
// Take the matched group (.*), and add a tag for
// each name, separated by comma
for (String name : m.group(2).splitOnToken(',')) {
tags.add("actress." + name);
}
}
```
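As a sanity check, the same matching and splitting logic can be sketched in Python. This is only an approximation of the Painless script above, and `studio_actress_tags` and the sample file name are hypothetical.

```python
import re

# Four uppercase letters, a dash, three digits, a space, then the list of names.
NAME_PATTERN = re.compile(r"([A-Z]{4})-[0-9]{3} (.*)")


def studio_actress_tags(name):
    """Mirror the Painless example: one studio tag plus one tag per comma-separated name."""
    match = NAME_PATTERN.search(name)
    if match is None:
        return []
    tags = ["studio." + match.group(1)]
    # Split on the literal comma, as splitOnToken(',') does; whitespace is kept.
    for person in match.group(2).split(","):
        tags.append("actress." + person)
    return tags


if __name__ == "__main__":
    print(studio_actress_tags("ABCD-123 Jane Doe, Mary Major"))
```

Because the split is on the bare comma, names after the first keep their leading space. If that is not wanted, calling `strip()` on each name, as in the Calibre example, removes it.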
Use the name of the last folder (`/path/to/<studio>/file.mp4`) as the `studio.<studio>` tag:
```Java
ArrayList tags = ctx._source.tag = new ArrayList();
if (ctx._source.path != "") {
String[] names = ctx._source.path.splitOnToken('/');
tags.add("studio." + names[names.length-1]);
}
```

BIN
scripting/genre_example.png Normal file

Binary file not shown.



@@ -1,14 +1,16 @@
#!/bin/bash
#!/usr/bin/env bash
rm -rf index.sist2/
rm web/js/bundle.js 2> /dev/null
cat `ls -v web/js/*.min.js` > web/js/bundle.js
cat `ls web/js/*.min.js` > web/js/bundle.js
cat web/js/{util,dom,search}.js >> web/js/bundle.js
rm web/css/bundle.css 2> /dev/null
rm web/css/bundle*.css 2> /dev/null
cat web/css/*.min.css > web/css/bundle.css
cat web/css/main.css >> web/css/bundle.css
cat web/css/light.css >> web/css/bundle.css
cat web/css/*.min.css > web/css/bundle_dark.css
cat web/css/dark.css >> web/css/bundle_dark.css
python3 scripts/mime.py > src/parsing/mime_generated.c
python3 scripts/serve_static.py > src/web/static_generated.c


@@ -1,21 +1,40 @@
#!/bin/bash
#!/usr/bin/env bash
THREADS=$(nproc)
cd lib
cd mupdf
HAVE_X11=no HAVE_GLUT=no make -j 4
CFLAGS=-fPIC make USE_SYSTEM_HARFBUZZ=yes USE_SYSTEM_OPENJPEG=yes HAVE_X11=no HAVE_GLUT=no -j $THREADS
cd ..
mv mupdf/build/release/libmupdf.a .
mv mupdf/build/release/libmupdf-third.a .
# openjp2
cd openjpeg
cmake . -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS="-O3 -DNDEBUG -fPIC"
make -j $THREADS
cd ..
mv openjpeg/bin/libopenjp2.a .
# harfbuzz
cd harfbuzz
./autogen.sh
CFLAGS=-fPIC ./configure --disable-shared --enable-static
make -j $THREADS
cd ..
mv harfbuzz/src/.libs/libharfbuzz.a .
# ffmpeg
cd ffmpeg
./configure --disable-shared --enable-static --disable-ffmpeg --disable-ffplay \
--disable-ffprobe --disable-doc\
--disable-manpages --disable-postproc --disable-avfilter \
--disable-alsa --disable-lzma --disable-xlib --disable-debug\
--disable-vdpau --disable-vaapi --disable-sdl2 --disable-network
make -j 4
--disable-vdpau --disable-vaapi --disable-sdl2 --disable-network\
--extra-cflags=-fPIC
make -j $THREADS
cd ..
mv ffmpeg/libavcodec/libavcodec.a .
@@ -32,8 +51,78 @@ cmake -DONION_USE_SSL=false -DONION_USE_PAM=false -DONION_USE_PNG=false -DONION_
-DONION_USE_JPEG=false -DONION_USE_XML2=false -DONION_USE_SYSTEMD=false -DONION_USE_SQLITE3=false \
-DONION_USE_REDIS=false -DONION_USE_GC=false -DONION_USE_TESTS=false -DONION_EXAMPLES=false \
-DONION_USE_BINDINGS_CPP=false ..
make -j 4
make -j $THREADS
cd ../..
mv onion/build/src/onion/libonion_static.a .
#bzip2
cd bzip2-1.0.6
make -j $THREADS
cd ..
mv bzip2-1.0.6/libbz2.a .
# magic
cd libmagic
./autogen.sh
./configure --enable-static --disable-shared
make -j $THREADS
cd ..
mv libmagic/src/.libs/libmagic.a .
# tesseract
cd tesseract
mkdir build
cd build
cmake -DSTATIC=on -DBUILD_TRAINING_TOOLS=off -DBUILD_TESTS=off -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS="-fPIC" -DAUTO_OPTIMIZE=off ..
make -j $THREADS
cd ../..
mv tesseract/build/libtesseract.a .
# leptonica
cd leptonica
./autogen.sh
CFLAGS="-fPIC" ./configure --without-zlib --without-jpeg --without-giflib \
--without-giflib --without-libwebp --without-libwebpmux --without-libopenjpeg \
--enable-static --disable-shared
make -j $THREADS
cd ..
mv leptonica/src/.libs/liblept.a .
# tiff
cd libtiff
./autogen.sh
CFLAGS="-fPIC" CXXFLAGS="-fPIC" CXX_FLAGS="-fPIC" ./configure --enable-static --disable-shared --disable-lzw --disable-jpeg --disable-webp \
--disable-lzma --disable-zstd --disable-jbig
make -j $THREADS
cd ..
mv libtiff/libtiff/.libs/libtiff.a .
# png
cd libpng
CFLAGS="-fPIC" ./configure --enable-static --disable-shared
make -j $THREADS
cd ..
mv libpng/.libs/libpng16.a .
# openssl...
git clone --depth 1 -b OpenSSL_1_1_0-stable https://github.com/openssl/openssl
cd openssl
./config --prefix=$(pwd)/../ssl
make depend
make -j $THREADS
make install
cd ..
mv ./openssl/libcrypto.a ./openssl/libssl.a .
# curl
wget -nc https://curl.haxx.se/download/curl-7.68.0.tar.gz
tar -xzf curl-7.68.0.tar.gz
cd curl-7.68.0
./configure --disable-ldap --disable-ldaps --without-librtmp --disable-rtsp --disable-crypto-auth \
--disable-smtp --without-libidn2 --without-nghttp2 --without-brotli --enable-static --disable-shared \
--without-libpsl --with-ssl=$(pwd)/../ssl
make -j $THREADS
cd ..
mv curl-7.68.0/lib/.libs/libcurl.a .


@@ -1,6 +1,9 @@
import json
files = [
"schema/mappings.json",
"schema/settings.json",
"schema/pipeline.json",
]
@@ -9,6 +12,6 @@ def clean(filepath):
for file in files:
with open(file, "rb") as f:
data = f.read()
with open(file, "r") as f:
data = json.dumps(json.load(f), separators=(",", ":")).encode()
print("char %s[%d] = {%s};" % (clean(file), len(data), ",".join(str(int(b)) for b in data)))


@@ -12,18 +12,20 @@ major_mime = {
"audio": 7,
"image": 8,
"text": 9,
"application": 10
"application": 10,
"x-epoc": 11,
}
pdf = (
"application/pdf",
"application/x-cbr",
"application/x-cbz",
"application/epub+zip",
"application/vnd.ms-xpsdocument",
)
font = (
"application/vnd.ms-opentype",
"application/x-ms-compress-szdd",
"application/x-font-sfn",
"application/x-font-ttf",
"font/otf",
@@ -32,6 +34,34 @@ font = (
"font/woff2"
)
# Archive "formats"
archive = (
"application/x-tar",
"application/zip",
"application/x-rar",
"application/x-arc",
"application/x-warc",
"application/x-7z-compressed",
)
# Archive "filters"
arc_filter = (
"application/gzip",
"application/x-bzip2",
"application/x-xz",
"application/x-zstd",
"application/x-lzma",
"application/x-lz4",
"application/x-lzip",
"application/x-lzop",
)
doc = (
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.openxmlformats-officedocument.presentationml.presentation"
)
cnt = 1
@@ -46,6 +76,12 @@ def mime_id(mime):
mime_id += " | 0x40000000"
elif mime in font:
mime_id += " | 0x20000000"
elif mime in archive:
mime_id += " | 0x10000000"
elif mime in arc_filter:
mime_id += " | 0x08000000"
elif mime in doc:
mime_id += " | 0x04000000"
elif mime == "application/x-empty":
return "1"
return mime_id
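The hunk above ORs a category bit into the generated mime id. As a minimal sketch of how such an id could be decoded again, assuming the id is a plain integer, the snippet below maps the four bits whose conditions are visible in this hunk back to labels. The `FLAGS` table and `decode_flags` are hypothetical helpers, not part of the script, and the `0x40000000` bit is left out because its condition lies outside the hunk.

```python
# Flag bits copied from the elif chain above; the label names are illustrative.
FLAGS = {
    "font": 0x20000000,
    "archive": 0x10000000,
    "arc_filter": 0x08000000,
    "doc": 0x04000000,
}


def decode_flags(mime_id):
    """Return the labels of all category bits set in an integer mime id."""
    return [name for name, bit in FLAGS.items() if mime_id & bit]


if __name__ == "__main__":
    print(decode_flags(0x10000000 | 5))  # an id carrying the archive bit
    print(decode_flags(7))               # an id with no category bit
```

Since the generator uses an `elif` chain, at most one of these bits is set for any given mime type.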


@@ -1,8 +1,9 @@
files = [
"web/css/bundle.css",
"web/css/bundle_dark.css",
"web/js/bundle.js",
"web/img/bg-bars.png",
"web/img/sprite-skin-flat.png",
"web/img/sprite-skin-flat-dark.png",
"web/search.html",
]

186
src/cli.c

@@ -1,30 +1,57 @@
#include "cli.h"
#include "ctx.h"
#include <tesseract/capi.h>
#define DEFAULT_OUTPUT "index.sist2/"
#define DEFAULT_CONTENT_SIZE 4096
#define DEFAULT_QUALITY 15
#define DEFAULT_SIZE 200
#define DEFAULT_CONTENT_SIZE 32768
#define DEFAULT_QUALITY 5
#define DEFAULT_SIZE 500
#define DEFAULT_REWRITE_URL ""
#define DEFAULT_ES_URL "http://localhost:9200"
#define DEFAULT_BATCH_SIZE 100
#define DEFAULT_BIND_ADDR "localhost"
#define DEFAULT_PORT "4090"
const char* TESS_DATAPATHS[] = {
"/usr/share/tessdata/",
"/usr/share/tesseract-ocr/tessdata/",
"./",
NULL
};
scan_args_t *scan_args_create() {
scan_args_t *args = calloc(sizeof(scan_args_t), 1);
args->depth = -1;
return args;
}
index_args_t *index_args_create() {
index_args_t *args = calloc(sizeof(index_args_t), 1);
return args;
void scan_args_destroy(scan_args_t *args) {
if (args->name != NULL) {
free(args->name);
}
if (args->path != NULL) {
free(args->path);
}
if (args->output != NULL) {
free(args->output);
}
free(args);
}
web_args_t *web_args_create() {
web_args_t *args = calloc(sizeof(web_args_t), 1);
return args;
void index_args_destroy(index_args_t *args) {
//todo
free(args);
}
void web_args_destroy(web_args_t *args) {
//todo
free(args);
}
int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
@@ -35,7 +62,7 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
char *abs_path = abspath(argv[1]);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", argv[1]);
fprintf(stderr, "File not found: %s\n", argv[1]);
return 1;
} else {
args->path = abs_path;
@@ -44,7 +71,7 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
if (args->incremental != NULL) {
abs_path = abspath(args->incremental);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", args->incremental);
fprintf(stderr, "File not found: %s\n", args->incremental);
return 1;
}
}
@@ -58,16 +85,13 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
if (args->size == 0) {
args->size = DEFAULT_SIZE;
} else if (args->size <= 0) {
fprintf(stderr, "Invalid size: %d\n", args->size);
} else if (args->size > 0 && args->size < 32) {
printf("Invalid size: %d\n", args->size);
return 1;
}
if (args->content_size == 0) {
args->content_size = DEFAULT_CONTENT_SIZE;
} else if (args->content_size <= 0) {
fprintf(stderr, "Invalid content-size: %d\n", args->content_size);
return 1;
}
if (args->threads == 0) {
@@ -90,6 +114,12 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
return 1;
}
if (args->depth < 0) {
args->depth = G_MAXINT32;
} else {
args->depth += 1;
}
if (args->name == NULL) {
args->name = g_path_get_basename(args->output);
}
@@ -97,11 +127,62 @@ int scan_args_validate(scan_args_t *args, int argc, const char **argv) {
if (args->rewrite_url == NULL) {
args->rewrite_url = DEFAULT_REWRITE_URL;
}
if (args->archive == NULL || strcmp(args->archive, "recurse") == 0) {
args->archive_mode = ARC_MODE_RECURSE;
} else if (strcmp(args->archive, "list") == 0) {
args->archive_mode = ARC_MODE_LIST;
} else if (strcmp(args->archive, "shallow") == 0) {
args->archive_mode = ARC_MODE_SHALLOW;
} else if (strcmp(args->archive, "skip") == 0) {
args->archive_mode = ARC_MODE_SKIP;
} else {
fprintf(stderr, "Archive mode must be one of (skip, list, shallow, recurse), got '%s'\n", args->archive);
return 1;
}
if (args->tesseract_lang != NULL) {
TessBaseAPI *api = TessBaseAPICreate();
char filename[128];
sprintf(filename, "%s.traineddata", args->tesseract_lang);
const char * path = find_file_in_paths(TESS_DATAPATHS, filename);
if (path == NULL) {
LOG_FATAL("cli.c", "Could not find tesseract language file!");
}
ret = TessBaseAPIInit3(api, path, args->tesseract_lang);
if (ret != 0) {
fprintf(stderr, "Could not initialize tesseract with lang '%s'\n", args->tesseract_lang);
return 1;
}
TessBaseAPIEnd(api);
TessBaseAPIDelete(api);
args->tesseract_path = path;
}
LOG_DEBUGF("cli.c", "arg quality=%f", args->quality)
LOG_DEBUGF("cli.c", "arg size=%d", args->size)
LOG_DEBUGF("cli.c", "arg content_size=%d", args->content_size)
LOG_DEBUGF("cli.c", "arg threads=%d", args->threads)
LOG_DEBUGF("cli.c", "arg incremental=%s", args->incremental)
LOG_DEBUGF("cli.c", "arg output=%s", args->output)
LOG_DEBUGF("cli.c", "arg rewrite_url=%s", args->rewrite_url)
LOG_DEBUGF("cli.c", "arg name=%s", args->name)
LOG_DEBUGF("cli.c", "arg depth=%d", args->depth)
LOG_DEBUGF("cli.c", "arg path=%s", args->path)
LOG_DEBUGF("cli.c", "arg archive=%s", args->archive)
LOG_DEBUGF("cli.c", "arg tesseract_lang=%s", args->tesseract_lang)
LOG_DEBUGF("cli.c", "arg tesseract_path=%s", args->tesseract_path)
return 0;
}
int index_args_validate(index_args_t *args, int argc, const char **argv) {
LogCtx.verbose = 1;
if (argc < 2) {
fprintf(stderr, "Required positional argument: PATH.\n");
return 1;
@@ -109,20 +190,62 @@ int index_args_validate(index_args_t *args, int argc, const char **argv) {
char *index_path = abspath(argv[1]);
if (index_path == NULL) {
fprintf(stderr, "File not found: %s", argv[1]);
fprintf(stderr, "File not found: %s\n", argv[1]);
return 1;
} else {
args->index_path = argv[1];
free(index_path);
}
if (args->es_url == NULL) {
args->es_url = DEFAULT_ES_URL;
}
if (args->script_path != NULL) {
struct stat info;
int res = stat(args->script_path, &info);
if (res == -1) {
fprintf(stderr, "Error opening script file '%s': %s\n", args->script_path, strerror(errno));
return 1;
}
int fd = open(args->script_path, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "Error opening script file '%s': %s\n", args->script_path, strerror(errno));
return 1;
}
args->script = malloc(info.st_size + 1);
res = read(fd, args->script, info.st_size);
if (res == -1) {
fprintf(stderr, "Error reading script file '%s': %s\n", args->script_path, strerror(errno));
return 1;
}
*(args->script + info.st_size) = '\0';
close(fd);
}
if (args->batch_size == 0) {
args->batch_size = DEFAULT_BATCH_SIZE;
}
LOG_DEBUGF("cli.c", "arg es_url=%s", args->es_url)
LOG_DEBUGF("cli.c", "arg index_path=%s", args->index_path)
LOG_DEBUGF("cli.c", "arg script_path=%s", args->script_path)
LOG_DEBUGF("cli.c", "arg script=%s", args->script)
LOG_DEBUGF("cli.c", "arg print=%d", args->print)
LOG_DEBUGF("cli.c", "arg batch_size=%d", args->batch_size)
LOG_DEBUGF("cli.c", "arg force_reset=%d", args->force_reset)
return 0;
}
int web_args_validate(web_args_t *args, int argc, const char **argv) {
LogCtx.verbose = 1;
if (argc < 2) {
fprintf(stderr, "Required positional argument: PATH.\n");
return 1;
@@ -140,16 +263,43 @@ int web_args_validate(web_args_t *args, int argc, const char **argv) {
args->port = DEFAULT_PORT;
}
if (args->credentials != NULL) {
args->b64credentials = onion_base64_encode(args->credentials, (int) strlen(args->credentials));
//Remove trailing newline
*(args->b64credentials + strlen(args->b64credentials) - 1) = '\0';
}
args->index_count = argc - 1;
args->indices = argv + 1;
for (int i = 0; i < args->index_count; i++) {
char *abs_path = abspath(args->indices[i]);
if (abs_path == NULL) {
fprintf(stderr, "File not found: %s", abs_path);
fprintf(stderr, "File not found: %s\n", args->indices[i]);
return 1;
}
}
LOG_DEBUGF("cli.c", "arg es_url=%s", args->es_url)
LOG_DEBUGF("cli.c", "arg bind=%s", args->bind)
LOG_DEBUGF("cli.c", "arg port=%s", args->port)
LOG_DEBUGF("cli.c", "arg credentials=%s", args->credentials)
LOG_DEBUGF("cli.c", "arg b64credentials=%s", args->b64credentials)
LOG_DEBUGF("cli.c", "arg index_count=%d", args->index_count)
for (int i = 0; i < args->index_count; i++) {
LOG_DEBUGF("cli.c", "arg indices[%d]=%s", i, args->indices[i])
}
return 0;
}
index_args_t *index_args_create() {
index_args_t *args = calloc(1, sizeof(index_args_t));
return args;
}
web_args_t *web_args_create() {
web_args_t *args = calloc(1, sizeof(web_args_t));
return args;
}
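
The script-loading code at the top of this hunk sizes its buffer with stat() and fills it with a single read() call. POSIX allows read() to return fewer bytes than requested, so a loop is the usual defensive form. The sketch below is an illustration under that assumption, not code from this changeset; the name read_all_fd is hypothetical, and EINTR handling is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file behind `fd` into a freshly
 * allocated, NUL-terminated buffer. Returns NULL on error. */
char *read_all_fd(int fd, size_t *out_len) {
    assert(out_len != NULL);

    struct stat info;
    if (fstat(fd, &info) == -1) {
        perror("fstat");
        return NULL;
    }

    size_t size = (size_t) info.st_size;
    char *buf = malloc(size + 1);
    if (buf == NULL) {
        perror("malloc");
        return NULL;
    }

    /* read() may deliver the data in several pieces, so keep going
     * until the expected size is reached or EOF is hit. */
    size_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n == -1) {
            perror("read");
            free(buf);
            return NULL;
        }
        if (n == 0) {
            break; /* file shrank after fstat() */
        }
        total += (size_t) n;
    }

    buf[total] = '\0';
    *out_len = total;
    return buf;
}
```

A caller would open the script path with O_RDONLY, pass the descriptor to read_all_fd, and free() the result once the script has been sent.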

View File

@@ -12,13 +12,25 @@ typedef struct scan_args {
char *output;
char *rewrite_url;
char *name;
int depth;
char *path;
char *archive;
archive_mode_t archive_mode;
char *tesseract_lang;
const char *tesseract_path;
} scan_args_t;
scan_args_t *scan_args_create();
void scan_args_destroy(scan_args_t *args);
int scan_args_validate(scan_args_t *args, int argc, const char **argv);
typedef struct index_args {
char *es_url;
const char *index_path;
const char *script_path;
char *script;
int print;
int batch_size;
int force_reset;
} index_args_t;
@@ -26,15 +38,18 @@ typedef struct web_args {
char *es_url;
char *bind;
char *port;
char *credentials;
char *b64credentials;
int index_count;
const char **indices;
} web_args_t;
scan_args_t *scan_args_create();
index_args_t *index_args_create();
web_args_t *web_args_create();
void index_args_destroy(index_args_t *args);
web_args_t *web_args_create();
void web_args_destroy(web_args_t *args);
int scan_args_validate(scan_args_t *args, int argc, const char **argv);
int index_args_validate(index_args_t *args, int argc, const char **argv);
int web_args_validate(web_args_t *args, int argc, const char **argv);

View File

@@ -15,6 +15,10 @@ struct {
int threads;
int content_size;
float tn_qscale;
int depth;
archive_mode_t archive_mode;
int verbose;
int very_verbose;
size_t stat_tn_size;
size_t stat_index_size;
@@ -23,16 +27,25 @@ struct {
GHashTable *copy_table;
pthread_mutex_t mupdf_mu;
char * tesseract_lang;
const char * tesseract_path;
} ScanCtx;
struct {
int verbose;
int very_verbose;
int no_color;
} LogCtx;
struct {
char *es_url;
int batch_size;
} IndexCtx;
struct {
char *es_url;
int index_count;
char *b64credentials;
struct index_t indices[16];
} WebCtx;

View File

@@ -6,11 +6,9 @@
#include <stdio.h>
#include <string.h>
#include <cJSON/cJSON.h>
#include <src/ctx.h>
#include "static_generated.c"
#define BULK_INDEX_SIZE 100
typedef struct es_indexer {
int queued;
@@ -22,6 +20,8 @@ typedef struct es_indexer {
static es_indexer_t *Indexer;
void delete_queue(int max);
void print_json(cJSON *document, const char uuid_str[UUID_STR_LEN]) {
cJSON *line = cJSON_CreateObject();
@@ -29,13 +29,14 @@ void print_json(cJSON *document, const char uuid_str[UUID_STR_LEN]) {
cJSON_AddStringToObject(line, "_id", uuid_str);
cJSON_AddStringToObject(line, "_index", "sist2");
cJSON_AddStringToObject(line, "_type", "_doc");
cJSON_AddItemToObject(line, "_source", document);
cJSON_AddItemReferenceToObject(line, "_source", document);
char *json = cJSON_PrintUnformatted(line);
printf("%s\n", json);
cJSON_free(line);
cJSON_free(json);
cJSON_Delete(line);
}
void index_json(cJSON *document, const char uuid_str[UUID_STR_LEN]) {
@@ -54,24 +55,52 @@ void index_json(cJSON *document, const char uuid_str[UUID_STR_LEN]) {
elastic_index_line(bulk_line);
}
void elastic_flush() {
void execute_update_script(const char *script, const char index_id[UUID_STR_LEN]) {
if (Indexer == NULL) {
Indexer = create_indexer(IndexCtx.es_url);
cJSON *body = cJSON_CreateObject();
cJSON *script_obj = cJSON_AddObjectToObject(body, "script");
cJSON_AddStringToObject(script_obj, "lang", "painless");
cJSON_AddStringToObject(script_obj, "source", script);
cJSON *query = cJSON_AddObjectToObject(body, "query");
cJSON *term_obj = cJSON_AddObjectToObject(query, "term");
cJSON_AddStringToObject(term_obj, "index", index_id);
char *str = cJSON_Print(body);
char bulk_url[4096];
snprintf(bulk_url, 4096, "%s/sist2/_update_by_query?pretty", Indexer->es_url);
response_t *r = web_post(bulk_url, str, "Content-Type: application/json");
LOG_INFOF("elastic.c", "Executed user script <%d>", r->status_code);
cJSON *resp = cJSON_Parse(r->body);
cJSON_free(str);
cJSON_Delete(body);
free_response(r);
cJSON *error = cJSON_GetObjectItem(resp, "error");
if (error != NULL) {
char *error_str = cJSON_Print(error);
LOG_ERRORF("elastic.c", "User script error: \n%s", error_str);
cJSON_free(error_str);
}
es_bulk_line_t *line = Indexer->line_head;
cJSON_Delete(resp);
}
int count = 0;
void *create_bulk_buffer(int max, int *count, size_t *buf_len) {
es_bulk_line_t *line = Indexer->line_head;
*count = 0;
size_t buf_size = 0;
size_t buf_cur = 0;
char *buf = malloc(1);
while (line != NULL) {
while (line != NULL && *count < max) {
char action_str[512];
snprintf(action_str, 512,
"{\"index\":{\"_id\":\"%s\", \"_type\":\"_doc\", \"_index\":\"sist2\"}}\n", line->uuid_str);
"{\"index\":{\"_id\":\"%s\", \"_type\":\"_doc\", \"_index\":\"sist2\"}}\n", line->uuid_str);
size_t action_str_len = strlen(action_str);
size_t line_len = strlen(line->line);
@@ -83,31 +112,99 @@ void elastic_flush() {
memcpy(buf + buf_cur, line->line, line_len);
buf_cur += line_len;
es_bulk_line_t *tmp = line;
line = line->next;
free(tmp);
count++;
(*count)++;
}
buf = realloc(buf, buf_size + 1);
*(buf+buf_cur) = '\0';
*(buf + buf_cur) = '\0';
Indexer->line_head = NULL;
Indexer->line_tail = NULL;
Indexer->queued = 0;
*buf_len = buf_cur;
return buf;
}
void _elastic_flush(int max) {
size_t buf_len;
int count;
void *buf = create_bulk_buffer(max, &count, &buf_len);
char bulk_url[4096];
snprintf(bulk_url, 4096, "%s/sist2/_bulk", Indexer->es_url);
snprintf(bulk_url, 4096, "%s/sist2/_bulk?pipeline=tie", Indexer->es_url);
response_t *r = web_post(bulk_url, buf, "Content-Type: application/x-ndjson");
printf("Indexed %3d documents (%zukB) <%d>\n", count, buf_cur / 1024, r->status_code);
cJSON *ret_json = cJSON_Parse(r->body);
if (cJSON_GetObjectItem(ret_json, "errors")->valueint != 0) {
fprintf(stderr, "%s\n", r->body);
if (r->status_code == 0) {
LOG_FATALF("elastic.c", "Could not connect to %s, make sure that elasticsearch is running!\n", IndexCtx.es_url)
}
cJSON_Delete(ret_json);
if (r->status_code == 413) {
if (max <= 1) {
LOG_ERRORF("elastic.c", "Single document too large, giving up: {%s}", Indexer->line_head->uuid_str)
free_response(r);
free(buf);
delete_queue(1);
if (Indexer->queued != 0) {
elastic_flush();
}
return;
}
LOG_WARNINGF("elastic.c", "Payload too large, retrying (%d documents)", count);
free_response(r);
free(buf);
_elastic_flush(max / 2);
return;
} else if (r->status_code != 200) {
cJSON *ret_json = cJSON_Parse(r->body);
if (cJSON_GetObjectItem(ret_json, "errors")->valueint != 0) {
cJSON *err;
cJSON_ArrayForEach(err, cJSON_GetObjectItem(ret_json, "items")) {
if (cJSON_GetObjectItem(cJSON_GetObjectItem(err, "index"), "status")->valueint != 201) {
char *str = cJSON_Print(err);
LOG_ERRORF("elastic.c", "%s\n", str);
cJSON_free(str);
}
}
}
cJSON_Delete(ret_json);
delete_queue(Indexer->queued);
} else {
LOG_INFOF("elastic.c", "Indexed %d documents (%zukB) <%d>", count, buf_len / 1024, r->status_code);
delete_queue(max);
if (Indexer->queued != 0) {
elastic_flush();
}
}
free_response(r);
free(buf);
}
void delete_queue(int max) {
for (int i = 0; i < max; i++) {
es_bulk_line_t *tmp = Indexer->line_head;
Indexer->line_head = tmp->next;
if (Indexer->line_head == NULL) {
Indexer->line_tail = NULL;
}
// Free every dequeued node, including the last one in the queue
free(tmp);
Indexer->queued -= 1;
}
}
void elastic_flush() {
if (Indexer == NULL) {
Indexer = create_indexer(IndexCtx.es_url);
}
_elastic_flush(Indexer->queued);
}
void elastic_index_line(es_bulk_line_t *line) {
@@ -126,15 +223,14 @@ void elastic_index_line(es_bulk_line_t *line) {
Indexer->queued += 1;
if (Indexer->queued >= BULK_INDEX_SIZE) {
if (Indexer->queued >= IndexCtx.batch_size) {
elastic_flush();
}
}
es_indexer_t *create_indexer(const char *url) {
size_t url_len = strlen(url);
char *es_url = malloc(url_len);
char *es_url = malloc(strlen(url) + 1);
strcpy(es_url, url);
es_indexer_t *indexer = malloc(sizeof(es_indexer_t));
@@ -147,18 +243,27 @@ es_indexer_t *create_indexer(const char *url) {
return indexer;
}
void destroy_indexer() {
void destroy_indexer(char *script, char index_id[UUID_STR_LEN]) {
char url[4096];
snprintf(url, sizeof(url), "%s/sist2/_refresh", IndexCtx.es_url);
response_t *r = web_post(url, "", NULL);
printf("Refresh index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Refresh index <%d>", r->status_code);
free_response(r);
if (script != NULL) {
execute_update_script(script, index_id);
}
snprintf(url, sizeof(url), "%s/sist2/_refresh", IndexCtx.es_url);
r = web_post(url, "", NULL);
LOG_INFOF("elastic.c", "Refresh index <%d>", r->status_code);
free_response(r);
snprintf(url, sizeof(url), "%s/sist2/_forcemerge", IndexCtx.es_url);
r = web_post(url, "", NULL);
printf("Merge index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Merge index <%d>", r->status_code);
free_response(r);
if (Indexer != NULL) {
@@ -178,32 +283,37 @@ void elastic_init(int force_reset) {
if (!index_exists || force_reset) {
r = web_delete(url);
printf("Delete index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Delete index <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/sist2", IndexCtx.es_url);
r = web_put(url, "", NULL);
printf("Create index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Create index <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/sist2/_close", IndexCtx.es_url);
r = web_post(url, "", NULL);
printf("Close index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Close index <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/_ingest/pipeline/tie", IndexCtx.es_url);
r = web_put(url, pipeline_json, "Content-Type: application/json");
LOG_INFOF("elastic.c", "Create pipeline <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/sist2/_settings", IndexCtx.es_url);
r = web_put(url, settings_json, "Content-Type: application/json");
printf("Update settings <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Update settings <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/sist2/_mappings/_doc?include_type_name=true", IndexCtx.es_url);
r = web_put(url, mappings_json, "Content-Type: application/json");
printf("Update mappings <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Update mappings <%d>", r->status_code);
free_response(r);
snprintf(url, 4096, "%s/sist2/_open", IndexCtx.es_url);
r = web_post(url, "", NULL);
printf("Open index <%d>\n", r->status_code);
LOG_INFOF("elastic.c", "Open index <%d>", r->status_code);
free_response(r);
}
}
@@ -213,8 +323,35 @@ cJSON *elastic_get_document(const char *uuid_str) {
snprintf(url, 4096, "%s/sist2/_doc/%s", WebCtx.es_url, uuid_str);
response_t *r = web_get(url);
cJSON *json = NULL;
if (r->status_code == 200) {
return cJSON_Parse(r->body);
json = cJSON_Parse(r->body);
}
return NULL;
free_response(r);
return json;
}
char *elastic_get_status() {
char url[4096];
snprintf(url, 4096,
"%s/_cluster/state/metadata/sist2?filter_path=metadata.indices.*.state", WebCtx.es_url);
response_t *r = web_get(url);
cJSON *json = NULL;
char *status = malloc(128 * sizeof(char));
status[0] = '\0';
if (r->status_code == 200) {
json = cJSON_Parse(r->body);
const cJSON *metadata = cJSON_GetObjectItem(json, "metadata");
if (metadata != NULL) {
const cJSON *indices = cJSON_GetObjectItem(metadata, "indices");
const cJSON *sist2 = cJSON_GetObjectItem(indices, "sist2");
const cJSON *state = cJSON_GetObjectItem(sist2, "state");
strcpy(status, state->valuestring);
}
}
free_response(r);
cJSON_Delete(json);
return status;
}

View File

@@ -24,10 +24,12 @@ void index_json(cJSON *document, const char uuid_str[UUID_STR_LEN]);
es_indexer_t *create_indexer(const char* es_url);
void destroy_indexer();
void destroy_indexer(char *script, char index_id[UUID_STR_LEN]);
void elastic_init(int force_reset);
cJSON *elastic_get_document(const char *uuid_str);
char *elastic_get_status();
#endif

File diff suppressed because one or more lines are too long

View File

@@ -49,18 +49,19 @@ response_t *web_post(const char *url, const char *data, const char *header) {
curl_easy_setopt(curl, CURLOPT_POST, 1);
curl_easy_setopt(curl, CURLOPT_USERAGENT, "sist2");
struct curl_slist *headers = NULL;
if (header != NULL) {
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, header);
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
}
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, data);
int r1 = curl_easy_perform(curl);
curl_easy_perform(curl);
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &resp->status_code);
curl_easy_cleanup(curl);
curl_slist_free_all(headers);
resp->body = buffer.buf;
resp->size = buffer.cur;

View File

@@ -1,7 +1,7 @@
#include "src/ctx.h"
#include "serialize.h"
static __thread int IndexFd = -1;
static __thread int index_fd = -1;
typedef struct {
unsigned char uuid[16];
@@ -34,6 +34,7 @@ void write_index_descriptor(char *path, index_descriptor_t *desc) {
cJSON_AddStringToObject(json, "version", desc->version);
cJSON_AddStringToObject(json, "root", desc->root);
cJSON_AddStringToObject(json, "name", desc->name);
cJSON_AddStringToObject(json, "type", desc->type);
cJSON_AddStringToObject(json, "rewrite_url", desc->rewrite_url);
cJSON_AddNumberToObject(json, "timestamp", (double) desc->timestamp);
@@ -54,6 +55,11 @@ index_descriptor_t read_index_descriptor(char *path) {
struct stat info;
stat(path, &info);
int fd = open(path, O_RDONLY);
if (fd == -1) {
LOG_FATAL("serialize.c", "Invalid/corrupt index (Could not find descriptor)\n")
}
char *buf = malloc(info.st_size + 1);
read(fd, buf, info.st_size);
*(buf + info.st_size) = '\0';
@@ -66,9 +72,14 @@ index_descriptor_t read_index_descriptor(char *path) {
strcpy(descriptor.root, cJSON_GetObjectItem(json, "root")->valuestring);
strcpy(descriptor.name, cJSON_GetObjectItem(json, "name")->valuestring);
strcpy(descriptor.rewrite_url, cJSON_GetObjectItem(json, "rewrite_url")->valuestring);
descriptor.root_len = (short)strlen(descriptor.root);
descriptor.root_len = (short) strlen(descriptor.root);
strcpy(descriptor.version, cJSON_GetObjectItem(json, "version")->valuestring);
strcpy(descriptor.uuid, cJSON_GetObjectItem(json, "uuid")->valuestring);
if (cJSON_GetObjectItem(json, "type") == NULL) {
strcpy(descriptor.type, INDEX_TYPE_BIN);
} else {
strcpy(descriptor.type, cJSON_GetObjectItem(json, "type")->valuestring);
}
cJSON_Delete(json);
free(buf);
@@ -105,6 +116,26 @@ char *get_meta_key_text(enum metakey meta_key) {
return "title";
case MetaFontName:
return "font_name";
case MetaParent:
return "parent";
case MetaExifMake:
return "exif_make";
case MetaExifSoftware:
return "exif_software";
case MetaExifExposureTime:
return "exif_exposure_time";
case MetaExifFNumber:
return "exif_fnumber";
case MetaExifFocalLength:
return "exif_focal_length";
case MetaExifUserComment:
return "exif_user_comment";
case MetaExifIsoSpeedRatings:
return "exif_iso_speed_ratings";
case MetaExifModel:
return "exif_model";
case MetaExifDateTime:
return "exif_datetime";
default:
return NULL;
}
@@ -113,13 +144,13 @@ char *get_meta_key_text(enum metakey meta_key) {
void write_document(document_t *doc) {
if (IndexFd == -1) {
if (index_fd == -1) {
char dstfile[PATH_MAX];
pid_t tid = syscall(SYS_gettid);
snprintf(dstfile, PATH_MAX, "%s_index_%d", ScanCtx.index.path, tid);
IndexFd = open(dstfile, O_CREAT | O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR);
pthread_t self = pthread_self();
snprintf(dstfile, PATH_MAX, "%s_index_%lu", ScanCtx.index.path, self);
index_fd = open(dstfile, O_CREAT | O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR);
if (IndexFd == -1) {
if (index_fd == -1) {
perror("open");
}
}
@@ -152,17 +183,20 @@ void write_document(document_t *doc) {
}
dyn_buffer_write_char(&buf, '\n');
write(IndexFd, buf.buf, buf.cur);
int res = write(index_fd, buf.buf, buf.cur);
if (res == -1) {
perror("write");
}
ScanCtx.stat_index_size += buf.cur;
dyn_buffer_destroy(&buf);
}
void serializer_cleanup() {
close(IndexFd);
void thread_cleanup() {
close(index_fd);
}
void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func func) {
void read_index_bin(const char *path, const char *index_id, index_func func) {
line_t line;
dyn_buffer_t buf = dyn_buffer_create();
@@ -180,8 +214,13 @@ void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func
char uuid_str[UUID_STR_LEN];
uuid_unparse(line.uuid, uuid_str);
cJSON_AddStringToObject(document, "mime", mime_get_mime_text(line.mime));
cJSON_AddNumberToObject(document, "size", (double)line.size);
const char* mime_text = mime_get_mime_text(line.mime);
if (mime_text == NULL) {
cJSON_AddNullToObject(document, "mime");
} else {
cJSON_AddStringToObject(document, "mime", mime_text);
}
cJSON_AddNumberToObject(document, "size", (double) line.size);
cJSON_AddNumberToObject(document, "mtime", line.mtime);
int c;
@@ -197,21 +236,30 @@ void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func
*(buf.buf + line.ext) = '\0';
}
cJSON_AddStringToObject(document, "name", buf.buf + line.base);
*(buf.buf + line.base - 1) = '\0';
cJSON_AddStringToObject(document, "path", buf.buf);
if (line.base > 0) {
*(buf.buf + line.base - 1) = '\0';
cJSON_AddStringToObject(document, "path", buf.buf);
} else {
cJSON_AddStringToObject(document, "path", "");
}
enum metakey key = getc(file);
while (key != '\n') {
switch (key) {
case MetaWidth:
case MetaHeight:
case MetaMediaDuration:
case MetaMediaBitrate: {
case MetaHeight: {
int value;
fread(&value, sizeof(int), 1, file);
cJSON_AddNumberToObject(document, get_meta_key_text(key), value);
break;
}
case MetaMediaDuration:
case MetaMediaBitrate: {
long value;
fread(&value, sizeof(long), 1, file);
cJSON_AddNumberToObject(document, get_meta_key_text(key), (double) value);
break;
}
case MetaMediaAudioCodec:
case MetaMediaVideoCodec: {
int value;
@@ -229,10 +277,20 @@ void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func
case MetaAlbumArtist:
case MetaGenre:
case MetaFontName:
case MetaParent:
case MetaExifMake:
case MetaExifSoftware:
case MetaExifExposureTime:
case MetaExifFNumber:
case MetaExifFocalLength:
case MetaExifUserComment:
case MetaExifIsoSpeedRatings:
case MetaExifDateTime:
case MetaExifModel:
case MetaTitle: {
buf.cur = 0;
while ((c = getc(file)) != 0) {
if (!(SHOULD_IGNORE_CHAR(c)) || c == ' ') {
if (SHOULD_KEEP_CHAR(c) || c == ' ') {
dyn_buffer_write_char(&buf, (char) c);
}
}
@@ -240,17 +298,103 @@ void read_index(const char *path, const char index_id[UUID_STR_LEN], index_func
cJSON_AddStringToObject(document, get_meta_key_text(key), buf.buf);
break;
}
default:
LOG_FATALF("serialize.c", "Invalid meta key (corrupt index): %x", key)
}
key = getc(file);
}
func(document, uuid_str);
cJSON_free(document);
cJSON_Delete(document);
}
dyn_buffer_destroy(&buf);
fclose(file);
}
const char *json_type_copy_fields[] = {
"mime", "name", "path", "extension", "index", "size", "mtime", "parent",
// Meta
"title", "content", "width", "height", "duration", "audioc", "videoc",
"bitrate", "artist", "album", "album_artist", "genre", "font_name",
// Special
"tag", "_url"
};
const char *json_type_array_fields[] = {
"_keyword", "_text"
};
void read_index_json(const char *path, UNUSED(const char *index_id), index_func func) {
FILE *file = fopen(path, "r");
while (1) {
char *line = NULL;
size_t len;
ssize_t read = getline(&line, &len, file);
if (read == -1) {
if (line) {
free(line);
}
break;
}
cJSON *input = cJSON_Parse(line);
if (input == NULL) {
LOG_FATALF("serialize.c", "Could not parse JSON line: \n%s", line)
}
if (line) {
free(line);
}
cJSON *document = cJSON_CreateObject();
const char *uuid_str = cJSON_GetObjectItem(input, "_id")->valuestring;
for (int i = 0; i < (sizeof(json_type_copy_fields) / sizeof(json_type_copy_fields[0])); i++) {
cJSON *value = cJSON_GetObjectItem(input, json_type_copy_fields[i]);
if (value != NULL) {
cJSON_AddItemReferenceToObject(document, json_type_copy_fields[i], value);
}
}
for (int i = 0; i < (sizeof(json_type_array_fields) / sizeof(json_type_array_fields[0])); i++) {
cJSON *arr = cJSON_GetObjectItem(input, json_type_array_fields[i]);
if (arr != NULL) {
cJSON *obj;
cJSON_ArrayForEach(obj, arr) {
char key[1024];
cJSON *k = cJSON_GetObjectItem(obj, "k");
cJSON *v = cJSON_GetObjectItem(obj, "v");
if (k == NULL || v == NULL || !cJSON_IsString(k) || !cJSON_IsString(v)) {
char *str = cJSON_Print(obj);
LOG_FATALF("serialize.c", "Invalid %s member: must contain .k and .v string fields: \n%s",
json_type_array_fields[i], str)
}
snprintf(key, sizeof(key), "%s.%s", json_type_array_fields[i], k->valuestring);
cJSON_AddStringToObject(document, key, v->valuestring);
}
}
}
func(document, uuid_str);
cJSON_Delete(document);
cJSON_Delete(input);
}
fclose(file);
}
void read_index(const char *path, const char index_id[UUID_STR_LEN], const char *type, index_func func) {
if (strcmp(type, INDEX_TYPE_BIN) == 0) {
read_index_bin(path, index_id, func);
} else if (strcmp(type, INDEX_TYPE_JSON) == 0) {
read_index_json(path, index_id, func);
}
}
void incremental_read(GHashTable *table, const char *filepath) {
FILE *file = fopen(filepath, "rb");
line_t line;
@@ -291,6 +435,7 @@ void incremental_copy(store_t *store, store_t *dst_store, const char *filepath,
size_t buf_len;
char *buf = store_read(store, (char *) line.uuid, 16, &buf_len);
store_write(dst_store, (char *) line.uuid, 16, buf, buf_len);
free(buf);
char c;
while ((c = (char) getc(file))) {
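
The line-by-line loop in read_index_json relies on getline(), which returns an ssize_t and signals end of file or an error with -1. A minimal sketch of that pattern follows; it reuses one buffer across iterations instead of allocating per line. The function name count_lines is hypothetical and exists only to make the loop testable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in `file` with a getline() loop. */
size_t count_lines(FILE *file) {
    assert(file != NULL);

    char *line = NULL; /* getline() allocates and grows this buffer */
    size_t cap = 0;
    size_t count = 0;
    ssize_t n;

    /* -1 means EOF or error; any other value is the line length,
     * including the trailing newline when one is present. */
    while ((n = getline(&line, &cap, file)) != -1) {
        count++;
    }

    free(line); /* one free for the whole loop */
    return count;
}
```

In an NDJSON reader, the loop body would hand `line` to the JSON parser before the next getline() call overwrites it.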

View File

@@ -11,14 +11,14 @@ void incremental_copy(store_t *store, store_t *dst_store, const char *filepath,
void write_document(document_t *doc);
void read_index(const char *path, const char[UUID_STR_LEN], index_func);
void read_index(const char *path, const char[UUID_STR_LEN], const char *type, index_func);
void incremental_read(GHashTable *table, const char *filepath);
/**
* Must be called after write_document
*/
void serializer_cleanup();
void thread_cleanup();
void write_index_descriptor(char *path, index_descriptor_t *desc);

View File

@@ -9,13 +9,13 @@ store_t *store_create(char *path) {
mdb_env_create(&store->env);
int open_ret = mdb_env_open(store->env,
path,
MDB_WRITEMAP | MDB_MAPASYNC,
S_IRUSR | S_IWUSR
path,
MDB_WRITEMAP | MDB_MAPASYNC,
S_IRUSR | S_IWUSR
);
if (open_ret != 0) {
fprintf(stderr, "Error while opening store: %s", mdb_strerror(open_ret));
fprintf(stderr, "Error while opening store: %s (%s)\n", mdb_strerror(open_ret), path);
exit(1);
}
@@ -42,6 +42,12 @@ void store_destroy(store_t *store) {
void store_write(store_t *store, char *key, size_t key_len, char *buf, size_t buf_len) {
if (LogCtx.very_verbose) {
char uuid_str[UUID_STR_LEN];
uuid_unparse((unsigned char *) key, uuid_str);
LOG_DEBUGF("store.c", "Store write {%s} %lu bytes", uuid_str, buf_len)
}
MDB_val mdb_key;
mdb_key.mv_data = key;
mdb_key.mv_size = key_len;
@@ -64,10 +70,12 @@ void store_write(store_t *store, char *key, size_t key_len, char *buf, size_t bu
// Cannot resize while a transaction is open.
// The resize takes effect on the next commit.
pthread_rwlock_wrlock(&store->lock);
store->size += 1024 * 1024 * 5;
store->size += 1024 * 1024 * 50;
mdb_env_set_mapsize(store->env, store->size);
mdb_txn_begin(store->env, NULL, 0, &txn);
put_ret = mdb_put(txn, store->dbi, &mdb_key, &mdb_value, 0);
LOG_INFOF("store.c", "Updated mdb mapsize to %lu bytes", store->size)
}
mdb_txn_commit(txn);

View File

@@ -1,28 +1,36 @@
#include "walk.h"
#include "src/ctx.h"
parse_job_t *create_parse_job(const char *filepath, const struct stat *info, int base) {
__always_inline
parse_job_t *create_fs_parse_job(const char *filepath, const struct stat *info, int base) {
int len = (int) strlen(filepath);
parse_job_t *job = malloc(sizeof(parse_job_t) + len);
memcpy(&(job->filepath), filepath, len + 1);
strcpy(job->filepath, filepath);
job->base = base;
char *p = strrchr(filepath + base, '.');
if (p != NULL) {
job->ext = (int)(p - filepath + 1);
job->ext = (int) (p - filepath + 1);
} else {
job->ext = len;
}
memcpy(&(job->info), info, sizeof(struct stat));
job->info = *info;
memset(job->parent, 0, 16);
job->vfile.filepath = job->filepath;
job->vfile.read = fs_read;
job->vfile.close = fs_close;
job->vfile.fd = -1;
job->vfile.is_fs_file = TRUE;
return job;
}
int handle_entry(const char *filepath, const struct stat *info, int typeflag, struct FTW *ftw) {
if (typeflag == FTW_F && S_ISREG(info->st_mode)) {
parse_job_t *job = create_parse_job(filepath, info, ftw->base);
if (ftw->level <= ScanCtx.depth && typeflag == FTW_F && S_ISREG(info->st_mode)) {
parse_job_t *job = create_fs_parse_job(filepath, info, ftw->base);
tpool_add_work(ScanCtx.pool, parse, job);
}

99
src/log.c Normal file
View File

@@ -0,0 +1,99 @@
#include "log.h"
const char *log_colors[] = {
"\033[34m", "\033[01;34m", "\033[0m",
"\033[01;33m", "\033[31m", "\033[01;31m"
};
const char *log_levels[] = {
"DEBUG", "INFO", "WARNING", "ERROR", "FATAL"
};
void sist_logf(char *filepath, int level, char *format, ...) {
static int is_tty = -1;
if (is_tty == -1) {
is_tty = isatty(STDERR_FILENO);
}
char log_str[LOG_MAX_LENGTH];
unsigned long long pid = (unsigned long long) pthread_self();
char datetime[32];
time_t t;
struct tm result;
t = time(NULL);
localtime_r(&t, &result);
strftime(datetime, sizeof(datetime), "%Y-%m-%d %H:%M:%S", &result);
int log_len;
if (is_tty) {
log_len = snprintf(
log_str, sizeof(log_str),
"\033[%dm[%04llX]%s [%s] [%s %s] ",
31 + ((unsigned int) (pid)) % 7, pid, log_colors[level],
datetime, log_levels[level], filepath
);
} else {
log_len = snprintf(
log_str, sizeof(log_str),
"[%04llX] [%s] [%s %s] ",
pid, datetime, log_levels[level], filepath
);
}
va_list ap;
va_start(ap, format);
size_t maxsize = sizeof(log_str) - log_len;
log_len += vsnprintf(log_str + log_len, maxsize, format, ap);
va_end(ap);
if (is_tty) {
log_len += sprintf(log_str + log_len, "\033[0m\n");
} else {
*(log_str + log_len) = '\n';
log_len += 1;
}
write(STDERR_FILENO, log_str, log_len);
}
void sist_log(char *filepath, int level, char *str) {
static int is_tty = -1;
if (is_tty == -1) {
is_tty = isatty(STDERR_FILENO);
}
char log_str[LOG_MAX_LENGTH];
unsigned long long pid = (unsigned long long) pthread_self();
char datetime[32];
time_t t;
struct tm result;
t = time(NULL);
localtime_r(&t, &result);
strftime(datetime, sizeof(datetime), "%Y-%m-%d %H:%M:%S", &result);
int log_len;
if (is_tty) {
log_len = snprintf(
log_str, sizeof(log_str),
"\033[%dm[%04llX]%s [%s] [%s %s] %s \033[0m\n",
31 + ((unsigned int) (pid)) % 7, pid, log_colors[level],
datetime, log_levels[level], filepath,
str
);
} else {
log_len = snprintf(
log_str, sizeof(log_str),
"[%04llX] [%s] [%s %s] %s \n",
pid, datetime, log_levels[level], filepath,
str
);
}
write(STDERR_FILENO, log_str, log_len);
}
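
sist_logf adds the return value of vsnprintf to log_len. That return value is the length the formatted text would have had without truncation, so it can exceed the space that was actually available. The sketch below shows one way to clamp the running length; append_fmt is a hypothetical helper, not part of this changeset.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text at buf[len] and returns the
 * new string length, never more than cap - 1 even when output is cut off. */
size_t append_fmt(char *buf, size_t cap, size_t len, const char *fmt, ...) {
    assert(buf != NULL && cap > 0 && len < cap);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + len, cap - len, fmt, ap);
    va_end(ap);

    if (n < 0) {
        /* Encoding error: leave the string as it was */
        buf[len] = '\0';
        return len;
    }

    /* vsnprintf reports the untruncated length; clamp to what fits */
    size_t room = cap - len - 1;
    size_t written = (size_t) n > room ? room : (size_t) n;
    return len + written;
}
```

A logger built this way can reserve a few bytes of capacity for the color reset sequence and newline, so the suffix always fits after a long message.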

45
src/log.h Normal file
View File

@@ -0,0 +1,45 @@
#ifndef SIST2_LOG_H
#define SIST2_LOG_H
#define LOG_MAX_LENGTH 8192
#define SIST_DEBUG 0
#define SIST_INFO 1
#define SIST_WARNING 2
#define SIST_ERROR 3
#define SIST_FATAL 4
#define LOG_DEBUGF(filepath, fmt, ...) \
if (LogCtx.very_verbose) {sist_logf(filepath, SIST_DEBUG, fmt, __VA_ARGS__);}
#define LOG_DEBUG(filepath, str) \
if (LogCtx.very_verbose) {sist_log(filepath, SIST_DEBUG, str);}
#define LOG_INFOF(filepath, fmt, ...) \
if (LogCtx.verbose) {sist_logf(filepath, SIST_INFO, fmt, __VA_ARGS__);}
#define LOG_INFO(filepath, str) \
if (LogCtx.verbose) {sist_log(filepath, SIST_INFO, str);}
#define LOG_WARNINGF(filepath, fmt, ...) \
if (LogCtx.verbose) {sist_logf(filepath, SIST_WARNING, fmt, __VA_ARGS__);}
#define LOG_WARNING(filepath, str) \
if (LogCtx.verbose) {sist_log(filepath, SIST_WARNING, str);}
#define LOG_ERRORF(filepath, fmt, ...) \
if (LogCtx.verbose) {sist_logf(filepath, SIST_ERROR, fmt, __VA_ARGS__);}
#define LOG_ERROR(filepath, str) \
if (LogCtx.verbose) {sist_log(filepath, SIST_ERROR, str);}
#define LOG_FATALF(filepath, fmt, ...) \
{sist_logf(filepath, SIST_FATAL, fmt, __VA_ARGS__); exit(-1);}
#define LOG_FATAL(filepath, str) \
{sist_log(filepath, SIST_FATAL, str); exit(-1);}
#include "src/sist.h"
void sist_logf(char *filepath, int level, char *format, ...);
void sist_log(char *filepath, int level, char *str);
#endif
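
Macros that expand to more than one statement are commonly wrapped in do { ... } while (0), so that they act as a single statement after an unbraced if and pair correctly with else. The sketch below illustrates the idiom with stand-in functions; DEMO_FATAL, demo_log and demo_exit are hypothetical. Adopting this form for the LOG_* macros would also mean ending every call site with a semicolon, which much of this codebase currently omits.

```c
#include <assert.h>
#include <stddef.h>

static int log_calls = 0;
static int exit_calls = 0;

/* Stand-ins for sist_log() and exit(), so the example can be observed */
static void demo_log(const char *msg) {
    assert(msg != NULL);
    log_calls++;
}

static void demo_exit(void) {
    exit_calls++;
}

/* Both calls stay together as one statement, wherever the macro is used */
#define DEMO_FATAL(msg) do { demo_log(msg); demo_exit(); } while (0)

int run_check(int failed) {
    if (failed)
        DEMO_FATAL("boom"); /* the whole macro body belongs to this if */
    else
        return 0;
    return 1;
}
```

Without the wrapper, only the first statement would be guarded by the if, and the else branch would no longer compile.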

View File

@@ -2,10 +2,11 @@
#include "ctx.h"
#define DESCRIPTION "Lightning-fast file system indexer and search tool."
#define EPILOG "Made by simon987 <me@simon987.net>. Released under GPL-3.0"
static const char *const Version = "1.0.7";
static const char *const Version = "1.2.13";
static const char *const usage[] = {
"sist2 scan [OPTION]... PATH",
"sist2 index [OPTION]... INDEX",
@@ -16,6 +17,7 @@ static const char *const usage[] = {
void global_init() {
curl_global_init(CURL_GLOBAL_NOTHING);
av_log_set_level(AV_LOG_QUIET);
opcInitLibrary();
}
void init_dir(const char *dirpath) {
@@ -23,21 +25,17 @@ void init_dir(const char *dirpath) {
snprintf(path, PATH_MAX, "%sdescriptor.json", dirpath);
uuid_t uuid;
uuid_generate_time_safe(uuid);
uuid_generate(uuid);
uuid_unparse(uuid, ScanCtx.index.desc.uuid);
time(&ScanCtx.index.desc.timestamp);
strcpy(ScanCtx.index.desc.version, Version);
strcpy(ScanCtx.index.desc.type, INDEX_TYPE_BIN);
write_index_descriptor(path, &ScanCtx.index.desc);
}
void scan_print_header() {
printf("sist2 V%s\n", Version);
printf("---------------------\n");
printf("threads\t\t%d\n", ScanCtx.threads);
printf("tn_qscale\t%.1f/31.0\n", ScanCtx.tn_qscale);
printf("tn_size\t\t%dpx\n", ScanCtx.tn_size);
printf("output\t\t%s\n", ScanCtx.index.path);
LOG_INFOF("main.c", "sist2 v%s", Version)
}
void sist2_scan(scan_args_t *args) {
@@ -45,12 +43,16 @@ void sist2_scan(scan_args_t *args) {
ScanCtx.tn_qscale = args->quality;
ScanCtx.tn_size = args->size;
ScanCtx.content_size = args->content_size;
ScanCtx.pool = tpool_create(args->threads, serializer_cleanup);
ScanCtx.threads = args->threads;
ScanCtx.depth = args->depth;
ScanCtx.archive_mode = args->archive_mode;
strncpy(ScanCtx.index.path, args->output, sizeof(ScanCtx.index.path));
strncpy(ScanCtx.index.desc.name, args->name, sizeof(ScanCtx.index.desc.name));
strcpy(ScanCtx.index.desc.root, args->path);
strncpy(ScanCtx.index.desc.root, args->path, sizeof(ScanCtx.index.desc.root));
strncpy(ScanCtx.index.desc.rewrite_url, args->rewrite_url, sizeof(ScanCtx.index.desc.rewrite_url));
ScanCtx.index.desc.root_len = (short) strlen(ScanCtx.index.desc.root);
ScanCtx.tesseract_lang = args->tesseract_lang;
ScanCtx.tesseract_path = args->tesseract_path;
init_dir(ScanCtx.index.path);
@@ -86,6 +88,8 @@ void sist2_scan(scan_args_t *args) {
printf("Loaded %d items into mtime table.", g_hash_table_size(ScanCtx.original_table));
}
ScanCtx.pool = tpool_create(args->threads, thread_cleanup);
tpool_start(ScanCtx.pool);
walk_directory_tree(ScanCtx.index.desc.root);
tpool_wait(ScanCtx.pool);
tpool_destroy(ScanCtx.pool);
@@ -119,6 +123,7 @@ void sist2_scan(scan_args_t *args) {
void sist2_index(index_args_t *args) {
IndexCtx.es_url = args->es_url;
IndexCtx.batch_size = args->batch_size;
if (!args->print) {
elastic_init(args->force_reset);
@@ -128,8 +133,12 @@ void sist2_index(index_args_t *args) {
snprintf(descriptor_path, PATH_MAX, "%s/descriptor.json", args->index_path);
index_descriptor_t desc = read_index_descriptor(descriptor_path);
if (strcmp(desc.version, Version) != 0) {
fprintf(stderr, "Version mismatch! Index is v%s but executable is v%s\n", desc.version, Version);
LOG_DEBUGF("main.c", "descriptor version %s (%s)", desc.version, desc.type)
if (strcmp(desc.version, Version) != 0 && strcmp(desc.version, INDEX_VERSION_EXTERNAL) != 0) {
fprintf(stderr, "Version mismatch! Index is %s but executable is %s/%s\n",
desc.version, Version, INDEX_VERSION_EXTERNAL);
return;
}
@@ -151,13 +160,14 @@ void sist2_index(index_args_t *args) {
if (strncmp(de->d_name, "_index_", sizeof("_index_") - 1) == 0) {
char file_path[PATH_MAX];
snprintf(file_path, PATH_MAX, "%s/%s", args->index_path, de->d_name);
read_index(file_path, desc.uuid, f);
read_index(file_path, desc.uuid, desc.type, f);
}
}
closedir(dir);
if (!args->print) {
elastic_flush();
destroy_indexer();
destroy_indexer(args->script, desc.uuid);
}
}
@@ -165,6 +175,7 @@ void sist2_web(web_args_t *args) {
WebCtx.es_url = args->es_url;
WebCtx.index_count = args->index_count;
WebCtx.b64credentials = args->b64credentials;
for (int i = 0; i < args->index_count; i++) {
char *abs_path = abspath(args->indices[i]);
@@ -196,34 +207,51 @@ int main(int argc, const char *argv[]) {
index_args_t *index_args = index_args_create();
web_args_t *web_args = web_args_create();
char * common_es_url = NULL;
int arg_version = 0;
char *common_es_url = NULL;
struct argparse_option options[] = {
OPT_HELP(),
OPT_BOOLEAN('v', "version", &arg_version, "Show version and exit"),
OPT_BOOLEAN(0, "verbose", &LogCtx.verbose, "Turn on logging"),
OPT_BOOLEAN(0, "very-verbose", &LogCtx.very_verbose, "Turn on debug messages"),
OPT_GROUP("Scan options"),
OPT_INTEGER('t', "threads", &scan_args->threads, "Number of threads. DEFAULT=1"),
OPT_FLOAT('q', "quality", &scan_args->quality,
"Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=15"),
OPT_INTEGER(0, "size", &scan_args->size, "Thumbnail size, in pixels. DEFAULT=200"),
"Thumbnail quality, on a scale of 1.0 to 31.0, 1.0 being the best. DEFAULT=5"),
OPT_INTEGER(0, "size", &scan_args->size,
"Thumbnail size, in pixels. Use negative value to disable. DEFAULT=500"),
OPT_INTEGER(0, "content-size", &scan_args->content_size,
"Number of bytes to be extracted from text documents. DEFAULT=4096"),
"Number of bytes to be extracted from text documents. Use negative value to disable. DEFAULT=32768"),
OPT_STRING(0, "incremental", &scan_args->incremental,
"Reuse an existing index and only scan modified files."),
OPT_STRING('o', "output", &scan_args->output, "Output directory. DEFAULT=index.sist2/"),
OPT_STRING(0, "rewrite-url", &scan_args->rewrite_url, "Serve files from this url instead of from disk."),
OPT_STRING(0, "name", &scan_args->name, "Index display name. DEFAULT: (name of the directory)"),
OPT_INTEGER(0, "depth", &scan_args->depth, "Scan up to DEPTH subdirectories deep. "
"Use 0 to only scan files in PATH. DEFAULT: -1"),
OPT_STRING(0, "archive", &scan_args->archive, "Archive file mode (skip|list|shallow|recurse). "
"skip: Don't parse, list: only get file names as text, "
"shallow: Don't parse archives inside archives. DEFAULT: recurse"),
OPT_STRING(0, "ocr", &scan_args->tesseract_lang, "Tesseract language (use tesseract --list-langs to see "
"which are installed on your machine)"),
OPT_GROUP("Index options"),
OPT_STRING(0, "es-url", &common_es_url, "Elasticsearch url. DEFAULT=http://localhost:9200"),
OPT_STRING(0, "es-url", &common_es_url, "Elasticsearch url with port. DEFAULT=http://localhost:9200"),
OPT_BOOLEAN('p', "print", &index_args->print, "Just print JSON documents to stdout."),
OPT_STRING(0, "script-file", &index_args->script_path, "Path to user script."),
OPT_INTEGER(0, "batch-size", &index_args->batch_size, "Index batch size. DEFAULT: 100"),
OPT_BOOLEAN('f', "force-reset", &index_args->force_reset, "Reset Elasticsearch mappings and settings. "
"(You must use this option the first time you use the index command)"),
"(You must use this option the first time you use the index command)"),
OPT_GROUP("Web options"),
OPT_STRING(0, "es-url", &common_es_url, "Elasticsearch url. DEFAULT=http://localhost:9200"),
OPT_STRING(0, "bind", &web_args->bind, "Listen on this address. DEFAULT=localhost"),
OPT_STRING(0, "port", &web_args->port, "Listen on this port. DEFAULT=4090"),
OPT_STRING(0, "auth", &web_args->credentials, "Basic auth in user:password format"),
OPT_END(),
};
@@ -233,6 +261,15 @@ int main(int argc, const char *argv[]) {
argparse_describe(&argparse, DESCRIPTION, EPILOG);
argc = argparse_parse(&argparse, argc, argv);
if (arg_version) {
printf("%s", Version);
exit(0);
}
if (LogCtx.very_verbose != 0) {
LogCtx.verbose = 1;
}
web_args->es_url = common_es_url;
index_args->es_url = common_es_url;
@@ -247,7 +284,9 @@ int main(int argc, const char *argv[]) {
}
sist2_scan(scan_args);
} else if (strcmp(argv[0], "index") == 0) {
}
else if (strcmp(argv[0], "index") == 0) {
int err = index_args_validate(index_args, argc, argv);
if (err != 0) {
@@ -263,11 +302,18 @@ int main(int argc, const char *argv[]) {
}
sist2_web(web_args);
} else {
}
else {
fprintf(stderr, "Invalid command: '%s'\n", argv[0]);
argparse_usage(&argparse);
return 1;
}
printf("\n");
scan_args_destroy(scan_args);
index_args_destroy(index_args);
web_args_destroy(web_args);
return 0;
}

157
src/parsing/arc.c Normal file

@@ -0,0 +1,157 @@
#include "arc.h"
#include "src/ctx.h"
#define ARC_BUF_SIZE 8192
int should_parse_filtered_file(const char *filepath, int ext) {
char tmp[PATH_MAX * 2];
if (ext == 0) {
return FALSE;
}
memcpy(tmp, filepath, ext - 1);
*(tmp + ext - 1) = '\0';
char *idx = strrchr(tmp, '.');
if (idx == NULL) {
return FALSE;
}
if (strcmp(idx, ".tar") == 0) {
return TRUE;
}
return FALSE;
}
int arc_read(struct vfile *f, void *buf, size_t size) {
return archive_read_data(f->arc, buf, size);
}
typedef struct arc_data {
vfile_t *f;
char buf[ARC_BUF_SIZE];
} arc_data_f;
int vfile_open_callback(struct archive *a, void *user_data) {
arc_data_f *data = user_data;
if (data->f->is_fs_file && data->f->fd == -1) {
data->f->fd = open(data->f->filepath, O_RDONLY);
}
return ARCHIVE_OK;
}
long vfile_read_callback(struct archive *a, void *user_data, const void **buf) {
arc_data_f *data = user_data;
*buf = data->buf;
return data->f->read(data->f, data->buf, ARC_BUF_SIZE);
}
int vfile_close_callback(struct archive *a, void *user_data) {
arc_data_f *data = user_data;
if (data->f->close != NULL) {
data->f->close(data->f);
}
return ARCHIVE_OK;
}
void parse_archive(vfile_t *f, document_t *doc) {
struct archive *a;
struct archive_entry *entry;
arc_data_f data;
data.f = f;
int ret = 0;
if (data.f->is_fs_file) {
a = archive_read_new();
archive_read_support_filter_all(a);
archive_read_support_format_all(a);
ret = archive_read_open_filename(a, doc->filepath, ARC_BUF_SIZE);
} else if (ScanCtx.archive_mode == ARC_MODE_RECURSE) {
a = archive_read_new();
archive_read_support_filter_all(a);
archive_read_support_format_all(a);
ret = archive_read_open(
a, &data,
vfile_open_callback,
vfile_read_callback,
vfile_close_callback
);
} else {
return;
}
if (ret != ARCHIVE_OK) {
LOG_ERRORF(doc->filepath, "(arc.c) [%d] %s", ret, archive_error_string(a))
archive_read_free(a);
return;
}
if (ScanCtx.archive_mode == ARC_MODE_LIST) {
dyn_buffer_t buf = dyn_buffer_create();
while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
if (S_ISREG(archive_entry_stat(entry)->st_mode)) {
char *path = (char *) archive_entry_pathname(entry);
dyn_buffer_append_string(&buf, path);
dyn_buffer_write_char(&buf, '\n');
}
}
dyn_buffer_write_char(&buf, '\0');
meta_line_t *meta_list = malloc(sizeof(meta_line_t) + buf.cur);
meta_list->key = MetaContent;
strcpy(meta_list->strval, buf.buf);
APPEND_META(doc, meta_list);
dyn_buffer_destroy(&buf);
} else {
parse_job_t *sub_job = malloc(sizeof(parse_job_t) + PATH_MAX * 2);
sub_job->vfile.close = NULL;
sub_job->vfile.read = arc_read;
sub_job->vfile.arc = a;
sub_job->vfile.filepath = sub_job->filepath;
sub_job->vfile.is_fs_file = FALSE;
memcpy(sub_job->parent, doc->uuid, sizeof(uuid_t));
while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
sub_job->info = *archive_entry_stat(entry);
if (S_ISREG(sub_job->info.st_mode)) {
sprintf(sub_job->filepath, "%s#/%s", f->filepath, archive_entry_pathname(entry));
sub_job->base = (int) (strrchr(sub_job->filepath, '/') - sub_job->filepath) + 1;
char *p = strrchr(sub_job->filepath, '.');
if (p != NULL) {
sub_job->ext = (int) (p - sub_job->filepath + 1);
} else {
sub_job->ext = (int) strlen(sub_job->filepath);
}
parse(sub_job);
}
}
free(sub_job);
}
archive_read_free(a);
}
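
The `base` and `ext` fields that `parse_archive` fills in for each archive entry are offsets into the `archive#/entry` path. The following is a minimal sketch of that computation, not the sist2 code itself: `path_offsets_t` and `path_offsets` are hypothetical names, and unlike the hunk above the sketch looks for the dot only inside the basename, so a dot in a directory or archive name is not mistaken for an extension.

```c
#include <string.h>

/* Hypothetical container for the two offsets (not a sist2 type). */
typedef struct {
    int base; /* index of the first character of the file name */
    int ext;  /* index of the first character after the last '.', or strlen if none */
} path_offsets_t;

/* Compute basename and extension offsets for a path such as "a.tar#/dir/file.txt". */
static path_offsets_t path_offsets(const char *filepath) {
    path_offsets_t o;
    const char *slash = strrchr(filepath, '/');
    o.base = slash != NULL ? (int) (slash - filepath) + 1 : 0;

    /* Assumption: only a dot inside the basename counts as an extension separator. */
    const char *dot = strrchr(filepath + o.base, '.');
    o.ext = dot != NULL ? (int) (dot - filepath) + 1 : (int) strlen(filepath);
    return o;
}
```

With this convention, `filepath + ext` is the extension without the dot, and an entry without an extension yields an empty string there.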

12
src/parsing/arc.h Normal file

@@ -0,0 +1,12 @@
#ifndef SIST2_ARC_H
#define SIST2_ARC_H
#include "src/sist.h"
int should_parse_filtered_file(const char *filepath, int ext);
void parse_archive(vfile_t *f, document_t *doc);
int arc_read(struct vfile * f, void *buf, size_t size);
#endif

129
src/parsing/doc.c Normal file

@@ -0,0 +1,129 @@
#include "doc.h"
#include "src/ctx.h"
int dump_text(mceTextReader_t *reader, dyn_buffer_t *buf) {
mce_skip_attributes(reader);
xmlErrorPtr err = xmlGetLastError();
if (err != NULL) {
if (err->level == XML_ERR_FATAL) {
LOG_ERRORF("doc.c", "Got fatal XML error while parsing document: %s", err->message)
return -1;
} else {
LOG_ERRORF("doc.c", "Got recoverable XML error while parsing document: %s", err->message)
}
}
mce_start_children(reader) {
mce_start_element(reader, NULL, _X("t")) {
mce_skip_attributes(reader);
mce_start_children(reader) {
mce_start_text(reader) {
char *str = (char *) xmlTextReaderConstValue(reader->reader);
dyn_buffer_append_string(buf, str);
dyn_buffer_write_char(buf, ' ');
} mce_end_text(reader);
} mce_end_children(reader);
} mce_end_element(reader);
mce_start_element(reader, NULL, NULL) {
int ret = dump_text(reader, buf);
if (ret != 0) {
return ret;
}
} mce_end_element(reader);
} mce_end_children(reader)
return 0;
}
__always_inline
int should_read_part(opcPart part) {
char *part_name = (char *) part;
if (part == NULL) {
return FALSE;
}
if ( // Word
strcmp(part_name, "word/document.xml") == 0
|| strncmp(part_name, "word/footer", sizeof("word/footer") - 1) == 0
|| strncmp(part_name, "word/header", sizeof("word/header") - 1) == 0
// PowerPoint
|| strncmp(part_name, "ppt/slides/slide", sizeof("ppt/slides/slide") - 1) == 0
|| strncmp(part_name, "ppt/notesSlides/notesSlide", sizeof("ppt/notesSlides/notesSlide") - 1) == 0
// Excel
|| strncmp(part_name, "xl/worksheets/sheet", sizeof("xl/worksheets/sheet") - 1) == 0
|| strcmp(part_name, "xl/sharedStrings.xml") == 0
|| strcmp(part_name, "xl/workbook.xml") == 0
) {
return TRUE;
}
return FALSE;
}
__always_inline
int read_part(opcContainer *c, dyn_buffer_t *buf, opcPart part, document_t *doc) {
mceTextReader_t reader;
int ret = opcXmlReaderOpen(c, &reader, part, NULL, "UTF-8", XML_PARSE_NOWARNING | XML_PARSE_NOERROR | XML_PARSE_NONET);
if (ret != OPC_ERROR_NONE) {
LOG_ERRORF(doc->filepath, "(doc.c) opcXmlReaderOpen() returned error code %d", ret);
return -1;
}
mce_start_document(&reader) {
mce_start_element(&reader, NULL, NULL) {
ret = dump_text(&reader, buf);
if (ret != 0) {
mceTextReaderCleanup(&reader);
return -1;
}
} mce_end_element(&reader);
} mce_end_document(&reader);
mceTextReaderCleanup(&reader);
return 0;
}
void parse_doc(void *mem, size_t mem_len, document_t *doc) {
if (mem == NULL) {
return;
}
opcContainer *c = opcContainerOpenMem(mem, mem_len, OPC_OPEN_READ_ONLY, NULL);
if (c == NULL) {
LOG_ERROR(doc->filepath, "(doc.c) Couldn't open document with opcContainerOpenMem()");
return;
}
dyn_buffer_t buf = dyn_buffer_create();
opcPart part = opcPartGetFirst(c);
do {
if (should_read_part(part)) {
int ret = read_part(c, &buf, part, doc);
if (ret != 0) {
break;
}
}
} while ((part = opcPartGetNext(c, part)));
opcContainerClose(c, OPC_CLOSE_NOW);
if (buf.cur > 0) {
dyn_buffer_write_char(&buf, '\0');
meta_line_t *meta = malloc(sizeof(meta_line_t) + buf.cur);
meta->key = MetaContent;
strcpy(meta->strval, buf.buf);
APPEND_META(doc, meta)
}
dyn_buffer_destroy(&buf);
}
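
`should_read_part` relies on the `strncmp(s, "literal", sizeof("literal") - 1)` idiom to test for a prefix without counting characters by hand. A small sketch of the same idea follows, assuming a hypothetical `HAS_PREFIX` macro and `is_word_text_part` helper that are not part of sist2; the macro only works with string literals, because `sizeof` on a pointer would give the pointer size.

```c
#include <string.h>

/* Hypothetical helper macro: lit must be a string literal so that
 * sizeof(lit) - 1 is its length without the terminating NUL. */
#define HAS_PREFIX(s, lit) (strncmp((s), (lit), sizeof(lit) - 1) == 0)

/* Returns 1 for the Word parts listed in the hunk above, 0 otherwise. */
static int is_word_text_part(const char *part_name) {
    if (part_name == NULL) {
        return 0;
    }
    return strcmp(part_name, "word/document.xml") == 0
           || HAS_PREFIX(part_name, "word/footer")
           || HAS_PREFIX(part_name, "word/header");
}
```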

8
src/parsing/doc.h Normal file

@@ -0,0 +1,8 @@
#ifndef SIST2_DOC_H
#define SIST2_DOC_H
#include "src/sist.h"
void parse_doc(void *buf, size_t buf_len, document_t *doc);
#endif


@@ -1,11 +1,9 @@
#include "font.h"
#include "ft2build.h"
#include "freetype/freetype.h"
#include "src/ctx.h"
__thread FT_Library library = NULL;
__thread FT_Library ft_lib = NULL;
typedef struct text_dimensions {
@@ -15,12 +13,12 @@ typedef struct text_dimensions {
} text_dimensions_t;
typedef struct glyph {
unsigned int top;
unsigned int height;
unsigned int width;
unsigned int descent;
unsigned int ascent;
unsigned int advance_width;
int top;
int height;
int width;
int descent;
int ascent;
int advance_width;
unsigned char *pixmap;
} glyph_t;
@@ -39,10 +37,10 @@ glyph_t ft_glyph_to_glyph(FT_GlyphSlot slot) {
glyph.pixmap = slot->bitmap.buffer;
glyph.width = slot->bitmap.width;
glyph.height = slot->bitmap.rows;
glyph.width = (int) slot->bitmap.width;
glyph.height = (int) slot->bitmap.rows;
glyph.top = slot->bitmap_top;
glyph.advance_width = slot->advance.x / 64;
glyph.advance_width = (int) slot->advance.x / 64;
glyph.descent = MAX(0, glyph.height - glyph.top);
glyph.ascent = MAX(0, MAX(glyph.top, glyph.height) - glyph.descent);
@@ -50,10 +48,6 @@ glyph_t ft_glyph_to_glyph(FT_GlyphSlot slot) {
return glyph;
}
__always_inline
glyph_t get_glyph(char character, FT_Face face) {
}
text_dimensions_t text_dimension(char *text, FT_Face face) {
text_dimensions_t dimensions;
@@ -62,7 +56,7 @@ text_dimensions_t text_dimension(char *text, FT_Face face) {
int num_chars = (int) strlen(text);
unsigned int max_ascent = 0;
unsigned int max_descent = 0;
int max_descent = 0;
char pc = 0;
for (int i = 0; i < num_chars; i++) {
@@ -72,7 +66,7 @@ text_dimensions_t text_dimension(char *text, FT_Face face) {
glyph_t glyph = ft_glyph_to_glyph(face->glyph);
max_descent = MAX(max_descent, glyph.descent);
max_ascent = MAX(max_ascent, glyph.ascent);
max_ascent = MAX(max_ascent, MAX(glyph.height, glyph.ascent));
int kerning_x = kerning_offset(c, pc, face);
dimensions.width += MAX(glyph.advance_width, glyph.width) + kerning_x;
@@ -143,20 +137,28 @@ void bmp_format(dyn_buffer_t *buf, text_dimensions_t dimensions, const unsigned
}
void parse_font(const char *buf, size_t buf_len, document_t *doc) {
if (library == NULL) {
FT_Init_FreeType(&library);
if (ft_lib == NULL) {
FT_Init_FreeType(&ft_lib);
}
if (buf == NULL) {
return;
}
FT_Face face;
FT_Error err = FT_New_Memory_Face(library, (unsigned char *) buf, buf_len, 0, &face);
FT_Error err = FT_New_Memory_Face(ft_lib, (unsigned char *) buf, buf_len, 0, &face);
if (err != 0) {
LOG_ERRORF(doc->filepath, "(font.c) FT_New_Memory_Face() returned error code [%d] %s", err, ft_error_string(err));
return;
}
char font_name[1024];
if (face->style_name == NULL || *(face->style_name) == '?') {
strcpy(font_name, face->family_name);
if (face->family_name == NULL) {
strcpy(font_name, "(null)");
} else {
strcpy(font_name, face->family_name);
}
} else {
snprintf(font_name, sizeof(font_name), "%s %s", face->family_name, face->style_name);
}
@@ -166,11 +168,16 @@ void parse_font(const char *buf, size_t buf_len, document_t *doc) {
strcpy(meta_name->strval, font_name);
APPEND_META(doc, meta_name)
if (ScanCtx.tn_size <= 0) {
return;
}
int pixel = 64;
int num_chars = (int) strlen(font_name);
err = FT_Set_Pixel_Sizes(face, 0, pixel);
if (err != 0) {
LOG_WARNINGF(doc->filepath, "(font.c) FT_Set_Pixel_Sizes() returned error code [%d] %s", err, ft_error_string(err))
return;
}
@@ -186,11 +193,19 @@ void parse_font(const char *buf, size_t buf_len, document_t *doc) {
err = FT_Load_Char(face, c, FT_LOAD_NO_HINTING | FT_LOAD_RENDER);
if (err != 0) {
continue;
c = c >= 'a' && c <= 'z' ? c - 32 : c + 32;
err = FT_Load_Char(face, c, FT_LOAD_NO_HINTING | FT_LOAD_RENDER);
if (err != 0) {
LOG_WARNINGF(doc->filepath, "(font.c) FT_Load_Char() returned error code [%d] %s", err, ft_error_string(err));
continue;
}
}
glyph_t glyph = ft_glyph_to_glyph(face->glyph);
pen.x += kerning_offset(c, pc, face);
if (pen.x <= 0) {
pen.x = ABS(glyph.advance_width - glyph.width);
}
pen.y = dimensions.height - glyph.ascent - dimensions.baseline;
draw_glyph(&glyph, pen.x, pen.y, dimensions, bitmap);


@@ -1,6 +1,10 @@
#include "src/sist.h"
#include "src/ctx.h"
#define MIN_SIZE 32
#define AVIO_BUF_SIZE 8192
__always_inline
AVCodecContext *alloc_jpeg_encoder(int dstW, int dstH, float qscale) {
AVCodec *jpeg_codec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);
@@ -22,8 +26,8 @@ AVCodecContext *alloc_jpeg_encoder(int dstW, int dstH, float qscale) {
return jpeg;
}
__always_inline
AVFrame *scale_frame(const AVCodecContext *decoder, const AVFrame *frame, int size) {
AVFrame *scaled_frame = av_frame_alloc();
int dstW;
int dstH;
@@ -41,16 +45,22 @@ AVFrame *scale_frame(const AVCodecContext *decoder, const AVFrame *frame, int si
}
}
if (dstW <= MIN_SIZE || dstH <= MIN_SIZE) {
return NULL;
}
AVFrame *scaled_frame = av_frame_alloc();
struct SwsContext *ctx = sws_getContext(
decoder->width, decoder->height, decoder->pix_fmt,
dstW, dstH, AV_PIX_FMT_YUVJ420P,
SWS_FAST_BILINEAR, 0, 0, 0
);
int dst_buf_len = avpicture_get_size(AV_PIX_FMT_YUVJ420P, dstW, dstH);
int dst_buf_len = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, dstW, dstH, 1);
uint8_t *dst_buf = (uint8_t *) av_malloc(dst_buf_len);
avpicture_fill((AVPicture *) scaled_frame, dst_buf, AV_PIX_FMT_YUVJ420P, dstW, dstH);
av_image_fill_arrays(scaled_frame->data, scaled_frame->linesize, dst_buf, AV_PIX_FMT_YUV420P, dstW, dstH, 1);
sws_scale(ctx,
(const uint8_t *const *) frame->data, frame->linesize,
@@ -67,7 +77,8 @@ AVFrame *scale_frame(const AVCodecContext *decoder, const AVFrame *frame, int si
return scaled_frame;
}
AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int stream_idx) {
__always_inline
AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int stream_idx, document_t *doc) {
AVFrame *frame = av_frame_alloc();
AVPacket avPacket;
@@ -81,7 +92,10 @@ AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int st
if (read_frame_ret != 0) {
if (read_frame_ret != AVERROR_EOF) {
fprintf(stderr, "Error reading frame: %s\n", av_err2str(read_frame_ret));
LOG_WARNINGF(doc->filepath,
"(media.c) avcodec_read_frame() returned error code [%d] %s",
read_frame_ret, av_err2str(read_frame_ret)
)
}
av_frame_free(&frame);
av_packet_unref(&avPacket);
@@ -99,7 +113,10 @@ AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int st
// Feed it to decoder
int decode_ret = avcodec_send_packet(decoder, &avPacket);
if (decode_ret != 0) {
printf("Error decoding frame: %s\n", av_err2str(decode_ret));
LOG_WARNINGF(doc->filepath,
"(media.c) avcodec_send_packet() returned error code [%d] %s",
decode_ret, av_err2str(decode_ret)
)
}
av_packet_unref(&avPacket);
receive_ret = avcodec_receive_frame(decoder, frame);
@@ -107,63 +124,103 @@ AVFrame *read_frame(AVFormatContext *pFormatCtx, AVCodecContext *decoder, int st
return frame;
}
#define APPEND_TAG_META(doc, tag_, keyname) \
text_buffer_t tex = text_buffer_create(-1); \
text_buffer_append_string0(&tex, tag_->value); \
text_buffer_terminate_string(&tex); \
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + tex.dyn_buffer.cur); \
meta_tag->key = keyname; \
strcpy(meta_tag->strval, tex.dyn_buffer.buf); \
APPEND_META(doc, meta_tag) \
text_buffer_destroy(&tex);
__always_inline
void append_audio_meta(AVFormatContext *pFormatCtx, document_t *doc) {
AVDictionaryEntry *tag = NULL;
while ((tag = av_dict_get(pFormatCtx->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
char *key = tag->key;
for (; *key; ++key) *key = (char) tolower(*key);
char key[32];
strncpy(key, tag->key, sizeof(key) - 1);
key[sizeof(key) - 1] = '\0';
if (strcmp(tag->key, "artist") == 0) {
size_t len = strlen(tag->value);
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + len);
meta_tag->key = MetaArtist;
memcpy(meta_tag->strval, tag->value, len);
APPEND_META(doc, meta_tag)
} else if (strcmp(tag->key, "genre") == 0) {
size_t len = strlen(tag->value);
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + len);
meta_tag->key = MetaGenre;
memcpy(meta_tag->strval, tag->value, len);
APPEND_META(doc, meta_tag)
} else if (strcmp(tag->key, "title") == 0) {
size_t len = strlen(tag->value);
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + len);
meta_tag->key = MetaTitle;
memcpy(meta_tag->strval, tag->value, len);
APPEND_META(doc, meta_tag)
} else if (strcmp(tag->key, "album_artist") == 0) {
size_t len = strlen(tag->value);
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + len);
meta_tag->key = MetaAlbumArtist;
memcpy(meta_tag->strval, tag->value, len);
APPEND_META(doc, meta_tag)
} else if (strcmp(tag->key, "album") == 0) {
size_t len = strlen(tag->value);
meta_line_t *meta_tag = malloc(sizeof(meta_line_t) + len);
meta_tag->key = MetaAlbum;
memcpy(meta_tag->strval, tag->value, len);
APPEND_META(doc, meta_tag)
char *ptr = key;
for (; *ptr; ++ptr) *ptr = (char) tolower(*ptr);
if (strcmp(key, "artist") == 0) {
APPEND_TAG_META(doc, tag, MetaArtist)
} else if (strcmp(key, "genre") == 0) {
APPEND_TAG_META(doc, tag, MetaGenre)
} else if (strcmp(key, "title") == 0) {
APPEND_TAG_META(doc, tag, MetaTitle)
} else if (strcmp(key, "album_artist") == 0) {
APPEND_TAG_META(doc, tag, MetaAlbumArtist)
} else if (strcmp(key, "album") == 0) {
APPEND_TAG_META(doc, tag, MetaAlbum)
}
}
}
void parse_media(const char *filepath, document_t *doc) {
__always_inline
void
append_video_meta(AVFormatContext *pFormatCtx, AVFrame *frame, document_t *doc, int include_audio_tags, int is_video) {
if (is_video) {
meta_line_t *meta_duration = malloc(sizeof(meta_line_t));
meta_duration->key = MetaMediaDuration;
meta_duration->longval = pFormatCtx->duration / AV_TIME_BASE;
APPEND_META(doc, meta_duration)
meta_line_t *meta_bitrate = malloc(sizeof(meta_line_t));
meta_bitrate->key = MetaMediaBitrate;
meta_bitrate->longval = pFormatCtx->bit_rate;
APPEND_META(doc, meta_bitrate)
}
AVDictionaryEntry *tag = NULL;
if (is_video) {
while ((tag = av_dict_get(pFormatCtx->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
if (include_audio_tags && strcmp(tag->key, "title") == 0) {
APPEND_TAG_META(doc, tag, MetaTitle)
} else if (strcmp(tag->key, "comment") == 0) {
APPEND_TAG_META(doc, tag, MetaContent)
} else if (include_audio_tags && strcmp(tag->key, "artist") == 0) {
APPEND_TAG_META(doc, tag, MetaArtist)
}
}
} else {
// EXIF metadata
while ((tag = av_dict_get(frame->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
if (include_audio_tags && strcmp(tag->key, "Artist") == 0) {
APPEND_TAG_META(doc, tag, MetaArtist)
} else if (strcmp(tag->key, "ImageDescription") == 0) {
APPEND_TAG_META(doc, tag, MetaContent)
} else if (strcmp(tag->key, "Make") == 0) {
APPEND_TAG_META(doc, tag, MetaExifMake)
} else if (strcmp(tag->key, "Model") == 0) {
APPEND_TAG_META(doc, tag, MetaExifModel)
} else if (strcmp(tag->key, "Software") == 0) {
APPEND_TAG_META(doc, tag, MetaExifSoftware)
} else if (strcmp(tag->key, "FNumber") == 0) {
APPEND_TAG_META(doc, tag, MetaExifFNumber)
} else if (strcmp(tag->key, "FocalLength") == 0) {
APPEND_TAG_META(doc, tag, MetaExifFocalLength)
} else if (strcmp(tag->key, "UserComment") == 0) {
APPEND_TAG_META(doc, tag, MetaExifUserComment)
} else if (strcmp(tag->key, "ISOSpeedRatings") == 0) {
APPEND_TAG_META(doc, tag, MetaExifIsoSpeedRatings)
} else if (strcmp(tag->key, "ExposureTime") == 0) {
APPEND_TAG_META(doc, tag, MetaExifExposureTime)
} else if (strcmp(tag->key, "DateTime") == 0) {
APPEND_TAG_META(doc, tag, MetaExifDateTime)
}
}
}
}
void parse_media(AVFormatContext *pFormatCtx, document_t *doc) {
int video_stream = -1;
int audio_stream = -1;
AVFormatContext *pFormatCtx = avformat_alloc_context();
if (pFormatCtx == NULL) {
fprintf(stderr, "Could not allocate AVFormatContext! %s \n", filepath);
return;
}
int res = avformat_open_input(&pFormatCtx, filepath, NULL, NULL);
if (res < 0) {
printf("ERR%s %s\n", filepath, av_err2str(res));
return;
}
avformat_find_stream_info(pFormatCtx, NULL);
for (int i = (int) pFormatCtx->nb_streams - 1; i >= 0; i--) {
@@ -202,23 +259,10 @@ void parse_media(const char *filepath, document_t *doc) {
}
}
if (video_stream != -1) {
if (video_stream != -1 && ScanCtx.tn_size > 0) {
AVStream *stream = pFormatCtx->streams[video_stream];
if (stream->nb_frames > 1) {
//This is a video (not a still image)
meta_line_t *meta_duration = malloc(sizeof(meta_line_t));
meta_duration->key = MetaMediaDuration;
meta_duration->longval = pFormatCtx->duration / AV_TIME_BASE;
APPEND_META(doc, meta_duration)
meta_line_t *meta_bitrate = malloc(sizeof(meta_line_t));
meta_bitrate->key = MetaMediaBitrate;
meta_bitrate->intval = pFormatCtx->bit_rate;
APPEND_META(doc, meta_bitrate)
}
if (stream->codecpar->width <= 20 || stream->codecpar->height <= 20) {
if (stream->codecpar->width <= MIN_SIZE || stream->codecpar->height <= MIN_SIZE) {
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
return;
@@ -242,7 +286,7 @@ void parse_media(const char *filepath, document_t *doc) {
}
}
AVFrame *frame = read_frame(pFormatCtx, decoder, video_stream);
AVFrame *frame = read_frame(pFormatCtx, decoder, video_stream, doc);
if (frame == NULL) {
avcodec_free_context(&decoder);
avformat_close_input(&pFormatCtx);
@@ -250,9 +294,19 @@ void parse_media(const char *filepath, document_t *doc) {
return;
}
append_video_meta(pFormatCtx, frame, doc, audio_stream == -1, stream->nb_frames > 1);
// Scale frame
AVFrame *scaled_frame = scale_frame(decoder, frame, ScanCtx.tn_size);
if (scaled_frame == NULL) {
av_frame_free(&frame);
avcodec_free_context(&decoder);
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
return;
}
// Encode frame to jpeg
AVCodecContext *jpeg_encoder = alloc_jpeg_encoder(scaled_frame->width, scaled_frame->height, ScanCtx.tn_qscale);
avcodec_send_frame(jpeg_encoder, scaled_frame);
@@ -262,7 +316,8 @@ void parse_media(const char *filepath, document_t *doc) {
avcodec_receive_packet(jpeg_encoder, &jpeg_packet);
// Save thumbnail
store_write(ScanCtx.index.store, (char *) doc->uuid, sizeof(doc->uuid), (char *) jpeg_packet.data, jpeg_packet.size);
store_write(ScanCtx.index.store, (char *) doc->uuid, sizeof(doc->uuid), (char *) jpeg_packet.data,
jpeg_packet.size);
av_packet_unref(&jpeg_packet);
av_frame_free(&frame);
@@ -276,3 +331,69 @@ void parse_media(const char *filepath, document_t *doc) {
avformat_free_context(pFormatCtx);
}
void parse_media_filename(const char *filepath, document_t *doc) {
AVFormatContext *pFormatCtx = avformat_alloc_context();
if (pFormatCtx == NULL) {
LOG_ERROR(doc->filepath, "(media.c) Could not allocate context with avformat_alloc_context()")
return;
}
int res = avformat_open_input(&pFormatCtx, filepath, NULL, NULL);
if (res < 0) {
LOG_ERRORF(doc->filepath, "(media.c) avformat_open_input() returned [%d] %s", res, av_err2str(res))
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
return;
}
parse_media(pFormatCtx, doc);
}
int vfile_read(void *ptr, uint8_t *buf, int buf_size) {
struct vfile *f = ptr;
int ret = f->read(f, buf, buf_size);
if (ret == 0) {
return AVERROR_EOF;
}
return ret;
}
void parse_media_vfile(struct vfile *f, document_t *doc) {
AVFormatContext *pFormatCtx = avformat_alloc_context();
if (pFormatCtx == NULL) {
LOG_ERROR(doc->filepath, "(media.c) Could not allocate context with avformat_alloc_context()")
return;
}
unsigned char *buffer = (unsigned char *) av_malloc(AVIO_BUF_SIZE);
AVIOContext *io_ctx = avio_alloc_context(buffer, AVIO_BUF_SIZE, 0, f, vfile_read, NULL, NULL);
pFormatCtx->pb = io_ctx;
pFormatCtx->flags |= AVFMT_FLAG_CUSTOM_IO;
int res = avformat_open_input(&pFormatCtx, "", NULL, NULL);
if (res == -5) {
// Tried to parse media that requires seek
av_free(io_ctx->buffer);
avio_context_free(&io_ctx);
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
return;
} else if (res < 0) {
LOG_ERRORF(doc->filepath, "(media.c) avformat_open_input() returned [%d] %s", res, av_err2str(res))
av_free(io_ctx->buffer);
avio_context_free(&io_ctx);
avformat_close_input(&pFormatCtx);
avformat_free_context(pFormatCtx);
return;
}
parse_media(pFormatCtx, doc);
av_free(io_ctx->buffer);
avio_context_free(&io_ctx);
}


@@ -5,7 +5,10 @@
#include "src/sist.h"
#define MIN_VIDEO_SIZE 1024 * 64
#define MIN_IMAGE_SIZE 1024 * 2
void parse_media(const char * filepath, document_t *doc);
void parse_media_filename(const char * filepath, document_t *doc);
void parse_media_vfile(struct vfile *f, document_t *doc);
#endif


@@ -1,10 +1,12 @@
#include "mime.h"
unsigned int mime_get_mime_by_ext(GHashTable *ext_table, const char * ext) {
char lower[64];
char lower[8];
char *p = lower;
while ((*ext)) {
int cnt = 0;
while ((*ext) != '\0' && cnt + 1 < sizeof(lower)) {
*p++ = (char)tolower(*ext++);
cnt++;
}
*p = '\0';
return (size_t) g_hash_table_lookup(ext_table, lower);
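
The hunk above bounds the lowercase copy by the size of `lower`, which appears to be the overflow fix named in the commit log. The same pattern can be sketched as a standalone helper; `lower_copy` is a hypothetical name used for illustration and does not exist in sist2.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst as lowercase, writing at most
 * dst_size - 1 characters followed by a terminating NUL.
 * Returns the number of characters written, excluding the NUL. */
static size_t lower_copy(char *dst, size_t dst_size, const char *src) {
    size_t n = 0;
    if (dst_size == 0) {
        return 0;
    }
    while (src[n] != '\0' && n + 1 < dst_size) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        dst[n] = (char) tolower((unsigned char) src[n]);
        n++;
    }
    dst[n] = '\0';
    return n;
}
```

An extension longer than the buffer is silently truncated, so it will simply fail to match any key in the lookup table rather than overrun the stack.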


@@ -8,7 +8,7 @@
#define MIME_EMPTY 1
#define DONT_PARSE 0x80000000
#define SHOULD_PARSE(mime_id) (mime_id & DONT_PARSE) != DONT_PARSE
#define SHOULD_PARSE(mime_id) (mime_id & DONT_PARSE) != DONT_PARSE && mime_id != 0
#define PDF_MASK 0x40000000
#define IS_PDF(mime_id) (mime_id & PDF_MASK) == PDF_MASK
@@ -16,6 +16,15 @@
#define FONT_MASK 0x20000000
#define IS_FONT(mime_id) (mime_id & FONT_MASK) == FONT_MASK
#define ARC_MASK 0x10000000
#define IS_ARC(mime_id) (mime_id & ARC_MASK) == ARC_MASK
#define ARC_FILTER_MASK 0x08000000
#define IS_ARC_FILTER(mime_id) (mime_id & ARC_FILTER_MASK) == ARC_FILTER_MASK
#define DOC_MASK 0x04000000
#define IS_DOC(mime_id) (mime_id & DOC_MASK) == DOC_MASK
enum major_mime {
MimeInvalid = 0,
MimeModel = 1,
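
The flag macros in this header expand to unparenthesised expressions, so a call site such as `!IS_ARC(x)` or `SHOULD_PARSE(x) || y` may not group the way it reads. The sketch below shows fully parenthesised equivalents; the `_SAFE` names and the wrapper functions are assumptions for illustration and are not part of the sist2 source.

```c
#include <stdint.h>

#define DONT_PARSE 0x80000000u
#define ARC_MASK   0x10000000u

/* Parenthesised variants: both the argument and the whole expansion are
 * wrapped, so the macros compose safely with !, && and ||. */
#define SHOULD_PARSE_SAFE(mime_id) \
    ((((mime_id) & DONT_PARSE) != DONT_PARSE) && ((mime_id) != 0))
#define IS_ARC_SAFE(mime_id) (((mime_id) & ARC_MASK) == ARC_MASK)

/* Thin function wrappers, convenient for testing. */
static int should_parse(uint32_t mime_id) { return SHOULD_PARSE_SAFE(mime_id); }
static int is_arc(uint32_t mime_id) { return IS_ARC_SAFE(mime_id); }
```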


@@ -16,11 +16,11 @@ enum mime {
application_commonground=655368,
application_dicom=655369,
application_drafting=655370,
application_epub_zip=655371,
application_epub_zip=655371 | 0x40000000,
application_freeloader=655372,
application_futuresplash=655373,
application_groupwise=655374,
application_gzip=655375,
application_gzip=655375 | 0x08000000,
application_hta=655376,
application_i_deas=655377,
application_iges=655378,
@@ -39,331 +39,389 @@ enum mime {
application_oda=655391,
application_ogg=655392,
application_pdf=655393 | 0x40000000,
application_pgp_signature=655394,
application_pkcs7_signature=655395,
application_pkix_cert=655396,
application_postscript=655397,
application_pro_eng=655398,
application_ringing_tones=655399,
application_smil=655400,
application_solids=655401,
application_sounder=655402,
application_step=655403,
application_streamingmedia=655404,
application_vda=655405,
application_vnd_fdf=655406,
application_vnd_font_fontforge_sfd=655407,
application_vnd_hp_hpgl=655408,
application_vnd_iccprofile=655409,
application_vnd_ms_cab_compressed=655410,
application_vnd_ms_excel=655411,
application_vnd_ms_fontobject=655412,
application_vnd_ms_opentype=655413 | 0x20000000,
application_vnd_ms_pki_certstore=655414,
application_vnd_ms_pki_pko=655415,
application_vnd_ms_pki_seccat=655416,
application_vnd_ms_powerpoint=655417,
application_vnd_ms_project=655418,
application_vnd_oasis_opendocument_base=655419,
application_vnd_oasis_opendocument_formula=655420,
application_vnd_oasis_opendocument_graphics=655421,
application_vnd_oasis_opendocument_text=655422,
application_vnd_openxmlformats_officedocument_spreadsheetml_sheet=655423,
application_vnd_openxmlformats_officedocument_wordprocessingml_document=655424,
application_vnd_wap_wmlc=655425,
application_vnd_wap_wmlscriptc=655426,
application_vnd_xara=655427,
application_vocaltec_media_desc=655428,
application_vocaltec_media_file=655429,
application_winhelp=655430,
application_wordperfect=655431,
application_wordperfect6_0=655432,
application_wordperfect6_1=655433,
application_x_123=655434,
application_x_7z_compressed=655435,
application_x_aim=655436,
application_x_archive=655437,
application_x_authorware_bin=655438,
application_x_authorware_map=655439,
application_x_authorware_seg=655440,
application_x_bcpio=655441,
application_x_bittorrent=655442,
application_x_bsh=655443,
application_x_bytecode_python=655444,
application_x_bzip=655445,
application_x_bzip2=655446,
application_x_cdlink=655447,
application_x_chat=655448,
application_x_cocoa=655449,
application_x_conference=655450,
application_x_cpio=655451,
application_x_dbf=655452,
application_x_dbt=655453,
application_x_debian_package=655454,
application_x_deepv=655455,
application_x_director=655456,
application_x_dosexec=655457,
application_x_dvi=655458,
application_x_elc=655459,
application_pgp_keys=655394,
application_pgp_signature=655395,
application_pkcs7_signature=655396,
application_pkix_cert=655397,
application_postscript=655398,
application_pro_eng=655399,
application_ringing_tones=655400,
application_smil=655401,
application_solids=655402,
application_sounder=655403,
application_step=655404,
application_streamingmedia=655405,
application_vda=655406,
application_vnd_fdf=655407,
application_vnd_font_fontforge_sfd=655408,
application_vnd_hp_hpgl=655409,
application_vnd_iccprofile=655410,
application_vnd_lotus_1_2_3=655411,
application_vnd_ms_cab_compressed=655412,
application_vnd_ms_excel=655413,
application_vnd_ms_fontobject=655414,
application_vnd_ms_opentype=655415 | 0x20000000,
application_vnd_ms_pki_certstore=655416,
application_vnd_ms_pki_pko=655417,
application_vnd_ms_pki_seccat=655418,
application_vnd_ms_powerpoint=655419,
application_vnd_ms_project=655420,
application_vnd_oasis_opendocument_base=655421,
application_vnd_oasis_opendocument_formula=655422,
application_vnd_oasis_opendocument_graphics=655423,
application_vnd_oasis_opendocument_presentation=655424,
application_vnd_oasis_opendocument_spreadsheet=655425,
application_vnd_oasis_opendocument_text=655426,
application_vnd_openxmlformats_officedocument_presentationml_presentation=655427 | 0x04000000,
application_vnd_openxmlformats_officedocument_spreadsheetml_sheet=655428 | 0x04000000,
application_vnd_openxmlformats_officedocument_wordprocessingml_document=655429 | 0x04000000,
application_vnd_symbian_install=655430,
application_vnd_tcpdump_pcap=655431,
application_vnd_wap_wmlc=655432,
application_vnd_wap_wmlscriptc=655433,
application_vnd_xara=655434,
application_vocaltec_media_desc=655435,
application_vocaltec_media_file=655436,
application_warc=655437,
application_winhelp=655438,
application_wordperfect=655439,
application_wordperfect6_0=655440,
application_wordperfect6_1=655441,
application_x_123=655442,
application_x_7z_compressed=655443 | 0x10000000,
application_x_aim=655444,
application_x_apple_diskimage=655445,
application_x_arc=655446 | 0x10000000,
application_x_archive=655447,
application_x_atari_7800_rom=655448,
application_x_authorware_bin=655449,
application_x_authorware_map=655450,
application_x_authorware_seg=655451,
application_x_avira_qua=655452,
application_x_bcpio=655453,
application_x_bittorrent=655454,
application_x_bsh=655455,
application_x_bytecode_python=655456,
application_x_bzip=655457,
application_x_bzip2=655458 | 0x08000000,
application_x_cbr=655459,
application_x_cbz=655460 | 0x40000000,
application_x_cdlink=655461,
application_x_chat=655462,
application_x_chrome_extension=655463,
application_x_cocoa=655464,
application_x_conference=655465,
application_x_coredump=655466,
application_x_cpio=655467,
application_x_dbf=655468,
application_x_dbt=655469,
application_x_debian_package=655470,
application_x_deepv=655471,
application_x_director=655472,
application_x_dmp=655473,
application_x_dosdriver=655474,
application_x_dosexec=655475,
application_x_dvi=655476,
application_x_elc=655477,
application_x_empty=1,
application_x_envoy=655461,
application_x_esrehber=655462,
application_x_excel=655463,
application_x_executable=655464,
application_x_font_sfn=655465 | 0x20000000,
application_x_font_ttf=655466 | 0x20000000,
application_x_freelance=655467,
application_x_git=655468,
application_x_gsp=655469,
application_x_gss=655470,
application_x_gtar=655471,
application_x_gzip=655472,
application_x_hdf=655473,
application_x_helpfile=655474,
application_x_httpd_imap=655475,
application_x_ima=655476,
application_x_innosetup=655477,
application_x_internett_signup=655478,
application_x_inventor=655479,
application_x_ip2=655480,
application_x_java_applet=655481,
application_x_java_commerce=655482,
application_x_java_image=655483,
application_x_java_keystore=655484,
application_x_kdelnk=655485,
application_x_koan=655486,
application_x_latex=655487,
application_x_livescreen=655488,
application_x_lotus=655489,
application_x_lzh=655490,
application_x_lzx=655491,
application_x_mach_binary=655492,
application_x_mach_executable=655493,
application_x_magic_cap_package_1_0=655494,
application_x_mathcad=655495,
application_x_meme=655496,
application_x_midi=655497,
application_x_mif=655498,
application_x_mix_transfer=655499,
application_x_mobipocket_ebook=655500,
application_x_ms_pdb=655501,
application_x_ms_reader=655502,
application_x_navi_animation=655503,
application_x_navidoc=655504,
application_x_navimap=655505,
application_x_navistyle=655506,
application_x_netcdf=655507,
application_x_newton_compatible_pkg=655508,
application_x_object=655509,
application_x_omc=655510,
application_x_omcdatamaker=655511,
application_x_omcregerator=655512,
application_x_pagemaker=655513,
application_x_pcl=655514,
application_x_pixclscript=655515,
application_x_pkcs7_certreqresp=655516,
application_x_pkcs7_signature=655517,
application_x_project=655518,
application_x_qpro=655519,
application_x_rar=655520,
application_x_rpm=655521,
application_x_sdp=655522,
application_x_sea=655523,
application_x_seelogo=655524,
application_x_setupscript=655525,
application_x_shar=655526,
application_x_sharedlib=655527,
application_x_shockwave_flash=655528,
application_x_sprite=655529,
application_x_sqlite3=655530,
application_x_sv4cpio=655531,
application_x_sv4crc=655532,
application_x_tar=655533,
application_x_tbook=655534,
application_x_tex_tfm=655535,
application_x_texinfo=655536,
application_x_ustar=655537,
application_x_visio=655538,
application_x_vnd_audioexplosion_mzz=655539,
application_x_vnd_ls_xpix=655540,
application_x_vrml=655541,
application_x_wais_source=655542,
application_x_wine_extension_ini=655543,
application_x_wintalk=655544,
application_x_world=655545,
application_x_wri=655546,
application_x_x509_ca_cert=655547,
application_x_xz=655548,
application_xml=655549,
application_zip=655550,
audio_it=458943,
audio_make=458944,
audio_mid=458945,
audio_midi=458946,
audio_mp4=458947,
audio_mpeg=458948,
audio_ogg=458949,
audio_s3m=458950,
audio_tsp_audio=458951,
audio_tsplayer=458952,
audio_vnd_qcelp=458953,
audio_voxware=458954,
audio_x_flac=458955,
audio_x_gsm=458956,
audio_x_jam=458957,
audio_x_liveaudio=458958,
audio_x_m4a=458959,
audio_x_midi=458960,
audio_x_mod=458961,
audio_x_mp4a_latm=458962,
audio_x_mpeg_3=458963,
audio_x_mpequrl=458964,
audio_x_nspaudio=458965,
audio_x_pn_realaudio=458966,
audio_x_psid=458967,
audio_x_realaudio=458968,
audio_x_twinvq=458969,
audio_x_twinvq_plugin=458970,
audio_x_voc=458971,
audio_x_wav=458972,
audio_xm=458973,
font_otf=327902 | 0x20000000,
font_sfnt=327903 | 0x20000000,
font_woff=327904 | 0x20000000,
font_woff2=327905 | 0x20000000,
image_cmu_raster=524514,
image_fif=524515,
image_florian=524516,
image_g3fax=524517,
image_gif=524518,
image_ief=524519,
image_jpeg=524520,
image_jutvision=524521,
image_naplps=524522,
image_pict=524523,
image_png=524524,
image_svg=524525 | 0x80000000,
image_svg_xml=524526 | 0x80000000,
image_tiff=524527,
image_vnd_adobe_photoshop=524528 | 0x80000000,
image_vnd_djvu=524529 | 0x80000000,
image_vnd_fpx=524530,
image_vnd_microsoft_icon=524531,
image_vnd_rn_realflash=524532,
image_vnd_rn_realpix=524533,
image_vnd_wap_wbmp=524534,
image_vnd_xiff=524535,
image_webp=524536,
image_x_cmu_raster=524537,
image_x_cur=524538,
image_x_dwg=524539,
image_x_eps=524540,
image_x_exr=524541,
image_x_icns=524542,
image_x_icon=524543 | 0x80000000,
image_x_jg=524544,
image_x_jps=524545,
image_x_ms_bmp=524546,
image_x_niff=524547,
image_x_pcx=524548,
image_x_pict=524549,
image_x_portable_bitmap=524550,
image_x_portable_graymap=524551,
image_x_portable_pixmap=524552,
image_x_quicktime=524553,
image_x_rgb=524554,
image_x_tga=524555,
image_x_tiff=524556,
image_x_xcf=524557 | 0x80000000,
image_x_xpixmap=524558 | 0x80000000,
image_x_xwindowdump=524559,
message_rfc822=196880,
model_vnd_dwf=65809,
model_vnd_gdl=65810,
model_vnd_gs_gdl=65811,
model_vrml=65812,
model_x_pov=65813,
text_asp=590102,
text_css=590103,
text_html=590104,
text_javascript=590105,
text_mcf=590106,
text_pascal=590107,
text_plain=590108,
text_richtext=590109,
text_scriplet=590110,
text_tab_separated_values=590111,
text_troff=590112,
text_uri_list=590113,
text_vnd_abc=590114,
text_vnd_fmi_flexstor=590115,
text_vnd_wap_wml=590116,
text_vnd_wap_wmlscript=590117,
text_webviewhtml=590118,
text_x_Algol68=590119,
text_x_asm=590120,
text_x_audiosoft_intra=590121,
text_x_awk=590122,
text_x_bcpl=590123,
text_x_c=590124,
text_x_c__=590125,
text_x_component=590126,
text_x_diff=590127,
text_x_fortran=590128,
text_x_java=590129,
text_x_la_asf=590130,
text_x_lisp=590131,
text_x_m=590132,
text_x_m4=590133,
text_x_makefile=590134,
text_x_msdos_batch=590135,
text_x_pascal=590136,
text_x_perl=590137,
text_x_php=590138,
text_x_po=590139,
text_x_python=590140,
text_x_ruby=590141,
text_x_sass=590142,
text_x_scss=590143,
text_x_server_parsed_html=590144,
text_x_setext=590145,
text_x_sgml=590146,
text_x_shellscript=590147,
text_x_speech=590148,
text_x_tcl=590149,
text_x_tex=590150,
text_x_uil=590151,
text_x_uuencode=590152,
text_x_vcalendar=590153,
text_x_vcard=590154,
text_xml=590155,
video_animaflex=393548,
video_avi=393549,
video_avs_video=393550,
video_mp4=393551,
video_mpeg=393552,
video_quicktime=393553,
video_vdo=393554,
video_vivo=393555,
video_vnd_rn_realvideo=393556,
video_vosaic=393557,
video_webm=393558,
video_x_amt_demorun=393559,
video_x_amt_showrun=393560,
video_x_atomic3d_feature=393561,
video_x_dl=393562,
video_x_dv=393563,
video_x_fli=393564,
video_x_flv=393565,
video_x_isvideo=393566,
video_x_jng=393567 | 0x80000000,
video_x_matroska=393568,
video_x_mng=393569,
video_x_motion_jpeg=393570,
video_x_ms_asf=393571,
video_x_msvideo=393572,
video_x_qtc=393573,
video_x_sgi_movie=393574,
application_x_envoy=655479,
application_x_esrehber=655480,
application_x_excel=655481,
application_x_executable=655482,
application_x_font_gdos=655483,
application_x_font_pf2=655484,
application_x_font_pfm=655485,
application_x_font_sfn=655486,
application_x_font_ttf=655487 | 0x20000000,
application_x_freelance=655488,
application_x_gamecube_rom=655489,
application_x_gdbm=655490,
application_x_gettext_translation=655491,
application_x_git=655492,
application_x_gsp=655493,
application_x_gss=655494,
application_x_gtar=655495,
application_x_gzip=655496,
application_x_hdf=655497,
application_x_helpfile=655498,
application_x_httpd_imap=655499,
application_x_ima=655500,
application_x_innosetup=655501,
application_x_internett_signup=655502,
application_x_inventor=655503,
application_x_ip2=655504,
application_x_java_applet=655505,
application_x_java_commerce=655506,
application_x_java_image=655507,
application_x_java_jmod=655508,
application_x_java_keystore=655509,
application_x_kdelnk=655510,
application_x_koan=655511,
application_x_latex=655512,
application_x_livescreen=655513,
application_x_lotus=655514,
application_x_lz4=655515 | 0x08000000,
application_x_lz4_json=655516,
application_x_lzh=655517,
application_x_lzh_compressed=655518,
application_x_lzip=655519 | 0x08000000,
application_x_lzma=655520 | 0x08000000,
application_x_lzop=655521 | 0x08000000,
application_x_lzx=655522,
application_x_mach_binary=655523,
application_x_mach_executable=655524,
application_x_magic_cap_package_1_0=655525,
application_x_mathcad=655526,
application_x_maxis_dbpf=655527,
application_x_meme=655528,
application_x_midi=655529,
application_x_mif=655530,
application_x_mix_transfer=655531,
application_x_mobipocket_ebook=655532,
application_x_ms_compress_szdd=655533,
application_x_ms_pdb=655534,
application_x_ms_reader=655535,
application_x_msaccess=655536,
application_x_navi_animation=655537,
application_x_navidoc=655538,
application_x_navimap=655539,
application_x_navistyle=655540,
application_x_nes_rom=655541,
application_x_netcdf=655542,
application_x_newton_compatible_pkg=655543,
application_x_nintendo_ds_rom=655544,
application_x_object=655545,
application_x_omc=655546,
application_x_omcdatamaker=655547,
application_x_omcregerator=655548,
application_x_pagemaker=655549,
application_x_pcl=655550,
application_x_pgp_keyring=655551,
application_x_pixclscript=655552,
application_x_pkcs7_certreqresp=655553,
application_x_pkcs7_signature=655554,
application_x_project=655555,
application_x_qpro=655556,
application_x_rar=655557 | 0x10000000,
application_x_rpm=655558,
application_x_sdp=655559,
application_x_sea=655560,
application_x_seelogo=655561,
application_x_setupscript=655562,
application_x_shar=655563,
application_x_sharedlib=655564,
application_x_shockwave_flash=655565,
application_x_snappy_framed=655566,
application_x_sprite=655567,
application_x_sqlite3=655568,
application_x_sv4cpio=655569,
application_x_sv4crc=655570,
application_x_tar=655571 | 0x10000000,
application_x_tbook=655572,
application_x_terminfo=655573,
application_x_terminfo2=655574,
application_x_tex_tfm=655575,
application_x_texinfo=655576,
application_x_ustar=655577,
application_x_visio=655578,
application_x_vnd_audioexplosion_mzz=655579,
application_x_vnd_ls_xpix=655580,
application_x_vrml=655581,
application_x_wais_source=655582,
application_x_wine_extension_ini=655583,
application_x_wintalk=655584,
application_x_world=655585,
application_x_wri=655586,
application_x_x509_ca_cert=655587,
application_x_xz=655588 | 0x08000000,
application_x_zip=655589,
application_x_zstd=655590 | 0x08000000,
application_xml=655591,
application_zip=655592 | 0x10000000,
application_zlib=655593,
audio_it=458986,
audio_make=458987,
audio_mid=458988,
audio_midi=458989,
audio_mp4=458990,
audio_mpeg=458991,
audio_ogg=458992,
audio_s3m=458993,
audio_tsp_audio=458994,
audio_tsplayer=458995,
audio_vnd_qcelp=458996,
audio_voxware=458997,
audio_x_aiff=458998,
audio_x_flac=458999,
audio_x_gsm=459000,
audio_x_hx_aac_adts=459001,
audio_x_jam=459002,
audio_x_liveaudio=459003,
audio_x_m4a=459004,
audio_x_midi=459005,
audio_x_mod=459006,
audio_x_mp4a_latm=459007,
audio_x_mpeg_3=459008,
audio_x_mpequrl=459009,
audio_x_nspaudio=459010,
audio_x_pn_realaudio=459011,
audio_x_psid=459012,
audio_x_realaudio=459013,
audio_x_twinvq=459014,
audio_x_twinvq_plugin=459015,
audio_x_voc=459016,
audio_x_wav=459017,
audio_xm=459018,
font_otf=327947 | 0x20000000,
font_sfnt=327948 | 0x20000000,
font_woff=327949 | 0x20000000,
font_woff2=327950 | 0x20000000,
image_cmu_raster=524559,
image_fif=524560,
image_florian=524561,
image_g3fax=524562,
image_gif=524563,
image_heic=524564,
image_ief=524565,
image_jpeg=524566,
image_jutvision=524567,
image_naplps=524568,
image_pict=524569,
image_png=524570,
image_svg=524571 | 0x80000000,
image_svg_xml=524572 | 0x80000000,
image_tiff=524573,
image_vnd_adobe_photoshop=524574 | 0x80000000,
image_vnd_djvu=524575 | 0x80000000,
image_vnd_fpx=524576,
image_vnd_microsoft_icon=524577,
image_vnd_rn_realflash=524578,
image_vnd_rn_realpix=524579,
image_vnd_wap_wbmp=524580,
image_vnd_xiff=524581,
image_webp=524582,
image_wmf=524583,
image_x_3ds=524584,
image_x_cmu_raster=524585,
image_x_cur=524586,
image_x_dwg=524587,
image_x_eps=524588,
image_x_exr=524589,
image_x_gem=524590,
image_x_icns=524591,
image_x_icon=524592 | 0x80000000,
image_x_jg=524593,
image_x_jps=524594,
image_x_ms_bmp=524595,
image_x_niff=524596,
image_x_pcx=524597,
image_x_pict=524598,
image_x_portable_bitmap=524599,
image_x_portable_graymap=524600,
image_x_portable_pixmap=524601,
image_x_quicktime=524602,
image_x_rgb=524603,
image_x_tga=524604,
image_x_tiff=524605,
image_x_win_bitmap=524606,
image_x_xcf=524607 | 0x80000000,
image_x_xpixmap=524608 | 0x80000000,
image_x_xwindowdump=524609,
message_news=196930,
message_rfc822=196931,
model_vnd_dwf=65860,
model_vnd_gdl=65861,
model_vnd_gs_gdl=65862,
model_vrml=65863,
model_x_pov=65864,
text_PGP=590153,
text_asp=590154,
text_css=590155,
text_html=590156,
text_javascript=590157,
text_mcf=590158,
text_pascal=590159,
text_plain=590160,
text_richtext=590161,
text_rtf=590162,
text_scriplet=590163,
text_tab_separated_values=590164,
text_troff=590165,
text_uri_list=590166,
text_vnd_abc=590167,
text_vnd_fmi_flexstor=590168,
text_vnd_wap_wml=590169,
text_vnd_wap_wmlscript=590170,
text_webviewhtml=590171,
text_x_Algol68=590172,
text_x_asm=590173,
text_x_audiosoft_intra=590174,
text_x_awk=590175,
text_x_bcpl=590176,
text_x_c=590177,
text_x_c__=590178,
text_x_component=590179,
text_x_diff=590180,
text_x_fortran=590181,
text_x_java=590182,
text_x_la_asf=590183,
text_x_lisp=590184,
text_x_m=590185,
text_x_m4=590186,
text_x_makefile=590187,
text_x_ms_regedit=590188,
text_x_msdos_batch=590189,
text_x_objective_c=590190,
text_x_pascal=590191,
text_x_perl=590192,
text_x_php=590193,
text_x_po=590194,
text_x_python=590195,
text_x_ruby=590196,
text_x_sass=590197,
text_x_scss=590198,
text_x_server_parsed_html=590199,
text_x_setext=590200,
text_x_sgml=590201,
text_x_shellscript=590202,
text_x_speech=590203,
text_x_tcl=590204,
text_x_tex=590205,
text_x_uil=590206,
text_x_uuencode=590207,
text_x_vcalendar=590208,
text_x_vcard=590209,
text_xml=590210,
video_MP2T=393603,
video_animaflex=393604,
video_avi=393605,
video_avs_video=393606,
video_mp4=393607,
video_mpeg=393608,
video_quicktime=393609,
video_vdo=393610,
video_vivo=393611,
video_vnd_rn_realvideo=393612,
video_vosaic=393613,
video_webm=393614,
video_x_amt_demorun=393615,
video_x_amt_showrun=393616,
video_x_atomic3d_feature=393617,
video_x_dl=393618,
video_x_dv=393619,
video_x_fli=393620,
video_x_flv=393621,
video_x_isvideo=393622,
video_x_jng=393623 | 0x80000000,
video_x_m4v=393624,
video_x_matroska=393625,
video_x_mng=393626,
video_x_motion_jpeg=393627,
video_x_ms_asf=393628,
video_x_msvideo=393629,
video_x_qtc=393630,
video_x_sgi_movie=393631,
x_epoc_x_sisx_app=721312,
};
char *mime_get_mime_text(unsigned int mime_id) {switch (mime_id) {
case application_arj: return "application/arj";
@@ -622,6 +680,7 @@ case text_mcf: return "text/mcf";
case text_pascal: return "text/pascal";
case text_plain: return "text/plain";
case text_richtext: return "text/richtext";
case text_rtf: return "text/rtf";
case text_scriplet: return "text/scriplet";
case text_x_awk: return "text/x-awk";
case video_x_jng: return "video/x-jng";
@@ -724,6 +783,63 @@ case application_x_innosetup: return "application/x-innosetup";
case application_winhelp: return "application/winhelp";
case image_x_tga: return "image/x-tga";
case application_x_wine_extension_ini: return "application/x-wine-extension-ini";
case application_x_cbz: return "application/x-cbz";
case application_x_cbr: return "application/x-cbr";
case application_x_ms_compress_szdd: return "application/x-ms-compress-szdd";
case application_x_atari_7800_rom: return "application/x-atari-7800-rom";
case application_x_nes_rom: return "application/x-nes-rom";
case application_x_font_pfm: return "application/x-font-pfm";
case application_x_gettext_translation: return "application/x-gettext-translation";
case image_wmf: return "image/wmf";
case application_pgp_keys: return "application/pgp-keys";
case image_x_3ds: return "image/x-3ds";
case application_x_lz4: return "application/x-lz4";
case application_vnd_openxmlformats_officedocument_presentationml_presentation: return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
case application_vnd_oasis_opendocument_presentation: return "application/vnd.oasis.opendocument.presentation";
case application_x_msaccess: return "application/x-msaccess";
case application_vnd_oasis_opendocument_spreadsheet: return "application/vnd.oasis.opendocument.spreadsheet";
case audio_x_aiff: return "audio/x-aiff";
case text_x_ms_regedit: return "text/x-ms-regedit";
case application_x_gamecube_rom: return "application/x-gamecube-rom";
case application_x_nintendo_ds_rom: return "application/x-nintendo-ds-rom";
case text_x_objective_c: return "text/x-objective-c";
case application_x_font_gdos: return "application/x-font-gdos";
case application_x_apple_diskimage: return "application/x-apple-diskimage";
case application_x_zstd: return "application/x-zstd";
case video_x_m4v: return "video/x-m4v";
case message_news: return "message/news";
case application_vnd_symbian_install: return "application/vnd.symbian.install";
case application_x_lzh_compressed: return "application/x-lzh-compressed";
case application_x_dosdriver: return "application/x-dosdriver";
case application_vnd_tcpdump_pcap: return "application/vnd.tcpdump.pcap";
case x_epoc_x_sisx_app: return "x-epoc/x-sisx-app";
case application_x_avira_qua: return "application/x-avira-qua";
case video_MP2T: return "video/MP2T";
case application_x_snappy_framed: return "application/x-snappy-framed";
case application_x_lz4_json: return "application/x-lz4+json";
case application_x_dmp: return "application/x-dmp";
case application_zlib: return "application/zlib";
case application_x_pgp_keyring: return "application/x-pgp-keyring";
case application_x_gdbm: return "application/x-gdbm";
case application_x_font_pf2: return "application/x-font-pf2";
case application_x_zip: return "application/x-zip";
case application_x_coredump: return "application/x-coredump";
case application_x_java_jmod: return "application/x-java-jmod";
case application_x_terminfo: return "application/x-terminfo";
case application_x_terminfo2: return "application/x-terminfo2";
case application_x_arc: return "application/x-arc";
case application_vnd_lotus_1_2_3: return "application/vnd.lotus-1-2-3";
case image_x_win_bitmap: return "image/x-win-bitmap";
case application_x_maxis_dbpf: return "application/x-maxis-dbpf";
case text_PGP: return "text/PGP";
case audio_x_hx_aac_adts: return "audio/x-hx-aac-adts";
case application_x_chrome_extension: return "application/x-chrome-extension";
case image_heic: return "image/heic";
case image_x_gem: return "image/x-gem";
case application_x_lzma: return "application/x-lzma";
case application_warc: return "application/warc";
case application_x_lzip: return "application/x-lzip";
case application_x_lzop: return "application/x-lzop";
default: return NULL;}}
GHashTable *mime_get_ext_table() {GHashTable *ext_table = g_hash_table_new(g_str_hash, g_str_equal);
g_hash_table_insert(ext_table, "arj", (gpointer)application_arj);
@@ -853,6 +969,7 @@ g_hash_table_insert(ext_table, "xlt", (gpointer)application_x_excel);
g_hash_table_insert(ext_table, "xlv", (gpointer)application_x_excel);
g_hash_table_insert(ext_table, "exe", (gpointer)application_x_executable);
g_hash_table_insert(ext_table, "ttf", (gpointer)application_x_font_ttf);
g_hash_table_insert(ext_table, "ttc", (gpointer)application_x_font_ttf);
g_hash_table_insert(ext_table, "pre", (gpointer)application_x_freelance);
g_hash_table_insert(ext_table, "gsp", (gpointer)application_x_gsp);
g_hash_table_insert(ext_table, "gss", (gpointer)application_x_gss);
@@ -1073,6 +1190,11 @@ g_hash_table_insert(ext_table, "d", (gpointer)text_plain);
g_hash_table_insert(ext_table, "cs", (gpointer)text_plain);
g_hash_table_insert(ext_table, "hpp", (gpointer)text_plain);
g_hash_table_insert(ext_table, "srt", (gpointer)text_plain);
g_hash_table_insert(ext_table, "nfo", (gpointer)text_plain);
g_hash_table_insert(ext_table, "sfv", (gpointer)text_plain);
g_hash_table_insert(ext_table, "m3u", (gpointer)text_plain);
g_hash_table_insert(ext_table, "csv", (gpointer)text_plain);
g_hash_table_insert(ext_table, "eml", (gpointer)text_plain);
g_hash_table_insert(ext_table, "rt", (gpointer)text_richtext);
g_hash_table_insert(ext_table, "rtf", (gpointer)text_richtext);
g_hash_table_insert(ext_table, "rtx", (gpointer)text_richtext);
@@ -1090,7 +1212,7 @@ g_hash_table_insert(ext_table, "ms", (gpointer)text_troff);
g_hash_table_insert(ext_table, "roff", (gpointer)text_troff);
g_hash_table_insert(ext_table, "t", (gpointer)text_troff);
g_hash_table_insert(ext_table, "tr", (gpointer)text_troff);
g_hash_table_insert(ext_table, "uni", (gpointer)text_uri_list);
g_hash_table_insert(ext_table, "uji", (gpointer)text_uri_list);
g_hash_table_insert(ext_table, "unis", (gpointer)text_uri_list);
g_hash_table_insert(ext_table, "uri", (gpointer)text_uri_list);
g_hash_table_insert(ext_table, "uris", (gpointer)text_uri_list);
@@ -1171,6 +1293,7 @@ g_hash_table_insert(ext_table, "isu", (gpointer)video_x_isvideo);
g_hash_table_insert(ext_table, "mjpg", (gpointer)video_x_motion_jpeg);
g_hash_table_insert(ext_table, "asf", (gpointer)video_x_ms_asf);
g_hash_table_insert(ext_table, "asx", (gpointer)video_x_ms_asf);
g_hash_table_insert(ext_table, "wmv", (gpointer)video_x_ms_asf);
g_hash_table_insert(ext_table, "qtc", (gpointer)video_x_qtc);
g_hash_table_insert(ext_table, "movie", (gpointer)video_x_sgi_movie);
g_hash_table_insert(ext_table, "mv", (gpointer)video_x_sgi_movie);
@@ -1201,6 +1324,34 @@ g_hash_table_insert(ext_table, "djvu", (gpointer)image_vnd_djvu);
g_hash_table_insert(ext_table, "lit", (gpointer)application_x_ms_reader);
g_hash_table_insert(ext_table, "vcf", (gpointer)text_x_vcard);
g_hash_table_insert(ext_table, "hlp", (gpointer)application_winhelp);
g_hash_table_insert(ext_table, "cbz", (gpointer)application_x_cbz);
g_hash_table_insert(ext_table, "cbr", (gpointer)application_x_cbr);
g_hash_table_insert(ext_table, "fon", (gpointer)application_x_ms_compress_szdd);
g_hash_table_insert(ext_table, "a78", (gpointer)application_x_atari_7800_rom);
g_hash_table_insert(ext_table, "nes", (gpointer)application_x_nes_rom);
g_hash_table_insert(ext_table, "pfm", (gpointer)application_x_font_pfm);
g_hash_table_insert(ext_table, "3ds", (gpointer)image_x_3ds);
g_hash_table_insert(ext_table, "lz4", (gpointer)application_x_lz4);
g_hash_table_insert(ext_table, "pptx", (gpointer)application_vnd_openxmlformats_officedocument_presentationml_presentation);
g_hash_table_insert(ext_table, "odp", (gpointer)application_vnd_oasis_opendocument_presentation);
g_hash_table_insert(ext_table, "accdb", (gpointer)application_x_msaccess);
g_hash_table_insert(ext_table, "ods", (gpointer)application_vnd_oasis_opendocument_spreadsheet);
g_hash_table_insert(ext_table, "aiff", (gpointer)audio_x_aiff);
g_hash_table_insert(ext_table, "aif", (gpointer)audio_x_aiff);
g_hash_table_insert(ext_table, "reg", (gpointer)text_x_ms_regedit);
g_hash_table_insert(ext_table, "zst", (gpointer)application_x_zstd);
g_hash_table_insert(ext_table, "m4v", (gpointer)video_x_m4v);
g_hash_table_insert(ext_table, "pcap", (gpointer)application_vnd_tcpdump_pcap);
g_hash_table_insert(ext_table, "jsonlz4", (gpointer)application_x_lz4_json);
g_hash_table_insert(ext_table, "dmp", (gpointer)application_x_dmp);
g_hash_table_insert(ext_table, "z", (gpointer)application_zlib);
g_hash_table_insert(ext_table, "pf2", (gpointer)application_x_font_pf2);
g_hash_table_insert(ext_table, "jmod", (gpointer)application_x_java_jmod);
g_hash_table_insert(ext_table, "heic", (gpointer)image_heic);
g_hash_table_insert(ext_table, "lzma", (gpointer)application_x_lzma);
g_hash_table_insert(ext_table, "warc", (gpointer)application_warc);
g_hash_table_insert(ext_table, "lz", (gpointer)application_x_lzip);
g_hash_table_insert(ext_table, "lzo", (gpointer)application_x_lzop);
return ext_table;}
GHashTable *mime_get_mime_table() {GHashTable *mime_table = g_hash_table_new(g_str_hash, g_str_equal);
g_hash_table_insert(mime_table, "application/arj", (gpointer)application_arj);
@@ -1459,6 +1610,7 @@ g_hash_table_insert(mime_table, "text/mcf", (gpointer)text_mcf);
g_hash_table_insert(mime_table, "text/pascal", (gpointer)text_pascal);
g_hash_table_insert(mime_table, "text/plain", (gpointer)text_plain);
g_hash_table_insert(mime_table, "text/richtext", (gpointer)text_richtext);
g_hash_table_insert(mime_table, "text/rtf", (gpointer)text_rtf);
g_hash_table_insert(mime_table, "text/scriplet", (gpointer)text_scriplet);
g_hash_table_insert(mime_table, "text/x-awk", (gpointer)text_x_awk);
g_hash_table_insert(mime_table, "video/x-jng", (gpointer)video_x_jng);
@@ -1561,5 +1713,62 @@ g_hash_table_insert(mime_table, "application/x-innosetup", (gpointer)application
g_hash_table_insert(mime_table, "application/winhelp", (gpointer)application_winhelp);
g_hash_table_insert(mime_table, "image/x-tga", (gpointer)image_x_tga);
g_hash_table_insert(mime_table, "application/x-wine-extension-ini", (gpointer)application_x_wine_extension_ini);
g_hash_table_insert(mime_table, "application/x-cbz", (gpointer)application_x_cbz);
g_hash_table_insert(mime_table, "application/x-cbr", (gpointer)application_x_cbr);
g_hash_table_insert(mime_table, "application/x-ms-compress-szdd", (gpointer)application_x_ms_compress_szdd);
g_hash_table_insert(mime_table, "application/x-atari-7800-rom", (gpointer)application_x_atari_7800_rom);
g_hash_table_insert(mime_table, "application/x-nes-rom", (gpointer)application_x_nes_rom);
g_hash_table_insert(mime_table, "application/x-font-pfm", (gpointer)application_x_font_pfm);
g_hash_table_insert(mime_table, "application/x-gettext-translation", (gpointer)application_x_gettext_translation);
g_hash_table_insert(mime_table, "image/wmf", (gpointer)image_wmf);
g_hash_table_insert(mime_table, "application/pgp-keys", (gpointer)application_pgp_keys);
g_hash_table_insert(mime_table, "image/x-3ds", (gpointer)image_x_3ds);
g_hash_table_insert(mime_table, "application/x-lz4", (gpointer)application_x_lz4);
g_hash_table_insert(mime_table, "application/vnd.openxmlformats-officedocument.presentationml.presentation", (gpointer)application_vnd_openxmlformats_officedocument_presentationml_presentation);
g_hash_table_insert(mime_table, "application/vnd.oasis.opendocument.presentation", (gpointer)application_vnd_oasis_opendocument_presentation);
g_hash_table_insert(mime_table, "application/x-msaccess", (gpointer)application_x_msaccess);
g_hash_table_insert(mime_table, "application/vnd.oasis.opendocument.spreadsheet", (gpointer)application_vnd_oasis_opendocument_spreadsheet);
g_hash_table_insert(mime_table, "audio/x-aiff", (gpointer)audio_x_aiff);
g_hash_table_insert(mime_table, "text/x-ms-regedit", (gpointer)text_x_ms_regedit);
g_hash_table_insert(mime_table, "application/x-gamecube-rom", (gpointer)application_x_gamecube_rom);
g_hash_table_insert(mime_table, "application/x-nintendo-ds-rom", (gpointer)application_x_nintendo_ds_rom);
g_hash_table_insert(mime_table, "text/x-objective-c", (gpointer)text_x_objective_c);
g_hash_table_insert(mime_table, "application/x-font-gdos", (gpointer)application_x_font_gdos);
g_hash_table_insert(mime_table, "application/x-apple-diskimage", (gpointer)application_x_apple_diskimage);
g_hash_table_insert(mime_table, "application/x-zstd", (gpointer)application_x_zstd);
g_hash_table_insert(mime_table, "video/x-m4v", (gpointer)video_x_m4v);
g_hash_table_insert(mime_table, "message/news", (gpointer)message_news);
g_hash_table_insert(mime_table, "application/vnd.symbian.install", (gpointer)application_vnd_symbian_install);
g_hash_table_insert(mime_table, "application/x-lzh-compressed", (gpointer)application_x_lzh_compressed);
g_hash_table_insert(mime_table, "application/x-dosdriver", (gpointer)application_x_dosdriver);
g_hash_table_insert(mime_table, "application/vnd.tcpdump.pcap", (gpointer)application_vnd_tcpdump_pcap);
g_hash_table_insert(mime_table, "x-epoc/x-sisx-app", (gpointer)x_epoc_x_sisx_app);
g_hash_table_insert(mime_table, "application/x-avira-qua", (gpointer)application_x_avira_qua);
g_hash_table_insert(mime_table, "video/MP2T", (gpointer)video_MP2T);
g_hash_table_insert(mime_table, "application/x-snappy-framed", (gpointer)application_x_snappy_framed);
g_hash_table_insert(mime_table, "application/x-lz4+json", (gpointer)application_x_lz4_json);
g_hash_table_insert(mime_table, "application/x-dmp", (gpointer)application_x_dmp);
g_hash_table_insert(mime_table, "application/zlib", (gpointer)application_zlib);
g_hash_table_insert(mime_table, "application/x-pgp-keyring", (gpointer)application_x_pgp_keyring);
g_hash_table_insert(mime_table, "application/x-gdbm", (gpointer)application_x_gdbm);
g_hash_table_insert(mime_table, "application/x-font-pf2", (gpointer)application_x_font_pf2);
g_hash_table_insert(mime_table, "application/x-zip", (gpointer)application_x_zip);
g_hash_table_insert(mime_table, "application/x-coredump", (gpointer)application_x_coredump);
g_hash_table_insert(mime_table, "application/x-java-jmod", (gpointer)application_x_java_jmod);
g_hash_table_insert(mime_table, "application/x-terminfo", (gpointer)application_x_terminfo);
g_hash_table_insert(mime_table, "application/x-terminfo2", (gpointer)application_x_terminfo2);
g_hash_table_insert(mime_table, "application/x-arc", (gpointer)application_x_arc);
g_hash_table_insert(mime_table, "application/vnd.lotus-1-2-3", (gpointer)application_vnd_lotus_1_2_3);
g_hash_table_insert(mime_table, "image/x-win-bitmap", (gpointer)image_x_win_bitmap);
g_hash_table_insert(mime_table, "application/x-maxis-dbpf", (gpointer)application_x_maxis_dbpf);
g_hash_table_insert(mime_table, "text/PGP", (gpointer)text_PGP);
g_hash_table_insert(mime_table, "audio/x-hx-aac-adts", (gpointer)audio_x_hx_aac_adts);
g_hash_table_insert(mime_table, "application/x-chrome-extension", (gpointer)application_x_chrome_extension);
g_hash_table_insert(mime_table, "image/heic", (gpointer)image_heic);
g_hash_table_insert(mime_table, "image/x-gem", (gpointer)image_x_gem);
g_hash_table_insert(mime_table, "application/x-lzma", (gpointer)application_x_lzma);
g_hash_table_insert(mime_table, "application/warc", (gpointer)application_warc);
g_hash_table_insert(mime_table, "application/x-lzip", (gpointer)application_x_lzip);
g_hash_table_insert(mime_table, "application/x-lzop", (gpointer)application_x_lzop);
return mime_table;}
#endif

View File

@@ -1,9 +1,30 @@
#include "src/sist.h"
#include "src/ctx.h"
__thread magic_t Magic;
__thread magic_t Magic = NULL;
void *read_all(parse_job_t *job, const char *buf, int bytes_read, int *fd) {
int fs_read(struct vfile *f, void *buf, size_t size) {
if (f->fd == -1) {
f->fd = open(f->filepath, O_RDONLY);
if (f->fd == -1) {
LOG_ERRORF(f->filepath, "open(): [%d] %s", errno, strerror(errno))
return -1;
}
}
return read(f->fd, buf, size);
}
#define CLOSE_FILE(f) if (f.close != NULL) {f.close(&f);};
void fs_close(struct vfile *f) {
if (f->fd != -1) {
close(f->fd);
}
}
void *read_all(parse_job_t *job, const char *buf, int bytes_read) {
void *full_buf;
@@ -11,20 +32,13 @@ void *read_all(parse_job_t *job, const char *buf, int bytes_read, int *fd) {
full_buf = malloc(job->info.st_size);
memcpy(full_buf, buf, job->info.st_size);
} else {
if (*fd == -1) {
*fd = open(job->filepath, O_RDONLY);
if (*fd == -1) {
perror("open");
printf("%s\n", job->filepath);
free(job);
return NULL;
}
}
full_buf = malloc(job->info.st_size);
memcpy(full_buf, buf, bytes_read);
int ret = read(*fd, full_buf + bytes_read, job->info.st_size - bytes_read);
int ret = job->vfile.read(&job->vfile, full_buf + bytes_read, job->info.st_size - bytes_read);
if (ret == -1) {
perror("read");
LOG_ERRORF(job->filepath, "read(): [%d] %s", errno, strerror(errno))
return NULL;
}
}
@@ -36,15 +50,14 @@ void parse(void *arg) {
parse_job_t *job = arg;
document_t doc;
if (incremental_get(ScanCtx.original_table, job->info.st_ino) == job->info.st_mtim.tv_sec) {
int inc_ts = incremental_get(ScanCtx.original_table, job->info.st_ino);
if (inc_ts != 0 && inc_ts == job->info.st_mtim.tv_sec) {
incremental_mark_file_for_copy(ScanCtx.copy_table, job->info.st_ino);
free(job);
return;
}
if (Magic == NULL) {
Magic = magic_open(MAGIC_MIME_TYPE);
magic_load(Magic, NULL);
}
doc.filepath = job->filepath;
@@ -57,34 +70,37 @@ void parse(void *arg) {
doc.ino = job->info.st_ino;
doc.mtime = job->info.st_mtim.tv_sec;
uuid_generate_time_safe(doc.uuid);
uuid_generate(doc.uuid);
char *buf[PARSE_BUF_SIZE];
if (LogCtx.very_verbose) {
char uuid_str[UUID_STR_LEN];
uuid_unparse(doc.uuid, uuid_str);
LOG_DEBUGF(job->filepath, "Starting parse job {%s}", uuid_str)
}
if (job->info.st_size == 0) {
doc.mime = MIME_EMPTY;
} else if (*(job->filepath + job->ext) != '\0') {
} else if (*(job->filepath + job->ext) != '\0' && (job->ext - job->base != 1)) {
doc.mime = mime_get_mime_by_ext(ScanCtx.ext_table, job->filepath + job->ext);
}
int fd = -1;
int bytes_read = 0;
if (doc.mime == 0) {
// Get mime type with libmagic
fd = open(job->filepath, O_RDONLY);
if (fd == -1) {
perror("open");
free(job);
bytes_read = job->vfile.read(&job->vfile, buf, PARSE_BUF_SIZE);
if (bytes_read == -1) {
LOG_WARNINGF(job->filepath, "read() Error: %s", strerror(errno))
CLOSE_FILE(job->vfile)
return;
}
bytes_read = read(fd, buf, PARSE_BUF_SIZE);
const char *magic_mime_str = magic_buffer(Magic, buf, bytes_read);
if (magic_mime_str != NULL) {
doc.mime = mime_get_mime_by_string(ScanCtx.mime_table, magic_mime_str);
if (doc.mime == 0) {
fprintf(stderr, "Couldn't find mime %s, %s!\n", magic_mime_str, job->filepath + job->base);
LOG_WARNINGF(job->filepath, "Couldn't find mime %s", magic_mime_str);
}
}
}
@@ -93,34 +109,60 @@ void parse(void *arg) {
if (!(SHOULD_PARSE(doc.mime))) {
} else if ((mmime == MimeVideo && doc.size >= MIN_VIDEO_SIZE) || mmime == MimeAudio || mmime == MimeImage) {
parse_media(job->filepath, &doc);
} else if ((mmime == MimeVideo && doc.size >= MIN_VIDEO_SIZE) ||
(mmime == MimeImage && doc.size >= MIN_IMAGE_SIZE) || mmime == MimeAudio) {
if (job->vfile.is_fs_file) {
parse_media_filename(job->filepath, &doc);
} else {
parse_media_vfile(&job->vfile, &doc);
}
} else if (IS_PDF(doc.mime)) {
void *pdf_buf = read_all(job, (char *) buf, bytes_read, &fd);
void *pdf_buf = read_all(job, (char *) buf, bytes_read);
parse_pdf(pdf_buf, doc.size, &doc);
if (pdf_buf != buf) {
if (pdf_buf != buf && pdf_buf != NULL) {
free(pdf_buf);
}
} else if (mmime == MimeText && ScanCtx.content_size > 0) {
parse_text(bytes_read, &fd, (char *) buf, &doc);
parse_text(bytes_read, &job->vfile, (char *) buf, &doc);
} else if (IS_FONT(doc.mime)) {
void *font_buf = read_all(job, (char *) buf, bytes_read, &fd);
void *font_buf = read_all(job, (char *) buf, bytes_read);
parse_font(font_buf, doc.size, &doc);
if (font_buf != buf) {
if (font_buf != buf && font_buf != NULL) {
free(font_buf);
}
} else if (
ScanCtx.archive_mode != ARC_MODE_SKIP && (
IS_ARC(doc.mime) ||
(IS_ARC_FILTER(doc.mime) && should_parse_filtered_file(doc.filepath, doc.ext))
)) {
parse_archive(&job->vfile, &doc);
} else if (ScanCtx.content_size > 0 && IS_DOC(doc.mime)) {
void *doc_buf = read_all(job, (char *) buf, bytes_read);
parse_doc(doc_buf, doc.size, &doc);
if (doc_buf != buf && doc_buf != NULL) {
free(doc_buf);
}
}
//Parent meta
if (!uuid_is_null(job->parent)) {
char tmp[UUID_STR_LEN];
uuid_unparse(job->parent, tmp);
meta_line_t *meta_parent = malloc(sizeof(meta_line_t) + UUID_STR_LEN + 1);
meta_parent->key = MetaParent;
strcpy(meta_parent->strval, tmp);
APPEND_META((&doc), meta_parent)
}
write_document(&doc);
if (fd != -1) {
close(fd);
}
free(job);
CLOSE_FILE(job->vfile)
}

View File

@@ -5,6 +5,9 @@
#define PARSE_BUF_SIZE 4096
int fs_read(struct vfile *f, void *buf, size_t size);
void fs_close(struct vfile *f);
void parse(void *arg);
#endif

View File

@@ -1,10 +1,28 @@
#include <src/ctx.h>
#include "pdf.h"
#include "src/ctx.h"
#define MIN_OCR_SIZE 350
#define MIN_OCR_LEN 10
__thread text_buffer_t thread_buffer;
fz_page *render_cover(fz_context *ctx, document_t *doc, fz_document *fzdoc) {
fz_page *cover = fz_load_page(ctx, fzdoc, 0);
int err = 0;
fz_page *cover = NULL;
fz_var(cover);
fz_try(ctx)
cover = fz_load_page(ctx, fzdoc, 0);
fz_catch(ctx)
err = 1;
if (err != 0) {
fz_drop_page(ctx, cover);
LOG_WARNINGF(doc->filepath, "fz_load_page() returned error code [%d] %s", err, ctx->error.message)
return NULL;
}
fz_rect bounds = fz_bound_page(ctx, cover);
float scale;
@@ -24,128 +42,299 @@ fz_page *render_cover(fz_context *ctx, document_t *doc, fz_document *fzdoc) {
fz_clear_pixmap_with_value(ctx, pixmap, 0xFF);
fz_device *dev = fz_new_draw_device(ctx, m, pixmap);
pthread_mutex_lock(&ScanCtx.mupdf_mu);
fz_var(err);
fz_try(ctx)
{
pthread_mutex_lock(&ScanCtx.mupdf_mu);
fz_run_page(ctx, cover, dev, fz_identity, NULL);
}
fz_always(ctx)
{
fz_close_device(ctx, dev);
fz_drop_device(ctx, dev);
pthread_mutex_unlock(&ScanCtx.mupdf_mu);
}
fz_catch(ctx)
fz_rethrow(ctx);
err = ctx->error.errcode;
fz_drop_device(ctx, dev);
if (err != 0) {
LOG_WARNINGF(doc->filepath, "fz_run_page() returned error code [%d] %s", err, ctx->error.message)
fz_drop_page(ctx, cover);
fz_drop_pixmap(ctx, pixmap);
return NULL;
}
fz_buffer *fzbuf = fz_new_buffer_from_pixmap_as_png(ctx, pixmap, fz_default_color_params);
unsigned char *tn_buf;
size_t tn_len = fz_buffer_storage(ctx, fzbuf, &tn_buf);
fz_buffer *fzbuf = NULL;
fz_var(fzbuf);
fz_var(err);
store_write(ScanCtx.index.store, (char *) doc->uuid, sizeof(doc->uuid), (char *) tn_buf, tn_len);
fz_try(ctx)
fzbuf = fz_new_buffer_from_pixmap_as_png(ctx, pixmap, fz_default_color_params);
fz_catch(ctx)
err = ctx->error.errcode;
if (err == 0) {
unsigned char *tn_buf;
size_t tn_len = fz_buffer_storage(ctx, fzbuf, &tn_buf);
store_write(ScanCtx.index.store, (char *) doc->uuid, sizeof(doc->uuid), (char *) tn_buf, tn_len);
}
fz_drop_pixmap(ctx, pixmap);
fz_drop_buffer(ctx, fzbuf);
fz_drop_pixmap(ctx, pixmap);
if (err != 0) {
LOG_WARNINGF(doc->filepath, "fz_new_buffer_from_pixmap_as_png() returned error code [%d] %s", err,
ctx->error.message)
fz_drop_page(ctx, cover);
return NULL;
}
return cover;
}
void fz_noop_callback(__attribute__((unused)) void *user, __attribute__((unused)) const char *message) {}
void fz_err_callback(void *user, UNUSED(const char *message)) {
if (LogCtx.verbose) {
document_t *doc = (document_t *) user;
LOG_WARNINGF(doc->filepath, "FZ: %s", message)
}
}
__always_inline
void init_ctx(fz_context *ctx, document_t *doc) {
fz_disable_icc(ctx);
fz_register_document_handlers(ctx);
ctx->warn.print_user = doc;
ctx->warn.print = fz_err_callback;
ctx->error.print_user = doc;
ctx->error.print = fz_err_callback;
}
int read_stext_block(fz_stext_block *block, text_buffer_t *tex) {
if (block->type != FZ_STEXT_BLOCK_TEXT) {
return 0;
}
fz_stext_line *line = block->u.t.first_line;
while (line != NULL) {
fz_stext_char *c = line->first_char;
while (c != NULL) {
if (text_buffer_append_char(tex, c->c) == TEXT_BUF_FULL) {
return TEXT_BUF_FULL;
}
c = c->next;
}
line = line->next;
}
return 0;
}
#define IS_VALID_BPP(d) (d==1 || d==2 || d==4 || d==8 || d==16 || d==24 || d==32)
void fill_image(fz_context *ctx, UNUSED(fz_device *dev),
fz_image *img, UNUSED(fz_matrix ctm), UNUSED(float alpha),
UNUSED(fz_color_params color_params)) {
int l2factor = 0;
if (img->w > MIN_OCR_SIZE && img->h > MIN_OCR_SIZE && IS_VALID_BPP(img->n)) {
fz_pixmap *pix = img->get_pixmap(ctx, img, NULL, img->w, img->h, &l2factor);
if (pix->h > MIN_OCR_SIZE && img->h > MIN_OCR_SIZE && img->xres != 0) {
TessBaseAPI *api = TessBaseAPICreate();
TessBaseAPIInit3(api, ScanCtx.tesseract_path, ScanCtx.tesseract_lang);
TessBaseAPISetImage(api, pix->samples, pix->w, pix->h, pix->n, pix->stride);
TessBaseAPISetSourceResolution(api, pix->xres);
char *text = TessBaseAPIGetUTF8Text(api);
size_t len = strlen(text);
if (len >= MIN_OCR_LEN) {
text_buffer_append_string(&thread_buffer, text, len - 1);
LOG_DEBUGF(
"pdf.c",
"(OCR) %dx%d got %zuB from tesseract (%s), buffer:%dB",
pix->w, pix->h, len, ScanCtx.tesseract_lang, thread_buffer.dyn_buffer.cur
)
}
TessBaseAPIEnd(api);
TessBaseAPIDelete(api);
}
fz_drop_pixmap(ctx, pix);
}
}
void parse_pdf(void *buf, size_t buf_len, document_t *doc) {
if (buf == NULL) {
return;
}
static int mu_is_initialized = 0;
if (!mu_is_initialized) {
pthread_mutex_init(&ScanCtx.mupdf_mu, NULL);
mu_is_initialized = 1;
}
fz_context *ctx = fz_new_context(NULL, NULL, FZ_STORE_UNLIMITED);
fz_stream *stream = NULL;
fz_document *fzdoc = NULL;
fz_var(stream);
init_ctx(ctx, doc);
int err = 0;
fz_document *fzdoc = NULL;
fz_stream *stream = NULL;
fz_var(fzdoc);
fz_var(stream);
fz_var(err);
fz_try(ctx)
{
fz_disable_icc(ctx);
fz_register_document_handlers(ctx);
//disable warnings
ctx->warn.print = fz_noop_callback;
ctx->error.print = fz_noop_callback;
stream = fz_open_memory(ctx, buf, buf_len);
fzdoc = fz_open_document_with_stream(ctx, mime_get_mime_text(doc->mime), stream);
}
fz_catch(ctx)
err = ctx->error.errcode;
int page_count = fz_count_pages(ctx, fzdoc);
if (err) {
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
return;
}
fz_page *cover = render_cover(ctx, doc, fzdoc);
char title[4096] = {'\0',};
fz_try(ctx)
fz_lookup_metadata(ctx, fzdoc, FZ_META_INFO_TITLE, title, sizeof(title));
fz_catch(ctx)
;
fz_stext_options opts;
if (strlen(title) > 0) {
meta_line_t *meta_content = malloc(sizeof(meta_line_t) + strlen(title));
meta_content->key = MetaTitle;
strcpy(meta_content->strval, title);
APPEND_META(doc, meta_content)
}
text_buffer_t text_buf = text_buffer_create(ScanCtx.content_size);
int page_count = -1;
fz_var(err);
fz_try(ctx)
page_count = fz_count_pages(ctx, fzdoc);
fz_catch(ctx)
err = ctx->error.errcode;
if (err) {
LOG_WARNINGF(doc->filepath, "fz_count_pages() returned error code [%d] %s", err, ctx->error.message)
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
return;
}
fz_page *cover = NULL;
if (ScanCtx.tn_size > 0) {
cover = render_cover(ctx, doc, fzdoc);
} else {
fz_var(cover);
fz_try(ctx)
cover = fz_load_page(ctx, fzdoc, 0);
fz_catch(ctx)
cover = NULL;
}
if (cover == NULL) {
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
return;
}
if (ScanCtx.content_size > 0) {
fz_stext_options opts = {0};
thread_buffer = text_buffer_create(ScanCtx.content_size);
for (int current_page = 0; current_page < page_count; current_page++) {
fz_page *page;
fz_page *page = NULL;
if (current_page == 0) {
page = cover;
} else {
page = fz_load_page(ctx, fzdoc, current_page);
fz_var(err);
fz_try(ctx)
page = fz_load_page(ctx, fzdoc, current_page);
fz_catch(ctx)
err = ctx->error.errcode;
if (err != 0) {
LOG_WARNINGF(doc->filepath, "fz_load_page() returned error code [%d] %s", err, ctx->error.message)
text_buffer_destroy(&thread_buffer);
fz_drop_page(ctx, page);
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
return;
}
}
fz_stext_page *stext = fz_new_stext_page(ctx, fz_bound_page(ctx, page));
fz_device *dev = fz_new_stext_device(ctx, stext, &opts);
dev->stroke_path = NULL;
dev->stroke_text = NULL;
dev->clip_text = NULL;
dev->clip_stroke_path = NULL;
dev->clip_stroke_text = NULL;
pthread_mutex_lock(&ScanCtx.mupdf_mu);
if (ScanCtx.tesseract_lang != NULL) {
dev->fill_image = fill_image;
}
fz_var(err);
fz_try(ctx)
fz_run_page_contents(ctx, page, dev, fz_identity, NULL);
fz_run_page(ctx, page, dev, fz_identity, NULL);
fz_always(ctx)
pthread_mutex_unlock(&ScanCtx.mupdf_mu);
{
fz_close_device(ctx, dev);
fz_drop_device(ctx, dev);
}
fz_catch(ctx)
fz_rethrow(ctx);
err = ctx->error.errcode;
fz_drop_device(ctx, dev);
if (err != 0) {
LOG_WARNINGF(doc->filepath, "fz_run_page() returned error code [%d] %s", err, ctx->error.message)
text_buffer_destroy(&thread_buffer);
fz_drop_page(ctx, page);
fz_drop_stext_page(ctx, stext);
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
return;
}
fz_stext_block *block = stext->first_block;
while (block != NULL) {
if (block->type != FZ_STEXT_BLOCK_TEXT) {
block = block->next;
continue;
}
fz_stext_line *line = block->u.t.first_line;
while (line != NULL) {
fz_stext_char *c = line->first_char;
while (c != NULL) {
if (text_buffer_append_char(&text_buf, c->c) == TEXT_BUF_FULL) {
fz_drop_page(ctx, page);
fz_drop_stext_page(ctx, stext);
goto write_loop_end;
}
c = c->next;
}
line = line->next;
int ret = read_stext_block(block, &thread_buffer);
if (ret == TEXT_BUF_FULL) {
break;
}
block = block->next;
}
fz_drop_page(ctx, page);
fz_drop_stext_page(ctx, stext);
fz_drop_page(ctx, page);
if (thread_buffer.dyn_buffer.cur >= thread_buffer.dyn_buffer.size) {
break;
}
}
write_loop_end:;
text_buffer_terminate_string(&text_buf);
text_buffer_terminate_string(&thread_buffer);
meta_line_t *meta_content = malloc(sizeof(meta_line_t) + text_buf.dyn_buffer.cur);
meta_line_t *meta_content = malloc(sizeof(meta_line_t) + thread_buffer.dyn_buffer.cur);
meta_content->key = MetaContent;
memcpy(meta_content->strval, text_buf.dyn_buffer.buf, text_buf.dyn_buffer.cur);
text_buffer_destroy(&text_buf);
memcpy(meta_content->strval, thread_buffer.dyn_buffer.buf, thread_buffer.dyn_buffer.cur);
APPEND_META(doc, meta_content)
}
fz_always(ctx)
{
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
} fz_catch(ctx) {
fprintf(stderr, "Error %s %s\n", doc->filepath, ctx->error.message);
}
}
text_buffer_destroy(&thread_buffer);
}
fz_drop_stream(ctx, stream);
fz_drop_document(ctx, fzdoc);
fz_drop_context(ctx);
}

View File

@@ -1,7 +1,7 @@
#include "text.h"
#include "src/ctx.h"
void parse_text(int bytes_read, int *fd, char *buf, document_t *doc) {
void parse_text(int bytes_read, struct vfile *f, char *buf, document_t *doc) {
char *intermediate_buf;
int intermediate_buf_len;
@@ -13,10 +13,6 @@ void parse_text(int bytes_read, int *fd, char *buf, document_t *doc) {
memcpy(intermediate_buf, buf, to_copy);
} else {
if (*fd == -1) {
*fd = open(doc->filepath, O_RDONLY);
}
int to_read = MIN(ScanCtx.content_size, doc->size) - bytes_read;
intermediate_buf = malloc(to_read + bytes_read);
@@ -25,19 +21,17 @@ void parse_text(int bytes_read, int *fd, char *buf, document_t *doc) {
memcpy(intermediate_buf, buf, bytes_read);
}
read(*fd, intermediate_buf + bytes_read, to_read);
f->read(f, intermediate_buf + bytes_read, to_read);
}
text_buffer_t tex = text_buffer_create(ScanCtx.content_size);
text_buffer_append_string(&tex, intermediate_buf, intermediate_buf_len);
text_buffer_terminate_string(&tex);
text_buffer_t text_buf = text_buffer_create(ScanCtx.content_size);
for (int i = 0; i < intermediate_buf_len; i++) {
text_buffer_append_char(&text_buf, *(intermediate_buf + i));
}
text_buffer_terminate_string(&text_buf);
meta_line_t *meta = malloc(sizeof(meta_line_t) + text_buf.dyn_buffer.cur);
meta_line_t *meta = malloc(sizeof(meta_line_t) + tex.dyn_buffer.cur);
meta->key = MetaContent;
strcpy(meta->strval, text_buf.dyn_buffer.buf);
text_buffer_destroy(&text_buf);
free(intermediate_buf);
strcpy(meta->strval, tex.dyn_buffer.buf);
APPEND_META(doc, meta)
free(intermediate_buf);
text_buffer_destroy(&tex);
}

View File

@@ -3,6 +3,6 @@
#include "src/sist.h"
void parse_text(int bytes_read, int *fd, char *buf, document_t *doc);
void parse_text(int bytes_read, struct vfile *f, char *buf, document_t *doc);
#endif

View File

@@ -2,6 +2,7 @@
#define SIST_H
#define UUID_STR_LEN 37
#define UNUSED(x) __attribute__((__unused__)) x
#include <glib-2.0/glib.h>
#include <unistd.h>
@@ -12,10 +13,11 @@
#include <ftw.h>
#include <uuid.h>
#include <magic.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libswresample/swresample.h>
#include <libavcodec/avcodec.h>
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include "libswresample/swresample.h"
#include "libavcodec/avcodec.h"
#include "libavutil/imgutils.h"
#include <ctype.h>
#include <mupdf/fitz.h>
#include <mupdf/pdf.h>
@@ -25,19 +27,27 @@
#include <pthread.h>
#include <sys/stat.h>
#include <wordexp.h>
#include "ft2build.h"
#include "freetype/freetype.h"
#include <archive.h>
#include <archive_entry.h>
#include <opc/opc.h>
#include <libxml/xmlstring.h>
#define BOOL int
#include <tesseract/capi.h>
#include <onion/onion.h>
#include <onion/handler.h>
#include <onion/block.h>
#include <onion/shortcuts.h>
#include <onion/codecs.h>
#include <curl/curl.h>
#include "cJSON/cJSON.h"
#include "types.h"
#include "tpool.h"
#include "util.h"
#include "src/index/elastic.h"
#include "io/store.h"
#include "io/serialize.h"
#include "io/walk.h"
@@ -47,9 +57,16 @@
#include "parsing/pdf.h"
#include "parsing/media.h"
#include "parsing/font.h"
#include "parsing/arc.h"
#include "parsing/doc.h"
#include "cli.h"
#include "log.h"
#include "utf8.h/utf8.h"
#include "src/index/elastic.h"
#include "index/web.h"
#include "web/serve.h"
#include "cli.h"
#include "web/auth_basic.h"
;

View File

@@ -25,6 +25,7 @@ typedef struct tpool {
int done_cnt;
int stop;
void (*cleanup_func)();
} tpool_t;
@@ -100,7 +101,7 @@ static void *tpool_worker(void *arg) {
tpool_t *pool = arg;
while (1) {
pthread_mutex_lock(&(pool->work_mutex));
pthread_mutex_lock(&pool->work_mutex);
if (pool->stop) {
break;
}
@@ -113,14 +114,21 @@ static void *tpool_worker(void *arg) {
pthread_mutex_unlock(&(pool->work_mutex));
if (work != NULL) {
if (pool->stop) {
break;
}
work->func(work->arg);
free(work->arg);
free(work);
}
pthread_mutex_lock(&(pool->work_mutex));
pool->done_cnt++;
if (work != NULL) {
pool->done_cnt++;
}
progress_bar_print((double)pool->done_cnt / pool->work_cnt, ScanCtx.stat_tn_size, ScanCtx.stat_index_size);
progress_bar_print((double) pool->done_cnt / pool->work_cnt, ScanCtx.stat_tn_size, ScanCtx.stat_index_size);
if (pool->work_head == NULL) {
pthread_cond_signal(&(pool->working_cond));
@@ -128,6 +136,7 @@ static void *tpool_worker(void *arg) {
pthread_mutex_unlock(&(pool->work_mutex));
}
LOG_INFO("tpool.c", "Executing cleanup function")
pool->cleanup_func();
pthread_cond_signal(&(pool->working_cond));
@@ -136,17 +145,24 @@ static void *tpool_worker(void *arg) {
}
void tpool_wait(tpool_t *pool) {
LOG_INFO("tpool.c", "Waiting for worker threads to finish")
pthread_mutex_lock(&(pool->work_mutex));
while (1) {
if (pool->done_cnt < pool->work_cnt) {
pthread_cond_wait(&(pool->working_cond), &(pool->work_mutex));
} else {
pool->stop = 1;
break;
usleep(500000);
if (pool->done_cnt == pool->work_cnt) {
pool->stop = 1;
usleep(1000000);
break;
}
}
progress_bar_print(100.0, ScanCtx.stat_tn_size, ScanCtx.stat_index_size);
}
progress_bar_print(1.0, ScanCtx.stat_tn_size, ScanCtx.stat_index_size);
pthread_mutex_unlock(&(pool->work_mutex));
LOG_INFO("tpool.c", "Worker threads finished")
}
void tpool_destroy(tpool_t *pool) {
@@ -154,6 +170,8 @@ void tpool_destroy(tpool_t *pool) {
return;
}
LOG_INFO("tpool.c", "Destroying thread pool")
pthread_mutex_lock(&(pool->work_mutex));
tpool_work_t *work = pool->work_head;
while (work != NULL) {
@@ -168,10 +186,13 @@ void tpool_destroy(tpool_t *pool) {
for (size_t i = 0; i < pool->thread_cnt; i++) {
pthread_t thread = pool->threads[i];
if (thread != 0) {
pthread_cancel(thread);
void *_;
pthread_join(thread, &_);
}
}
LOG_INFO("tpool.c", "Final cleanup")
pthread_mutex_destroy(&(pool->work_mutex));
pthread_cond_destroy(&(pool->has_work_cond));
pthread_cond_destroy(&(pool->working_cond));
@@ -188,11 +209,11 @@ tpool_t *tpool_create(size_t thread_cnt, void cleanup_func()) {
tpool_t *pool = malloc(sizeof(tpool_t));
pool->thread_cnt = thread_cnt;
pool->work_cnt =0;
pool->done_cnt =0;
pool->work_cnt = 0;
pool->done_cnt = 0;
pool->stop = 0;
pool->cleanup_func = cleanup_func;
pool->threads = malloc(sizeof(pthread_t) * thread_cnt);
pool->threads = calloc(thread_cnt, sizeof(pthread_t));
pthread_mutex_init(&(pool->work_mutex), NULL);
@@ -202,11 +223,14 @@ tpool_t *tpool_create(size_t thread_cnt, void cleanup_func()) {
pool->work_head = NULL;
pool->work_tail = NULL;
for (size_t i = 0; i < thread_cnt; i++) {
pthread_t thread = pool->threads[i];
pthread_create(&thread, NULL, tpool_worker, pool);
pthread_detach(thread);
}
return pool;
}
void tpool_start(tpool_t *pool) {
LOG_INFOF("tpool.c", "Starting thread pool with %d threads", pool->thread_cnt)
for (size_t i = 0; i < pool->thread_cnt; i++) {
pthread_create(&pool->threads[i], NULL, tpool_worker, pool);
}
}

View File

@@ -9,6 +9,7 @@ typedef struct tpool tpool_t;
typedef void (*thread_func_t)(void *arg);
tpool_t *tpool_create(size_t num, void (*cleanup_func)());
void tpool_start(tpool_t *pool);
void tpool_destroy(tpool_t *tm);
int tpool_add_work(tpool_t *pool, thread_func_t func, void *arg);

View File

@@ -2,13 +2,19 @@
#define SIST2_TYPES_H
#define META_INT_MASK 0xF0
#define META_STR_MASK 0xE0
#define META_LONG_MASK 0xD0
#define META_INT_MASK 0x80
#define META_STR_MASK 0x40
#define META_LONG_MASK 0x20
#define IS_META_INT(key) (key & META_INT_MASK) == META_INT_MASK
#define IS_META_LONG(key) (key & META_LONG_MASK) == META_LONG_MASK
#define IS_META_STR(meta) (meta->key & META_STR_MASK) == META_STR_MASK
#define ARC_MODE_SKIP 0
#define ARC_MODE_LIST 1
#define ARC_MODE_SHALLOW 2
#define ARC_MODE_RECURSE 3
typedef int archive_mode_t;
// This is written to file as an 8-bit char!
enum metakey {
MetaContent = 1 | META_STR_MASK,
@@ -24,16 +30,32 @@ enum metakey {
MetaGenre = 11 | META_STR_MASK,
MetaTitle = 12 | META_STR_MASK,
MetaFontName = 13 | META_STR_MASK,
MetaParent = 14 | META_STR_MASK,
MetaExifMake = 15 | META_STR_MASK,
MetaExifSoftware = 16 | META_STR_MASK,
MetaExifExposureTime = 17 | META_STR_MASK,
MetaExifFNumber = 18 | META_STR_MASK,
MetaExifFocalLength = 19 | META_STR_MASK,
MetaExifUserComment = 20 | META_STR_MASK,
MetaExifModel = 21 | META_STR_MASK,
MetaExifIsoSpeedRatings = 22 | META_STR_MASK,
MetaExifDateTime = 23 | META_STR_MASK,
//Note to self: this will break after 31 entries
};
#define INDEX_TYPE_BIN "binary"
#define INDEX_TYPE_JSON "json"
#define INDEX_VERSION_EXTERNAL "_external_v1"
typedef struct index_descriptor {
char uuid[UUID_STR_LEN];
char version[6];
char version[64];
long timestamp;
char root[PATH_MAX];
char rewrite_url[8196];
short root_len;
char name[1024];
char type[64];
} index_descriptor_t;
typedef struct index_t {
@@ -66,10 +88,32 @@ typedef struct document {
char *filepath;
} document_t;
typedef struct vfile vfile_t;
typedef int (*read_func_t)(struct vfile *, void *buf, size_t size);
typedef void (*close_func_t)(struct vfile *);
typedef struct vfile {
union {
int fd;
struct archive *arc;
};
int is_fs_file;
char *filepath;
read_func_t read;
close_func_t close;
} vfile_t;
typedef struct parse_job_t {
int base;
int ext;
struct stat info;
struct vfile vfile;
uuid_t parent;
char filepath[1];
} parse_job_t;

View File

@@ -1,5 +1,233 @@
#define _GNU_SOURCE
#include "util.h"
#include "src/ctx.h"
dyn_buffer_t dyn_buffer_create() {
dyn_buffer_t buf;
buf.size = INITIAL_BUF_SIZE;
buf.cur = 0;
buf.buf = malloc(INITIAL_BUF_SIZE);
return buf;
}
void grow_buffer(dyn_buffer_t *buf, size_t size) {
if (buf->cur + size > buf->size) {
do {
buf->size *= 2;
} while (buf->cur + size > buf->size);
buf->buf = realloc(buf->buf, buf->size);
}
}
void grow_buffer_small(dyn_buffer_t *buf) {
if (buf->cur + sizeof(long) > buf->size) {
buf->size *= 2;
buf->buf = realloc(buf->buf, buf->size);
}
}
void dyn_buffer_write(dyn_buffer_t *buf, void *data, size_t size) {
grow_buffer(buf, size);
memcpy(buf->buf + buf->cur, data, size);
buf->cur += size;
}
void dyn_buffer_write_char(dyn_buffer_t *buf, char c) {
grow_buffer_small(buf);
*(buf->buf + buf->cur) = c;
buf->cur += sizeof(c);
}
void dyn_buffer_write_str(dyn_buffer_t *buf, char *str) {
dyn_buffer_write(buf, str, strlen(str));
dyn_buffer_write_char(buf, '\0');
}
void dyn_buffer_append_string(dyn_buffer_t *buf, char *str) {
dyn_buffer_write(buf, str, strlen(str));
}
void dyn_buffer_write_int(dyn_buffer_t *buf, int d) {
grow_buffer_small(buf);
*(int *) (buf->buf + buf->cur) = d;
buf->cur += sizeof(int);
}
void dyn_buffer_write_short(dyn_buffer_t *buf, short s) {
grow_buffer_small(buf);
*(short *) (buf->buf + buf->cur) = s;
buf->cur += sizeof(short);
}
void dyn_buffer_write_long(dyn_buffer_t *buf, unsigned long l) {
grow_buffer_small(buf);
*(unsigned long *) (buf->buf + buf->cur) = l;
buf->cur += sizeof(unsigned long);
}
void dyn_buffer_destroy(dyn_buffer_t *buf) {
free(buf->buf);
}
void text_buffer_destroy(text_buffer_t *buf) {
dyn_buffer_destroy(&buf->dyn_buffer);
}
text_buffer_t text_buffer_create(int max_size) {
text_buffer_t text_buf;
text_buf.dyn_buffer = dyn_buffer_create();
text_buf.max_size = max_size;
text_buf.last_char_was_whitespace = FALSE;
return text_buf;
}
void text_buffer_terminate_string(text_buffer_t *buf) {
if (*(buf->dyn_buffer.buf + buf->dyn_buffer.cur - 1) == ' ') {
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur - 1) = '\0';
} else {
dyn_buffer_write_char(&buf->dyn_buffer, '\0');
}
}
__always_inline
int utf8_validchr(const char *s) {
if (0x00 == (0x80 & *s)) {
return TRUE;
} else if (0xf0 == (0xf8 & *s)) {
if ((0x80 != (0xc0 & s[1])) || (0x80 != (0xc0 & s[2])) ||
(0x80 != (0xc0 & s[3]))) {
return FALSE;
}
if (0x80 == (0xc0 & s[4])) {
return FALSE;
}
if ((0 == (0x07 & s[0])) && (0 == (0x30 & s[1]))) {
return FALSE;
}
} else if (0xe0 == (0xf0 & *s)) {
if ((0x80 != (0xc0 & s[1])) || (0x80 != (0xc0 & s[2]))) {
return FALSE;
}
if (0x80 == (0xc0 & s[3])) {
return FALSE;
}
if ((0 == (0x0f & s[0])) && (0 == (0x20 & s[1]))) {
return FALSE;
}
} else if (0xc0 == (0xe0 & *s)) {
if (0x80 != (0xc0 & s[1])) {
return FALSE;
}
if (0x80 == (0xc0 & s[2])) {
return FALSE;
}
if (0 == (0x1e & s[0])) {
return FALSE;
}
} else {
return FALSE;
}
return TRUE;
}
int text_buffer_append_string(text_buffer_t *buf, char *str, size_t len) {
utf8_int32_t c;
if (str == NULL || len < 1 ||
(0xf0 == (0xf8 & str[0]) && len < 4) ||
(0xe0 == (0xf0 & str[0]) && len < 3) ||
(0xc0 == (0xe0 & str[0]) && len == 1) ||
*(str) == 0) {
return 0;
}
for (void *v = utf8codepoint(str, &c); c != '\0' && ((char *) v - str + 4) < len; v = utf8codepoint(v, &c)) {
if (utf8_validchr(v)) {
text_buffer_append_char(buf, c);
}
}
return 0;
}
int text_buffer_append_string0(text_buffer_t *buf, char *str) {
utf8_int32_t c;
for (void *v = utf8codepoint(str, &c); c != '\0'; v = utf8codepoint(v, &c)) {
if (utf8_validchr(v)) {
text_buffer_append_char(buf, c);
}
}
}
int text_buffer_append_char(text_buffer_t *buf, int c) {
if (SHOULD_IGNORE_CHAR(c) || c == ' ') {
if (!buf->last_char_was_whitespace && buf->dyn_buffer.cur != 0) {
dyn_buffer_write_char(&buf->dyn_buffer, ' ');
buf->last_char_was_whitespace = TRUE;
if (buf->max_size > 0 && buf->dyn_buffer.cur >= buf->max_size) {
return TEXT_BUF_FULL;
}
}
} else {
buf->last_char_was_whitespace = FALSE;
grow_buffer_small(&buf->dyn_buffer);
if (0 == ((utf8_int32_t) 0xffffff80 & c)) {
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = (char) c;
} else if (0 == ((utf8_int32_t) 0xfffff800 & c)) {
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0xc0 | (char) (c >> 6);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) (c & 0x3f);
} else if (0 == ((utf8_int32_t) 0xffff0000 & c)) {
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0xe0 | (char) (c >> 12);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) ((c >> 6) & 0x3f);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) (c & 0x3f);
} else {
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0xf0 | (char) (c >> 18);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) ((c >> 12) & 0x3f);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) ((c >> 6) & 0x3f);
*(buf->dyn_buffer.buf + buf->dyn_buffer.cur++) = 0x80 | (char) (c & 0x3f);
}
if (buf->max_size > 0 && buf->dyn_buffer.cur >= buf->max_size) {
return TEXT_BUF_FULL;
}
}
return 0;
}
void incremental_put(GHashTable *table, unsigned long inode_no, int mtime) {
g_hash_table_insert(table, (gpointer) inode_no, GINT_TO_POINTER(mtime));
}
int incremental_get(GHashTable *table, unsigned long inode_no) {
if (table != NULL) {
return GPOINTER_TO_INT(g_hash_table_lookup(table, (gpointer) inode_no));
} else {
return 0;
}
}
int incremental_mark_file_for_copy(GHashTable *table, unsigned long inode_no) {
g_hash_table_insert(table, GINT_TO_POINTER(inode_no), GINT_TO_POINTER(1));
}
#define PBSTR "========================================"
@@ -9,7 +237,7 @@ dyn_buffer_t url_escape(char *str) {
dyn_buffer_t text = dyn_buffer_create();
char * ptr = str;
char *ptr = str;
while (*ptr) {
if (*ptr == '#') {
dyn_buffer_write(&text, "%23", 3);
@@ -27,7 +255,7 @@ char *abspath(const char *path) {
wordexp_t w;
wordexp(path, &w, 0);
char *abs = canonicalize_file_name(w.we_wordv[0]);
char *abs = realpath(w.we_wordv[0], NULL);
if (abs == NULL) {
return NULL;
}
@@ -42,7 +270,7 @@ char *expandpath(const char *path) {
wordexp_t w;
wordexp(path, &w, 0);
char * expanded = malloc(strlen(w.we_wordv[0]) + 2);
char *expanded = malloc(strlen(w.we_wordv[0]) + 2);
strcpy(expanded, w.we_wordv[0]);
strcat(expanded, "/");
@@ -94,4 +322,29 @@ GHashTable *incremental_get_table() {
return file_table;
}
const char *find_file_in_paths(const char *paths[], const char *filename) {
for (int i = 0; paths[i] != NULL; i++) {
char *apath = abspath(paths[i]);
if (apath == NULL) {
continue;
}
char path[PATH_MAX];
snprintf(path, sizeof(path), "%s%s", apath, filename);
LOG_DEBUGF("util.c", "Looking for '%s' in folder '%s'", filename, apath)
free(apath);
struct stat info;
int ret = stat(path, &info);
if (ret != -1) {
return paths[i];
}
}
return NULL;
}

View File

@@ -5,7 +5,10 @@
#define TEXT_BUF_FULL -1
#define INITIAL_BUF_SIZE 1024 * 16
#define SHOULD_IGNORE_CHAR(c) c < '0' || c > 'z'
#define SHOULD_IGNORE_CHAR(c) !(SHOULD_KEEP_CHAR(c))
#define SHOULD_KEEP_CHAR(c) ((c >= '\'' && c <= ';') || (c >= 'A' && c <= 'z') || (c > 127))
typedef struct dyn_buffer {
char *buf;
@@ -21,160 +24,56 @@ typedef struct text_buffer {
dyn_buffer_t dyn_buffer;
} text_buffer_t;
__always_inline
dyn_buffer_t dyn_buffer_create() {
dyn_buffer_t buf;
char *abspath(const char *path);
buf.size = INITIAL_BUF_SIZE;
buf.cur = 0;
buf.buf = malloc(INITIAL_BUF_SIZE);
return buf;
}
__always_inline
void grow_buffer(dyn_buffer_t *buf, size_t size) {
if (buf->cur + size > buf->size) {
do {
buf->size *= 2;
} while (buf->cur + size > buf->size);
buf->buf = realloc(buf->buf, buf->size);
}
}
__always_inline
void grow_buffer_small(dyn_buffer_t *buf) {
if (buf->cur + sizeof(long) > buf->size) {
buf->size *= 2;
buf->buf = realloc(buf->buf, buf->size);
}
}
__always_inline
void dyn_buffer_write(dyn_buffer_t *buf, void *data, size_t size) {
grow_buffer(buf, size);
memcpy(buf->buf + buf->cur, data, size);
buf->cur += size;
}
__always_inline
void dyn_buffer_write_char(dyn_buffer_t *buf, char c) {
grow_buffer_small(buf);
*(buf->buf + buf->cur) = c;
buf->cur += sizeof(c);
}
__always_inline
void dyn_buffer_write_str(dyn_buffer_t *buf, char *str) {
dyn_buffer_write(buf, str, strlen(str));
dyn_buffer_write_char(buf, '\0');
}
__always_inline
void dyn_buffer_write_int(dyn_buffer_t *buf, int d) {
grow_buffer_small(buf);
*(int *) (buf->buf + buf->cur) = d;
buf->cur += sizeof(int);
}
__always_inline
void dyn_buffer_write_short(dyn_buffer_t *buf, short s) {
grow_buffer_small(buf);
*(short *) (buf->buf + buf->cur) = s;
buf->cur += sizeof(short);
}
__always_inline
void dyn_buffer_write_long(dyn_buffer_t *buf, unsigned long l) {
grow_buffer_small(buf);
*(unsigned long *) (buf->buf + buf->cur) = l;
buf->cur += sizeof(unsigned long);
}
__always_inline
void dyn_buffer_destroy(dyn_buffer_t *buf) {
free(buf->buf);
}
__always_inline
void text_buffer_destroy(text_buffer_t *buf) {
dyn_buffer_destroy(&buf->dyn_buffer);
}
__always_inline
text_buffer_t text_buffer_create(int max_size) {
text_buffer_t text_buf;
text_buf.dyn_buffer = dyn_buffer_create();
text_buf.max_size = max_size;
text_buf.last_char_was_whitespace = FALSE;
return text_buf;
}
__always_inline
void text_buffer_terminate_string(text_buffer_t *buf) {
dyn_buffer_write_char(&buf->dyn_buffer, '\0');
}
__always_inline
int text_buffer_append_char(text_buffer_t *buf, int c) {
if (SHOULD_IGNORE_CHAR(c)) {
if (!buf->last_char_was_whitespace) {
dyn_buffer_write_char(&buf->dyn_buffer, ' ');
buf->last_char_was_whitespace = TRUE;
if (buf->dyn_buffer.cur >= buf->max_size) {
return TEXT_BUF_FULL;
}
}
} else {
buf->last_char_was_whitespace = FALSE;
dyn_buffer_write_char(&buf->dyn_buffer, (char) c);
if (buf->dyn_buffer.cur >= buf->max_size) {
return TEXT_BUF_FULL;
}
}
return 0;
}
char *abspath(const char * path);
char *expandpath(const char *path);
dyn_buffer_t url_escape(char *str);
void progress_bar_print(double percentage, size_t tn_size, size_t index_size);
__always_inline
void incremental_put(GHashTable *table, unsigned long inode_no, int mtime) {
g_hash_table_insert(table, (gpointer) inode_no, GINT_TO_POINTER(mtime));
}
__always_inline
int incremental_get(GHashTable *table, unsigned long inode_no) {
if (table != NULL) {
return GPOINTER_TO_INT(g_hash_table_lookup(table, (gpointer) inode_no));
} else {
return 0;
}
}
__always_inline
int incremental_mark_file_for_copy(GHashTable *table, unsigned long inode_no) {
g_hash_table_insert(table, GINT_TO_POINTER(inode_no), GINT_TO_POINTER(1));
}
GHashTable *incremental_get_table();
dyn_buffer_t dyn_buffer_create();
void grow_buffer(dyn_buffer_t *buf, size_t size);
void grow_buffer_small(dyn_buffer_t *buf);
void dyn_buffer_write(dyn_buffer_t *buf, void *data, size_t size);
void dyn_buffer_write_char(dyn_buffer_t *buf, char c);
void dyn_buffer_write_str(dyn_buffer_t *buf, char *str);
void dyn_buffer_append_string(dyn_buffer_t *buf, char *str);
void dyn_buffer_write_int(dyn_buffer_t *buf, int d);
void dyn_buffer_write_short(dyn_buffer_t *buf, short s);
void dyn_buffer_write_long(dyn_buffer_t *buf, unsigned long l);
void dyn_buffer_destroy(dyn_buffer_t *buf);
void text_buffer_destroy(text_buffer_t *buf);
text_buffer_t text_buffer_create(int max_size);
void text_buffer_terminate_string(text_buffer_t *buf);
int text_buffer_append_string(text_buffer_t *buf, char *str, size_t len);
int text_buffer_append_string0(text_buffer_t *buf, char *str);
int text_buffer_append_char(text_buffer_t *buf, int c);
void incremental_put(GHashTable *table, unsigned long inode_no, int mtime);
int incremental_get(GHashTable *table, unsigned long inode_no);
int incremental_mark_file_for_copy(GHashTable *table, unsigned long inode_no);
const char *find_file_in_paths(const char **paths, const char *filename);
#endif

59
src/web/auth_basic.c Normal file

@@ -0,0 +1,59 @@
#include "auth_basic.h"
#define UNAUTHORIZED_TEXT "Unauthorized"
typedef struct auth_basic_data {
onion_handler *inside;
const char *b64credentials;
} auth_basic_data_t;
int authenticate(const char *expected, const char *credentials) {
if (expected == NULL) {
return TRUE;
}
if (credentials && strncmp(credentials, "Basic ", 6) == 0) {
if (strcmp((credentials + 6), expected) == 0) {
return TRUE;
}
}
return FALSE;
}
int auth_basic_handler(auth_basic_data_t *d,
onion_request *req,
onion_response *res) {
const char *credentials = onion_request_get_header(req, "Authorization");
if (authenticate(d->b64credentials, credentials)) {
return onion_handler_handle(d->inside, req, res);
}
onion_response_set_header(res, "WWW-Authenticate", "Basic realm=\"sist2\"");
onion_response_set_code(res, HTTP_UNAUTHORIZED);
// Set the length before writing the body, and do not count the terminating NUL
onion_response_set_length(res, sizeof(UNAUTHORIZED_TEXT) - 1);
onion_response_write(res, UNAUTHORIZED_TEXT, sizeof(UNAUTHORIZED_TEXT) - 1);
return OCS_PROCESSED;
}
void auth_basic_free(auth_basic_data_t *data) {
onion_handler_free(data->inside);
free(data);
}
onion_handler *auth_basic(const char *b64credentials, onion_handler *inside_level) {
auth_basic_data_t *privdata = malloc(sizeof(auth_basic_data_t));
privdata->b64credentials = b64credentials;
privdata->inside = inside_level;
return onion_handler_new((onion_handler_handler) auth_basic_handler, privdata,
(onion_handler_private_data_free) auth_basic_free);
}

4
src/web/auth_basic.h Normal file

@@ -0,0 +1,4 @@
#include "src/sist.h"
onion_handler *auth_basic(const char *b64credentials, onion_handler *inside_level);

@@ -43,27 +43,40 @@ int javascript(void *p, onion_request *req, onion_response *res) {
return OCS_PROCESSED;
}
int style(void *p, onion_request *req, onion_response *res) {
set_default_headers(res);
onion_response_set_header(res, "Content-Type", "text/css");
onion_response_set_length(res, sizeof(bundle_css));
onion_response_write(res, bundle_css, sizeof(bundle_css));
return OCS_PROCESSED;
int client_requested_dark_theme(onion_request *req) {
const char *cookie = onion_request_get_cookie(req, "sist");
if (cookie == NULL) {
return FALSE;
}
return strcmp(cookie, "dark") == 0;
}
int bg_bars(void *p, onion_request *req, onion_response *res) {
int style(void *p, onion_request *req, onion_response *res) {
set_default_headers(res);
onion_response_set_header(res, "Content-Type", "image/png");
onion_response_set_length(res, sizeof(bg_bars_png));
onion_response_write(res, bg_bars_png, sizeof(bg_bars_png));
onion_response_set_header(res, "Content-Type", "text/css");
if (client_requested_dark_theme(req)) {
onion_response_set_length(res, sizeof(bundle_dark_css));
onion_response_write(res, bundle_dark_css, sizeof(bundle_dark_css));
} else {
onion_response_set_length(res, sizeof(bundle_css));
onion_response_write(res, bundle_css, sizeof(bundle_css));
}
return OCS_PROCESSED;
}
int img_sprite_skin_flag(void *p, onion_request *req, onion_response *res) {
set_default_headers(res);
onion_response_set_header(res, "Content-Type", "image/png");
onion_response_set_length(res, sizeof(sprite_skin_flat_png));
onion_response_write(res, sprite_skin_flat_png, sizeof(sprite_skin_flat_png));
if (client_requested_dark_theme(req)) {
onion_response_set_length(res, sizeof(sprite_skin_flat_dark_png));
onion_response_write(res, sprite_skin_flat_dark_png, sizeof(sprite_skin_flat_dark_png));
} else {
onion_response_set_length(res, sizeof(sprite_skin_flat_png));
onion_response_write(res, sprite_skin_flat_png, sizeof(sprite_skin_flat_png));
}
return OCS_PROCESSED;
}
@@ -97,7 +110,7 @@ int thumbnail(void *p, onion_request *req, onion_response *res) {
int written = onion_response_write(res, data, data_len);
onion_response_flush(res);
if (written != data_len || data_len == 0) {
printf("Couldn't write thumb\n");
LOG_DEBUG("serve.c", "Couldn't write thumbnail");
}
free(data);
@@ -168,7 +181,12 @@ int chunked_response_file(const char *filename, const char *mime,
}
}
onion_response_set_length(res, length);
onion_response_set_header(res, "Content-Type", mime);
if (mime != NULL) {
onion_response_set_header(res, "Content-Type", mime);
} else {
onion_response_set_header(res, "Content-Type", "application/octet-stream");
}
onion_response_write_headers(res);
if ((onion_request_get_flags(request) & OR_HEAD) == OR_HEAD) {
length = 0;
@@ -201,21 +219,13 @@ int chunked_response_file(const char *filename, const char *mime,
return OCS_PROCESSED;
}
int search(void *p, onion_request *req, onion_response *res) {
int search(UNUSED(void *p), onion_request *req, onion_response *res) {
int flags = onion_request_get_flags(req);
if ((flags & OR_METHODS) != OR_POST) {
return OCS_NOT_PROCESSED;
}
char *scroll_param;
const char *scroll = onion_request_get_query(req, "scroll");
if (scroll != NULL) {
scroll_param = "?scroll=3m";
} else {
scroll_param = "";
}
const struct onion_block_t *block = onion_request_get_data(req);
if (block == NULL) {
@@ -223,7 +233,7 @@ int search(void *p, onion_request *req, onion_response *res) {
}
char url[4096];
snprintf(url, 4096, "%s/sist2/_search%s", WebCtx.es_url, scroll_param);
snprintf(url, 4096, "%s/sist2/_search", WebCtx.es_url);
response_t *r = web_post(url, onion_block_data(block), "Content-Type: application/json");
set_default_headers(res);
@@ -232,6 +242,8 @@ int search(void *p, onion_request *req, onion_response *res) {
if (r->status_code == 200) {
onion_response_write(res, r->body, r->size);
} else {
onion_response_set_code(res, HTTP_INTERNAL_ERROR);
}
free_response(r);
@@ -239,43 +251,6 @@ int search(void *p, onion_request *req, onion_response *res) {
return OCS_PROCESSED;
}
int scroll(void *p, onion_request *req, onion_response *res) {
int flags = onion_request_get_flags(req);
if ((flags & OR_METHODS) != OR_GET) {
return OCS_NOT_PROCESSED;
}
char url[4096];
snprintf(url, 4096, "%s/_search/scroll", WebCtx.es_url);
const char *scroll_id = onion_request_get_query(req, "scroll_id");
cJSON *json = cJSON_CreateObject();
cJSON_AddStringToObject(json, "scroll_id", scroll_id);
cJSON_AddStringToObject(json, "scroll", "3m");
char *json_str = cJSON_PrintUnformatted(json);
response_t *r = web_post(url, json_str, "Content-Type: application/json");
cJSON_Delete(json);
cJSON_free(json_str);
if (r->status_code != 200) {
free_response(r);
return OCS_NOT_PROCESSED;
}
set_default_headers(res);
onion_response_set_header(res, "Content-Type", "application/json");
onion_response_set_header(res, "Content-Disposition", "application/json");
onion_response_set_length(res, r->size);
onion_response_write(res, r->body, r->size);
free_response(r);
return OCS_PROCESSED;
}
int serve_file_from_url(cJSON *json, index_t *idx, onion_request *req, onion_response *res) {
const char *path = cJSON_GetObjectItem(json, "path")->valuestring;
@@ -312,7 +287,7 @@ int serve_file_from_disk(cJSON *json, index_t *idx, onion_request *req, onion_re
return chunked_response_file(full_path, mime, 1, req, res);
}
int index_info(void *p, onion_request *req, onion_response *res) {
int index_info(UNUSED(void *p), onion_request *req, onion_response *res) {
cJSON *json = cJSON_CreateObject();
cJSON *arr = cJSON_AddArrayToObject(json, "indices");
@@ -326,7 +301,7 @@ int index_info(void *p, onion_request *req, onion_response *res) {
cJSON_AddStringToObject(idx_json, "name", idx->desc.name);
cJSON_AddStringToObject(idx_json, "version", idx->desc.version);
cJSON_AddStringToObject(idx_json, "id", idx->desc.uuid);
cJSON_AddNumberToObject(idx_json, "timestamp", (double)idx->desc.timestamp);
cJSON_AddNumberToObject(idx_json, "timestamp", (double) idx->desc.timestamp);
cJSON_AddItemToArray(arr, idx_json);
}
@@ -338,7 +313,8 @@ int index_info(void *p, onion_request *req, onion_response *res) {
return OCS_PROCESSED;
}
int file(void *p, onion_request *req, onion_response *res) {
int document_info(UNUSED(void *p), onion_request *req, onion_response *res) {
const char *arg_uuid = onion_request_get_query(req, "1");
if (arg_uuid == NULL) {
@@ -347,22 +323,89 @@ int file(void *p, onion_request *req, onion_response *res) {
cJSON *doc = elastic_get_document(arg_uuid);
cJSON *source = cJSON_GetObjectItem(doc, "_source");
cJSON *index_id = cJSON_GetObjectItem(source, "index");
if (index_id == NULL) {
cJSON_Delete(doc);
return OCS_NOT_PROCESSED;
}
index_t *idx = get_index_by_id(index_id->valuestring);
if (idx == NULL) {
cJSON_Delete(doc);
return OCS_NOT_PROCESSED;
}
onion_response_set_header(res, "Content-Type", "application/json");
char *json_str = cJSON_PrintUnformatted(source);
onion_response_write0(res, json_str);
free(json_str);
cJSON_Delete(doc);
return OCS_PROCESSED;
}
int file(UNUSED(void *p), onion_request *req, onion_response *res) {
const char *arg_uuid = onion_request_get_query(req, "1");
if (arg_uuid == NULL) {
return OCS_PROCESSED;
}
const char *next = arg_uuid;
cJSON *doc = NULL;
cJSON *index_id = NULL;
cJSON *source = NULL;
while (true) {
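cJSON *prev = doc;
doc = elastic_get_document(next);
// `next` pointed into the previous document; free it only after the parent has been fetched
cJSON_Delete(prev);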
source = cJSON_GetObjectItem(doc, "_source");
index_id = cJSON_GetObjectItem(source, "index");
if (index_id == NULL) {
cJSON_Delete(doc);
return OCS_NOT_PROCESSED;
}
cJSON *parent = cJSON_GetObjectItem(source, "parent");
if (parent == NULL) {
break;
}
next = parent->valuestring;
}
index_t *idx = get_index_by_id(index_id->valuestring);
if (idx == NULL) {
cJSON_Delete(doc);
return OCS_NOT_PROCESSED;
}
int ret;
if (strlen(idx->desc.rewrite_url) == 0) {
return serve_file_from_disk(source, idx, req, res);
ret = serve_file_from_disk(source, idx, req, res);
} else {
return serve_file_from_url(source, idx, req, res);
ret = serve_file_from_url(source, idx, req, res);
}
cJSON_Delete(doc);
return ret;
}
int status(UNUSED(void *p), UNUSED(onion_request *req), onion_response *res) {
set_default_headers(res);
onion_response_set_header(res, "Content-Type", "application/x-empty");
char *status = elastic_get_status();
if (strcmp(status, "open") == 0) {
onion_response_set_code(res, 204);
} else {
onion_response_set_code(res, 500);
}
free(status);
return OCS_PROCESSED;
}
void serve(const char *hostname, const char *port) {
@@ -372,17 +415,18 @@ void serve(const char *hostname, const char *port) {
onion_set_hostname(o, hostname);
onion_set_port(o, port);
onion_url *urls = onion_root_url(o);
onion_url *urls = onion_url_new();
// Static paths
onion_set_root_handler(o, auth_basic(WebCtx.b64credentials, onion_url_to_handler(urls)));
onion_url_add(urls, "", search_index);
onion_url_add(urls, "css", style);
onion_url_add(urls, "js", javascript);
onion_url_add(urls, "img/bg-bars.png", bg_bars);
onion_url_add(urls, "img/sprite-skin-flat.png", img_sprite_skin_flag);
onion_url_add(urls, "es", search);
onion_url_add(urls, "scroll", scroll);
onion_url_add(urls, "status", status);
onion_url_add(
urls,
"^t/([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})/"
@@ -390,8 +434,10 @@ void serve(const char *hostname, const char *port) {
thumbnail
);
onion_url_add(urls, "^f/([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})$", file);
onion_url_add(urls, "^d/([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})$", document_info);
onion_url_add(urls, "i", index_info);
printf("Starting web server @ http://%s:%s\n", hostname, port);
onion_listen(o);

File diff suppressed because one or more lines are too long

1
utf8.h Submodule

Submodule utf8.h added at 2a7c5bfa95

443
web/css/dark.css Normal file

@@ -0,0 +1,443 @@
*:focus {
outline: 0;
}
.info-icon {
width: 1rem;
margin-right: 0.2rem;
cursor: pointer;
color: #757575;
line-height: 1rem;
height: 1.1rem;
}
.info-icon:hover {
color: inherit;
}
.modal-title {
max-width: calc(100% - 2rem);
overflow: hidden;
text-overflow: ellipsis;
}
.path-row {
display: -ms-flexbox;
display: flex;
-ms-flex-align: start;
align-items: flex-start;
}
.tag-container {
margin-left: 0.3rem;
}
.path-line {
color: #BBB;
text-overflow: ellipsis;
overflow: hidden;
}
a {
color: #00BCD4;
}
body {
overflow-y: scroll;
background: black;
}
.progress {
margin-top: 1em;
}
.card, .modal-content {
margin-top: 1em;
background: #212121;
color: #e0e0e0;
border-radius: 1px;
border: none;
}
.table {
color: #e0e0e0;
}
.table td, .table th {
border: none;
}
.table thead th {
border-bottom: 1px solid #646464;
}
.modal-header .close {
color: #e0e0e0;
text-shadow: none;
}
.modal-header {
border-bottom: 1px solid #646464;
}
.sub-document {
background: #37474F !important;
}
.list-group-item.sub-document {
border-top: 1px solid #646464 !important;
}
.sub-document .text-muted {
color: #8a949c !important;
}
.list-group-item {
background: #212121;
color: #e0e0e0;
border-top: 1px solid #424242;
border-bottom: none;
border-left: none;
border-right: none;
}
.list-group-item:first-child {
border-top: none;
}
.navbar-brand {
font-size: 1.75rem;
padding: 0;
color: #f5f5f5;
}
.navbar {
background: #546b7a;
}
.navbar a:hover {
color: #fff;
}
.navbar span {
color: #eee;
}
.document {
padding: 0.5rem;
}
.document p {
margin-bottom: 0;
}
.document:hover p {
text-decoration: underline;
}
.badge-video {
color: #FFFFFF;
background-color: #F27761;
}
.badge-image {
color: #FFFFFF;
background-color: #AA99C9;
}
.badge-audio {
color: #FFFFFF;
background-color: #00ADEF;
}
.badge-resolution {
color: #212529;
background-color: #B0BEC5;
}
.badge-text {
color: #FFFFFF;
background-color: #FAAB3C;
}
.card-img-overlay {
pointer-events: none;
padding: 0.75rem;
bottom: unset;
top: 0;
left: unset;
right: unset;
}
.file-title {
width: 100%;
line-height: 1rem;
height: 1.1rem;
font-size: 10pt;
white-space: nowrap;
text-overflow: ellipsis;
overflow: hidden;
color: #00BCD4;
}
.badge {
margin-right: 3px;
}
.badge-user {
color: #212529;
background-color: #e0e0e0;
}
.fit {
display: block;
min-width: 64px;
max-width: 100%;
max-height: 175px;
margin: 0 auto 0;
padding: 3px 3px 0 3px;
width: auto;
height: auto;
}
.fit-sm {
display: block;
max-width: 64px;
max-height: 64px;
margin: 0 auto 0;
width: auto;
height: auto;
}
.audio-fit {
height: 39px;
vertical-align: bottom;
display: inline;
width: 100%;
}
@media (min-width: 1200px) {
.card-columns {
column-count: 4;
}
}
@media (min-width: 1500px) {
.container {
max-width: 1440px;
}
.card-columns {
column-count: 5;
}
}
@media (min-width: 1800px) {
.container {
max-width: 1550px;
}
}
mark {
background: #fff217;
border-radius: 0;
padding: 1px 0;
}
.content-div {
font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
font-size: 13px;
padding: 1em;
background-color: #37474F;
border: 1px solid #616161;
border-radius: 4px;
margin: 3px;
white-space: normal;
color: rgb(224, 224, 224);
}
.irs-single, .irs-from, .irs-to {
font-size: 13px;
background-color: #00BCD4;
}
.irs-slider {
cursor: col-resize;
}
.irs {
margin-top: 1em;
margin-bottom: 1em;
}
.custom-select {
overflow: auto;
background-color: #37474F;
border: 1px solid #616161;
color: #bdbdbd;
}
.custom-select:focus {
border-color: #757575;
outline: 0;
box-shadow: 0 0 0 .2rem rgba(0, 123, 255, .25);
}
option {
outline: none;
}
.form-control {
background-color: #37474F;
border: 1px solid #616161;
color: #fff;
}
.form-control:focus {
background-color: #546E7A;
color: #fff;
}
.input-group-text {
background: #263238;
border: 1px solid #616161;
color: #dbdbdb;
}
::placeholder {
color: #BDBDBD !important;
opacity: 1;
}
.inspire-tree .selected > .wholerow, .inspire-tree .selected > .title-wrap:hover + .wholerow {
background: none;
}
.inspire-tree .icon-expand::before, .inspire-tree .icon-collapse::before {
background-color: black;
}
.inspire-tree .title {
color: #eee;
}
.inspire-tree {
font-weight: 400;
font-size: 14px;
font-family: "Helvetica Neue", Helvetica, Verdana, sans-serif;
max-height: 350px;
overflow: auto;
}
.page-indicator {
line-height: 1rem;
padding: 0.5rem;
background: #212121;
color: #eee;
margin-top: 1em;
}
.btn-xs {
padding: .1rem .3rem;
font-size: .875rem;
border-radius: .2rem;
}
.btn {
color: #eee;
}
.nav-tabs .nav-link {
color: #e0e0e0;
}
.nav-tabs .nav-item.show .nav-link, .nav-tabs .nav-link.active {
background-color: #212121;
border-color: #616161 #616161 #212121;
color: #e0e0e0;
}
.nav-tabs .nav-link:focus {
border-color: #616161 #616161 #212121;
color: #e0e0e0;
}
.nav-tabs .nav-link:focus, .nav-tabs .nav-link:hover {
border-color: #e0e0e0 #e0e0e0 #212121;
color: #e0e0e0;
}
.nav-tabs {
border-bottom-color: #616161;
}
.nav {
margin-top: 0.5rem;
}
@media (max-width: 800px) {
#treeTabs {
flex-basis: inherit;
flex-grow: inherit;
}
}
.list-group {
margin-top: 1em;
}
.list-group-item {
padding: .25rem 0.5rem;
}
.wrapper-sm {
min-width: 64px;
}
.media-expanded {
display: inherit;
}
.media-expanded .fit {
max-height: 250px;
}
@media (max-width: 600px) {
.media-expanded .fit {
max-height: none;
}
.tagline {
display: none;
}
}
.version {
color: #00BCD4;
margin-left: -18px;
margin-top: -14px;
font-size: 11px;
}
@media (min-width: 800px) {
.small-btn {
display: none;
}
.large-btn {
display: inherit;
}
}
@media (max-width: 801px) {
.small-btn {
display: inherit;
}
.large-btn {
display: none;
}
}

1
web/css/jquery.toast.min.css vendored Normal file

@@ -0,0 +1 @@
.jq-toast-wrap,.jq-toast-wrap *{margin:0;padding:0}.jq-toast-wrap{display:block;position:fixed;width:250px;pointer-events:none!important;letter-spacing:normal;z-index:9000!important}.jq-toast-wrap.bottom-left{bottom:20px;left:20px}.jq-toast-wrap.bottom-right{bottom:20px;right:40px}.jq-toast-wrap.top-left{top:20px;left:20px}.jq-toast-wrap.top-right{top:20px;right:40px}.jq-toast-single{display:block;width:100%;padding:10px;margin:0 0 5px;border-radius:4px;font-size:12px;font-family:arial,sans-serif;line-height:17px;position:relative;pointer-events:all!important;background-color:#444;color:#fff}.jq-toast-single h2{font-family:arial,sans-serif;font-size:14px;margin:0 0 7px;background:0 0;color:inherit;line-height:inherit;letter-spacing:normal}.jq-toast-single a{color:#eee;text-decoration:none;font-weight:700;border-bottom:1px solid #fff;padding-bottom:3px;font-size:12px}.jq-toast-single ul{margin:0 0 0 15px;background:0 0;padding:0}.jq-toast-single ul li{list-style-type:disc!important;line-height:17px;background:0 0;margin:0;padding:0;letter-spacing:normal}.close-jq-toast-single{position:absolute;top:3px;right:7px;font-size:14px;cursor:pointer}.jq-toast-loader{display:block;position:absolute;top:-2px;height:5px;width:0;left:0;border-radius:5px;background:red}.jq-toast-loaded{width:100%}.jq-has-icon{padding:10px 10px 10px 
50px;background-repeat:no-repeat;background-position:10px}.jq-icon-info{background-image:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAGwSURBVEhLtZa9SgNBEMc9sUxxRcoUKSzSWIhXpFMhhYWFhaBg4yPYiWCXZxBLERsLRS3EQkEfwCKdjWJAwSKCgoKCcudv4O5YLrt7EzgXhiU3/4+b2ckmwVjJSpKkQ6wAi4gwhT+z3wRBcEz0yjSseUTrcRyfsHsXmD0AmbHOC9Ii8VImnuXBPglHpQ5wwSVM7sNnTG7Za4JwDdCjxyAiH3nyA2mtaTJufiDZ5dCaqlItILh1NHatfN5skvjx9Z38m69CgzuXmZgVrPIGE763Jx9qKsRozWYw6xOHdER+nn2KkO+Bb+UV5CBN6WC6QtBgbRVozrahAbmm6HtUsgtPC19tFdxXZYBOfkbmFJ1VaHA1VAHjd0pp70oTZzvR+EVrx2Ygfdsq6eu55BHYR8hlcki+n+kERUFG8BrA0BwjeAv2M8WLQBtcy+SD6fNsmnB3AlBLrgTtVW1c2QN4bVWLATaIS60J2Du5y1TiJgjSBvFVZgTmwCU+dAZFoPxGEEs8nyHC9Bwe2GvEJv2WXZb0vjdyFT4Cxk3e/kIqlOGoVLwwPevpYHT+00T+hWwXDf4AJAOUqWcDhbwAAAAASUVORK5CYII=);background-color:#31708f;color:#d9edf7;border-color:#bce8f1}.jq-icon-warning{background-image:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAGYSURBVEhL5ZSvTsNQFMbXZGICMYGYmJhAQIJAICYQPAACiSDB8AiICQQJT4CqQEwgJvYASAQCiZiYmJhAIBATCARJy+9rTsldd8sKu1M0+dLb057v6/lbq/2rK0mS/TRNj9cWNAKPYIJII7gIxCcQ51cvqID+GIEX8ASG4B1bK5gIZFeQfoJdEXOfgX4QAQg7kH2A65yQ87lyxb27sggkAzAuFhbbg1K2kgCkB1bVwyIR9m2L7PRPIhDUIXgGtyKw575yz3lTNs6X4JXnjV+LKM/m3MydnTbtOKIjtz6VhCBq4vSm3ncdrD2lk0VgUXSVKjVDJXJzijW1RQdsU7F77He8u68koNZTz8Oz5yGa6J3H3lZ0xYgXBK2QymlWWA+RWnYhskLBv2vmE+hBMCtbA7KX5drWyRT/2JsqZ2IvfB9Y4bWDNMFbJRFmC9E74SoS0CqulwjkC0+5bpcV1CZ8NMej4pjy0U+doDQsGyo1hzVJttIjhQ7GnBtRFN1UarUlH8F3xict+HY07rEzoUGPlWcjRFRr4/gChZgc3ZL2d8oAAAAASUVORK5CYII=);background-color:#8a6d3b;color:#fcf8e3;border-color:#faebcc}.jq-icon-error{background-image:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAHOSURBVEhLrZa/SgNBEMZzh0WKCClSCKaIYOED+AAKeQQLG8HWztLCImBrYadgIdY+gIKNYkBFSwu7CAoqCgkkoGBI/E28PdbLZmeDLgzZzcx83/zZ2SSXC1j
9fr+I1Hq93g2yxH4iwM1vkoBWAdxCmpzTxfkN2RcyZNaHFIkSo10+8kgxkXIURV5HGxTmFuc75B2RfQkpxHG8aAgaAFa0tAHqYFfQ7Iwe2yhODk8+J4C7yAoRTWI3w/4klGRgR4lO7Rpn9+gvMyWp+uxFh8+H+ARlgN1nJuJuQAYvNkEnwGFck18Er4q3egEc/oO+mhLdKgRyhdNFiacC0rlOCbhNVz4H9FnAYgDBvU3QIioZlJFLJtsoHYRDfiZoUyIxqCtRpVlANq0EU4dApjrtgezPFad5S19Wgjkc0hNVnuF4HjVA6C7QrSIbylB+oZe3aHgBsqlNqKYH48jXyJKMuAbiyVJ8KzaB3eRc0pg9VwQ4niFryI68qiOi3AbjwdsfnAtk0bCjTLJKr6mrD9g8iq/S/B81hguOMlQTnVyG40wAcjnmgsCNESDrjme7wfftP4P7SP4N3CJZdvzoNyGq2c/HWOXJGsvVg+RA/k2MC/wN6I2YA2Pt8GkAAAAASUVORK5CYII=);background-color:#a94442;color:#f2dede;border-color:#ebccd1}.jq-icon-success{background-image:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAADsSURBVEhLY2AYBfQMgf///3P8+/evAIgvA/FsIF+BavYDDWMBGroaSMMBiE8VC7AZDrIFaMFnii3AZTjUgsUUWUDA8OdAH6iQbQEhw4HyGsPEcKBXBIC4ARhex4G4BsjmweU1soIFaGg/WtoFZRIZdEvIMhxkCCjXIVsATV6gFGACs4Rsw0EGgIIH3QJYJgHSARQZDrWAB+jawzgs+Q2UO49D7jnRSRGoEFRILcdmEMWGI0cm0JJ2QpYA1RDvcmzJEWhABhD/pqrL0S0CWuABKgnRki9lLseS7g2AlqwHWQSKH4oKLrILpRGhEQCw2LiRUIa4lwAAAABJRU5ErkJggg==);color:#dff0d8;background-color:#3c763d;border-color:#d6e9c6}

@@ -1,4 +1,46 @@
body {overflow-y:scroll;}
*:focus {
outline: 0;
}
.info-icon {
width: 1rem;
margin-right: 0.2rem;
cursor: pointer;
color: #757575;
line-height: 1rem;
height: 1rem;
}
.info-icon:hover {
color: inherit;
}
.modal-title {
max-width: calc(100% - 2rem);
overflow: hidden;
text-overflow: ellipsis;
}
.path-row {
display: -ms-flexbox;
display: flex;
-ms-flex-align: start;
align-items: flex-start;
}
.tag-container {
margin-left: 0.3rem;
}
.path-line {
color: #444;
text-overflow: ellipsis;
overflow: hidden;
}
body {
overflow-y: scroll;
}
.progress {
margin-top: 1em;
@@ -6,14 +48,23 @@ body {overflow-y:scroll;}
.card {
margin-top: 1em;
box-shadow: 0 .125rem .25rem rgba(0, 0, 0, .075) !important;
}
.sub-document {
background: #AB47BC1F !important;
}
.navbar-brand {
font-size: 1.75rem;
padding: 0;
}
.navbar {
background: #F7F7F7; border-bottom: solid 1px #dfdfdf;
background: #F7F7F7;
border-bottom: solid 1px #dfdfdf;
}
.document {
padding: 0.5rem;
}
@@ -46,6 +97,11 @@ body {overflow-y:scroll;}
background-color: #FFC107;
}
.badge-user {
color: #212529;
background-color: #e0e0e0;
}
.badge-text {
color: #FFFFFF;
background-color: #FAAB3C;
@@ -62,6 +118,9 @@ body {overflow-y:scroll;}
}
.file-title {
width: 100%;
line-height: 1rem;
height: 1.1rem;
font-size: 10pt;
white-space: nowrap;
text-overflow: ellipsis;
@@ -83,10 +142,20 @@ body {overflow-y:scroll;}
height: auto;
}
.fit-sm {
display: block;
max-width: 64px;
max-height: 64px;
margin: 0 auto 0;
width: auto;
height: auto;
}
.audio-fit {
height: 39px;
vertical-align: bottom;
display: inline;
width: 100%;
}
@media (min-width: 1200px) {
@@ -96,16 +165,17 @@ body {overflow-y:scroll;}
}
@media (min-width: 1500px) {
.container {
.container {
max-width: 1440px;
}
.card-columns {
column-count: 5;
}
}
@media (min-width: 1800px) {
.container {
.container {
max-width: 1550px;
}
}
@@ -117,13 +187,15 @@ mark {
}
.content-div {
font-family: SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;
font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
font-size: 13px;
padding: 1em;
background-color: #f5f5f5;
border: 1px solid #ccc;
border-radius: 4px;
margin: 3px;
white-space: normal;
color: #000;
}
.irs-single, .irs-from, .irs-to {
@@ -143,8 +215,7 @@ mark {
margin-bottom: 1em;
}
.inspire-tree .selected > .wholerow, .inspire-tree .selected > .title-wrap:hover + .wholerow
{
.inspire-tree .selected > .wholerow, .inspire-tree .selected > .title-wrap:hover + .wholerow {
background: none;
}
@@ -159,10 +230,78 @@ mark {
.page-indicator {
line-height: 1rem;
padding: 0.5rem;
background: #f8f9fa;
margin-top: 1em;
}
.btn-xs {
padding: .1rem .3rem;
font-size: .875rem;
border-radius: .2rem;
}
}
.nav {
margin-top: 0.5rem;
}
@media (max-width: 800px) {
#treeTabs {
flex-basis: inherit;
flex-grow: inherit;
}
}
.list-group {
margin-top: 1em;
}
.list-group-item {
padding: .25rem 0.5rem;
}
.wrapper-sm {
min-width: 64px;
}
.media-expanded {
display: inherit;
}
.media-expanded .fit {
max-height: 250px;
}
@media (max-width: 600px) {
.media-expanded .fit {
max-height: none;
}
.tagline {
display: none;
}
}
.version {
color: #007bff;
margin-left: -18px;
margin-top: -14px;
font-size: 11px;
}
@media (min-width: 800px) {
.small-btn {
display: none;
}
.large-btn {
display: inherit;
}
}
@media (max-width: 801px) {
.small-btn {
display: inherit;
}
.large-btn {
display: none;
}
}

1
web/css/smartphoto.min.css vendored Normal file

File diff suppressed because one or more lines are too long

Binary file not shown. Before: 8.3 KiB.

Binary file not shown. After: 595 B.

1
web/js/7_jquery.toast.min.js vendored Normal file

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff.