mirror of https://github.com/simon987/sist2.git (synced 2025-04-10 05:56:46 +00:00)

Update docs

parent 5f0957d029, commit d9d77de47f

@@ -157,6 +157,7 @@ indices, but it uses much less memory and is easier to set up.
| Manual tagging | ✓ | ✓ |
| User scripts | ✓ | ✓ |
| Media Type breakdown for search results | | ✓ |
| Embeddings search | ✓ *O(n)* | ✓ *O(log n)* |

### NER

@@ -175,6 +175,32 @@ Using a version >=7.14.0 is recommended to enable the following features:
When using a legacy version of ES, a notice will be displayed next to the sist2 version in the web UI.
If you don't care about the features above, you can ignore the notice or disable it in the configuration page.

# Embeddings search

Since v3.2.0, User scripts can be used to generate _embeddings_ (vectors of float32 numbers), which are stored in the .sist2 index file
(see [scripting](scripting.md)). Embeddings can be used for:

* Nearest-neighbor queries (e.g. "return the documents most similar to this one")
* Semantic searches (e.g. "return the documents that are most closely related to the given topic")

In theory, embeddings can be created for any type of document (image, text, audio, etc.).

For example, the [clip](https://github.com/simon987/sist2-script-clip) User script generates 512-d embeddings of images
(videos are also supported using the thumbnails generated by sist2). When the user enters a query in the "Embeddings Search"
textbox, the query's embedding is generated in their browser, leveraging the ONNX web runtime.

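To make the nearest-neighbor idea concrete, here is a minimal sketch of a brute-force search over stored embeddings. It is an illustration only, not how sist2 or its search backends implement the feature: cosine similarity as the metric, the function names, and the toy 3-d vectors are all assumptions made for this example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to `query`, best first.

    Every stored embedding is compared with the query, so the cost grows
    linearly with the number of documents (O(n)).
    """
    scored = (
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in embeddings.items()
    )
    top = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in top]


# Toy 3-d embeddings (hypothetical); real ones, such as 512-d vectors, are much longer.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}

results = nearest_neighbors([1.0, 0.0, 0.0], index, k=2)
print(results)
```

A linear scan like this one is simple and exact, but it touches every document on each query. Search engines typically rely on an approximate index structure instead to answer the same question faster on large collections.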
<details>
<summary>Screenshots</summary>

![embeddings-1](embeddings-1.png)
![embeddings-2](embeddings-2.png)

1. Embeddings search bar. You can select the model using the dropdown on the left.
2. This icon appears for indices with embeddings search enabled.
3. Documents with this icon have embeddings. Click on the icon to perform KNN search.
</details>


# Tagging

### Manual tagging

@@ -199,43 +225,4 @@ See [Automatic tagging](#automatic-tagging) for information about tag

### Automatic tagging

See [scripting](scripting.md) documentation.

# Sidecar files

When scanning, sist2 will read metadata from `.s2meta` JSON files and overwrite the
original document's indexed metadata (the actual file is not modified). Sidecar metadata files will also work inside archives.
Sidecar files themselves are not saved in the index.

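The overwrite behaviour described above can be pictured as a field-by-field overlay. The sketch below assumes a shallow merge in which a sidecar field replaces the indexed field of the same name and unknown fields are added. The function name and the sample metadata are hypothetical, and sist2's actual merge logic may differ in details.

```python
from typing import Any, Dict


def apply_sidecar(indexed: Dict[str, Any], sidecar: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new metadata dict in which sidecar fields win over indexed ones."""
    merged = dict(indexed)   # copy, so the original metadata is left untouched
    merged.update(sidecar)   # same-named fields are overwritten, new ones are added
    return merged


# Hypothetical metadata extracted from a video during a scan.
indexed = {"name": "Video", "extension": "mp4", "duration": 100, "author": "Unknown"}
# Hypothetical contents of the matching .s2meta file.
sidecar = {"author": "Some author", "content": "Transcript produced by a speech-to-text tool"}

merged = apply_sidecar(indexed, sidecar)
print(merged)
```

Fields that the sidecar does not mention (here `name`, `extension` and `duration`) keep the values found during the scan.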
This feature makes it possible to use third-party applications, such as speech-to-text or
OCR tools, to add metadata to a file.

**Example**

```
~/Documents/
├── Video.mp4
└── Video.mp4.s2meta
```

The sidecar file must have exactly the same path as the file it describes, with the `.s2meta` suffix appended.

`Video.mp4.s2meta`:
```json
{
  "content": "This sidecar file will overwrite some metadata fields of Video.mp4",
  "author": "Some author",
  "duration": 12345,
  "bitrate": 67890,
  "some_arbitrary_field": [1,2,3]
}
```

```
sist2 scan ~/Documents -o ./docs.sist2
sist2 index ./docs.sist2
```

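As a rough illustration of how a third-party tool could produce such a file, the sketch below writes a sidecar next to a document using the naming rule above. The helper names (`sidecar_path`, `write_sidecar`) and the `transcript` value are assumptions for this example and are not part of sist2.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def sidecar_path(document: Path) -> Path:
    """Path of the sidecar for `document`: the same path with `.s2meta` appended."""
    return document.with_name(document.name + ".s2meta")


def write_sidecar(document: Path, metadata: Dict[str, Any]) -> Path:
    """Serialize `metadata` as JSON next to `document` and return the sidecar path."""
    target = sidecar_path(document)
    target.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


# Demo in a temporary directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "Video.mp4"
    video.touch()  # stand-in for a real media file

    # `transcript` stands in for the output of a speech-to-text or OCR tool.
    transcript = "Hello and welcome to the show"
    written = write_sidecar(video, {"content": transcript, "author": "Some author"})

    sidecar_name = written.name
    loaded = json.loads(written.read_text(encoding="utf-8"))
    print(sidecar_name, loaded)
```

Appending the suffix to the full file name (rather than replacing the extension) keeps `Video.mp4` and `Video.mp4.s2meta` paired. Once the sidecar exists, the `sist2 scan` and `sist2 index` commands shown above pick it up on the next scan.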
*NOTE*: It is technically possible to overwrite the `tag` value using sidecar files. However,
it is not currently possible to restore both manual tags and sidecar tags without user scripts
while reindexing.

BIN docs/embeddings-1.png (normal file, binary file not shown, size: 90 KiB)
BIN docs/embeddings-2.png (normal file, binary file not shown, size: 996 KiB)