Mirror of https://github.com/simon987/sist2.git (synced 2025-04-04 07:52:59 +00:00)

Commit d9d77de47f (parent 5f0957d029): Update docs

…indices, but it uses much less memory and is easier to set up.

| Manual tagging | ✓ | ✓ |
| User scripts | ✓ | ✓ |
| Media Type breakdown for search results | | ✓ |
| Embeddings search | ✓ *O(n)* | ✓ *O(log n)* |

### NER

Using a version >=7.14.0 is recommended to enable the following features:

When using a legacy version of ES, a notice will be displayed next to the sist2 version in the web UI.
If you don't care about the features above, you can ignore the notice or disable it in the configuration page.

# Embeddings search

Since v3.2.0, User scripts can be used to generate _embeddings_ (vectors of float32 numbers), which are stored in the `.sist2` index file
(see [scripting](scripting.md)). Embeddings can be used for:

* Nearest-neighbor queries (e.g. "return the documents most similar to this one")
* Semantic searches (e.g. "return the documents that are most closely related to the given topic")
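
The nearest-neighbor case can be illustrated with a brute-force search. This is a minimal sketch, not sist2's implementation: it assumes the embeddings are already loaded into memory as float32 vectors keyed by a hypothetical document id, and it assumes cosine similarity as the metric. Scoring every stored vector against the query is what makes a linear scan cost *O(n)*, the figure listed for embeddings search in the feature table.

```python
import heapq
import math
from array import array
from typing import Dict, List, Sequence, Tuple

# Sketch only: assumes cosine similarity; sist2's own search code may differ.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(query: Sequence[float],
        embeddings: Dict[str, Sequence[float]],
        k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force k-nearest-neighbor search.

    Every stored embedding is scored against the query, so the cost grows
    linearly with the number of documents (O(n)).
    """
    scored = ((doc_id, cosine_similarity(query, vec))
              for doc_id, vec in embeddings.items())
    # Keep only the k highest-scoring documents, best first.
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 3-d float32 embeddings; the clip User Script produces 512-d vectors.
    index = {
        "cat.jpg": array("f", [0.9, 0.1, 0.0]),
        "dog.jpg": array("f", [0.8, 0.3, 0.1]),
        "invoice.pdf": array("f", [0.0, 0.2, 0.9]),
    }
    # "Return the documents most similar to this one": query with cat.jpg's vector.
    for doc_id, score in knn(index["cat.jpg"], index, k=2):
        print(f"{doc_id}: {score:.3f}")
```

Using `heapq.nlargest` instead of sorting all scores keeps the extra memory at O(k), but each query still has to touch every stored vector.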

In theory, embeddings can be created for any type of document (image, text, audio, etc.).

For example, the [clip](https://github.com/simon987/sist2-script-clip) User Script generates 512-dimensional embeddings of images
(videos are also supported, using the thumbnails generated by sist2). When the user enters a query in the "Embeddings Search"
textbox, the query's embedding is generated in their browser using the ONNX web runtime.

<details>
<summary>Screenshots</summary>

![embeddings-1](embeddings-1.png)
![embeddings-2](embeddings-2.png)

1. Embeddings search bar. You can select the model using the dropdown on the left.
2. This icon appears for indices with embeddings search enabled.
3. Documents with this icon have embeddings. Click the icon to perform a KNN (k-nearest-neighbor) search.
</details>
# Tagging
### Manual tagging

See [Automatic tagging](#automatic-tagging) for information about tag…

### Automatic tagging

See [scripting](scripting.md) documentation.

# Sidecar files

When scanning, sist2 reads metadata from `.s2meta` JSON files and uses it to overwrite the
original document's indexed metadata (the actual file is not modified). Sidecar metadata files also work inside archives.
Sidecar files themselves are not saved in the index.

This feature is useful for adding metadata produced by third-party applications, such as
speech-to-text or OCR tools, to a file.
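
As a rough illustration of that workflow, the sketch below writes the output of a third-party tool into a sidecar named after the original file with `.s2meta` appended. It assumes the text has already been produced (the `transcript` string is a hypothetical placeholder, not the output of any specific tool) and that the metadata is a flat JSON object; `write_sidecar` is a hypothetical helper, not part of sist2.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

SIDECAR_SUFFIX = ".s2meta"


def write_sidecar(document: Path, metadata: Dict[str, Any]) -> Path:
    """Hypothetical helper: write `metadata` as JSON to `<document>.s2meta`.

    Only the sidecar is written; the original document is left untouched.
    """
    # Append the suffix to the full file name (Video.mp4 -> Video.mp4.s2meta);
    # Path.with_suffix() would wrongly replace ".mp4" instead.
    sidecar = document.with_name(document.name + SIDECAR_SUFFIX)
    sidecar.write_text(json.dumps(metadata, ensure_ascii=False, indent=2),
                       encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        video = Path(tmp) / "Video.mp4"
        video.write_bytes(b"")  # stand-in for a real video file
        # Hypothetical speech-to-text output for the video.
        transcript = "Hello and welcome to this video."
        path = write_sidecar(video, {"content": transcript, "author": "Some author"})
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```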
**Example**

```
~/Documents/
├── Video.mp4
└── Video.mp4.s2meta
```

The sidecar file must have exactly the same path as the original file, with the `.s2meta` suffix appended.
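
A quick way to double-check this naming rule before scanning is to look for `.s2meta` files whose original file does not exist, for example a `Video.s2meta` created by replacing the extension instead of appending to it. The sketch below uses hypothetical helpers (`original_for`, `orphan_sidecars`, not part of sist2) and only walks the regular filesystem, so it does not cover sidecars stored inside archives.

```python
import tempfile
from pathlib import Path
from typing import List

SIDECAR_SUFFIX = ".s2meta"


def original_for(sidecar: Path) -> Path:
    """Strip the '.s2meta' suffix: Video.mp4.s2meta -> Video.mp4."""
    return sidecar.with_name(sidecar.name[: -len(SIDECAR_SUFFIX)])


def orphan_sidecars(root: Path) -> List[Path]:
    """Return sidecar files under `root` with no file at the matching path."""
    orphans = []
    for sidecar in sorted(root.rglob("*" + SIDECAR_SUFFIX)):
        # Skip a file named exactly ".s2meta": it has no original file name.
        if sidecar.name == SIDECAR_SUFFIX or not sidecar.is_file():
            continue
        if not original_for(sidecar).exists():
            orphans.append(sidecar)
    return orphans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Video.mp4").write_bytes(b"")
        (root / "Video.mp4.s2meta").write_text("{}")
        # Wrong: the suffix replaced ".mp4" instead of being appended.
        (root / "Video.s2meta").write_text("{}")
        for path in orphan_sidecars(root):
            print("no matching file for", path.name)
```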

`Video.mp4.s2meta`:
```json
{
  "content": "This sidecar file will overwrite some metadata fields of Video.mp4",
  "author": "Some author",
  "duration": 12345,
  "bitrate": 67890,
  "some_arbitrary_field": [1, 2, 3]
}
```

```
sist2 scan ~/Documents -o ./docs.sist2
sist2 index ./docs.sist2
```

*NOTE*: It is technically possible to overwrite the `tag` value using sidecar files. However, when reindexing,
it is not currently possible to restore both manual tags and sidecar tags without user scripts.
See [scripting](scripting.md) documentation.