Flag explanation in README.md

2025-12-14 15:49:02 +00:00 · 2019-02-22 05:59:59 +01:00
parent 9e9b606250
commit 88856c1c19
1 changed files with 34 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -9,7 +9,9 @@
 https://od-db.the-eye.eu/
-#### Usage
+## Usage
 ### Deploys
 1. With Config File (if `config.yml` found in working dir)
    - Download [default config](https://github.com/terorie/od-database-crawler/blob/master/config.yml)
@@ -22,3 +24,33 @@ https://od-db.the-eye.eu/
    - Every flag is available as an environment variable:
      `--output.crawl_stats` ➡️ `OD_OUTPUT_CRAWL_STATS`
    - Start with `./od-database-crawler server <flags>`
 3. With Docker
    ```bash
    docker run \
        -e OD_SERVER_URL=xxx \
        -e OD_SERVER_TOKEN=xxx \
        terorie/od-database-crawler
    ```
 ### Flag reference
 Here are the most important config flags. For finer control, take a look at `/config.yml`.
 | Flag/Config             | Environment/Docker         | Description                                                  | Example                             |
 | ----------------------- | -------------------------- | ------------------------------------------------------------ | ----------------------------------- |
 | `server.url`            | `OD_SERVER_URL`            | OD-DB Server URL                                             | `https://od-db.mine.the-eye.eu/api` |
 | `server.token`          | `OD_SERVER_TOKEN`          | OD-DB Server Access Token                                    | _Ask Hexa **TM**_                   |
 | `server.recheck`        | `OD_SERVER_RECHECK`        | Job Fetching Interval                                        | `3s`                                |
 | `output.crawl_stats`    | `OD_OUTPUT_CRAWL_STATS`    | Crawl Stats Logging Interval (0 = disabled)                  | `500ms`                             |
 | `output.resource_stats` | `OD_OUTPUT_RESOURCE_STATS` | Resource Stats Logging Interval (0 = disabled)               | `8s`                                |
 | `output.log`            | `OD_OUTPUT_LOG`            | Log File (none = disabled)                                   | `crawler.log`                       |
 | `crawl.tasks`           | `OD_CRAWL_TASKS`           | Max number of sites to crawl concurrently                    | `500`                               |
 | `crawl.connections`     | `OD_CRAWL_CONNECTIONS`     | HTTP connections per site                                    | `1`                                 |
 | `crawl.retries`         | `OD_CRAWL_RETRIES`         | Number of retries after a temporary failure (e.g. `HTTP 429` or timeouts) | `5`                                 |
 | `crawl.dial_timeout`    | `OD_CRAWL_DIAL_TIMEOUT`    | TCP Connect timeout                                          | `5s`                                |
 | `crawl.timeout`         | `OD_CRAWL_TIMEOUT`         | HTTP request timeout                                         | `20s`                               |
 | `crawl.user-agent`      | `OD_CRAWL_USER_AGENT`      | HTTP Crawler User-Agent                                      | `googlebot/1.2.3`                   |
 | `crawl.job_buffer`      | `OD_CRAWL_JOB_BUFFER`      | Number of URLs to keep in memory/cache, per job. The rest is offloaded to disk. Decrease this value if the crawler uses too much RAM. (0 = Disable Cache, -1 = Only use Cache) | `5000`                              |
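
The flag-to-environment-variable naming used in the table can be sketched as a small Go helper. This is an illustration only: `envName` is a hypothetical function, not part of the crawler, and the rule it applies (strip a leading `--`, replace `.` and `-` with `_`, uppercase, prefix with `OD_`) is inferred from the table rows rather than taken from the crawler's source.

```go
package main

import (
	"fmt"
	"strings"
)

// envName is a hypothetical helper: it converts a flag/config key such as
// "crawl.dial_timeout" (optionally written as "--crawl.dial_timeout") into
// the environment variable form listed in the flag reference table.
// Assumed rule: "." and "-" become "_", the result is uppercased, and the
// "OD_" prefix is added.
func envName(flag string) string {
	flag = strings.TrimPrefix(flag, "--")
	replacer := strings.NewReplacer(".", "_", "-", "_")
	return "OD_" + strings.ToUpper(replacer.Replace(flag))
}

func main() {
	flags := []string{"server.url", "--crawl.dial_timeout", "crawl.user-agent"}
	for _, f := range flags {
		fmt.Printf("%-22s -> %s\n", f, envName(f))
	}
}
```

A helper like this can be handy for generating the `-e` arguments of a `docker run` invocation from an existing list of flags.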