Mirror of https://github.com/terorie/od-database-crawler.git
Fix README.md format

parent c9ff102d80
commit 8b9d8bfd17

README.md (34 lines changed)
@@ -26,7 +26,7 @@ https://od-db.the-eye.eu/
    - Start with `./od-database-crawler server <flags>`
 
 3. With Docker
-   ```dockerfile
+   ```bash
    docker run \
        -e OD_SERVER_URL=xxx \
        -e OD_SERVER_TOKEN=xxx \
@@ -37,20 +37,18 @@ https://od-db.the-eye.eu/
 
 Here are the most important config flags. For more fine control, take a look at `/config.yml`.
 
-| Flag/Config             | Environment/Docker         | Description                                                  | Example                             |
-| ----------------------- | -------------------------- | ------------------------------------------------------------ | ----------------------------------- |
-| `server.url`            | `OD_SERVER_URL`            | OD-DB Server URL                                             | `https://od-db.mine.the-eye.eu/api` |
-| `server.token`          | `OD_SERVER_TOKEN`          | OD-DB Server Access Token                                    | _Ask Hexa **TM**_                   |
-| `server.recheck`        | `OD_SERVER_RECHECK`        | Job Fetching Interval                                        | `3s`                                |
-| `output.crawl_stats`    | `OD_OUTPUT_CRAWL_STATS`    | Crawl Stats Logging Interval (0 = disabled)                  | `500ms`                             |
-| `output.resource_stats` | `OD_OUTPUT_RESORUCE_STATS` | Resource Stats Logging Interval (0 = disabled)               | `8s`                                |
-| `output.log`            | `OD_OUTPUT_LOG`            | Log File (none = disabled)                                   | `crawler.log`                       |
-| `crawl.tasks`           | `OD_CRAWL_TASKS`           | Max number of sites to crawl concurrently                    | `500`                               |
-| `crawl.connections`     | `OD_CRAWL_CONNECTIONS`     | HTTP connections per site                                    | `1`                                 |
-| `crawl.retries`         | `OD_CRAWL_RETRIES`         | How often to retry after a temporary failure (e.g. `HTTP 429` or timeouts) | `5`                                 |
-| `crawl.dial_timeout`    | `OD_CRAWL_DIAL_TIMEOUT`    | TCP Connect timeout                                          | `5s`                                |
-| `crawl.timeout`         | `OD_CRAWL_TIMEOUT`         | HTTP request timeout                                         | `20s`                               |
-| `crawl.user-agent`      | `OD_CRAWL_USER_AGENT`      | HTTP Crawler User-Agent                                      | `googlebot/1.2.3`                   |
-| `crawl.job_buffer`      | `OD_CRAWL_JOB_BUFFER`      | Number of URLs to keep in memory/cache, per job. The rest is offloaded to disk. Decrease this value if the crawler uses too much RAM. (0 = Disable Cache, -1 = Only use Cache) | `5000`                              |
-
-
+| Flag/Environment                                        | Description                                                  | Example                             |
+| ------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------- |
+| `server.url`<br />`OD_SERVER_URL`                       | OD-DB Server URL                                             | `https://od-db.mine.the-eye.eu/api` |
+| `server.token`<br />`OD_SERVER_TOKEN`                   | OD-DB Server Access Token                                    | _Ask Hexa **TM**_                   |
+| `server.recheck`<br />`OD_SERVER_RECHECK`               | Job Fetching Interval                                        | `3s`                                |
+| `output.crawl_stats`<br />`OD_OUTPUT_CRAWL_STATS`       | Crawl Stats Logging Interval (0 = disabled)                  | `500ms`                             |
+| `output.resource_stats`<br />`OD_OUTPUT_RESORUCE_STATS` | Resource Stats Logging Interval (0 = disabled)               | `8s`                                |
+| `output.log`<br />`OD_OUTPUT_LOG`                       | Log File (none = disabled)                                   | `crawler.log`                       |
+| `crawl.tasks`<br />`OD_CRAWL_TASKS`                     | Max number of sites to crawl concurrently                    | `500`                               |
+| `crawl.connections`<br />`OD_CRAWL_CONNECTIONS`         | HTTP connections per site                                    | `1`                                 |
+| `crawl.retries`<br />`OD_CRAWL_RETRIES`                 | How often to retry after a temporary failure (e.g. `HTTP 429` or timeouts) | `5`                                 |
+| `crawl.dial_timeout`<br />`OD_CRAWL_DIAL_TIMEOUT`       | TCP Connect timeout                                          | `5s`                                |
+| `crawl.timeout`<br />`OD_CRAWL_TIMEOUT`                 | HTTP request timeout                                         | `20s`                               |
+| `crawl.user-agent`<br />`OD_CRAWL_USER_AGENT`           | HTTP Crawler User-Agent                                      | `googlebot/1.2.3`                   |
+| `crawl.job_buffer`<br />`OD_CRAWL_JOB_BUFFER`           | Number of URLs to keep in memory/cache, per job. The rest is offloaded to disk. Decrease this value if the crawler uses too much RAM. (0 = Disable Cache, -1 = Only use Cache) | `5000`                              |
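
The `docker run` snippet in the first hunk is truncated by the hunk boundary. Below is a minimal sketch of a fuller invocation, using only the environment variables and example values from the config table above; the trailing image name is a placeholder and is not taken from this commit.

```bash
# Sketch only: env var names and example values come from the config table above.
# "od-database-crawler" is a placeholder image tag - use whatever image you
# built from this repository or pulled from your own registry.
docker run \
    -e OD_SERVER_URL=https://od-db.mine.the-eye.eu/api \
    -e OD_SERVER_TOKEN=xxx \
    -e OD_SERVER_RECHECK=3s \
    -e OD_CRAWL_TASKS=500 \
    -e OD_CRAWL_CONNECTIONS=1 \
    -e OD_CRAWL_USER_AGENT='googlebot/1.2.3' \
    od-database-crawler
```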