Mirror of https://github.com/simon987/od-database.git (synced 2025-10-25 19:56:51 +00:00)

Update README.md

Parent: bbd5c7694c
Commit: fff013f253

@@ -1,5 +1,13 @@
 # OD-Database
 
+OD-Database is a web-crawling project that aims to index a very large number of file links and their basic metadata from open directories (misconfigured Apache/Nginx/FTP servers or, more often, mirrors of various public services).
+
+Each crawler instance fetches tasks from the central server and pushes the result once completed. A single instance can crawl hundreds of websites at the same time (both FTP and HTTP(S)), and the central server is capable of ingesting thousands of new documents per second.
+
+The data is indexed into Elasticsearch and made available via the web frontend (currently hosted at https://od-db.the-eye.eu/). There are currently ~1.8 billion files indexed (about 300 GB of raw data in total). The raw data is made available as a CSV file [here](https://od-db.the-eye.eu/dl).
+
+
+### Contributing
 Suggestions/concerns/PRs are welcome
 
 ## Installation
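The crawler paragraph added in this commit describes a pull model: each crawler asks the central server for work, crawls, and reports the result back, with many websites in flight per instance. The Python sketch below illustrates that loop under stated assumptions. `Task`, `InMemoryTaskServer`, `fetch_task`, `push_result`, `crawl_website` and `worker` are hypothetical names, not od-database's actual API. The "server" is an in-memory queue, and directory listings come from a dict instead of real HTTP(S)/FTP requests. An `asyncio.Semaphore` caps how many websites are crawled at the same time.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical unit of work: one website root to crawl."""
    website_id: int
    url: str


@dataclass
class InMemoryTaskServer:
    """Hypothetical in-memory stand-in for the central server."""
    pending: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

    async def fetch_task(self):
        # Hand out the oldest queued task, or None when there is no work left.
        return self.pending.pop(0) if self.pending else None

    async def push_result(self, website_id, files):
        # Store the crawl output reported by a crawler.
        self.results[website_id] = files


async def crawl_website(url, listings):
    """Breadth-first walk over a fake directory tree.

    `listings` maps a directory URL to its entries; names ending in "/"
    are subdirectories, all other names are collected as file links.
    """
    files, queue, seen = [], [url], {url}
    while queue:
        directory = queue.pop(0)
        await asyncio.sleep(0)  # placeholder for real network I/O
        for name in listings.get(directory, []):
            link = directory + name
            if name.endswith("/"):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
            else:
                files.append(link)
    return files


async def worker(server, listings, max_concurrent):
    """Pull tasks and crawl at most `max_concurrent` websites at once."""
    slots = asyncio.Semaphore(max_concurrent)
    running = []

    async def run(task):
        try:
            files = await crawl_website(task.url, listings)
            await server.push_result(task.website_id, files)
        finally:
            slots.release()  # free the slot even if the crawl failed

    while True:
        await slots.acquire()  # wait for a free crawl slot before asking for work
        task = await server.fetch_task()
        if task is None:
            slots.release()
            break
        running.append(asyncio.create_task(run(task)))
    await asyncio.gather(*running)


if __name__ == "__main__":
    listings = {
        "http://example.com/": ["pub/", "readme.txt"],
        "http://example.com/pub/": ["a.iso", "b.iso"],
    }
    server = InMemoryTaskServer(pending=[Task(1, "http://example.com/")])
    asyncio.run(worker(server, listings, max_concurrent=100))
    print(server.results)
```

A long-running crawler would keep polling the server when the queue is empty instead of exiting. This sketch stops at that point so the example terminates.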