Mirror of https://github.com/simon987/od-database.git, synced 2025-04-20 02:46:45 +00:00

Update README.md

Parent: fff013f253
Commit: 0c423ee9a9
@@ -4,7 +4,9 @@ OD-Database is a web-crawling project that aims to index a very large number of
Each crawler instance fetches tasks from the central server and pushes the results back once a task is completed. A single instance can crawl hundreds of websites at the same time (both FTP and HTTP(S)), and the central server can ingest thousands of new documents per second.

The data is indexed into Elasticsearch and made available via the web frontend (currently hosted at https://od-db.the-eye.eu/). There are currently ~1.8 billion files indexed (about 300 GB of raw data in total). The raw data is available as a CSV file [here](https://od-db.the-eye.eu/dl).

The data is indexed into Elasticsearch and made available via the web frontend (currently hosted at https://od-db.the-eye.eu/). There are currently ~1.93 billion files indexed (about 300 GB of raw data in total). The raw data is available as a CSV file [here](https://od-db.the-eye.eu/dl).
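The pull-based flow described above (each crawler fetches tasks from the central server and pushes results back once completed) can be sketched as a shared task queue. This is a minimal, hypothetical sketch, not the project's actual code: `CentralServer`, `Task`, `fetch_task`, `push_result`, `crawl` and `run_crawler` are illustrative names, the server is an in-memory stand-in for the real HTTP service, crawling is simulated, and a thread pool stands in for a crawler working on many websites at once.

```python
from __future__ import annotations

import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task shape: one website (FTP or HTTP(S)) to crawl.
    website_id: int
    url: str


@dataclass
class CentralServer:
    """In-memory stand-in for the central server (hypothetical API)."""
    tasks: queue.Queue[Task] = field(default_factory=queue.Queue)
    indexed: dict[int, list[str]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def fetch_task(self) -> Task | None:
        # Hand out the next pending task, or None when the queue is empty.
        try:
            return self.tasks.get_nowait()
        except queue.Empty:
            return None

    def push_result(self, task: Task, files: list[str]) -> None:
        # Ingest a completed crawl; the real project indexes into Elasticsearch.
        with self._lock:
            self.indexed[task.website_id] = files


def crawl(task: Task) -> list[str]:
    # Placeholder crawl: a real crawler would walk the remote directory tree.
    return [f"{task.url}file{i}.txt" for i in range(3)]


def run_crawler(server: CentralServer, max_parallel: int = 100) -> None:
    """Fetch tasks until none remain, crawling up to max_parallel sites at once."""

    def worker() -> None:
        # Each worker keeps pulling tasks until the server has none left.
        while (task := server.fetch_task()) is not None:
            server.push_result(task, crawl(task))

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(worker) for _ in range(max_parallel)]
    for f in futures:
        f.result()  # re-raise any exception raised inside a worker
```

Because workers pull work instead of having it pushed to them, adding crawler instances (or raising `max_parallel`) scales throughput without the server tracking which crawler is idle.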

### Contributing