Update readme

This commit is contained in:
Simon Fortier 2019-08-09 10:16:36 -04:00
parent f664a6f0df
commit 04befa5e0e
4 changed files with 41 additions and 4 deletions


@ -1,7 +1,29 @@
# reddit_feed

Fault-tolerant daemon that fetches comments & submissions from reddit and publishes serialised JSON to RabbitMQ for real-time ingest.

Can read up to 100 items per second (~6k/min), as per the API limits for a single client.

Can optionally push monitoring data to InfluxDB. Below is an example of Grafana being used to display it.

The daemon will attempt to stay exactly `REALTIME_DELAY` (60 by default) seconds 'behind' realtime items. Of course, it can lag several minutes behind when the number of events exceeds 100/s. In this case, it will eventually catch up to the target time during lower traffic hours.
![monitoring](monitoring.png)
Tested on GNU/Linux amd64, it uses ~60 MB of memory.
### Usage
```
python run.py <RabbitMQ host>
```
(It is recommended to run with `supervisord`,
see [supervisord_reddit_feed.ini](supervisord_reddit_feed.ini))
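The throughput figure and the pacing rule described in the README can be sketched in a few lines. This is illustrative only: the constant and function names below are not taken from run.py, and the 60 requests/minute × 100 items/request breakdown comes from the earlier version of the README.

```python
REALTIME_DELAY = 60  # seconds to stay behind realtime, per the README default

REQUESTS_PER_MINUTE = 60   # reddit API limit for a single client
ITEMS_PER_REQUEST = 100    # maximum items returned per listing request
MAX_ITEMS_PER_SECOND = REQUESTS_PER_MINUTE * ITEMS_PER_REQUEST // 60


def target_timestamp(now: float, delay: float = REALTIME_DELAY) -> float:
    """Newest item creation time the daemon aims to have processed by `now`."""
    return now - delay


def lag(last_processed: float, now: float, delay: float = REALTIME_DELAY) -> float:
    """Seconds the daemon is behind its target; 0.0 once it has caught up."""
    return max(0.0, target_timestamp(now, delay) - last_processed)
```

When event volume exceeds `MAX_ITEMS_PER_SECOND`, `lag()` grows; during quieter hours the daemon processes faster than items arrive and the lag shrinks back to zero, which is the catch-up behaviour the README describes.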

BIN  monitoring.png  Normal file (66 KiB)

run.py  Normal file → Executable file

@ -1,3 +1,5 @@
#!/bin/env python
import datetime
import json
import logging
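The README's publish path (serialised JSON pushed to RabbitMQ) might look roughly like this with the pika client. This is a hedged sketch, not the project's actual code: the `serialise` helper, `publish_items` function, and the `reddit_feed` queue name are all assumptions.

```python
import json


def serialise(item: dict) -> bytes:
    """Compact JSON encoding for the message body."""
    return json.dumps(item, separators=(",", ":")).encode("utf-8")


def publish_items(items, host: str = "localhost", queue: str = "reddit_feed"):
    """Push each item to a RabbitMQ queue as serialised JSON (sketch)."""
    import pika  # third-party: pip install pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)  # survive broker restarts
    for item in items:
        channel.basic_publish(exchange="", routing_key=queue, body=serialise(item))
    connection.close()
```

Publishing to the default exchange with the queue name as routing key is the simplest RabbitMQ pattern; a real ingest pipeline might instead use a named exchange with per-type routing keys.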

supervisord_reddit_feed.ini  New file

@ -0,0 +1,13 @@
; Move this file to /etc/supervisor.d/ to enable it
; Make sure to change the RabbitMQ host, the path and the user!
; Logs will be saved to directory/reddit_feed.log.
; To enable it:
; sudo supervisorctl
; >update
; >start reddit_feed
[program:reddit_feed]
command=/path/to/reddit_feed/run.py 172.17.0.2
directory=/path/to/reddit_feed
user=some_user
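Putting the comments above together, enabling the daemon from a shell might look like this. The `/etc/supervisor.d/` path comes from the file's own comment; adjust it for your distribution's supervisor layout.

```shell
# Copy the config into place (path per the comment in the ini file)
sudo cp supervisord_reddit_feed.ini /etc/supervisor.d/

# Reload supervisor's configuration and start the program
sudo supervisorctl update
sudo supervisorctl start reddit_feed

# Follow the daemon's log output
sudo supervisorctl tail -f reddit_feed
```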