Update readme

Simon Fortier 2019-08-09 10:16:36 -04:00
parent f664a6f0df
commit 04befa5e0e
4 changed files with 41 additions and 4 deletions

@@ -1,7 +1,29 @@
 # reddit_feed
-Fetches comments & submissions from reddit and publishes
-serialised JSON to RabbitMQ for real-time ingest.
+Fault-tolerant daemon that fetches comments &
+submissions from reddit and publishes serialised
+JSON to RabbitMQ for real-time ingest.
-Can read up to 100 items per second, as per the API limits
-(60 requests per minute, 100 items per request).
+Can read up to 100 items per second (~6k/min), as per the API limits for
+a single client.
+Can optionally push monitoring data to InfluxDB. Below is an
+example of Grafana being used to display it.
+The daemon will attempt to stay exactly `REALTIME_DELAY` (60 by default)
+seconds 'behind' realtime items. Of course, it can lag several minutes
+behind when the number of events exceeds 100/s. In this case,
+it will eventually catch up to the target time during lower traffic
+hours.
+![monitoring](monitoring.png)
+Tested on GNU/Linux amd64, it uses ~60Mb of memory.
+### Usage
+```
+python run.py <RabbitMQ host>
+```
+(It is recommended to run with `supervisord`,
+see [supervisord_reddit_feed.ini](supervisord_reddit_feed.ini))
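
Note on the README text above: the throughput ceiling and the catch-up behaviour it describes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from `run.py`: the function names `simulate_backlog` and `drain_time_seconds` are hypothetical, and the daemon is modelled as reading at most a fixed number of items per one-second tick.

```python
# Figures taken from the README text in this commit.
REQUESTS_PER_MINUTE = 60
ITEMS_PER_REQUEST = 100
MAX_ITEMS_PER_SECOND = REQUESTS_PER_MINUTE * ITEMS_PER_REQUEST // 60  # 100 items/s
REALTIME_DELAY = 60  # seconds the daemon tries to stay behind realtime (README default)


def simulate_backlog(arrivals_per_second, capacity=MAX_ITEMS_PER_SECOND):
    """Return the number of unread items left after each one-second tick.

    Hypothetical model: `arrivals_per_second` lists how many new items appear
    in each second, and at most `capacity` items can be read per second.
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_second:
        backlog += arrivals
        backlog -= min(backlog, capacity)  # read as much as the limit allows
        history.append(backlog)
    return history


def drain_time_seconds(backlog, capacity=MAX_ITEMS_PER_SECOND):
    """Seconds needed to clear `backlog` if no new items arrived meanwhile."""
    return backlog / capacity


if __name__ == "__main__":
    # Four busy seconds at 150 items/s, then six quiet seconds at 50 items/s.
    traffic = [150] * 4 + [50] * 6
    print(simulate_backlog(traffic))
    print(drain_time_seconds(200))
```

In this model the backlog grows by 50 items for every second that traffic runs at 150 items/s and shrinks by 50 for every second at 50 items/s, which mirrors the README's claim that the daemon falls behind during peaks and catches up during lower-traffic hours.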

monitoring.png Normal file

Binary file not shown (66 KiB).

run.py Normal file → Executable file
@@ -1,3 +1,5 @@
+#!/bin/env python
 import datetime
 import json
 import logging

supervisord_reddit_feed.ini Normal file

@@ -0,0 +1,13 @@
+; Move this file to /etc/supervisor.d/ to enable it
+; Make sure to change the RabbitMQ host, the path and the user!.
+; Logs will be saved to directory/reddit_feed.log.
+; To enable it:
+; sudo supervisorctl
+; >update
+; >start reddit_feed
+[program:reddit_feed]
+command=/path/to/reddit_feed/run.py 172.17.0.2
+directory=/path/to/reddit_feed
+user=some_user