mirror of https://github.com/simon987/dataarchivist.net.git synced 2025-04-09 21:46:42 +00:00

update about page

This commit is contained in:
simon 2019-11-03 21:58:59 -05:00
parent fe50947f93
commit 5270edcc89
6 changed files with 32 additions and 6 deletions

@@ -2,7 +2,6 @@
title: "About"
date: 2019-09-13T09:30:47-04:00
draft: false
author: "simon987"
---
Source code of this website can be found [here](https://github.com/simon987/dataarchivist.net).

@@ -34,7 +34,7 @@ for item, board in scanner.all_posts():
## Deduplication
To avoid publishing the same item twice, the application keeps track of what items were visited in its **state**.
-Items that have the same `last_modified`, `reply_count` or `timestamp` value as the state doesn't need to be visited again.
+Items that have the same `last_modified`, `reply_count` or `timestamp` value as the state don't need to be visited again.
This deduplication step greatly reduces the number of HTTP requests necessary to stay up to date, and more importantly,
it enables the crawler to quickly resume where it left off in the case of a fault.
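In pseudocode terms, the dedup check described above boils down to something like this sketch (field names are taken from the paragraph; the crawler's actual state storage is not shown in this excerpt):

```python
# Hypothetical sketch of the deduplication check: an item is fetched again
# only when its metadata differs from what the state last recorded.
def needs_visit(item, state):
    saved = state.get(item["id"])
    if saved is None:
        return True  # never seen before
    # identical last_modified / reply_count / timestamp -> skip the request
    return any(item[k] != saved.get(k)
               for k in ("last_modified", "reply_count", "timestamp"))
```

On a fault, reloading the state and re-running this check on each item is what lets the crawler resume where it left off.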

@@ -57,7 +57,7 @@ with `l.b.MD5_SECRET_KEY` as the salt.
{{< figure src="/scrape/dev_tools3.png" title="MD5_SECRET_KEY hidden in plain sight">}}
-The secret code was hidden only a few keystrokes into the source. Now that we have all the puzzle pieces,
+The secret code was hidden only a few keystrokes away into the source. Now that we have all the puzzle pieces,
let's hack together a simple Python script to automate the download process:
{{<highlight python "linenos=table,linenostart=18">}}

@@ -33,7 +33,7 @@ def search_artist(name, mbid):
conn.commit()
{{</highlight>}}
-I need to call `search_artist()` about 350000 times and I don't want to bother setting up multithreading, error handling and
+I need to call `search_artist()` about 350'000 times and I don't want to bother setting up multithreading, error handling and
keeping the script up to date on an arbitrary server so let's integrate it in the tracker.
## Configuring the task_tracker project
@@ -60,6 +60,7 @@ The way **task_tracker_drone** works is by passing the task object and project s
executable file called `run` in the root of the git repository. It also expects a json
object telling it if the task was processed successfully, and if there are additional actions that need to be executed:
**Expected result in stdout**:
{{<highlight json >}}
{
"result": 1,
@@ -80,7 +81,7 @@ The way **task_tracker_drone** works is by passing the task object and project s
This is what the body of the final worker script looks like:
-The program expects the task recipe and project secret as arguments, and it outputs the result object
+The program receives the task recipe and project secret as arguments, and it outputs the result object
to stdout.
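In skeleton form, that contract might look like the following minimal sketch (argument positions and field values are assumptions based on the description above, not the project's actual code):

```python
# Hypothetical sketch of a task_tracker_drone "run" script: the task object
# and project secret arrive as program arguments, the result object leaves
# on stdout.
import json
import sys

def process(task, secret):
    # ... the actual work for this task would happen here ...
    return {"result": 1}  # same shape as the expected result shown earlier

def main(argv):
    task = json.loads(argv[1])  # task object (recipe, priority, ...)
    secret = argv[2]            # project secret (exact format assumed)
    print(json.dumps(process(task, secret)))

if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv)
```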
{{<highlight python >}}
@@ -119,5 +120,30 @@ print(json.dumps({
{{</highlight>}}
## Allocating worker machines
-{{< figure src="/tt/perms.png" title="Private project require approval">}}
On the worker machines, you can execute the task runner and it will automatically start
working on the available projects. Private projects require explicit approval to start executing tasks:
{{<highlight bash >}}
git clone https://github.com/simon987/task_tracker_drone
cd task_tracker_drone
python -m pip install -r requirements.txt
python ./src/drone.py "https://exemple-api-url.com/api" "worker alias"
# Request access for 1 r={"ok":true}
# Request access for 2 r={"ok":true}
# Request access for 3 r={"ok":true}
# Starting 10 working contexts
# No tasks, waiting...
{{</highlight>}}
+{{< figure src="/tt/perms.png" title="Private projects require approval">}}
As soon as you give permission to the worker, it will automatically start executing tasks.
When a task fails, it will be put back in the task queue up to `task["max_retries"]` times.
The logs can be found on the web interface:
{{< figure src="/tt/logs.png" title="Logs page">}}
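The requeue behaviour can be pictured with a small sketch (a hypothetical illustration of the retry loop, not task_tracker's actual scheduler):

```python
from collections import deque

# Hypothetical retry loop: a failed task goes back in the queue until it
# has failed task["max_retries"] times, then it is dropped.
def run_queue(queue, execute):
    while queue:
        task = queue.popleft()
        if execute(task):
            continue  # success, nothing to requeue
        task["retries"] = task.get("retries", 0) + 1
        if task["retries"] < task["max_retries"]:
            queue.append(task)  # put it back for another attempt
```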

jenkins/Jenkinsfile vendored

@@ -8,6 +8,7 @@ remote.allowAnyHosts = true
remote.retryCount = 3
remote.retryWaitSec = 3
logLevel = 'FINER'
remote.port = 2299
pipeline {
agent none

static/tt/logs.png Normal file

Binary file not shown.

Size: 126 KiB