Commit Graph

15 Commits

Author SHA1 Message Date
1aeadaf3b7 Stringify tokens and return message number from processing 2025-01-23 11:29:07 +00:00
02071758b5 Send messages to Neptune Redis via PubSub 2023-01-12 07:20:43 +00:00
49f46c33ba Fully implement Elasticsearch indexing 2022-11-22 20:15:02 +00:00
51a9b2af79 Improve memory usage and fix 4chan crawler 2022-10-21 07:20:30 +01:00
dc1ed1fe10 Print the length of the flattened list in debug message 2022-10-21 07:20:30 +01:00
f774f4c2d2 Add some environment variables to control debug output 2022-10-21 07:20:30 +01:00
06e80a9759 Time stuff and switch to gensim for tokenisation 2022-10-01 14:46:45 +01:00
02ff44a6f5 Use only one Redis key for the queue to make chunk size more precise for thread allocation 2022-09-30 07:22:22 +01:00
09fc63d0ad Make debug output cleaner 2022-09-22 17:39:29 +01:00
4a60dec964 Remove debugging code and fix regex substitution 2022-09-21 12:48:54 +01:00
ced3a251b2 Normalise fields in processing and remove invalid characters 2022-09-21 10:01:12 +01:00
31c58dd85b Make CPU threads configurable 2022-09-20 22:29:13 +01:00
a89b5a8b6f Implement sentiment/NLP annotation and optimise processing 2022-09-16 17:09:49 +01:00
f432e9b29e Properly process Redis buffered messages and ingest into Kafka 2022-09-14 18:32:32 +01:00
c5f01c3084 Ingest into Kafka and queue messages better 2022-09-13 22:17:46 +01:00