Mark Veidemanis | 49f46c33ba | Fully implement Elasticsearch indexing | 2022-11-22 20:15:02 +00:00
Mark Veidemanis | 51a9b2af79 | Improve memory usage and fix 4chan crawler | 2022-10-21 07:20:30 +01:00
Mark Veidemanis | dc1ed1fe10 | Print the length of the flattened list in debug message | 2022-10-21 07:20:30 +01:00
Mark Veidemanis | f774f4c2d2 | Add some environment variables to control debug output | 2022-10-21 07:20:30 +01:00
Mark Veidemanis | 06e80a9759 | Time stuff and switch to gensim for tokenisation | 2022-10-01 14:46:45 +01:00
Mark Veidemanis | 02ff44a6f5 | Use only one Redis key for the queue to make chunk size more precise for thread allocation | 2022-09-30 07:22:22 +01:00
Mark Veidemanis | 09fc63d0ad | Make debug output cleaner | 2022-09-22 17:39:29 +01:00
Mark Veidemanis | 4a60dec964 | Remove debugging code and fix regex substitution | 2022-09-21 12:48:54 +01:00
Mark Veidemanis | ced3a251b2 | Normalise fields in processing and remove invalid characters | 2022-09-21 10:01:12 +01:00
Mark Veidemanis | 31c58dd85b | Make CPU threads configurable | 2022-09-20 22:29:13 +01:00
Mark Veidemanis | a89b5a8b6f | Implement sentiment/NLP annotation and optimise processing | 2022-09-16 17:09:49 +01:00
Mark Veidemanis | f432e9b29e | Properly process Redis buffered messages and ingest into Kafka | 2022-09-14 18:32:32 +01:00
Mark Veidemanis | c5f01c3084 | Ingest into Kafka and queue messages better | 2022-09-13 22:17:46 +01:00