Mark Veidemanis | 51a9b2af79 | Improve memory usage and fix 4chan crawler | 2 years ago
Mark Veidemanis | dc1ed1fe10 | Print the length of the flattened list in debug message | 2 years ago
Mark Veidemanis | f774f4c2d2 | Add some environment variables to control debug output | 2 years ago
Mark Veidemanis | 06e80a9759 | Time stuff and switch to gensim for tokenisation | 2 years ago
Mark Veidemanis | 02ff44a6f5 | Use only one Redis key for the queue to make chunk size more precise for thread allocation | 2 years ago
Mark Veidemanis | 09fc63d0ad | Make debug output cleaner | 2 years ago
Mark Veidemanis | 4a60dec964 | Remove debugging code and fix regex substitution | 2 years ago
Mark Veidemanis | ced3a251b2 | Normalise fields in processing and remove invalid characters | 2 years ago
Mark Veidemanis | 31c58dd85b | Make CPU threads configurable | 2 years ago
Mark Veidemanis | a89b5a8b6f | Implement sentiment/NLP annotation and optimise processing | 2 years ago
Mark Veidemanis | f432e9b29e | Properly process Redis buffered messages and ingest into Kafka | 2 years ago
Mark Veidemanis | c5f01c3084 | Ingest into Kafka and queue messages better | 2 years ago