Commit Graph

24 Commits

Author SHA1 Message Date
Mark Veidemanis 63081f68b7 Use only one Redis key for the queue to make chunk size more precise for thread allocation 2022-09-30 07:22:22 +01:00
Mark Veidemanis a8dbabd85e Implement uvloop 2022-09-23 07:20:30 +01:00
Mark Veidemanis 56b5c85fac Print Ingest settings on start 2022-09-23 08:32:29 +01:00
Mark Veidemanis d6d19625f3 Remove commented code for debugging 2022-09-21 10:02:05 +01:00
Mark Veidemanis cf4aa45663 Normalise fields in processing and remove invalid characters 2022-09-21 10:01:12 +01:00
Mark Veidemanis 027c43b60a Don't muddle up the topics when sending Kafka batches 2022-09-20 23:03:02 +01:00
Mark Veidemanis 2c5133a546 Make performance settings configurable 2022-09-20 22:22:13 +01:00
Mark Veidemanis 143f2a0bf0 Implement sentiment/NLP annotation and optimise processing 2022-09-16 17:09:49 +01:00
Mark Veidemanis 4ea77ac543 Properly process Redis buffered messages and ingest into Kafka 2022-09-14 18:32:32 +01:00
Mark Veidemanis fec0d379a6 Ingest into Kafka and queue messages better 2022-09-13 22:17:46 +01:00
Mark Veidemanis bf802d7fdf Reformat 2022-09-07 07:20:30 +01:00
Mark Veidemanis cdd12cd082 Implement threshold writing to Redis and manticore ingesting from Redis 2022-09-07 07:20:30 +01:00
Mark Veidemanis 62fe03a6cb Increase thread delay time 2022-09-05 07:20:30 +01:00
Mark Veidemanis 297bbbe035 Alter schemas and 4chan performance settings 2022-09-05 07:20:30 +01:00
Mark Veidemanis ed7c439b56 Remove some debugging code 2022-09-05 07:20:30 +01:00
Mark Veidemanis 9c9d49dcd2 Reformat and set the net and channel for 4chan 2022-09-05 07:20:30 +01:00
Mark Veidemanis dcd648e1d2 Make crawler more efficient and implement configurable parameters 2022-09-05 07:20:30 +01:00
Mark Veidemanis 318a8ddbd5 Split thread list into chunks to save memory 2022-09-05 07:20:30 +01:00
Mark Veidemanis 20e22ae7ca Reformat code 2022-09-04 21:40:04 +01:00
Mark Veidemanis 8feccbbf00 Reinstate Redis cache 2022-09-04 21:38:53 +01:00
Mark Veidemanis db46fea550 Run processing in thread 2022-09-04 21:29:00 +01:00
Mark Veidemanis 22cef33342 Implement aiohttp 2022-09-04 19:44:25 +01:00
Mark Veidemanis 663a26778d Begin implementing aiohttp 2022-09-04 13:47:32 +01:00
Mark Veidemanis 36de004ee5 Implement running Discord and 4chan gathering simultaneously 2022-09-02 22:30:45 +01:00