30 Commits (master)

Author SHA1 Message Date
Mark Veidemanis 210237b50a Update pre-commit versions 1 year ago
Mark Veidemanis 0ab67becff Give option for only crawling some boards 1 year ago
Mark Veidemanis 51a9b2af79 Improve memory usage and fix 4chan crawler 2 years ago
Mark Veidemanis e32b330ef4 Switch to SSDB for message queueing 2 years ago
Mark Veidemanis ab5e85c5c6 Begin switching away from Redis 2 years ago
Mark Veidemanis 5c91f1af87 Remove commented debug code 2 years ago
Mark Veidemanis 02ff44a6f5 Use only one Redis key for the queue to make chunk size more precise for thread allocation 2 years ago
Mark Veidemanis a2f88e29e6 Implement uvloop 2 years ago
Mark Veidemanis f0df3e80fd Print Ingest settings on start 2 years ago
Mark Veidemanis 5ebae02bf2 Remove commented code for debugging 2 years ago
Mark Veidemanis ced3a251b2 Normalise fields in processing and remove invalid characters 2 years ago
Mark Veidemanis 2763e52e6b Don't muddle up the topics when sending Kafka batches 2 years ago
Mark Veidemanis 40a0c2d22e Make performance settings configurable 2 years ago
Mark Veidemanis a89b5a8b6f Implement sentiment/NLP annotation and optimise processing 2 years ago
Mark Veidemanis f432e9b29e Properly process Redis buffered messages and ingest into Kafka 2 years ago
Mark Veidemanis c5f01c3084 Ingest into Kafka and queue messages better 2 years ago
Mark Veidemanis c2bdb3fd15 Reformat 2 years ago
Mark Veidemanis 5c3b338017 Implement threshold writing to Redis and manticore ingesting from Redis 2 years ago
Mark Veidemanis 7bb2264d91 Increase thread delay time 2 years ago
Mark Veidemanis 1858e06c4b Alter schemas and 4chan performance settings 2 years ago
Mark Veidemanis ddcfa614ad Remove some debugging code 2 years ago
Mark Veidemanis d1c6bd1fb5 Reformat and set the net and channel for 4chan 2 years ago
Mark Veidemanis b8d2ecc009 Make crawler more efficient and implement configurable parameters 2 years ago
Mark Veidemanis f8fc5e1a1b Split thread list into chunks to save memory 2 years ago
Mark Veidemanis 6e00f70184 Reformat code 2 years ago
Mark Veidemanis 0f717b987d Reinstate Redis cache 2 years ago
Mark Veidemanis 60c43b4eb5 Run processing in thread 2 years ago
Mark Veidemanis db23b31f30 Implement aiohttp 2 years ago
Mark Veidemanis f7860bf08b Begin implementing aiohttp 2 years ago
Mark Veidemanis 734a2b7879 Implement running Discord and 4chan gathering simultaneously 2 years ago