Mark Veidemanis | 0ab67becff | Give option for only crawling some boards | 2022-12-22 07:20:26 +00:00
Mark Veidemanis | 51a9b2af79 | Improve memory usage and fix 4chan crawler | 2022-10-21 07:20:30 +01:00
Mark Veidemanis | e32b330ef4 | Switch to SSDB for message queueing | 2022-10-21 11:53:29 +01:00
Mark Veidemanis | ab5e85c5c6 | Begin switching away from Redis | 2022-10-21 11:14:51 +01:00
Mark Veidemanis | 5c91f1af87 | Remove commented debug code | 2022-09-30 07:22:22 +01:00
Mark Veidemanis | 02ff44a6f5 | Use only one Redis key for the queue to make chunk size more precise for thread allocation | 2022-09-30 07:22:22 +01:00
Mark Veidemanis | a2f88e29e6 | Implement uvloop | 2022-09-23 07:20:30 +01:00
Mark Veidemanis | f0df3e80fd | Print Ingest settings on start | 2022-09-23 08:32:29 +01:00
Mark Veidemanis | 5ebae02bf2 | Remove commented code for debugging | 2022-09-21 10:02:05 +01:00
Mark Veidemanis | ced3a251b2 | Normalise fields in processing and remove invalid characters | 2022-09-21 10:01:12 +01:00
Mark Veidemanis | 2763e52e6b | Don't muddle up the topics when sending Kafka batches | 2022-09-20 23:03:02 +01:00
Mark Veidemanis | 40a0c2d22e | Make performance settings configurable | 2022-09-20 22:22:13 +01:00
Mark Veidemanis | a89b5a8b6f | Implement sentiment/NLP annotation and optimise processing | 2022-09-16 17:09:49 +01:00
Mark Veidemanis | f432e9b29e | Properly process Redis buffered messages and ingest into Kafka | 2022-09-14 18:32:32 +01:00
Mark Veidemanis | c5f01c3084 | Ingest into Kafka and queue messages better | 2022-09-13 22:17:46 +01:00
Mark Veidemanis | c2bdb3fd15 | Reformat | 2022-09-07 07:20:30 +01:00
Mark Veidemanis | 5c3b338017 | Implement threshold writing to Redis and manticore ingesting from Redis | 2022-09-07 07:20:30 +01:00
Mark Veidemanis | 7bb2264d91 | Increase thread delay time | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | 1858e06c4b | Alter schemas and 4chan performance settings | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | ddcfa614ad | Remove some debugging code | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | d1c6bd1fb5 | Reformat and set the net and channel for 4chan | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | b8d2ecc009 | Make crawler more efficient and implement configurable parameters | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | f8fc5e1a1b | Split thread list into chunks to save memory | 2022-09-05 07:20:30 +01:00
Mark Veidemanis | 6e00f70184 | Reformat code | 2022-09-04 21:40:04 +01:00
Mark Veidemanis | 0f717b987d | Reinstate Redis cache | 2022-09-04 21:38:53 +01:00
Mark Veidemanis | 60c43b4eb5 | Run processing in thread | 2022-09-04 21:29:00 +01:00
Mark Veidemanis | db23b31f30 | Implement aiohttp | 2022-09-04 19:44:25 +01:00
Mark Veidemanis | f7860bf08b | Begin implementing aiohttp | 2022-09-04 13:47:32 +01:00
Mark Veidemanis | 734a2b7879 | Implement running Discord and 4chan gathering simultaneously | 2022-09-02 22:30:45 +01:00