Multi-source OSINT data collection and parallel processing tool. Indexes 4chan, Discord and IRC, reorganizes the data into a common format, annotates language, sentiment and tokens in multiple threads, and outputs the results to Elasticsearch.
Latest commit: dcd648e1d2 by Mark Veidemanis, 2022-09-05 07:20:30 +01:00
Make crawler more efficient and implement configurable parameters
Name                     Last commit message                                                Last updated
docker                   Split thread list into chunks to save memory                       2022-09-05 07:20:30 +01:00
schemas                  Reformat code                                                      2022-09-04 21:40:04 +01:00
sources                  Make crawler more efficient and implement configurable parameters  2022-09-05 07:20:30 +01:00
.gitignore               Reinstate Redis cache                                              2022-09-04 21:38:53 +01:00
.pre-commit-config.yaml  Reinstate Redis cache                                              2022-09-04 21:38:53 +01:00
db.py                    Make crawler more efficient and implement configurable parameters  2022-09-05 07:20:30 +01:00
docker-compose.yml       Run processing in thread                                           2022-09-04 21:29:00 +01:00
monolith.py              Reformat code                                                      2022-09-04 21:40:04 +01:00
requirements.txt         Run processing in thread                                           2022-09-04 21:29:00 +01:00
util.py                  Run processing in thread                                           2022-09-04 21:29:00 +01:00
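
The flow outlined in the project description (collect from several sources, reorganize into a common format, annotate in multiple threads, and process the input in chunks to save memory) might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's actual code: the field names in `FIELD_MAP` and the helpers `normalise`, `annotate`, `chunked`, and `process` are all hypothetical, the annotation step only splits text into whitespace tokens as a stand-in for language and sentiment tagging, and the final write to Elasticsearch is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical mapping from each source's field names to one shared schema.
# The real schemas live in the repository and may differ.
FIELD_MAP = {
    "4chan": {"channel": "board", "user": "name", "msg": "com"},
    "discord": {"channel": "channel", "user": "author", "msg": "content"},
    "irc": {"channel": "channel", "user": "nick", "msg": "msg"},
}


def normalise(source, raw):
    """Copy source-specific fields into the common format."""
    doc = {key: raw.get(src_key) for key, src_key in FIELD_MAP[source].items()}
    doc["src"] = source
    return doc


def annotate(doc):
    """Stand-in annotation: whitespace tokens only (no language or sentiment)."""
    tokens = (doc["msg"] or "").split()
    return {**doc, "tokens": tokens}


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process(messages, chunk_size=2, workers=2):
    """Normalise and annotate (source, raw) pairs one chunk at a time."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunked(messages, chunk_size):
            docs = [normalise(source, raw) for source, raw in chunk]
            # pool.map returns results in the same order as its input.
            results.extend(pool.map(annotate, docs))
    return results
```

Calling `process` with a list of `(source, raw_message)` pairs returns the annotated documents in input order; a real deployment would pass each chunk's results to an Elasticsearch indexing step instead of accumulating them in memory.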