monolith

Multi-source OSINT data collection and parallel processing tool. Indexes 4chan, Discord and IRC, reorganizes the data into a common format, annotates language, sentiment and tokens in multiple threads, and outputs the results to Elasticsearch.
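
The pipeline described above (collect from several sources, map everything onto one common format, annotate in multiple threads) can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the functions `normalise`, `annotate` and `process`, the `threads` parameter, and all field names are assumptions made up for the example. Tokenisation is reduced to a whitespace split, language and sentiment annotation are left out, and the Elasticsearch output step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(source, raw):
    """Map a source-specific record onto one shared schema.

    The schema keys ("src", "user", "msg") and the per-source input
    field names are hypothetical, chosen only for this sketch.
    """
    if source == "4chan":
        return {"src": "4chan", "user": raw.get("name"), "msg": raw.get("com", "")}
    if source == "irc":
        return {"src": "irc", "user": raw.get("nick"), "msg": raw.get("message", "")}
    if source == "discord":
        return {"src": "discord", "user": raw.get("author"), "msg": raw.get("content", "")}
    raise ValueError(f"unknown source: {source}")


def annotate(record):
    """Attach annotations to a copy of the record.

    Placeholder only: lowercased whitespace tokens stand in for the
    language, sentiment and token annotation the description mentions.
    """
    out = dict(record)
    out["tokens"] = out["msg"].lower().split()
    return out


def process(events, threads=4):
    """Normalise (source, raw) pairs, then annotate them in a thread pool."""
    records = [normalise(source, raw) for source, raw in events]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input records
        return list(pool.map(annotate, records))
```

A call such as `process([("irc", {"nick": "bob", "message": "hi there"})], threads=2)` returns a list of annotated records in input order, which a later step could then hand to a storage backend.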

Mark Veidemanis cc6340acab Add persistent Redis data store and copy over Druid config to production		2022-10-04 20:26:58 +01:00
docker	Add persistent Redis data store and copy over Druid config to production	2022-10-04 20:26:58 +01:00
legacy	Use only one Redis key for the queue to make chunk size more precise for thread allocation	2022-09-30 07:22:22 +01:00
processing	Time stuff and switch to gensim for tokenisation	2022-10-01 14:46:45 +01:00
schemas	Implement threshold writing to Redis and manticore ingesting from Redis	2022-09-07 07:20:30 +01:00
sources	Remove commented debug code	2022-09-30 07:22:22 +01:00
.gitignore	Add config directories to gitignore	2022-09-08 09:45:18 +01:00
.pre-commit-config.yaml	Reinstate Redis cache	2022-09-04 21:38:53 +01:00
db.py	Remove commented debug code	2022-09-30 07:22:22 +01:00
docker-compose.yml	Add persistent Redis data store and copy over Druid config to production	2022-10-04 20:26:58 +01:00
env.example	Document new PROCESS_THREADS setting in example file	2022-09-20 22:43:04 +01:00
environment	Add Apache Superset and fix Druid resource usage	2022-10-04 20:17:04 +01:00
event_log.txt	Implement sentiment/NLP annotation and optimise processing	2022-09-16 17:09:49 +01:00
monolith.py	Use only one Redis key for the queue to make chunk size more precise for thread allocation	2022-09-30 07:22:22 +01:00
requirements.txt	Time stuff and switch to gensim for tokenisation	2022-10-01 14:46:45 +01:00
util.py	Implement sentiment/NLP annotation and optimise processing	2022-09-16 17:09:49 +01:00