Pathogen / monolith
Multi-source OSINT data collection and parallel processing tool. Indexes 4chan, Discord and IRC, reorganizes the data into a common format, annotates language, sentiment and tokens in multiple threads, and outputs the results to Elasticsearch.
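The flow described above (map each source onto a common format, then annotate in several threads) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the field names in FIELD_MAPS, the word lists, and the helpers normalise, annotate, process_chunk and process are all hypothetical, tokenisation uses a plain whitespace split in place of a real tokeniser, sentiment is a toy word count, and language detection and the Elasticsearch output step are left out. Only the name PROCESS_THREADS comes from the repository (it is documented in env.example); its value here is assumed.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed value; the repository only documents that a setting by this name exists.
PROCESS_THREADS = 4

# Hypothetical mappings: common field name -> source-specific field name.
FIELD_MAPS = {
    "4chan": {"user": "name", "channel": "board", "msg": "comment"},
    "discord": {"user": "author", "channel": "channel", "msg": "content"},
    "irc": {"user": "nick", "channel": "target", "msg": "text"},
}

# Toy word lists standing in for a real sentiment model.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def normalise(source, raw):
    """Rename source-specific fields to the shared field names."""
    mapping = FIELD_MAPS[source]
    record = {common: raw.get(field) for common, field in mapping.items()}
    record["src"] = source
    return record


def annotate(record):
    """Attach tokens and a crude sentiment score to a normalised record."""
    tokens = (record["msg"] or "").lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {**record, "tokens": tokens, "sentiment": score}


def process_chunk(chunk):
    """Normalise and annotate every (source, raw) pair in one chunk."""
    return [annotate(normalise(source, raw)) for source, raw in chunk]


def process(items, threads=PROCESS_THREADS):
    """Split items into roughly equal chunks and process one chunk per worker."""
    size = max(1, -(-len(items) // threads))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(process_chunk, chunks)
    # map() preserves input order, so the output order matches the input order.
    return [doc for chunk in results for doc in chunk]
```

Calling process with a list of (source, raw_record) pairs returns one annotated dictionary per input, in input order. Threads are used here only because the description mentions them; for CPU-bound annotation in Python, a process pool may be the better fit.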
507 commits, 1 branch, 1 tag, 8 MiB
Languages: Python 99.1%, Dockerfile 0.4%, Makefile 0.3%, Shell 0.2%
Latest commit: 44d6d90325 by Mark Veidemanis, "Update Druid spec" (2022-11-21 18:59:53 +00:00)
Files (name: last commit message, date):
docker: Improve memory usage and fix 4chan crawler (2022-10-21 07:20:30 +01:00)
legacy: Improve memory usage and fix 4chan crawler (2022-10-21 07:20:30 +01:00)
processing: Improve memory usage and fix 4chan crawler (2022-10-21 07:20:30 +01:00)
schemas: Implement threshold writing to Redis and manticore ingesting from Redis (2022-09-07 07:20:30 +01:00)
sources: Improve memory usage and fix 4chan crawler (2022-10-21 07:20:30 +01:00)
.gitignore: Update gitignore (2022-10-21 11:53:28 +01:00)
.pre-commit-config.yaml: Add ripsecrets to pre-commit hook (2022-11-03 07:20:30 +00:00)
Makefile: Clean up docker environment (2022-10-19 16:45:18 +01:00)
db.py: Don't shadow previous iterator variable (2022-10-21 07:20:30 +01:00)
docker-compose.yml: Clean up docker environment (2022-10-19 16:45:18 +01:00)
druid-spec.json: Update Druid spec (2022-11-21 18:59:53 +00:00)
env.example: Document new PROCESS_THREADS setting in example file (2022-09-20 22:43:04 +01:00)
environment: Clean up docker environment (2022-10-19 16:45:18 +01:00)
monolith.py: Use only one Redis key for the queue to make chunk size more precise for thread allocation (2022-09-30 07:22:22 +01:00)
requirements.txt: Time stuff and switch to gensim for tokenisation (2022-10-01 14:46:45 +01:00)
util.py: Implement sentiment/NLP annotation and optimise processing (2022-09-16 17:09:49 +01:00)