Pathogen/monolith
Multi-source OSINT data collection and parallel processing tool. Indexes 4chan, Discord and IRC, reorganizes the data into a common format, annotates language, sentiment and tokens in multiple threads, and outputs the results to Elasticsearch.
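The pipeline described above (collect from several sources, map to a common format, annotate in multiple threads, emit documents) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the field names, the `normalise`, `annotate` and `process_batch` helpers, the `threads` parameter and the toy word-list sentiment score are all assumptions made for the example. Language detection is omitted, and JSON lines stand in for the Elasticsearch output.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Toy word lists standing in for a real sentiment model (assumption).
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def normalise(source, raw):
    """Map a source-specific record onto one shared shape.

    All input and output field names here are hypothetical.
    """
    if source == "4chan":
        user, msg, ts = raw.get("name"), raw.get("com", ""), raw.get("time")
    elif source == "discord":
        user, msg, ts = raw.get("author"), raw.get("content", ""), raw.get("timestamp")
    elif source == "irc":
        user, msg, ts = raw.get("nick"), raw.get("message", ""), raw.get("ts")
    else:
        raise ValueError(f"unknown source: {source}")
    return {"src": source, "user": user, "msg": msg, "ts": ts}


def annotate(doc):
    """Add whitespace tokens and a toy sentiment score to a normalised record."""
    tokens = doc["msg"].lower().split()
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return {**doc, "tokens": tokens, "sentiment": score}


def process_batch(records, threads=4):
    """Normalise and annotate (source, raw) pairs on a thread pool, keeping input order."""
    normalised = [normalise(source, raw) for source, raw in records]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(annotate, normalised))


if __name__ == "__main__":
    batch = [
        ("4chan", {"name": "Anonymous", "com": "good thread", "time": 1}),
        ("irc", {"nick": "bob", "message": "bad idea", "ts": 2}),
    ]
    # One JSON document per line, in place of an Elasticsearch bulk request.
    for doc in process_batch(batch, threads=2):
        print(json.dumps(doc))
```

Normalising before annotating keeps the per-source logic in one place, so the annotation step only ever sees the shared shape regardless of where a message came from.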
505 commits, 1 branch, 1 tag, 8 MiB
Languages: Python 99.1%, Dockerfile 0.4%, Makefile 0.3%, Shell 0.2%
Latest commit: 51a9b2af79, "Improve memory usage and fix 4chan crawler" (Mark Veidemanis, 2022-10-21 07:20:30 +01:00)
Name                      Last commit                                                              Date
docker                    Improve memory usage and fix 4chan crawler                               2022-10-21 07:20:30 +01:00
legacy                    Improve memory usage and fix 4chan crawler                               2022-10-21 07:20:30 +01:00
processing                Improve memory usage and fix 4chan crawler                               2022-10-21 07:20:30 +01:00
schemas                   Implement threshold writing to Redis and manticore ingesting from Redis  2022-09-07 07:20:30 +01:00
sources                   Improve memory usage and fix 4chan crawler                               2022-10-21 07:20:30 +01:00
.gitignore                Update gitignore                                                         2022-10-21 11:53:28 +01:00
.pre-commit-config.yaml   Reinstate Redis cache                                                    2022-09-04 21:38:53 +01:00
Makefile                  Clean up docker environment                                              2022-10-19 16:45:18 +01:00
db.py                     Don't shadow previous iterator variable                                  2022-10-21 07:20:30 +01:00
docker-compose.yml        Clean up docker environment                                              2022-10-19 16:45:18 +01:00
druid-spec.json           Add example Druid spec                                                   2022-10-21 07:20:30 +01:00
env.example               Document new PROCESS_THREADS setting in example file                     2022-09-20 22:43:04 +01:00
environment               Clean up docker environment                                              2022-10-19 16:45:18 +01:00
monolith.py               Reformat                                                                 2022-09-30 15:23:00 +01:00
requirements.txt          Time stuff and switch to gensim for tokenisation                         2022-10-01 14:46:45 +01:00
util.py                   Implement sentiment/NLP annotation and optimise processing               2022-09-16 17:09:49 +01:00