An event-driven Software-as-a-Service analyzer for social media and news outlets that tracks brand name and keyword sentiment on Reddit and Twitter in real time to provide alerting and trend insight. The tech stack is Python + FastAPI, Redis for Pub/Sub, PostgreSQL, and InfluxDB.
This project started sometime in early 2021 when two friends, Tyson and Axel, and I decided we wanted to build something.
What we decided on was a SaaS application where a brand could track the sentiment around its keywords online in real time.
Say the Coca-Cola Company wants to monitor the words “Coke” and “Coca Cola”. They would sign up and register those keywords, and we would start scraping Reddit, Twitter, and several dozen aggregate news outlets.
The architecture diagram here shows the rough flow of the project:
Scraper REST API
This manages the Redis Set of terms that we are to scrape for, via the web or an API.
In the example above, Coca-Cola would add the terms they want to track, which would result in two calls like:
POST /search/terms/coke
POST /search/terms/coca%20cola
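Note that a term containing a space has to be percent-encoded before it goes into the path. Here is a minimal sketch of how a client might build these paths using only the standard library. The helper name `term_path` is hypothetical, and lower-casing the term is an assumption based on the two example calls above.

```python
from urllib.parse import quote


def term_path(term: str) -> str:
    """Build the request path for registering one search term."""
    # Assumption: terms are lower-cased, as in the example calls.
    # safe="" also encodes "/" so a term can never add extra path segments.
    return "/search/terms/" + quote(term.lower(), safe="")


for term in ["Coke", "Coca Cola"]:
    print("POST", term_path(term))
# POST /search/terms/coke
# POST /search/terms/coca%20cola
```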
From there, the next step is the Scraper Daemon.
Scraper Daemon
This is a simple daemon written in Python that starts and manages a subprocess, which does the actual work.
The subprocess handles pulling the keywords/terms from Redis, creating purpose-built scrapers for Reddit, Twitter, and the news outlets, then passing the terms to them and saving the results back into Redis.
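The fan-out step of that subprocess can be sketched roughly as follows. This is an illustration rather than our actual code: the scrapers are stand-in lambdas, the names `run_scrapers` and `fake_scrapers` are hypothetical, and a plain Python set and list take the place of Redis so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List

# A scraper takes one search term and returns the raw posts it found.
Scraper = Callable[[str], List[str]]


def run_scrapers(terms: Iterable[str], scrapers: Dict[str, Scraper]) -> List[dict]:
    """Pass every term to every scraper and collect the results."""
    results = []
    for term in sorted(terms):  # sorted only to make the order deterministic
        for source, scrape in scrapers.items():
            for text in scrape(term):
                results.append({"source": source, "term": term, "text": text})
    return results


# Stand-ins for the real Reddit/Twitter/news scrapers (hypothetical).
fake_scrapers = {
    "reddit": lambda term: [f"reddit post about {term}"],
    "twitter": lambda term: [f"tweet about {term}"],
}

# In the real daemon the terms would come from the Redis Set and the
# results would be written back to Redis; here both are plain Python objects.
posts = run_scrapers({"coke", "coca cola"}, fake_scrapers)
print(len(posts))  # 2 terms x 2 scrapers x 1 post each = 4
```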
Spam filtering
Once the spam filtering daemon pulls new scraped posts from Redis, it throws away any that it deems spammy.
This is built with spaCy, pandas, and NLTK (the Natural Language Toolkit), trained on datasets found on Kaggle.
We used several pre-cleaned open source datasets: one is based on SMS messages and another on tweets. We train the model and save it to disk using spaCy during our build step.
Any messages that are left over after this go back into Redis.
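As a rough sketch, the filtering stage boils down to scoring each post and dropping the ones above a cut-off. The scorer below is a toy stand-in for the trained model, and the names `filter_spam` and `toy_spam_score` as well as the 0.5 threshold are assumptions made for illustration only.

```python
from typing import Callable, Iterable, List


def filter_spam(posts: Iterable[str],
                spam_score: Callable[[str], float],
                threshold: float = 0.5) -> List[str]:
    """Keep only the posts whose spam score is below the threshold."""
    return [post for post in posts if spam_score(post) < threshold]


# Toy scorer standing in for the trained model (hypothetical).
def toy_spam_score(text: str) -> float:
    return 0.9 if "free money" in text.lower() else 0.1


kept = filter_spam(["Coke Zero tastes great", "FREE MONEY click here"], toy_spam_score)
print(kept)  # ['Coke Zero tastes great']
```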
Sentiment Analysis
The final step is the Sentiment Analysis.
The Analyzer Daemon pulls the non-spam messages from Redis, runs each one through VADER and TextBlob, and averages the two scores.
The result is a float between 0.0 (bad) and 1.0 (good).
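VADER's compound score and TextBlob's polarity both range from -1.0 to 1.0, so the average has to be rescaled to land between 0.0 and 1.0. One way this could work is sketched below. The linear rescaling and the function name `combine_scores` are assumptions for illustration, not necessarily what the Analyzer Daemon does.

```python
def combine_scores(vader_compound: float, textblob_polarity: float) -> float:
    """Average two scores in [-1, 1] and rescale the mean to [0, 1]."""
    mean = (vader_compound + textblob_polarity) / 2.0
    # Assumption: a simple linear map from [-1, 1] onto [0, 1].
    return (mean + 1.0) / 2.0


# With the real libraries the inputs would come from, for example:
#   SentimentIntensityAnalyzer().polarity_scores(text)["compound"]  (vaderSentiment)
#   TextBlob(text).sentiment.polarity                               (textblob)
print(combine_scores(0.5, 0.0))  # 0.625
```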
This number can then be added to InfluxDB, which our app would pull from and then visualize for our customers.
If a company releases a new product/update/feature, they can watch this graph update in real time and see its reception.
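For the InfluxDB write, each score can be expressed as one point in InfluxDB's line protocol (measurement, tags, fields, timestamp). The sketch below only formats such a line. The measurement name `sentiment`, the tag `term`, and the field `score` are hypothetical names chosen for illustration.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special in a line protocol tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(term: str, score: float, timestamp_ns: int) -> str:
    """Format one sentiment score as a line protocol point."""
    # Schema names (sentiment, term, score) are assumptions, not our real schema.
    return f"sentiment,term={escape_tag(term)} score={float(score)} {timestamp_ns}"


print(to_line("coca cola", 0.625, 1620000000000000000))
# sentiment,term=coca\ cola score=0.625 1620000000000000000
```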