This project uses Apache Airflow to run a data pipeline that fetches tweet counts from the Twitter API and headline totals from the News API. It also uses Spark to get daily price changes (daily opening and closing prices) from the Yahoo Finance API.
Data obtained from each source is stored in a MySQL database.
The data is then analysed to determine sentiment and its correlation with the stock's price movement.
Finally, the pipeline uses Papermill to execute a Jupyter notebook that prepares the final report, using Bokeh to visualize the collected data.
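
The correlation step described above can be sketched with Pandas. This is a minimal illustration, not the project's actual code: it assumes a per-day sentiment score has already been computed (for example with an NLTK sentiment analyzer), and the column names `date`, `sentiment`, `open` and `close`, the function names, and the sample values are all hypothetical.

```python
import pandas as pd


def daily_change_pct(prices: pd.DataFrame) -> pd.Series:
    """Percentage change between each day's opening and closing price."""
    return (prices["close"] - prices["open"]) / prices["open"] * 100.0


def sentiment_price_correlation(sentiment: pd.DataFrame, prices: pd.DataFrame) -> float:
    """Pearson correlation between daily sentiment and daily price change."""
    # Keep only the dates present in both data sets.
    merged = sentiment.merge(prices, on="date", how="inner")
    merged["change_pct"] = daily_change_pct(merged)
    return float(merged["sentiment"].corr(merged["change_pct"]))


# Made-up illustrative values, not real market or sentiment data.
sentiment = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"],
    "sentiment": [0.10, -0.20, 0.30, 0.05],
})
prices = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"],
    "open": [100.0, 101.0, 99.0, 102.0],
    "close": [101.0, 99.0, 102.0, 102.5],
})

correlation = sentiment_price_correlation(sentiment, prices)
print(f"correlation: {correlation:.3f}")
```

A value near 1 or -1 suggests that sentiment and price moved together (or in opposite directions) over the sampled days; it does not by itself show that one caused the other.
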
- Twitter API
- News API
- yfinance (Yahoo Finance API)
- Airflow
- Jupyter Notebook
- Spark
- Pandas
- Papermill
- NLTK
- Bokeh
- Create accounts with Twitter and News API to get the keys required for calling the APIs.
- Update the JSON files with the API keys and the other parameters, including the locations of the .py files to be run by the data pipeline.
- Copy the DAG (data pipeline) file sentiment_analysis_data_pipeline_ps.py into the dags folder of your Airflow environment.
- Upload the pipeline JSON file to Airflow as a parameter.
- Schedule and trigger the data pipeline from Airflow.
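
Before uploading the JSON file, it may help to check that it contains every value the pipeline needs. The sketch below is one possible way to do that with the standard library only. The key names in `REQUIRED_KEYS` and the function name `load_pipeline_config` are hypothetical, since the actual keys depend on the project's JSON files.

```python
import json
from pathlib import Path

# Hypothetical key names; replace them with the keys used in your own JSON file.
REQUIRED_KEYS = ("twitter_bearer_token", "news_api_key", "scripts_dir")


def load_pipeline_config(path):
    """Read the pipeline JSON file and fail early if a required key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    return config
```

If the JSON is stored as an Airflow Variable, the DAG could read the same dictionary with `Variable.get("<variable name>", deserialize_json=True)` from `airflow.models`; the variable name is whatever you chose when uploading.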
