Finance Data Pipeline Project
Quick Preview: A data pipeline that automates the extraction, processing, and storage of financial data using Apache Airflow and the Twelve Data API.
Project Overview
This project provides a simple yet powerful data pipeline that automates fetching financial data from an API and converting it into CSV files. The process is fully automated with Apache Airflow, which schedules and manages the tasks involved in the pipeline.

Key Features
- Automated Data Extraction: Fetches financial data from the Twelve Data API
- Data Processing: Converts the API response into a CSV file for easy storage and analysis
- Orchestration with Apache Airflow: Manages and schedules tasks in the pipeline, ensuring reliable and repeatable processes
- Real-time Financial Data: Access to real-time and historical stock information
Technical Architecture
Core Technologies
- Python - Core programming language used for data extraction and processing
- Apache Airflow - Orchestrates the pipeline, handling task dependencies and scheduling
- Twelve Data API - Source of financial data, providing real-time and historical stock information
- CSV - Format used to store the extracted data
Pipeline Architecture
```python
# Simplified DAG structure for the finance data pipeline
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'finance-team',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': True,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'finance_data_pipeline',
    default_args=default_args,
    description='Finance data processing pipeline',
    schedule_interval='@daily',
    catchup=False,
)
```
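The extraction and processing steps might be attached to this DAG as Python tasks, sketched below. The function names (`fetch_data`, `to_csv`) are illustrative placeholders, not names from the repository:

```python
# Sketch only: wiring two hypothetical tasks onto the DAG defined above.
from airflow.operators.python import PythonOperator

def fetch_data(**context):
    ...  # call the Twelve Data API and push the JSON payload via XCom

def to_csv(**context):
    ...  # pull the payload from XCom and write it out as a CSV file

extract = PythonOperator(task_id='extract', python_callable=fetch_data, dag=dag)
process = PythonOperator(task_id='process', python_callable=to_csv, dag=dag)

extract >> process  # process runs only after a successful extract
```

The `>>` operator is Airflow's standard way to declare task dependencies, which is what lets the scheduler retry or skip steps independently.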
Data Processing Workflow
- Data Extraction
  - Connect to the Twelve Data API
  - Fetch real-time and historical stock data
  - Handle API rate limits and authentication
- Data Processing
  - Parse the JSON response from the API
  - Clean and validate the financial data
  - Convert the data into structured CSV format
- Data Storage
  - Organize files with proper naming conventions
  - Implement data retention policies
- Monitoring & Logging
  - Track pipeline execution status
  - Log errors and performance metrics
  - Send notifications on pipeline failures
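The JSON-to-CSV conversion step can be sketched as follows. This is a minimal sketch, assuming the API response carries rows under a `"values"` key as in Twelve Data's `time_series` endpoint; the field list and sample payload are illustrative:

```python
import csv
import io

def values_to_csv(payload: dict) -> str:
    """Convert a Twelve Data-style time_series payload to CSV text.

    Assumes OHLCV rows live under payload["values"]; fields not in
    the list are silently dropped (extrasaction="ignore").
    """
    fields = ["datetime", "open", "high", "low", "close", "volume"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in payload.get("values", []):
        writer.writerow(row)
    return buf.getvalue()

# Illustrative payload shaped like a time_series response
sample = {
    "meta": {"symbol": "AAPL", "interval": "1day"},
    "values": [
        {"datetime": "2024-01-02", "open": "187.15", "high": "188.44",
         "low": "183.89", "close": "185.64", "volume": "82488700"},
    ],
}

print(values_to_csv(sample).splitlines()[0])
# → datetime,open,high,low,close,volume
```

In the real pipeline this function would sit inside the Airflow task, fed by the payload fetched from the API.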
Key Components
Data Sources
- Twelve Data API: Comprehensive financial data provider
- Real-time Market Data: Live stock prices and market information
- Historical Data: Historical stock performance and trends
Processing Features
- Error Handling: Robust error handling and retry mechanisms
- Data Validation: Ensure data quality and completeness
- Scalable Architecture: Handle varying data volumes efficiently
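One way to sketch the data-validation step: a row-level check that required fields are present, prices parse as numbers, and the high/low actually bound the open and close. The function name and rules are illustrative, not taken from the repository:

```python
def validate_row(row: dict) -> bool:
    """Basic quality checks on one OHLC row (illustrative rules)."""
    required = ("datetime", "open", "high", "low", "close")
    if any(not row.get(k) for k in required):
        return False  # missing or empty field
    try:
        o, hi, lo, c = (float(row[k]) for k in ("open", "high", "low", "close"))
    except ValueError:
        return False  # non-numeric price
    # Prices must be positive and high/low must bound the day's range.
    return 0 < lo <= min(o, c) and max(o, c) <= hi

good = {"datetime": "2024-01-02", "open": "10", "high": "12",
        "low": "9", "close": "11"}
print(validate_row(good))
# → True
```

Rows that fail such a check can be dropped or logged before the CSV is written, which is where the retry and notification machinery in Airflow earns its keep.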
Storage & Output
- CSV Format: Industry-standard format for data analysis
- Organized Structure: Logical file organization for easy access
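A naming convention like the following keeps output files easy to find and partition by symbol and year. The exact layout here is a hypothetical example, not the scheme used by the repository:

```python
from datetime import date
from pathlib import Path

def output_path(base_dir: str, symbol: str, run_date: date) -> Path:
    """Illustrative layout: <base>/<symbol>/<year>/<symbol>_<date>.csv."""
    return (Path(base_dir) / symbol / f"{run_date:%Y}"
            / f"{symbol}_{run_date:%Y-%m-%d}.csv")

p = output_path("data", "AAPL", date(2024, 1, 2))
print(p.as_posix())
# → data/AAPL/2024/AAPL_2024-01-02.csv
```

Date-stamped, per-symbol paths also make retention policies simple to implement: old partitions can be expired by directory rather than by inspecting file contents.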
Installation & Usage
Prerequisites
- Python 3.x installed on your system
- Apache Airflow installed and configured
- API Key from Twelve Data API
Setup Steps
- Clone the Repository:
  git clone https://github.com/ahmedmakroum/pipelinestocks.git
  cd finance-data-pipeline
- Install Python Dependencies:
  pip install -r requirements.txt
- Set Up Airflow:
  - Place the dag.py script in your Airflow DAGs directory
  - Start the Airflow scheduler and web server:
    airflow scheduler
    airflow webserver
Running the Pipeline
- Triggering the DAG:
  - Access the Airflow web UI at http://localhost:8080
  - Locate the finance_data_pipeline DAG
  - Trigger the DAG manually or set it to run on a schedule
- Monitoring:
  - Monitor the DAG's execution from the Airflow UI
  - Check the GCS bucket to verify that the CSV files are being uploaded correctly
Project Links
- GitHub Repository - Complete source code and documentation
- Twelve Data API - Financial data API documentation
Key Technologies Demonstrated
- Apache Airflow - Workflow orchestration and task scheduling
- Python Data Processing - API integration and data transformation
- Financial Data APIs - Real-time market data integration
- CSV Processing - Data format conversion and storage
Future Enhancements
- Real-time Streaming: Implement real-time data streaming for live market updates
- Data Analytics: Add analytical capabilities for financial insights
- Multi-source Integration: Support for additional financial data providers
- Machine Learning: Integrate ML models for predictive analytics
- Dashboard Integration: Create interactive dashboards for data visualization
Contributing
Contributions are welcome! Please fork the repository and submit a pull request for review.
Interested in financial data engineering or building automated trading systems? Let’s connect and discuss how modern data pipelines can enhance financial analysis and decision-making!