

📈 Finance Data Pipeline Project

Quick Preview: A data pipeline designed to automate the extraction, processing, and storage of financial data using Apache Airflow, Twelve Data API, and Google Cloud Storage.

🎯 Project Overview

This project provides a simple yet powerful data pipeline that automates the process of fetching financial data from an API, converting it into a CSV file, and uploading it to Google Cloud Storage. The process is fully automated using Apache Airflow, which schedules and manages the various tasks involved in the pipeline.


⚡ Key Features

  • Automated daily extraction of stock data from the Twelve Data API
  • Cleaning, validation, and conversion of the raw JSON response into CSV
  • Upload of the processed files to Google Cloud Storage
  • Scheduling, retries, and failure notifications managed by Apache Airflow

🛠️ Technical Architecture

Core Technologies

  • Apache Airflow for orchestration and scheduling
  • Python for data extraction and transformation
  • Twelve Data API as the market data source
  • Google Cloud Storage as the destination for processed files

Pipeline Architecture

# Simplified DAG structure for the finance data pipeline
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

# Defaults applied to every task in the DAG
default_args = {
    'owner': 'finance-team',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': True,        # notify the team when a task fails
    'retries': 2,                    # retry failed tasks twice
    'retry_delay': timedelta(minutes=5)
}

# One run per day, without backfilling missed intervals
dag = DAG(
    'finance_data_pipeline',
    default_args=default_args,
    description='Finance data processing pipeline',
    schedule_interval='@daily',
    catchup=False
)
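
A minimal sketch of how the extraction, processing, and upload steps might be attached to this DAG with PythonOperator; the task IDs and callables (extract_data, process_data, upload_to_gcs) are illustrative placeholders rather than the exact names in dag.py:

# Hypothetical task wiring: extract -> process -> upload
extract_task = PythonOperator(
    task_id='extract_data',
    python_callable=extract_data,   # fetch quotes from the Twelve Data API
    dag=dag
)

process_task = PythonOperator(
    task_id='process_data',
    python_callable=process_data,   # clean the JSON response and write a CSV
    dag=dag
)

upload_task = PythonOperator(
    task_id='upload_to_gcs',
    python_callable=upload_to_gcs,  # push the CSV to Google Cloud Storage
    dag=dag
)

# Run the three tasks in sequence
extract_task >> process_task >> upload_task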

Data Processing Workflow

  1. Data Extraction

    • Connect to Twelve Data API
    • Fetch real-time and historical stock data
    • Handle API rate limits and authentication
  2. Data Processing

    • Parse JSON response from API
    • Clean and validate financial data
    • Convert data into structured CSV format
  3. Data Storage

    • Upload processed CSV files to Google Cloud Storage
    • Organize files with proper naming conventions
    • Implement data retention policies
  4. Monitoring & Logging

    • Track pipeline execution status
    • Log errors and performance metrics
    • Send notifications on pipeline failures
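
A rough sketch of what these stages could look like in Python, assuming the Twelve Data /time_series endpoint, pandas for the CSV step, and the google-cloud-storage client; the function names, API key, and bucket name are placeholders, not the project's actual code:

# Illustrative extract -> process -> store helpers (not the actual dag.py)
import requests
import pandas as pd
from google.cloud import storage

API_KEY = 'YOUR_TWELVE_DATA_API_KEY'   # placeholder credential
BUCKET_NAME = 'your-gcs-bucket'        # placeholder bucket name

def fetch_stock_data(symbol: str) -> dict:
    """Extraction: pull daily time-series data for one symbol from Twelve Data."""
    response = requests.get(
        'https://api.twelvedata.com/time_series',
        params={'symbol': symbol, 'interval': '1day', 'apikey': API_KEY},
        timeout=30,
    )
    response.raise_for_status()        # surface HTTP errors such as rate limits
    return response.json()

def to_csv(payload: dict, path: str) -> None:
    """Processing: flatten the JSON 'values' list into a CSV file."""
    df = pd.DataFrame(payload['values'])
    df.to_csv(path, index=False)

def upload_to_gcs(local_path: str, blob_name: str) -> None:
    """Storage: upload the CSV to the configured GCS bucket."""
    client = storage.Client()
    client.bucket(BUCKET_NAME).blob(blob_name).upload_from_filename(local_path)

Monitoring and notifications are handled by Airflow itself through task logs and the email_on_failure and retry settings shown in the DAG definition above.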

🔧 Key Components

Data Sources

  • Twelve Data API - real-time and historical stock data, with API authentication and rate limits handled in the extraction step

Processing Features

  • JSON parsing, data cleaning and validation, and conversion to structured CSV in Python

Storage & Output

  • CSV files uploaded to a Google Cloud Storage bucket with consistent naming conventions and retention policies


🚀 Installation & Usage

Prerequisites

  • Python 3.x with pip
  • Apache Airflow
  • A Google Cloud project with a GCS bucket and credentials configured
  • A Twelve Data API key

Setup Steps

  1. Clone the Repository:

    git clone https://github.com/ahmedmakroum/pipelinestocks.git
    cd pipelinestocks
  2. Install Python Dependencies:

    pip install -r requirements.txt
  3. Set Up Airflow:

    • Place the dag.py script in your Airflow DAGs directory
    • Start the Airflow scheduler and web server (in separate terminal sessions):
    airflow scheduler
    airflow webserver
  4. Configure Google Cloud Storage:

    • Ensure your GCP credentials are set up and your project is selected
    • Update the bucket name in dag.py to match your GCS bucket
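
As a suggestion rather than a description of the actual dag.py, the bucket name could be read from an Airflow Variable instead of being hardcoded, so it can be changed without editing the DAG file:

# Optional: read the GCS bucket name from an Airflow Variable (hypothetical key)
from airflow.models import Variable

BUCKET_NAME = Variable.get('finance_gcs_bucket', default_var='your-gcs-bucket')

The Variable can then be set under Admin → Variables in the Airflow web UI.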

Running the Pipeline

  1. Triggering the DAG:

    • Access the Airflow web UI at http://localhost:8080
    • Locate the finance_data_pipeline DAG
    • Trigger the DAG manually or set it to run on a schedule
  2. Monitoring:

    • Monitor the DAG’s execution from the Airflow UI
    • Check the GCS bucket to verify that the CSV files are being uploaded correctly
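
The DAG can also be triggered from the command line with airflow dags trigger finance_data_pipeline (Airflow 2.x CLI), which is convenient for quick tests outside the web UI.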

🎯 Key Technologies Demonstrated

Apache Airflow - Workflow orchestration and task scheduling
Python Data Processing - API integration and data transformation
Google Cloud Platform - Cloud storage and data management
Financial Data APIs - Real-time market data integration
CSV Processing - Data format conversion and storage

🔮 Future Enhancements


🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request for review.


Interested in financial data engineering or building automated trading systems? Let’s connect and discuss how modern data pipelines can enhance financial analysis and decision-making!