
Project History & Evolution

The OuSpark project evolved through multiple phases, each addressing growing requirements and technical challenges. This journey showcases how a simple idea transformed into a comprehensive result analytics platform.

Phase 1: Basic Scraping and Analysis

Tools Used:

  • Python (requests, BeautifulSoup, pandas)
  • Matplotlib for basic visualization
  • Jupyter Notebook for development
  • CSV files for data storage

Achievements:

  • ✅ Basic result extraction
  • ✅ Pass/fail ratio calculation
  • ✅ Subject-wise performance analysis
  • ✅ Simple student rankings
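
A minimal sketch of the Phase 1 flow, assuming a hypothetical results URL and a plain HTML `<table>` of subject marks (the real page structure differed):

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical results endpoint; the actual URL and markup are placeholders.
URL = "https://example.edu/results?roll={roll}"

def fetch_result(roll: str) -> dict:
    """Fetch one student's result page and extract subject-wise marks."""
    html = requests.get(URL.format(roll=roll), timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("table tr")[1:]  # skip the header row
    marks = {}
    for row in rows:
        subject, score = (td.get_text(strip=True) for td in row.find_all("td")[:2])
        marks[subject] = int(score)
    return {"roll": roll, **marks}

# Collect into a DataFrame, persist to CSV, and compute the Phase 1 metrics.
df = pd.DataFrame([fetch_result(r) for r in ["101", "102", "103"]])
df.to_csv("results.csv", index=False)

subjects = df.columns.drop("roll")
df["passed"] = (df[subjects] >= 40).all(axis=1)  # assumed pass mark: 40

print("Pass ratio:", df["passed"].mean())        # pass/fail ratio
print(df[subjects].mean())                       # subject-wise averages
print(df.assign(total=df[subjects].sum(axis=1))
        .sort_values("total", ascending=False))  # simple ranking
```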

Phase 2: Architectural Improvements

```mermaid
graph LR
A[Raw HTML] --> B[Results Extraction]
B --> C[Data Cleaning]
C --> D[Data Visualization]
D --> E[Export to PDF]
```

New Technologies:

  • Streamlit: Interactive web application
  • ydata-profiling: Advanced data analysis
  • Nivo Charts: Interactive visualizations
  • PDF Export: Shareable reports
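
A rough sketch of how these Phase 2 pieces fit together, using the real `streamlit` and `ydata_profiling` APIs but a hypothetical `results.csv` from the Phase 1 scraper (an HTML profile report stands in for the PDF export path):

```python
import pandas as pd
import streamlit as st
from ydata_profiling import ProfileReport

st.title("OuSpark Result Analytics")

# Hypothetical CSV produced by the Phase 1 scraper.
df = pd.read_csv("results.csv")
st.dataframe(df)

# ydata-profiling generates the "advanced data analysis" report.
profile = ProfileReport(df, title="Results Profile", minimal=True)
profile.to_file("profile.html")

# Shareable export: offer the report for download straight from the UI.
with open("profile.html", "rb") as f:
    st.download_button("Download report", f, file_name="profile.html")
```

Run with `streamlit run app.py`; Streamlit re-executes the script on each interaction, which is what makes the interface interactive.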

Key Learnings:

  • Importance of separation of concerns
  • Need for interactive interfaces
  • Value of exportable analytics

[Screenshot: Streamlit webapp UI]


Phase 3: Scalable Tech Stack

Requirements:

  • Cross-platform deployment (Mobile + Desktop)
  • Stable production performance
  • Good integration with the chosen backend

Final Stack Decision:

  • Frontend: Dart (Flutter)
  • Backend: Supabase (PostgreSQL + Auth + Storage)
  • Pipeline: Python (FastAPI), YAML, Airflow, Kestra, Docker
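
A minimal sketch of the pipeline-to-backend handoff under this stack, using the real `fastapi` and supabase-py client APIs; the `results` table and the environment variable names are assumptions:

```python
import os

from fastapi import FastAPI
from pydantic import BaseModel
from supabase import create_client

app = FastAPI()

# Credentials via assumed env vars; Supabase also handles auth and storage.
sb = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

class Result(BaseModel):
    roll: str
    subject: str
    marks: int

@app.post("/results")
def ingest(results: list[Result]) -> dict:
    """Pipeline endpoint: upsert a batch of scraped results into Postgres."""
    rows = [r.model_dump() for r in results]
    sb.table("results").upsert(rows).execute()
    return {"upserted": len(rows)}
```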

Phase 4: UI/UX Design

Before:

  • Basic Streamlit interface
  • Limited customization
  • Poor mobile experience

After:

  • Professional Figma designs
  • Custom animations (Lottie + Rive)
  • Mobile-first approach
  • Consistent branding

Phase 5: Performance Optimization ⚡

Problem: CPU utilization reached ~70% daily due to heavy read operations

Investigation Results:

  • Search Results was the app's most-used feature (per Firebase Analytics)
  • Heavy database I/O was causing bottlenecks

Solution Architecture:

```mermaid
graph LR
A[User Request] --> B{Redis Cache}
B -->|Hit| C[Fast Response]
B -->|Miss| D[Database Query]
D --> E[Update Cache]
E --> C
```

Implementation:

  • Redis Integration: Upstash serverless Redis
  • Supabase Edge Functions: Custom caching logic
  • Prewarming Strategy: Populate cache proactively
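
A Python sketch of the cache-aside flow in the diagram above. The production logic runs inside a Supabase Edge Function, so the host, TTL, key scheme, and `query_database` helper here are all hypothetical:

```python
import json
import redis

# Upstash exposes a standard Redis endpoint; placeholders for real credentials.
r = redis.Redis(host="<upstash-host>", port=6379, password="<token>", ssl=True)

CACHE_TTL = 6 * 60 * 60  # assumed TTL, tuned to the prewarm cycle

def search_results(roll: str) -> dict:
    """Cache-aside: serve from Redis on a hit, fall back to the database on a miss."""
    key = f"result:{roll}"
    cached = r.get(key)
    if cached is not None:                     # cache hit -> fast response
        return json.loads(cached)
    data = query_database(roll)                # cache miss -> database query (hypothetical helper)
    r.setex(key, CACHE_TTL, json.dumps(data))  # update cache for the next reader
    return data

def prewarm(rolls: list[str]) -> None:
    """Prewarming strategy: populate the cache proactively for hot keys."""
    for roll in rolls:
        r.setex(f"result:{roll}", CACHE_TTL, json.dumps(query_database(roll)))
```

Because Search Results dominates traffic, even a modest hit rate on these keys removes most of the read load from the database.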

Results:

  • 📊 Disk I/O consumed per day: 90% → 1%
  • Response Time: Significantly improved
  • 🎯 User Experience: Seamless and fast

[Chart: Disk I/O consumed per day]

Phase 6: Pipeline Automation

Challenge: Manual pipeline execution was error-prone and time-consuming

Solutions Evaluated:

Apache Airflow

  • DAG-based workflow definition
  • Sequential task execution
  • Pros: Mature, well-documented
  • Cons: Complex setup, heavy resource usage
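
For comparison, a minimal sketch of the sequential pipeline as an Airflow DAG; the DAG id, schedule, and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; the real steps live in the pipeline modules.
def extract(): ...
def clean(): ...
def load(): ...

with DAG(
    dag_id="ouspark_results_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_clean = PythonOperator(task_id="clean", python_callable=clean)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_clean >> t_load  # sequential task execution
```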

Kestra

  • YAML-based workflow definition
  • Pipeline executed through sequential API calls (sketched below)
  • Pros: YAML configuration, better UI
  • Cons: Newer technology, smaller community
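
And the "sequential API call" pattern with Kestra, sketched with `requests`. The route follows Kestra's executions API, but the namespace and flow id are hypothetical and the exact path may vary by Kestra version:

```python
import requests

KESTRA = "http://localhost:8080"  # assumed local Kestra instance

# Trigger an execution of an already-deployed flow.
resp = requests.post(
    f"{KESTRA}/api/v1/executions/ouspark/results_pipeline",  # namespace/flow id are hypothetical
    timeout=30,
)
resp.raise_for_status()
print("Execution id:", resp.json()["id"])
```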

Final Choice: Both tools were implemented, with Kestra preferred for its superior UI and YAML-based configuration.

Performance Improvements Over Time

| Metric | Phase 1 | Phase 2 | Phase 6 | Improvement |
| --- | --- | --- | --- | --- |
| Processing Time | 1+ hour | 1+ hour | 9 minutes | 85% faster |
| Result Extraction | Manual | 1 hour | 5 minutes | 92% faster |
| Concurrent Users | 1 | ~10 | 200+ | 200× |
| Platform Support | 1 (Jupyter) | 1 (Web) | 3 (Mobile/Desktop/Web) | 3× |

This evolution demonstrates how thoughtful technology choices and iterative development can transform a simple concept into a robust, scalable platform serving thousands of users.