CS2 Professional Match Analytics Pipeline
Posted on 2025-09-2025
Tags: Python, PostgreSQL, Web Scraping, Streamlit, Data Engineering, ETL
Problem:
Counter-Strike 2 players and esports bettors lack structured, up-to-date data on player performance and match history. HLTV provides rich insights, but its data is fragmented and hard to access or analyze at scale. This project aimed to centralize CS2 performance data and make it accessible for analysis, betting strategies, and personal improvement.
Approach:
- Engineered a modular scraping and parsing pipeline using SeleniumBase to extract match, player, and map-level data from HLTV
- Structured the data into a PostgreSQL database with:
  - Matches, players, maps, and historical team data
  - Snapshot tracking for team rosters, player transfers, and aliases
  - Composite keys, foreign key constraints, and indexing for performance
- Built a queue-based scraping architecture to support scalable data ingestion
- Developed a Streamlit frontend for displaying match stats, trends, and insights
- Designed the system to accommodate future demo file parsing and advanced analytics (e.g. positioning, utility usage)
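
The queue-based ingestion idea can be illustrated with a minimal sketch. This is not the project's actual code: `fetch_match`, `run_queue`, the worker count, and the record shape are all hypothetical, and the scraper call is replaced by a stub so the example runs without a browser or network access.

```python
import queue
import threading


def fetch_match(match_id: int) -> dict:
    # Hypothetical stand-in for a real scraper call (e.g. a SeleniumBase page
    # visit followed by HTML parsing). It only returns a placeholder record.
    return {"match_id": match_id, "status": "parsed"}


def run_queue(match_ids, num_workers=3):
    """Push match IDs onto a queue and let worker threads drain it."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()  # protects the shared results list

    def worker():
        while True:
            match_id = tasks.get()
            try:
                if match_id is None:  # sentinel: no more work for this worker
                    return
                record = fetch_match(match_id)
                with lock:
                    results.append(record)
            finally:
                tasks.task_done()  # runs for real items and sentinels alike

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for match_id in match_ids:
        tasks.put(match_id)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return sorted(results, key=lambda r: r["match_id"])


parsed = run_queue([101, 102, 103, 104])
print(len(parsed), "matches parsed")
```

Decoupling "what to scrape" (the queue) from "how to scrape" (the worker) is what lets ingestion scale: adding workers or swapping the in-memory queue for a persistent one does not change the parsing code.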
Outcome:
- Centralized historical CS2 match and player data into a queryable, auditable PostgreSQL schema
- Delivered an early-stage Streamlit dashboard for:
  - Team and player performance trends
  - Map win/loss rates and event metadata
  - Scouting insights for players, bettors, or analysts
- Built a scalable architecture to support future ML modeling (e.g. win prediction, rating inflation detection)
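
As a rough illustration of how a composite-keyed schema supports the map win/loss view, here is a small sketch. It is an assumption-heavy stand-in: it uses Python's built-in `sqlite3` instead of PostgreSQL so it runs anywhere, and the table names (`teams`, `match_maps`), columns, and sample rows are invented for the example rather than taken from the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One row per (match, map, team); the composite key prevents duplicates
-- when the same match page is scraped more than once.
CREATE TABLE match_maps (
    match_id INTEGER NOT NULL,
    map_name TEXT    NOT NULL,
    team_id  INTEGER NOT NULL REFERENCES teams(team_id),
    won      INTEGER NOT NULL CHECK (won IN (0, 1)),
    PRIMARY KEY (match_id, map_name, team_id)
);
CREATE INDEX idx_match_maps_team_map ON match_maps (team_id, map_name);
""")

# Illustrative rows only, not real match results.
conn.executemany("INSERT INTO teams VALUES (?, ?)",
                 [(1, "Team A"), (2, "Team B")])
conn.executemany("INSERT INTO match_maps VALUES (?, ?, ?, ?)", [
    (1, "Mirage", 1, 1), (1, "Mirage", 2, 0),
    (1, "Inferno", 1, 0), (1, "Inferno", 2, 1),
    (2, "Mirage", 1, 1), (2, "Mirage", 2, 0),
])


def map_win_rates(db):
    """Return (team, map, wins, losses, win_rate) rows, sorted by team and map."""
    return db.execute("""
        SELECT t.name, m.map_name,
               SUM(m.won)                           AS wins,
               COUNT(*) - SUM(m.won)                AS losses,
               ROUND(1.0 * SUM(m.won) / COUNT(*), 2) AS win_rate
        FROM match_maps AS m
        JOIN teams AS t ON t.team_id = m.team_id
        GROUP BY t.name, m.map_name
        ORDER BY t.name, m.map_name
    """).fetchall()


rates = map_win_rates(conn)
for row in rates:
    print(row)
```

A query of this shape could feed a dashboard table directly; in PostgreSQL the same DDL would apply with minor type changes (for example `BOOLEAN` for `won`).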
Repo: GitHub - CS2 Analytics