| Open Source Lakehouse Platform |
| Written by Administrator |
| Sunday, 14 June 2026 06:59 |
AI-assisted deployment of a fully open-source lakehouse platform on Scaleway. Pick only the components you need: the AI provisions infrastructure, applies configuration, sets up integrations between services, and validates the final deployment. Components can be added or removed later without rebuilding the stack.

Scaleway Object Storage (S3): S3-compatible object storage for the raw (bronze), silver, and gold data layers. Single source of truth for the lakehouse warehouse, accessed by Spark, Trino, and ClickHouse through the s3a:// protocol.

Delta Lake: Open-source storage layer on top of S3 providing ACID transactions, schema enforcement, time travel, and unified batch/streaming reads. The foundation of the bronze, silver, gold medallion architecture.

Apache Kafka: Durable event log and streaming backbone. Captures change-data-capture (CDC) streams from source systems, decouples producers from consumers, and feeds real-time pipelines into Spark, ClickHouse, and downstream APIs.

Apache Spark: Distributed processing engine for ETL/ELT pipelines, Delta Lake writes, and large-scale data transformations. Deployed on Kubernetes via the Spark Operator.

OpenMetadata: Centralized metadata catalog and data lineage platform. Tracks schemas, ownership, quality metrics, and dependencies across all components.

ClickHouse: Real-time analytical columnar database for sub-second queries on billions of rows. Ideal for dashboards and ad-hoc OLAP workloads.

Apache Airflow: Workflow orchestration and scheduling. DAG-based pipeline definitions for batch and incremental data processing across the platform.

Kubernetes: Container orchestration platform underneath everything. All services run as deployments on a single Scaleway-managed Kubernetes cluster.

Apache Superset: Self-service business intelligence and dashboarding. Connects directly to Trino and ClickHouse for interactive exploration of the lakehouse.

Trino: Distributed SQL query engine that federates across the Delta Lake (S3), PostgreSQL, and other catalogs. Query everywhere with one SQL dialect.

PostgreSQL: Relational store for Hive Metastore, Airflow metadata, OpenMetadata catalog, and other transactional workloads.
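To make the storage layout concrete, here is a minimal Python sketch of how the medallion layers and the s3a:// access described above might be expressed as Spark settings. The bucket name, the fr-par region, the one-prefix-per-layer layout, and the path-style access setting are assumptions for illustration; the platform's actual generated configuration may differ. The configuration keys themselves are the standard Hadoop S3A and Delta Lake ones.

```python
# Sketch only: bucket name, region, and layer layout are assumptions,
# not values taken from the platform.
SCW_REGION = "fr-par"                   # assumed Scaleway region
BUCKET = "lakehouse-warehouse"          # hypothetical bucket name
LAYERS = ("bronze", "silver", "gold")   # medallion layers


def layer_path(layer: str, table: str) -> str:
    """Return the s3a:// URI of a table inside one medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"s3a://{BUCKET}/{layer}/{table}"


def spark_s3a_conf(access_key: str, secret_key: str) -> dict[str, str]:
    """Build Spark settings for S3A access and Delta Lake support."""
    return {
        # Point the S3A connector at Scaleway Object Storage.
        "spark.hadoop.fs.s3a.endpoint": f"https://s3.{SCW_REGION}.scw.cloud",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style addressing is an assumption for this sketch.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Enable Delta Lake SQL extensions and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created, and `layer_path("silver", "orders")` would give the location a Delta table is read from or written to. In practice the credentials would come from a Kubernetes secret rather than function arguments.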
| Last Updated on Wednesday, 17 June 2026 07:46 |



