Multi-Stage Query Engine
Overview and Purpose
The Multi-Stage Query Engine (MSQE) is StarTree Cloud’s advanced query processing system that enables complex analytical operations such as joins, window functions, and set operations across multiple tables. Unlike the traditional single-stage engine that uses a simple scatter-gather approach, MSQE breaks queries into multiple processing stages that can be efficiently executed across distributed servers.
MSQE is particularly valuable for:
- Joining data across multiple tables
- Complex analytical queries requiring multi-step processing
- Queries involving window functions and advanced aggregations
- Set operations (UNION, INTERSECT, MINUS)
- Applications requiring insights from related data in separate tables
Note: MSQE is the primary query mode in StarTree Cloud and is recommended for most analytical workloads, especially those involving joins or complex processing.
Enabling MSQE
To query using distributed joins, window functions, and other multi-stage operators, you need to enable the multi-stage query engine. There are several ways to do this:
Using the Query Console
In the StarTree Cloud Query Console, simply select the Use Multi-Stage Engine checkbox before running your query.
Using REST APIs
When using the Controller Admin API:
When using the Broker Query API:
Using the SET Command
You can enable MSQE directly in your SQL query by adding a SET command at the beginning:
Key Capabilities
Table Joins
MSQE supports joining tables in a distributed manner using various strategies:
Join strategies include:
- Lookup joins (for small dimension tables)
- Colocated joins (when data is partitioned on join keys)
- Query-time partition joins
- Broadcast joins
Window Functions
MSQE enables analytical functions over partitions of result sets:
Set Operations
MSQE supports operations across multiple query results:
How MSQE Works
MSQE processes queries through a multi-stage execution pipeline:
- Query Planning: The query is parsed and transformed into a logical plan
- Stage Assignment: The plan is divided into multiple execution stages
- Distributed Execution: Stages are distributed across servers for parallel processing
- Data Shuffling: When needed, data is moved between stages using optimized shuffling
- Result Collection: Final results are gathered at the root stage and returned
Each stage serves a specific purpose:
- Leaf Stages: Read data directly from tables
- Intermediate Stages: Process operations like joins and aggregations
- Root Stage: Collects and finalizes results for the client
Using MSQE
Checking Execution Plans
To understand how a query will be executed, use the EXPLAIN PLAN command:
The output shows the logical plan with stages indicated by PinotLogicalExchange
operators.
Optimization Tips
- Join Key Selection: Choose join keys that match table partitioning when possible
- Filtering Early: Apply filters before joins to reduce data movement
- Dimension Table Design: Keep dimension tables small for efficient lookup joins
- Column Selection: Select only necessary columns to minimize data transfer
Performance Considerations
- Join Complexity: Performance scales with join complexity and table sizes
- Data Distribution: Evenly distributed data performs better than skewed distributions
- Partitioning Strategy: Align table partitioning with common join keys
- Memory Requirements: Complex joins may require additional memory