Here’s a summary of the Arrow dev mailing list activity:
Ongoing Discussions
- ADBC Configuration: Discussions are ongoing regarding standardized configuration file locations and formats for ADBC drivers, with environment variables favored for ease of use and Kubernetes integration (May 27, June 2).
- C++ CMake Build System: Simplifying the C++ CMake build configuration, especially concerning static and shared library linkage, is under consideration (June 3). The effort includes evaluating the simultaneous building of shared and static libraries.
- Monorepo Maintenance: Strategies for managing stale issues and pull requests in the monorepo are being discussed. Proposals include closing untouched PRs and warning/closing old issues, potentially adding “stale” and “needs champion” labels (June 17).
- AWS Credit Usage: The optimal use of donated AWS credits is being explored, focusing on improvements to CI, GPU testing, benchmarking, and large memory tests (June 12).
Emerging Themes
- Legacy Feature Removal: There’s a trend towards deprecating and removing older features, such as the Feather V1 format in C++, and encouraging migration to newer alternatives (June 3). Work has also begun on moving Skyhook out of the main repository and into a separate one (June 16).
- New Interval Types for Parquet/Iceberg: The introduction of new interval types in Parquet and Iceberg is under discussion, along with the challenges of aligning them with Arrow’s type system and possible approaches to doing so (June 21). A canonical extension type could reduce reinvention of timestamp and duration types.
- Maintenance and Tooling: A maintenance dashboard is being developed to improve project oversight by providing summaries of mailing list discussions (June 17). The integration of tools such as the Kapa.ai bot into the dev docs (June 12) shows interest in enhancing community support.
Potential Roadblocks
- Complexity of Build Systems: The discussion around CMake configuration highlights the complexity of managing build systems and supporting various build configurations (June 3).
- Maintaining Deprecated Features: Balancing the removal of legacy features with the potential impact on downstream users and existing workflows is a consideration, as seen in the Feather V1 deprecation discussion (June 3).
- Windows Server 2019 End-of-Life: The upcoming end of support for Windows Server 2019 runners requires updating CI configurations across various Arrow repositories (June 17).
Strategic Plans
- Project Component Decoupling: The effort to split language implementations (Rust, Swift, C#) into separate repositories continues (May 16, May 19, May 20, May 23).
- Release Management: Recent activity includes voting on and completing releases for Apache Arrow Rust Object Store 0.12.2 and Apache Arrow Go 18.3.1 (June 6, June 17).
- External Project Integration: A proposal to donate the arrow-gpu project suggests interest in expanding Arrow’s capabilities in GPU-accelerated computing (June 6).