Python

Open Issues (last 90 days)

56

Issues from New Contributors

26

Open PRs (last 90 days)

28

PRs from New Contributors

12

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-03-27 [Python] Possible follow-up on arithmetic support
2026-03-26 [Python] Pandas is deprecating dataframe interchange protocol
2026-03-20 [Python] pa.concat_batches() to include promote_options like pa.concat_tables()
2026-03-15 [C++][Python] Compute function to generate date from year / month / day
2026-03-13 [C++][Python] Table.join_asof occasionally fails in doctest
2026-03-13 [Python] Implement DictionaryArray converter for type pa.large_string and pa.large_binary
2026-03-09 [Python] Memory Leak while iterating batches of pyarrow dataset
2026-03-04 [C++][Python] Investigate OpenTelemetry works as expected on Windows
2026-03-03 [Python][C++][GPU] Python Cuda jobs fail with ‘cuda.bindings.driver.CUcontext’ object has no attribute ‘value’
2026-02-28 [Python] MacOs stuck on importing (23.0.1) while works on (15.0.1)
2026-02-26 [Python] Investigate whether we can rely on pyproject.toml for build dependencies
2026-02-26 [Python] Validate PYARROW_GENERATE_COVERAGE works as expected
2026-02-25 [C++][Python] Filtering corrupts data in column containing a list array
2026-02-24 [Python] Single chunk ChunkedArray doesn’t correctly respect copy in __array__method
2026-02-23 [Python][Parquet] Add options to control writing of Bloom filters to parquet.write_table
2026-02-22 [Python][Dataset] Add filters parameter to orc.read_table() for predicate pushdown
2026-02-18 [Python][Packaging] Wrong ARROW_SIMD_LEVEL=SSE4_2 on arm64 macOS wheels
2026-02-18 [Python] Add a PyArrow sanitizers build
2026-02-17 [Python ] Table.to_batches() loses schema information when table has zero rows
2026-02-17 [Python] Expose RecordBatchFileReader::CountRows in Python
2026-02-16 [Python] test_memory.py fails with -DARROW_MIMALLOC=OFF
2026-02-14 [Python] RecordBatch.serialize() should support writing into a pre-allocated buffer
2026-02-13 [Python] Wheel / sdist build uses docstrings generation script we don’t want to distribute
2026-02-12 [Python] RecordBatch.serialize() silently drops dictionary messages
2026-02-12 [Python] Fix DeprecationWarnings in PyArrow tests
2026-02-11 [Python] Deprecate pyarrow.gandiva
2026-02-11 [Python] data loss converting to Table from pandas Timedelta built using replace
2026-02-09 [Python][Annotations] Remaining test fixes
2026-02-09 [Python][Annotations] Flight, CUDA and other modules
2026-02-09 [Python][Annotations] Parquet and Dataset
2026-02-09 [Python][Annotations] Filesystems
2026-02-09 [Python][Annotations] I/O, IPC, and Serialization Formats
2026-02-09 [Python][Annotations] Compute module
2026-02-09 [Python][Annotations] Core Data Structures
2026-02-06 [Python] Consider pa.date32/64.to_pandas_dtype() returning datetime64[s] instead of datetime64[ms]
2026-02-05 [Python] PyArrow cannot read from a newline-delimited JSON file with inconsistent column types, even if parse_options specifies a schema
2026-02-05 [C++][Python] Act on existing deprecations
2026-01-31 [Python][Annotations] Add internal type system stubs (_types, error, _stubs_typing)
2026-01-29 [Python] Non-UTF-8 bytes should be disallowed in custom_metadata
2026-01-29 [Python][Packaging] Merge Windows wheels base and test base images into a single image
2026-01-28 [Dev][Python] Unused python scripts under python/scripts
2026-01-26 [Python] Wrong cast from StringArray to pandas 3 when element is None
2026-01-26 [Python][Packaging] Drop archery and docker for our Windows wheels and build on the GitHub runner directly
2026-01-26 ORC Predicate Pushdown
2026-01-23 EHN: Add a is_castable function and/or errors=coerce option to cast
2026-01-23 [CI] Remove all getattr(name) -> Any from *.pyi files before next release
2026-01-23 [Python] DEFAULT option in simd_level of cli.py missing
2026-01-23 [Python] A “personal data” boolean in field metadata
2026-01-22 [Python] extract_regex and extract_regex_span only extract the first match
2026-01-09 [Python] Drop support for pandas < 2.0.0
2026-01-02 [Python][Types] Type stub improvements for better coverage with Arrow IPC and compute operations
2025-12-30 [Python] A standard field/column description in metadata
2025-12-30 [Python] Pyarrow csv reader : ability to specify a max number of rows to read
2025-12-29 [C++][Python] Incorrect results from hash_pivot_wider
2025-12-29 pyright type error when hinting pyarrow.compute functions as Callable[[ChunkedArray[Any]], ChunkedArray[Any]]
2025-12-28 ParquetDataset filtering on hive partitionned datasets accesses unrelated directories and files
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-03-24 Gh 49232 deprecate feather python
2026-03-16 GH-49522: [CI] Update chrome_version for emscripten job to latest stable (v146)
2026-03-09 GH-49397: [Python] Fix PYARROW_GENERATE_COVERAGE end-to-end flow
2026-02-23 GH-49376: [Python][Parquet] Add ability to write Bloom filters from pyarrow
2026-02-22 GH-41365: [Python] Fix S3 URI with whitespace silently falling back to LocalFileSystem
2026-02-22 GH-49358: [Python][Doc] Add import statement to filters_to_expression docstring example
2026-02-21 GH-49273: [Python] Move stub docstring script to _build_utils with gr…
2026-02-20 GH-49351: [C++] Check TZDIR environment variable in vendored date library
2026-02-17 GH-49302: [Python][Doc] Incorrect parameter descriptions in SparseCSCMatrix.from_numpy
2026-02-14 GH-49285: [Python] Add buffer parameter to RecordBatch.serialize()
2026-02-13 GH-49255: Fix pandas deprecation warnings in Categorical tests
2026-02-12 GH-49258: [C++][Python] Add public APIs for reading and serializing IPC dictionary messages
2026-02-11 GH-49002: [Python] Fix array.to_pandas string type conversion for arrays with None
2026-02-10 GH-49168: [Python] map date32/date64 to datetime64[s] in to_pandas_dtype
2026-02-09 GH-49199: [Python][Annotations] Remaining test fixes
2026-02-09 GH-49198: [Python][Annotations] Flight, CUDA and other modules
2026-02-09 GH-49197: [Python][Annotations] Parquet and Dataset
2026-02-09 GH-49196: [Python][Annotations] Filesystems
2026-02-09 GH-49195: [Python][Annotations] I/O, IPC, and Serialization Formats
2026-02-09 GH-49194: [Python][Annotations] Compute module
2026-02-09 GH-49193: [Python][Annotations] Core Data Structures
2026-02-06 GH-48182: [Python] Add regression test for RecordBatch.from_struct_array with list offsets
2026-02-05 GH-43075: [Docs][Python] Document that source parameter to IPC reader…
2026-01-23 GH-48957: [Benchmarking] Add temporal type support to benchmarks
2026-01-19 GH-43510: [PYTHON] Move NumPy specific tests to separate test file.
2026-01-09 GH-35830: [C++] Fix fixed-size list scalar hashing with non-zero offsets
2026-01-04 GH-48695: [Python][C++] Add max_rows parameter to CSV reader
2026-01-02 GH-46872: [C++][Python] Move Arange utility function to an Arrow C++

R

Open Issues (last 90 days)

13

Issues from New Contributors

5

Open PRs (last 90 days)

10

PRs from New Contributors

3

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-03-27 [CI][R] AMD64 Windows R release fails with IOError: Bucket ‘ursa-labs-r-test’ not found
2026-03-27 [R] Add AGENTS.md file to R package
2026-03-24 [R][CI] r-binary-packages crossbow job fails for CRAN patch releases
2026-03-17 [R] Implement dplyr recode_values(), replace_values(), and replace_when()
2026-02-24 [R][CI] r-devdocs crossbow job fails during gap between C++ and R releases
2026-02-11 [R] Deprecate Feather reader and writer
2026-02-04 [R] parquet does not retain haven::tagged_na()
2026-02-04 [R] Bindings for new dplyr verbs
2026-02-03 [R] Compilation fails when Parquet support is disabled
2026-01-20 [R] write_to_raw is very slow
2026-01-12 [R] arrow::write_parquet error with zero-length datetimes in R 4.5.2
2026-01-12 [CI][R] Update our gcc12 job for R CI
2026-01-02 [R] “Invalid metadata$r” warning
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-03-27 GH-49609: [CI][R] AMD64 Windows R release fails with IOError: Bucket ‘ursa-labs-r-test’ not found
2026-03-27 GH-48712: [R] “Invalid metadata$r” warning
2026-03-19 GH-32123: [R] Expose azure blob filesystem
2026-03-17 GH-49534: [R] Implement dplyr recode_values(), replace_values(), and replace_when()
2026-03-02 Fixed usage of libarrow built from the git checkout not the released version from apt
2026-02-18 GH-39600: [R] Add trademark attribution to pkgdown site footer
2026-02-13 GH-49237: [R] Deprecate Feather reader and writer
2026-02-03 GH-49129: [R][Parquet] Guard Parquet dataset code with ARROW_R_WITH_PARQUET
2026-01-14 GH-48832: [R] Fix crash with zero-length POSIXct tzone attribute
2026-01-13 Update dplyr-funcs-doc.R to fix a typo

C++

Open Issues (last 90 days)

86

Issues from New Contributors

30

Open PRs (last 90 days)

57

PRs from New Contributors

18

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-03-27 Inspecting cmake presets options not available in cmake 4.x
2026-03-24 [C++][FlightRPC] [ODBC] DEB Linux Installer
2026-03-24 [C++][CI] StructToStructSubset test failure with libc++ 22.1.1
2026-03-23 [C++][FlightRPC][ODBC] Enable ODBC build on Debian
2026-03-18 [C++][FlightRPC][ODBC] Enable ODBC test build on Linux
2026-03-18 [C++][FlightRPC] Decouple Flight Serialize/Deserialize from gRPC transport
2026-03-18 [C++][Parquet] parquet-scan doesn’t show the usage info
2026-03-17 [C++][FlightRPC][ODBC] Change Windows ODBC to Static Linkage
2026-03-17 [C++][FlightRPC][ODBC] Add CI steps to support Windows DLL and MSI signing
2026-03-15 [C++][Python] Compute function to generate date from year / month / day
2026-03-13 [C++][Python] Table.join_asof occasionally fails in doctest
2026-03-12 [C++][Parquet] parquet-writer-test Large memory tests fail on TestColumnWriter.WriteLargeDictEncodedPage and TestColumnWriter.ThrowsOnDictIndicesTooLarge
2026-03-12 [C++][Parquet] I found formating errors in parquet.thrift
2026-03-11 [C++][FlightRPC][ODBC] Read booleans for applicable SqlInfoOptions
2026-03-10 [C++][CI] Investigate structure-aware fuzzing
2026-03-09 [C++][FlightRPC][ODBC] Fix inconsistent values in SQLGetInfo in global connection
2026-03-05 [C++][FlightRPC][ODBC] Enable ODBC build on Linux
2026-03-04 [C++][Python] Investigate OpenTelemetry works as expected on Windows
2026-03-04 Potential dereference of nullptr
2026-03-03 [Python][C++][GPU] Python Cuda jobs fail with ‘cuda.bindings.driver.CUcontext’ object has no attribute ‘value’
2026-03-03 [C++] Multi-threaded logging can mix up messages
2026-02-27 [C++][CI][Packaging][FlightPRC] ODBC macOS DMG installer signing
2026-02-26 [C++] Allow filesystems to be implemented in separate library (+ move remote filesystems out of libarrow into their own shared libraries)
2026-02-26 [C++][Python] GcsFileSystem: can’t access Requester Pays buckets
2026-02-25 [C++][Python] Filtering corrupts data in column containing a list array
2026-02-25 [C++] fatal error: ‘span’ file not found
2026-02-24 [C++][FlightRPC][ODBC] Enable DSN default values on macOS
2026-02-24 Parquet StreamReader should clarify its contract for parquet files without a schema.
2026-02-22 [C++][Dataset] ORC predicate pushdown: full operator and type coverage
2026-02-22 [C++][Dataset] Add OrcFileFragment with stripe filtering and predicate pushdown
2026-02-22 [C++][ORC] Add stripe statistics API to ORCFileReader
2026-02-20 [C++] Vendored date library does not respect TZDIR environment variable
2026-02-16 [C++] Change the C Type of HalfFloatType
2026-02-15 [C++][ORC] Add OrcFileFragment with stripe-level subsetting
2026-02-13 [C++] Synthetic OOM tests are allocator-sensitive with mimalloc
2026-02-13 [Doc][C++] Document security model for Arrow C++
2026-02-13 [C++][CI] arrow-json-test segfaults occasionally on AMD64 Windows MinGW MINGW64 C++ job
2026-02-12 [C++][CI] Use differential fuzzing
2026-02-12 C++ documentation for Parquet files is misleading for includes
2026-02-11 [C++][FlightRPC] ODBC: Replace boost::xpressive with alternatives
2026-02-11 [C++][FlightRPC] ODBC: Replace boost::beast with alternatives
2026-02-11 [C++][FlightRPC] ODBC: Replace boost::variant with std::variant
2026-02-11 [C++] Deprecate Feather reader and writer
2026-02-10 [C++][Compute] Allow reusing hash table across multiple is_in / SetLookup kernel calls
2026-02-10 [C++] Assertion fail during test script
2026-02-07 [C++] Enable hardware support for arrow::util::Float16 on GCC and Clang
2026-02-05 [C++] Allow disabling extension type deserialization when reading IPC
2026-02-05 [C++][Python] Act on existing deprecations
2026-02-04 [C++] Remove todo asking to use assign_by_moving
2026-02-03 [C++] Complete std::numeric_limits specialization
2026-02-02 [C++] Support half-float tensors in equality comparison
2026-02-02 [C++] Add bounds checking to DataType::field() to return nullptr for out-of-bounds access
2026-01-31 [C++] Doc update in aggregate_basic for first-last
2026-01-30 [C++] CMake Windows Install - Wrong Config File Locations
2026-01-30 Please update OpenSSL
2026-01-30 [C++] Null pointer dereference in SimpleTable constructor
2026-01-30 [C++] Remove the TODO asking to remove check_metadata in compare.cc
2026-01-28 [C++] Enable Minimal Find Arrow package test for MSVC CI with VCPKG
2026-01-27 [C++][Parquet] Add arrow::Result versions for parquet::arrow::RowGroupReader::ReadTable
2026-01-25 [C++] arrow::schema construction performance degrades to O(n^2) on libc++ for duplicate/unnamed fields
2026-01-23 EHN: Add a is_castable function and/or errors=coerce option to cast
2026-01-21 [C++] Optimize Decimal128 abs hotspot; add exec‑only TPC‑H Q1 benchmark
2026-01-21 [C++] Building Flight with bundled gRPC / Abseil fails with gcc-15 and C++20
2026-01-21 [C++][CI] Fuzz more IPC reading features
2026-01-21 [C++] Support List, Map, and Struct types in replace_with_mask, fill_null_forward, and fill_null_backward
2026-01-20 [C++] Optimize StructArray diffing with field by field comparison
2026-01-16 [C++] Be able to replace default Codec implementation
2026-01-16 [C++] Enable use of USE_OS_TZDB with conda, also on windows
2026-01-13 [C++][FS]Core dump when s3fs allow_delayed_open is on.
2026-01-09 [C++] Remove redundant nullptr assignment in FixedSizeBinary to String/Binary cast
2026-01-09 [C++] Address “Compatibility with CMake < 3.5 has been removed” error
2026-01-08 [C++][Windows] mmap returns Win32 error codes instead of errno
2026-01-07 [C++][Python] Inconsistent strftime %Z flag output on Windows
2026-01-07 [C++] Substrait serialised expression size increases exponentially due to extension URIs being repeated
2026-01-06 [C++][Parquet] Fix undefined behavior in memcpy with nullptr for empty ByteArray
2026-01-06 [C++] C++20: Re-enable timezone tests once GCC fixes chrono::time_zone::get_info behavior
2026-01-06 [C++] Some CTypeTraits are missing
2026-01-06 [C++] Eliminate Array boxing in scalar string kernels
2026-01-06 [C++][FlightPRC][Arrow Flight SQL ODBC] Support descriptor fields SQLGetDescField and SQLSetDescField
2026-01-06 [C++] Cache compiled regex matchers in string kernels
2026-01-04 [C++] Test filter operations with random null probabilities
2026-01-03 [C++][FlightRPC][MSVC] Flight SQL: Potential deadlock error after C++ 20 requirement is enabled
2025-12-31 [ALP][Parquet] Add C++ implementation of ALPpd encoder/decoder
2025-12-30 [C++] Remove IPC dependency from array_union_test.cc by moving MakeUnion to testing utilities
2025-12-29 [C++][Python] Incorrect results from hash_pivot_wider
2025-12-28 ParquetDataset filtering on hive partitionned datasets accesses unrelated directories and files
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-03-27 GH-49609: [CI][R] AMD64 Windows R release fails with IOError: Bucket ‘ursa-labs-r-test’ not found
2026-03-27 GH-49601: [C++] Update bundled AWS SDK C++ for C23
2026-03-26 GH-49392: [C++][Compute] Fix fixed-width gather byte offset overflow in list filtering
2026-03-26 GH-48590: [C++] Migrate SFINAE enable_if patterns to C++20 concepts
2026-03-24 DRAFT: set up static build of ODBC FlightSQL driver
2026-03-23 GH-49582: [C++][FlightRPC] Add ODBC Debian support
2026-03-19 GH-49463: [C++][FlightRPC] Add Ubuntu ODBC Support
2026-03-18 GH-49548: [C++][FlightRPC] Decouple Flight Serialize/Deserialize from gRPC transport
2026-03-18 GH-49539: [C++][Parquet] Fix argument count check in parquet_scan
2026-03-14 GH-49502: [Parquet][C++] Fix missing overflow check for dictionary encoder indices count
2026-03-14 GH-32381: [C++] Improve error handling for hash table merges
2026-03-10 GH-49274: [Doc][C++] Document security model for Arrow C++
2026-03-10 GH-49445: [C++]: Fix potential dereference of nullptr
2026-03-05 GH-49272: [C++][CI] Fix intermittent segfault in arrow-json-test with MinGW
2026-02-26 GH-37065: [C++][Parquet] use std::optional in parquet statistics
2026-02-24 GH-49360: [C++][ORC] Add stripe statistics API to ORCFileReader
2026-02-22 GH-41365: [Python] Fix S3 URI with whitespace silently falling back to LocalFileSystem
2026-02-20 GH-49351: [C++] Check TZDIR environment variable in vendored date library
2026-02-20 MINOR: [C++] use std::move to avoid unnecessary copies
2026-02-19 GH-37937: [C++][FlightRPC] Investigate using gRPC’s generic API using gRPC’s BidiReactor
2026-02-18 GH-49315: [C++] Optimize Decimal128 abs; add exec-only TPC-H Q1 benchmark
2026-02-17 GH-45193: [C++][Compute] Treat NaNs and nulls as distinct values in rank tie-breaking
2026-02-16 GH-48408: [C++] Enable ULP-based float comparison
2026-02-15 GH-49288: [C++][ORC] Add OrcFileFragment with stripe-level subsetting
2026-02-13 [C++] Harden synthetic OOM tests for mimalloc
2026-02-12 GH-49258: [C++][Python] Add public APIs for reading and serializing IPC dictionary messages
2026-02-08 [C++] feat: add streaming Snappy codec using official framing format
2026-02-05 [WIP] CLI program to write ALP encode example parquet files
2026-02-03 GH-49123: [C++] Complete std::numeric_limits specialization
2026-02-03 MINOR: [C++] Remove obsolete TODO comment from mode kernel registration
2026-02-02 GH-49112: [C++] Support half-float tensors in equality comparison
2026-02-02 GH-49110: [C++] Add bounds checking to DataType::field() to return nullptr for out-of-bounds access
2026-01-31 GH-49107: [C++] Doc update in aggregate_basic for first-last
2026-01-30 GH-49079: [C++] Fix null pointer dereference in SimpleTable constructor
2026-01-30 GH-49073: [C++] Remove the TODO asking to remove check_metadata in compare.cc
2026-01-23 GH-48926: [C++] Upgrade Abseil/Protobuf/GRPC/Google-Cloud-CPP bundled versions
2026-01-21 GH-48918: [C++] Support List, Map, and Struct types in replace_with_mask, fill_null_forward, and fill_null_backward
2026-01-21 GH-37221: [C++] Refactor REE utilities with separating internal API, add flexible type matcher, and JSON support
2026-01-20 GH-46901: [C++][Compute] Add remainder and mod kernels
2026-01-20 GH-48906: [C++] Optimize StructArray diffing with field by field comparison
2026-01-20 GH-36503: [C++] Make DictionaryArray::dictionary() thread-safe
2026-01-14 Fix error with Boost::headers when building with Thrift
2026-01-13 GH-48120: [C++][ODBC] Remove and replace IsComplexType with is_nested
2026-01-13 GH-48837: [C++] Remove invalid DCHECK when allow_delayed_open is true
2026-01-10 GH-47173: [C++] Implement JSON writer
2026-01-09 GH-48802: [C++] Replace redundant nullptr assignment to DCHECK_EQ in FixedSizeBinary to String/Binary cast
2026-01-09 GH-35830: [C++] Fix fixed-size list scalar hashing with non-zero offsets
2026-01-08 GH-48761: [C++] Fix duplicate Substrait function URI registration
2026-01-06 GH-48744: [C++][Parquet] Fix undefined behavior in memcpy with nullptr
2026-01-06 GH-48733: [C++] Eliminate Array boxing in scalar string kernels
2026-01-06 Optimize column reader by bitmap counting definition levels
2026-01-04 GH-48723: [C++] Test filter operations with random null probabilities
2026-01-04 GH-48695: [Python][C++] Add max_rows parameter to CSV reader
2026-01-03 GH-47995: [C++][Parquet] Fix empty string min/max statistics being lost during merge
2026-01-03 GH-36283: [C++] Fix predicate pushdown for nullable columns with range statistics
2026-01-02 GH-46872: [C++][Python] Move Arange utility function to an Arrow C++
2025-12-30 GH-48682: [C++] Remove IPC dependency from array_union_test.cc by moving MakeUnion to testing utilities

Open Issues (and change since 1 month ago)

2933 (-226)

Open PRs (and change since 1 month ago)

324 (-19)

[Chart: Development Activity - Commits by Month]

Dev Mailing List Summary

LLM-generated summary of this month’s activity on the dev mailing list.

  • Numba support for PyArrow: Proposal to integrate Numba JIT support directly into PyArrow to enable high-performance calculations on Arrow data while managing C++ ABI stability.

  • JDK 17 minimum for Arrow Java: Consensus reached to raise the minimum requirement for the Java implementation to JDK 17 starting with version 20.0.0.

  • Arrow Java 19.0.0 release: The community approved the 19.0.0 release for Java, which includes a transition of the release process to new GPG keys.

  • Flight SQL execution path consolidation: Discussion on unifying query and update paths in Flight SQL to improve driver consistency and reduce roundtrips for prepared statements.

  • ADBC driver migration: Update on the migration of BigQuery, Databricks, and Snowflake drivers to a new dedicated community-driven GitHub organization.

  • Critical Java Flight regression: Investigation of a bug in versions 16-18 where using Unix Domain Sockets with Epoll causes a ClassCastException in the FlightClient.

  • Map type field naming: Debate over whether language implementations should strictly enforce standard “key”, “value”, and “entries” field names for Map types to ensure cross-language compatibility.

  • LLMs for project maintenance: Exploratory discussion on utilizing Large Language Models to automate repo management tasks like issue labeling and triage.

  • Arrow Rust 58.1.0 release: The Rust implementation and the Object Store sub-project both successfully released new versions following community votes.

[Chart: PRs by contributor type (last 18 months)]

Recent PRs from new contributors
Date Title Author State
2026-03-24 Gh 49232 deprecate feather python piyushka-ally open
2026-03-19 GH-47685: [Docs][Python] Add nested grouping to Python docs TOC Desel72 open
2026-03-19 GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline gounthar open
2026-03-18 GH-47696: [Docs] PyCapsule protocol implementation status Desel72 open
2026-03-18 GH-49539: [C++][Parquet] Fix argument count check in parquet_scan domibel open
2026-03-14 GH-49502: [Parquet][C++] Fix missing overflow check for dictionary encoder indices count aryansri05 open
2026-03-14 GH-32381: [C++] Improve error handling for hash table merges kris-gaudel open
2026-03-11 GH-49497: [FlightRPC] Add is_update field to ActionCreatePreparedStatementResult ennuite open
2026-03-10 GH-49445: [C++]: Fix potential dereference of nullptr CT811 open
2026-03-09 GH-49397: [Python] Fix PYARROW_GENERATE_COVERAGE end-to-end flow omertt27 open
2026-03-02 Fixed usage of libarrow built from the git checkout not the released version from apt GavkareShubham open
2026-02-28 GH-37958: [MATLAB] Centralize Command Window hyperlink formatting kriyanshii open
2026-02-24 GH-49360: [C++][ORC] Add stripe statistics API to ORCFileReader cbb330 open
2026-02-22 GH-41365: [Python] Fix S3 URI with whitespace silently falling back to LocalFileSystem ernestprovo23 open
2026-02-21 GH-49273: [Python] Move stub docstring script to _build_utils with gr… vanshaj2023 open
2026-02-21 GH-49273: [Python] Move stub docstring script to _build_utils with gr… vanshaj2023 closed
2026-02-20 GH-49351: [C++] Check TZDIR environment variable in vendored date library canassa open
2026-02-20 MINOR: [C++] use std::move to avoid unnecessary copies SYaoJun open
2026-02-18 GH-49315: [C++] Optimize Decimal128 abs; add exec-only TPC-H Q1 benchmark Manas103 open
2026-02-16 GH-49272: [C++][CI] Fix intermittent segfault in arrow-json-test on M… vanshaj2023 closed
2026-02-15 GH-49288: [C++][ORC] Add OrcFileFragment with stripe-level subsetting ShreyeshArangath open
2026-02-14 GH-49285: [Python] Add buffer parameter to RecordBatch.serialize() rustyconover open
2026-02-13 [C++] Harden synthetic OOM tests for mimalloc k8ika0s open
2026-02-13 [C++] Harden synthetic OOM tests for mimalloc k8ika0s closed
2026-02-12 GH-49258: [C++][Python] Add public APIs for reading and serializing IPC dictionary messages rustyconover open
2026-02-12 MINOR: [Docs] Document tzdata behaviour on non-FHS systems shr3yas-k open
2026-02-10 GH-49168: [Python] map date32/date64 to datetime64[s] in to_pandas_dtype PratyushD21 open
2026-02-08 GH-48986: [Python][Dataset] Add filters parameter to orc.read_table() for predicate pushdown (15/15) cbb330 closed
2026-02-06 GH-48182: [Python] Add regression test for RecordBatch.from_struct_array with list offsets Gnananjali open
2026-02-05 GH-43075: [Docs][Python] Document that source parameter to IPC reader… aayush-1o open
2026-02-03 GH-49129: [R][Parquet] Guard Parquet dataset code with ARROW_R_WITH_PARQUET IsabelParedes open
2026-01-31 GH-49107: [C++] Doc update in aggregate_basic for first-last FrenchCommando open
2026-01-28 GH-35460: Use simdjson instead of RapidJSON Taepper closed
2026-01-27 GH-48986: [C++][Docs] Add documentation and examples for ORC predicate pushdown (14/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Improve error handling and validation (13/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add NULL handling predicate support (12/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add NOT operator support (11/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add OR compound predicate support (10/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add AND compound predicate support (9/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add equality and IN operator support (8/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add INT32 support with overflow protection (7/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add support for remaining INT64 comparison operators (6/14) cbb330 closed
2026-01-27 GH-48986: [Python] Add placeholder for ORC predicate pushdown Python bindings (6/15) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Integrate ORC stripe filtering with dataset scanner (5/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add basic ORC stripe filtering API with predicate pushdown (4/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add lazy evaluation infrastructure for ORC predicate pushdown (3/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add Arrow expression builder for ORC statistics (2/14) cbb330 closed
2026-01-27 GH-48986: [C++][Dataset] Add ORC stripe statistics extraction foundation (1/14) cbb330 closed
2026-01-26 GH-48986: [C++][Dataset] Add ORC predicate pushdown with stripe filtering cbb330 closed
2026-01-21 GH-48931: [C++] Optimize Decimal128 abs; add exec-only TPC-H Q1 benchmark Manas103 closed
2026-01-20 GH-46901: [C++][Compute] Add remainder and mod kernels fangchenli open
2026-01-19 GH-43510: [PYTHON] Move NumPy specific tests to separate test file. Reranko05 open
2026-01-19 GH-33241: [Archery] Replace github3 with pygithub fangchenli open
2026-01-14 Fix error with Boost::headers when building with Thrift RUSLoker open
2026-01-13 Update dplyr-funcs-doc.R to fix a typo Moohan open
2026-01-08 GH-48761: [C++] Fix duplicate Substrait function URI registration mohit7705 open
2026-01-08 GH-48788: [C++] Fix Windows mmap error code mapping in mman.h Manas103 closed
2026-01-06 Optimize column reader by bitmap counting definition levels aryan-cloud open
2026-01-04 GH-48695: [Python][C++] Add max_rows parameter to CSV reader hyangminj open

[Chart: Good First Issues by Component]

Good First Issues
82 issues across 11 components
Component Issues
C++ 31
Python 19
R 12
Documentation 7
MATLAB 4
(No component) 2
FlightRPC 2
Parquet 2
Archery 1
Continuous Integration 1
Other 1
Needs Champion
74 items - sorted by age (newest first)
Title Type Component Created Age (days)
[R] creating arrow supported expressions Issue R 2025-02-05 416
[Docs] Clean up LICENSE and NOTICE files Issue Documentation 2024-11-28 485
[R] Provide helpful hints for NotImplemented kernel errors Issue R 2024-11-15 498
[C++][Python] Potential improvements around supply chain security Issue C++, Python 2024-11-09 504
[Python] Efficient way to iterate over groups Issue Python 2024-11-07 506
[R] Add Cumsum and duplicated bindings to datasets in R Issue R 2024-11-07 506
[CI][Packaging][Python] Enable BuildKit for building wheel on Windows Issue Python, Continuous Integration, Packaging 2024-11-07 506
[Dev][Archery] Use --arrow-ref instead of --arrow-sha in archery crossbow submit Issue Archery, Developer Tools 2024-10-31 513
[Format][Docs] Describe C device interface on C data interface and C stream interface docs pages Issue Documentation, Format 2024-10-26 518
Misleading error message when casting Issue Python 2024-10-25 519
[CI][C++] Use a separated Docker image for Emscripten Issue C++, Continuous Integration 2024-10-18 526
[CI][C++] Add clang-cl job Issue C++ 2024-10-11 533
[GLib] Add a sub Buffer class for GBytes based buffer Issue GLib 2024-10-09 535
[C++] Binary View Compute Kernels Issue C++ 2024-10-08 536
[R] please write unregister_scalar_function and/or make registration local/temporary Issue R 2024-10-04 540
[R] Support integer date and time classes from data.table Issue R 2024-09-19 555
[C++][Parquet] Add support for arrow::ArrayStatistics Issue Parquet, C++ 2024-08-04 601
[R] Subtracting X days from a given date in ymd format Issue R 2024-08-01 604
[Python] Move tests that are explicitly about conversion to/from numpy on test_array.py to a separate file Issue Python 2024-08-01 604
[R] Implement anonymous functions in calls to dplyr::across Issue R 2024-07-10 626
[CI][Dev] Add shell script formatter Issue Continuous Integration, Developer Tools 2024-06-28 638
[C++] Add support for system mimalloc Issue C++ 2024-06-19 647
[Python] Get size of IPC File ahead of time Issue Python 2024-06-07 659
[Python] Conversion to/from numpy 2.0+ new StringDType Issue Python 2024-06-06 660
[Docs][Format] Move IPC format spec back into a separate page Issue Documentation, Format 2024-05-15 682
[C++][Python] Update DLPack version Issue C++, Python 2024-05-15 682
[C++][Parquet] Predicate pushdown through arrow::dataset::ScanBuilder::Filter() not available on list fields Issue Parquet, C++ 2024-05-14 683
[R] Unable to disable url-encoding Issue R 2024-05-10 687
[Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path Issue Python 2024-04-24 703
[CI][Archery] Archery linking should also check for undefined symbols Windows Issue Archery 2024-04-03 724
[CI][Archery] Archery linking should also check for undefined symbols macOS Issue Archery 2024-04-03 724
[CI][Archery] Archery linking should also check for undefined symbols Linux Issue Archery 2024-04-03 724
[Ruby] Improve Ruby’s GC integration Issue Ruby 2024-03-29 729
[Python] Allow pyarrow import to fail without triggering Py_FatalError Issue Python 2024-03-27 731
[Parquet] Make default fallback encoding choice smarter Issue Parquet, C++ 2024-03-18 740
[Python][Docs] Max batch size for Dataset Issue Python, Documentation 2024-03-15 743
[C++] Update vendored FlatBuffers to 24 Issue C++ 2024-03-14 744
[C++] String manipulation on a dictionary column Issue C++ 2024-03-08 750
[Python] Consider splitting _lib module into several parts Issue Python 2024-02-20 767
[Docs] Add a doc section for tensor arrays Issue Documentation 2024-02-08 779
[R] Export functions for low-level pointer operations Issue R 2024-01-25 793
[R] Update the docs to show how to avoid situations like data loss with leading zero in partition column Issue R 2024-01-17 801
[R] Use correct attribution in the footer of pkgdown site Issue R 2024-01-14 804
[Python] Clean up ExtensionType.__reduce__ Issue Python 2023-12-06 843
[R] Inconsistent naming Issue R 2023-10-25 885
[R] open_dataset - format is unclear Issue R 2023-10-10 900
[R] expose decimal_point argument in CSVConvertOptions Issue R 2023-10-03 907
[R] open_dataset() behavior with incorrectly quoted input data Issue R 2023-09-27 913
Update default version in parquet.rst PR Documentation 2023-08-17 954
GH-36831: [C++] DictionaryArray support for MinMax Function PR C++ 2023-08-10 961
Missing kernels for ordering with struct types Issue Python 2023-08-09 962
[C++][Parquet] Process parquet rowgroups without Arrow conversion Issue Parquet, C++, Python 2023-05-17 1046
[C++] Why is arrow mmap marked MAP_PRIVATE (during read)? Issue C++ 2023-04-25 1068
[Python] Bindings for FixedShapeTensorType.FromTensor/ToTensor and FixedShapeTensorArray.strides Issue Python 2023-04-12 1081
[Python][Docs] Update/rearrange Data Types section and add FixedShapeTensorType Issue Python, Documentation 2023-04-12 1081
[R] Add an argument to open_csv_dataset() to repair duplicated column names or ignore them? Issue R 2023-04-07 1086
[Python] unexpected URL encoded path (white spaces) when uploading to S3 Issue Python 2023-04-05 1088
[Format][FlightRPC] Transfer FlightData in pieces Issue FlightRPC, Format 2023-03-07 1117
[C++][Python] Support parsing a StringArray full of JSON to a Table Issue C++, Python 2023-01-13 1170
[C++] Decide on duplicate column handling in scanner, add more tests Issue C++ 2022-11-22 1222
[C++] Add a “list_contains” kernel Issue C++ 2022-10-19 1256
“Edit this page” on docstring generated docs gives 404 Issue Documentation 2022-06-10 1387
[Archery] Add documentation for local development in archery/crossbow Issue Developer Tools 2022-04-12 1446
[R] Arrow/Parquet can’t open encrypted parquet files Issue R 2022-01-26 1522
[Python][Docs] Opening a partitioned dataset with schema and filter Issue Python, Documentation 2022-01-12 1536
[C++] Name the threads in thread pools Issue C++ 2022-01-07 1541
[Python] Support other interval types Issue Python 2021-10-07 1633
[Python] Array.__str__ shows misleading output for timestamp types with time zone set Issue C++, Python 2020-07-19 2078
[C++/Python] Kernel for SetItem(IntegerArray, values) (“replace_with_indices”) Issue C++, Python 2020-07-13 2084
[R] Add bindings to ConcatenateTables Issue R 2020-05-09 2149
[Crossbow] Eliminate libgit2 dependency Issue Developer Tools 2020-03-12 2207
[C++][Python] Support ExtensionType arrays in more kernels Issue C++ 2019-07-09 2454
[Doc] Better document the Tensor classes in the prose documentation Issue C++, Python, Documentation 2019-07-04 2459
[Python] Add documentation section for integrations with PyTorch, TensorFlow Issue Python, Documentation 2018-02-01 2977
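
The "Age (days)" column above appears to be the number of days between each item's "Created" date and the date the report was generated. The report does not state that date; the rows are consistent with a snapshot taken on 2026-03-28, so the sketch below assumes that value. The helper name age_in_days is hypothetical and used only for illustration.

```python
from datetime import date

# Assumption: the dashboard snapshot was taken on 2026-03-28. This date is
# inferred from the table rows; it is not stated anywhere in the report.
SNAPSHOT = date(2026, 3, 28)


def age_in_days(created: date, snapshot: date = SNAPSHOT) -> int:
    """Whole days elapsed between an item's creation date and the snapshot."""
    return (snapshot - created).days


# Spot-check three rows of the table: (created date, age listed in the report).
listed = [
    (date(2025, 2, 5), 416),
    (date(2024, 11, 28), 485),
    (date(2018, 2, 1), 2977),
]
for created, listed_age in listed:
    print(created.isoformat(), age_in_days(created), listed_age)
```

Under the same assumed snapshot date, a 90-day window would start on 2025-12-28, which matches the oldest entries in the "last 90 days" lists.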