Python

Open Issues (last 90 days): 31
Issues from New Contributors: 15
Open PRs (last 90 days): 15
PRs from New Contributors: 11

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-07-02 [Python] Add AGENTS.md
2026-07-01 [Python] to_pylist() on list-typed arrays is several times slower than converting via to_pandas()
2026-07-01 [Docs][C++][Python] Update docs, add examples, tutorials
2026-07-01 [Python][Parquet] FIXED_LEN_BYTE_ARRAY fails to cast to UUID on Python 3.14 / Nightly builds
2026-06-29 [Python] Dask integration test fails due to what seems a pytest mark collection error
2026-06-24 [Python] PyArrow copies large BinaryView buffers during construction and scalar extraction
2026-06-16 [Python][C++] Deadlock in ensure_s3_finalized atexit handler when pyarrow.fs.S3FileSystem reference outlives interpreter shutdown (pyarrow 24.x regression)
2026-06-12 [Python] Consistent to_pandas_dtype (and to_pandas) behavior for canonical extension types
2026-06-10 [Python][Parquet] pyarrow.parquet.write_table ignores version and coerce_timestamps arguments for Time64 fields
2026-06-09 [Python][Parquet] Support reading and writing Parquet Variant columns
2026-06-09 [Python] Bindings for Variant canonical extension type
2026-06-04 [Packaging][Python] Add support for Python 3.15
2026-06-01 [C++][Python] AzureFileSystem can’t use GetProperties with User Delegation SAS tokens
2026-05-24 [Python] Expose Expression.field_refs() to enumerate referenced fields
2026-05-20 [CI][Python] Revert pinning miniforge once mamba solver issue is resolved
2026-05-12 [Python] Azure with SAS Keys
2026-05-12 [Python] Improve Extension Types Support in PyArrow (umbrella issue)
2026-05-01 [Python] FixedShapedTensor to_pandas_dtype is returning NotImplementedError
2026-04-28 [C++][Python] Bind unresolved Substrait expressions using a supplied schema
2026-04-27 [Python] Convert a ListScalar of StructScalar to numpy.recarray
2026-04-22 [Python] test_fastparquet_cross_compatibility fails when using latest PyArrow and Pandas
2026-04-21 [Python] Scalar arithmetic dunders raise TypeError instead of returning NotImplemented
2026-04-21 [Release][Python] Python wheels fail to upload to PyPI due to quota
2026-04-21 [Python][Parquet] Per-column keys for low-level encryption/decryption properties API
2026-04-16 [Python] Allow thousands_sep as a option in pa.ParseOptions
2026-04-15 [Python][Docs] Add code examples for compute function any/all
2026-04-14 [Python] Add row_splits/offsets methods for VariableShapeTensorArray
2026-04-13 [Python] Array.from_buffers docs should mention safety issues
2026-04-07 fixed_size_list<T>[0] parquet round-trips fail: writes fine, read-back raises ArrowInvalid
2026-04-07 [C++][Python] Implement search_sorted kernel for all primitive types and run-end encoded arrays
2026-04-06 [Python] Implement support for the experimental Async Device Stream Interface
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-07-03 WIP: [Release] Verify release-25.0.0-rc1
2026-07-01 GH-50326: [Python] Speed up to_pylist for list-like and string arrays
2026-07-01 GH-50312: [Python] Fix UUID extension type round-trip to pandas returning bytes
2026-06-11 GH-26685: [Python][C++] Trim buffers when pickling a sliced array
2026-06-10 GH-41488: [C++][Python] Apply timestamp_parsers as fallback when parsing CSV date and time columns
2026-06-10 GH-49907: [Python] Implement FixedShapeTensorType.to_pandas_dtype
2026-05-24 GH-50027: [Format][C++][Python] Add arrow.range canonical extension type
2026-05-23 GH-36388: [C++][Python] Return error from MakeArrayFromScalar on offset overflow
2026-04-27 GH-48344: [Python] Fix Table.from_struct_array for empty ChunkedArray
2026-04-25 GH-39961: [C++][Python] Propagate CSV parse delimiter to write options
2026-04-23 GH-49321: [C++][Python] Add ASAN / UBSAN pixi builds for Arrow and PyArrow
2026-04-22 GH-49826: [Python] Return NotImplemented from Scalar/Array arithmetic dunders for unsupported types
2026-04-21 GH-49826: [Python] Scalar arithmetic dunders return NotImplemented on unknown operand types
2026-04-09 GH-49058: [Python] Disallow non-UTF-8 bytes in custom metadata
2026-04-07 GH-49677 [Python][C++][Compute] Add search sorted compute kernel

R

Open Issues (last 90 days): 7
Issues from New Contributors: 2
Open PRs (last 90 days): 4
PRs from New Contributors: 0

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-07-02 [R] read_ipc_stream fails to unify nested uint64 fields inside a Struct array across record batches
2026-06-22 [R] Data race issue from R API requests in parallel region
2026-06-07 [R] R arrow crash
2026-05-19 [R][Wasm] Fix Error: thread constructor failed: Not supported under Wasm
2026-04-22 [R] Date Inf/ -Inf silently corrupted when converting to Arrow date32
2026-04-11 [R] array_to_vector() lacks coverage for several core Arrow types
2026-04-07 [R] CRAN packaging checklist for version 24.0.0
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-07-03 WIP: [Release] Verify release-25.0.0-rc1
2026-07-02 GH-49607: [R] Add AGENTS.md file to R package
2026-07-01 GH-49231: [C++] Deprecate Feather reader and writer
2026-07-01 MINOR: [R] Adjust parallel::detectCores if NA to not break --parallel input in CMake

C++

Open Issues (last 90 days): 59
Issues from New Contributors: 16
Open PRs (last 90 days): 56
PRs from New Contributors: 28

Open issues from last 90 days (new contributors highlighted)
Date Title
2026-07-03 [C++][Gandiva] Out-of-bounds read in utf8_length_ignore_invalid on truncated multi-byte input
2026-07-03 [C++] KeyValueMetadata::DeleteMany method crashes with duplicate 0 indexes
2026-07-03 [C++] KeyValueMetadata Bugs
2026-07-03 [C++] KeyValueMetadata accepts duplicate keys
2026-07-02 [C++] Add AGENTS.md to C++ dir
2026-07-02 [C++] Add ComputeLogicalNullCount method to Datum
2026-07-01 [C++][Benchmark] Add additional benchmark for Batch/TableToTensor
2026-07-01 [C++][Parquet] Reject outlandish values in DELTA_BINARY_PACKED decoder
2026-07-01 [C++][Docs] Add guidance about memory bombs
2026-06-30 [C++][IPC] Consider validation of index sizes in SparseCSFIndex::Make
2026-06-29 [C++][FlightRPC] Refactor GRPC server and transport classes
2026-06-29 [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns
2026-06-25 [C++] Add GetSpan to ArrayData
2026-06-25 [C++] Simplify or remove arrow/util/functional.h
2026-06-25 [C++] Change BaseBinaryBuilder::GetView to return a view with a null data pointer for entries added via AppendNull
2026-06-25 [C++] Reuse abstraction for null partitions in sorting functions
2026-06-24 [C++] Arrow test ‘arrow-utility-test’ contains container-overflow error
2026-06-18 [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper
2026-06-18 [C++] Compilation error xsimd/xsimd.hpp no such file or directory
2026-06-16 [C++] Move S3 to its own shared library
2026-06-16 [C++][Gandiva] REPLACE throws “Buffer overflow for output string” for results larger than 64 KB
2026-06-11 [C++] Update bundled RE2
2026-06-10 [C++] content-encoding field missing from s3 fs.
2026-06-10 [C++] arrow-utility-test fails spuriously(?) on osx-arm64
2026-06-09 [C++][Gandiva] castVARCHAR(decimal128) can corrupt native memory and return invalid buffers.
2026-06-08 [C++] Make S3Filesystem consume key-value options
2026-06-04 [C++][FlightRPC][ODBC] Simplify package_odbc.yml
2026-06-02 [C++][Compute] Remove redundant cast kernels
2026-06-01 [C++][Python] AzureFileSystem can’t use GetProperties with User Delegation SAS tokens
2026-05-29 [C++] All data is null for one column in one row group for parquet, arrow will encode with dictionary, while parquet-java use encode plain
2026-05-18 [Docs][C++][Parquet] Add API reference
2026-05-14 [C++][Gandiva] Add 2 arg REGEXP_EXTRACT function
2026-05-13 [C++] Fix remaining overflow and negative length handling issues in Gandiva string functions
2026-05-09 [C++][Parquet] Uncontrolled Memory Allocation (OOM) in Parquet Delta decoders
2026-05-08 [C++] HeadBucket called in S3FS breaking IAM scoped prefixes
2026-04-30 [C++] Deprecate RandomAccessFile::Read{At,Async} without allow_short_read
2026-04-28 [C++][Compute] true_unless_null kernel output incorrect when run-end encoded array contains null
2026-04-28 [C++][Python] Bind unresolved Substrait expressions using a supplied schema
2026-04-28 [C++] BufferBuilder integer overflow in size calculations reachable from JSON parsing
2026-04-28 [C++] Neon not enabled on Windows ARM64
2026-04-24 [C++] Compiler error in xsimd_ep_ep-install _sse2.hpp with msvc on windows
2026-04-22 [CI][C++] Set test include/exclude by environment variable
2026-04-22 [C++] error: no member named ‘log2p1’ in namespace ‘std’
2026-04-21 [C++][FlightRPC][ODBC] linux-packages RPM installer
2026-04-21 [C++][FlightRPC][ODBC] linux-packages DEB installer
2026-04-20 [C++][FlightRPC][ODBC] Flakey GRPC_CALL_ERROR_TOO_MANY_OPERATIONS error in ODBC Linux
2026-04-20 [C++] arrow::Decimal128::FromString silently truncates when the input string has more than 38 significant digits
2026-04-20 [C++][FlightRPC][ODBC] Only run ODBC tests in ODBC CI
2026-04-20 [C++][FlightRPC][ODBC] Skip build for non-ODBC test binaries in CI
2026-04-20 [C++][FlightRPC][ODBC] Address Disconnect() in FlightSQL ODBC test suite
2026-04-18 [Format][FlightSQL] Add dialect-related SqlInfo codes (LIMIT/OFFSET syntax, NULLS ordering clause, boolean/datetime literals)
2026-04-17 [C++][FlightRPC][ODBC] Implement support for SQL_DRIVER_AWARE_POOLING_SUPPORTED
2026-04-17 [C++][FlightRPC] crashes during gRPC status conversion with Clang-built Arrow 17.0.0
2026-04-14 [Release][C++] C++ Extra workflow for the apache-arrow-24.0.0-rc0 was canceled
2026-04-12 [C++][Parquet] Use PLAIN as default encoding for float32 and float64 columns in Parquet writer
2026-04-10 [C++][Parquet] Make writing of ColumnMetaData.path_in_schema optional
2026-04-10 [C++][Format] Can’t roundtrip Dictionary of ExtensionType over IPC
2026-04-09 [C++][Flight] ODBC BlockingQueue not working properly on Linux
2026-04-07 [C++] RecordBatch::MakeEmpty() may drop ordered flag and unsignedness of dictionary types when creating dictionary-encoded column
Open PRs from last 90 days (new contributors highlighted)
Date Title
2026-07-03 GH-50341: [C++] Add AGENTS.md to C++ dir
2026-07-03 WIP: [Release] Verify release-25.0.0-rc1
2026-07-03 GH-50355: [C++][Gandiva] fix out-of-bounds read in utf8_length_ignore_invalid
2026-07-03 GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices
2026-07-03 GH-50348: [C++] Fix duplicate keys issue in KeyValueMetadata
2026-07-02 GH-50338: [C++] Add ComputeLogicalNullCount to Datum
2026-07-02 WIP: Dummy PR to check maint-25.0.0 status
2026-07-01 GH-49231: [C++] Deprecate Feather reader and writer
2026-07-01 GH-50251: Add GetSpan() convenience method to ArrayData
2026-06-30 GH-50222: [C++] Use FetchContent for xsimd
2026-06-29 GH-50280: [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods
2026-06-27 GH-35692: [C++][Parquet] Support to read fixed size list array with nulls
2026-06-26 GH-49889: [C++][Compute] Handle logical nulls in validity kernels
2026-06-25 GH-50247: [C++] Reuse abstraction for null partitions in sorting functions
2026-06-20 GH-45948: [C++][Parquet] Variant shredding
2026-06-18 GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper
2026-06-16 GH-50186: [C++][Gandiva] REPLACE throws “Buffer overflow for output string” for results larger than 64 KB
2026-06-12 GH-50148: [C++] Add Content-Encoding support to S3 filesystem metadata
2026-06-12 GH-43010: [C++][Compute] Support view arrays in selection kernels
2026-06-11 GH-26685: [Python][C++] Trim buffers when pickling a sliced array
2026-06-10 GH-48740: [C++] Add missing CTypeTraits for decimal types
2026-06-10 GH-41488: [C++][Python] Apply timestamp_parsers as fallback when parsing CSV date and time columns
2026-06-09 GH-50140: [C++][Gandiva] Fix castVARCHAR(decimal128) native memory corruption / SIGSEGV on allocation failure
2026-06-09 GH-50136: [C++][Gandiva] Enhance CHR to work with unicode.
2026-06-09 GH-40781: [C++] Mechanism for disabling MemoryPoolStats at compile-time
2026-06-08 MINOR: [C++][Gandiva] cast to unsigned char before ctype calls
2026-06-08 GH-45947 : [C++][Parquet] Variant encoding
2026-06-08 GH-45946: [C++][Parquet] Variant decoding
2026-06-01 GH-45804: [C++][Statistics] Add array statistics schema support
2026-05-29 GH-50067: [C++] bounds-check Feather V1 column buffer slices
2026-05-29 GH-49904: [C++] Deprecate RandomAccessFile legacy ReadAt and ReadAsync
2026-05-27 GH-29309: [C++] Preserve BinaryBuilder data type
2026-05-24 GH-50027: [Format][C++][Python] Add arrow.range canonical extension type
2026-05-23 GH-36388: [C++][Python] Return error from MakeArrayFromScalar on offset overflow
2026-05-22 GH-49482: [C++][FlightRPC][ODBC] Fix inconsistent SQLGetInfo values in global connection
2026-05-21 [C++][Parquet] Saturate ApplicationVersion components instead of atoi UB
2026-05-18 GH-49985: [C++][Gandiva] Duplicate function aliases with same parameters
2026-05-18 GH-49973: [C++] Fix Gandiva string length checks
2026-05-14 GH-49977: [C++][Gandiva] Add regexp_extract optional third parameter function version
2026-05-09 GH-49955: [C++] Fix OOM vulnerability in Parquet Delta decoders
2026-05-04 GH-49957 [C++][Parquet] Support reading dictionary encoded boolean pages
2026-05-02 GH-47662: [C++][Parquet] Reject metadata with null_count on required column
2026-04-29 GH-40024: [C++][Gandiva] Selectively register external C functions based on expression usage
2026-04-28 GH-49884: [C++] Fix integer overflow in BufferBuilder reachable from JSON parsing
2026-04-28 GH-49881 [C++][Parquet] Support writing encrypted bloom filters
2026-04-25 GH-39961: [C++][Python] Propagate CSV parse delimiter to write options
2026-04-24 GH-39808: [C++][Parquet] Evict pre-buffered row-group bytes after decode
2026-04-23 GH-46994: [C++][Parquet] Reuse BinaryView headers for repeated values in dictionary and DELTA_BYTE_ARRAY decoding
2026-04-21 GH-49817: [C++] Reject decimal strings that exceed the target precision
2026-04-20 GH-20314: [C++] Add GCS connection pool size option
2026-04-19 GH-49674: [C++][Array] Preserve ordered flag for DictionaryType in MakeEmptyArray
2026-04-19 GH-41017: [C++] Preserve ordered flag in DictionaryBuilder
2026-04-18 GH-49792: [Format][FlightSQL][C++] Add dialect-related SqlInfo codes
2026-04-17 GH-47877: [C++][FlightRPC] ODBC Linux rpm installer support with Cpack
2026-04-16 GH-33823: [C++][IPC] Improve error messages when opening files that are the wrong format
2026-04-07 GH-49677 [Python][C++][Compute] Add search sorted compute kernel
Activity Trends (2 years)
Dev Mailing List: LLM-generated summary of this month’s activity
  • Conbench v2 redesign: Wes McKinney proposed a major overhaul of the project’s benchmarking infrastructure, introducing a Go-based backend and Svelte frontend to resolve performance bottlenecks and improve maintainability.

  • Variant type implementation coordination: Developers from multiple organizations are collaborating on adding the complex Variant type to Arrow C++, focusing on native performance and Parquet specification compliance.

  • Deprecating Tensor IPC messages: A proposal was introduced to deprecate experimental IPC messages for Tensors and SparseTensors in favor of using canonical extension types like FixedShapeTensor.

  • GitHub Actions resource optimization: In response to ASF infrastructure limits, the project successfully reduced its shared runner usage by approximately 27% through workflow streamlining.

  • JSON representation of Arrow schemas: Discussion is underway to define a canonical human-readable JSON format for schemas to improve ergonomics for ADBC metadata and application-level API contracts.

  • DuckDB ADBC extension donation: The community is evaluating the donation of a new DuckDB extension for ADBC, sparking discussions about project governance and competition with existing community-led drivers.

  • Arrow Erlang repository transfer: The project has officially accepted the donation of the Erlang implementation, which has now been moved to the Apache Arrow organization on GitHub.

  • GitHub Copilot review guidelines: Maintainers are establishing best practices for automated AI code reviews, including disabling them for draft pull requests to reduce clutter and defining how contributors should address AI suggestions.

PRs by contributor type (last 18 months)
Recent PRs from new contributors
Date Title Author State
2026-07-03 GH-50341: [C++] Add AGENTS.md to C++ dir AdvancedUno open
2026-07-03 GH-50355: [C++][Gandiva] fix out-of-bounds read in utf8_length_ignore_invalid Arawoof06 open
2026-07-03 GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices AdvancedUno open
2026-07-03 GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices AdvancedUno closed
2026-07-03 GH-30800: [Python][Docs] Document partition fields with explicit dataset schemas kalyanamdewri open
2026-07-01 GH-50312: [Python] Fix UUID extension type round-trip to pandas returning bytes parker-cassar open
2026-07-01 GH-50087: [Docs][C++] Fix sentence structure in memory.rst regarding MemoryManager OmBiradar closed
2026-07-01 GH-50311: [C++] KeyValueMetadata::Delete returns IndexError instead of crashing due to seg fault OmBiradar closed
2026-07-01 GH-50251: Add GetSpan() convenience method to ArrayData VedantRalekar open
2026-06-30 MINOR: [Docs] Add Timestamp With Offset to canonical extension types status esadek closed
2026-06-29 MINOR: [CI] Bump actions/cache from 5 to 6 dependabot[bot] closed
2026-06-28 ci: pin GitHub Actions to full commit SHAs XananasX7 closed
2026-06-28 ci: pin GitHub Actions to full commit SHAs XananasX7 closed
2026-06-26 GH-47252: [C++][Compute] Fix sort_indices for temporal types in arrow::Table nfrmtk closed
2026-06-26 GH-50260: [C++] Add ComputeLogicalNullCount to ChunkedArray goel-skd closed
2026-06-24 GH-50237: [C++] Migrate arrow/ipc/metadata_internal.h to Result Shally-Katariya closed
2026-06-22 MINOR: [CI] Bump actions/checkout from 6 to 7 dependabot[bot] closed
2026-06-21 GH-49928: [C++][Parquet] Fix UB in UpdateLevelHistogram from nullptr std::span Diveyam-Mishra closed
2026-06-20 GH-45948: [C++][Parquet] Variant shredding qzyu999 open
2026-06-18 GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper fangchenli open
2026-06-17 GH-49644: [Python] Support converting list of multi-dimensional arrays to FixedShapeTensor aboderinsamuel closed
2026-06-16 GH-46480: [Docs] Add CITATION.cff citation metadata d33bs open
2026-06-16 GH-50197: [C++][Python] Add “hypot” compute kernel shrivasshankar closed
2026-06-12 GH-50148: [C++] Add Content-Encoding support to S3 filesystem metadata alytantawyy open
2026-06-11 GH-26685: [Python][C++] Trim buffers when pickling a sliced array kirilklein open
2026-06-11 GH-48636: [C++][Parquet] Improve parquet reading using multi threads OmBiradar open
2026-06-10 GH-50149: [C++][Parquet] Handle OOM as soft failure in parquet encoding fuzzer omertt27 closed
2026-06-10 GH-48740: [C++] Add missing CTypeTraits for decimal types Naurder open
2026-06-10 GH-41488: [C++][Python] Apply timestamp_parsers as fallback when parsing CSV date and time columns pearu open
2026-06-10 GH-49907: [Python] Implement FixedShapeTensorType.to_pandas_dtype aboderinsamuel open
2026-06-09 Add support for ppc64le Jenkins-J open
2026-06-09 GH-40781: [C++] Mechanism for disabling MemoryPoolStats at compile-time var-nan open
2026-06-08 GH-45947 : [C++][Parquet] Variant encoding qzyu999 open
2026-06-08 GH-45946: [C++][Parquet] Variant decoding qzyu999 open
2026-06-05 GH-49907: [Python] FixedShapeTensor to_pandas_dtype returns NotImplementedError aboderinsamuel open
2026-06-04 GH-50103: [C++] Missing iosfwd include in cpp/src/arrow/util/string_util.h atupone closed
2026-06-03 [WIP][POC] Pfor encoding prtkgaur open
2026-06-02 MINOR: [CI] Bump matlab-actions/run-tests from 3.1.2 to 3.2 dependabot[bot] closed
2026-06-01 GH-45804: [C++][Statistics] Add array statistics schema support cjc0013 open
2026-05-29 GH-49904: [C++] Deprecate RandomAccessFile legacy ReadAt and ReadAsync UdayanMahalwar open
2026-05-27 GH-29309: [C++] Preserve BinaryBuilder data type BITree2004 open
2026-05-27 GH-33420: [R] Improve error message when providing a mix of readr and Arrow options Rich-T-kid closed
2026-05-25 MINOR: [CI] Bump matlab-actions/run-tests from 3.0.0 to 3.1.2 dependabot[bot] closed
2026-05-25 MINOR: [CI] Bump docker/login-action from 4.1.0 to 4.2.0 dependabot[bot] closed
2026-05-25 GH-50077: [C++][IPC] Avoid int64 overflow in ReadSparseCSXIndex jmestwa-coder closed
2026-05-25 GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords jmestwa-coder closed
2026-05-25 GH-50078: [C++][ORC] Avoid signed overflow when converting timestamps jmestwa-coder closed
2026-05-24 GH-50026: [C++][Parquet] SIMD-accelerate SBBF probe via branchless autovec dmatth1 closed
2026-05-24 Expose mimalloc allocations for profiler interposition pablogsal closed
2026-05-24 GH-50027: [Format][C++][Python] Add arrow.range canonical extension type Hoeze open
2026-05-23 GH-50184: [C++][Parquet] Avoid reading past truncated statistics values in FormatStatValue jmestwa-coder closed
2026-05-23 GH-36388: [C++][Python] Return error from MakeArrayFromScalar on offset overflow Sriniketh24 open
2026-05-21 [C++][Parquet] Saturate ApplicationVersion components instead of atoi UB rootvector2 open
2026-05-20 Add AGENTS.md — MCP Agent Instructions javierfajardo85-rgb closed
2026-05-19 GH-48801: [C++] Set CMAKE_POLICY_VERSION_MINIMUM for RapidJSON KyleFromNVIDIA closed
2026-05-19 GH-48801: [cmake] Update RapidJSON for CMake 4.0 compatibility KyleFromNVIDIA closed
2026-05-19 GH-49991: [C++][FlightRPC] Fix unity build ordering issue spotaws closed
2026-05-18 GH-49973: [C++] Fix Gandiva string length checks puneetdixit200 open
2026-05-12 GH-49967: [Python][CI] Raise oldest NumPy wheel-test requirement to a patched release arpitjain099 closed
2026-05-12 GH-46856: [C++][Python] Add binary view comparison kernels Periecle closed
2026-05-12 MINOR: [CI] Bump matlab-actions/run-tests from 3.0.0 to 3.1.1 dependabot[bot] closed
2026-05-09 GH-49955: [C++] Fix OOM vulnerability in Parquet Delta decoders sivaadityacoder open
2026-05-06 MINOR: [C++][Parquet][Doc] pointer dereference instead of dot in parquet.rst alexeyroytman open
2026-05-05 GH-49927: [Python][Parquet] Expose bloom_filter_offset and bloom_filter_length to Python in column chunk metadata haziqishere closed
2026-05-05 Fix/display bloom filter offset in column chunk meta data haziqishere closed
2026-05-05 GH-49917: [Python] Remove Py_XDECREF to avoid Use-After-Free on PyList_SetItem in SparseCSFTensorToNdarray wr-web closed
2026-04-28 GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array AnkitAhlawat7742 closed
2026-04-27 MINOR: [CI] Bump conda-incubator/setup-miniconda from 3.2.0 to 4.0.1 dependabot[bot] closed
2026-04-27 MINOR: [CI] Bump matlab-actions/run-tests from 3.0.0 to 3.1.0 dependabot[bot] closed
2026-04-27 GH-48344: [Python] Fix Table.from_struct_array for empty ChunkedArray 1fanwang open
2026-04-24 GH-39808: [C++][Parquet] Evict pre-buffered row-group bytes after decode justinli500 open
2026-04-24 MINOR: [R] cast() documented but not exported in NAMESPACE AnkitAhlawat7742 closed
2026-04-23 GH-43574: [Python][Parquet] do not add partition columns from file path when reading single file bkurtz closed
2026-04-22 GH-49826: [Python] Return NotImplemented from Scalar/Array arithmetic dunders for unsupported types alex-anast open
2026-04-22 GH-31318: [Python] Add fixed-offset timezones to Hypothesis test strategy alex-anast closed
2026-04-22 GH-45644: [Doc][Python] Document timezone loss when converting timestamp arrays to NumPy alex-anast closed
2026-04-21 GH-49826: [Python] Scalar arithmetic dunders return NotImplemented on unknown operand types SAY-5 open
2026-04-21 GH-49817: [C++] Reject decimal strings that exceed the target precision SAY-5 open
2026-04-21 ci: check upload quota before release jpopesculian open
2026-04-20 GH-49753: [C++][Gandiva] Fix overflow in string functions abtom87 closed
2026-04-20 MINOR: [CI] Bump matlab-actions/run-tests from 3.0.0 to 3.1 dependabot[bot] closed
2026-04-20 GH-20314: [C++] Add GCS connection pool size option azhu248 open
2026-04-20 H-49753: [C++][Gandiva] Fix overflow in string functions. abtom87 closed
2026-04-20 GH-49789: [C++] Use CMAKE_INSTALL_DOCDIR instead of static share/doc/${PROJECT_NAME} Swaroop883 closed
2026-04-19 docs: add safety warning to Array.from_buffers docstring avasis-ai closed
2026-04-19 GH-41017: [C++] Preserve ordered flag in DictionaryBuilder tinezivic open
2026-04-18 GH-49792: [Format][FlightSQL][C++] Add dialect-related SqlInfo codes tokoko open
2026-04-16 GH-49719: [C++] Rename vendored date header guards hrishikeshh-shinde closed
2026-04-16 GH-33823: [C++][IPC] Improve error messages when opening files that are the wrong format RobertLD open
2026-04-15 GH-49759: [C++][Integration] Harden BinaryView JSON parsing with runtime validation metsw24-max closed
2026-04-14 GH-49751: [Python] Add raw fd support to pa.OSFile alippai closed
2026-04-13 MINOR: [CI] Bump actions/github-script from 8 to 9 dependabot[bot] closed
2026-04-13 MINOR: [CI] Bump matlab-actions/setup-matlab from 2.7.0 to 3.0.1 dependabot[bot] closed
2026-04-09 GH-49058: [Python] Disallow non-UTF-8 bytes in custom metadata nitrajen open
2026-04-07 GH-49677 [Python][C++][Compute] Add search sorted compute kernel Alex-PLACET open
2026-04-06 GH-47435: [Python][Parquet] Add direct key encryption/decryption API smaheshwar-pltr closed
2026-04-06 MINOR: [CI] Bump matlab-actions/setup-matlab from 2.7.0 to 3.0.0 dependabot[bot] closed
2026-04-06 MINOR: [CI] Bump matlab-actions/run-tests from 2.3.1 to 3.0.0 dependabot[bot] closed
2026-04-06 MINOR: [CI] Bump docker/login-action from 4.0.0 to 4.1.0 dependabot[bot] closed
2026-04-05 GH-49433: [C++] Buffer ARROW_LOG output to prevent thread interleaving Shockp closed
2026-04-04 GH-49614: [C++] Report an error instead of silent truncation in base64_decode on invalid input Reranko05 closed
Good First Issues by Component
65 issues across 9 components
Component Issues
C++ 21
Python 18
R 9
Documentation 8
MATLAB 4
Parquet 2
Archery 1
Benchmarking 1
Continuous Integration 1
Needs Champion
145 items - sorted by age (newest first)
Title Type Component Created Age (days)
[R] read_ipc_stream fails to unify nested uint64 fields inside a Struct array across record batches Issue R 2026-07-02 2
[R] col_types ignored when convert_options specified Issue R 2025-11-16 230
[R] Mixed-type list columns fail with unintuitive error message Issue R 2025-08-27 311
[Python] read_csv converts strings with leading zeros to integers Issue Python, Documentation 2025-06-18 381
[Python][C++] Update type definition of npy_traits for Float16 to arrow::util::Float16 instead of uint16_t Issue C++, Python 2025-06-04 395
[CI][Crossbow][Dev] Continue generating nightlies dashboard for CI: Extra scheduled runs Issue Continuous Integration, Developer Tools 2025-06-02 397
[C++][Parquet] Make error reporting more detailed Issue Parquet, C++ 2025-05-27 403
Consider adding CITATION.cff for citation details Issue Other 2025-05-16 414
[C++] Review arrow/json headers for internal APIs Issue C++ 2025-05-15 415
[C++] Review arrow/csv headers for internal APIs Issue C++ 2025-05-15 415
[C++][Parquet] Review headers for internal APIs Issue Parquet, C++ 2025-05-15 415
[FlightSQL][Docs] Document current state of FlightSQL support Issue FlightRPC, Documentation 2025-05-07 423
[Python] Support pyarrow.Table.cast with CastOptions Issue Python 2025-04-14 446
[Python] Expose testing data generation utility Issue Python 2025-04-07 453
[C++][Docs] C++ documentation for constructors is missing Issue Documentation 2025-03-19 472
[C++] Statistics Schema Implementation Issue C++, Python, Documentation 2025-03-15 476
[C++][Compute] Add FunctionOptions::Validate Issue C++ 2025-03-13 478
[R] Don’t construct arrow_binary class vector in favor of blob::blob Issue R 2025-03-08 483
[R] creating arrow supported expressions Issue R 2025-02-05 514
[Ruby] Improve JRuby Support Issue Ruby 2025-01-21 529
[R] arrow R package: multiple replacement disclaimers for str_replace_all Issue R 2025-01-20 530
[Parquet][C++] PageIndex is useless with current API Issue Parquet, C++ 2025-01-16 534
[C++][Python] Implement pc.equal for List arguments Issue C++, Python 2025-01-04 546
[Docs] Clean up LICENSE and NOTICE files Issue Documentation 2024-11-28 583
[R] Provide helpful hints for NotImplemented kernel errors Issue R 2024-11-15 596
[C++][Python] Potential improvements around supply chain security Issue C++, Python 2024-11-09 602
[Python] Efficient way to iterate over groups Issue Python 2024-11-07 604
[R] Add Cumsum and duplicated bindings to datasets in R Issue R 2024-11-07 604
[CI][Packaging][Python] Enable BuildKit for building wheel on Windows Issue Python, Continuous Integration, Packaging 2024-11-07 604
[Dev][Archery] Use --arrow-ref instead of --arrow-sha in archery crossbow submit Issue Archery, Developer Tools 2024-10-31 611
[Format][Docs] Describe C device interface on C data interface and C stream interface docs pages Issue Documentation, Format 2024-10-26 616
Misleading error message when casting Issue Python 2024-10-25 617
[CI][C++] Use a separated Docker image for Emscripten Issue C++, Continuous Integration 2024-10-18 624
[CI][C++] Add clang-cl job Issue C++ 2024-10-11 631
[GLib] Add a sub Buffer class for GBytes based buffer Issue GLib 2024-10-09 633
[C++] Binary View Compute Kernels Issue C++ 2024-10-08 634
[R] please write unregister_scalar_function and/or make registration local/temporary Issue R 2024-10-04 638
[Python] Allow PyCapsule Interface in pyarrow.scalar constructor? Issue Python 2024-09-25 647
[R] Support integer date and time classes from data.table Issue R 2024-09-19 653
[C++][Parquet] Add support for arrow::ArrayStatistics Issue Parquet, C++ 2024-08-04 699
[R] Subtracting X days from a given date in ymd format Issue R 2024-08-01 702
[Python] Move tests that are explicitly about conversion to/from numpy on test_array.py to a separate file Issue Python 2024-08-01 702
[R] Implement anonymous functions in calls to dplyr::across Issue R 2024-07-10 724
[CI][Dev] Add shell script formatter Issue Continuous Integration, Developer Tools 2024-06-28 736
[C++] Add support for system mimalloc Issue C++ 2024-06-19 745
[Python] Get size of IPC File ahead of time Issue Python 2024-06-07 757
[Python] Conversion to/from numpy 2.0+ new StringDType Issue Python 2024-06-06 758
[Docs][Format] Move IPC format spec back into a separate page Issue Documentation, Format 2024-05-15 780
[C++][Python] Update DLPack version Issue C++, Python 2024-05-15 780
[C++][Parquet] Predicate pushdown through arrow::dataset::ScanBuilder::Filter() not available on list fields Issue Parquet, C++ 2024-05-14 781
[R] Unable to disable url-encoding Issue R 2024-05-10 785
[Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path Issue Python 2024-04-24 801
[CI][Archery] Archery linking should also check for undefined symbols Windows Issue Archery 2024-04-03 822
[CI][Archery] Archery linking should also check for undefined symbols macOS Issue Archery 2024-04-03 822
[CI][Archery] Archery linking should also check for undefined symbols Linux Issue Archery 2024-04-03 822
[Ruby] Improve Ruby’s GC integration Issue Ruby 2024-03-29 827
[Python] Allow pyarrow import to fail without triggering Py_FatalError Issue Python 2024-03-27 829
[Parquet] Make default fallback encoding choice smarter Issue Parquet, C++ 2024-03-18 838
[Python][Docs] Max batch size for Dataset Issue Python, Documentation 2024-03-15 841
[C++] Update vendored FlatBuffers to 24 Issue C++ 2024-03-14 842
[C++] String manipulation on a dictionary column Issue C++ 2024-03-08 848
[Python] Consider splitting _lib module into several parts Issue Python 2024-02-20 865
[Docs] Add a doc section for tensor arrays Issue Documentation 2024-02-08 877
[R] Export functions for low-level pointer operations Issue R 2024-01-25 891
[R] Update the docs to show how to avoid situations like data loss with leading zero in partition column Issue R 2024-01-17 899
[C++][Python] Floordiv compute kernel Issue C++, Python 2023-12-29 918
[Python] Add timezone information when printing TimestampArray Issue R, Python 2023-12-20 927
[Python] Clean up ExtensionType.__reduce__ Issue Python 2023-12-06 941
[C++][Parquet] Parquet: support exact in Page/Row-Group level Statistics Issue Parquet, C++ 2023-11-23 954
[R] Write metadata to parquet file as argument to write_parquet() Issue R 2023-11-19 958
[R][Documentation] Document add_filename on open_dataset help page Issue R, Documentation 2023-11-18 959
[R] preserve hive partitions when opening along a path / path vector Issue R, C++, Python 2023-11-15 962
[Python] Support serialization of Arrow files on disk without the identifier “Feather” Issue Python 2023-10-30 978
[R] Inconsistent naming Issue R 2023-10-25 983
[Integration] Test non-zero offsets in C Data Interface Issue Integration 2023-10-19 989
[R][Docs] Add section on debugging S3 in the R developer docs Issue R, Documentation 2023-10-14 994
[R] open_dataset - format is unclear Issue R 2023-10-10 998
[R] expose decimal_point argument in CSVConvertOptions Issue R 2023-10-03 1005
[R] open_dataset() behavior with incorrectly quoted input data Issue R 2023-09-27 1011
[C++] Implement REE support in ArrayFromJSONString Issue C++ 2023-08-23 1046
[R] Error passing data to/from DuckDB - “NotImplemented: Call to R (SafeRecordBatchReader::ReadNext()) from a non-R thread from an unsupported context” Issue R 2023-08-22 1047
GH-36831: [C++] DictionaryArray support for MinMax Function PR C++ 2023-08-10 1059
Missing kernels for ordering with struct types Issue Python 2023-08-09 1060
[C++][Parquet] Process parquet rowgroups without Arrow conversion Issue Parquet, C++, Python 2023-05-17 1144
[C++] Why is arrow mmap marked MAP_PRIVATE (during read)? Issue C++ 2023-04-25 1166
[Python] Bindings for FixedShapeTensorType.FromTensor/ToTensor and FixedShapeTensorArray.strides Issue Python 2023-04-12 1179
[Python][Docs] Update/rearrange Data Types section and add FixedShapeTensorType Issue Python, Documentation 2023-04-12 1179
[R] Add an argument to open_csv_dataset() to repair duplicated column names or ignore them? Issue R 2023-04-07 1184
[Python] unexpected URL encoded path (white spaces) when uploading to S3 Issue Python 2023-04-05 1186
[R][Python] Expand coverage of and align R/Python to C++ CSV WriteOptions Issue R, Python 2023-03-15 1207
[Format][FlightRPC] Transfer FlightData in pieces Issue FlightRPC, Format 2023-03-07 1215
[R] Named lists cannot be serialized to a map column Issue R 2023-03-02 1220
[C++] Create the first binary aggregate function kernel to serve as an example for other implementations Issue C++ 2023-01-30 1251
[R] writing/reading a data.frame with column class ‘list’ changes column class Issue R 2023-01-19 1262
[R] read_csv_arrow()’s timestamp_parsers parameter is a bit light on documentation and doesn’t appear to do anything Issue R 2023-01-16 1265
[C++][Python] Support parsing a StringArray full of JSON to a Table Issue C++, Python 2023-01-13 1268
[R] feather round-trip support for named vectors in list columns Issue R 2022-12-19 1293
[Python] test_get_include failing in conda builds on unix Issue Python, Continuous Integration 2022-12-18 1294
[Release] Changelog.md on master branch has not been updated since 6.0.1 Issue Developer Tools, Release 2022-12-16 1296
[Dev] Comment bot embeds closes comment on code block Issue Developer Tools 2022-12-15 1297
[R] Filter operations not shown when called before summarise Issue R 2022-11-25 1317
[C++] Decide on duplicate column handling in scanner, add more tests Issue C++ 2022-11-22 1320
[C++] Add a “list_contains” kernel Issue C++ 2022-10-19 1354
[C++][Python] Allow an ExtensionType to register or implement custom casts Issue C++, Python 2022-09-29 1374
[R] Update make_date, make_datetime, ISOdate and ISOdatetime to use tz Issue R 2022-07-12 1453
“Edit this page” on docstring generated docs gives 404 Issue Documentation 2022-06-10 1485
[R] printing data in Table/RecordBatch print method Issue R, Python 2022-06-07 1488
[R] Integer overflow causes error - (in dplyr we get an NA with a warning) Issue R 2022-05-03 1523
[Archery] Add documentation for local development in archery/crossbow Issue Developer Tools 2022-04-12 1544
[Python] Version=7.0.0 introduces bug when filtering by empty set during load Issue Python 2022-03-28 1559
[R] Arrow/Parquet can’t open encrypted parquet files Issue R 2022-01-26 1620
[C++][Python] Slicing a table with no columns returns a table with incorrect length. Issue C++, Python 2022-01-22 1624
[Python][Docs] Opening a partitioned dataset with schema and filter Issue Python, Documentation 2022-01-12 1634
[C++] Name the threads in thread pools Issue C++ 2022-01-07 1639
[R] Implement bindings for stringr’s combining strings functions Issue R 2021-11-24 1683
[C++][R]Opening a multi-file dataset and writing a re-partitioned version of it fails Issue R, C++ 2021-11-17 1690
[C++][Dataset] Change scanner readahead limits to be based on bytes instead of number of batches Issue C++ 2021-11-09 1698
[C++][Dataset] Devise a mechanism to limit the total “system ram” (process + cache) used by dataset writes Issue C++ 2021-11-08 1699
[Docs] [Benchmarking] Add conbench to the benchmarking docs Issue Documentation, Benchmarking 2021-11-03 1704
[C++][R] Inconsistent application of type in Datasets via the schema Issue R, C++ 2021-10-14 1724
[R] Selecting colums while reading Parquet file with nested types can give wrong column Issue R 2021-10-11 1727
[R] Support inequality joins Issue R 2021-10-08 1730
[Python] Support other interval types Issue Python 2021-10-07 1731
[R] Empty character attributes not stored Issue R 2021-08-09 1790
[R] Throw helpful errors on bad object types in dplyr expressions Issue R 2021-03-25 1927
[Python][Dataset] The first table schema becomes a common schema for the full Dataset Issue Python, Documentation 2021-03-24 1928
[R] Support for Tensor class Issue R 2021-02-15 1965
[C++][Dataset] Provide more robust handling of comparison guarantees in the presence of implicit casts Issue C++ 2021-02-08 1972
[R] ChunkedArray$create assumes all chunks are the same type Issue R 2021-01-11 2000
[Ruby] Table#initialize examples are out of date Issue Ruby, Documentation 2020-11-14 2058
[C++][Python] .take silently overflow on list array (when casting to large_list is needed) Issue C++, Python 2020-11-04 2068
[Python] Schema Evolution - Add new Field Issue Python 2020-09-08 2125
[C++][Python] pa.array raises for mixed scalar types (float16 + int) Issue Python 2020-08-21 2143
[Python] Array.__str__ shows misleading output for timestamp types with time zone set Issue C++, Python 2020-07-19 2176
[C++/Python] Kernel for SetItem(IntegerArray, values) (“replace_with_indices”) Issue C++, Python 2020-07-13 2182
[Python][Dataset] Detect and use _metadata file in a list of file paths Issue Python 2020-04-14 2272
[C++][Dataset] Handling of duplicate columns in Dataset factory and scanning Issue C++ 2020-03-25 2292
[Crossbow] Eliminate libgit2 dependency Issue Developer Tools 2020-03-12 2305
[C++][Python] ArrowIOError: Invalid Parquet file size is 0 bytes on reading from S3 Issue C++, Python 2020-02-16 2330
[Python] consistently handle conversion of all-NaN arrays across types Issue Python 2019-09-12 2487
[Python] Array equals returns incorrectly if NaNs are in arrays Issue Python 2019-07-25 2536
[C++][Python] Support ExtensionType arrays in more kernels Issue C++ 2019-07-09 2552
[Doc] Better document the Tensor classes in the prose documentation Issue C++, Python, Documentation 2019-07-04 2557
[GLib] Add support for arrow::DictionaryBuilder Issue GLib 2019-03-16 2667
[Python] Add documentation section for integrations with PyTorch, TensorFlow Issue Python, Documentation 2018-02-01 3075
Currently Failing on Main
Workflow | Job | Failing Since | Days | Broke Since
Upload R Nightly builds | upload | 2026-06-19 | 15 | #48886
Crossbow Nightly Report