Issues: 51
Pull Requests: 30
Stack Overflow Questions: 0
Mailing list messages: 0

Issues opened through the last 3 months
PRs opened through the last 3 months

Open Issues: 51
Issues from New Contributors: 25
Open PRs: 30
PRs from New Contributors: 6

Open issues (new contributors highlighted)
Date Title
2026-01-29 [Python] Non-UTF-8 bytes should be disallowed in custom_metadata
2026-01-29 [Python][Packaging] Merge Windows wheels base and test base images into a single image
2026-01-28 [Dev][Python] Unused python scripts under python/scripts
2026-01-26 [Python] CSV reader returns different values in 23.0.0
2026-01-26 [Python] Wrong cast from StringArray to pandas 3 when element is None
2026-01-26 [Python][Packaging] Drop archery and docker for our Windows wheels and build on the GitHub runner directly
2026-01-26 ORC Predicate Pushdown
2026-01-25 [Python] test failures on pandas 3.0 (currently CI on 2.3.3)
2026-01-23 ENH: Add an is_castable function and/or errors=coerce option to cast
2026-01-23 [CI] Remove __getattr__(name) -> Any from __init__.pyi before next release
2026-01-23 [Python] DEFAULT option in simd_level of cli.py missing
2026-01-23 [Python] A “personal data” boolean in field metadata
2026-01-22 [Python] extract_regex and extract_regex_span only extract the first match
2026-01-09 [Python] Drop support for pandas < 2.0.0
2026-01-02 [Python][Types] Type stub improvements for better coverage with Arrow IPC and compute operations
2025-12-30 [Python] A standard field/column description in metadata
2025-12-30 [Python] PyArrow CSV reader: ability to specify a maximum number of rows to read
2025-12-29 [C++][Python] Incorrect results from hash_pivot_wider
2025-12-29 pyright type error when hinting pyarrow.compute functions as Callable[[ChunkedArray[Any]], ChunkedArray[Any]]
2025-12-28 ParquetDataset filtering on hive-partitioned datasets accesses unrelated directories and files
2025-12-25 [PYTHON][DOC] Some PIC functions are not documented in API reference
2025-12-23 [Python] Add PyDecimal_Check(python_decimal) as an ARROW_DCHECK
2025-12-13 [C++][Compute] Accept JoinOptions in binary_join to handle nulls
2025-12-12 [Python] Require numpy 2.x
2025-12-12 [Python] Cannot construct UuidArray from list of UuidScalars
2025-12-12 [CI][Python] Python install manager can’t be used on the free-threaded Win wheels test step
2025-12-11 [Python] Overflow check in datetime conversion in pa.array
2025-12-11 [Python] Handle nested field names when sanitizing table at ParquetWriter when flavor=‘spark’
2025-12-11 [Python] Implement Alphanumeric and Surrogate text in Random schema generator
2025-12-11 [Python] Automatically create child directories in pyarrow.fs.copy_files
2025-12-09 [Python] Change the behaviour of __eq__
2025-12-05 [Python] Support array indices in pc.list_element
2025-12-04 pa.Table.from_struct_array fails for an empty array
2025-12-04 [Python] Tests fail on Windows with spaces in username due to improper URI encoding
2025-11-28 [Python][Docs] Add pyarrow.parquet.filters_to_expression example to the User Guide
2025-11-26 [Python][GPU] Numba interop tests broken by Numba API changes
2025-11-25 [Python][Parquet] read_schema drops extension types (UUID returned as fixed_size_binary[16])
2025-11-24 [Python] Scalar inferencing doesn’t infer UUID
2025-11-21 [Python][Parquet] Expose whether a Parquet file has index pages and bloom filters
2025-11-20 [Python] UUIDScalar can’t be pickled
2025-11-19 [Python] RecordBatch.from_struct_array(array_of_lists_of_structs[i].values) needs a test
2025-11-19 [Python] Add support and wheels for Python 3.15
2025-11-18 [Parquet][Python] parquet arrow schema inconsistent for file with UUID
2025-11-17 PyArrow does not support abfs URI syntax in Azure
2025-11-15 [Python][Parquet] RowGroupMetadata.total_byte_size computes wrong uncompressed size
2025-11-14 [Python] Add test coverage to parquet.read_table with cloud filesystems integration
2025-11-12 [Python][C++] scanner(fragment_readahead=0).to_table() hangs indefinitely.
2025-11-05 [C++][Python] list_sort - sorting individual ListScalars
2025-11-03 [Python][Packaging] Failing musllinux wheels due to new image from quay.io
2025-11-03 Table.rename_columns() drops schema metadata
2025-11-02 Using pandas Period in filters for parquet fails
Open PRs (new contributors highlighted)
Date Title
2026-01-28 GH-46008: [Python][Benchmarking] Remove unused asv benchmarking files
2026-01-26 GH-48972: [Python] Add errors=‘coerce’ parameter to pyarrow.compute.cast
2026-01-25 GH-48978: [Python] test failures on pandas 3.0 (currently CI on 2.3.3)
2026-01-23 GH-48957: [Benchmarking] Add temporal type support to benchmarks
2026-01-19 GH-43510: [PYTHON] Move NumPy specific tests to separate test file.
2026-01-16 WIP: Draft PR to test wheels
2026-01-09 GH-35830: [C++] Fix fixed-size list scalar hashing with non-zero offsets
2026-01-07 GH-33459: [C++][Python] Support step >= 1 in list_slice kernel
2026-01-06 GH-48470: [Python] Construct UuidArray from list of UuidScalars
2026-01-05 GH-48241: [Python] Scalar inferencing doesn’t infer UUID
2026-01-04 GH-48695: [Python][C++] Add max_rows parameter to CSV reader
2026-01-02 GH-46872: [C++][Python] Move Arange utility function to an Arrow C++
2025-12-27 GH-48024: [C++][Python] Preserve schema metadata in RenameColumns
2025-12-25 GH-42018: [Python][C++] Arrow string types conversion to numpy StringDType
2025-12-23 GH-48627: [Python] Add PyDecimal_Check(python_decimal) as an ARROW_DCHECK
2025-12-22 GH-32609: [Python] Add internal type system stubs (_types, error, _stubs_typing)
2025-12-22 GH-32609: [CI] Add type checking infrastructure and CI workflow for type annotations
2025-12-19 GH-48593: [C++] C++20: use standard calendar / timezone APIs
2025-12-18 GH-22081: [Python] Support reading numpy arrays with ndims > 1 into pa arrays of Lists
2025-12-14 [Python][Docs] Document file path support for IPC reader source
2025-12-11 GH-48457: [Python] Overflow check in datetime conversion in pa.array
2025-12-11 GH-48455: [Python] Handle nested field names when sanitizing table at ParquetWriter (flavor=‘spark’)
2025-12-07 GH-42018: Add numpy.StringDType support
2025-11-29 GH-36593: [Python] Add rename_columns method to pyarrow datasets
2025-11-26 GH-40849: [Python] Remove all Py_FatalError calls
2025-11-25 GH-48254: [Python][Parquet] Support extension types in read_schema
2025-11-23 GH-48231: [C++][Parquet] Add FSST encoding support for Parquet
2025-11-21 GH-48199: [Python][Parquet] Expose existing Parquet C++ metadata about index_page and bloom_filter to Python
2025-11-20 GH-48172: [Python] Add cp315 to build
2025-11-08 GH-32007: [Python] Support arithmetic on arrays and scalars

Questions in last 90 days: 0
Questions without accepted answer: 0
Questions with no activity: 0

Stack Overflow Questions
Question | Answers | Accepted? | Comments | Days since activity
Mailing list items from the last month
Date Subject

Issues: 17
Pull Requests: 8
Stack Overflow Questions: 1
Mailing list messages: 0

Issues opened through the last 3 months
PRs opened through the last 3 months

Open Issues: 17
Issues from New Contributors: 5
Open PRs: 8
PRs from New Contributors: 1

Open issues (new contributors highlighted)
Date Title
2026-01-30 [R] Disable GCS on macos
2026-01-26 [R] Warning building docs due to unresolved references to base R functions
2026-01-26 [R] Add note to docs on validating IPC streams
2026-01-20 [R] write_to_raw is very slow
2026-01-12 [R] arrow::write_parquet error with zero-length datetimes in R 4.5.2
2026-01-12 [CI][R] Update our gcc12 job for R CI
2026-01-09 [CI][R] test-r-alpine-linux-cran fails with segmentation fault
2026-01-02 [R] “Invalid metadata$r” warning
2025-12-26 [R] Remove trace$calls %||% trace$call once rlang > 0.4.11 is released
2025-12-26 [R] Add tests for filter() and arrange() with aggregation expressions
2025-12-26 [R] Add test for slice_sample with prop = 1 edge case
2025-12-23 [R] Add tests for duplicate column names and incompatible types in joins
2025-12-08 [R] How should we suggest to folks to get our libarrow builds?
2025-12-03 [R] Cannot install Arrow 22 on RHEL due to lack of escape characters in grepl
2025-11-16 [R][Dev] Update test helpers for testthat 3.3.0
2025-11-16 [R] col_types ignored when convert_options specified
2025-11-12 [Release][R] Add verification of R binaries to release process verification script
Open PRs (new contributors highlighted)
Date Title
2026-01-30 GH-49067: [R] Disable GCS on macos
2026-01-26 GH-49000: [R] Warning building docs due to unresolved references to base R functions
2026-01-26 GH-48998: [R] Add note to docs on validating IPC streams
2026-01-26 GH-48397: [R] Update docs on how to get our libarrow builds
2026-01-14 GH-48832: [R] Fix crash with zero-length POSIXct tzone attribute
2026-01-13 Update dplyr-funcs-doc.R to fix a typo
2025-12-19 GH-48593: [C++] C++20: use standard calendar / timezone APIs
2025-12-17 GH-36193: [R] arm64 binaries for R

Questions in last 90 days: 1
Questions without accepted answer: 0
Questions with no activity: 0

Stack Overflow Questions
Question | Answers | Accepted? | Comments | Days since activity
How to convert int to double when using arrow to read in multiple CSVs with open_dataset in R? | 2 | TRUE | 2 | 84
Mailing list items from the last month
Date Subject

The content below is an LLM summary of this month’s activity on the mailing list.

  • Carquet standalone C library: Introduction of a new, pure C99 library for Parquet files designed for embedded environments where C++ or the full Arrow dependency is not feasible.

  • Late materialization in Parquet: Technical discussion on improving filter pushdown by potentially decoupling Parquet-specific in-memory arrays from the Arrow specification to evaluate expressions on encoded data.

  • Arrow Database Connectivity (ADBC) 22 release: Version 22 of the ADBC libraries has been released, featuring updates across the C, Go, Python, Java, and Rust subcomponents.

  • Apache Arrow 23.0.0 release: The community announced the major release of version 23.0.0, which includes 336 resolved issues across the ecosystem.

  • AI-generated contribution guidelines: Proposal to update contributor documentation with specific rules for AI-assisted pull requests to ensure code ownership and reduce maintainer burden.

  • Formal security model for Arrow: Discussion on establishing a formal document to outline security considerations and potential attack vectors for the Arrow in-memory and IPC formats.

  • Java Dataset API roadmap: Users and maintainers discussed the current experimental status and production readiness of the Java Dataset API, focusing on feature parity with C++ and future stability guarantees.

  • StructArray specification clarification: A technical debate regarding how the validity buffer should be handled for non-nullable child fields within a nullable StructArray to resolve implementation ambiguities.

Open Issues (and change since 1 month ago): 3695 (-218)
Open PRs (and change since 1 month ago): 342 (-2)
Issues with `good-first-issues` label
104 issues across 11 components
Component Issues
C++ 43
Python 24
R 13
Documentation 9
MATLAB 5
FlightRPC 3
(No component) 2
Parquet 2
Archery 1
Continuous Integration 1
Other 1
Issues & PRs needing a champion
58 items, sorted by age (newest first)
Title | Type | Component | Created | Age (days)
[Docs] Clean up LICENSE and NOTICE files | Issue | Documentation | 2024-11-28 | 428
[R] Provide helpful hints for NotImplemented kernel errors | Issue | R | 2024-11-15 | 441
[C++][Python] Potential improvements around supply chain security | Issue | C++, Python | 2024-11-09 | 447
[Python] Efficient way to iterate over groups | Issue | Python | 2024-11-07 | 449
[R] Add Cumsum and duplicated bindings to datasets in R | Issue | R | 2024-11-07 | 449
[CI][Packaging][Python] Enable BuildKit for building wheel on Windows | Issue | Python, Continuous Integration, Packaging | 2024-11-07 | 449
[Python] Enable pyarrow AzureFileSystem for Windows | Issue | Python, Packaging | 2024-11-05 | 451
[Dev][Archery] Use --arrow-ref instead of --arrow-sha in archery crossbow submit | Issue | Archery, Developer Tools | 2024-10-31 | 456
[Format][Docs] Describe C device interface on C data interface and C stream interface docs pages | Issue | Documentation, Format | 2024-10-26 | 461
Misleading error message when casting | Issue | Python | 2024-10-25 | 462
[CI][C++] Use a separated Docker image for Emscripten | Issue | C++, Continuous Integration | 2024-10-18 | 469
[CI][C++] Add clang-cl job | Issue | C++ | 2024-10-11 | 476
[GLib] Add a sub Buffer class for GBytes based buffer | Issue | GLib | 2024-10-09 | 478
[C++] Binary View Compute Kernels | Issue | C++ | 2024-10-08 | 479
[R] please write unregister_scalar_function and/or make registration local/temporary | Issue | R | 2024-10-04 | 483
[R] Support integer date and time classes from data.table | Issue | R | 2024-09-19 | 498
[C++][Parquet] Add support for arrow::ArrayStatistics | Issue | Parquet, C++ | 2024-08-04 | 544
[R] Subtracting X days from a given date in ymd format | Issue | R | 2024-08-01 | 547
[Python] Move tests that are explicitly about conversion to/from numpy on test_array.py to a separate file | Issue | Python | 2024-08-01 | 547
[R] Implement anonymous functions in calls to dplyr::across | Issue | R | 2024-07-10 | 569
[CI][Dev] Add shell script formatter | Issue | Continuous Integration, Developer Tools | 2024-06-28 | 581
[C++] Add support for system mimalloc | Issue | C++ | 2024-06-19 | 590
[Python] Get size of IPC File ahead of time | Issue | Python | 2024-06-07 | 602
[Docs][Format] Move IPC format spec back into a separate page | Issue | Documentation, Format | 2024-05-15 | 625
[C++][Python] Update DLPack version | Issue | C++, Python | 2024-05-15 | 625
[C++][Parquet] Predicate pushdown through arrow::dataset::ScanBuilder::Filter() not available on list fields | Issue | Parquet, C++ | 2024-05-14 | 626
[R] Unable to disable url-encoding | Issue | R | 2024-05-10 | 630
[Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path | Issue | Python | 2024-04-24 | 646
[CI][Archery] Archery linking should also check for undefined symbols Windows | Issue | Archery | 2024-04-03 | 667
[CI][Archery] Archery linking should also check for undefined symbols macOS | Issue | Archery | 2024-04-03 | 667
[CI][Archery] Archery linking should also check for undefined symbols Linux | Issue | Archery | 2024-04-03 | 667
[Ruby] Improve Ruby’s GC integration | Issue | Ruby | 2024-03-29 | 672
[Python] Allow pyarrow import to fail without triggering Py_FatalError | Issue | Python | 2024-03-27 | 674
[Parquet] Make default fallback encoding choice smarter | Issue | Parquet, C++ | 2024-03-18 | 683
[Python][Docs] Max batch size for Dataset | Issue | Python, Documentation | 2024-03-15 | 686
[C++] Update vendored FlatBuffers to 24 | Issue | C++ | 2024-03-14 | 687
[C++] String manipulation on a dictionary column | Issue | C++ | 2024-03-08 | 693
[Python] Consider splitting _lib module into several parts | Issue | Python | 2024-02-20 | 710
[Docs] Add a doc section for tensor arrays | Issue | Documentation | 2024-02-08 | 722
[R] Export functions for low-level pointer operations | Issue | R | 2024-01-25 | 736
[R] Update the docs to show how to avoid situations like data loss with leading zero in partition column | Issue | R | 2024-01-17 | 744
[R] Use correct attribution in the footer of pkgdown site | Issue | R | 2024-01-14 | 747
[R] Inconsistent naming | Issue | R | 2023-10-25 | 828
[R] open_dataset - format is unclear | Issue | R | 2023-10-10 | 843
[R] open_dataset() behavior with incorrectly quoted input data | Issue | R | 2023-09-27 | 856
Update default version in parquet.rst | PR | Documentation | 2023-08-17 | 897
GH-36831: [C++] DictionaryArray support for MinMax Function | PR | C++ | 2023-08-10 | 904
[C++][Parquet] Process parquet rowgroups without Arrow conversion | Issue | Parquet, C++, Python | 2023-05-17 | 989
[C++] Why is arrow mmap marked MAP_PRIVATE (during read)? | Issue | C++ | 2023-04-25 | 1011
[Python] unexpected URL encoded path (white spaces) when uploading to S3 | Issue | Python | 2023-04-05 | 1031
[R] Arrow/Parquet can’t open encrypted parquet files | Issue | R | 2022-01-26 | 1465
[Python] Array.__str__ shows misleading output for timestamp types with time zone set | Issue | C++, Python | 2020-07-19 | 2021
[C++/Python] Kernel for SetItem(IntegerArray, values) (“replace_with_indices”) | Issue | C++, Python | 2020-07-13 | 2027
[R] Add bindings to ConcatenateTables | Issue | R | 2020-05-09 | 2092
[Crossbow] Eliminate libgit2 dependency | Issue | Developer Tools | 2020-03-12 | 2150
[C++][Python] Support ExtensionType arrays in more kernels | Issue | C++ | 2019-07-09 | 2397
[Doc] Better document the Tensor classes in the prose documentation | Issue | C++, Python, Documentation | 2019-07-04 | 2402
[Python] Add documentation section for integrations with PyTorch, TensorFlow | Issue | Python, Documentation | 2018-02-01 | 2920