Introduction

neurarrow is a set of specifications for storing and transmitting neuronal morphology and connectivity data using the Apache Arrow data model and models compatible with it (e.g. Apache Parquet).

About Apache Arrow

From arrow.apache.org:

Apache Arrow defines a language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Using Apache Arrow gives neurarrow implementors access to a large ecosystem of existing software libraries across languages, as well as the ability to exchange data between language runtimes and processes with minimal serialisation cost.

The use of standard binary formats such as Parquet also allows the data to be read, now and in the future, without neurarrow-specific implementations.

Prior art

These software packages manage tabular neuroscience data:

These file formats describe tabular neuroscience data:

These specifications build on Apache Arrow with domain-specific schemas:

Development happens on GitHub, and the rendered specification is at https://clbarnes.github.io/neurarrow/.