PyFitParquet

The pyfitparquet package extracts, transforms, and loads (ETL) Garmin FIT and TCX files into Apache Parquet columnar format. It is designed for use within a conda environment (macOS or Linux) and is available through the conda-forge channel. To install:

conda install -c conda-forge pyfitparquet

Quickstart

From Python in an activated conda environment with pyfitparquet installed, execute:

# ETL all FIT/TCX files located in <data_dir> and output 
# parquet files to a default <data_dir>/parquet directory:
from pyfitparquet import transformer
pyfitparq = transformer.PyFitParquet()
pyfitparq.data_to_parquet(data_dir="/path/to/dir")

# To ETL FIT/TCX files individually:
pyfitparq.source_to_parquet("path/to/fitfile.fit", parquet_dir='.')
pyfitparq.source_to_parquet("path/to/tcxfile.tcx", parquet_dir='.')

For a more complete example that includes configuration changes and reading and displaying Parquet files, see (or run) example.py.
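When one unreadable file should not abort a whole batch, the per-file call from the Quickstart can be wrapped in a loop. The following is a minimal sketch, not part of pyfitparquet: convert_each is a hypothetical helper, the case-insensitive extension match is an assumption, and the only package call it relies on is source_to_parquet(path, parquet_dir=...) as shown above.

```python
from pathlib import Path

# Assumption: source files are recognized by extension, case-insensitively.
SOURCE_SUFFIXES = {".fit", ".tcx"}


def convert_each(transformer, data_dir, parquet_dir):
    """Convert every FIT/TCX file in data_dir, one file at a time.

    `transformer` is any object with a source_to_parquet(path, parquet_dir=...)
    method, e.g. a transformer.PyFitParquet() instance.
    Returns (converted, failed): a list of paths and a list of (path, error) pairs.
    """
    converted, failed = [], []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        try:
            transformer.source_to_parquet(str(path), parquet_dir=str(parquet_dir))
            converted.append(path)
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            failed.append((path, exc))
    return converted, failed
```

With pyfitparquet installed, this could be called as convert_each(transformer.PyFitParquet(), "/path/to/dir", "."), after which the failed list names any files that need a closer look.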

Configuration

Two configuration files fine-tune ETL behavior: parquet_config.yml and mapping_config.yml. Respectively, they control the row/column structure of the Parquet output files and the mapping of TCX tag names to FIT/Parquet field_names. See the detailed comments within the configuration files themselves for guidance on their use.

Although the configuration files can be modified in place under the $CONDA_PREFIX install tree, any reinstallation of pyfitparquet reverts them to the defaults. To keep a persistent configuration across installations, set the PYFIT_CONFIG_DIR environment variable to a directory path of your choice and place local versions of the configuration files there. These files are not overwritten or removed on uninstall.

Note: if PYFIT_CONFIG_DIR is set, but pyfitparquet cannot find the configuration files there, it will copy in default versions of the files from the current conda pyfitparquet installation.
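The lookup behavior described above can be pictured with the short sketch below. It is an illustration only, not the package's actual code: resolve_config_dir and its default_dir argument are hypothetical names, and creating a missing PYFIT_CONFIG_DIR directory is an assumption.

```python
import os
import shutil
from pathlib import Path

CONFIG_FILES = ("parquet_config.yml", "mapping_config.yml")


def resolve_config_dir(default_dir, env=os.environ):
    """Return the directory from which configuration would be read.

    default_dir (hypothetical argument) is the install-tree directory that
    holds the default configuration files.
    """
    default_dir = Path(default_dir)
    override = env.get("PYFIT_CONFIG_DIR")
    if not override:
        # Variable unset: use the files in the install tree.
        return default_dir

    config_dir = Path(override)
    # Assumption: a missing override directory is created on demand.
    config_dir.mkdir(parents=True, exist_ok=True)
    for name in CONFIG_FILES:
        target = config_dir / name
        if not target.exists():
            # Seed missing files with the defaults; existing files are kept.
            shutil.copy(default_dir / name, target)
    return config_dir
```

The point of the design is that local edits survive: a file already present in PYFIT_CONFIG_DIR is never replaced, and only absent files are seeded from the defaults.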

Command-Line Interface

To ETL a single FIT file (not TCX) to a Parquet file using the C++ CLI executable:

fittransformer <FIT_FILE_URI> <PARQUET_FILE_URI> 

To decode a single FIT file (not TCX) to std::cout (default functionality provided by Garmin CPP FitSDK):

fitdecoder <FIT_FILE_URI> 

To ETL a single TCX or FIT file to a Parquet file using the Python CLI:

python pyfitparquet/transformer.py <SOURCE_FILE_URI> [-P PARQUET_DIR] 
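Because fittransformer handles one file per invocation, converting a directory means calling it repeatedly. The sketch below shows one way to script that from Python. It rests on assumptions: fittransformer_commands and run_all are hypothetical helpers, plain local paths are assumed to be acceptable for the URI arguments, and naming each output after its input with a .parquet suffix is a choice made here, not a package convention.

```python
import subprocess
from pathlib import Path


def fittransformer_commands(fit_dir, out_dir):
    """Build one fittransformer argument list per .fit file in fit_dir."""
    out_dir = Path(out_dir)
    commands = []
    for fit_file in sorted(Path(fit_dir).glob("*.fit")):
        # Assumption: output file is named after the input, with .parquet.
        parquet_file = out_dir / (fit_file.stem + ".parquet")
        commands.append(["fittransformer", str(fit_file), str(parquet_file)])
    return commands


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises on a non-zero exit status."""
    for argv in commands:
        runner(argv, check=True)
```

In an environment where fittransformer is on the PATH, run_all(fittransformer_commands("fit_files", "parquet_out")) would perform the conversions. TCX files are skipped because the C++ executable accepts FIT input only.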

Development

Install From Source

The suggested method uses the top-level Makefile (this procedure requires make to be installed on your system). The default make target creates a fresh pyfitenv conda environment, pip installs the local pyfitparquet package into the environment (implicitly building local libs and binaries from source), and finally runs the default unit tests to validate the installed package. In addition to the Python pyfitparquet module, two C++ CLI executables are installed: fitdecoder prints the contents of a FIT file to std::cout, and fittransformer performs a FIT-to-Parquet file ETL.

Note: (1) building from source requires a C++ compiler installed on your system. (2) A true clone of the repo's FIT/TCX test data files requires Git Large File Storage (LFS) installed on your system (without LFS, only stubbed placeholder data files are cloned and execution of validation tests and examples referencing this test data will fail).

git clone https://github.com/databike-io/pyfitparquet.git
cd pyfitparquet
make

# And to perform further dev work in this shell
conda activate pyfitenv
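If validation tests fail right after cloning, it can help to check whether the test data files are real or still Git LFS placeholders. A Git LFS pointer is a small text file whose first line starts with "version https://git-lfs.github.com/spec/v1", so it can be detected by reading the first bytes. The sketch below is a hedged illustration: find_lfs_stubs is a hypothetical helper, and the directory to scan is left to the caller because the repo's test data location is not stated here.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` begins with the LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_stubs(data_dir, suffixes=(".fit", ".tcx")):
    """List FIT/TCX files under data_dir that are LFS pointers, not real data."""
    stubs = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes and is_lfs_pointer(path):
            stubs.append(path)
    return stubs
```

A non-empty result suggests the clone was made without LFS. Installing Git LFS and running git lfs pull in the repo should then replace the placeholders with the actual files.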

Licenses and Attributions

  • Two licenses are provided with this project:
      • All files (CPP source and TCX schemas) included in this repo under directory FitCppSDK_21.47.00 are the property of Garmin Ltd., are statically included solely for FitSdk lib build and version control stability, and are licensed under GARMIN_FIT_SDK_LICENSE.
      • All other source code in this repo is authored by AJ Donich (or other project contributors) and, to the extent legally applicable and not the purview of upstream licenses, is provided as open source under Apache License 2.0.
  • Five TCX test data files were gratefully borrowed from Chris Joakim's ggps project.
  • The pyfitparquet package has several upstream open source dependencies (Apache Arrow, Pybind11, and Boost, to name a few). We are grateful for these open source APIs and acknowledge that these projects use many different licenses. Please see environment.yml for PyFitParquet's direct dependencies, and the respective repos for their specific licenses.