---
title: CDF reader benchmarks
engine: julia
abstract: Although not a full-featured CDF library like CDFpp or CDFlib (writing is not supported yet), `CommonDataFormat.jl` offers a fast and user-friendly interface for reading CDF files, implemented entirely in Julia. It supports partial loading and multi-threaded access, enabling efficient reading of selected variables or time intervals, which is particularly beneficial when working with large CDF datasets.
julia:
  exeflags: ["--threads=auto"]
---

Reading [Common Data Format (CDF)](https://cdf.gsfc.nasa.gov/) files is a common task in space physics. Here we compare the performance of CDF readers implemented in different languages (Julia, C++, Python):

- [CommonDataFormat.jl](https://github.com/JuliaSpacePhysics/CommonDataFormat.jl)
- [CDFpp](https://github.com/SciQLop/CDFpp)
- [CDFlib](https://github.com/SciQLop/CDFlib)

See [benchmarks.ipynb - CDFpp](https://github.com/SciQLop/CDFpp/blob/main/notebooks/benchmarks.ipynb) for a relative comparison of CDFpp and other CDF readers.

## Feature Comparison

| Feature | CommonDataFormat.jl | CDFpp | CDFlib |
|---------|---------------------|-------|--------|
| **Language** | Julia | C++ (Python wrappers) | Python |
| **Lazy loading** | ✅ Yes | ✅ Yes | ❌ No |
| **Partial loading** | ✅ Yes | ❌ No | ❌ No |
| **Parallel loading** | ✅ Yes | Thread-safe | ❌ No |
| **CDF writing** | ❌ No | ✅ Yes | ✅ Yes |

## Setup

```{julia}
using Pkg

# Switch to the tutorial directory when rendering from the repository root
dir = "tutorials/cdf"
if isdir(dir)
    cd(dir)
end
Pkg.activate(".")
Pkg.resolve()
Pkg.instantiate()

using CommonDataFormat
using Downloads
using PythonCall
using Chairmarks
using CairoMakie
using DataFrames
using AlgebraOfGraphics
```

```{julia}
urls = [
    "https://hephaistos.lpp.polytechnique.fr/data/mirrors/CDF/test_files/mms1_scm_srvy_l2_scsrvy_20190301_v2.2.0.cdf",
    "https://lasp.colorado.edu/mms/sdc/public/about/browse/mms1/edp/fast/l2/dce/2022/11/mms1_edp_fast_l2_dce_20221110_v3.1.0.cdf",
    "https://lasp.colorado.edu/mms/sdc/public/about/browse/mms1/fpi/fast/l2/des-dist/2022/11/mms1_fpi_fast_l2_des-dist_20221103060000_v3.4.0.cdf",
    "https://cdaweb.gsfc.nasa.gov/pub/data/solar-orbiter/rpw/science/l3/bia-efield/2022/solo_l3_rpw-bia-efield_20220220_v03.cdf",
]
```

Set the environment variable `CDF_BENCH_DOWNLOAD=true` before rendering if you want the tutorial to download the test files listed above automatically.

```{julia}
data_dir = joinpath(pwd(), "data")
mkpath(data_dir)

# Return the test files that exist locally, downloading missing ones on request
function ensure_local_copy(urls; download::Bool=false)
    local_files = String[]
    for url in urls
        local_path = joinpath(data_dir, basename(url))
        if !isfile(local_path) && download
            try
                Downloads.download(url, local_path)
            catch err
                @warn "Failed to download" url exception = err
            end
        end
        isfile(local_path) && push!(local_files, local_path)
    end
    return local_files
end

download_remote = get(ENV, "CDF_BENCH_DOWNLOAD", "false") in ("true", "1", "yes")
files = ensure_local_copy(urls; download=download_remote)
if isempty(files) && !download_remote
    @info "No CDF files found in $(data_dir). Set ENV[\"CDF_BENCH_DOWNLOAD\"] = \"true\" before rendering to download the test files automatically."
end
files
```

## Julia interface to different CDF libraries

### Interface to `CommonDataFormat.jl`

```{julia}
module JLCDF
import CommonDataFormat as CDF

load(fname) = CDF.CDFDataset(fname)

list_variables(fname) = load(fname) |> keys

function get_var_data(fname, varname=nothing)
    ds = load(fname)
    varname = @something varname keys(ds)[1]
    return Array(ds[varname])
end

function full_load(fname)
    ds = load(fname)
    return [Array(ds[k]) for k in keys(ds)]
end
end
```

### Interface to CDFpp

```{julia}
module Pycdfpp
using PythonCall
const pycdfpp = pyimport("pycdfpp")

load(fname) = @py pycdfpp.load(fname)

function list_variables(fname)
    cdf = load(fname)
    @py list(cdf)
end

function get_var_data(fname, varname=nothing)
    ds = load(fname)
    varname = @something varname pylist(ds)[0]
    return ds[varname].values
end

function full_load(fname)
    c = load(fname)
    [c[varname].values for varname in c]
end
end
```

### Interface to CDFlib

```{julia}
module CDFlib
using PythonCall
const cdflib = pyimport("cdflib")

load(fname) = @py cdflib.CDF(fname)

function list_variables(fname)
    @py begin
        cdf = cdflib.CDF(fname)
        info = cdf.cdf_info()
        info.rVariables + info.zVariables
    end
end

function get_var_data(fname, varname=nothing)
    cdf = load(fname)
    varname = @something varname cdf.cdf_info().zVariables[0]
    data = cdf.varget(varname)
end

function full_load(fname)
    c = load(fname)
    variables = @py begin
        cdf_info = c.cdf_info()
        cdf_info.rVariables + cdf_info.zVariables
    end
    map(variables) do varname
        # Skip variables that contain no records
        if pyconvert(Int, c.varinq(varname).Last_Rec) != -1
            @py c.varget(varname)
        end
    end
end
end
```

```{julia}
# Benchmark `func` from each library module on each file, repeated `n` times
function run_benchmarks(files, libs, func; n=3, evals=4)
    records = map(Iterators.product(files, libs, 1:n)) do (file, lib, i)
        fname = basename(file)
        f = getfield(lib, func)
        f(file) # warmup
        GC.gc()
        sample = @b f(file) evals=evals
        (; file=fname, library=nameof(lib), task=func, time=sample.time, evals=sample.evals)
    end
    return records
end

const x = :time => log10 => "log10(Time)"
const base_plt = mapping(x, color=:library) * visual(Hist)
```

## File open

```{julia}
libs = (JLCDF, Pycdfpp, CDFlib)
result_open = run_benchmarks(files, libs, :load) |> DataFrame
```

```{julia}
draw(data(result_open) * base_plt)
```

## Variable listing

List the variable names without reading any values.

```{julia}
jl_res = JLCDF.list_variables(files[1])
cdfpp_res = pyconvert(Vector{String}, Pycdfpp.list_variables(files[1]))
cdflib_res = pyconvert(Vector{String}, CDFlib.list_variables(files[1]))
@assert jl_res == cdfpp_res
@assert jl_res == cdflib_res
result_list = run_benchmarks(files, libs, :list_variables; evals=4) |> DataFrame
```

```{julia}
draw(data(result_list) * base_plt)
```

## Variable reading

We read the values of the first variable. Julia arrays are column-major, while CDFpp and CDFlib return row-major arrays (C++ and Python), so the Julia result is passed through `permutedims` before the comparison.

```{julia}
jl_res = permutedims(JLCDF.get_var_data(files[1]), (2, 1))
cdfpp_res = PyArray(Pycdfpp.get_var_data(files[1]))
cdflib_res = PyArray(CDFlib.get_var_data(files[1]))
# Compare only the non-NaN entries, since NaN never equals NaN
@assert jl_res[.!isnan.(jl_res)] ≈ cdfpp_res[.!isnan.(cdfpp_res)]
@assert jl_res[.!isnan.(jl_res)] ≈ cdflib_res[.!isnan.(cdflib_res)]
result_read = run_benchmarks(files, libs, :get_var_data; evals=2, n=4) |> DataFrame
```

```{julia}
draw(data(result_read) * base_plt)
```

## Full CDF file loading

We read the values of all variables.

```{julia}
result_full = run_benchmarks(files, libs, :full_load; evals=2, n=2) |> DataFrame
```

```{julia}
draw(data(result_full) * base_plt)
```

## Summary

```{julia}
result = vcat(result_open, result_list, result_read, result_full)
```

```{julia}
plt = data(result) * mapping(layout=:task) * base_plt
draw(plt; facet=(; linkxaxes=:none, linkyaxes=:none))
```

## Reproducibility

::: {.callout-note collapse="true"}
## This tutorial was built using these direct dependencies

```{julia}
using Pkg
Pkg.status()
```
:::

::: {.callout-note collapse="true"}
## Machine and Julia version information

```{julia}
using InteractiveUtils # hide
versioninfo()
```
:::
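
The layout remark in the variable reading section (Julia is column-major, the Python-backed readers are row-major) can be illustrated with a small, self-contained sketch. The arrays below are hypothetical stand-ins rather than data from the test files, and the helper name `mask` is introduced here only for illustration. The block is a plain, non-executed fence, so it does not affect the benchmarks.

```julia
# Hypothetical 2×3 array standing in for a variable as a row-major reader
# would return it (illustrative values, not taken from any CDF file).
row_major = [1.0 2.0 3.0; 4.0 NaN 6.0]

# A column-major reader exposes the same data with the axes reversed.
col_major = permutedims(row_major, (2, 1))

# Reverse the axes once more so both arrays share one layout.
aligned = permutedims(col_major, (2, 1))

# NaN never equals NaN, so drop NaN entries before comparing.
mask(a) = a[.!isnan.(a)]

@assert size(col_major) == (3, 2)
@assert mask(aligned) ≈ mask(row_major)
```

Masking both sides with their own NaN pattern assumes the readers agree on where the fill values are. If they did not, the masked vectors would differ in length and the comparison would fail rather than pass silently.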