
Data format

arbitrix-core accepts a single, strict OHLCV schema. Anything that doesn't satisfy it is rejected up front by validate_ohlcv(), before the backtester sees a single bar.

Schema

| Component | Requirement |
|---|---|
| Index | pd.DatetimeIndex, tz-aware, UTC |
| Index ordering | Monotonic increasing, no duplicates |
| Columns (required) | open, high, low, close, volume (lowercase, numeric) |
| Columns (optional) | spread (numeric, in points) |
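Here is a minimal sketch of a frame built directly with pandas that satisfies every row of the schema table. The prices are placeholder values chosen only for illustration. The optional spread column is included to show where it sits.

```python
import pandas as pd

# Timezone-aware UTC timestamps, already sorted and unique.
index = pd.to_datetime(
    ["2024-01-02 00:00", "2024-01-02 01:00", "2024-01-02 02:00"], utc=True
)

df = pd.DataFrame(
    {
        "open": [1.1000, 1.1010, 1.1005],
        "high": [1.1015, 1.1020, 1.1012],
        "low": [1.0995, 1.1002, 1.0998],
        "close": [1.1010, 1.1005, 1.1008],
        "volume": [120, 95, 130],
        "spread": [1.2, 1.0, 1.1],  # optional, in points
    },
    index=index,
)

# Spot-check the properties the schema table asks for.
print(isinstance(df.index, pd.DatetimeIndex))
print(df.index.tz)
print(df.index.is_monotonic_increasing and df.index.is_unique)
```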

Any extra columns pass through untouched and are available to your strategy in prepare() and on_bar(). This is useful for indicators you've precomputed offline.
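For example, a precomputed indicator can be added as one more column before the frame is handed over. The column name sma_3 and the 3-bar window are arbitrary choices for this sketch. Nothing in arbitrix-core requires them.

```python
import pandas as pd

index = pd.to_datetime(
    ["2024-01-02 00:00", "2024-01-02 01:00", "2024-01-02 02:00",
     "2024-01-02 03:00", "2024-01-02 04:00"],
    utc=True,
)
close = [1.0, 2.0, 3.0, 4.0, 5.0]
df = pd.DataFrame(
    {"open": close, "high": close, "low": close, "close": close, "volume": [1] * 5},
    index=index,
)

# Hypothetical extra column: 3-bar simple moving average of close.
# The first two bars are NaN because the window isn't full yet.
df["sma_3"] = df["close"].rolling(window=3).mean()
print(df["sma_3"].tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```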

Loading from disk

load_ohlcv() handles CSV and parquet:

from arbitrix_core import load_ohlcv

df = load_ohlcv("eurusd_h1.csv", time_col="datetime")
# or
df = load_ohlcv("eurusd_h1.parquet")  # parquet detected by suffix

Behaviour:

  • File suffix .parquet → pd.read_parquet; everything else → pd.read_csv.
  • If the loaded frame has no DatetimeIndex, the column named by time_col (default "time") is parsed with pd.to_datetime(..., utc=True) and set as the index.
  • If the index is naive (tz=None), it's localised to UTC. If it already carries a timezone, it's converted to UTC.
  • Rows are sorted by index using mergesort, which is stable. Duplicate timestamps are then collapsed, keeping the last row.
  • The frame is then passed through validate_ohlcv() before being returned.
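The steps above can be approximated in a few lines of pandas. This is a sketch, not the library's source, and it rests on a few assumptions:

  • load_ohlcv_sketch is a hypothetical name.
  • time_col is moved out of the columns and into the index.
  • The sketch stops short of the final validate_ohlcv() call.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_ohlcv_sketch(path, time_col="time"):
    """Hypothetical approximation of the documented load_ohlcv() behaviour."""
    path = Path(path)

    # Reader chosen by file suffix: .parquet -> read_parquet, anything else -> read_csv.
    if path.suffix == ".parquet":
        df = pd.read_parquet(path)
    else:
        df = pd.read_csv(path)

    # No DatetimeIndex yet: move time_col into the index and parse it as UTC.
    # (Dropping the column from the data is an assumption of this sketch.)
    if not isinstance(df.index, pd.DatetimeIndex):
        df = df.set_index(time_col)
        df.index = pd.to_datetime(df.index, utc=True)

    # Naive index -> localise to UTC; tz-aware index -> convert to UTC.
    if df.index.tz is None:
        df.index = df.index.tz_localize("UTC")
    else:
        df.index = df.index.tz_convert("UTC")

    # Stable sort, then keep the last row of each duplicated timestamp.
    df = df.sort_index(kind="mergesort")
    df = df[~df.index.duplicated(keep="last")]

    # The real loader would now call validate_ohlcv(df); omitted here.
    return df


# Usage: one row is out of order, and 01:00 appears twice.
csv_text = (
    "time,open,high,low,close,volume\n"
    "2024-01-02 01:00,2,2,2,2,10\n"
    "2024-01-02 00:00,1,1,1,1,10\n"
    "2024-01-02 01:00,3,3,3,3,10\n"
)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "bars.csv"
    csv_path.write_text(csv_text)
    bars = load_ohlcv_sketch(csv_path)

print(bars["close"].tolist())  # [1, 3]
```

Because the sort is stable, the two 01:00 rows keep their file order, so keep="last" deterministically retains the one that appeared later in the file.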

Validating an in-memory DataFrame

If you've built the DataFrame yourself, validate it explicitly:

from arbitrix_core import validate_ohlcv

validate_ohlcv(df)  # raises ValueError on schema problems
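To make the checks concrete, here is a hypothetical validator that enforces the schema table. It raises ValueError with messages modelled on the common errors listed in this section. It is an approximation for illustration only. The real validate_ohlcv() may check more, or check in a different order. The UTC test here is a simple heuristic that compares the timezone's name, and the non-numeric message is invented for this sketch.

```python
import pandas as pd

REQUIRED_COLUMNS = ["open", "high", "low", "close", "volume"]


def validate_ohlcv_sketch(df):
    """Hypothetical schema check; raises ValueError on the first problem found."""
    idx = df.index
    if not isinstance(idx, pd.DatetimeIndex):
        raise ValueError("DataFrame index must be a DatetimeIndex")
    # Heuristic: treat the index as UTC only if its timezone is named "UTC".
    if idx.tz is None or str(idx.tz) != "UTC":
        raise ValueError("DataFrame index must be tz-aware UTC")
    # is_monotonic_increasing tolerates ties, so duplicates need their own check.
    if not idx.is_monotonic_increasing:
        raise ValueError("DataFrame index must be monotonic increasing")
    if idx.has_duplicates:
        raise ValueError("DataFrame index has duplicates")
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required column(s): {missing}")
    # Required columns plus the optional spread column must be numeric.
    to_check = REQUIRED_COLUMNS + (["spread"] if "spread" in df.columns else [])
    non_numeric = [c for c in to_check if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric column(s): {non_numeric}")  # invented message
    return df


# Usage: a valid frame passes; stripping the timezone makes it fail.
index = pd.to_datetime(["2024-01-02 00:00", "2024-01-02 01:00"], utc=True)
good = pd.DataFrame({c: [1.0, 2.0] for c in REQUIRED_COLUMNS}, index=index)
validate_ohlcv_sketch(good)

naive = good.copy()
naive.index = naive.index.tz_localize(None)
try:
    validate_ohlcv_sketch(naive)
except ValueError as err:
    print(err)  # DataFrame index must be tz-aware UTC
```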

Common errors and fixes

| Error message | Cause | Fix |
|---|---|---|
| DataFrame index must be a DatetimeIndex | Index is a RangeIndex, integer or object index | df = df.set_index(pd.to_datetime(df["time"], utc=True)) |
| DataFrame index must be tz-aware UTC | Naive timestamps, or a non-UTC timezone | Naive: df.index = df.index.tz_localize("UTC"); other timezone: df.index = df.index.tz_convert("UTC") |
| DataFrame index must be monotonic increasing | Rows out of order | df = df.sort_index(kind="mergesort") |
| DataFrame index has duplicates | Repeated timestamps | df = df[~df.index.duplicated(keep="last")] |
| DataFrame is missing required column(s): [...] | Schema mismatch (columns missing or named differently) | Rename columns to lowercase open/high/low/close/volume |
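The fixes in the table can be chained into a single clean-up pass. This is a sketch, not an arbitrix-core function, and it makes a few assumptions:

  • repair_ohlcv_sketch is a hypothetical name.
  • After lowercasing, the raw timestamps live in a column called "time".
  • Every column name is lowercased, not just the OHLCV ones. This happens first so that a "Time" column is also found.

```python
import pandas as pd


def repair_ohlcv_sketch(df, time_col="time"):
    """Hypothetical clean-up pass built from the fixes in the table."""
    # Lowercase every column name first (returns a new frame; input untouched).
    df = df.rename(columns=str.lower)

    # Not a DatetimeIndex yet: build one from time_col, parsed as UTC.
    if not isinstance(df.index, pd.DatetimeIndex):
        df = df.set_index(pd.to_datetime(df.pop(time_col), utc=True))
    # Naive timestamps are localised; other timezones are converted.
    elif df.index.tz is None:
        df.index = df.index.tz_localize("UTC")
    else:
        df.index = df.index.tz_convert("UTC")

    # Stable sort, then drop repeated timestamps keeping the last row.
    df = df.sort_index(kind="mergesort")
    return df[~df.index.duplicated(keep="last")]


# Usage: capitalised columns, string timestamps, out of order, one duplicate.
raw = pd.DataFrame(
    {
        "Time": ["2024-01-02 01:00", "2024-01-02 00:00", "2024-01-02 01:00"],
        "Open": [2.0, 1.0, 3.0],
        "High": [2.0, 1.0, 3.0],
        "Low": [2.0, 1.0, 3.0],
        "Close": [2.0, 1.0, 3.0],
        "Volume": [10, 10, 10],
    }
)
clean = repair_ohlcv_sketch(raw)
print(clean)
```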

DataProvider (advanced)

arbitrix_core.DataProvider is a runtime_checkable Protocol with one method:

def get_symbol_info(self, symbol: str) -> dict | None: ...
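Because DataProvider is a runtime_checkable Protocol, any object with a matching get_symbol_info method satisfies it structurally, without inheriting from anything.

The sketch below defines a local stand-in with the same single method, so it runs without arbitrix-core installed. StaticSymbolInfo and the dictionary keys it returns (point_value, contract_size) are hypothetical, since the expected contents of the dict aren't specified here.

Note that isinstance() against a runtime_checkable Protocol only checks that the method exists, not its signature or return type.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class DataProvider(Protocol):
    """Local stand-in mirroring the documented method (use arbitrix_core's in practice)."""

    # Optional[dict] is the same as dict | None, spelled for older Pythons.
    def get_symbol_info(self, symbol: str) -> Optional[dict]: ...


class StaticSymbolInfo:
    """Hypothetical provider backed by a fixed in-memory table; no inheritance."""

    def __init__(self, table: dict):
        self._table = table

    def get_symbol_info(self, symbol: str) -> Optional[dict]:
        # Unknown symbols return None, matching the "dict | None" contract.
        return self._table.get(symbol)


provider = StaticSymbolInfo({"EURUSD": {"point_value": 10.0, "contract_size": 100_000}})
print(isinstance(provider, DataProvider))  # True: structural match
print(provider.get_symbol_info("XAUUSD"))  # None
```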

The open-core package never instantiates one. It exists so that closed-source Arbitrix can inject a live broker symbol-info source (point value, contract size, swap rates) into the cost model. As an open-core user you typically don't need it; pass point_overrides={"EURUSD": 10.0} to costs.configure() instead.