OPEN-SOURCE SCRIPT

Volume Based Sampling [BackQuant]


What this does
This indicator converts the usual time-based stream of candles into an event-based stream of “synthetic” bars that are created only when enough trading activity has occurred. You choose the activity definition:

Volume bars: create a new synthetic bar whenever the cumulative number of shares/contracts traded reaches a threshold.

Dollar bars: create a new synthetic bar whenever the cumulative traded dollar value (price × volume) reaches a threshold.

The script then keeps an internal ledger of these synthetic opens, highs, lows, closes, and volumes, and can display them as candles, plot a moving average calculated over the synthetic closes, mark each time a new sample is formed, and optionally overlay the native time-bars for comparison.

Why event-based sampling matters

Markets do not release information on a clock: activity clusters around news, session opens and closes, and liquidity shocks. Event-based bars normalize for this uneven (heteroskedastic) arrival of information: active periods produce more bars (finer resolution) and quiet periods produce fewer (coarser resolution). Research suggests this can reduce microstructure distortions and yield return series that are closer to i.i.d. and better suited to statistical modeling and ML. In particular:

Volume and dollar bars are a common event-time alternative to time bars in quantitative research and are discussed extensively in Advances in Financial Machine Learning (AFML). These bars aim to homogenize information flow by sampling on traded size or value rather than elapsed seconds.

The Volume Clock perspective models market activity in “volume time,” showing that many intraday phenomena (volatility, liquidity shocks) are better explained when time is measured by traded volume instead of seconds.

Related market microstructure work on flow toxicity and liquidity highlights that the risk dealers face is tied to information intensity of order flow, again arguing for activity-based clocks.

How the indicator works (plain English)


Choose your bucket type
  • Volume: accumulate volume until it meets a threshold.
  • Dollar Bars: accumulate close × volume until it meets a dollar threshold.
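As a rough illustration of the two bucket types, the sketch below shows how much a single native bar would add to the current bucket under each method. The function name `bar_activity` and the `method` strings are hypothetical, chosen for this example only; they are not taken from the script.

```python
def bar_activity(close: float, volume: float, method: str = "volume") -> float:
    """Return the amount one native bar adds to the current bucket."""
    if method == "volume":
        # Volume bars: count shares/contracts traded.
        return volume
    if method == "dollar":
        # Dollar bars: approximate traded value as close x volume.
        return close * volume
    raise ValueError(f"unknown sampling method: {method!r}")


# A bar closing at 50.0 with 1,200 shares traded.
print(bar_activity(50.0, 1200, "volume"))  # 1200
print(bar_activity(50.0, 1200, "dollar"))  # 60000.0
```

The same bar fills a volume bucket by 1,200 units but a dollar bucket by 60,000, which is why the two methods need thresholds on very different scales.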


Pick the threshold rule
Dynamic threshold: by default, the script computes a rolling statistic (mean or median) of recent activity to set the next bucket size. This adapts bar size to changing conditions (e.g., busier sessions produce more frequent synthetic bars).


Fixed threshold: optionally override with a constant target (e.g., exactly 100,000 contracts per synthetic bar, or $5,000,000 per dollar bar).
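The two threshold rules can be sketched as follows. This is a minimal illustration that assumes the dynamic threshold is simply the mean or median of per-bar activity over the lookback window; the script's exact formula may differ, and `next_threshold` and its parameters are hypothetical names.

```python
from statistics import mean, median
from typing import Optional, Sequence


def next_threshold(recent_activity: Sequence[float],
                   lookback: int = 20,
                   use_median: bool = False,
                   fixed: Optional[float] = None) -> float:
    """Pick the size of the next bucket (assumed rule, not the script's code)."""
    if fixed is not None:
        # Fixed threshold overrides the rolling estimate.
        return fixed
    window = list(recent_activity)[-lookback:]
    if not window:
        raise ValueError("need at least one observation")
    return median(window) if use_median else mean(window)


activity = [100, 120, 110, 5000, 130]  # one volume spike
print(next_threshold(activity))                   # 1092
print(next_threshold(activity, use_median=True))  # 120
print(next_threshold(activity, fixed=100_000))    # 100000
```

The single spike of 5,000 pulls the mean to 1,092 while the median stays at 120, which illustrates why the median is the more spike-resistant choice.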

Build the synthetic bar
While a bucket fills, the script tracks:
o_s: first price of the bucket (synthetic open)
h_s: running maximum price (synthetic high)
l_s: running minimum price (synthetic low)
c_s: last price seen (synthetic close)
v_s: cumulative native volume inside the bucket
d_samples: number of native bars consumed to complete the bucket (a proxy for “how fast” the threshold filled)
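The bookkeeping above can be sketched as a small state object. The field names follow the list; the `Bucket` class, its `start` constructor, and the choice to seed the bucket with a single first price are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Running state of one synthetic bar while it fills."""
    o_s: float          # synthetic open: first price of the bucket
    h_s: float          # running maximum price
    l_s: float          # running minimum price
    c_s: float          # last price seen
    v_s: float = 0.0    # cumulative native volume
    d_samples: int = 0  # native bars consumed so far

    @classmethod
    def start(cls, first_price: float) -> "Bucket":
        # A fresh bucket has open = high = low = close.
        return cls(first_price, first_price, first_price, first_price)

    def update(self, high: float, low: float, close: float, volume: float) -> None:
        """Fold one native bar into the bucket."""
        self.h_s = max(self.h_s, high)
        self.l_s = min(self.l_s, low)
        self.c_s = close
        self.v_s += volume
        self.d_samples += 1


bucket = Bucket.start(10.0)
bucket.update(high=10.5, low=9.8, close=10.2, volume=300)
bucket.update(high=11.0, low=10.1, close=10.9, volume=500)
print(bucket)
```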

Emit a new sample

Once the bucket meets/exceeds the threshold, a new synthetic bar is finalized and stored. If overflow occurs (e.g., a single native bar pushes you past the threshold by a lot), the code will emit multiple synthetic samples to account for the extra activity.
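A minimal sketch of the emit step, including overflow, might look like the following. It assumes a fixed threshold, that the remainder above the threshold carries into the next bucket, and that extra samples created by overflow reuse the same close; the script may handle these details differently. `volume_samples` is a hypothetical name.

```python
def volume_samples(bars, threshold):
    """bars: iterable of (close, volume) pairs.

    Returns a list of (synthetic_close, native_bars_used) tuples.
    """
    samples = []
    filled = 0.0
    bars_in_bucket = 0
    for close, volume in bars:
        filled += volume
        bars_in_bucket += 1
        # One large native bar can complete several buckets at once.
        while filled >= threshold:
            samples.append((close, bars_in_bucket))
            filled -= threshold   # carry the remainder forward
            bars_in_bucket = 0
    return samples


bars = [(10.0, 40), (10.2, 70), (10.1, 250), (10.3, 30)]
print(volume_samples(bars, threshold=100))
# [(10.2, 2), (10.1, 1), (10.1, 0)]
```

The third native bar carries 250 units against a threshold of 100, so it completes two buckets. The second of those used no additional native bars, hence the 0.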

Maintain a rolling history efficiently

A ring buffer can overwrite the oldest samples when you hit your Max Stored Samples cap, keeping memory usage stable.
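To make the ring-buffer idea concrete, here is a small sketch of a fixed-capacity buffer that overwrites its oldest entry once full. The `RingBuffer` class is illustrative only and is not the script's storage code.

```python
class RingBuffer:
    """Fixed-capacity store that overwrites the oldest item when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._next = 0   # slot the next append will write to
        self._size = 0   # number of valid items stored

    def append(self, item) -> None:
        self._data[self._next] = item
        self._next = (self._next + 1) % len(self._data)
        self._size = min(self._size + 1, len(self._data))

    def to_list(self) -> list:
        """Return the stored items ordered oldest to newest."""
        cap = len(self._data)
        start = (self._next - self._size) % cap
        return [self._data[(start + i) % cap] for i in range(self._size)]


buf = RingBuffer(3)
for sample_id in [1, 2, 3, 4, 5]:
    buf.append(sample_id)
print(buf.to_list())  # [3, 4, 5]
```

Memory use stays constant at the chosen capacity no matter how many samples are appended, which is the point of capping stored samples.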

Compute synthetic-space statistics

The script computes an SMA over the last N synthetic closes and basic descriptors like average bars per synthetic sample, mean and standard deviation of synthetic returns, and more. These are all in event time, not clock time.
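The descriptors mentioned above could be computed along these lines. The sketch assumes simple (not log) returns and the sample standard deviation; the script may use different conventions, and `synthetic_stats` is a hypothetical helper.

```python
from statistics import mean, stdev


def synthetic_stats(closes, bars_per_sample, n):
    """Summary statistics over synthetic samples (event time)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(closes) < 3:
        raise ValueError("need at least three synthetic closes")
    # Simple returns between consecutive synthetic closes (assumed convention).
    returns = [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    return {
        "sma": mean(closes[-n:]),                      # SMA over last n samples
        "avg_bars_per_sample": mean(bars_per_sample),  # native bars per sample
        "avg_return": mean(returns),
        "return_stdev": stdev(returns),
    }


stats = synthetic_stats(
    closes=[100.0, 102.0, 101.0, 104.0],
    bars_per_sample=[3, 1, 4, 2],
    n=3,
)
print(stats)
```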

Inputs and options you will actually use

Data Settings
Sampling Method: Volume or Dollar Bars.
Rolling Lookback: window used to estimate the dynamic threshold from recent activity.
Filter: Mean or Median for the dynamic threshold. Median is more robust to spikes.
Use Fixed? / Fixed Threshold: override dynamic sizing with a constant target.
Max Stored Samples: cap on synthetic history to keep performance snappy.
Use Ring Buffer: turn on to recycle storage when at capacity.

Indicator Settings
SMA over last N samples: moving average in synthetic space. Because its index is sample count, not minutes, it adapts naturally: more updates in busy regimes, fewer in quiet regimes.

Visuals
  • Show Synthetic Bars: plot the synthetic OHLC candles.
  • Candle Color Mode:
      • Green/Red: directional close vs open
      • Volume Intensity: opacity scales with synthetic size
      • Neutral: single color
      • Adaptive: graded by how large the bucket was relative to threshold
  • Mark new samples: drop a small marker whenever a new synthetic bar prints.


Comparison & Research

Show Time Bars: overlay the native time-based candles to visually compare how the two sampling schemes differ.

How to read it, step by step

Turn on “Synthetic Bars” and optionally overlay “Time Bars.” You will see that during high-activity bursts, synthetic bars print much faster than time bars.

Watch the synthetic SMA. Crosses in synthetic space can be more meaningful because each update represents a roughly comparable amount of traded information.

Use the “Avg Bars per Sample” in the info table as a regime signal. Falling average bars per sample means activity is clustering, often coincident with higher realized volatility.

Try Dollar Bars when price varies a lot but share count does not; each sample then represents a comparable traded dollar value. Volume Bars are a better fit when share count is the better proxy for information flow in your instrument.

Quant finance background and citations

Event time vs. clock time: Easley, López de Prado, and O’Hara advocate measuring intraday phenomena on a volume clock to better align sampling with information arrival. This framing helps explain volatility bursts and liquidity droughts and motivates volume-based bars.

Flow toxicity and dealer risk: The same authors show how adverse selection risk changes with the intensity and informativeness of order flow, further supporting activity-based clocks for modeling and risk management.

AFML framework: In Advances in Financial Machine Learning, event-driven bars such as volume, dollar, and imbalance bars are presented as superior sampling units for many ML tasks, yielding more stationary features and fewer microstructure distortions than fixed time bars. (Alpaca)

Practical use cases

1) Regime-aware moving averages
The synthetic SMA in event time is not fooled by quiet periods: if nothing of consequence trades, it barely updates. This can make trend filters less sensitive to calendar drift and more sensitive to true participation.

2) Breakout logic on “equal-information” samples

The script exposes simple alerts such as breakout above/below the synthetic SMA. Because each bar approximates a constant amount of activity, breakouts are conditioned on comparable informational mass, not arbitrary time buckets.
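A breakout check of this kind can be sketched as a crossover test on synthetic closes. The version below is an assumed reading of "breakout above/below the synthetic SMA" (close crossing an SMA that includes the current sample); the script's alert conditions may be defined differently, and `sma_breakouts` is a hypothetical name.

```python
def sma_breakouts(closes, n):
    """Return (index, direction) for each cross of close over its n-sample SMA."""
    signals = []
    prev_diff = None
    for i in range(n - 1, len(closes)):
        sma = sum(closes[i - n + 1:i + 1]) / n
        diff = closes[i] - sma
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                signals.append((i, "up"))      # crossed above the SMA
            elif prev_diff >= 0 > diff:
                signals.append((i, "down"))    # crossed below the SMA
        prev_diff = diff
    return signals


synthetic_closes = [10, 10, 10, 12, 11, 8]
print(sma_breakouts(synthetic_closes, n=3))
# [(3, 'up'), (5, 'down')]
```

Because the index counts synthetic samples rather than minutes, two consecutive signals may be seconds apart in a busy session or hours apart in a quiet one.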

3) Volatility-adaptive backtests

If you use synthetic bars as your base data stream, most signal rules become self-paced: entry and exit opportunities accelerate in fast markets and slow down in quiet regimes, which often improves the realism of slippage and fill modeling in research pipelines (pair this indicator with strategy code downstream).

4) Regime diagnostics

Avg Bars per Sample trending down: activity is dense; expect larger realized ranges.
Return StdDev (synthetic) rising: noise or trend acceleration in event time; re-tune risk.

Interpreting the info panel
Method: your sampling choice and current threshold.
Total Samples: how many synthetic bars have been formed.
Current Vol/Dollar: how much of the next bucket is already filled.
Bars in Bucket: native bars consumed so far in the current bucket.
Avg Bars/Sample: lower means higher trading intensity.
Avg Return / Return StdDev: return stats computed over synthetic closes.

Research directions you can build from here

Imbalance and run bars

Extend beyond pure volume or dollar thresholds to imbalance bars that trigger on directional order flow imbalance (e.g., buy volume minus sell volume), as discussed in the AFML ecosystem. These often further homogenize distributional properties used in ML (see alpaca.markets/learn/alternative-bars-01).
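As a starting point, a heavily simplified imbalance sampler is sketched below. It assumes bar-level data, signs each bar's volume with a tick rule on consecutive closes, and uses a fixed threshold. AFML's imbalance bars use a dynamically estimated expected imbalance instead, so treat this as a toy version; `imbalance_samples` is a hypothetical name.

```python
def imbalance_samples(bars, threshold):
    """bars: iterable of (close, volume) pairs.

    Returns (bar_index, signed_imbalance) each time |imbalance| reaches threshold.
    """
    samples = []
    imbalance = 0.0
    prev_close = None
    sign = 1  # assumption: treat the first bar as buyer-initiated
    for i, (close, volume) in enumerate(bars):
        # Tick rule: up-move = buy, down-move = sell, unchanged = keep last sign.
        if prev_close is not None and close != prev_close:
            sign = 1 if close > prev_close else -1
        prev_close = close
        imbalance += sign * volume
        if abs(imbalance) >= threshold:
            samples.append((i, imbalance))
            imbalance = 0.0
    return samples


bars = [(10.0, 50), (10.1, 40), (10.0, 30), (9.9, 80), (9.8, 60)]
print(imbalance_samples(bars, threshold=80))
# [(1, 90.0), (3, -110.0)]
```

The first sample forms on net buying and the second on net selling, so unlike plain volume bars the sign of each sample carries directional information.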

Volume-time indicators

Re-compute classical indicators (RSI, MACD, Bollinger) on the synthetic stream. The premise is that signals are updated by traded information, not seconds, which may stabilize indicator behavior in heteroskedastic regimes.
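For example, an RSI over synthetic closes could be sketched as below. This uses the simple-average variant (plain sums of gains and losses over the last n changes) rather than Wilder's smoothing, and returns 50 when there is no movement; both choices are assumptions for illustration. `simple_rsi` is a hypothetical name.

```python
def simple_rsi(closes, n):
    """RSI over the last n changes of a synthetic close series."""
    if n < 1 or len(closes) < n + 1:
        raise ValueError("need at least n + 1 closes")
    changes = [closes[i] - closes[i - 1]
               for i in range(len(closes) - n, len(closes))]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # flat series: neutral reading by convention
    # Equivalent to 100 - 100 / (1 + RS) with RS = avg gain / avg loss.
    return 100.0 * gains / (gains + losses)


synthetic_closes = [10, 11, 12, 11, 13]
print(simple_rsi(synthetic_closes, n=3))  # 75.0
```

The input here is the list of synthetic closes, so each of the n changes spans one bucket of traded activity rather than one clock interval.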

Liquidity and toxicity overlays

Combine synthetic bars with proxies of flow toxicity to anticipate spread widening or volatility clustering. For instance, tag synthetic bars that surpass multiples of the threshold and test whether subsequent realized volatility is elevated.
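The tagging test described here could be prototyped as follows. The sketch assumes realized volatility is the square root of summed squared log returns over the next few synthetic samples, and it only compares averages; a real study would need many more samples and a proper significance test. All names and the toy data are hypothetical.

```python
import math
from statistics import mean


def forward_realized_vol(closes, start, horizon):
    """Realized vol over the `horizon` returns following sample `start`."""
    rets = [math.log(closes[i + 1] / closes[i])
            for i in range(start, start + horizon)]
    return math.sqrt(sum(r * r for r in rets))


def compare_tagged(closes, volumes, threshold, multiple=2.0, horizon=2):
    """Average forward vol after oversized samples vs. after ordinary ones."""
    tagged, untagged = [], []
    for i in range(len(closes) - horizon):
        rv = forward_realized_vol(closes, i, horizon)
        # Tag samples whose volume reached a multiple of the threshold.
        (tagged if volumes[i] >= multiple * threshold else untagged).append(rv)
    return (mean(tagged) if tagged else None,
            mean(untagged) if untagged else None)


closes = [100, 100, 100, 110, 100, 100]
volumes = [100, 100, 250, 100, 100, 100]  # third sample is 2.5x the threshold
tagged_vol, untagged_vol = compare_tagged(closes, volumes, threshold=100)
print(tagged_vol, untagged_vol)
```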

Dollar-risk parity sampling for portfolios

Use dollar bars to align samples across assets by notional risk, enabling cleaner cross-asset features and comparability in multi-asset models (e.g., correlation studies, regime clustering). AFML discusses the benefits of event-driven sampling for cross-sectional ML feature engineering.

Microstructure feature set

Compute duration in native bars per synthetic sample, range per sample, and volume multiple of threshold as inputs to state classifiers or regime HMMs. These features are inherently activity-aware and often predictive of short-horizon volatility and trend persistence per the event-time literature. (Alpaca)
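The three features named above can be derived directly from a stored synthetic sample. The sketch assumes each sample is kept as a dictionary with the `h_s`, `l_s`, `v_s`, and `d_samples` fields described earlier; the dictionary layout and the `sample_features` helper are hypothetical.

```python
def sample_features(sample: dict, threshold: float) -> dict:
    """Activity-aware features for one completed synthetic sample."""
    return {
        # How many native bars it took to fill the bucket.
        "duration_bars": sample["d_samples"],
        # High-low range covered inside the bucket.
        "range": sample["h_s"] - sample["l_s"],
        # Bucket volume as a multiple of the threshold in force.
        "volume_multiple": sample["v_s"] / threshold,
    }


sample = {"o_s": 10.1, "h_s": 10.5, "l_s": 10.0, "c_s": 10.4,
          "v_s": 150.0, "d_samples": 4}
print(sample_features(sample, threshold=100.0))
# {'duration_bars': 4, 'range': 0.5, 'volume_multiple': 1.5}
```

Each row of such features lines up one-to-one with a synthetic bar, so it can be fed to a classifier or regime model without resampling.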

Tips for clean usage

Start with dynamic thresholds using Median over a sensible lookback to avoid outlier distortion, then move to Fixed thresholds when you know your instrument’s typical activity scale.
Compare time bars vs synthetic bars side by side to develop intuition for how your market “breathes” in activity time.
Keep Max Stored Samples reasonable for performance; the ring buffer avoids memory creep while preserving a rolling window of research-grade data.
