How Assay rates exchange quality

Eleven observable metrics across four dimensions, computed daily from public market data. No exchange cooperation required, no self-reported figures used as inputs, no proprietary feeds.

Summary

Assay rates crypto exchanges on the quality and integrity of their spot market data. Each day, public trade data and order book snapshots from ten venues are ingested, eleven metrics across four dimensions are computed, and a composite score and band assignment per exchange are published. Scores refresh every 24 hours.

The design is motivated by a single constraint: people choosing an exchange venue commit six to seven figures on the strength of data the venue reports about itself. Existing alternatives are either self-reported (CoinMarketCap, CoinGecko), opaque (CER.live), or gated behind enterprise pricing (Kaiko). Assay fills the gap with an audit-grade, methodology-transparent layer priced for the people making those decisions.

This document describes what is measured (in full) and how scores are computed (in full). Specific threshold calibration parameters are held proprietary for the reasons set out in § What stays proprietary.

At a glance

Exchanges covered	10 global spot venues
Trading pairs	BTC/USDT, ETH/USDT (venue-native equivalents where applicable)
Data source	Public REST and WebSocket APIs only
Update frequency	Daily, approximately 23:30 UTC
Window	Trailing 24 hours (00:00-24:00 UTC)
Minimum history	60 days of peer baseline calibration before publication

The four dimensions

Each dimension answers one question a listing lead needs to answer about a venue. Together, the four dimensions cover the failure modes documented in the academic and industry literature on exchange quality.

Volume authenticity

Does the reported trading volume follow patterns consistent with organic market activity?

M01, M02, M03

3 metrics, weight 30%

Order book quality

Is the displayed liquidity reflected in observable trading at size?

M04, M05, M06

3 metrics, weight 25%

Price formation integrity

Do observed trades produce price dynamics consistent with informed market participation?

M07, M08 (M09 deferred)

2 metrics, weight 25%

Cross-venue consistency

Do this venue's aggregate characteristics align with its observed position in the peer basket?

M10, M11

2 metrics, weight 20%

Weights reflect relative importance to the buyer use case: volume authenticity carries the highest weight because wash trading is the single signal most likely to mislead a listing decision, and the metrics in this dimension are the hardest for an exchange to game.

The eleven metrics

Each metric is specified below with its question, rationale, inputs, score mapping, and cost-to-replicate analysis. Metrics are computed independently for each (exchange, pair, day) combination. Where a metric is not applicable to a venue, it is marked as not applicable and excluded from dimension aggregation.

`M01` Trade size distribution (Benford-adjusted)

Question

Does the distribution of trade sizes match what organic trading produces?

Rationale

Organic trading produces trade sizes that follow a power-law-like distribution with a characteristic tail. Automated or bot-driven volume tends to produce concentrations at round numbers (0.1 BTC, 1.0 BTC, 10.0 BTC) or unusually uniform size distributions. A leading-digit test against Benford's Law captures both failure modes without assuming a "normal" venue size profile.

What this catches and what it does not. M01 is sensitive to unsophisticated trade-size manipulation: round-number bias, uniform-size bot output, and arithmetically-generated streams that ignore the natural leading-digit distribution. Conformance with Benford's Law on its own does not rule out sophisticated wash trading. A counterparty-aware system that constructs synthetic trades with properly power-distributed sizes can pass this metric while still being artificial. M01 is one of three components of the volume-authenticity dimension and is read alongside M02 and M03; agreement across all three is what establishes a high dimension score.

Inputs

Data source	Public trades, 24h rolling window
Field	Trade size in base asset
Outlier treatment	Winsorised at 99.9th percentile
Minimum sample	1,000 trades; below this the metric is not scored for that day
Statistic	Pearson chi-squared against Benford expected leading-digit frequencies, normalised by sample size: `χ²/N`. Normalisation makes the statistic sample-size invariant; raw chi-squared scales linearly with N and would otherwise rank exchanges by trading volume rather than distribution shape.

Score mapping

Piecewise linear in χ²/N: lower per-trade divergence maps to a higher score. Anchors:

χ²/N	Score
0.00	100
0.05	80
0.15	50
0.35	20
0.65	0
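As a rough illustration, the statistic and the published anchors could be computed along the following lines. This is a minimal sketch rather than the production implementation: the function names are hypothetical, winsorisation is omitted, and clamping to 0 beyond the last anchor is an assumption.

```python
import math
from collections import Counter

# Benford expected frequency for leading digits 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Published M01 anchors as (chi2/N, score) pairs
M01_ANCHORS = [(0.00, 100.0), (0.05, 80.0), (0.15, 50.0), (0.35, 20.0), (0.65, 0.0)]


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    return int(f"{x:.15e}"[0])


def chi2_per_trade(sizes, min_sample=1000):
    """Pearson chi-squared against Benford, divided by the sample size N."""
    n = len(sizes)
    if n < min_sample:
        return None  # not scored for that day
    counts = Counter(leading_digit(s) for s in sizes)
    chi2 = sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    return chi2 / n


def m01_score(x: float) -> float:
    """Piecewise-linear interpolation between the published anchors."""
    if x <= M01_ANCHORS[0][0]:
        return M01_ANCHORS[0][1]
    for (x0, y0), (x1, y1) in zip(M01_ANCHORS, M01_ANCHORS[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return M01_ANCHORS[-1][1]  # assumption: clamp beyond the last anchor
```

For example, a χ²/N of 0.10 falls halfway between the 0.05 and 0.15 anchors and maps to a score of 65.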

Cost to replicate

Matching this distribution requires generating synthetic trades with properly power-distributed sizes — a non-trivial engineering requirement. Simple automation with uniform or round sizes produces detectable leading-digit patterns.

`M02` Volume-to-order-book-depth ratio

Question

Does the reported 24-hour volume plausibly pass through the observable order book?

Rationale

An exchange reporting $1B daily BTC/USDT volume against a $50k order book implies a book turnover rate that is implausible for an organic market. At reference venues, the ratio of 24h volume to mean bid+ask depth within ±2% of mid sits in a consistent empirical range. Values outside this range, in either direction, are atypical relative to the peer basket.

Inputs

Volume	24h USD notional, computed from own trade data — exchange-reported ticker is not used
Depth	Time-averaged ±2% book depth across 1,440 one-minute snapshots
Coverage threshold	The metric is not scored for that day below 50% snapshot coverage
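The raw ratio R implied by these inputs can be sketched as follows. The function name and argument shapes are hypothetical, prices are assumed to be USD or USD-equivalent, and the score mapping itself is not shown because its anchors are proprietary.

```python
def m02_ratio(trades, depth_snapshots, expected_snapshots=1440, min_coverage=0.5):
    """Raw volume-to-depth ratio R for one (exchange, pair, day).

    trades: iterable of (price, size) from the venue's public trade feed.
    depth_snapshots: list of bid+ask depth in USD within +/-2% of mid,
        one entry per one-minute snapshot that was actually captured.
    Returns None when snapshot coverage is below the threshold.
    """
    if len(depth_snapshots) < min_coverage * expected_snapshots:
        return None  # not scored for that day
    # Volume comes from own trade data, never from the exchange ticker
    volume_usd = sum(price * size for price, size in trades)
    mean_depth = sum(depth_snapshots) / len(depth_snapshots)
    if mean_depth <= 0:
        return None
    return volume_usd / mean_depth
```

A day with $200k traded against a time-averaged $100k of depth gives R = 2.0.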

Score mapping

Non-monotonic in the raw ratio R: both unusually low and unusually high R are atypical, and the typical zone sits in the middle of the empirical peer-basket range. One of two metrics in the spec with this shape (the other is M10).

Cost to replicate

Matching both ends of this ratio requires maintaining real depth at all times. Deep books carry a direct cost: market-maker incentives, inventory risk, and capital that could otherwise earn yield. This makes M02 one of the metrics most difficult to match without the underlying market fundamentals.

`M03` Trade interval entropy

Question

Are trades arriving at times consistent with Poisson-like arrival processes?

Rationale

Organic trade arrivals are irregular. They are commonly modelled as a Poisson process or, more realistically, as a self-exciting Hawkes process: bursty, with heavy-tailed inter-arrival times. Bot-driven flow often produces either suspiciously regular intervals (exact 1-second spacings, periodic patterns) or unnaturally uniform distributions. Shannon entropy of the inter-arrival distribution captures both failure modes.

Inputs

Data source	Trade timestamps for the target pair, 24h window
Bucketing	Logarithmic: 0-100ms, 100ms-1s, 1s-10s, 10s-100s, 100s+
Minimum sample	5,000 trades; below this the metric is not scored for that day
Peer baseline	Empirical distribution of trade-interval entropy observed across the reference peer basket, recalibrated periodically
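A minimal sketch of the entropy calculation over the five logarithmic buckets is shown below. It assumes millisecond timestamps, half-open buckets (an interval of exactly 100 ms falls in the 100ms-1s bucket), and entropy measured in bits; none of these details are confirmed by the specification.

```python
import math
from bisect import bisect_right

# Upper edges of the first four buckets in milliseconds; the fifth is open-ended
EDGES_MS = [100, 1_000, 10_000, 100_000]


def interval_entropy(timestamps_ms, min_trades=5000):
    """Shannon entropy (bits) of bucketed inter-arrival times."""
    if len(timestamps_ms) < min_trades:
        return None  # not scored for that day
    ts = sorted(timestamps_ms)
    counts = [0] * (len(EDGES_MS) + 1)
    for earlier, later in zip(ts, ts[1:]):
        counts[bisect_right(EDGES_MS, later - earlier)] += 1
    total = sum(counts)
    # Empty buckets contribute nothing to the sum
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

With five buckets the entropy ranges from 0 bits (every interval in one bucket, as with exact 1-second spacing) to log2(5) ≈ 2.32 bits (intervals spread evenly across all buckets).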

Score mapping

Monotonic in the absolute z-score of entropy against the peer baseline: both abnormally regular and abnormally uniform arrival patterns map to a lower score. Typical: within 2σ of peer mean. Atypical: beyond 2σ in either direction.
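The typical/atypical boundary described above could be checked as in the sketch below. The peer values are placeholders, the use of the sample standard deviation is an assumption, and the mapping from z-score to a 0-100 score is not shown because its anchors are proprietary.

```python
from statistics import mean, stdev


def abs_z(value, peer_values):
    """Absolute z-score of a venue's value against the peer baseline."""
    mu = mean(peer_values)
    sigma = stdev(peer_values)  # assumption: sample standard deviation
    if sigma == 0:
        return None
    return abs(value - mu) / sigma


def is_typical(value, peer_values, k=2.0):
    """True when the value lies within k sigma of the peer mean."""
    z = abs_z(value, peer_values)
    return None if z is None else z <= k


# Hypothetical peer entropies, for illustration only
peers = [1.8, 2.0, 2.2]
print(is_typical(2.1, peers), is_typical(1.0, peers), is_typical(3.0, peers))
```

Because the absolute value is taken, a venue that is too regular and a venue that is too uniform are treated symmetrically.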

Cost to replicate

Matching a Poisson-like arrival distribution requires sophisticated bot design that injects timing variability deliberately. Simpler automation tends to produce detectable regularity at the millisecond or second scale.

`M04` Effective spread

Question

What does it actually cost to trade?

Rationale

Quoted spread (best ask − best bid) can be very narrow while the venue is quiet. Effective spread — the realised cost of market orders relative to mid-price — measures actual execution cost against actual trades.

Inputs

Mid-price reference	1-minute order book snapshots
Trade direction	Buy/sell from feed where provided; Lee-Ready inferred otherwise
Aggregation	Volume-weighted average over 24h, in basis points
Low-volume fallback	Widen window to 7-day rolling for statistical power
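A volume-weighted effective spread can be sketched as below. Two conventions here are assumptions, not taken from the specification: the spread is quoted as the full (two-sided) cost, 2·D·(p − m)/m, and weights are base-asset trade sizes. The function name and tuple layout are hypothetical.

```python
def effective_spread_bps(trades):
    """Volume-weighted effective spread in basis points.

    trades: iterable of (price, size, mid, side), where mid is the most
        recent snapshot mid-price before the trade and side is +1 for a
        buyer-initiated trade, -1 for a seller-initiated one.
    """
    weighted = 0.0
    total_size = 0.0
    for price, size, mid, side in trades:
        # Signed distance from mid: positive when the taker paid the spread
        spread_bps = 2 * side * (price - mid) / mid * 1e4
        weighted += spread_bps * size
        total_size += size
    return weighted / total_size if total_size else None
```

A buy at 100.05 and a sell at 99.95 against a mid of 100.00 each cost 5 bps one way, so the full effective spread under this convention is 10 bps.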

Score mapping

Monotonic in effective spread: lower bps maps to a higher score. BTC/USDT benchmarks: under 5 bps is excellent, 5-15 bps typical, above 50 bps atypical. Exact thresholds are anchored to the live peer distribution and recalibrated periodically.

Cost to replicate

Effective spread reflects actual execution cost. A venue cannot easily produce a low effective spread without lowering fees or subsidising market makers, and both carry direct costs.

`M05` Order book slope (Kyle's λ proxy)

Question

How much does the book absorb? What is the price impact of size?

Rationale

A deep book has a gradual slope: a $1M sell order moves price incrementally. A book with thin layers beyond the inside quote shows a steep slope at size. The relationship between executed size and price impact is a proxy for Kyle's lambda, computable from public order book data alone.

Inputs

Snapshots	1-minute frequency, full depth within ±5% of mid
Standard sizes	$10k, $100k, $1M notional on each side
Outlier treatment	Sizes winsorised at 99.9th percentile
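One way to turn a snapshot into slippage for a standard size is to walk the book level by level, as in this sketch. The function name and input layout are hypothetical, and measuring slippage as the average fill price relative to mid is an assumption.

```python
def slippage_bps(levels, mid, notional):
    """Slippage in bps for a market order of a given quote-currency notional.

    levels: list of (price, size) on one side of the book, best price first.
    Returns None when the visible book cannot absorb the full notional.
    """
    remaining = notional
    filled_base = 0.0
    for price, size in levels:
        take = min(remaining, price * size)  # notional consumed at this level
        filled_base += take / price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        return None  # book too thin within the snapshot window
    avg_price = notional / filled_base
    return abs(avg_price - mid) / mid * 1e4


# Illustrative ask side: $50k available at 100, the rest filled at 110
asks = [(100.0, 500.0), (110.0, 1000.0)]
print(slippage_bps(asks, mid=100.0, notional=100_000))
```

Running the same function at $10k, $100k, and $1M on each side gives the size-versus-impact curve whose steepness the metric summarises.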

Score mapping

Monotonic in λ expressed as slippage-bps for a standard $100k trade: lower λ maps to a higher score. Typical range 10-50 bps; values above 200 bps are atypical.

Cost to replicate

Matching this requires maintaining real depth across multiple price levels. The capital cost scales with the square of the depth advertised.

`M06` Quote stability and update rate

Question

Are quotes being maintained, or is the book updating at rates disproportionate to actual trading?

Rationale

Quote update rates disproportionate to trade rates can indicate rapid cancel-and-replace cycles that make displayed depth difficult to hit in practice. High update-to-trade ratios are associated with low effective fill rates in several academic studies of exchange microstructure.

Inputs

Order book stream	WebSocket add/cancel/modify events from the continuous WS consumer
Peer baseline	Empirical distribution of the quote-update-to-trade ratio observed across the reference peer basket, recalibrated periodically
Coverage threshold	If WebSocket sequence gaps cover more than 50% of the day, the metric is not scored for that day

Score mapping

Monotonic in the absolute z-score against the peer baseline: deviations from the baseline ratio in either direction map to a lower score. Typical: within 2σ. Atypical: beyond 2σ.

Cost to replicate

A lower update ratio requires more stable quoting, which uses market-maker capital and inventory tolerance.

`M07` Cross-venue price deviation

Question

Does this venue's price track the global market?

Rationale

A venue's mid-price for BTC/USDT should track the volume-weighted global mid within a tight tolerance, given arbitrage incentives. Persistent deviations may reflect lower arbitrage activity, stale feeds, or isolated price formation on that venue.

Inputs

Sampling	Mid-price every 60 seconds, target exchange and reference basket
Reference basket	Binance, Coinbase, Kraken, OKX (volume-weighted), excluding target if a basket member
Cross-rate	USD/USDT translated via Kraken USD VWAP / Binance USDT VWAP at each timestamp
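The reference mid and the mean absolute deviation could be computed roughly as follows. The sketch assumes all mids have already been translated to a common quote currency via the cross-rate, and the function names and dictionary layout are hypothetical.

```python
def reference_mid(mids, volumes, target=None):
    """Volume-weighted basket mid at one timestamp.

    mids, volumes: dicts keyed by venue name. The target venue is
    skipped when it is itself a basket member.
    """
    venues = [v for v in mids if v != target]
    total = sum(volumes[v] for v in venues)
    return sum(mids[v] * volumes[v] for v in venues) / total


def mad_bps(target_mids, reference_mids):
    """Mean absolute deviation from the reference, in basis points."""
    devs = [abs(t - r) / r * 1e4 for t, r in zip(target_mids, reference_mids)]
    return sum(devs) / len(devs)


# Two 60-second samples, each 10 bps away from a reference mid of 100
print(mad_bps([100.1, 99.9], [100.0, 100.0]))
```

Excluding the target from its own reference avoids a large venue appearing well aligned simply because it dominates the basket.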

Score mapping

Monotonic in mean absolute deviation from the reference basket: lower MAD maps to a higher score. Typical: MAD under 5 bps and tail under 30 bps. Atypical: MAD above 50 bps, or persistent autocorrelation of the deviation series above 0.5.

Cost to replicate

Keeping prices aligned with the global market requires arbitrage linkage. Without underlying liquidity to support two-way arbitrage, alignment is difficult to maintain.

`M08` Mid-price reversion dynamics

Question

After large trades, does price revert in a way consistent with real market impact?

Rationale

In real markets, informed trades produce permanent impact and uninformed trades produce temporary impact that reverts. If trades at a venue show near-complete reversion within seconds regardless of size, this is inconsistent with the mix of informed and uninformed flow observed at reference venues.

Inputs

Trade filter	Prints above approximately $50k notional
Mid-price series	1-second resolution, around each large-trade event
Peer baseline	Empirical distribution of the full-reversion fraction observed across the reference peer basket, recalibrated periodically
Minimum sample	50 large trades per day; else 7-day rolling fallback

Score mapping

Monotonic in distance above the peer reference reversion fraction: reference venues show 20-40% full reversion, and values above that band map to lower scores. Atypical above ~70% — inconsistent with the mix of informed flow observed at the peer basket.

Cost to replicate

Matching this pattern requires trades that carry genuine price impact, which means actually moving the book and taking position risk.

`M09` Funding rate / basis sanity

What it measures: For exchanges offering perpetual futures, whether the funding rate and basis stay within no-arbitrage bounds relative to spot. Systematic drift indicates limited arbitrage activity between spot and derivatives legs.

Why it is not scored in v1: Scoring M09 consistently across all ten covered venues requires a dedicated derivatives data pipeline. Three of the current venues are spot-only (Coinbase, Kraken, Gate.io) and would always return not applicable, making cross-venue comparison misleading. M09 is computed internally and will be published in a future methodology version once derivatives coverage is consistent.

`M10` Volume share vs liquidity share

Question

Does this exchange's share of global volume match its share of global liquidity?

Rationale

A venue's share of global volume and its share of global depth should be of similar order. If a venue shows 15% of global BTC/USDT volume but 2% of global order book depth at ±2%, the ratio is unusual relative to peers.

Inputs

Volume	24h notional, computed from own trade data — exchange ticker is not used
Reference volume	Aggregate 24h across the peer basket, excluding target if a basket member
Depth	Time-averaged ±2% depth for target and peers

Score mapping

Non-monotonic in R = vol_share / depth_share: both unusually low and unusually high R are atypical. Typical: R in [0.7, 1.5]. Atypical: outside [0.3, 5] in either direction. The other non-monotonic metric in the spec (alongside M02).
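The ratio and the two published ranges can be illustrated with the sketch below. It assumes "global" means the target plus the peer basket, and the label for values between the typical and atypical ranges is hypothetical; the production score anchors are not public.

```python
def share_ratio(target_volume, peer_volume, target_depth, peer_depth):
    """R = vol_share / depth_share, with peers excluding the target."""
    vol_share = target_volume / (target_volume + peer_volume)
    depth_share = target_depth / (target_depth + peer_depth)
    return vol_share / depth_share


def classify_ratio(r):
    """Label R against the ranges quoted in the score mapping."""
    if 0.7 <= r <= 1.5:
        return "typical"
    if r < 0.3 or r > 5:
        return "atypical"
    return "intermediate"  # hypothetical label for the zone in between


# The example from the rationale: 15% of volume against 2% of depth
r = share_ratio(15.0, 85.0, 2.0, 98.0)
print(r, classify_ratio(r))
```

The rationale's example gives R = 7.5, which falls outside [0.3, 5] on the high side.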

Cost to replicate

This is among the measurements hardest to match without the underlying fundamentals. Matching both ends requires real volume and real depth simultaneously.

`M11` Price leadership and contribution to price discovery

Question

Does this exchange lead global price discovery or lag it?

Rationale

Venues with informed flow tend to lead price moves — price discovery happens there first and others follow. Venues with predominantly uninformed flow lag. Hasbrouck's information share and Gonzalo-Granger contribution to common factor quantify this directly. The v1 implementation uses a lighter lead-lag correlation proxy of equivalent intent.

Inputs

Mid-price series	1-second resolution, target + peer basket (excluding target if a member), 24h window
Peer baseline	Empirical distribution of price-leadership and information-share values observed across the reference peer basket, recalibrated periodically
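A lead-lag correlation proxy can take many forms; the sketch below is one simple variant and is not claimed to be the v1 statistic. It compares how well the target's 1-second returns predict later reference returns against how well they follow earlier ones. All function names and the `max_lag` default are assumptions.

```python
import math


def log_returns(mids):
    """Log returns of a mid-price series."""
    return [math.log(b / a) for a, b in zip(mids, mids[1:])]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(target, ref, lag):
    """Correlation of target returns at t with reference returns at t + lag."""
    if lag > 0:
        return pearson(target[:-lag], ref[lag:])
    if lag < 0:
        return pearson(target[-lag:], ref[:lag])
    return pearson(target, ref)


def lead_minus_lag(target, ref, max_lag=5):
    """Positive when the target tends to move before the reference."""
    lead = sum(lagged_corr(target, ref, k) for k in range(1, max_lag + 1))
    lag = sum(lagged_corr(target, ref, -k) for k in range(1, max_lag + 1))
    return lead - lag
```

A positive value suggests the venue tends to lead the basket and a negative value suggests it follows. The result would then be compared against the peer baseline rather than read in isolation.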

Score mapping

Conditional rather than purely monotonic: information share above 10% of global is typical for large venues, below 2% typical for small venues. A volume share above 5% combined with an information share below 1% is atypical — it suggests volume without discovery.

Cost to replicate

Matching this requires attracting informed traders, which compounds over time and is difficult to short-circuit.

Composite scoring

Code	Dimension	Metrics	Weight
D1	Volume authenticity	M01, M02, M03	30%
D2	Order book quality	M04, M05, M06	25%
D3	Price formation integrity	M07, M08 (M09 deferred)	25%
D4	Cross-venue consistency	M10, M11	20%

Step 1 — Metric normalisation

Each raw metric value is converted to a 0-100 score using piecewise linear mappings defined in a versioned mapping file. Higher score means higher quality. Every score row is stamped with the mapping_version active at computation time, so historical scores remain interpretable after methodology updates.

Step 2 — Dimension scores

The dimension score is the arithmetic mean across that dimension's metrics that returned status='ok'.

Step 3 — Composite score

The composite is a weighted mean across dimensions: 0.30·D1 + 0.25·D2 + 0.25·D3 + 0.20·D4. A composite is only published when every dimension has at least one scoring metric.
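Steps 2 and 3 can be sketched together as follows. The weights and the status='ok' filter come from the text above; the function names and the 'not_applicable' status value are hypothetical.

```python
WEIGHTS = {"D1": 0.30, "D2": 0.25, "D3": 0.25, "D4": 0.20}


def dimension_score(metric_rows):
    """Arithmetic mean of metric scores whose status is 'ok'.

    metric_rows: list of (score, status) pairs for one dimension.
    Returns None when no metric in the dimension was scored.
    """
    ok = [score for score, status in metric_rows if status == "ok"]
    return sum(ok) / len(ok) if ok else None


def composite_score(dimension_scores):
    """Weighted mean across D1..D4, or None if any dimension is unscored."""
    if any(dimension_scores.get(d) is None for d in WEIGHTS):
        return None  # composite is withheld
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)


dims = {"D1": 80.0, "D2": 70.0, "D3": 60.0, "D4": 90.0}
print(composite_score(dims))  # 0.30*80 + 0.25*70 + 0.25*60 + 0.20*90
```

In this example the composite works out to 24 + 17.5 + 15 + 18 = 74.5.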

Step 4 — Band assignment

Band	Composite score
1	≥ 85
2	≥ 75 and < 85
3	≥ 60 and < 75
4	≥ 45 and < 60
5	< 45
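Band assignment reduces to a threshold lookup. The sketch below assumes each band's lower bound is inclusive, which is consistent with "≥ 85" for band 1 and "< 45" for band 5; the function name is hypothetical.

```python
# (band, inclusive lower bound) in descending order
BAND_FLOORS = ((1, 85.0), (2, 75.0), (3, 60.0), (4, 45.0))


def assign_band(composite):
    """Map a 0-100 composite score to a band from 1 (best) to 5."""
    for band, floor in BAND_FLOORS:
        if composite >= floor:
            return band
    return 5


print([assign_band(s) for s in (91.2, 85.0, 74.5, 45.0, 12.0)])
```

Under this reading a composite of exactly 85 lands in band 1 and a composite of 74.5 lands in band 3.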

D3 (Price Formation Integrity) averages over M07 and M08 only in v1; funding rate and basis metrics applicable to venues offering perpetual futures are planned for a future methodology version.

What stays proprietary

Most calibration parameters are not published in full. The exceptions are listed first.

M01 score-mapping anchors are public. The five-anchor piecewise-linear curve from χ²/N to score is set out in the M01 entry above. The anchors are published following operator review, both as a worked example of the calibration shape and as a transparency commitment for the metric most often cited in commission discussions.

The remainder of the calibration set remains proprietary:

Score-mapping anchors for M02 through M11. The score ranges shown qualitatively in each metric entry are illustrative; production thresholds are calibrated from observed distributions across the peer basket.
Peer baseline calibration parameters for M03, M06, M08, and M11.
Per-metric outlier rules beyond the published 99.9% winsorisation.

Threshold calibration uses observed distributions across a reference peer basket and is recalibrated periodically. Full calibration parameters are available to data licence customers; the baseline_version and mapping_version stamped on each score row allow any customer to reproduce any historical score exactly.

Data sources

All 10 covered exchanges are ingested via their public REST and WebSocket APIs. No authentication, paid feeds, or exchange cooperation is required at any stage.

USD-quoted venues (Coinbase, Kraken) are handled in venue-native pairs for single-venue metrics. For cross-venue comparison, USD prices are translated via the Kraken USD VWAP / Binance USDT VWAP basis computed at each timestamp.

Limitations

Assay scores are narrow by design. They describe market data integrity, not broader aspects of an exchange's operation. The following are not measured and should not be inferred:

Custody security or solvency
Regulatory standing or licensing status
User experience, customer support quality, or dispute handling
Fiat on-ramp / off-ramp quality
Derivatives market quality (v1 is spot-only)
Token-level listing quality for pairs other than BTC and ETH

A high Assay score does not imply safety. A low Assay score reflects market data characteristics outside the typical range — it does not imply intent.

Funding rate and basis sanity (M09) is computed internally but not published in v1. It applies only to venues offering perpetual futures and is absent for spot-only exchanges; a consistent treatment across all covered venues requires a dedicated derivatives data pipeline, which is planned for a future release.

Version history

Version	Notes
v0.1.1	M09 (funding rate / basis sanity) deferred from public scoring pending derivatives data pipeline.
v0.1.0	Initial release. 10 exchanges, 11 metrics, 2 pairs.
