How Assay rates exchange quality
Eleven observable metrics across four dimensions, computed daily from public market data. No exchange cooperation required, no self-reported figures used as inputs, no proprietary feeds.
Summary
Assay rates crypto exchanges on the quality and integrity of their spot market data. Each day, public trade data and order book snapshots from ten venues are ingested, eleven metrics across four dimensions are computed, and a composite score and band assignment per exchange are published. Scores refresh every 24 hours.
The design is motivated by a single constraint: participants choosing an exchange venue commit six- to seven-figure sums based on data the venue provides about itself. Existing alternatives are either self-reported (CoinMarketCap, CoinGecko), opaque (CER.live), or gated behind enterprise pricing (Kaiko). Assay fills the gap with an audit-grade, methodology-transparent layer priced for the people making those decisions.
This document describes what is measured (in full) and how scores are computed (in full). Specific threshold calibration parameters are held proprietary for the reasons set out in § What stays proprietary.
At a glance
| Exchanges covered | 10 global spot venues |
|---|---|
| Trading pairs | BTC/USDT, ETH/USDT (venue-native equivalents where applicable) |
| Data source | Public REST and WebSocket APIs only |
| Update frequency | Daily, approximately 23:30 UTC |
| Window | Trailing 24 hours (00:00-24:00 UTC) |
| Minimum history | 60 days of peer baseline calibration before publication |
The four dimensions
Each dimension addresses one question a listing lead needs answered about a venue. Together, the four dimensions cover the failure modes documented in the academic and industry literature on exchange quality.
Weights reflect relative importance to the buyer use case: volume authenticity carries the highest weight because wash trading is the single signal most likely to mislead a listing decision, and the metrics in this dimension are the hardest for an exchange to game.
The eleven metrics
Each metric is specified below with its question, rationale, inputs, score mapping, and cost-to-replicate analysis. Metrics are computed independently for each (exchange, pair, day) combination. Where a metric is not applicable to a venue, it is marked as not applicable and excluded from dimension aggregation.
M01 Trade size distribution (Benford-adjusted)
Question
Does the distribution of trade sizes match what organic trading produces?
Rationale
Organic trading produces trade sizes that follow a power-law-like distribution with a characteristic tail. Automated or bot-driven volume tends to produce concentrations at round numbers (0.1 BTC, 1.0 BTC, 10.0 BTC) or unusually uniform size distributions. A leading-digit test against Benford's Law captures both failure modes without assuming a "normal" venue size profile.
What this catches and what it does not. M01 is sensitive to unsophisticated trade-size manipulation: round-number bias, uniform-size bot output, and arithmetically generated streams that ignore the natural leading-digit distribution. Conformance with Benford's Law on its own does not rule out sophisticated wash trading. A counterparty-aware system that constructs synthetic trades with properly power-distributed sizes can pass this metric while still being artificial. M01 is one of three components of the volume-authenticity dimension and is read alongside M02 and M03; agreement across all three is what establishes a high dimension score.
Inputs
| Data source | Public trades, 24h rolling window |
|---|---|
| Field | Trade size in base asset |
| Outlier treatment | Winsorised at 99.9th percentile |
| Minimum sample | 1,000 trades; below this the metric is not scored for that day |
| Statistic | Pearson chi-squared against Benford expected leading-digit frequencies, normalised by sample size: χ²/N. Normalisation makes the statistic sample-size invariant; raw chi-squared scales linearly with N and would otherwise rank exchanges by trading volume rather than distribution shape. |
Score mapping
Piecewise linear in χ²/N: lower per-trade divergence maps to a higher score. Anchors:
| χ²/N | Score |
|---|---|
| 0.00 | 100 |
| 0.05 | 80 |
| 0.15 | 50 |
| 0.35 | 20 |
| 0.65 | 0 |
Cost to replicate
Matching this distribution requires generating synthetic trades with properly power-distributed sizes — a non-trivial engineering requirement. Simple automation with uniform or round sizes produces detectable leading-digit patterns.
M02 Volume-to-order-book-depth ratio
Question
Does the reported 24-hour volume plausibly pass through the observable order book?
Rationale
An exchange reporting $1B daily BTC/USDT volume against a $50k order book implies a book turnover rate far outside anything observed at reference venues. Reference venues show a ratio of 24h volume to mean bid+ask depth within ±2% of mid that sits in a consistent empirical range. Values outside this range — in either direction — are atypical relative to the peer basket.
Inputs
| Volume | 24h USD notional, computed from own trade data — exchange-reported ticker is not used |
|---|---|
| Depth | Time-averaged ±2% book depth across 1,440 one-minute snapshots |
| Coverage threshold | Below 50% snapshot coverage, the metric is not scored for that day |
Score mapping
Non-monotonic in the raw ratio R: both unusually low and unusually high R are atypical, and the typical zone sits in the middle of the empirical peer-basket range. This is one of two metrics in the spec with this shape (the other is M10).
Cost to replicate
Matching both ends of this ratio requires maintaining real depth at all times. Deep books carry a direct cost: market-maker incentives, inventory risk, and capital that could otherwise earn yield. This is the metric most difficult to match without the underlying market fundamentals.
M03 Trade interval entropy
Question
Are trades arriving at times consistent with Poisson-like arrival processes?
Rationale
Organic trading produces trade arrival times that follow approximately a Poisson or Hawkes process — bursty, self-exciting, with heavy-tailed inter-arrival times. Bot-driven flow often produces either suspiciously regular intervals (exact 1-second spacings, periodic patterns) or unnaturally uniform distributions. Shannon entropy of the inter-arrival distribution captures both failure modes.
Inputs
| Data source | Trade timestamps for the target pair, 24h window |
|---|---|
| Bucketing | Logarithmic: 0-100ms, 100ms-1s, 1s-10s, 10s-100s, 100s+ |
| Minimum sample | 5,000 trades; below this the metric is not scored for that day |
| Peer baseline | Empirical distribution of trade-interval entropy observed across the reference peer basket, recalibrated periodically |
Score mapping
Monotonic in the absolute z-score of entropy against the peer baseline: both abnormally regular and abnormally uniform arrival patterns map to a lower score. Typical: within 2σ of peer mean. Atypical: beyond 2σ in either direction.
Cost to replicate
Matching a Poisson-like arrival distribution requires sophisticated bot design that injects timing variability deliberately. Simpler automation tends to produce detectable regularity at the millisecond or second scale.
M04 Effective spread
Question
What does it actually cost to trade?
Rationale
Quoted spread (best ask − best bid) can be very narrow while the venue is quiet. Effective spread — the realised cost of market orders relative to mid-price — measures actual execution cost against actual trades.
Inputs
| Mid-price reference | 1-minute order book snapshots |
|---|---|
| Trade direction | Buy/sell flag from the feed where provided; inferred via Lee-Ready otherwise |
| Aggregation | Volume-weighted average over 24h, in basis points |
| Low-volume fallback | Widen window to 7-day rolling for statistical power |
Score mapping
Monotonic in effective spread: lower bps maps to a higher score. BTC/USDT benchmarks: under 5 bps is excellent, 5-15 bps typical, above 50 bps atypical. Exact thresholds are anchored to the live peer distribution and recalibrated periodically.
Cost to replicate
Effective spread reflects actual execution cost. A low effective spread without lowering fees or subsidising market makers is not easily replicated — both substitutes carry direct costs.
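The volume-weighted aggregation can be sketched as below. It assumes trade direction has already been resolved, so only the absolute distance from mid matters; the tuple layout is illustrative:

```python
def effective_spread_bps(trades: list[tuple[float, float, float]]) -> float:
    """Volume-weighted effective spread in basis points.

    Each trade is (price, size, mid), where mid is the prevailing
    mid-price from the nearest 1-minute order book snapshot.
    """
    num = den = 0.0
    for price, size, mid in trades:
        # 2x the half-spread actually paid relative to mid, in bps
        es = 2 * abs(price - mid) / mid * 1e4
        notional = price * size
        num += es * notional
        den += notional
    return num / den
```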
M05 Order book slope (Kyle's λ proxy)
Question
How much does the book absorb? What is the price impact of size?
Rationale
A deep book has a gradual slope: a $1M sell order moves price incrementally. A book with thin layers beyond the inside quote shows a steep slope at size. The relationship between executed size and price impact is a proxy for Kyle's lambda, computable from public order book data alone.
Inputs
| Snapshots | 1-minute frequency, full depth within ±5% of mid |
|---|---|
| Standard sizes | $10k, $100k, $1M notional on each side |
| Outlier treatment | Sizes winsorised at 99.9th percentile |
Score mapping
Monotonic in λ expressed as slippage-bps for a standard $100k trade: lower λ maps to a higher score. Typical range 10-50 bps; values above 200 bps are atypical.
Cost to replicate
Matching this requires maintaining real depth across multiple price levels. The capital cost scales with the square of the depth advertised.
M06 Quote stability and update rate
Question
Are quotes being maintained, or is the book updating at rates disproportionate to actual trading?
Rationale
Quote update rates disproportionate to trade rates can indicate rapid cancel-and-replace cycles that make displayed depth difficult to hit in practice. High update-to-trade ratios are associated with low effective fill rates in several academic studies of exchange microstructure.
Inputs
| Order book stream | WebSocket add/cancel/modify events from the continuous WS consumer |
|---|---|
| Peer baseline | Empirical distribution of the quote-update-to-trade ratio observed across the reference peer basket, recalibrated periodically |
| Coverage threshold | If WebSocket sequence gaps cover more than 50% of the day, the metric is not scored for that day |
Score mapping
Monotonic in the absolute z-score against the peer baseline: deviations from the baseline ratio in either direction map to a lower score. Typical: within 2σ. Atypical: beyond 2σ.
Cost to replicate
A lower update ratio requires more stable quoting, which uses market-maker capital and inventory tolerance.
M07 Cross-venue price deviation
Question
Does this venue's price track the global market?
Rationale
A venue's mid-price for BTC/USDT should track the volume-weighted global mid within a tight tolerance, given arbitrage incentives. Persistent deviations may reflect lower arbitrage activity, stale feeds, or isolated price formation on that venue.
Inputs
| Sampling | Mid-price every 60 seconds, target exchange and reference basket |
|---|---|
| Reference basket | Binance, Coinbase, Kraken, OKX (volume-weighted), excluding target if a basket member |
| Cross-rate | USD/USDT translated via Kraken USD VWAP / Binance USDT VWAP at each timestamp |
Score mapping
Monotonic in mean absolute deviation from the reference basket: lower MAD maps to a higher score. Typical: MAD under 5 bps and tail under 30 bps. Atypical: MAD above 50 bps, or persistent autocorrelation above 0.5.
Cost to replicate
Keeping prices aligned with the global market requires arbitrage linkage. Without underlying liquidity to support two-way arbitrage, alignment is difficult to maintain.
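The MAD computation reduces to a few lines, assuming the USD/USDT cross-rate translation has already been applied to both 60-second series; the function name is illustrative:

```python
def cross_venue_mad_bps(target_mids: list[float],
                        basket_mids: list[float]) -> float:
    """Mean absolute deviation, in bps, of the target venue's mid from
    the volume-weighted reference-basket mid at matched timestamps."""
    devs = [abs(t - b) / b * 1e4 for t, b in zip(target_mids, basket_mids)]
    return sum(devs) / len(devs)
```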
M08 Mid-price reversion dynamics
Question
After large trades, does price revert in a way consistent with real market impact?
Rationale
In real markets, informed trades produce permanent impact and uninformed trades produce temporary impact that reverts. If trades at a venue show near-complete reversion within seconds regardless of size, this is inconsistent with the mix of informed and uninformed flow observed at reference venues.
Inputs
| Trade filter | Prints above approximately $50k notional |
|---|---|
| Mid-price series | 1-second resolution, around each large-trade event |
| Peer baseline | Empirical distribution of the full-reversion fraction observed across the reference peer basket, recalibrated periodically |
| Minimum sample | 50 large trades per day; else 7-day rolling fallback |
Score mapping
Monotonic in distance above the peer reference reversion fraction: reference venues show 20-40% full reversion, and values above that band map to lower scores. Atypical above ~70% — inconsistent with the mix of informed flow observed at the peer basket.
Cost to replicate
Matching this pattern requires trades that carry genuine price impact, which means actually moving the book and taking position risk.
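One way to operationalise "full reversion" is sketched below; the horizon and tolerance are illustrative parameters, not the production calibration:

```python
def full_reversion_fraction(events: list[tuple[float, list[float]]],
                            horizon_s: int = 30,
                            tol_bps: float = 1.0) -> float:
    """Fraction of large-trade events whose price impact fully reverts.

    Each event is (mid_before, mid_path_after), where mid_path_after is
    the 1-second mid-price series following the print. An event counts
    as fully reverted if the mid returns to within tol_bps of the
    pre-trade mid inside the horizon.
    """
    reverted = 0
    for mid_before, path in events:
        for mid in path[:horizon_s]:
            if abs(mid - mid_before) / mid_before * 1e4 <= tol_bps:
                reverted += 1
                break
    return reverted / len(events)
```

Reference venues sit at 20-40% on this fraction; a venue near 100% would be printing trades whose impact always vanishes, the signature the metric targets.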
M09 Funding rate / basis sanity
What it measures: For exchanges offering perpetual futures, whether the funding rate and basis stay within no-arbitrage bounds relative to spot. Systematic drift indicates limited arbitrage activity between spot and derivatives legs.
Why it is not scored in v1: Scoring M09 consistently across all ten covered venues requires a dedicated derivatives data pipeline. Three of the current venues are spot-only (Coinbase, Kraken, Gate.io) and would always return not applicable, making cross-venue comparison misleading. M09 is computed internally and will be published in a future methodology version once derivatives coverage is consistent.
M10 Volume share vs liquidity share
Question
Does this exchange's share of global volume match its share of global liquidity?
Rationale
A venue's share of global volume and its share of global depth should be of similar order. If a venue shows 15% of global BTC/USDT volume but 2% of global order book depth at ±2%, the ratio is unusual relative to peers.
Inputs
| Volume | 24h notional, computed from own trade data — exchange ticker is not used |
|---|---|
| Reference volume | Aggregate 24h across the peer basket, excluding target if a basket member |
| Depth | Time-averaged ±2% depth for target and peers |
Score mapping
Non-monotonic in R = vol_share / depth_share: both unusually low and unusually high R are atypical. Typical: R in [0.7, 1.5]. Atypical: outside [0.3, 5] in either direction. This is the other non-monotonic metric in the spec (alongside M02).
Cost to replicate
This is the measurement hardest to match without the underlying fundamentals. Matching both ends requires real volume and real depth simultaneously.
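The ratio itself is straightforward to compute; treating "global" as target plus peer basket is a simplifying assumption, and the function name is illustrative:

```python
def m10_ratio(target_vol: float, peer_vols: list[float],
              target_depth: float, peer_depths: list[float]) -> float:
    """R = volume share / depth share against the peer basket."""
    vol_share = target_vol / (target_vol + sum(peer_vols))
    depth_share = target_depth / (target_depth + sum(peer_depths))
    return vol_share / depth_share
```

The worked case from the rationale: 15% of global volume against 2% of global depth gives R = 7.5, outside the [0.3, 5] atypical bound.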
M11 Price leadership and contribution to price discovery
Question
Does this exchange lead global price discovery or lag it?
Rationale
Venues with informed flow tend to lead price moves — price discovery happens there first and others follow. Venues with predominantly uninformed flow lag. Hasbrouck's information share and Gonzalo-Granger contribution to common factor quantify this directly. The v1 implementation uses a lighter lead-lag correlation proxy of equivalent intent.
Inputs
| Mid-price series | 1-second resolution, target + peer basket (excluding target if a member), 24h window |
|---|---|
| Peer baseline | Empirical distribution of price-leadership and information-share values observed across the reference peer basket, recalibrated periodically |
Score mapping
Conditional rather than purely monotonic: information share above 10% of global is typical for large venues, below 2% typical for small venues. A volume share above 5% combined with an information share below 1% is atypical — it suggests volume without discovery.
Cost to replicate
Matching this requires attracting informed traders, which compounds over time and is difficult to short-circuit.
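A minimal version of the lead-lag correlation proxy: correlate the target's returns at time t with the basket's returns at t+lag. This is an illustrative stand-in for the Hasbrouck and Gonzalo-Granger measures named above, not the production estimator:

```python
def lead_lag_corr(target_returns: list[float],
                  basket_returns: list[float], lag: int = 1) -> float:
    """Correlation of target returns at t with basket returns at t+lag.

    A high positive value at lag >= 1 means basket moves follow target
    moves, i.e. the target venue leads price discovery.
    """
    x = target_returns[:-lag]
    y = basket_returns[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5
```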
Composite scoring
| Code | Dimension | Metrics | Weight |
|---|---|---|---|
| D1 | Volume authenticity | M01, M02, M03 | 30% |
| D2 | Order book quality | M04, M05, M06 | 25% |
| D3 | Price formation integrity | M07, M08 (M09 deferred) | 25% |
| D4 | Cross-venue consistency | M10, M11 | 20% |
Step 1 — Metric normalisation
Each raw metric value is converted to a 0-100 score using piecewise linear mappings defined in a versioned mapping file. Higher score means higher quality. Every score row is stamped with the mapping_version active at computation time, so historical scores remain interpretable after methodology updates.
Step 2 — Dimension scores
The dimension score is the arithmetic mean across that dimension's metrics that returned status='ok'.
Step 3 — Composite score
The composite is a weighted mean across dimensions: 0.30·D1 + 0.25·D2 + 0.25·D3 + 0.20·D4. A composite is only published when every dimension has at least one scoring metric.
Step 4 — Band assignment
| Band | Composite score |
|---|---|
| 1 | ≥ 85 |
| 2 | ≥ 75 and < 85 |
| 3 | ≥ 60 and < 75 |
| 4 | ≥ 45 and < 60 |
| 5 | < 45 |
D3 (Price Formation Integrity) averages over M07 and M08 only in v1; funding rate and basis metrics applicable to venues offering perpetual futures are planned for a future methodology version.
What stays proprietary
Most calibration parameters are not published in full. The exceptions are listed first.
- M01 score-mapping anchors are public. The five-anchor piecewise-linear curve from χ²/N to score is set out in the M01 entry above. Published following operator review; intended as a worked example of the calibration shape and as a transparency commitment for the metric most often cited in commission discussions.
The remainder of the calibration set remains proprietary:
- Score-mapping anchors for M02 through M11. The score ranges shown qualitatively in each metric entry are illustrative; production thresholds are calibrated from observed distributions across the peer basket.
- Peer baseline calibration parameters for M03, M06, M08, and M11.
- Per-metric outlier rules beyond the published 99.9% winsorisation.
Threshold calibration uses observed distributions across a reference peer basket and is recalibrated periodically. Full calibration parameters are available to data licence customers; the baseline_version and mapping_version stamped on each score row allow any customer to reproduce any historical score exactly.
Data sources
All 10 covered exchanges are ingested via their public REST and WebSocket APIs. No authentication, no paid feeds, no exchange cooperation is required at any stage.
USD-quoted venues (Coinbase, Kraken) are handled in venue-native pairs for single-venue metrics. For cross-venue comparison, USD prices are translated via the Kraken USD VWAP / Binance USDT VWAP basis computed at each timestamp.
Limitations
Assay scores are narrow by design. They describe market data integrity, not broader aspects of an exchange's operation. The following are not measured and should not be inferred:
- Custody security or solvency
- Regulatory standing or licensing status
- User experience, customer support quality, or dispute handling
- Fiat on-ramp / off-ramp quality
- Derivatives market quality (v1 is spot-only)
- Token-level listing quality for pairs other than BTC and ETH
A high Assay score does not imply safety. A low Assay score reflects market data characteristics outside the typical range — it does not imply intent.
Funding rate and basis sanity (M09) is computed internally but not published in v1. It applies only to venues offering perpetual futures and is absent for spot-only exchanges; a consistent treatment across all covered venues requires a dedicated derivatives data pipeline, which is planned for a future release.
Version history
| Version | Notes |
|---|---|
| v0.1.1 | M09 (funding rate / basis sanity) deferred from public scoring pending derivatives data pipeline. |
| v0.1.0 | Initial release. 10 exchanges, 11 metrics, 2 pairs. |