When I started ingesting trade data from major crypto exchanges, I made a simple assumption: if I asked for the last 24 hours of trades, I would get the last 24 hours of trades. That assumption was wrong on two venues I monitor.
Each day, my pipeline requests 24 hours of BTC trade data from each venue in the cohort and merges what comes back into that day's record. Across two weeks of measurement, most venues returned what was requested. HTX returned about 1,000 trades covering, in the median, roughly ninety minutes. KuCoin returned about 100 trades covering only a few minutes, with a median span under three minutes. The methodology that was supposed to compute trade-based quality scores across the cohort was, for these two venues, scoring about ninety minutes of late-evening activity on a typical day and, on the worst day I have seen, a sliver of a few minutes.
I did not catch this for several weeks. HTX scored low across most of the dimensions I publish, and the plausible explanation was that HTX was a lower-quality venue than its peers in the cohort, which would be the methodology working as intended. The numbers were consistent week to week, which read as a stable signal rather than a measurement artifact. KuCoin scored low too, on the metrics it scored at all. I had no particular reason to look more closely. Lower-volume regional exchanges scoring below the heavyweights is the result a quality methodology is supposed to produce.
The puzzle broke the day I was pulling HTX's trade tape for an unrelated audit — I wanted to spot-check a leading-digit result — and noticed that a full day of HTX trade data was a thousand-row parquet file. A thousand rows for a venue whose published 24-hour BTC volume made that count look implausibly low. I assumed the file was corrupted. I pulled three more days. They were also a thousand rows, all clustered in the late-evening UTC window. I pulled three days of KuCoin out of curiosity and found 100-row files spanning minutes, not hours. That afternoon was when I understood that the scores I had been publishing for these two venues were not computed on the data I thought they were computed on.
The mechanism is not exotic. Most exchanges in the cohort expose a public REST endpoint that accepts a time-window parameter — typically since and until, or startTime and endTime, depending on the venue — and returns the trades that fell inside that window. Pagination handles the cases where the response would be too large: the client requests one page at a time, advances a cursor, and concatenates the pages until the cursor leaves the requested window. This is the model my pipeline was built against, and it is what most of the venues I monitor actually do.
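A minimal sketch of that windowed-pagination model, under stated assumptions: `fetch_page`, the `{"id", "ts"}` trade shape, and the in-memory stand-in venue are hypothetical and do not correspond to any exchange's client library. The cursor here advances by timestamp for brevity.

```python
from typing import Callable, Dict, List

Trade = Dict[str, int]  # hypothetical shape: {"id": trade id, "ts": epoch milliseconds}
FetchPage = Callable[[int, int, int], List[Trade]]  # (since_ms, until_ms, limit) -> page


def fetch_window(fetch_page: FetchPage, since_ms: int, until_ms: int,
                 page_size: int = 1000) -> List[Trade]:
    """Collect trades with since_ms <= ts < until_ms, one page at a time."""
    trades: List[Trade] = []
    seen_ids = set()
    cursor = since_ms
    while cursor < until_ms:
        page = fetch_page(cursor, until_ms, page_size)
        fresh = [t for t in page if t["id"] not in seen_ids]
        if not fresh:
            break  # empty or repeated page: nothing new to collect
        trades.extend(fresh)
        seen_ids.update(t["id"] for t in fresh)
        # Simplification: advance by timestamp. A real client would more likely
        # use a trade-id cursor so trades sharing a millisecond are not skipped.
        cursor = max(t["ts"] for t in fresh) + 1
    return trades


def make_windowed_venue(tape: List[Trade]) -> FetchPage:
    """Stand-in for a venue that honours the requested window (oldest first)."""
    def fetch_page(since_ms: int, until_ms: int, limit: int) -> List[Trade]:
        return [t for t in tape if since_ms <= t["ts"] < until_ms][:limit]
    return fetch_page


# Synthetic tape: one trade per second, 2,500 trades in total.
tape = [{"id": i, "ts": 1_000 * i} for i in range(2_500)]
collected = fetch_window(make_windowed_venue(tape), 0, 2_500_000)
print(len(collected))
```

Tracking seen trade ids means the loop stops as soon as a page brings nothing new, rather than appending the same page repeatedly.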
HTX's public trade endpoint does not work this way. It accepts a size parameter, caps the response at a server-side limit (observed: 1,000 rows), and returns the most recent trades regardless of any since or until parameter the client passes. The time-window parameters are not honoured. The endpoint is documented as a "history" endpoint, but the history it offers is the most recent N executions, not the most recent 24 hours of executions. KuCoin's behaviour is similar with a much smaller cap: roughly 100 of the most recent trades, with no time window honoured. The consequence is mechanical: page 2 is identical to page 1, so the fetcher correctly stops, and the entire daily sample for these two venues collapses to one short page of recent activity.
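One way to catch this behaviour up front is to probe each endpoint with a window that lies well in the past and check whether the returned trades actually fall inside it. The sketch below assumes a hypothetical `fetch(since_ms, until_ms)` callable and uses two in-memory stand-ins rather than real exchange calls.

```python
from typing import Callable, Dict, List, Optional

Trade = Dict[str, int]  # hypothetical shape: {"id": trade id, "ts": epoch milliseconds}
Fetch = Callable[[int, int], List[Trade]]


def honours_window(fetch: Fetch, since_ms: int, until_ms: int) -> Optional[bool]:
    """Probe one window: True if every returned trade lies inside it,
    False if any falls outside, None if the response is empty (inconclusive)."""
    page = fetch(since_ms, until_ms)
    if not page:
        return None
    return all(since_ms <= t["ts"] < until_ms for t in page)


HOUR_MS = 3_600_000
# Synthetic tape: one trade per minute across a 24-hour day.
tape = [{"id": i, "ts": 60_000 * i} for i in range(1_440)]


def windowed(since_ms: int, until_ms: int) -> List[Trade]:
    """Stand-in for a venue that filters by the requested window."""
    return [t for t in tape if since_ms <= t["ts"] < until_ms]


def latest_n(since_ms: int, until_ms: int, n: int = 100) -> List[Trade]:
    """Stand-in for a venue that ignores the window and returns the newest n trades."""
    return tape[-n:]


# Probe the first hour of the day, far from the most recent trades.
print(honours_window(windowed, 0, HOUR_MS))
print(honours_window(latest_n, 0, HOUR_MS))
```

The probe window has to be old enough that the most recent N trades cannot fall inside it. A trailing window that ends at the present would pass on both kinds of venue.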
The other venues in the cohort honour the requested window. Same code path, same client, same parameters. Coinbase, Binance, and the other venues I checked in this cohort respect since and until (or their venue-specific equivalents) and paginate properly through the requested span. Two of the venues I monitor return the most recent N executions instead, and there is no parameter that overrides that behaviour.
Receipts, from one specific UTC day:
Request: trailing 24h of BTC trades, fetched at 23:30 UTC at day end.
| Venue | Pair | Endpoint | Count | First trade (UTC) | Last trade (UTC) | Span |
|---|---|---|---|---|---|---|
| Coinbase | BTC/USD | `/products/{pair}/trades?start=...&end=...` | 717,469 | 23:55:54 prior-day | 23:43:56 same-day | 23h 48m |
| Binance | BTC/USDT | `/api/v3/aggTrades?startTime=...&endTime=...` | 3,911,539 | 23:55:48 prior-day | 23:34:25 same-day | 23h 38m |
| HTX | BTC/USDT | `/market/history/trade?symbol=...&size=2000` (capped at 1,000) | 1,000 | 20:32:28 same-day | 22:48:09 same-day | 2h 15m |
| KuCoin | BTC/USDT | `/api/v1/market/histories?symbol=...` | 100 | 23:26:26 same-day | 23:30:08 same-day | 3m 42s |
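The spans in the table are plain timestamp arithmetic, and dividing each by the requested 24 hours gives a coverage fraction. A small sketch using the first and last trade times above; the calendar date is a placeholder, since only the times of day are given here.

```python
from datetime import datetime, timedelta

DAY = datetime(2024, 1, 2)  # placeholder date: the actual day is not stated
PRIOR = DAY - timedelta(days=1)


def at(day: datetime, hms: str) -> datetime:
    """Attach an HH:MM:SS time of day to a calendar date."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return day.replace(hour=hours, minute=minutes, second=seconds)


# First and last trade per venue, copied from the table.
samples = {
    "Coinbase": (at(PRIOR, "23:55:54"), at(DAY, "23:43:56")),
    "Binance": (at(PRIOR, "23:55:48"), at(DAY, "23:34:25")),
    "HTX": (at(DAY, "20:32:28"), at(DAY, "22:48:09")),
    "KuCoin": (at(DAY, "23:26:26"), at(DAY, "23:30:08")),
}


def span_and_coverage(first: datetime, last: datetime,
                      window: timedelta = timedelta(hours=24)):
    """Return the covered span and its fraction of the requested window."""
    span = last - first
    return span, span / window


for venue, (first, last) in samples.items():
    span, coverage = span_and_coverage(first, last)
    print(f"{venue:<9} span={span}  coverage={coverage:.1%}")
```

On these figures, Coinbase and Binance cover roughly 99% of the requested day, HTX about 9%, and KuCoin about 0.3%.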
Any developer reading this can reproduce it. Issue an HTX /market/history/trade request with size set as high as the endpoint will accept, and observe that the response is capped at the server-side limit and contains only the most recent trades, regardless of since/until. Issue the equivalent Coinbase or Binance request with explicit start and end timestamps spanning a 24-hour window, and observe pagination through the full span. There is no way to make the HTX or KuCoin public REST endpoints return a full 24-hour trade window the way the others do. Covering the full day on those two venues requires a different collection method: WebSocket capture, rolling collection from a process that polls often enough that no trades fall off the capped page between polls, or a third-party data source.
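Of those options, rolling collection is the simplest to sketch: poll the capped endpoint repeatedly, merge pages by trade id, and flag any poll that shares no trade with what has already been collected, since that is when trades may have been missed between polls. The `RollingCollector` class and the trade shape below are hypothetical, and the overlap check is a heuristic that assumes trade ids are unique within a venue.

```python
from typing import Dict, List

Trade = Dict[str, int]  # hypothetical shape: {"id": trade id, "ts": epoch milliseconds}


class RollingCollector:
    """Merge capped 'most recent N' pages from repeated polls into one tape."""

    def __init__(self) -> None:
        self.trades: Dict[int, Trade] = {}
        self.possible_gaps = 0

    def ingest(self, page: List[Trade]) -> int:
        """Add one polled page; return how many trades were new."""
        ids = {t["id"] for t in page}
        # No overlap with anything already held: trades may have scrolled
        # off the capped page between polls.
        if self.trades and ids and ids.isdisjoint(self.trades):
            self.possible_gaps += 1
        added = 0
        for t in page:
            if t["id"] not in self.trades:
                self.trades[t["id"]] = t
                added += 1
        return added


def page_of(ids) -> List[Trade]:
    """Build a synthetic page for the given trade ids."""
    return [{"id": i, "ts": 1_000 * i} for i in ids]


collector = RollingCollector()
# Three polls with a page cap of 4: the second overlaps the first,
# the third arrives too late and misses trades 6 and 7.
for page in (page_of(range(0, 4)), page_of(range(2, 6)), page_of(range(8, 12))):
    collector.ingest(page)
print(len(collector.trades), collector.possible_gaps)
```

A poll that overlaps the previous one proves continuity. A poll that does not overlap only proves that continuity cannot be confirmed, which is why the counter is named for possible gaps.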
The implication for any exchange-quality methodology that takes a 24-hour window at face value is mechanical. If a metric consumes "the day's trades" (whether for a leading-digit test, a volume figure, an effective-spread computation, or any cross-venue comparison that uses own-volume) and the day's trades are in fact a ninety-minute or few-minute sample, the metric is scoring the wrong window. The number it produces is meaningful for the slice it actually saw. It is not meaningful for what the methodology page claims it is measuring. On HTX, the digit-distribution test that requires 1,000 trades to be statistically meaningful was meeting that floor with the REST page-size cap itself, not with 24 hours of organic volume. The data was not passing the Benford-test sample-size requirement by accumulating 1,000 trades over a full day; it was passing because the endpoint returned exactly 1,000 trades. The floor and the cap were the same number.
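A cheap guard against that coincidence is to flag any sample whose count meets the floor only by landing exactly on a known page cap. The sketch below uses the counts from the table; the `page_cap` values for Coinbase and Binance are set to `None` on the assumption that their paginated responses have no single-page ceiling on the daily sample, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

BENFORD_MIN_TRADES = 1_000  # sample-size floor for the digit-distribution test


def floor_met_by_page_cap(n_trades: int, page_cap: Optional[int],
                          floor: int = BENFORD_MIN_TRADES) -> bool:
    """True when the sample meets the floor and its size is exactly the page cap,
    which suggests the count reflects the endpoint rather than the day's volume."""
    return page_cap is not None and n_trades >= floor and n_trades == page_cap


# (trade count, observed page cap) per venue; None means no cap applies.
observed: Dict[str, Tuple[int, Optional[int]]] = {
    "Coinbase": (717_469, None),
    "Binance": (3_911_539, None),
    "HTX": (1_000, 1_000),
    "KuCoin": (100, 100),
}

suspicious = [venue for venue, (n, cap) in observed.items()
              if floor_met_by_page_cap(n, cap)]
print(suspicious)
```

KuCoin is not flagged here because 100 trades fails the floor outright. HTX is the dangerous case, because it clears the floor and would otherwise be scored.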
The endpoints are not broken. They are behaving consistently with their documented purpose. My methodology was naive in assuming "give me the last 24 hours" meant "give me the last 24 hours" on every venue. The venue-specific behaviour is documented, predictable, and stable. The thing that was wrong was my assumption that the documented behaviour was uniform across the cohort.
You can see where this is going. Any methodology that takes a 24-hour window at face value across a heterogeneous cohort of venues will mismeasure the venues whose endpoints do not honour the window — and will keep doing so as long as the assumption holds. The mismeasurement is stable, which makes it look like real signal — but it's the consistency of an interface mismatch repeating itself, not the consistency of real cross-venue dispersion. It is methodology brushing past the problem and producing numbers anyway, and there is no statistical test inside the methodology itself that will surface the problem, because the test sees exactly the data the venue provided.
The fix has two parts. The first, already shipped, is to mark the trade-derived metrics on HTX and KuCoin as insufficient data rather than compute them from truncated samples. Insufficient data is not a low-quality score. It is a refusal to score. These venues no longer receive a score on the metrics that depend on having a full day of trades, and they fall out of the public rankings for the affected dimensions accordingly. Their book-derived metrics — depth, spread, order-book shape — continue to score normally, because those metrics consume order-book snapshots collected on a one-minute cadence by a different code path, and that path is not affected by the REST trade-endpoint behaviour. The second part of the fix, on the backlog and blocking my v1 launch, is a temporal-uniformity gate that runs against the actual hourly coverage of each venue's daily trade sample and gates the affected metrics whenever the sample's span is materially below 24 hours. Any future venue whose endpoint returns sub-24-hour windows will trigger the gate automatically, without me having to add a flag for it.
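A rough sketch of what such a gate could look like, under stated assumptions: it counts how many of the 24 hourly buckets in the requested window contain at least one trade, and returns `None` (insufficient data) instead of a score when too few are covered. The `min_hours=22` threshold, the function names, and the placeholder `score_fn` are all hypothetical. This is not the shipped implementation.

```python
from typing import Callable, List, Optional

HOUR_MS = 3_600_000


def hours_covered(ts_ms: List[int], window_start_ms: int, hours: int = 24) -> int:
    """Count hourly buckets in the window that contain at least one trade."""
    covered = set()
    for ts in ts_ms:
        bucket = (ts - window_start_ms) // HOUR_MS
        if 0 <= bucket < hours:
            covered.add(bucket)
    return len(covered)


def gated_score(ts_ms: List[int], window_start_ms: int,
                score_fn: Callable[[List[int]], float],
                min_hours: int = 22) -> Optional[float]:
    """Return score_fn's result, or None when hourly coverage is too thin.
    None stands for 'insufficient data': a refusal to score, not a low score."""
    if hours_covered(ts_ms, window_start_ms) < min_hours:
        return None
    return score_fn(ts_ms)


# Synthetic samples: a full day at one trade per minute, and 1,000 trades
# squeezed into the final 90 minutes of the same window.
full_day = [60_000 * i for i in range(1_440)]
truncated = [24 * HOUR_MS - 5_400_000 + 5_400 * i for i in range(1_000)]

print(hours_covered(full_day, 0), gated_score(full_day, 0, len))
print(hours_covered(truncated, 0), gated_score(truncated, 0, len))
```

Counting covered hours rather than measuring first-to-last span also catches samples with holes in the middle of the day, not only truncated tails.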
None of this means trade-based metrics are useless. It means a trade-based metric is only as good as its claim about the window it is computed on, and that claim has to survive contact with the venue's actual endpoint behaviour. A digit-distribution test on a clean 24-hour sample is one thing. The same test on a 90-minute sample is a different measurement entirely. The work is checking that the data you have is the data you think you have, before computing anything from it.
I wrote about the Binance leading-digit result earlier this week — how a fraud-detection test was being broken by structural clustering in trade sizes. This is the second instance of the same pattern: a metric that looked like it was measuring exchange quality was partly measuring venue-specific market structure or endpoint behaviour. The longer this project runs, the more I expect to find these cases. That is the work: checking that the data is what it claims to be before turning it into a score. The methodology I'm building, called Assay, is mostly the sum of that discipline.