Data & methodology register

Every modeled claim traces to a source, a method, or a measured read.

The baseline is public and therefore not the differentiator. The register records what the published literature supports, what the statistical machinery guarantees, and the two data assets no public dataset contains. Sources carry their licenses and are re-verified on a fixed cadence.

Public baseline · Proprietary moat · Re-verified quarterly

The split

A public baseline anyone can read, and a record only Voltry holds.

Public datasets give a credible baseline for condition and wear. That baseline is available to everyone, which is exactly why it is not the moat. The moat is two data assets that exist in no public dataset and can only be built forward.

PUBLIC BASELINE

Sets the method, the priors, and what counts as normal versus degraded.

  • Large-scale GPU reliability field studies across four hardware generations
  • Memory-specific HBM and DRAM field studies
  • Silent-corruption studies from hyperscale fleets
  • Validated survival and degradation methodology with public benchmarks
  • Distribution-free calibration with finite-sample coverage guarantees

PROPRIETARY MOAT

Two assets the public literature is missing, both compounding and impossible to backfill.

  • Paired facility-power-to-outcome data, measured at the power-distribution layer
  • Per-unit, serial-level Hopper-and-newer lifetime records, bound to a permanent identity

History can only be built forward. An asset met at its third sale can never carry a born-on record, so early coverage compounds into a lead a later entrant cannot reproduce.

Field reliability studies

The lifetime arc the public record covers, and where it stops.

The public lifetime arc runs from K20X on Titan, through V100 on Summit, to MI250X on Frontier, and is extended to H100 only as statistics by Delta and the hyperscale cluster reports. There is still no public per-unit, serial-level, multi-year H100 or Blackwell lifetime dataset. The register uses each study for what it can support and nothing more.

01 · load-bearing
1,056 GPUs · 2.5 yr · 11.7M GPU-hr

Delta SC 2025

A100, H100 (GH200)

The load-bearing baseline: memory MTBE deficit, the 512-row cap, the generation inversion, and the bathtub curve. Architecture-level findings only.

02
V100 · Extreme-scale memory corruption study

Summit ICS 2024

V100

The direction of the power-to-error relationship: double-bit errors track sustained high-power modes, not higher temperature.

03
Multi-yr · Survival analysis under heavy censoring

Titan SC 2020

K20X

The survival-analysis method under heavy right-censoring, and the identification of fallen-off-bus as connector fatigue. Method, not hazard rates.

04
MI250X · Large-scale deployment

Frontier public arc

MI250X

Extends the public lifetime arc to a current AMD accelerator and anchors cross-vendor generality of method.

05
460M+ events · 19 data centers

HBM field study ATC 2024

HBM

HBM error patterns distinct from DRAM: spatial locality, column and through-silicon-via failures, hierarchical structure. A cross-vendor memory prior.

06
Large · Large-scale field study

DRAM field study 2009

DRAM

Memory errors are dominated by hard, permanent faults, and one correctable error predicts another. The basis for a monotone wear model.

07
16,384 H100 · 419 interruptions

Hyperscale cluster reports

H100, A100

Population priors and aggregate validation targets only. Cluster-level, never a per-unit condition.

08
Fleet · Hyperscale CPU and accelerator fleets

Silent-corruption studies

Cross-vendor

Corruption depends on voltage, frequency, temperature, and life cycle, and appears only after months. Justifies a measured pass or fail, never a population rate per asset.

Raw telemetry from the load-bearing study is withheld by its authors; the statistics and pipeline are public under a permissive research license. Voltry records the license and access terms per source and never restates a withheld figure as if it were its own measurement.

Statistical methodology

Validated methods, public benchmarks, falsifiable metrics.

The modeled track uses methods with a long validation record and standard, checkable metrics. The point is not novelty in the statistics; it is that each method is the right tool for a monotone, censored, time-to-event problem, and that its guarantees are stated rather than assumed.
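To make "censored, time-to-event" concrete, here is a minimal sketch of a Kaplan-Meier estimate under right-censoring. It is an illustration only, not the register's pipeline: the function name `kaplan_meier` and the lifetimes are hypothetical, and a unit still running at its last read counts as censored rather than failed.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where a failure was observed.

    durations: time on test for each unit.
    observed:  True if the unit failed at that time, False if it was
               still running at its last read (right-censored).
    """
    failures = Counter(t for t, failed in zip(durations, observed) if failed)
    exits = Counter(durations)  # failures and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Units censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical lifetimes in months, for illustration only.
durations = [6, 12, 12, 18, 24, 24, 30, 36]
observed = [True, False, True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

Censored units never lower the curve directly. They only shrink the risk set for later failures, which is why dropping them or treating them as failures would bias the estimate.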

  1. Per-unit signal
    Gamma and Inverse-Gaussian processes; first-passage time: a distribution over time-to-threshold from a measured monotone wear signal.
    Wiener-process model: reserved for non-monotone signals that fluctuate, such as thermal-margin drift.
  2. Population baseline
    Kaplan-Meier, Nelson-Aalen, Cox, Weibull AFT: non-, semi-, and fully parametric estimators under right-censoring.
    Random survival forests, DeepSurv, DeepHit: competing risks across multiple failure modes where the data support them.
  3. Validation
    NASA PCoE C-MAPSS; run-to-failure datasets: public benchmarks the prognostics field validates against.
  4. Band calibration · keystone
    Conformal prediction; conformalized survival analysis (2023): a finite-sample, distribution-free lower predictive bound at a stated coverage.
    Weighted conformal prediction under covariate shift: transfer error as the coverage inflation needed to restore nominal coverage.
  5. Output
    A banded trajectory with a stated coverage guarantee and a measured transfer-error inflation.
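
The per-unit step above can be sketched as a Monte Carlo first-passage calculation on a gamma process. This is a sketch under stated assumptions, not the production model: the function names, the shape and scale parameters, the starting wear level, and the threshold are all hypothetical and not fitted to any dataset. A gamma process has independent, non-negative increments, so each simulated wear path is monotone by construction.

```python
import random


def first_passage_times(shape_per_step, scale, threshold, start=0.0,
                        n_paths=2000, max_steps=10_000, seed=0):
    """Simulate gamma-process wear paths and return sorted first-passage times.

    Each step adds a Gamma(shape_per_step, scale) increment, so wear never
    decreases. The first-passage time is the number of steps until the
    path first reaches the threshold.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_paths):
        level, steps = start, 0
        while level < threshold and steps < max_steps:
            level += rng.gammavariate(shape_per_step, scale)
            steps += 1
        times.append(steps)
    return sorted(times)


def quantile(sorted_values, q):
    """Simple empirical quantile on an already sorted list."""
    idx = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[idx]


# Hypothetical numbers: wear currently at 20, threshold at 50,
# mean increment 0.5 per step (shape 0.5 times scale 1.0).
times = first_passage_times(shape_per_step=0.5, scale=1.0,
                            threshold=50.0, start=20.0)
p05, median, p95 = (quantile(times, q) for q in (0.05, 0.50, 0.95))
print(f"steps to threshold: p05={p05}, median={median}, p95={p95}")
```

The output is a distribution over time-to-threshold rather than a single date. The lower quantile is the kind of raw figure a calibration step would then adjust to a stated coverage.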

Discrimination is reported with time-dependent concordance, calibration with the integrated Brier score and predicted-versus-observed survival curves, and coverage with empirical-versus-nominal diagnostics per generation. A band that under-covers is a failed band.
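
The coverage diagnostic can be illustrated with a split-conformal lower bound and an empirical-versus-nominal check. This is a deliberately simplified sketch: it ignores censoring, which conformalized survival analysis handles explicitly, and it uses synthetic data with a deliberately optimistic point predictor. All names and numbers are hypothetical.

```python
import math
import random


def conformal_offset(predicted, actual, alpha):
    """Offset q such that (prediction - q) is a lower bound with
    coverage >= 1 - alpha on exchangeable data.

    The score is how far the prediction overshoots the observed lifetime.
    """
    scores = sorted(p - a for p, a in zip(predicted, actual))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this coverage
    return scores[k - 1]


def empirical_coverage(lower_bounds, actual):
    """Fraction of units whose observed lifetime met or exceeded the bound."""
    return sum(a >= lb for lb, a in zip(lower_bounds, actual)) / len(actual)


def synthetic_units(n, rng):
    """Hypothetical lifetimes and a point predictor biased high by 2."""
    predicted, actual = [], []
    for _ in range(n):
        x = rng.random()
        actual.append(30 + 20 * x + rng.gauss(0, 5))
        predicted.append(32 + 20 * x)
    return predicted, actual


rng = random.Random(1)
alpha = 0.10  # nominal coverage 90%
cal_pred, cal_actual = synthetic_units(2000, rng)
test_pred, test_actual = synthetic_units(2000, rng)

q = conformal_offset(cal_pred, cal_actual, alpha)
lower = [p - q for p in test_pred]
coverage = empirical_coverage(lower, test_actual)
print(f"offset={q:.2f}, empirical coverage={coverage:.3f}, nominal={1 - alpha:.2f}")
```

The guarantee depends on the calibration and test units being exchangeable. When a newer hardware generation shifts the covariates, that assumption weakens, which is the case the weighted variant is meant to address.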

Standards on the certificate

Data sanitization: IEEE 2883-2022, verified memory clear
Device attestation: SPDM, nvtrust and NRAS, RATS-compliant
Diagnostics: DCGM run levels r1 to r4, plus extended burn-in
Power quality: IEEE 519 and IEEE 1159, at the facility layer

License hygiene and cadence

  • Every source carries its license and access terms in the register; permissive research licenses are honored and attribution is preserved.
  • Withheld raw data is never restated as a Voltry measurement. Only published statistics and pipelines are cited.
  • The register is re-scanned quarterly for new Hopper, H200, and Blackwell field data, new HBM3 and HBM3e studies, and new corruption datasets.
  • Calibration snapshots and transfer-error bands are revised on each re-scan, and the methodology hash on every certificate records which snapshot was in force.

The register is the canonical reference

Full source provenance, access notes, and the data-to-dimension calibration map are maintained in the Voltry Baseline Data and Methodology Register, version 2.0. Every modeled field on a certificate traces back to a row in it, and the signed evidence bundle attached to each certificate lets a third party recompute the modeled fields and reproduce the bands.
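
One way a methodology hash could bind a certificate to a calibration snapshot is a digest over a canonical serialization of that snapshot. The sketch below assumes SHA-256 over sorted-key JSON. The field names and values are hypothetical apart from the register version, and the source does not specify the actual hashing scheme.

```python
import hashlib
import json


def methodology_hash(snapshot):
    """SHA-256 hex digest of a canonical JSON form of the snapshot.

    Sorting keys and fixing separators makes the digest independent of
    the order in which fields were written.
    """
    canonical = json.dumps(snapshot, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical snapshot fields, for illustration only.
snapshot = {
    "register_version": "2.0",
    "rescan": "2025-Q3",
    "nominal_coverage": 0.90,
    "transfer_inflation": {"A100": 0.00, "H100": 0.04},
}
digest = methodology_hash(snapshot)
print(digest)
```

A third party holding the same snapshot can recompute the digest and compare it with the one on the certificate. Any revised band or re-scan yields a different value.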