SchemaHashing 3.0.0

SchemaHashing

Deterministic, tiered SHA-256 hashing of a relational database schema. Produces per-tier hashes that map directly to SemVer bump levels and an optional structured per-object diff for analytical use.

Supported engines: PostgreSQL, SQL Server, MySQL / MariaDB, SQLite.

Overview

An ISchemaReader reads metadata from the live catalog (information_schema / sys.* / pragma_*) into a SchemaSnapshot. SchemaHasher canonicalises that snapshot and produces four SHA-256 digests.

Each tier encodes a cumulative superset of the tier below it, so any change that flips Major also flips Minor and Patch.

Tier Contents (cumulative) Corresponding SemVer bump
Major NOT-NULL columns (name + data type), PK / FK / UNIQUE / CHECK constraints major
Minor Major facts ∪ nullable columns (name + data type + nullability + default) ∪ column defaults (all columns) ∪ triggers minor
Patch Minor facts ∪ non-PK indexes ∪ column ordinal positions patch
Overall Digest of the three tier digests

Tier semantics

Each tier is a cumulative superset of the tier below it: every fact that feeds the major-tier digest also feeds the minor-tier digest, and every fact that feeds the minor-tier digest also feeds the patch-tier digest. Three consequences follow.

First, the suggested bump is the highest-severity tier that differs. A major-severity change flips the major, minor, and patch hashes; a minor-severity change flips minor and patch but leaves major unchanged; a patch-severity change flips only patch.

Second, hash equality at a tier is a reliable negative signal. When two major digests match, no major-severity fact differs — the caller need not inspect the diff for breaking concerns. The same holds for the minor and patch tiers with respect to their severity classes.

Third, hash inequality is a prompt to inspect the structured diff. Inequality at a tier means at least one fact at that severity differs, but the hash alone does not name which. Every diff record carries a severity tag, so filtering by severity reproduces the same split that the tier hashes draw.

Hash equality contract

  • Equal major digests: no breaking change. Sufficient for callers who only need to know that the contract is stable.
  • Equal minor digests: no backwards-compatible addition beyond what the major tier already covers. Callers relying on default values, nullable columns, or triggers can rely on this equality.
  • Equal patch digests: no change at all within the facts the library tracks. Two snapshots with equal patch digests are indistinguishable to the library.
  • Unequal digest at any tier: call the diff and filter by severity to enumerate the specific changes at and above that tier's severity.

Classification rules

Columns. Adding a NOT-NULL column is a major change. A consumer that writes rows in the previous shape fails even when the column has a default, because the default does not retroactively populate data the consumer expected to omit. Adding a nullable column is a minor change: existing write paths remain valid. Removing a NOT-NULL column is major (read paths that select it break); removing a nullable column is minor. Changing the data type of a NOT-NULL column is major; width widening, narrowing, and kind changes are treated uniformly, because any type change on a required column is a potential break for consumers bound to the previous type. Changing the data type of a nullable column is minor. Flipping nullability in either direction is major: NOT-NULL to nullable breaks read paths that assumed presence; nullable to NOT-NULL breaks write paths that sent nulls. A change to a column's default value is minor — existing rows and explicit inserts are unaffected, only future defaulted inserts observe the change. A change to a column's ordinal position is patch; well-behaved callers address columns by name, and an ordinal shift only affects positional consumers.

Constraints. Adding, removing, or redefining any primary key, unique constraint, check constraint, or foreign key is a major change. These bind the shape of valid data; any change can break consumers that produced data valid under the previous shape. Constraint identity is based on schema, table, type, and column list — the constraint name is deliberately excluded from every tier's hash, so a rename-only change produces neither a hash difference nor a diff entry. Changing a constraint's column list is surfaced as a remove of the old shape plus an add of the new shape, both major; the library treats constraints as value-typed and does not attempt to infer "same constraint, different columns".

Indexes. Adding, removing, or changing the uniqueness, columns, or definition of a non-primary-key index is a patch change. Indexes affect query performance and uniqueness enforcement at the storage layer; they do not alter the static schema contract a consumer binds to. Primary-key indexes are not tracked as indexes — they live with the primary-key constraint, and changes to them are therefore major.

Triggers. Adding or removing a trigger, or changing its timing (before, after, instead-of) or event set (insert, update, delete), is a minor change. Behavior alterations are observable but do not alter the static contract. Trigger identity is based on schema, table, and name; a rename surfaces as a remove plus an add, both minor.

Composite changes on one column

Two or more differences on the same column produce one combined change record whose severity is the highest severity across the differing fields. A column that simultaneously flips nullability and changes its default is reported once at major severity; both previous and current states are attached to the record so the caller has full detail without reconciling multiple entries.

Ordinal cascades

Adding or removing a column anywhere other than the tail shifts the ordinal position of every trailing column. The diff reports one added or removed entry at the appropriate severity plus one patch-severity modified entry per trailing column. This is exact, not a false positive — the ordinal positions have in fact changed. Consumers that do not track ordinal position can filter those entries out by severity.

Known limitations

The PostgreSQL reader derives trigger identity and body text from the information schema, which stores only the trigger's invoked expression (the call to its function), not the function body itself. A change confined entirely to a trigger function's body does not flow through to a trigger-level diff. Surfacing such a change requires altering the trigger itself — its timing, its event set, or the function it invokes — or tracking functions as first-class facts outside this library.

Cross-engine hash comparison is not supported. Type formatting, check-clause normalization, and default-value expressions are canonicalized by the source engine's catalogs and differ between engines; hashes are stable within an engine only.

Repository layout

Project Purpose
SchemaHashing.Package The library. Published to NuGet as SchemaHashing.
SchemaHashing.Examples.RestApi Minimal API demonstrating hashing and comparison endpoints over two Postgres servers.
SchemaHashing.Tests.Unit Unit tests (xUnit, NSubstitute).
SchemaHashing.Tests.Integration Integration tests (Testcontainers) covering the classification matrix.

Installation

dotnet add package SchemaHashing

The package depends only on System.Data.Common and Dapper. The corresponding ADO.NET provider must be installed separately:

dotnet add package Npgsql                    # Postgres
dotnet add package Microsoft.Data.SqlClient  # SQL Server
dotnet add package MySqlConnector            # MySQL / MariaDB
dotnet add package Microsoft.Data.Sqlite     # SQLite

Usage

Hashing a snapshot

using SchemaHashing;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddSchemaHashing(DatabaseType.Postgres);

var reader = services.BuildServiceProvider().GetRequiredService<ISchemaReader>();

await using var conn = new NpgsqlConnection("Host=localhost;Database=app;...");
await conn.OpenAsync();

SchemaSnapshot snapshot = await reader.ReadAsync(conn);
SchemaHashes   hashes   = SchemaHasher.Hash(snapshot);

SchemaHashes contains four hex digests: Major, Minor, Patch, Overall.

Comparing hashes

SchemaComparison comparison = SchemaHasher.Compare(previousHashes, currentHashes);

if (comparison.SuggestedBump == SemverBump.Major)
{
    // major-severity fact differs — call Diff for details
}

Structured diff

Diff and Analyze return per-object change records, each tagged with a Severity. Only rows whose ChangeKind is Added, Removed, or Modified are present; unchanged objects are omitted.

SchemaDiff diff = SchemaHasher.Diff(previousSnapshot, currentSnapshot);

foreach (ColumnChange c in diff.Columns.Where(x => x.Severity == Severity.Major))
{
    // c.Kind, c.Previous, c.Current, c.Severity
}

Analyze returns the hash comparison and the structured diff in a single call:

SchemaAnalysis analysis = SchemaHasher.Analyze(previousSnapshot, currentSnapshot);
// analysis.Comparison.SuggestedBump
// analysis.Diff.Columns

Diff against stored canonical JSON

SchemaHasher.GetCanonicalJson(snapshot, tier) produces a deterministic JSON projection of a single tier. Persisting that JSON allows later comparison without retaining the original snapshot:

string storedJson  = SchemaHasher.GetCanonicalJson(snapshotAtN, HashTier.Major);
string currentJson = SchemaHasher.GetCanonicalJson(currentSnapshot, HashTier.Major);
SchemaDiff diff    = SchemaHasher.Diff(storedJson, currentJson, HashTier.Major);

The JSON overload parses both inputs into a partial SchemaSnapshot and dispatches through the same diff path as the snapshot overload; output records are the same types (ColumnChange, ConstraintChange, IndexChange, TriggerChange). Facts absent from the tier's projection — e.g. triggers under HashTier.Major — cannot appear in its diff.

Design notes

  • Ordering in the canonical projection uses StringComparer.Ordinal, so hashes are independent of server collation and culture.
  • Constraint identity — both in the hash and in Diff — is (Schema, Table, Type, Columns). Constraint Name is excluded, so a rename-only change produces no hash difference and no diff entry.
  • Type length, precision, and scale are folded into the canonical DataType string:
    • PostgreSQL: PostgresSchemaReader.BuildFullType handles varchar(n), numeric(p,s), timestamp(p), arrays.
    • SQL Server: SqlServerSchemaReader.BuildFullType.
    • MySQL / MariaDB: COLUMN_TYPE is used verbatim (already canonical).
    • SQLite: the declared affinity text is used verbatim.
  • Implicit PostgreSQL NOT NULL check constraints are filtered before hashing.
  • MySQL check-constraint extraction requires MySQL 8.0.16+ or MariaDB 10.2+; earlier versions return no check constraints.
  • SQLite exposes a single schema (main). CHECK constraints are not extracted (they are embedded in the CREATE TABLE SQL text). Trigger definitions use the raw CREATE TRIGGER statement, so formatting changes flow through as patch-tier differences.
  • Cross-engine hash comparison is not supported: type formatting and check-clause text differ between engines. Hashes are stable within an engine.
  • Composite column changes (e.g. nullability flip plus default change) are reported as one ColumnChange whose Severity is the highest severity across the differing fields.
  • Adding or removing a column mid-table shifts the ordinal positions of every trailing column. The diff will report one column Added (or Removed) at the relevant severity plus one Modified entry per trailing column at Severity.Patch.

Example REST API

SchemaHashing.Examples.RestApi exposes hashing and comparison endpoints. The provided Docker Compose stack starts two PostgreSQL instances with intentionally different schemas.

Running

cd SchemaHashing.Examples.RestApi/deployment
docker compose up --build

Containers:

  • postgres-alphalocalhost:5433, baseline schema from init-alpha/*.sql
  • postgres-betalocalhost:5434, modified schema from init-beta/*.sql
  • apilocalhost:8080

Endpoints

Named databases (resolved from the pre-registered catalog):

Method Path Response
GET /databases Registered database names
GET /schema/{name}/hash SchemaHashes for the named database
GET /schema/{name}/hash/{tier}/json Canonical JSON for a tier (Major / Minor / Patch)
GET /schema/compare/{left}/{right} Per-tier change flags and suggested SemVer bump
GET /schema/compare/{left}/{right}/{tier}/diff Canonical JSON for both sides of a tier

Ad-hoc (connection supplied in the request body):

Method Path Request body
POST /schema/hash ConnectionRequest
POST /schema/hash/{tier}/json ConnectionRequest
POST /schema/compare { "left": ConnectionRequest, "right": ... }
POST /schema/compare/{tier}/diff { "left": ConnectionRequest, "right": ... }

ConnectionRequest has the shape { host, port, username, password, database }.

Example:

curl http://localhost:8080/schema/compare/alpha/beta | jq

curl -X POST http://localhost:8080/schema/hash \
  -H 'content-type: application/json' \
  -d '{"host":"localhost","port":5433,"username":"postgres","password":"postgres","database":"app"}' | jq

Configuration

Configuration is read from environment variables. DB_NAMES is a comma-separated list of logical database names; for each name NAME, the following variables are required:

DB_NAMES=alpha,beta
ALPHA_HOST=postgres-alpha
ALPHA_PORT=5432
ALPHA_USERNAME=postgres
ALPHA_PASSWORD=postgres
ALPHA_DATABASE=app
BETA_HOST=postgres-beta
BETA_PORT=5432
BETA_USERNAME=postgres
BETA_PASSWORD=postgres
BETA_DATABASE=app

If DB_NAMES is unset or empty, the application starts with an empty catalog; the POST endpoints remain available. The same variables are defined in Properties/launchSettings.json for local runs against localhost:5433 / localhost:5434.

Development

dotnet build                                               # build the solution
dotnet test --filter "FullyQualifiedName!~Integration"     # unit tests only
dotnet test                                                # all tests (Docker required)

The integration suite starts a PostgreSQL container per test class via Testcontainers and exercises a class-representative matrix (base, major, minor, patch) plus targeted PostgreSQL canonicalization tests (CHECK clause whitespace, identity-column defaults, column ordering).

License

MIT

No packages depend on SchemaHashing.

Version Downloads Last updated
3.0.0 7 05/05/2026