
Chapter 8: Schema Validation

Generating Schemas Automatically

Writing schemas by hand for large APIs is painful. A response with 30 fields and nested objects can easily take an hour to type out. The good news is that you can auto-generate a schema from a sample response. The bad news is that auto-generated schemas are almost always too loose, so you must tighten them.

Tools for Auto-Generation

Tool	URL / Install Command	Best For
jsonschema.net	https://www.jsonschema.net/	Quick browser-based generation. Paste JSON, get schema.
quicktype	https://quicktype.io/	Generates schema + typed code (Java, TypeScript, etc.)
GenSON (Python)	pip install genson	CLI tool. Good for scripting schema generation.
json-schema-generator (npm)	npm install json-schema-generator	Node.js based. Good for JS/TS projects.

Step-by-Step: Generate and Refine

Auto-Generate Schema Workflow

1. Hit your API and copy a sample response. Use a response that has ALL fields populated, not just the minimum.
2. Paste the response into jsonschema.net or quicktype.
3. Download the generated schema.
4. Review and tighten: add "required" arrays (auto-generators often skip this).
5. Add constraints: minimum, maximum, pattern, enum values.
6. Set "additionalProperties": false if you want strict validation.
7. Test against multiple API responses, not just the one you generated from.
8. Commit the schema file to version control.
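Steps 4 to 6 of the workflow can also be scripted, so the same tightening is applied every time a schema is regenerated. The sketch below is a minimal illustration using only the Python standard library. The `tighten` helper and its parameters are hypothetical names invented for this example, it only handles the top-level object, and marking every declared property as required is an assumption you should review per field.

```python
import copy
import json


def tighten(schema, constraints=None, strict=True):
    """Return a tightened copy of an auto-generated object schema.

    Hypothetical helper for illustration (top-level properties only):
    - marks every declared property as required (an assumption: review it)
    - merges per-field constraints such as minimum, enum, or pattern
    - optionally forbids undeclared properties
    """
    tightened = copy.deepcopy(schema)  # leave the generated schema untouched
    properties = tightened.get("properties", {})
    tightened["required"] = sorted(properties)
    for field, extra in (constraints or {}).items():
        if field not in properties:
            raise KeyError(f"constraint for unknown field: {field}")
        properties[field].update(extra)
    if strict:
        tightened["additionalProperties"] = False
    return tightened


# A small auto-generated schema, as a tool might produce it
generated = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "status": {"type": "string"},
    },
}

tight = tighten(
    generated,
    constraints={
        "id": {"minimum": 1},
        "status": {"enum": ["active", "inactive", "suspended"]},
    },
)
print(json.dumps(tight, indent=2))
```

Keeping the constraints in a separate structure like this means you can regenerate the base schema when the API changes without losing the manual tightening.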

Example: Before and After Tightening

auto-generated-loose.json

Auto-generated: too loose, accepts almost anything.
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "email": { "type": "string" },
    "status": { "type": "string" },
    "balance": { "type": "number" }
  }
}

Problems with this schema:

1. No "required": any field can be missing.
2. No constraints: name can be empty, balance can be negative.
3. Status accepts any string: "asdf" passes.
4. additionalProperties defaults to true: extra fields pass.

manually-tightened.json

Manually tightened: catches real bugs.
{
  "type": "object",
  "required": ["id", "name", "email", "status", "balance"],
  "properties": {
    "id": {
      "type": "integer",
      "minimum": 1
    },
    "name": {
      "type": "string",
      "minLength": 1,
      "maxLength": 200
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "status": {
      "type": "string",
      "enum": ["active", "inactive", "suspended"]
    },
    "balance": {
      "type": "number",
      "minimum": 0
    }
  },
  "additionalProperties": false
}

Auto-Generated Schema

✗ No required fields: missing fields pass
✗ No constraints: empty strings and negatives pass
✗ No enum: any random string is valid for status
✗ additionalProperties defaults to true: extra fields are ignored
✗ Generated in 5 seconds
✗ Catches almost nothing

Manually Tightened Schema

✓ All critical fields marked required
✓ minLength, minimum, format constraints added (some validators only enforce format when format checking is switched on)
✓ enum restricts status to known values
✓ additionalProperties: false catches extra fields
✓ Takes 15 minutes more
✓ Catches structural regressions effectively
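To make the difference concrete, the sketch below runs one deliberately broken response through both schemas. It uses a tiny hand-rolled checker (the hypothetical `check` function) that understands only the keywords used in this example; it ignores "format" and is not a real JSON Schema validator. In an actual test suite you would use a proper validation library instead. The `bad` response is invented test data.

```python
LOOSE = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "email": {"type": "string"},
        "status": {"type": "string"},
        "balance": {"type": "number"},
    },
}

TIGHT = {
    "type": "object",
    "required": ["id", "name", "email", "status", "balance"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string", "minLength": 1, "maxLength": 200},
        "email": {"type": "string", "format": "email"},  # format is not checked here
        "status": {"type": "string", "enum": ["active", "inactive", "suspended"]},
        "balance": {"type": "number", "minimum": 0},
    },
    "additionalProperties": False,
}

NUMERIC = (int, float)
TYPES = {"integer": int, "number": NUMERIC, "string": str}


def check(instance, schema):
    """Return a list of violations for a flat object schema (sketch only)."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{name}: missing required field")
    if schema.get("additionalProperties") is False:
        for name in instance:
            if name not in props:
                errors.append(f"{name}: unexpected field")
    for name, rules in props.items():
        if name not in instance:
            continue
        value = instance[name]
        expected = TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{name}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} is not an allowed value")
        if "minimum" in rules and isinstance(value, NUMERIC) and value < rules["minimum"]:
            errors.append(f"{name}: {value} is below minimum {rules['minimum']}")
        if isinstance(value, str):
            if len(value) < rules.get("minLength", 0):
                errors.append(f"{name}: shorter than minLength")
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                errors.append(f"{name}: longer than maxLength")
    return errors


# Invented broken response: email missing, bad values, one extra field
bad = {"id": 0, "name": "", "status": "asdf", "balance": -50.0, "debug": True}

loose_errors = check(bad, LOOSE)
tight_errors = check(bad, TIGHT)
print("loose:", loose_errors)
for line in tight_errors:
    print("tight:", line)
```

With this input the loose schema reports no violations, while the tightened schema reports six: the missing email, the extra debug field, and one each for id, name, status, and balance.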

Using GenSON from Command Line

genson-cli.sh

# Install GenSON
pip install genson

# Generate schema from a JSON file
genson response.json > schema.json

# Generate from multiple responses (merges them)
genson response1.json response2.json response3.json > schema.json

# Pipe from curl directly
curl -s https://jsonplaceholder.typicode.com/users/1 | genson > user-schema.json

Pro move: generate the schema from 5-10 different API responses, not just one. GenSON merges them and creates a schema that covers all variations. Then add your constraints on top. This gives you the best starting point.
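The sketch below illustrates why merging several samples gives a better starting point than a single one. It is a rough standard-library imitation of the general idea, not GenSON's implementation, and GenSON's own merging rules may differ in detail. The `infer_schema` and `json_type` functions and the sample data are hypothetical, and only flat objects are handled.

```python
import json

# Order matters: bool must be tested before int, because bool subclasses int
JSON_TYPES = [
    (bool, "boolean"),
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (list, "array"),
    (dict, "object"),
]


def json_type(value):
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    for py_type, name in JSON_TYPES:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"not a JSON value: {value!r}")


def infer_schema(samples):
    """Infer a flat object schema from several sample responses.

    A field is listed as required only if it appears in every sample,
    and a field seen with several types gets all of them.
    """
    types = {}       # field name -> set of type names seen
    required = None  # intersection of the keys of all samples
    for sample in samples:
        for field, value in sample.items():
            types.setdefault(field, set()).add(json_type(value))
        keys = set(sample)
        required = keys if required is None else required & keys
    properties = {}
    for field in sorted(types):
        seen = sorted(types[field])
        properties[field] = {"type": seen[0] if len(seen) == 1 else seen}
    return {
        "type": "object",
        "properties": properties,
        "required": sorted(required or []),
    }


# Invented samples: nickname is sometimes null and sometimes absent
samples = [
    {"id": 1, "name": "Ada", "nickname": "ada"},
    {"id": 2, "name": "Grace", "nickname": None},
    {"id": 3, "name": "Linus"},
]

schema = infer_schema(samples)
print(json.dumps(schema, indent=2))
```

A schema generated from the first sample alone would treat nickname as an always-present string. Merging all three samples shows that it is optional and nullable, which is exactly the kind of variation you want captured before you start adding constraints.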

Never use an auto-generated schema directly in your test suite without reviewing it. It will give you false confidence — tests pass, but they catch nothing. The schema is just a starting point. The value comes from the constraints YOU add.

Q: How do you create JSON Schemas efficiently for a large API with many endpoints?

A: I use a combination of auto-generation and manual refinement. First, I hit each endpoint and save sample responses. Then I use tools like jsonschema.net or GenSON to auto-generate base schemas. The key step is manual tightening — I add required arrays, enum constraints for fields with fixed values, pattern for IDs and codes, minimum/maximum for numbers, and set additionalProperties to false. I generate from multiple response samples to cover variations. For shared sub-schemas like pagination or error responses, I use $ref to avoid duplication.

Key Point: Auto-generate schemas from sample responses using tools like jsonschema.net or GenSON, then manually tighten them with required, enum, pattern, and additionalProperties: false. The auto-generated schema is the starting point, not the final product.
