Harness-engineering db-first-normal-form

First Normal Form (1NF)

install

source · Clone the upstream repo

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/codex/db-first-normal-form" ~/.claude/skills/intense-visions-harness-engineering-db-first-normal-form-a168ae && rm -rf "$T"

manifest: agents/skills/codex/db-first-normal-form/SKILL.md

source content

First Normal Form (1NF)

Every column holds a single atomic value, no repeating groups exist, and every row is uniquely identifiable by a primary key.

When to Use

Designing new tables from scratch
Reviewing an existing schema for normalization violations
Refactoring columns that store CSV lists, JSON arrays, or delimited strings
Migrating data from spreadsheets or flat files into a relational database
Auditing tables that lack a primary key

Instructions

First Normal Form defines the baseline for a well-structured relational table. A table is in 1NF when it satisfies three rules.

Rule 1: Every Column Holds Atomic Values

Each cell must contain a single, indivisible value. No lists, no comma-separated strings, no embedded structures.

BAD -- tags crammed into one column:

CREATE TABLE articles (
  id       SERIAL PRIMARY KEY,
  title    TEXT NOT NULL,
  tags     TEXT  -- 'postgresql,normalization,database'
);

Querying "find all articles tagged postgresql" now requires

LIKE '%postgresql%'

or string splitting -- neither is indexable or correct (matches "not-postgresql" too).

GOOD -- junction table for multi-valued relationship:

CREATE TABLE articles (
  id    SERIAL PRIMARY KEY,
  title TEXT NOT NULL
);

CREATE TABLE tags (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE article_tags (
  article_id INT REFERENCES articles(id),
  tag_id     INT REFERENCES tags(id),
  PRIMARY KEY (article_id, tag_id)
);

Now

SELECT a.title FROM articles a JOIN article_tags at ON a.id = at.article_id JOIN tags t ON at.tag_id = t.id WHERE t.name = 'postgresql';

is indexable, correct, and constraint-enforced.

Rule 2: No Repeating Groups

Do not model a variable-length list as numbered columns.

BAD -- phone1/phone2/phone3 pattern:

CREATE TABLE contacts (
  id     SERIAL PRIMARY KEY,
  name   TEXT NOT NULL,
  phone1 TEXT,
  phone2 TEXT,
  phone3 TEXT
);

This caps phones at three, wastes space when contacts have one, and makes queries painful (

WHERE phone1 = ? OR phone2 = ? OR phone3 = ?

GOOD -- separate rows:

CREATE TABLE contacts (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE contact_phones (
  id         SERIAL PRIMARY KEY,
  contact_id INT REFERENCES contacts(id),
  phone      TEXT NOT NULL,
  label      TEXT  -- 'mobile', 'work', 'home'
);

Rule 3: Every Table Has a Primary Key

Every row must be uniquely identifiable. Without a primary key, the table is a bag of rows -- duplicates are undetectable and foreign keys cannot reference it.

CREATE TABLE products (
  id    SERIAL PRIMARY KEY,
  sku   TEXT NOT NULL UNIQUE,
  name  TEXT NOT NULL,
  price NUMERIC(10,2) NOT NULL CHECK (price >= 0)
);

Worked Example: E-Commerce Order Import

A CSV export contains:

order_id, customer, items("Widget,Gadget,Bolt"), quantities("2,1,5")

Step 1 -- create atomic tables:

CREATE TABLE orders (
  id       SERIAL PRIMARY KEY,
  customer TEXT NOT NULL
);

CREATE TABLE order_lines (
  id        SERIAL PRIMARY KEY,
  order_id  INT REFERENCES orders(id),
  item_name TEXT NOT NULL,
  quantity  INT NOT NULL CHECK (quantity > 0)
);

Step 2 -- load one row per item instead of one row per order.

Anti-Patterns

JSON arrays as a normalization shortcut. Storing
```
tags jsonb DEFAULT '[]'
```
avoids the junction table but makes constraint enforcement, joins, and indexing harder. Reserve JSONB for genuinely schemaless data.
TEXT columns for structured data. A
```
metadata TEXT
```
column containing key-value pairs is a schema within a schema. Define real columns or use a typed JSONB column with a CHECK constraint.
Missing primary keys. Tables without a primary key invite duplicate rows and block foreign key references. Every table gets a primary key -- no exceptions.
Splitting atoms too far. Do not split a phone number into country code, area code, and subscriber number unless you query those parts independently. Atomicity is relative to your query patterns.

PostgreSQL Specifics

PostgreSQL arrays (

TEXT[]

INT[]

) technically violate 1NF but are acceptable for specific use cases:

Tag lists that are always read and written as a whole
Full-text search vectors (
```
tsvector
```
)
GIN-indexed containment queries (
```
WHERE tags @> ARRAY['postgresql']
```
)

When you find yourself using

unnest()

to join array elements with another table, that is the signal to normalize into a junction table instead.

MySQL Callout

MySQL lacks native array types. 1NF violations in MySQL typically manifest as comma-separated strings parsed with

FIND_IN_SET()

. This is always wrong in production -- it defeats indexing, prevents constraint enforcement, and returns incorrect results for substring matches.

Details

Why 1NF Matters

Query correctness. Atomic values enable exact-match queries and proper joins. Comma-separated values require pattern matching that produces false positives.
Indexing. B-tree indexes work on atomic column values. You cannot efficiently index into a substring of a TEXT column.
Constraint enforcement. Foreign keys, UNIQUE constraints, and CHECK constraints operate on column values. Non-atomic values bypass all of them.
Aggregation.
```
COUNT
```
,
```
SUM
```
, and
```
GROUP BY
```
operate on rows. If one row contains multiple values packed into a string, aggregations are wrong without preprocessing.

The Array Exception in PostgreSQL

PostgreSQL arrays with GIN indexes support efficient containment queries:

CREATE INDEX idx_tags ON articles USING GIN (tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgresql'];

This is valid when: (1) you never need to join tag values to another table, (2) you never need referential integrity on tag values, and (3) the array is small and treated as an opaque unit. When any of these conditions fails, normalize.

Real-World Case Study: Logging Platform Migration

A SaaS logging platform stored log labels as

labels TEXT

containing

"env=prod,service=api,region=us-east"

. Queries used

LIKE '%service=api%'

. After normalizing into a

log_labels(log_id, key, value)

table with a composite index on

(key, value)

, query latency dropped from 1200ms to 8ms for label-filtered searches. Storage increased 15% but query performance improved 150x.

Source

PostgreSQL DDL Basics
Date, C.J. "An Introduction to Database Systems" (8th Edition), Chapter 12
Codd, E.F. "A Relational Model of Data for Large Shared Data Banks" (1970)

Process

Read the rules and examples in this document.
Apply the three 1NF rules to your schema design, decomposing non-atomic columns and adding primary keys.
Verify your implementation by checking that every column holds a single value, no numbered column groups exist, and every table has a primary key.

Harness Integration

Type: knowledge -- this skill is a reference document, not a procedural workflow.
No tools or state -- consumed as context by other skills and agents.
related_skills: db-second-normal-form, db-third-normal-form, db-denormalization

Success Criteria

The three rules of 1NF are applied correctly to every table in the schema.
Anti-patterns listed in this document (CSV columns, repeating groups, missing keys) are identified and eliminated.