Claude-skill-registry CQL Type System & Schema Handling
Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cql-type-system" ~/.claude/skills/majiayu000-claude-skill-registry-cql-type-system-schema-handling && rm -rf "$T"
skills/data/cql-type-system/SKILL.md
CQL Type System & Schema Handling
This skill provides guidance on implementing the Cassandra CQL type system with schema-provided deserialization.
When to Use This Skill
- Implementing CQL type deserializers
- Parsing collection types (list, set, map)
- Handling User-Defined Types (UDTs)
- Working with frozen vs non-frozen types
- Tuple deserialization
- Schema validation
- Type-correct data generation
Core Principles
Schema-Provided Deserialization
Per PRD: schema passed in, not inferred
// Schema provides type information
fn deserialize_cell(
    data: &[u8],
    column_type: &CqlType, // From schema
) -> Result<CqlValue>
Never try to infer the type from the data alone - always use the schema.
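To make the principle concrete, here is a minimal sketch of a decoder that dispatches on the schema-supplied type. The trimmed-down `CqlType` and `CqlValue` enums and the `String` error type are assumptions made for illustration, not the project's real definitions.

```rust
use std::convert::TryFrom;

// Hypothetical, trimmed-down types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum CqlType {
    Boolean,
    Int,
    BigInt,
    Text,
}

#[derive(Debug, Clone, PartialEq)]
enum CqlValue {
    Boolean(bool),
    Int(i32),
    BigInt(i64),
    Text(String),
}

/// Decode one cell, using the schema-supplied type to pick the decoder.
fn deserialize_cell(data: &[u8], column_type: &CqlType) -> Result<CqlValue, String> {
    match column_type {
        CqlType::Boolean => match data {
            [b] => Ok(CqlValue::Boolean(*b != 0)),
            _ => Err(format!("boolean expects 1 byte, got {}", data.len())),
        },
        CqlType::Int => <[u8; 4]>::try_from(data)
            .map(|bytes| CqlValue::Int(i32::from_be_bytes(bytes)))
            .map_err(|_| format!("int expects 4 bytes, got {}", data.len())),
        CqlType::BigInt => <[u8; 8]>::try_from(data)
            .map(|bytes| CqlValue::BigInt(i64::from_be_bytes(bytes)))
            .map_err(|_| format!("bigint expects 8 bytes, got {}", data.len())),
        CqlType::Text => std::str::from_utf8(data)
            .map(|s| CqlValue::Text(s.to_owned()))
            .map_err(|e| format!("invalid UTF-8: {}", e)),
    }
}
```

The same eight bytes are a valid `bigint` and a valid `text` value, which is why the bytes alone cannot tell the decoder which one it is looking at.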
CQL Type Categories
1. Primitive Types
Fixed-Size Primitives
- boolean: 1 byte (0x00 or 0x01)
- tinyint: 1 byte signed
- smallint: 2 bytes signed, big-endian
- int: 4 bytes signed, big-endian
- bigint: 8 bytes signed, big-endian
- float: 4 bytes IEEE 754, big-endian
- double: 8 bytes IEEE 754, big-endian
- date: 4 bytes (unsigned day count, with the Unix epoch at 2^31)
- time: 8 bytes (nanoseconds since midnight)
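The fixed-size decoders all follow one shape: check the length, then convert from big-endian. The sketch below shows that shape for three of the types. The helper names are hypothetical, and the `date` handling assumes the CQL native protocol convention of an unsigned day count with the Unix epoch at 2^31.

```rust
use std::convert::TryFrom;

/// Copy exactly N bytes out of `data`, rejecting any other length.
fn fixed<const N: usize>(data: &[u8]) -> Result<[u8; N], String> {
    <[u8; N]>::try_from(data)
        .map_err(|_| format!("expected {} bytes, got {}", N, data.len()))
}

/// `smallint`: 2 bytes, signed, big-endian.
fn decode_smallint(data: &[u8]) -> Result<i16, String> {
    Ok(i16::from_be_bytes(fixed::<2>(data)?))
}

/// `double`: 8 bytes, IEEE 754, big-endian.
fn decode_double(data: &[u8]) -> Result<f64, String> {
    Ok(f64::from_be_bytes(fixed::<8>(data)?))
}

/// `date`: returns days relative to 1970-01-01 (negative means earlier).
/// Assumption: the stored value is unsigned with the epoch at 2^31.
fn decode_date_days(data: &[u8]) -> Result<i64, String> {
    let raw = u32::from_be_bytes(fixed::<4>(data)?);
    Ok(i64::from(raw) - (1i64 << 31))
}
```

Rejecting a wrong length up front keeps a corrupt cell from being silently read as a different value.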
Variable-Size Primitives
- text / varchar: UTF-8 encoded string
- blob: raw bytes
- ascii: ASCII-only string
Special Primitives
- uuid / timeuuid: 16 bytes
- inet: 4 bytes (IPv4) or 16 bytes (IPv6)
- varint: variable-length big integer
- decimal: scale (4 bytes) + unscaled varint
- duration: months, days, nanoseconds (3 VInts)
- timestamp: 8 bytes (milliseconds since Unix epoch)
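As an illustration of the two least obvious entries, the sketch below decodes `varint` as a big-endian two's-complement integer and `decimal` as a 4-byte scale followed by an unscaled varint. It narrows the result to `i128` to stay dependency-free, which is an assumption of this example; a real implementation would likely use an arbitrary-precision integer. Treating an empty varint as an error is also a choice made here.

```rust
use std::convert::TryFrom;

/// Decode a CQL `varint` (big-endian two's complement) into an i128.
/// Values wider than 16 bytes are rejected in this sketch.
fn decode_varint(data: &[u8]) -> Result<i128, String> {
    if data.is_empty() {
        return Err("varint needs at least 1 byte".to_string());
    }
    if data.len() > 16 {
        return Err(format!("varint of {} bytes does not fit in i128", data.len()));
    }
    // Sign-extend: start from all ones for a negative value, zero otherwise.
    let mut value: i128 = if data[0] & 0x80 != 0 { -1 } else { 0 };
    for &byte in data {
        value = (value << 8) | i128::from(byte);
    }
    Ok(value)
}

/// Decode a CQL `decimal` into (unscaled value, scale).
/// The represented number is unscaled * 10^(-scale).
fn decode_decimal(data: &[u8]) -> Result<(i128, i32), String> {
    if data.len() < 5 {
        return Err(format!("decimal needs at least 5 bytes, got {}", data.len()));
    }
    let (scale_bytes, unscaled) = data.split_at(4);
    let scale_bytes = <[u8; 4]>::try_from(scale_bytes).map_err(|e| e.to_string())?;
    Ok((decode_varint(unscaled)?, i32::from_be_bytes(scale_bytes)))
}
```

For example, the bytes `00 00 00 02 30 39` carry scale 2 and unscaled value 12345, that is, 123.45.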
2. Collection Types
See collections-and-udts.md for detailed format.
Collection Format:
[4 bytes: element_count (big-endian)]
[for each element:]
  [4 bytes: element_size (big-endian)]
  [bytes: element_data]
Types:
- list<T>: Ordered, allows duplicates
- set<T>: Unordered, no duplicates
- map<K,V>: Key-value pairs
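For a map, the same layout applies with the count giving the number of entries and each entry stored as a length-prefixed key followed by a length-prefixed value. The sketch below splits a serialized map into borrowed key and value slices without decoding them. The function names are hypothetical, and rejecting negative sizes inside a collection is an assumption of this example.

```rust
use std::convert::TryFrom;

/// Read a big-endian i32 from the front of `data`; return it and the rest.
fn read_i32(data: &[u8]) -> Result<(i32, &[u8]), String> {
    if data.len() < 4 {
        return Err("not enough bytes for a 4-byte integer".to_string());
    }
    let (head, rest) = data.split_at(4);
    let head = <[u8; 4]>::try_from(head).map_err(|e| e.to_string())?;
    Ok((i32::from_be_bytes(head), rest))
}

/// Read one length-prefixed element as a slice borrowed from the input.
fn read_element(data: &[u8]) -> Result<(&[u8], &[u8]), String> {
    let (size, rest) = read_i32(data)?;
    let size = usize::try_from(size).map_err(|_| "negative element size".to_string())?;
    if rest.len() < size {
        return Err(format!("element needs {} bytes, only {} left", size, rest.len()));
    }
    Ok(rest.split_at(size))
}

/// Split a serialized map<K,V> into raw (key, value) byte slices.
fn split_map(data: &[u8]) -> Result<Vec<(&[u8], &[u8])>, String> {
    let (count, mut rest) = read_i32(data)?;
    let count = usize::try_from(count).map_err(|_| "negative entry count".to_string())?;
    // Every entry needs at least two 4-byte prefixes, so cap the preallocation.
    let mut entries = Vec::with_capacity(count.min(rest.len() / 8));
    for _ in 0..count {
        let (key, after_key) = read_element(rest)?;
        let (value, after_value) = read_element(after_key)?;
        entries.push((key, value));
        rest = after_value;
    }
    Ok(entries)
}
```

Each slice can then be passed to the decoder for `K` or `V` that the schema names.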
3. Tuple Types
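Format:
[for each element in declared order:]
  [4 bytes: element_size (-1 for null)]
  [if size > 0:] [bytes: element_data]
No element count is stored; the schema supplies the number and types of the elements. Each element carries a 4-byte length prefix (as UDT fields do), followed by that element type's own serialization.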
4. User-Defined Types (UDTs)
Format:
[for each field in schema order:]
  [4 bytes: field_size (-1 for null, 0 for empty, >0 for data)]
  [if size > 0:] [bytes: field_data]
UDT schema defines field names and types.
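The format above can be sketched as a splitter that walks the fields in schema order and keeps null distinct from empty. The function name and the `String` error type are hypothetical; `field_count` stands in for the number of fields in the UDT's schema definition.

```rust
use std::convert::TryFrom;

/// Split a serialized UDT value into per-field slices, in schema order.
/// `None` marks a null field (negative size); `Some(&[])` marks an empty one.
fn split_udt_fields(mut data: &[u8], field_count: usize) -> Result<Vec<Option<&[u8]>>, String> {
    let mut fields = Vec::with_capacity(field_count);
    for index in 0..field_count {
        if data.len() < 4 {
            return Err(format!("field {}: missing size prefix", index));
        }
        let (prefix, rest) = data.split_at(4);
        let prefix = <[u8; 4]>::try_from(prefix).map_err(|e| e.to_string())?;
        let size = i32::from_be_bytes(prefix);
        if size < 0 {
            // Null field: no data bytes follow the prefix.
            fields.push(None);
            data = rest;
            continue;
        }
        let size = size as usize;
        if rest.len() < size {
            return Err(format!("field {}: needs {} bytes, only {} left", index, size, rest.len()));
        }
        let (value, remaining) = rest.split_at(size);
        fields.push(Some(value));
        data = remaining;
    }
    Ok(fields)
}
```

Returning `Option<&[u8]>` lets the caller map `None` to a null value and still decode a zero-length slice as, say, an empty string.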
5. Frozen vs Non-Frozen
Frozen types:
- Serialized as single blob
- Cannot update individual elements
- Used in primary keys
- Nested collections must be frozen
Non-frozen collections:
- Can update individual elements
- Only allowed at top level (not nested)
- Uses tombstones for deletions
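These rules lend themselves to a small schema check. The sketch below walks a column type and rejects any collection nested inside another collection unless it is wrapped in `frozen`. The reduced `CqlType` enum and the function names are hypothetical, and the sketch assumes that everything underneath a `frozen<...>` wrapper counts as frozen.

```rust
// Hypothetical, reduced type tree for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum CqlType {
    Int,
    Text,
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Frozen(Box<CqlType>),
}

/// Validate a column type: non-frozen collections are only allowed at top level.
fn validate_column_type(column_type: &CqlType) -> Result<(), String> {
    check(column_type, true)
}

fn check(t: &CqlType, top_level: bool) -> Result<(), String> {
    match t {
        CqlType::Int | CqlType::Text => Ok(()),
        // Assumption: a frozen value is one blob, so its contents need no further check.
        CqlType::Frozen(_) => Ok(()),
        CqlType::List(element) | CqlType::Set(element) => {
            if !top_level {
                return Err(format!("nested collection must be frozen: {:?}", t));
            }
            check(element, false)
        }
        CqlType::Map(key, value) => {
            if !top_level {
                return Err(format!("nested collection must be frozen: {:?}", t));
            }
            check(key, false)?;
            check(value, false)
        }
    }
}
```

With this check, `list<list<int>>` is rejected while `list<frozen<list<int>>>` passes.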
Type Deserialization Patterns
Zero-Copy Pattern
use bytes::Bytes;

fn deserialize_text(data: Bytes) -> Result<String> {
    // Validation borrows the buffer without copying
    let s = std::str::from_utf8(&data)?;
    // Returning an owned String copies the bytes; keep the Bytes
    // (or borrow a &str) when the caller can avoid the copy
    Ok(s.to_string())
}

fn deserialize_blob(data: Bytes) -> Result<Bytes> {
    // Zero-copy: hand back the same reference-counted buffer
    Ok(data)
}
Length-Prefixed Pattern
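fn deserialize_length_prefixed(data: &[u8]) -> Result<(Bytes, &[u8])> {
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    let size = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    if size < 0 {
        // Null. This is returned as empty Bytes, so callers that must tell
        // null from empty (size 0) should return Option<Bytes> instead.
        return Ok((Bytes::new(), &data[4..]));
    }
    let size = size as usize;
    // Compare against the bytes left after the prefix to avoid overflow
    if data.len() - 4 < size {
        return Err(Error::NotEnoughBytes);
    }
    // copy_from_slice copies; slice a Bytes input instead for true zero-copy
    let value = Bytes::copy_from_slice(&data[4..4 + size]);
    let remaining = &data[4 + size..];
    Ok((value, remaining))
}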
Collection Pattern
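fn deserialize_list(
    data: &[u8],
    element_type: &CqlType,
) -> Result<Vec<CqlValue>> {
    // Guard the count read; indexing a short buffer would panic
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    // A negative count is corrupt input (reported with the only error
    // variant shown in this document)
    let count = usize::try_from(count).map_err(|_| Error::NotEnoughBytes)?;
    let mut offset = 4;
    // Each element needs at least a 4-byte prefix, so cap the preallocation
    let mut elements = Vec::with_capacity(count.min(data.len() / 4));
    for _ in 0..count {
        let (element_data, remaining) = deserialize_length_prefixed(&data[offset..])?;
        let element = deserialize_value(&element_data, element_type)?;
        elements.push(element);
        offset = data.len() - remaining.len();
    }
    Ok(elements)
}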
Schema Handling
Schema Sources
- Statistics.db: Serialization header with column definitions
- System tables: system_schema.tables, system_schema.columns
- CQL schema file: For test data generation
Schema Representation
struct TableSchema {
    keyspace: String,
    table: String,
    partition_keys: Vec<ColumnDef>,
    clustering_keys: Vec<ColumnDef>,
    regular_columns: Vec<ColumnDef>,
    static_columns: Vec<ColumnDef>,
}

struct ColumnDef {
    name: String,
    cql_type: CqlType,
}

enum CqlType {
    // Primitives
    Boolean,
    Int,
    BigInt,
    Text,
    Uuid,
    Timestamp,
    // ... more primitives

    // Collections
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),

    // Complex
    Tuple(Vec<CqlType>),
    Udt(UdtDef),

    // Modifiers
    Frozen(Box<CqlType>),
}
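When the schema comes from a CQL schema file or from the type strings in the system tables, those strings have to be turned into a type tree like the one above. The following is a rough sketch of such a parser, not the project's implementation. It uses a hypothetical reduced enum in which every non-parameterized name, including UDT names, becomes `Named`.

```rust
// Hypothetical reduced type tree; `Named` covers primitives and UDT names.
#[derive(Debug, Clone, PartialEq)]
enum CqlType {
    Named(String),
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Tuple(Vec<CqlType>),
    Frozen(Box<CqlType>),
}

/// Split `s` on commas that are not nested inside angle brackets.
fn split_top_level(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, c) in s.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(s[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(s[start..].trim());
    parts
}

/// Parse a type string such as `map<text, frozen<list<int>>>`.
fn parse_type(s: &str) -> Result<CqlType, String> {
    let s = s.trim();
    let open = match s.find('<') {
        Some(i) => i,
        None => {
            if s.is_empty() || s.contains('>') || s.contains(',') {
                return Err(format!("invalid type name `{}`", s));
            }
            return Ok(CqlType::Named(s.to_ascii_lowercase()));
        }
    };
    if !s.ends_with('>') {
        return Err(format!("unbalanced brackets in `{}`", s));
    }
    let name = s[..open].trim().to_ascii_lowercase();
    let args = split_top_level(&s[open + 1..s.len() - 1])
        .into_iter()
        .map(parse_type)
        .collect::<Result<Vec<_>, _>>()?;
    let mut args = args.into_iter();
    match (name.as_str(), args.len()) {
        ("list", 1) => Ok(CqlType::List(Box::new(args.next().unwrap()))),
        ("set", 1) => Ok(CqlType::Set(Box::new(args.next().unwrap()))),
        ("frozen", 1) => Ok(CqlType::Frozen(Box::new(args.next().unwrap()))),
        ("map", 2) => {
            let key = args.next().unwrap();
            let value = args.next().unwrap();
            Ok(CqlType::Map(Box::new(key), Box::new(value)))
        }
        ("tuple", n) if n >= 1 => Ok(CqlType::Tuple(args.collect())),
        (other, n) => Err(format!("unsupported type `{}` with {} argument(s)", other, n)),
    }
}
```

The sketch does not handle quoted identifiers or keyspace-qualified UDT names; those would need extra handling in the `Named` branch.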
PRD Alignment
Supports Milestone M1 (Core Reading Library):
- All CQL types including collections & UDTs
- Schema-provided deserialization (not inferred)
- Zero-copy patterns where possible
Supports Milestone M5 (Write Support):
- Type-correct serialization
- Schema validation
Common Pitfalls
1. Inferring Types
❌ Wrong: Look at data to guess type
✅ Right: Use schema to know type
2. Copying Unnecessarily
❌ Wrong: Vec<u8> for every field
✅ Right: Bytes with zero-copy slicing
3. Ignoring Null Handling
❌ Wrong: Assume all fields present
✅ Right: Check for null (-1 size prefix)
4. Frozen Semantics
❌ Wrong: Try to update frozen collection elements
✅ Right: Replace entire frozen value
5. Nested Collections
❌ Wrong: Allow non-frozen nested collections
✅ Right: Nested collections must be frozen
Type System References
Detailed specifications in:
- cql-types-reference.md - Complete type catalog
- collections-and-udts.md - Collection and UDT formats
Testing
Generate type-correct test data:
# Use test-data-management skill for Docker-based generation
cd test-data
./scripts/start-clean.sh
./scripts/generate.sh
Validate parsing against sstabledump:
sstabledump test-data/datasets/sstables/keyspace/table/*.db
Next Steps
When adding new type support:
- Add to CqlType enum
- Implement deserializer with zero-copy where possible
- Add serializer (for M5 write support)
- Create property tests with edge cases
- Generate test data that uses the new type
- Validate against sstabledump