JSON Type Guide

MatrixOne Python SDK provides comprehensive support for JSON data type with SQLAlchemy-standard syntax.

Overview

The SDK’s JSON type provides:

  • SQLAlchemy Standard Syntax: Use familiar column['key'] dictionary-style access

  • Automatic Type Conversion: .astext for strings, .cast() for numbers

  • Boolean Handling: Native Python True/False support

  • Custom JSON Functions: Direct access to MatrixOne JSON functions

  • Full ORM Integration: Works seamlessly with SQLAlchemy queries

Quick Start

Basic Usage

from matrixone import Client
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import JSON
from sqlalchemy import Column, Integer, String

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(200))
    specifications = Column(JSON)  # MatrixOne JSON type

client = Client()
client.connect(database='test')
client.create_table(Product)

Inserting JSON Data

You can insert Python dictionaries directly - they’re automatically serialized:

# Using client.insert()
client.insert(Product, {
    'id': 1,
    'name': 'Laptop',
    'specifications': {
        'brand': 'Dell',
        'ram': 16,
        'storage': '512GB SSD',
        'active': True
    }
})

# Using client.batch_insert()
products = [
    {
        'id': 2,
        'name': 'Mouse',
        'specifications': {
            'brand': 'Logitech',
            'wireless': True,
            'price': 99.99
        }
    },
    # ... more products
]
client.batch_insert(Product, products)

SQLAlchemy Standard Syntax

Dictionary-Style Access

Use column['key'] to access JSON fields:

# Query with JSON field access
results = client.query(Product).filter(
    Product.specifications['brand'] == 'Dell'
).all()

# Nested access
results = client.query(Product).filter(
    Product.specifications['hardware']['processor'] == 'Intel i7'
).all()

Note

String values are automatically quoted in comparisons. Use .astext for explicit string extraction.

Text Extraction (.astext)

The .astext property removes JSON quotes automatically:

from sqlalchemy import select

# Extract brand as plain text (no quotes)
stmt = select(
    Product.name,
    Product.specifications['brand'].astext.label('brand')
)

results = client.execute(stmt).fetchall()
for row in results:
    print(f"{row.name}: {row.brand}")  # No quotes around brand

# Use in WHERE clause
results = client.query(Product).filter(
    Product.specifications['category'].astext == 'Electronics'
).all()

Type Casting (.cast())

Cast JSON values to SQL types for numeric operations:

from sqlalchemy import Numeric

# Cast price to numeric for comparison
expensive_products = client.query(Product).filter(
    Product.specifications['price'].cast(Numeric) > 500
).all()

# Use in SELECT
stmt = select(
    Product.name,
    Product.specifications['price'].cast(Numeric).label('price')
).order_by(
    Product.specifications['price'].cast(Numeric).desc()
)

Boolean Handling

Work with JSON boolean values naturally:

# Use Python True/False directly (recommended)
active_products = client.query(Product).filter(
    Product.specifications['active'] == True
).all()

inactive_products = client.query(Product).filter(
    Product.specifications['active'] == False
).all()

# Or use string literals
verified = client.query(Product).filter(
    Product.specifications['verified'] == 'true'
).all()

# Use .asbool for explicit boolean conversion
stmt = select(
    Product.name,
    Product.specifications['active'].asbool.label('is_active')
)

Complete Example

from sqlalchemy import Numeric, select

# Complex query with multiple JSON operations
stmt = select(
    Product.name,
    Product.specifications['brand'].astext.label('brand'),
    Product.specifications['category'].astext.label('category'),
    Product.specifications['price'].cast(Numeric).label('price'),
    Product.specifications['active'].label('is_active')
).where(
    Product.specifications['active'] == True
).where(
    Product.specifications['price'].cast(Numeric) > 100
).order_by(
    Product.specifications['price'].cast(Numeric).desc()
)

results = client.execute(stmt).fetchall()
for row in results:
    print(f"{row.name} ({row.brand}): ${row.price}")

Custom JSON Functions

For advanced use cases, the SDK provides direct access to MatrixOne JSON functions.

json_extract

Extract values from JSON columns:

from matrixone.sqlalchemy_ext import json_extract

# Extract single field
stmt = select(
    Product.name,
    json_extract(Product.specifications, '$.brand').label('brand')
)

# Extract nested field
stmt = select(
    json_extract(Product.specifications, '$.hardware.processor')
)

# Extract with multiple paths
stmt = select(
    json_extract(Product.specifications, '$.brand', '$.model')
)

json_extract_string

Extract text without JSON quotes:

from matrixone.sqlalchemy_ext import json_extract_string

stmt = select(
    Product.name,
    json_extract_string(Product.specifications, '$.brand').label('brand')
)
# Returns: 'Dell' (not '"Dell"')

json_extract_float64

Extract numeric values:

from matrixone.sqlalchemy_ext import json_extract_float64

stmt = select(
    Product.name,
    json_extract_float64(Product.specifications, '$.price').label('price')
)

Modifying JSON Data

Use json_set, json_insert, and json_replace for updates:

from matrixone.sqlalchemy_ext import json_set, json_insert, json_replace
from sqlalchemy import update

# json_set: Update or insert
stmt = update(Product).values(
    specifications=json_set(
        Product.specifications,
        '$.ram', 32,
        '$.warranty', '3 years'
    )
).where(Product.id == 1)
client.execute(stmt)

# json_insert: Only insert new fields
stmt = update(Product).values(
    specifications=json_insert(
        Product.specifications,
        '$.warranty', '2 years'  # Only added if doesn't exist
    )
)

# json_replace: Only update existing fields
stmt = update(Product).values(
    specifications=json_replace(
        Product.specifications,
        '$.price', 999.99  # Only updated if exists
    )
)

JSON Path Expressions

MatrixOne supports standard JSON path syntax:

Path Expression

Description

'$.key'

Access object key at root level

'$.key1.key2'

Access nested object keys

'$[0]'

Access array element by index

'$.items[0]'

Access array element in object

'$[*]'

All array elements (wildcard)

'$.*'

All object values (wildcard)

'$**.key'

Recursive search for key

Examples:

from matrixone.sqlalchemy_ext import json_extract

# Array access
first_feature = json_extract(Product.specifications, '$.features[0]')

# Nested objects
cpu = json_extract(Product.specifications, '$.hardware.processor')

# Wildcards
all_values = json_extract(Product.specifications, '$.*')

Best Practices

  1. Use Standard Syntax

    Prefer column['key'] over json_extract():

    # ✅ Recommended: Standard syntax
    Product.specifications['brand'].astext
    
    # ⚠️ Less convenient: Direct function
    json_extract_string(Product.specifications, '$.brand')
    
  2. Type Conversions

    Always cast for numeric operations:

    # ✅ Correct: Cast to Numeric
    Product.specifications['price'].cast(Numeric) > 100
    
    # ❌ Wrong: String comparison
    Product.specifications['price'] > 100
    
  3. Boolean Comparisons

    Use Python booleans directly:

    # ✅ Recommended: Python boolean
    Product.specifications['active'] == True
    
    # ✅ Also works: String
    Product.specifications['active'] == 'true'
    
  4. Batch Inserts

    Use batch_insert() for multiple records:

    # ✅ Efficient: Single batch
    client.batch_insert(Product, products_list)
    
    # ❌ Inefficient: Multiple inserts
    for product in products_list:
        client.insert(Product, product)
    
  5. Index JSON Extracts

    For frequently queried JSON fields, consider extracting to regular columns:

    # Add regular columns for frequently queried fields
    class Product(Base):
        __tablename__ = 'products'
        id = Column(Integer, primary_key=True)
        name = Column(String(200))
        brand = Column(String(100))  # Extracted from JSON
        specifications = Column(JSON)  # Full data
    

Common Patterns

Filtering by Multiple JSON Conditions

from sqlalchemy import and_

results = client.query(Product).filter(
    and_(
        Product.specifications['active'] == True,
        Product.specifications['category'].astext == 'Electronics',
        Product.specifications['price'].cast(Numeric) > 500
    )
).all()

Ordering by JSON Fields

from sqlalchemy import desc

# Order by price (descending)
results = client.query(Product).order_by(
    Product.specifications['price'].cast(Numeric).desc()
).all()

Aggregating JSON Data

from sqlalchemy import func

# Count by brand
stmt = select(
    Product.specifications['brand'].astext.label('brand'),
    func.count().label('count')
).group_by(
    Product.specifications['brand']
)

results = client.execute(stmt).fetchall()

Checking for NULL or Missing Keys

from sqlalchemy import and_, or_

# Find products where warranty is NULL or missing
results = client.query(Product).filter(
    or_(
        Product.specifications['warranty'].is_(None),
        json_extract(Product.specifications, '$.warranty').is_(None)
    )
).all()

Troubleshooting

Common Issues

Issue: String comparisons not working

# ❌ May not work as expected
Product.specifications['brand'] == 'Dell'

Solution: JSON strings include quotes. Use .astext or let the SDK handle it:

# ✅ Automatic quote handling
Product.specifications['brand'] == 'Dell'  # SDK handles quotes

# ✅ Or explicit with astext
Product.specifications['brand'].astext == 'Dell'

Issue: Numeric comparisons not working

# ❌ String comparison
Product.specifications['price'] > 100

Solution: Cast to numeric type:

# ✅ Numeric comparison
Product.specifications['price'].cast(Numeric) > 100

Issue: Boolean values not matching

# May not work consistently
Product.specifications['active'] == 1

Solution: Use Python boolean or explicit string:

# ✅ Use Python boolean
Product.specifications['active'] == True

# ✅ Or explicit string
Product.specifications['active'] == 'true'

Performance Considerations

  1. JSON extraction is slower than regular columns

    For frequently queried fields, extract to regular columns.

  2. Use specific extraction functions

    json_extract_string and json_extract_float64 are more efficient than json_extract with casting.

  3. Batch operations

    Always use batch_insert() for multiple records.

  4. Index considerations

    MatrixOne doesn’t support direct indexing on JSON paths. Extract important fields to regular columns and index those.

API Reference

For detailed API documentation, see:

  • API Reference - Complete API reference

  • matrixone.sqlalchemy_ext.JSON - JSON type class

  • matrixone.sqlalchemy_ext.json_extract - JSON extraction functions

  • matrixone.sqlalchemy_ext.json_set - JSON modification functions

Examples

For complete working examples, see:

  • examples/example_24_query_update.py - Query and update with JSON

  • sdk_demo/test_json_standard_syntax.py - Standard syntax demonstrations

  • sdk_demo/test_json_bool_handling.py - Boolean handling examples

See Also