MatrixOne Python SDK Documentation

Welcome to the MatrixOne Python SDK documentation!

The MatrixOne Python SDK provides a comprehensive, high-level interface for MatrixOne database operations, including SQLAlchemy-like ORM interface, vector similarity search, fulltext search, snapshot management, PITR (Point-in-Time Recovery), restore operations, table cloning, account management, pub/sub operations, and mo-ctl integration. The SDK is designed for both synchronous and asynchronous operations with full type safety and extensive documentation.

Danger

🚨 MUST READ: Column Naming Convention

Always use lowercase with underscores (snake_case) for column names!

Using camelCase will cause SELECT queries to fail. See Column Naming Conventions for details.

# ❌ userName = Column(String(50))    # Fails in SELECT!
# ✅ user_name = Column(String(50))   # Works perfectly!

Data Management

Features

  • 🚀 High Performance: Optimized for MatrixOne database operations with connection pooling

  • 🔄 Async Support: Full async/await support with AsyncClient for non-blocking operations

  • 🧠 Vector Search: Advanced vector similarity search with HNSW and IVF indexing

  • 🔍 Fulltext Search: Powerful fulltext search with BM25 and TF-IDF algorithms

  • 📊 Metadata Analysis: Comprehensive table and column metadata analysis with statistics

  • 📸 Snapshot Management: Create and manage database snapshots at multiple levels

  • Point-in-Time Recovery: PITR functionality for precise data recovery

  • 🔄 Table Cloning: Clone databases and tables efficiently with data replication

  • 👥 Account Management: Comprehensive user, role, and permission management

  • 📊 Pub/Sub: Real-time publication and subscription support

  • 🔧 Version Management: Automatic backend version detection and compatibility checking

  • 🛡️ Type Safety: Full type hints support with comprehensive documentation

  • 🌿 Branch Management: Git-style version control with BranchManager and SQLAlchemy-style statement builders

  • 📚 SQLAlchemy Integration: Seamless SQLAlchemy integration with enhanced ORM features

  • 🔗 Enhanced Query Building: Advanced query building with SQLAlchemy expressions

  • 🎯 Logical Operations: Enhanced logical operations including logical_in functionality

  • 🛠️ MO-DIAG Tool: Interactive diagnostic tool for index inspection and health monitoring

  • 📖 Comprehensive Documentation: Detailed API documentation with examples

Quick Start

Basic Connection:

from matrixone import Client

# Create and connect to MatrixOne
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)

# Get backend version (auto-detected)
version = client.get_backend_version()
print(f"MatrixOne version: {version}")

client.disconnect()

Transaction Management (Recommended):

from matrixone import Client
from matrixone.orm import Base, Column, Integer, String
from sqlalchemy import select, insert, update, delete

# Define ORM model
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))
    age = Column(Integer)

client = Client()
client.connect(database='test')

# Create table
client.create_table(User)

# Use session for atomic transactions (recommended)
with client.session() as session:
    # All operations are atomic - succeed or fail together
    session.execute(insert(User).values(name='Alice', email='alice@example.com', age=30))
    session.execute(update(User).where(User.age < 18).values(status='minor'))

    # Query using SQLAlchemy select
    stmt = select(User).where(User.age > 25)
    result = session.execute(stmt)
    for user in result.scalars():
        print(f"User: {user.name}, Age: {user.age}")

    # Commits automatically on success, rolls back on error

client.disconnect()

Note

Why use ``session()``?

  • Atomic operations - all succeed or fail together

  • Automatic rollback on errors

  • Access to all managers (snapshots, clones, load_data, etc.)

  • Full SQLAlchemy ORM support with type safety

See Quick Start and ORM Usage Guide for detailed examples.

Vector Search:

from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base
import numpy as np

# Define vector table model
Base = declarative_base()

class Document(Base):
    __tablename__ = 'documents'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    embedding = create_vector_column(384, 'f32')

# Create table and insert initial data first (recommended)
client.create_table(Document)

# ⚠️ Best practice: Insert initial data BEFORE creating IVF index
client.insert('documents', {
    'id': 1,
    'title': 'AI Research',
    'content': 'Machine learning paper...',
    'embedding': np.random.rand(384).tolist()
})

# Create IVF index after initial data (better clustering)
client.vector_ops.enable_ivf()
client.vector_ops.create_ivf(
    'documents',  # Table name as positional argument
    name='idx_embedding',
    column='embedding',
    lists=100  # Number of clusters
)

# IVF supports dynamic updates - can continue inserting
client.insert('documents', {'id': 2, ...})  # ✅ Works fine

results = client.vector_ops.similarity_search(
    'documents',  # Table name as positional argument
    vector_column='embedding',
    query_vector=np.random.rand(384).tolist(),
    limit=5,
    distance_type='cosine'
)

⭐ Monitor IVF Index Health (Critical for Production):

# Get IVF index statistics - Essential for monitoring index quality
stats = client.vector_ops.get_ivf_stats("documents", "embedding")

# Check index balance
counts = stats['distribution']['centroid_count']
balance_ratio = max(counts) / min(counts) if min(counts) > 0 else float('inf')

print(f"Total centroids: {len(counts)}")
print(f"Total vectors: {sum(counts)}")
print(f"Balance ratio: {balance_ratio:.2f}")

# Rebuild if imbalanced (ratio > 2.5)
if balance_ratio > 2.5:
    print("⚠️  Index needs rebuilding for optimal performance!")
    # See vector_guide for detailed monitoring and rebuild procedures

Note

HNSW Index Notes:

  • Requires BigInteger primary key (not Integer)

  • 🚧 Currently read-only after creation (dynamic updates coming soon)

  • Workaround: Drop index → Modify data → Recreate index

See Vector Search Guide for details on HNSW vs IVF selection.

JSON Data Handling:

from matrixone.sqlalchemy_ext import JSON
from sqlalchemy import Column, Integer, String, Numeric

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(200))
    specifications = Column(JSON)  # MatrixOne JSON type

# Insert with Python dictionaries (auto-serialized)
client.insert(Product, {
    'id': 1,
    'name': 'Laptop',
    'specifications': {
        'brand': 'Dell',
        'ram': 16,
        'price': 1299.99,
        'active': True
    }
})

# SQLAlchemy standard syntax
results = client.query(Product).filter(
    Product.specifications['brand'] == 'Dell'
).filter(
    Product.specifications['price'].cast(Numeric) > 1000
).all()

# Extract text without quotes
stmt = select(
    Product.name,
    Product.specifications['brand'].astext.label('brand')
)

Fulltext Search:

# Create fulltext index
client.fulltext_index.create(
    'documents',
    'ftidx_content',
    ['title', 'content'],
    algorithm='BM25'
)

# Search with natural language
from matrixone.sqlalchemy_ext.fulltext_search import natural_match

results = client.query(Document).filter(
    natural_match('title', 'content', query='machine learning techniques')
).all()

Metadata Analysis:

# Analyze table metadata
metadata = client.metadata.scan(
    dbname='test',
    tablename='documents'
)

for row in metadata:
    print(f"{row.column_name}: {row.data_type}")
    print(f"  Nulls: {row.null_count}, Distinct: {row.distinct_count}")

# Get table statistics
stats = client.metadata.get_table_brief_stats(
    dbname='test',
    tablename='documents'
)
print(f"Rows: {stats.row_count}, Size: {stats.size_bytes} bytes")

Installation

pip install matrixone-python-sdk

For development installation, see the Installation page.

Indices and tables