MatrixOne Python SDK Documentation
Welcome to the MatrixOne Python SDK documentation!
The MatrixOne Python SDK provides a comprehensive, high-level interface for MatrixOne database operations, including SQLAlchemy-like ORM interface, vector similarity search, fulltext search, snapshot management, PITR (Point-in-Time Recovery), restore operations, table cloning, account management, pub/sub operations, and mo-ctl integration. The SDK is designed for both synchronous and asynchronous operations with full type safety and extensive documentation.
Danger
🚨 MUST READ: Column Naming Convention
Always use lowercase with underscores (snake_case) for column names!
Using camelCase will cause SELECT queries to fail. See Column Naming Conventions for details.
# ❌ userName = Column(String(50)) # Fails in SELECT!
# ✅ user_name = Column(String(50)) # Works perfectly!
Getting Started
- Installation
- Quick Start
- Basic Usage
- Transaction Management (Recommended)
- Wrapping Existing SQLAlchemy Sessions
- SQLAlchemy ORM Style (Recommended)
- Async Usage with Sessions and ORM
- Vector Operations with Table Models
- HNSW Vector Indexing
- ORM with Modern Patterns
- Vector Search with Modern API
- Async Vector Operations
- Transaction Management
- Error Handling with Modern API
- Configuration Best Practices
- Next Steps
- Column Naming Conventions
- Configuration Management Guide
Core Features
Data Management
Advanced Features
- Account Management Guide
- Pub/Sub Operations Guide
- MOCTL Command Line Interface Guide
- MO-DIAG Interactive Diagnostic Tool Guide
- Overview
- Getting Started
- Connection Management
- Index Inspection Commands
- Vector Index Commands
- CDC Monitoring Commands
- CDC Task Management
- Table Statistics Commands
- Database Operations
- Utility Commands
- Real-World Examples
- Best Practices
- Troubleshooting
- Command Reference Summary
- Command-Line Options
- Programmatic Usage
Production Guide
- Best Practices Guide
- SDK-First Development Philosophy
- Table Operations Best Practices
- Vector Operations Best Practices
- Fulltext Search Best Practices
- Metadata Analysis Best Practices
- Transaction Management
- Snapshot and PITR Operations
- Account and User Management
- Pub/Sub Operations
- Async Operations Best Practices
- Performance Optimization
- Index Maintenance Best Practices
- Error Handling Best Practices
- Testing Best Practices
- Summary of SDK Features
- Connection Hooks Guide
Reference
Features
🚀 High Performance: Optimized for MatrixOne database operations with connection pooling
🔄 Async Support: Full async/await support with AsyncClient for non-blocking operations
🧠 Vector Search: Advanced vector similarity search with HNSW and IVF indexing
🔍 Fulltext Search: Powerful fulltext search with BM25 and TF-IDF algorithms
📊 Metadata Analysis: Comprehensive table and column metadata analysis with statistics
📸 Snapshot Management: Create and manage database snapshots at multiple levels
⏰ Point-in-Time Recovery: PITR functionality for precise data recovery
🔄 Table Cloning: Clone databases and tables efficiently with data replication
👥 Account Management: Comprehensive user, role, and permission management
📊 Pub/Sub: Real-time publication and subscription support
🔧 Version Management: Automatic backend version detection and compatibility checking
🛡️ Type Safety: Full type hints support with comprehensive documentation
🌿 Branch Management: Git-style version control with BranchManager and SQLAlchemy-style statement builders
📚 SQLAlchemy Integration: Seamless SQLAlchemy integration with enhanced ORM features
🔗 Enhanced Query Building: Advanced query building with SQLAlchemy expressions
🎯 Logical Operations: Enhanced logical operations including logical_in functionality
🛠️ MO-DIAG Tool: Interactive diagnostic tool for index inspection and health monitoring
📖 Comprehensive Documentation: Detailed API documentation with examples
Quick Start
Basic Connection:
from matrixone import Client
# Create and connect to MatrixOne
client = Client()
client.connect(
host='localhost',
port=6001,
user='root',
password='111',
database='test'
)
# Get backend version (auto-detected)
version = client.get_backend_version()
print(f"MatrixOne version: {version}")
client.disconnect()
Transaction Management (Recommended):
from matrixone import Client
from matrixone.orm import Base, Column, Integer, String
from sqlalchemy import select, insert, update, delete
# Define ORM model
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(100))
email = Column(String(255))
age = Column(Integer)
client = Client()
client.connect(database='test')
# Create table
client.create_table(User)
# Use session for atomic transactions (recommended)
with client.session() as session:
# All operations are atomic - succeed or fail together
session.execute(insert(User).values(name='Alice', email='alice@example.com', age=30))
session.execute(update(User).where(User.age < 18).values(status='minor'))
# Query using SQLAlchemy select
stmt = select(User).where(User.age > 25)
result = session.execute(stmt)
for user in result.scalars():
print(f"User: {user.name}, Age: {user.age}")
# Commits automatically on success, rolls back on error
client.disconnect()
Note
Why use ``session()``?
✅ Atomic operations - all succeed or fail together
✅ Automatic rollback on errors
✅ Access to all managers (snapshots, clones, load_data, etc.)
✅ Full SQLAlchemy ORM support with type safety
See Quick Start and ORM Usage Guide for detailed examples.
Vector Search:
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import Column, Integer, String, Text
from matrixone.orm import declarative_base
import numpy as np
# Define vector table model
Base = declarative_base()
class Document(Base):
__tablename__ = 'documents'
id = Column(Integer, primary_key=True)
title = Column(String(200))
content = Column(Text)
embedding = create_vector_column(384, 'f32')
# Create table and insert initial data first (recommended)
client.create_table(Document)
# ⚠️ Best practice: Insert initial data BEFORE creating IVF index
client.insert('documents', {
'id': 1,
'title': 'AI Research',
'content': 'Machine learning paper...',
'embedding': np.random.rand(384).tolist()
})
# Create IVF index after initial data (better clustering)
client.vector_ops.enable_ivf()
client.vector_ops.create_ivf(
'documents', # Table name as positional argument
name='idx_embedding',
column='embedding',
lists=100 # Number of clusters
)
# IVF supports dynamic updates - can continue inserting
client.insert('documents', {'id': 2, ...}) # ✅ Works fine
results = client.vector_ops.similarity_search(
'documents', # Table name as positional argument
vector_column='embedding',
query_vector=np.random.rand(384).tolist(),
limit=5,
distance_type='cosine'
)
⭐ Monitor IVF Index Health (Critical for Production):
# Get IVF index statistics - Essential for monitoring index quality
stats = client.vector_ops.get_ivf_stats("documents", "embedding")
# Check index balance
counts = stats['distribution']['centroid_count']
balance_ratio = max(counts) / min(counts) if min(counts) > 0 else float('inf')
print(f"Total centroids: {len(counts)}")
print(f"Total vectors: {sum(counts)}")
print(f"Balance ratio: {balance_ratio:.2f}")
# Rebuild if imbalanced (ratio > 2.5)
if balance_ratio > 2.5:
print("⚠️ Index needs rebuilding for optimal performance!")
# See vector_guide for detailed monitoring and rebuild procedures
Note
HNSW Index Notes:
Requires
BigIntegerprimary key (notInteger)🚧 Currently read-only after creation (dynamic updates coming soon)
Workaround: Drop index → Modify data → Recreate index
See Vector Search Guide for details on HNSW vs IVF selection.
JSON Data Handling:
from matrixone.sqlalchemy_ext import JSON
from sqlalchemy import Column, Integer, String, Numeric
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200))
specifications = Column(JSON) # MatrixOne JSON type
# Insert with Python dictionaries (auto-serialized)
client.insert(Product, {
'id': 1,
'name': 'Laptop',
'specifications': {
'brand': 'Dell',
'ram': 16,
'price': 1299.99,
'active': True
}
})
# SQLAlchemy standard syntax
results = client.query(Product).filter(
Product.specifications['brand'] == 'Dell'
).filter(
Product.specifications['price'].cast(Numeric) > 1000
).all()
# Extract text without quotes
stmt = select(
Product.name,
Product.specifications['brand'].astext.label('brand')
)
Fulltext Search:
# Create fulltext index
client.fulltext_index.create(
'documents',
'ftidx_content',
['title', 'content'],
algorithm='BM25'
)
# Search with natural language
from matrixone.sqlalchemy_ext.fulltext_search import natural_match
results = client.query(Document).filter(
natural_match('title', 'content', query='machine learning techniques')
).all()
Metadata Analysis:
# Analyze table metadata
metadata = client.metadata.scan(
dbname='test',
tablename='documents'
)
for row in metadata:
print(f"{row.column_name}: {row.data_type}")
print(f" Nulls: {row.null_count}, Distinct: {row.distinct_count}")
# Get table statistics
stats = client.metadata.get_table_brief_stats(
dbname='test',
tablename='documents'
)
print(f"Rows: {stats.row_count}, Size: {stats.size_bytes} bytes")
Installation
pip install matrixone-python-sdk
For development installation, see the Installation page.