MatrixOne Python SDK Documentation
==================================

Welcome to the MatrixOne Python SDK documentation!

The MatrixOne Python SDK provides a comprehensive, high-level interface for MatrixOne database operations,
including SQLAlchemy-like ORM interface, vector similarity search, fulltext search, snapshot management,
PITR (Point-in-Time Recovery), restore operations, table cloning, account management, pub/sub operations,
and mo-ctl integration. The SDK is designed for both synchronous and asynchronous operations with full
type safety and extensive documentation.

.. danger::
   **🚨 MUST READ: Column Naming Convention**
   
   **Always use lowercase with underscores (snake_case) for column names!**
   
   Using camelCase will cause SELECT queries to fail. See :doc:`naming_conventions` for details.
   
   .. code-block:: python
   
      # ❌ userName = Column(String(50))    # Fails in SELECT!
      # ✅ user_name = Column(String(50))   # Works perfectly!

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   installation
   quickstart
   naming_conventions
   configuration_guide

.. toctree::
   :maxdepth: 2
   :caption: Core Features

   vector_guide
   fulltext_guide
   json_guide
   orm_guide
   metadata_guide
   index_verification_guide

.. toctree::
   :maxdepth: 2
   :caption: Data Management

   stage_guide
   load_data_guide
   export_guide
   snapshot_restore_guide
   clone_guide
   branch_guide

.. toctree::
   :maxdepth: 2
   :caption: Advanced Features

   account_guide
   pubsub_guide
   moctl_guide
   mo_diag_guide

.. toctree::
   :maxdepth: 2
   :caption: Production Guide

   best_practices
   connection_hooks_guide

.. toctree::
   :maxdepth: 2
   :caption: Reference

   api/index
   examples
   contributing

Features
--------

* 🚀 **High Performance**: Optimized for MatrixOne database operations with connection pooling
* 🔄 **Async Support**: Full async/await support with AsyncClient for non-blocking operations
* 🧠 **Vector Search**: Advanced vector similarity search with HNSW and IVF indexing
* 🔍 **Fulltext Search**: Powerful fulltext search with BM25 and TF-IDF algorithms
* 📊 **Metadata Analysis**: Comprehensive table and column metadata analysis with statistics
* 📸 **Snapshot Management**: Create and manage database snapshots at multiple levels
* ⏰ **Point-in-Time Recovery**: PITR functionality for precise data recovery
* 🔄 **Table Cloning**: Clone databases and tables efficiently with data replication
* 👥 **Account Management**: Comprehensive user, role, and permission management
* 📊 **Pub/Sub**: Real-time publication and subscription support
* 🔧 **Version Management**: Automatic backend version detection and compatibility checking
* 🛡️ **Type Safety**: Full type hints support with comprehensive documentation
* 🌿 **Branch Management**: Git-style version control with BranchManager and SQLAlchemy-style statement builders
* 📚 **SQLAlchemy Integration**: Seamless SQLAlchemy integration with enhanced ORM features
* 🔗 **Enhanced Query Building**: Advanced query building with SQLAlchemy expressions
* 🎯 **Logical Operations**: Enhanced logical operations including logical_in functionality
* 🛠️ **MO-DIAG Tool**: Interactive diagnostic tool for index inspection and health monitoring
* 📖 **Comprehensive Documentation**: Detailed API documentation with examples

Quick Start
-----------

**Basic Connection:**

.. code-block:: python

   from matrixone import Client

   # Create and connect to MatrixOne
   client = Client()
   client.connect(
       host='localhost',
       port=6001,
       user='root',
       password='111',
       database='test'
   )

   # Get backend version (auto-detected)
   version = client.get_backend_version()
   print(f"MatrixOne version: {version}")

   client.disconnect()

**Transaction Management (Recommended):**

.. code-block:: python

   from matrixone import Client
   from matrixone.orm import Base, Column, Integer, String
   from sqlalchemy import select, insert, update, delete

   # Define ORM model
   Base = declarative_base()
   
   class User(Base):
       __tablename__ = 'users'
       id = Column(Integer, primary_key=True)
       name = Column(String(100))
       email = Column(String(255))
       age = Column(Integer)

   client = Client()
   client.connect(database='test')
   
   # Create table
   client.create_table(User)

   # Use session for atomic transactions (recommended)
   with client.session() as session:
       # All operations are atomic - succeed or fail together
       session.execute(insert(User).values(name='Alice', email='alice@example.com', age=30))
       session.execute(update(User).where(User.age < 18).values(status='minor'))
       
       # Query using SQLAlchemy select
       stmt = select(User).where(User.age > 25)
       result = session.execute(stmt)
       for user in result.scalars():
           print(f"User: {user.name}, Age: {user.age}")
       
       # Commits automatically on success, rolls back on error

   client.disconnect()

.. note::
   **Why use ``session()``?**
   
   * ✅ **Atomic operations** - all succeed or fail together
   * ✅ **Automatic rollback** on errors
   * ✅ **Access to all managers** (snapshots, clones, load_data, etc.)
   * ✅ **Full SQLAlchemy ORM** support with type safety
   
   See :doc:`quickstart` and :doc:`orm_guide` for detailed examples.

**Vector Search:**

.. code-block:: python

   from matrixone.sqlalchemy_ext import create_vector_column
   from sqlalchemy import Column, Integer, String, Text
   from matrixone.orm import declarative_base
   import numpy as np

   # Define vector table model
   Base = declarative_base()
   
   class Document(Base):
       __tablename__ = 'documents'
       id = Column(Integer, primary_key=True)
       title = Column(String(200))
       content = Column(Text)
       embedding = create_vector_column(384, 'f32')

   # Create table and insert initial data first (recommended)
   client.create_table(Document)
   
   # ⚠️ Best practice: Insert initial data BEFORE creating IVF index
   client.insert('documents', {
       'id': 1,
       'title': 'AI Research',
       'content': 'Machine learning paper...',
       'embedding': np.random.rand(384).tolist()
   })
   
   # Create IVF index after initial data (better clustering)
   client.vector_ops.enable_ivf()
   client.vector_ops.create_ivf(
       'documents',  # Table name as positional argument
       name='idx_embedding',
       column='embedding',
       lists=100  # Number of clusters
   )
   
   # IVF supports dynamic updates - can continue inserting
   client.insert('documents', {'id': 2, ...})  # ✅ Works fine

   results = client.vector_ops.similarity_search(
       'documents',  # Table name as positional argument
       vector_column='embedding',
       query_vector=np.random.rand(384).tolist(),
       limit=5,
       distance_type='cosine'
   )

**⭐ Monitor IVF Index Health (Critical for Production):**

.. code-block:: python

   # Get IVF index statistics - Essential for monitoring index quality
   stats = client.vector_ops.get_ivf_stats("documents", "embedding")
   
   # Check index balance
   counts = stats['distribution']['centroid_count']
   balance_ratio = max(counts) / min(counts) if min(counts) > 0 else float('inf')
   
   print(f"Total centroids: {len(counts)}")
   print(f"Total vectors: {sum(counts)}")
   print(f"Balance ratio: {balance_ratio:.2f}")
   
   # Rebuild if imbalanced (ratio > 2.5)
   if balance_ratio > 2.5:
       print("⚠️  Index needs rebuilding for optimal performance!")
       # See vector_guide for detailed monitoring and rebuild procedures

.. note::
   **HNSW Index Notes:**
   
   * Requires ``BigInteger`` primary key (not ``Integer``)
   * 🚧 Currently read-only after creation (*dynamic updates coming soon*)
   * Workaround: Drop index → Modify data → Recreate index
   
   See :doc:`vector_guide` for details on HNSW vs IVF selection.

**JSON Data Handling:**

.. code-block:: python

   from matrixone.sqlalchemy_ext import JSON
   from sqlalchemy import Column, Integer, String, Numeric

   class Product(Base):
       __tablename__ = 'products'
       id = Column(Integer, primary_key=True)
       name = Column(String(200))
       specifications = Column(JSON)  # MatrixOne JSON type

   # Insert with Python dictionaries (auto-serialized)
   client.insert(Product, {
       'id': 1,
       'name': 'Laptop',
       'specifications': {
           'brand': 'Dell',
           'ram': 16,
           'price': 1299.99,
           'active': True
       }
   })

   # SQLAlchemy standard syntax
   results = client.query(Product).filter(
       Product.specifications['brand'] == 'Dell'
   ).filter(
       Product.specifications['price'].cast(Numeric) > 1000
   ).all()

   # Extract text without quotes
   stmt = select(
       Product.name,
       Product.specifications['brand'].astext.label('brand')
   )

**Fulltext Search:**

.. code-block:: python

   # Create fulltext index
   client.fulltext_index.create(
       'documents',
       'ftidx_content',
       ['title', 'content'],
       algorithm='BM25'
   )

   # Search with natural language
   from matrixone.sqlalchemy_ext.fulltext_search import natural_match
   
   results = client.query(Document).filter(
       natural_match('title', 'content', query='machine learning techniques')
   ).all()

**Metadata Analysis:**

.. code-block:: python

   # Analyze table metadata
   metadata = client.metadata.scan(
       dbname='test',
       tablename='documents'
   )

   for row in metadata:
       print(f"{row.column_name}: {row.data_type}")
       print(f"  Nulls: {row.null_count}, Distinct: {row.distinct_count}")

   # Get table statistics
   stats = client.metadata.get_table_brief_stats(
       dbname='test',
       tablename='documents'
   )
   print(f"Rows: {stats.row_count}, Size: {stats.size_bytes} bytes")

Installation
------------

.. code-block:: bash

   pip install matrixone-python-sdk

For development installation, see the :doc:`installation` page.

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`