Metadata Manager

MetadataManager

class matrixone.metadata.MetadataManager(client, executor=None)

Bases: BaseMetadataManager

Synchronous metadata manager for MatrixOne table metadata operations.

This class scans MatrixOne table metadata to analyze table statistics, column information, data distribution, and storage details. It exposes both the structure of a table and how that table's data is laid out in storage, which in turn shapes its performance.

Key Features:

Metadata scanning: Access detailed table and column metadata
Column statistics: Row counts, null counts, min/max values, sums
Storage analysis: Compression ratios, object sizes, data distribution
Index metadata: Scan index-specific metadata
Tombstone inspection: Analyze deleted data objects
Performance insights: Identify storage hotspots and optimization opportunities
Transaction-aware: Full integration with transaction contexts

Executor Pattern:

If executor is None, uses self.client.execute (default client-level executor)
If executor is provided (e.g., session), uses executor.execute (transaction-aware)
All operations can participate in transactions when used via session
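The executor rule above is a simple fallback: use the explicit executor when one is given, otherwise use the client. The following is a minimal sketch of that dispatch. `_StubClient`, `SketchMetadataManager`, and its `_execute` helper are hypothetical stand-ins, not MatrixOne SDK API, and the sketch does not reproduce the SDK's actual implementation.

```python
from typing import Any, List, Optional


class _StubClient:
    """Hypothetical stand-in for a client or session: records SQL it runs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: List[str] = []

    def execute(self, sql: str) -> str:
        self.calls.append(sql)
        return f"{self.name}: {sql}"


class SketchMetadataManager:
    """Illustrative executor fallback; not the SDK's MetadataManager."""

    def __init__(self, client: Any, executor: Optional[Any] = None) -> None:
        self.client = client
        self.executor = executor

    def _execute(self, sql: str) -> Any:
        # Prefer the explicit executor (e.g. a session inside a transaction);
        # otherwise fall back to the client-level executor.
        target = self.executor if self.executor is not None else self.client
        return target.execute(sql)


client = _StubClient("client")
session = _StubClient("session")

print(SketchMetadataManager(client)._execute("SELECT 1"))                    # client: SELECT 1
print(SketchMetadataManager(client, executor=session)._execute("SELECT 1"))  # session: SELECT 1
```

Because the manager only holds a reference to whichever object runs the SQL, a session-bound manager automatically shares that session's transaction.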

Available Metadata Columns:

COL_NAME: Column name
OBJECT_NAME: Object/block name in storage
IS_HIDDEN: Whether object is hidden
OBJ_LOC: Object storage location
CREATE_TS: Creation timestamp
DELETE_TS: Deletion timestamp (for tombstones)
ROWS_CNT: Number of rows in the object
NULL_CNT: Number of null values
COMPRESS_SIZE: Compressed storage size
ORIGIN_SIZE: Original uncompressed size
MIN: Minimum value in column
MAX: Maximum value in column
SUM: Sum of values (for numeric columns)
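The storage columns above can be combined into simple summaries. This sketch assumes that each scan row describes one column inside one storage object, as the COL_NAME and OBJECT_NAME fields suggest. Under that assumption, a per-column summary comes from summing ROWS_CNT, COMPRESS_SIZE, and ORIGIN_SIZE over that column's entries. `ScanEntry` and `summarize_column` are hypothetical names, the sample values are made up, and "compression ratio" is taken here to mean ORIGIN_SIZE / COMPRESS_SIZE. The SDK's own statistics methods may define or compute it differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ScanEntry:
    """Hypothetical subset of the metadata columns listed above."""
    col_name: str
    object_name: str
    rows_cnt: int
    compress_size: int
    origin_size: int


def summarize_column(entries: Iterable[ScanEntry], col_name: str) -> Dict[str, float]:
    """Total rows and sizes for one column across all storage objects."""
    rows = compressed = original = 0
    for e in entries:
        if e.col_name != col_name:
            continue  # looking at a single column counts each object once
        rows += e.rows_cnt
        compressed += e.compress_size
        original += e.origin_size
    # origin/compressed: values above 1.0 mean the data shrank on disk.
    ratio = original / compressed if compressed else 0.0
    return {"rows": rows, "compress_size": compressed,
            "origin_size": original, "compression_ratio": ratio}


entries = [
    ScanEntry("id", "obj_1", 1000, 2000, 8000),
    ScanEntry("id", "obj_2", 500, 1000, 4000),
    ScanEntry("email", "obj_1", 1000, 9000, 30000),
]
print(summarize_column(entries, "id"))
# {'rows': 1500, 'compress_size': 3000, 'origin_size': 12000, 'compression_ratio': 4.0}
```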

Usage Examples:

from matrixone import Client
from matrixone.metadata import MetadataColumn

client = Client(host='localhost', port=6001, user='root', password='111', database='test')

# Basic metadata scan (returns SQLAlchemy Result)
result = client.metadata.scan('test_db', 'users')
for row in result:
    print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

# Scan specific columns with structured output
rows = client.metadata.scan(
    'test_db', 'users',
    columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT, MetadataColumn.NULL_CNT]
)
for row in rows:
    print(f"{row['col_name']}: {row['rows_cnt']} rows, {row['null_cnt']} nulls")

# Get all structured metadata
metadata_rows = client.metadata.scan('test_db', 'users', columns='*')
for row in metadata_rows:
    print(f"{row.col_name}: {row.rows_cnt} rows")

# Scan index metadata
index_result = client.metadata.scan(
    'test_db', 'users',
    indexname='idx_email'
)

# Scan tombstone (deleted) objects
tombstone_result = client.metadata.scan(
    'test_db', 'users',
    is_tombstone=True
)

# Get table statistics summary
stats = client.metadata.get_table_stats('test_db', 'users')
print(f"Total rows: {stats['total_rows']}")
print(f"Total size: {stats['total_size']}")
print(f"Compression ratio: {stats['compression_ratio']}")

# Get detailed statistics with indexes and tombstones
detailed_stats = client.metadata.get_table_stats(
    'test_db', 'users',
    include_indexes=['idx_email', 'idx_name'],
    include_tombstone=True
)
print(f"Table data: {detailed_stats['table']}")
print(f"Tombstone data: {detailed_stats['tombstone']}")
for idx in detailed_stats['indexes']:
    print(f"Index {idx['name']}: {idx['size']}")

# Using within a transaction
with client.session() as session:
    # Scan metadata within transaction context
    result = session.metadata.scan('test_db', 'users')

Use Cases:

Performance analysis: Identify tables with high null counts or poor compression
Storage optimization: Analyze object distribution and compression ratios
Data quality: Check for null values and data distribution
Index analysis: Evaluate index size and effectiveness
Tombstone management: Monitor deleted data objects
Capacity planning: Track table growth and storage usage
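For the performance-analysis and data-quality cases, one approach is to aggregate NULL_CNT and ROWS_CNT per column across storage objects and flag columns whose null fraction is high. The sketch below works on plain `(col_name, rows_cnt, null_cnt)` tuples. The `high_null_columns` helper, its 0.5 default threshold, and the sample values are all hypothetical. In practice, the tuples could come from a scan that requests COL_NAME, ROWS_CNT, and NULL_CNT.

```python
from typing import Dict, Iterable, List, Tuple


def high_null_columns(
    entries: Iterable[Tuple[str, int, int]], threshold: float = 0.5
) -> List[str]:
    """Return columns whose overall null fraction exceeds `threshold`.

    Each entry is a (col_name, rows_cnt, null_cnt) triple describing one
    column within one storage object.
    """
    rows: Dict[str, int] = {}
    nulls: Dict[str, int] = {}
    for col, rows_cnt, null_cnt in entries:
        rows[col] = rows.get(col, 0) + rows_cnt
        nulls[col] = nulls.get(col, 0) + null_cnt
    # Skip columns with zero rows to avoid dividing by zero.
    return sorted(
        col for col in rows
        if rows[col] and nulls[col] / rows[col] > threshold
    )


sample = [
    ("email", 1000, 900),
    ("email", 500, 400),
    ("id", 1000, 0),
    ("nickname", 1500, 600),
]
print(high_null_columns(sample))  # ['email']
```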

See also

Client.create_table: For creating tables
VectorManager: For vector-specific metadata operations
LoadDataManager: For bulk data loading

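scan(dbname, tablename, is_tombstone=None, indexname=None, columns=None, distinct_object_name=None)

Scan table metadata using the metadata_scan function.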

Parameters:

dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
columns – Optional columns to return: a list of MetadataColumn enum values or strings, or "*" for all columns. If None, returns all columns as a SQLAlchemy Result. If specified, returns a List of MetadataRow.
distinct_object_name – Optional flag to return distinct object names only.
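The columns parameter accepts several forms: None, "*", or a list mixing MetadataColumn members and strings. The sketch below shows one way to normalize those forms into plain column names, assuming that unknown names should be rejected by validating them against the enum. `Col` is a hypothetical trimmed copy of MetadataColumn, and `normalize_columns` is not an SDK function.

```python
from enum import Enum
from typing import List, Optional, Sequence, Union


class Col(Enum):
    """Hypothetical trimmed copy of MetadataColumn, for this sketch only."""
    COL_NAME = "col_name"
    ROWS_CNT = "rows_cnt"
    NULL_CNT = "null_cnt"


def normalize_columns(
    columns: Union[None, str, Sequence[Union[Col, str]]]
) -> Optional[List[str]]:
    """Turn the `columns` argument into a list of column-name strings.

    None -> None (the caller would return the raw result unchanged)
    "*"  -> every known column, in enum order
    list -> enum members and plain strings, validated against the enum
    """
    if columns is None:
        return None
    if columns == "*":
        return [c.value for c in Col]
    if isinstance(columns, str):
        columns = [columns]  # sketch choice: a bare name means one column
    # Col("unknown") raises ValueError, which rejects typos early.
    return [c.value if isinstance(c, Col) else Col(c).value for c in columns]


print(normalize_columns([Col.COL_NAME, "rows_cnt"]))  # ['col_name', 'rows_cnt']
```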

Returns:

If columns is None: SQLAlchemy Result object
If columns is specified: List of MetadataRow

Example:

# Scan all columns of a table (returns SQLAlchemy Result)
result = client.metadata.scan("test_db", "users")

# Scan metadata of an index named "id" (indexname selects an index, not a column)
result = client.metadata.scan("test_db", "users", indexname="id")

# Explicitly exclude tombstone objects
result = client.metadata.scan("test_db", "users", is_tombstone=False)

# Scan tombstone objects
result = client.metadata.scan("test_db", "users", is_tombstone=True)

# Scan specific index
result = client.metadata.scan("test_db", "users", indexname="idx_name")

# Get structured results with specific columns
from matrixone.metadata import MetadataColumn
rows = client.metadata.scan("test_db", "users",
                             columns=[MetadataColumn.COL_NAME,
                                      MetadataColumn.ROWS_CNT])
for row in rows:
    print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

# Get all structured results
rows = client.metadata.scan("test_db", "users", columns="*")

get_table_brief_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, Dict[str, Any]]

Get brief statistics for a table, tombstone, and indexes.

Parameters:

dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include

Returns:

Dictionary with brief statistics for table, tombstone, and indexes

get_table_detail_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, List[Dict[str, Any]]]

Get detailed statistics for a table, tombstone, and indexes.

Parameters:

dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include

Returns:

Dictionary with detailed statistics for table, tombstone, and indexes
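The two statistics methods differ mainly in return shape. get_table_brief_stats returns `Dict[str, Dict[str, Any]]`, one summary per table, tombstone, or index. get_table_detail_stats returns `Dict[str, List[Dict[str, Any]]]`, a list of entries per key. The sketch below relates the two shapes by summing numeric fields. The `roll_up` helper, the key names (`users`, `idx_email`, `object_name`, `rows_cnt`, `compress_size`), and the values are hypothetical, and the SDK may compute its brief statistics differently.

```python
from typing import Any, Dict, List


def roll_up(detail: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Collapse each list of entries into one summary dict per key.

    Numeric fields are summed; non-numeric fields are dropped.
    """
    brief: Dict[str, Dict[str, Any]] = {}
    for name, entries in detail.items():
        summary: Dict[str, Any] = {"entries": len(entries)}
        for entry in entries:
            for key, value in entry.items():
                # bool is a subclass of int; skip it so flags are not summed.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    summary[key] = summary.get(key, 0) + value
        brief[name] = summary
    return brief


detail = {
    "users": [
        {"object_name": "obj_1", "rows_cnt": 1000, "compress_size": 4096},
        {"object_name": "obj_2", "rows_cnt": 250, "compress_size": 1024},
    ],
    "idx_email": [
        {"object_name": "obj_9", "rows_cnt": 1250, "compress_size": 2048},
    ],
}
print(roll_up(detail))
```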

AsyncMetadataManager

class matrixone.metadata.AsyncMetadataManager(client, executor=None)

Bases: BaseMetadataManager

Asynchronous metadata manager for MatrixOne table metadata operations.

Provides the same comprehensive metadata scanning functionality as MetadataManager but with full async/await support for non-blocking I/O operations. Ideal for high-concurrency applications requiring metadata analysis.

Key Features:

Non-blocking operations: All metadata operations use async/await
Async metadata scanning: Asynchronously access table and column metadata
Concurrent analysis: Scan multiple tables concurrently
Async statistics: Get table statistics without blocking
Transaction-aware: Full integration with async transaction contexts
Executor pattern: Works with both async client and async session

Executor Pattern:

If executor is None, uses self.client.execute (default async client-level executor)
If executor is provided (e.g., async session), uses executor.execute (async transaction-aware)
All operations are non-blocking and use async/await
Enables concurrent metadata operations

Usage Examples:

from matrixone import AsyncClient
from matrixone.metadata import MetadataColumn
import asyncio

async def main():
    client = AsyncClient()
    await client.connect(host='localhost', port=6001, user='root', password='111', database='test')

    # Basic async metadata scan
    result = await client.metadata.scan('test_db', 'users')
    async for row in result:
        print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

    # Scan specific columns with structured output
    rows = await client.metadata.scan(
        'test_db', 'users',
        columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT]
    )
    for row in rows:
        print(f"{row['col_name']}: {row['rows_cnt']} rows")

    # Get async table statistics
    stats = await client.metadata.get_table_stats('test_db', 'users')
    print(f"Total rows: {stats['total_rows']}")
    print(f"Compression ratio: {stats['compression_ratio']}")

    # Concurrent metadata scanning of multiple tables
    results = await asyncio.gather(
        client.metadata.scan('test_db', 'users'),
        client.metadata.scan('test_db', 'orders'),
        client.metadata.scan('test_db', 'products')
    )

    # Concurrent statistics for multiple tables
    stats_list = await asyncio.gather(
        client.metadata.get_table_stats('test_db', 'users'),
        client.metadata.get_table_stats('test_db', 'orders'),
        client.metadata.get_table_stats('test_db', 'products')
    )
    for table_stats in stats_list:
        print(f"Table: {table_stats['total_rows']} rows")

    # Scan index metadata asynchronously
    index_result = await client.metadata.scan(
        'test_db', 'users',
        indexname='idx_email'
    )

    # Using within async transaction
    async with client.session() as session:
        result = await session.metadata.scan('test_db', 'users')
        stats = await session.metadata.get_table_stats('test_db', 'orders')

    await client.disconnect()

asyncio.run(main())

Use Cases:

Async performance monitoring: Non-blocking metadata collection
Concurrent analysis: Analyze multiple tables simultaneously
Real-time dashboards: Update table statistics without blocking
High-throughput applications: Metadata operations in async web servers
Batch metadata collection: Gather stats from many tables efficiently

Data Classes

MetadataRow

class matrixone.metadata.MetadataRow(col_name: str, object_name: str, is_hidden: bool, obj_loc: str, create_ts: str, delete_ts: str, rows_cnt: int, null_cnt: int, compress_size: int, origin_size: int, min: Any | None = None, max: Any | None = None, sum: Any | None = None)

Bases: object

Structured representation of a metadata scan row.

col_name: str

object_name: str

is_hidden: bool

obj_loc: str

create_ts: str

delete_ts: str

rows_cnt: int

null_cnt: int

compress_size: int

origin_size: int

min: Any | None = None

max: Any | None = None

sum: Any | None = None

classmethod from_sqlalchemy_row(row) → MetadataRow

Create a MetadataRow from a SQLAlchemy Row object.

__init__(col_name: str, object_name: str, is_hidden: bool, obj_loc: str, create_ts: str, delete_ts: str, rows_cnt: int, null_cnt: int, compress_size: int, origin_size: int, min: Any | None = None, max: Any | None = None, sum: Any | None = None) → None
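from_sqlalchemy_row converts a SQLAlchemy Row into a MetadataRow, but its implementation is not shown here. The sketch below shows one plausible conversion for a trimmed dataclass. `RowSketch` and `from_row` are hypothetical names, and a namedtuple stands in for a SQLAlchemy Row because both support attribute access by column name. Required fields must be present on the row, while fields with defaults (like `min` and `max`) fall back to those defaults.

```python
from collections import namedtuple
from dataclasses import MISSING, dataclass, fields
from typing import Any, Optional


@dataclass
class RowSketch:
    """Hypothetical trimmed mirror of MetadataRow (a few fields only)."""
    col_name: str
    rows_cnt: int
    null_cnt: int
    min: Optional[Any] = None
    max: Optional[Any] = None

    @classmethod
    def from_row(cls, row: Any) -> "RowSketch":
        """Build an instance from any object with attribute access by name."""
        kwargs = {}
        for f in fields(cls):
            if f.default is MISSING:
                kwargs[f.name] = getattr(row, f.name)  # required: must exist
            else:
                kwargs[f.name] = getattr(row, f.name, f.default)  # optional
        return cls(**kwargs)


# A namedtuple stands in for a SQLAlchemy Row: both allow row.col_name.
FakeRow = namedtuple("FakeRow", ["col_name", "rows_cnt", "null_cnt", "min", "max"])
print(RowSketch.from_row(FakeRow("id", 100, 0, 1, 100)))
# RowSketch(col_name='id', rows_cnt=100, null_cnt=0, min=1, max=100)
```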

MetadataColumn

class matrixone.metadata.MetadataColumn(value)

Bases: Enum

Enumeration of available metadata columns.

COL_NAME = 'col_name'

OBJECT_NAME = 'object_name'

IS_HIDDEN = 'is_hidden'

OBJ_LOC = 'obj_loc'

CREATE_TS = 'create_ts'

DELETE_TS = 'delete_ts'

ROWS_CNT = 'rows_cnt'

NULL_CNT = 'null_cnt'

COMPRESS_SIZE = 'compress_size'

ORIGIN_SIZE = 'origin_size'

MIN = 'min'

MAX = 'max'

SUM = 'sum'