Metadata Manager

MetadataManager

class matrixone.metadata.MetadataManager(client, executor=None)

Bases: BaseMetadataManager

Synchronous metadata manager for MatrixOne table metadata operations.

This class provides comprehensive metadata scanning capabilities for analyzing table statistics, column information, data distribution, and storage details. It enables deep introspection of table structure and performance characteristics.

Key Features:

  • Metadata scanning: Access detailed table and column metadata

  • Column statistics: Row counts, null counts, min/max values, sums

  • Storage analysis: Compression ratios, object sizes, data distribution

  • Index metadata: Scan index-specific metadata

  • Tombstone inspection: Analyze deleted data objects

  • Performance insights: Identify storage hotspots and optimization opportunities

  • Transaction-aware: Full integration with transaction contexts

Executor Pattern:

  • If executor is None, uses self.client.execute (default client-level executor)

  • If executor is provided (e.g., session), uses executor.execute (transaction-aware)

  • All operations can participate in transactions when used via session
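
The fallback described above can be illustrated with a minimal sketch. The classes FakeClient, FakeSession, and SketchManager are hypothetical stand-ins invented for this illustration; they are not part of the matrixone package, and the real manager may resolve its executor differently.

```python
class FakeClient:
    """Hypothetical stand-in for a client-level executor."""

    def execute(self, sql):
        return f"client ran: {sql}"


class FakeSession:
    """Hypothetical stand-in for a transaction-aware session."""

    def execute(self, sql):
        return f"session ran: {sql}"


class SketchManager:
    """Illustrative only; not the SDK's implementation."""

    def __init__(self, client, executor=None):
        self.client = client
        self.executor = executor

    def _get_executor(self):
        # Fall back to the client when no executor was supplied.
        return self.executor if self.executor is not None else self.client

    def run(self, sql):
        # Every operation goes through whichever executor was resolved.
        return self._get_executor().execute(sql)


client = FakeClient()
print(SketchManager(client).run("SELECT 1"))
print(SketchManager(client, executor=FakeSession()).run("SELECT 1"))
```

Routing every call through a single resolved executor is what would let the same method body run either standalone or inside a transaction.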

Available Metadata Columns:

  • COL_NAME: Column name

  • OBJECT_NAME: Object/block name in storage

  • IS_HIDDEN: Whether object is hidden

  • OBJ_LOC: Object storage location

  • CREATE_TS: Creation timestamp

  • DELETE_TS: Deletion timestamp (for tombstones)

  • ROWS_CNT: Number of rows in the object

  • NULL_CNT: Number of null values

  • COMPRESS_SIZE: Compressed storage size

  • ORIGIN_SIZE: Original uncompressed size

  • MIN: Minimum value in column

  • MAX: Maximum value in column

  • SUM: Sum of values (for numeric columns)

Usage Examples:

from matrixone import Client
from matrixone.metadata import MetadataColumn

client = Client(host='localhost', port=6001, user='root', password='111', database='test')

# Basic metadata scan (returns SQLAlchemy Result)
result = client.metadata.scan('test_db', 'users')
for row in result:
    print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

# Scan specific columns with structured output
rows = client.metadata.scan(
    'test_db', 'users',
    columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT, MetadataColumn.NULL_CNT]
)
for row in rows:
    print(f"{row.col_name}: {row.rows_cnt} rows, {row.null_cnt} nulls")

# Get all structured metadata
metadata_rows = client.metadata.scan('test_db', 'users', columns='*')
for row in metadata_rows:
    print(f"{row.col_name}: {row.rows_cnt} rows")

# Scan index metadata
index_result = client.metadata.scan(
    'test_db', 'users',
    indexname='idx_email'
)

# Scan tombstone (deleted) objects
tombstone_result = client.metadata.scan(
    'test_db', 'users',
    is_tombstone=True
)

# Get table statistics summary
stats = client.metadata.get_table_stats('test_db', 'users')
print(f"Total rows: {stats['total_rows']}")
print(f"Total size: {stats['total_size']}")
print(f"Compression ratio: {stats['compression_ratio']}")

# Get detailed statistics with indexes and tombstones
detailed_stats = client.metadata.get_table_stats(
    'test_db', 'users',
    include_indexes=['idx_email', 'idx_name'],
    include_tombstone=True
)
print(f"Table data: {detailed_stats['table']}")
print(f"Tombstone data: {detailed_stats['tombstone']}")
for idx in detailed_stats['indexes']:
    print(f"Index {idx['name']}: {idx['size']}")

# Using within a transaction
with client.session() as session:
    # Scan metadata within transaction context
    result = session.metadata.scan('test_db', 'users')

Use Cases:

  • Performance analysis: Identify tables with high null counts or poor compression

  • Storage optimization: Analyze object distribution and compression ratios

  • Data quality: Check for null values and data distribution

  • Index analysis: Evaluate index size and effectiveness

  • Tombstone management: Monitor deleted data objects

  • Capacity planning: Track table growth and storage usage

See also

  • Client.create_table: For creating tables

  • VectorManager: For vector-specific metadata operations

  • LoadDataManager: For bulk data loading

scan(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, columns: List[MetadataColumn | str] | str | None = None, distinct_object_name: bool | None = None) → Result | List[MetadataRow]

Scan table metadata using metadata_scan function.

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

  • columns – Optional list of columns to return. Can be MetadataColumn enum values or strings, or the single string "*" to select all columns. If None, returns all columns as SQLAlchemy Result. If specified, returns List of MetadataRow.

  • distinct_object_name – Optional flag to return distinct object names only.

Returns:

  • If columns is None: SQLAlchemy Result object

  • If columns is specified: List of MetadataRow

Return type:

Result | List[MetadataRow]

Example:

# Scan all columns of a table (returns SQLAlchemy Result)
result = client.metadata.scan("test_db", "users")

# Scan metadata for the index named "id"
result = client.metadata.scan("test_db", "users", indexname="id")

# Scan with tombstone filter
result = client.metadata.scan("test_db", "users", is_tombstone=False)

# Scan tombstone objects
result = client.metadata.scan("test_db", "users", is_tombstone=True)

# Scan specific index
result = client.metadata.scan("test_db", "users", indexname="idx_name")

# Get structured results with specific columns
from matrixone.metadata import MetadataColumn
rows = client.metadata.scan("test_db", "users",
                             columns=[MetadataColumn.COL_NAME,
                                      MetadataColumn.ROWS_CNT])
for row in rows:
    print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

# Get all structured results
rows = client.metadata.scan("test_db", "users", columns="*")

get_table_brief_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, Dict[str, Any]]

Get brief statistics for a table, tombstone, and indexes.

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

  • include_tombstone – Whether to include tombstone statistics

  • include_indexes – List of index names to include

Returns:

Dictionary with brief statistics for table, tombstone, and indexes

get_table_detail_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, List[Dict[str, Any]]]

Get detailed statistics for a table, tombstone, and indexes.

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

  • include_tombstone – Whether to include tombstone statistics

  • include_indexes – List of index names to include

Returns:

Dictionary with detailed statistics for table, tombstone, and indexes
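
Because the return type is Dict[str, List[Dict[str, Any]]], the result can be walked without knowing its keys in advance. The sketch below assumes nothing about the real key names: the sample dictionary and the helper count_entries are hypothetical, made up only to match the documented shape.

```python
def count_entries(detail_stats):
    """Count the entries listed under each section of a detail-stats dict."""
    return {section: len(entries) for section, entries in detail_stats.items()}


# Hypothetical data shaped like Dict[str, List[Dict[str, Any]]].
# The section names and inner keys are invented for illustration.
sample = {
    "users": [{"object_name": "obj_1"}, {"object_name": "obj_2"}],
    "idx_email": [{"object_name": "obj_3"}],
}

counts = count_entries(sample)
print(counts)
```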

AsyncMetadataManager

class matrixone.metadata.AsyncMetadataManager(client, executor=None)

Bases: BaseMetadataManager

Asynchronous metadata manager for MatrixOne table metadata operations.

Provides the same comprehensive metadata scanning functionality as MetadataManager but with full async/await support for non-blocking I/O operations. Ideal for high-concurrency applications requiring metadata analysis.

Key Features:

  • Non-blocking operations: All metadata operations use async/await

  • Async metadata scanning: Asynchronously access table and column metadata

  • Concurrent analysis: Scan multiple tables concurrently

  • Async statistics: Get table statistics without blocking

  • Transaction-aware: Full integration with async transaction contexts

  • Executor pattern: Works with both async client and async session

Executor Pattern:

  • If executor is None, uses self.client.execute (default async client-level executor)

  • If executor is provided (e.g., async session), uses executor.execute (async transaction-aware)

  • All operations are non-blocking and use async/await

  • Enables concurrent metadata operations

Usage Examples:

from matrixone import AsyncClient
from matrixone.metadata import MetadataColumn
import asyncio

async def main():
    client = AsyncClient()
    await client.connect(host='localhost', port=6001, user='root', password='111', database='test')

    # Basic async metadata scan
    result = await client.metadata.scan('test_db', 'users')
    async for row in result:
        print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

    # Scan specific columns with structured output
    rows = await client.metadata.scan(
        'test_db', 'users',
        columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT]
    )
    for row in rows:
        print(f"{row.col_name}: {row.rows_cnt} rows")

    # Get async table statistics
    stats = await client.metadata.get_table_stats('test_db', 'users')
    print(f"Total rows: {stats['total_rows']}")
    print(f"Compression ratio: {stats['compression_ratio']}")

    # Concurrent metadata scanning of multiple tables
    results = await asyncio.gather(
        client.metadata.scan('test_db', 'users'),
        client.metadata.scan('test_db', 'orders'),
        client.metadata.scan('test_db', 'products')
    )

    # Concurrent statistics for multiple tables
    stats_list = await asyncio.gather(
        client.metadata.get_table_stats('test_db', 'users'),
        client.metadata.get_table_stats('test_db', 'orders'),
        client.metadata.get_table_stats('test_db', 'products')
    )
    for table_stats in stats_list:
        print(f"Table: {table_stats['total_rows']} rows")

    # Scan index metadata asynchronously
    index_result = await client.metadata.scan(
        'test_db', 'users',
        indexname='idx_email'
    )

    # Using within async transaction
    async with client.session() as session:
        result = await session.metadata.scan('test_db', 'users')
        stats = await session.metadata.get_table_stats('test_db', 'orders')

    await client.disconnect()

asyncio.run(main())

Use Cases:

  • Async performance monitoring: Non-blocking metadata collection

  • Concurrent analysis: Analyze multiple tables simultaneously

  • Real-time dashboards: Update table statistics without blocking

  • High-throughput applications: Metadata operations in async web servers

  • Batch metadata collection: Gather stats from many tables efficiently
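
For batch collection across many tables, it may help to cap how many scans run at once. The following is a minimal sketch using asyncio.Semaphore; fake_scan is a hypothetical stand-in for `await client.metadata.scan(...)`, and scan_many and its limit parameter are names invented here, not SDK features.

```python
import asyncio


async def fake_scan(dbname, tablename):
    """Hypothetical stand-in for `await client.metadata.scan(...)`."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{dbname}.{tablename}"


async def scan_many(dbname, tablenames, limit=2):
    """Scan many tables with at most `limit` scans in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(tablename):
        # Each task waits here until a slot is free.
        async with semaphore:
            return await fake_scan(dbname, tablename)

    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(guarded(t) for t in tablenames))


results = asyncio.run(scan_many("test_db", ["users", "orders", "products"]))
print(results)
```

An unbounded asyncio.gather over hundreds of tables would issue every request at once; the semaphore trades a little latency for a predictable load on the server.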

See also

  • AsyncClient: For async database operations

  • AsyncSession: For async transaction management

  • MetadataManager: For synchronous metadata operations

async scan(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, columns: List[MetadataColumn | str] | None = None, distinct_object_name: bool | None = None) → Result | List[MetadataRow]

Scan table metadata using metadata_scan function (async).

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

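  • columns – Optional list of columns to return. Can be MetadataColumn enum values or strings. If None, returns all columns as SQLAlchemy Result. If specified, returns List of MetadataRow.

  • distinct_object_name – Optional flag to return distinct object names only.

Returns:

  • If columns is None: SQLAlchemy Result object

  • If columns is specified: List of MetadataRow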

Example:

# Scan all columns of a table
result = await client.metadata.scan("test_db", "users")

# Scan metadata for the index named "id"
result = await client.metadata.scan("test_db", "users", indexname="id")

# Scan with tombstone filter
result = await client.metadata.scan("test_db", "users", is_tombstone=False)

# Scan tombstone objects
result = await client.metadata.scan("test_db", "users", is_tombstone=True)

# Scan specific index
result = await client.metadata.scan("test_db", "users", indexname="idx_name")

async get_table_brief_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, Dict[str, Any]]

Get brief statistics for a table, tombstone, and indexes (async).

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

  • include_tombstone – Whether to include tombstone statistics

  • include_indexes – List of index names to include

Returns:

Dictionary with brief statistics for table, tombstone, and indexes

async get_table_detail_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, List[Dict[str, Any]]]

Get detailed statistics for a table, tombstone, and indexes (async).

Parameters:
  • dbname – Database name

  • tablename – Table name

  • is_tombstone – Optional tombstone flag (True or False)

  • indexname – Optional index name

  • include_tombstone – Whether to include tombstone statistics

  • include_indexes – List of index names to include

Returns:

Dictionary with detailed statistics for table, tombstone, and indexes

Data Classes

MetadataRow

class matrixone.metadata.MetadataRow(col_name: str, object_name: str, is_hidden: bool, obj_loc: str, create_ts: str, delete_ts: str, rows_cnt: int, null_cnt: int, compress_size: int, origin_size: int, min: Any | None = None, max: Any | None = None, sum: Any | None = None)

Bases: object

Structured representation of a metadata scan row

col_name: str
object_name: str
is_hidden: bool
obj_loc: str
create_ts: str
delete_ts: str
rows_cnt: int
null_cnt: int
compress_size: int
origin_size: int
min: Any | None = None
max: Any | None = None
sum: Any | None = None

classmethod from_sqlalchemy_row(row) → MetadataRow

Create MetadataRow from SQLAlchemy Row object

__init__(col_name: str, object_name: str, is_hidden: bool, obj_loc: str, create_ts: str, delete_ts: str, rows_cnt: int, null_cnt: int, compress_size: int, origin_size: int, min: Any | None = None, max: Any | None = None, sum: Any | None = None) → None
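
Since a scan can return several rows per column (one per storage object), per-column totals usually require aggregation. The sketch below shows one way to do that. RowSketch is a hypothetical stand-in carrying only the MetadataRow fields used here, summarize is an invented helper, and defining the compression ratio as origin_size divided by compress_size is an assumption rather than the SDK's definition.

```python
from dataclasses import dataclass


@dataclass
class RowSketch:
    """Stand-in with a subset of the MetadataRow fields (illustrative only)."""
    col_name: str
    rows_cnt: int
    null_cnt: int
    compress_size: int
    origin_size: int


def summarize(rows):
    """Aggregate per-column totals across storage objects."""
    totals = {}
    for row in rows:
        t = totals.setdefault(
            row.col_name,
            {"rows": 0, "nulls": 0, "compressed": 0, "original": 0},
        )
        t["rows"] += row.rows_cnt
        t["nulls"] += row.null_cnt
        t["compressed"] += row.compress_size
        t["original"] += row.origin_size

    summary = {}
    for name, t in totals.items():
        summary[name] = {
            # Guard against empty objects to avoid division by zero.
            "null_fraction": t["nulls"] / t["rows"] if t["rows"] else 0.0,
            # Assumed definition: original bytes per compressed byte.
            "compression_ratio": (
                t["original"] / t["compressed"] if t["compressed"] else None
            ),
        }
    return summary


rows = [
    RowSketch("email", 100, 10, 400, 1000),
    RowSketch("email", 100, 30, 600, 1000),
]
summary = summarize(rows)
print(summary)
```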

MetadataColumn

class matrixone.metadata.MetadataColumn(value)

Bases: Enum

Enumeration of available metadata columns

COL_NAME = 'col_name'
OBJECT_NAME = 'object_name'
IS_HIDDEN = 'is_hidden'
OBJ_LOC = 'obj_loc'
CREATE_TS = 'create_ts'
DELETE_TS = 'delete_ts'
ROWS_CNT = 'rows_cnt'
NULL_CNT = 'null_cnt'
COMPRESS_SIZE = 'compress_size'
ORIGIN_SIZE = 'origin_size'
MIN = 'min'
MAX = 'max'
SUM = 'sum'
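
Because scan accepts both enum members and plain strings in columns, a caller may want to normalize a mixed list before use. The sketch below is one possible approach. ColumnSketch is a local copy of three enum members so the example runs on its own, normalize_columns is a hypothetical helper, and accepting upper-case strings is an assumption that the SDK may not share.

```python
from enum import Enum


class ColumnSketch(Enum):
    """Local copy of a few MetadataColumn members (illustrative only)."""
    COL_NAME = "col_name"
    ROWS_CNT = "rows_cnt"
    NULL_CNT = "null_cnt"


def normalize_columns(columns):
    """Turn a mixed list of enum members and strings into column names."""
    names = []
    for col in columns:
        if isinstance(col, Enum):
            names.append(col.value)
        else:
            # Look the string up by value; raises ValueError if unknown.
            names.append(ColumnSketch(col.lower()).value)
    return names


names = normalize_columns([ColumnSketch.COL_NAME, "ROWS_CNT", "null_cnt"])
print(names)
```

Validating strings against the enum catches typos such as "row_cnt" before any query is sent.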