Metadata Manager
MetadataManager
- class matrixone.metadata.MetadataManager(client, executor=None)
Bases: BaseMetadataManager
Synchronous metadata manager for MatrixOne table metadata operations.
This class provides comprehensive metadata scanning capabilities for analyzing table statistics, column information, data distribution, and storage details. It enables deep introspection of table structure and performance characteristics.
Key Features:
Metadata scanning: Access detailed table and column metadata
Column statistics: Row counts, null counts, min/max values, sums
Storage analysis: Compression ratios, object sizes, data distribution
Index metadata: Scan index-specific metadata
Tombstone inspection: Analyze deleted data objects
Performance insights: Identify storage hotspots and optimization opportunities
Transaction-aware: Full integration with transaction contexts
Executor Pattern:
If executor is None, uses self.client.execute (default client-level executor)
If executor is provided (e.g., session), uses executor.execute (transaction-aware)
All operations can participate in transactions when used via session
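The executor fallback described above can be sketched as follows. This is a minimal illustration with stand-in classes, not the library's implementation: `FakeExecutor`, `ManagerSketch`, and `_get_executor` are hypothetical names introduced only for this example.

```python
class FakeExecutor:
    """Stand-in for a client or session; reports which object ran the SQL."""

    def __init__(self, label):
        self.label = label

    def execute(self, sql):
        return f"{self.label} ran: {sql}"


class ManagerSketch:
    """Hypothetical illustration of the executor fallback (not the real class)."""

    def __init__(self, client, executor=None):
        self.client = client
        self.executor = executor

    def _get_executor(self):
        # Use the explicit executor (e.g. a session) when given,
        # otherwise fall back to the client itself.
        return self.executor if self.executor is not None else self.client

    def run(self, sql):
        return self._get_executor().execute(sql)


client = FakeExecutor("client")
session = FakeExecutor("session")

print(ManagerSketch(client).run("SELECT 1"))                    # client ran: SELECT 1
print(ManagerSketch(client, executor=session).run("SELECT 1"))  # session ran: SELECT 1
```

Routing every call through a single lookup like this is what lets the same manager code run either standalone or inside a transaction.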
Available Metadata Columns:
COL_NAME: Column name
OBJECT_NAME: Object/block name in storage
IS_HIDDEN: Whether object is hidden
OBJ_LOC: Object storage location
CREATE_TS: Creation timestamp
DELETE_TS: Deletion timestamp (for tombstones)
ROWS_CNT: Number of rows in the object
NULL_CNT: Number of null values
COMPRESS_SIZE: Compressed storage size
ORIGIN_SIZE: Original uncompressed size
MIN: Minimum value in column
MAX: Maximum value in column
SUM: Sum of values (for numeric columns)
Usage Examples:
    from matrixone import Client
    from matrixone.metadata import MetadataColumn

    client = Client(host='localhost', port=6001, user='root', password='111', database='test')

    # Basic metadata scan (returns SQLAlchemy Result)
    result = client.metadata.scan('test_db', 'users')
    for row in result:
        print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

    # Scan specific columns with structured output (returns a list of MetadataRow)
    rows = client.metadata.scan(
        'test_db', 'users',
        columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT, MetadataColumn.NULL_CNT]
    )
    for row in rows:
        print(f"{row.col_name}: {row.rows_cnt} rows, {row.null_cnt} nulls")

    # Get all structured metadata
    metadata_rows = client.metadata.scan('test_db', 'users', columns='*')
    for row in metadata_rows:
        print(f"{row.col_name}: {row.rows_cnt} rows")

    # Scan index metadata
    index_result = client.metadata.scan('test_db', 'users', indexname='idx_email')

    # Scan tombstone (deleted) objects
    tombstone_result = client.metadata.scan('test_db', 'users', is_tombstone=True)

    # Get table statistics summary
    stats = client.metadata.get_table_stats('test_db', 'users')
    print(f"Total rows: {stats['total_rows']}")
    print(f"Total size: {stats['total_size']}")
    print(f"Compression ratio: {stats['compression_ratio']}")

    # Get detailed statistics with indexes and tombstones
    detailed_stats = client.metadata.get_table_stats(
        'test_db', 'users',
        include_indexes=['idx_email', 'idx_name'],
        include_tombstone=True
    )
    print(f"Table data: {detailed_stats['table']}")
    print(f"Tombstone data: {detailed_stats['tombstone']}")
    for idx in detailed_stats['indexes']:
        print(f"Index {idx['name']}: {idx['size']}")

    # Using within a transaction
    with client.session() as session:
        # Scan metadata within transaction context
        result = session.metadata.scan('test_db', 'users')
Use Cases:
Performance analysis: Identify tables with high null counts or poor compression
Storage optimization: Analyze object distribution and compression ratios
Data quality: Check for null values and data distribution
Index analysis: Evaluate index size and effectiveness
Tombstone management: Monitor deleted data objects
Capacity planning: Track table growth and storage usage
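As a rough illustration of the performance and data-quality use cases, the sketch below aggregates per-object metadata into per-column null fractions and compression ratios. It runs on hand-written sample dictionaries rather than a live scan. The `summarize` helper and the definition of compression ratio as original size divided by compressed size are assumptions made for this example.

```python
# Sample rows keyed by the lowercase metadata column names; values are made up.
rows = [
    {"col_name": "id", "rows_cnt": 1000, "null_cnt": 0, "compress_size": 2000, "origin_size": 8000},
    {"col_name": "id", "rows_cnt": 500, "null_cnt": 0, "compress_size": 1000, "origin_size": 4000},
    {"col_name": "email", "rows_cnt": 1000, "null_cnt": 250, "compress_size": 5000, "origin_size": 10000},
]


def summarize(rows):
    """Aggregate per-object rows into per-column totals and derived ratios."""
    summary = {}
    for r in rows:
        s = summary.setdefault(
            r["col_name"], {"rows": 0, "nulls": 0, "compressed": 0, "original": 0}
        )
        s["rows"] += r["rows_cnt"]
        s["nulls"] += r["null_cnt"]
        s["compressed"] += r["compress_size"]
        s["original"] += r["origin_size"]
    for s in summary.values():
        # Guard against empty objects to avoid division by zero.
        s["null_fraction"] = s["nulls"] / s["rows"] if s["rows"] else 0.0
        s["compression_ratio"] = s["original"] / s["compressed"] if s["compressed"] else None
    return summary


summary = summarize(rows)
for name, s in summary.items():
    print(f"{name}: null fraction {s['null_fraction']:.2f}, compression {s['compression_ratio']:.1f}x")
```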
See also
Client.create_table: For creating tables
VectorManager: For vector-specific metadata operations
LoadDataManager: For bulk data loading
- scan(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, columns: List[MetadataColumn | str] | str | None = None, distinct_object_name: bool | None = None) → Result | List[MetadataRow]
Scan table metadata using metadata_scan function.
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
columns – Optional list of columns to return. Can be MetadataColumn enum values or strings. If None, returns all columns as SQLAlchemy Result. If specified, returns List of MetadataRow.
distinct_object_name – Optional flag to return distinct object names only.
- Returns:
If columns is None: SQLAlchemy Result object. If columns is specified: List of MetadataRow.
- Return type:
Result | List[MetadataRow]
Example:
    from matrixone.metadata import MetadataColumn

    # Scan all columns of a table (returns SQLAlchemy Result)
    result = client.metadata.scan("test_db", "users")

    # Scan the metadata of an index named "id"
    result = client.metadata.scan("test_db", "users", indexname="id")

    # Scan with tombstone filter
    result = client.metadata.scan("test_db", "users", is_tombstone=False)

    # Scan tombstone objects
    result = client.metadata.scan("test_db", "users", is_tombstone=True)

    # Scan specific index
    result = client.metadata.scan("test_db", "users", indexname="idx_name")

    # Get structured results with specific columns
    rows = client.metadata.scan(
        "test_db", "users",
        columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT]
    )
    for row in rows:
        print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

    # Get all structured results
    rows = client.metadata.scan("test_db", "users", columns="*")
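The columns parameter accepts enum members, plain strings, or "*". One way such input could be normalized into a list of column names is sketched below. `normalize_columns` is a hypothetical helper, and the enum is a local replica of the documented MetadataColumn values so the block runs without the library. How scan actually processes the argument is not shown in this reference.

```python
from enum import Enum


class MetadataColumn(Enum):
    """Local replica of the documented enum values, for illustration only."""
    COL_NAME = "col_name"
    OBJECT_NAME = "object_name"
    IS_HIDDEN = "is_hidden"
    OBJ_LOC = "obj_loc"
    CREATE_TS = "create_ts"
    DELETE_TS = "delete_ts"
    ROWS_CNT = "rows_cnt"
    NULL_CNT = "null_cnt"
    COMPRESS_SIZE = "compress_size"
    ORIGIN_SIZE = "origin_size"
    MIN = "min"
    MAX = "max"
    SUM = "sum"


def normalize_columns(columns):
    """Return a list of lowercase column names, or None when no columns are given."""
    if columns is None:
        return None
    if isinstance(columns, str):
        if columns == "*":
            return [c.value for c in MetadataColumn]
        columns = [columns]  # treat a single name as a one-element list
    names = []
    for col in columns:
        if isinstance(col, MetadataColumn):
            names.append(col.value)
        else:
            # Lookup by value raises ValueError for unknown column names.
            names.append(MetadataColumn(col.lower()).value)
    return names


print(normalize_columns([MetadataColumn.COL_NAME, "ROWS_CNT"]))  # ['col_name', 'rows_cnt']
```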
- get_table_brief_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, Dict[str, Any]]
Get brief statistics for a table, tombstone, and indexes.
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include
- Returns:
Dictionary with brief statistics for table, tombstone, and indexes
- get_table_detail_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, List[Dict[str, Any]]]
Get detailed statistics for a table, tombstone, and indexes.
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include
- Returns:
Dictionary with detailed statistics for table, tombstone, and indexes
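The two return types differ only in shape: the detailed form maps each section to a list of per-object dictionaries, while the brief form maps each section to a single dictionary. The sketch below shows how a detailed result could be rolled up into a brief one. The section names and dictionary keys are assumptions chosen for illustration, since the keys the library returns are not listed here.

```python
from typing import Any, Dict, List


def roll_up(detail: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Collapse per-object statistics into one summary dictionary per section."""
    brief: Dict[str, Dict[str, Any]] = {}
    for section, objects in detail.items():
        brief[section] = {
            "object_count": len(objects),
            "rows_cnt": sum(o.get("rows_cnt", 0) for o in objects),
            "compress_size": sum(o.get("compress_size", 0) for o in objects),
            "origin_size": sum(o.get("origin_size", 0) for o in objects),
        }
    return brief


# Hypothetical detailed result with assumed section names and keys.
detail = {
    "table": [
        {"rows_cnt": 1000, "compress_size": 2000, "origin_size": 8000},
        {"rows_cnt": 500, "compress_size": 1000, "origin_size": 4000},
    ],
    "tombstone": [],
}

brief = roll_up(detail)
print(brief["table"]["rows_cnt"])  # 1500
```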
AsyncMetadataManager
- class matrixone.metadata.AsyncMetadataManager(client, executor=None)
Bases: BaseMetadataManager
Asynchronous metadata manager for MatrixOne table metadata operations.
Provides the same comprehensive metadata scanning functionality as MetadataManager but with full async/await support for non-blocking I/O operations. Ideal for high-concurrency applications requiring metadata analysis.
Key Features:
Non-blocking operations: All metadata operations use async/await
Async metadata scanning: Asynchronously access table and column metadata
Concurrent analysis: Scan multiple tables concurrently
Async statistics: Get table statistics without blocking
Transaction-aware: Full integration with async transaction contexts
Executor pattern: Works with both async client and async session
Executor Pattern:
If executor is None, uses self.client.execute (default async client-level executor)
If executor is provided (e.g., async session), uses executor.execute (async transaction-aware)
All operations are non-blocking and use async/await
Enables concurrent metadata operations
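When many tables are scanned concurrently, it can help to cap the number of scans in flight. The sketch below shows one way to do that with asyncio.Semaphore. It uses a stub coroutine in place of a real scan: `fake_scan`, `scan_many`, and the `limit` parameter are hypothetical names for this example.

```python
import asyncio


async def fake_scan(dbname, tablename):
    """Stub standing in for an awaitable metadata scan."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"table": f"{dbname}.{tablename}"}


async def scan_many(dbname, tables, limit=2):
    """Scan several tables concurrently, with at most `limit` scans in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(tablename):
        async with sem:
            return await fake_scan(dbname, tablename)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(t) for t in tables))


results = asyncio.run(scan_many("test_db", ["users", "orders", "products"]))
for r in results:
    print(r["table"])
```

Because gather preserves input order, each result can be matched to its table by position even though the scans may finish out of order.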
Usage Examples:
    import asyncio

    from matrixone import AsyncClient
    from matrixone.metadata import MetadataColumn

    async def main():
        client = AsyncClient()
        await client.connect(host='localhost', port=6001, user='root', password='111', database='test')

        # Basic async metadata scan
        result = await client.metadata.scan('test_db', 'users')
        async for row in result:
            print(f"Column: {row.col_name}, Rows: {row.rows_cnt}")

        # Scan specific columns with structured output (returns a list of MetadataRow)
        rows = await client.metadata.scan(
            'test_db', 'users',
            columns=[MetadataColumn.COL_NAME, MetadataColumn.ROWS_CNT]
        )
        for row in rows:
            print(f"{row.col_name}: {row.rows_cnt} rows")

        # Get async table statistics
        stats = await client.metadata.get_table_stats('test_db', 'users')
        print(f"Total rows: {stats['total_rows']}")
        print(f"Compression ratio: {stats['compression_ratio']}")

        # Concurrent metadata scanning of multiple tables
        results = await asyncio.gather(
            client.metadata.scan('test_db', 'users'),
            client.metadata.scan('test_db', 'orders'),
            client.metadata.scan('test_db', 'products')
        )

        # Concurrent statistics for multiple tables
        stats_list = await asyncio.gather(
            client.metadata.get_table_stats('test_db', 'users'),
            client.metadata.get_table_stats('test_db', 'orders'),
            client.metadata.get_table_stats('test_db', 'products')
        )
        for table_stats in stats_list:
            print(f"Table: {table_stats['total_rows']} rows")

        # Scan index metadata asynchronously
        index_result = await client.metadata.scan('test_db', 'users', indexname='idx_email')

        # Using within async transaction
        async with client.session() as session:
            result = await session.metadata.scan('test_db', 'users')
            stats = await session.metadata.get_table_stats('test_db', 'orders')

        await client.disconnect()

    asyncio.run(main())
Use Cases:
Async performance monitoring: Non-blocking metadata collection
Concurrent analysis: Analyze multiple tables simultaneously
Real-time dashboards: Update table statistics without blocking
High-throughput applications: Metadata operations in async web servers
Batch metadata collection: Gather stats from many tables efficiently
See also
AsyncClient: For async database operations
AsyncSession: For async transaction management
MetadataManager: For synchronous metadata operations
- async scan(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, columns: List[MetadataColumn | str] | None = None, distinct_object_name: bool | None = None) → Result | List[MetadataRow]
Scan table metadata using metadata_scan function (async).
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
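columns – Optional list of columns to return. Can be MetadataColumn enum values or strings.
distinct_object_name – Optional flag to return distinct object names only.
- Returns:
SQLAlchemy Result object if columns is None; List of MetadataRow if columns is specified.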
Example:
    # Scan all columns of a table
    result = await client.metadata.scan("test_db", "users")

    # Scan the metadata of an index named "id"
    result = await client.metadata.scan("test_db", "users", indexname="id")

    # Scan with tombstone filter
    result = await client.metadata.scan("test_db", "users", is_tombstone=False)

    # Scan tombstone objects
    result = await client.metadata.scan("test_db", "users", is_tombstone=True)

    # Scan specific index
    result = await client.metadata.scan("test_db", "users", indexname="idx_name")
- async get_table_brief_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, Dict[str, Any]]
Get brief statistics for a table, tombstone, and indexes (async).
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include
- Returns:
Dictionary with brief statistics for table, tombstone, and indexes
- async get_table_detail_stats(dbname: str, tablename: str, is_tombstone: bool | None = None, indexname: str | None = None, include_tombstone: bool = False, include_indexes: List[str] | None = None) → Dict[str, List[Dict[str, Any]]]
Get detailed statistics for a table, tombstone, and indexes (async).
- Parameters:
dbname – Database name
tablename – Table name
is_tombstone – Optional tombstone flag (True or False)
indexname – Optional index name
include_tombstone – Whether to include tombstone statistics
include_indexes – List of index names to include
- Returns:
Dictionary with detailed statistics for table, tombstone, and indexes
Data Classes
MetadataRow
- class matrixone.metadata.MetadataRow(col_name: str, object_name: str, is_hidden: bool, obj_loc: str, create_ts: str, delete_ts: str, rows_cnt: int, null_cnt: int, compress_size: int, origin_size: int, min: Any | None = None, max: Any | None = None, sum: Any | None = None)
Bases: object
Structured representation of a metadata scan row.
- classmethod from_sqlalchemy_row(row) → MetadataRow
Create a MetadataRow from a SQLAlchemy Row object.
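A conversion of this kind can be sketched with a dataclass that copies each declared field from a row-like object by attribute name. `MetadataRowSketch` and `from_row` below are hypothetical stand-ins that mirror the documented field list, and `SimpleNamespace` substitutes for a SQLAlchemy Row. The library's own conversion logic may differ.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class MetadataRowSketch:
    """Illustrative replica of the documented MetadataRow fields."""
    col_name: str
    object_name: str
    is_hidden: bool
    obj_loc: str
    create_ts: str
    delete_ts: str
    rows_cnt: int
    null_cnt: int
    compress_size: int
    origin_size: int
    min: Optional[Any] = None
    max: Optional[Any] = None
    sum: Optional[Any] = None

    @classmethod
    def from_row(cls, row):
        # Copy every declared field by attribute name; absent attributes become None.
        return cls(**{f.name: getattr(row, f.name, None) for f in fields(cls)})


# A row-like object without min/max/sum attributes (values are made up).
row = SimpleNamespace(
    col_name="id", object_name="obj_0001", is_hidden=False, obj_loc="loc",
    create_ts="ts1", delete_ts="", rows_cnt=1000, null_cnt=0,
    compress_size=2000, origin_size=8000,
)

meta = MetadataRowSketch.from_row(row)
print(meta.col_name, meta.rows_cnt, meta.min)  # id 1000 None
```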
MetadataColumn
- class matrixone.metadata.MetadataColumn(value)
Bases: Enum
Enumeration of available metadata columns.
- COL_NAME = 'col_name'
- OBJECT_NAME = 'object_name'
- IS_HIDDEN = 'is_hidden'
- OBJ_LOC = 'obj_loc'
- CREATE_TS = 'create_ts'
- DELETE_TS = 'delete_ts'
- ROWS_CNT = 'rows_cnt'
- NULL_CNT = 'null_cnt'
- COMPRESS_SIZE = 'compress_size'
- ORIGIN_SIZE = 'origin_size'
- MIN = 'min'
- MAX = 'max'
- SUM = 'sum'