Vector Type Extensions
The MatrixOne Python SDK provides vector data type support through SQLAlchemy extensions.
VectorType
class matrixone.sqlalchemy_ext.vector_type.VectorType(dimension: int | None = None, precision: str = 'f32')
Bases: UserDefinedType
SQLAlchemy type for MatrixOne vector columns.
This type represents vector data in a MatrixOne database and handles serialization and deserialization of vector values for SQLAlchemy operations. It supports both the vecf32 and vecf64 precision types with a configurable dimension.
Key Features:
Support for both 32-bit (vecf32) and 64-bit (vecf64) vector precision
Configurable vector dimensions
Automatic serialization/deserialization of vector data
Integration with MatrixOne’s vector indexing and search capabilities
Support for vector similarity operations
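The serialization step can be pictured with a small, stdlib-only sketch. It assumes that vector values travel to and from MatrixOne as text literals of the form '[0.1,0.2,0.3]', the literal format MatrixOne accepts for vecf32 and vecf64 columns. The helper names to_vector_literal and from_vector_literal are hypothetical and not part of the SDK, whose actual implementation may differ.

```python
from typing import List, Sequence


def to_vector_literal(values: Sequence[float]) -> str:
    """Serialize a sequence of numbers into a '[x,y,...]' vector literal (hypothetical helper)."""
    # repr() gives the shortest string that round-trips each float exactly
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def from_vector_literal(text: str) -> List[float]:
    """Parse a '[x,y,...]' vector literal back into a list of floats (hypothetical helper)."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a vector literal: {text!r}")
    body = body[1:-1].strip()
    # float() tolerates the spaces that may follow each comma
    return [float(part) for part in body.split(",")] if body else []


literal = to_vector_literal([0.1, 0.2, 0.3])
print(literal)                       # [0.1,0.2,0.3]
print(from_vector_literal(literal))  # [0.1, 0.2, 0.3]
```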
Usage:
```python
# Define vector columns in SQLAlchemy models
class Document(Base):
    __tablename__ = 'documents'
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    embedding = Column(VectorType(384, VectorPrecision.F32))     # 384-dim f32 vector
    embedding_64 = Column(VectorType(512, VectorPrecision.F64))  # 512-dim f64 vector

# Use in table creation
client.create_table_orm('documents',
    Column('id', Integer, primary_key=True),
    Column('content', Text),
    Column('embedding', VectorType(384, VectorPrecision.F32))
)
```
Supported Operations:
Vector similarity search using distance functions
Vector indexing with HNSW and IVF algorithms
Vector arithmetic operations
Integration with fulltext search capabilities
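The distance functions and index types listed above can be cross-checked in plain Python on small data sets. The sketch below implements the standard definitions of Euclidean (L2) distance, cosine distance (1 minus cosine similarity), and the inner product, an exact brute-force top-k search, and recall@k, a common measure of how many of the exact top-k results an approximate index such as HNSW or IVF also returns. It assumes MatrixOne's distance functions follow these textbook definitions; every name here is illustrative and not part of the SDK, and small differences caused by float32 rounding are expected for vecf32 columns.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: square root of the sum of squared differences."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum of element-wise products."""
    _check_lengths(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; undefined when either vector has zero length."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / norms


# metric name -> (scoring function, True if larger scores mean more similar)
_METRICS: Dict[str, Tuple[Callable[[Sequence[float], Sequence[float]], float], bool]] = {
    "l2": (l2_distance, False),
    "cosine": (cosine_distance, False),
    "inner_product": (inner_product, True),
}


def top_k(rows: Sequence[Tuple[int, Sequence[float]]], query: Sequence[float],
          k: int, metric: str = "l2") -> List[Tuple[int, float]]:
    """Exact top-k search: score every row, then sort (brute force)."""
    if metric not in _METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    score, larger_is_closer = _METRICS[metric]
    scored = [(row_id, score(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda item: item[1], reverse=larger_is_closer)
    return scored[:k]


def recall_at_k(exact_ids: Sequence[int], approx_ids: Iterable[int], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(list(exact_ids)[:k])
    if not truth:
        raise ValueError("exact_ids is empty")
    found = set(list(approx_ids)[:k])
    return len(truth & found) / len(truth)


rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [0.7, 0.7])]
print([row_id for row_id, _ in top_k(rows, [1.0, 0.1], k=2)])  # [1, 3]
print(recall_at_k([4, 8, 15, 16, 23], [4, 8, 42, 16, 23], k=5))  # 0.8
```

In practice the exact IDs would come from top_k over a sample of stored vectors and the approximate IDs from an indexed query against the database.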
Note: The dimension must match the output size of your embedding model, and the dimension and precision must also meet the requirements of the vector index you plan to build.
__init__(dimension: int | None = None, precision: str = 'f32')
Initialize VectorType.
Args:
    dimension: Vector dimension (optional).
    precision: Vector precision: VectorPrecision.F32 for vecf32, VectorPrecision.F64 for vecf64.
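To make the two arguments concrete, the sketch below shows how a (dimension, precision) pair plausibly maps onto MatrixOne's column types vecf32(N) and vecf64(N), and what 32-bit storage does to a value. The function vector_column_type and its behavior when dimension is None (emitting the bare type name) are assumptions for illustration, not the SDK's documented behavior.

```python
import struct
from typing import Optional

# Assumed mapping from the precision argument to MatrixOne type names
_TYPE_NAMES = {"f32": "vecf32", "f64": "vecf64"}


def vector_column_type(dimension: Optional[int] = None, precision: str = "f32") -> str:
    """Render the DDL type for a vector column (hypothetical helper)."""
    try:
        type_name = _TYPE_NAMES[precision]
    except KeyError:
        raise ValueError(f"unsupported precision: {precision!r}") from None
    # Assumption: without a dimension, emit the bare type name
    return f"{type_name}({dimension})" if dimension is not None else type_name


def as_f32(value: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(vector_column_type(384))         # vecf32(384)
print(vector_column_type(512, "f64"))  # vecf64(512)
print(as_f32(0.1))                     # 0.10000000149011612
```

The as_f32 round trip illustrates why a value read back from a vecf32 column may differ slightly from the Python float that was written, while vecf64 keeps full double precision.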
Usage Examples
Basic Vector Type Usage
```python
from sqlalchemy import Column, Integer, String, create_engine
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import VectorType

Base = declarative_base()

class Document(Base):
    __tablename__ = 'documents'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    content = Column(String(1000))
    embedding = Column(VectorType(384))  # 384-dimensional vector

# Create table
engine = create_engine('mysql+pymysql://root:111@localhost:6001/test')
Base.metadata.create_all(engine)
```
Vector Operations
```python
from sqlalchemy import text
from sqlalchemy.orm import sessionmaker

# Reuses `engine` and `Document` from the previous example
Session = sessionmaker(bind=engine)
session = Session()

# Insert vector data
doc = Document(
    title='Sample Document',
    content='This is a sample document',
    embedding=[0.1, 0.2, 0.3, ...]  # 384-dimensional vector
)
session.add(doc)
session.commit()

# Search for similar vectors using L2 distance.
# The query vector is bound as a '[x, y, ...]' string literal; a raw Python
# list would be escaped by PyMySQL as a tuple, not as a vector.
query_vector = [0.1, 0.2, 0.3, ...]
params = {'query_vector': str(query_vector)}
result = session.execute(text("""
    SELECT id, title, l2_distance(embedding, :query_vector) AS distance
    FROM documents
    ORDER BY l2_distance(embedding, :query_vector)
    LIMIT 10
"""), params)
for row in result:
    print(f"Document {row.id}: {row.title} (Distance: {row.distance})")

# Search using cosine distance (smaller means more similar)
result = session.execute(text("""
    SELECT id, title, cosine_distance(embedding, :query_vector) AS cosine_distance
    FROM documents
    ORDER BY cosine_distance(embedding, :query_vector)
    LIMIT 10
"""), params)

session.close()
```
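Because raw text() queries are assembled by hand, it can help to centralize this pattern in one function. The sketch below is a hypothetical helper, build_knn_query, that whitelists the metric, validates the table and column names (identifiers cannot be bound as parameters), and binds the vector as a '[x,y,...]' string. It assumes the MatrixOne functions l2_distance, cosine_distance, and inner_product and a table with an id column; it performs no database I/O, so its output can be inspected directly.

```python
import re
from typing import Dict, Sequence, Tuple

# Assumed MatrixOne function per metric, and whether larger values mean "closer"
_SQL_METRICS = {
    "l2": ("l2_distance", False),
    "cosine": ("cosine_distance", False),
    "inner_product": ("inner_product", True),
}
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_knn_query(table: str, column: str, query_vector: Sequence[float],
                    k: int = 10, metric: str = "l2") -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) for a top-k similarity search (hypothetical helper)."""
    if metric not in _SQL_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    # Identifiers cannot be bound as parameters, so validate them instead
    for name in (table, column):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if k <= 0:
        raise ValueError("k must be positive")
    func, higher_is_closer = _SQL_METRICS[metric]
    expr = f"{func}({column}, :query_vector)"
    order = "DESC" if higher_is_closer else "ASC"
    sql = (f"SELECT id, {expr} AS score FROM {table} "
           f"ORDER BY {expr} {order} LIMIT {int(k)}")
    # The vector is bound as a '[x,y,...]' string literal
    params = {"query_vector": "[" + ",".join(str(float(v)) for v in query_vector) + "]"}
    return sql, params


sql, params = build_knn_query("documents", "embedding", [0.1, 0.2], k=5)
print(sql)
print(params)  # {'query_vector': '[0.1,0.2]'}
```

The returned pair can then be executed with session.execute(text(sql), params).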
Vector Distance Functions
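```python
from sqlalchemy import text

# Assumes an open `session` and a `query_vector` list as in the previous example;
# the vector is bound as a '[x, y, ...]' string literal
params = {'query_vector': str(query_vector)}

# L2 distance (Euclidean distance): smaller values mean more similar
result = session.execute(text("""
    SELECT id, l2_distance(embedding, :query_vector) AS l2_distance
    FROM documents
    ORDER BY l2_distance(embedding, :query_vector)
"""), params)

# Cosine distance: smaller values mean more similar
result = session.execute(text("""
    SELECT id, cosine_distance(embedding, :query_vector) AS cosine_distance
    FROM documents
    ORDER BY cosine_distance(embedding, :query_vector)
"""), params)

# Inner product: larger values mean more similar, so sort in descending order
result = session.execute(text("""
    SELECT id, inner_product(embedding, :query_vector) AS inner_product
    FROM documents
    ORDER BY inner_product(embedding, :query_vector) DESC
"""), params)
```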
Vector Index Integration
```python
from sqlalchemy import text
from matrixone.sqlalchemy_ext import VectorIndex, HNSWConfig

# Create an HNSW vector index on the embedding column
config = HNSWConfig(m=16, ef_construction=200)
index = VectorIndex(
    name='idx_document_embedding',
    column='embedding',
    algorithm='hnsw',
    config=config
)

# Add the index to the table metadata
Document.__table__.append_constraint(index)

# Create the index in the database
index.create(engine)

# Similarity searches ordered by distance on this column can now use the index
result = session.execute(text("""
    SELECT id, title, l2_distance(embedding, :query_vector) AS distance
    FROM documents
    ORDER BY l2_distance(embedding, :query_vector)
    LIMIT 10
"""), {'query_vector': str(query_vector)})
```
Vector Validation
```python
# Vector dimension validation
try:
    # Expected to raise an error because the vector dimension doesn't match
    doc = Document(
        title='Invalid Document',
        embedding=[0.1, 0.2]  # Wrong dimension (should be 384)
    )
    session.add(doc)
    session.commit()
except Exception as e:
    session.rollback()  # Reset the session so it can be reused
    print(f"Vector dimension error: {e}")

# Valid vector
doc = Document(
    title='Valid Document',
    embedding=[0.1] * 384  # Correct dimension
)
session.add(doc)
session.commit()
```
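Dimension errors can also be caught before any database round trip. The sketch below is a hypothetical pre-insert check (validate_embedding is not an SDK function) that enforces the declared dimension and, as a defensive choice, rejects non-numeric entries and non-finite values such as NaN or infinity.

```python
import math
from numbers import Real
from typing import List, Sequence


def validate_embedding(values: Sequence[float], dimension: int) -> List[float]:
    """Return the values as floats, or raise ValueError if they cannot fit the column."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    result = []
    for i, v in enumerate(values):
        # bool is a subclass of int; treat it as a mistake rather than 0/1
        if isinstance(v, bool) or not isinstance(v, Real):
            raise ValueError(f"element {i} is not a number: {v!r}")
        f = float(v)
        if not math.isfinite(f):
            raise ValueError(f"element {i} is not finite: {v!r}")
        result.append(f)
    return result


# e.g. Document(title='Valid Document', embedding=validate_embedding(vec, 384))
print(len(validate_embedding([0.1] * 384, 384)))  # 384
```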