Embedder

BEAR supports embedding with both OpenAI and a self-hosted Text Embedding Inference (TEI) server.

- Each bear.model.Resource must have an embedding field, configured via EmbeddingConfig. This defines the embedding server, model, and Milvus index settings.
- The default embedder is set using .env variables with the prefix DEFAULT_EMBEDDING_XXX.
- For details, see embedding.py.
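As a sketch, a .env file selecting the default embedder might look like the following. Only the DEFAULT_EMBEDDING_ prefix is confirmed above; the variable names after it are assumptions mirroring the EmbeddingConfig fields used later in this page, so check embedding.py for the authoritative list.

```shell
# Hypothetical .env sketch -- names after the DEFAULT_EMBEDDING_ prefix are
# assumed from EmbeddingConfig's fields, not confirmed by the documentation.
DEFAULT_EMBEDDING_PROVIDER=tei
DEFAULT_EMBEDDING_SERVER_URL=http://localhost:8000
DEFAULT_EMBEDDING_MODEL=intfloat/multilingual-e5-large-instruct
DEFAULT_EMBEDDING_DIMENSIONS=1024
```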
In [1]:
from bear.embedding import get_embedder
from bear.config import config, EmbeddingConfig
Default embedder usage
In [9]:
%%time
# Get default embedder based on the configuration
embedder = get_embedder(config.embedding_config)
print(embedder.info)
# Use the embedder to embed a document
vectors = embedder.embed(text=["This is a test document.", "This is another sentence."], text_type="doc")
print(f"Document 1 vectors: {vectors[0][:3]}...")
print(f"Document 2 vectors: {vectors[1][:3]}...")
2025-07-23 19:40:32,218 - httpx - INFO - HTTP Request: GET http://olvi-1:8000/info "HTTP/1.1 200 OK"
2025-07-23 19:40:32,237 - httpx - INFO - HTTP Request: POST http://olvi-1:8000/embeddings "HTTP/1.1 200 OK"
2025-07-23 19:40:32,248 - httpx - INFO - HTTP Request: POST http://olvi-1:8000/embeddings "HTTP/1.1 200 OK"
{'provider': <Provider.TEXT_EMBEDDING_INFERENCE: 'tei'>, 'model': 'intfloat/multilingual-e5-large-instruct', 'max_tokens': 512, 'dimensions': 1024, 'doc_prefix': '', 'query_prefix': 'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: '}
Document 1 vectors: [0.02874486893415451, 0.008454373106360435, -0.028976252302527428]...
Document 2 vectors: [0.026782996952533722, 0.00042712956201285124, 0.00021018773259129375]...
CPU times: user 59.9 ms, sys: 0 ns, total: 59.9 ms
Wall time: 98.5 ms
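The info dict above shows that this e5-instruct model uses an empty doc_prefix but a long query_prefix, so the same text is embedded differently depending on text_type. As a sketch only (the real logic lives in bear.embedding), prefix selection could look like this:

```python
# Sketch: how text_type could select the prefix reported in embedder.info
# above. Illustrative only -- not the actual bear.embedding implementation.
DOC_PREFIX = ""
QUERY_PREFIX = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query\nQuery: "
)

def with_prefix(texts, text_type):
    # "doc" texts are sent as-is; "query" texts get the instruct prefix
    prefix = QUERY_PREFIX if text_type == "query" else DOC_PREFIX
    return [prefix + t for t in texts]

print(with_prefix(["What is BEAR?"], "query")[0][:9])  # → Instruct:
```

This is why documents and queries must be embedded with the matching text_type: mixing them up silently degrades retrieval quality.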
Custom embedder
In [10]:
%%time
custom_embedding_config = EmbeddingConfig(
provider="openai",
server_url="https://api.openai.com/v1",
model="text-embedding-3-small",
dimensions=1536,
max_tokens=1000,
metric_type="IP",
)
custom_embedder = get_embedder(custom_embedding_config)
print(custom_embedder.info)
custom_vector = custom_embedder.embed(text=["This is a test document.", "This is another sentence."], text_type="doc")
print(f"Custom embedder vectors: {custom_vector[0][:3]}...")
print(f"Custom embedder vectors: {custom_vector[1][:3]}...")
2025-07-23 19:40:35,668 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-07-23 19:40:35,868 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
{'provider': <Provider.OPENAI: 'openai'>, 'model': 'text-embedding-3-small', 'dimensions': 1536, 'doc_prefix': '', 'query_prefix': ''}
Custom embedder vectors: [-0.0023375607561320066, 0.05312768369913101, 0.03345499932765961]...
Custom embedder vectors: [0.03686746209859848, 0.00252012861892581, -0.024845464155077934]...
CPU times: user 36.8 ms, sys: 429 μs, total: 37.3 ms
Wall time: 960 ms