Vector Databases in Modern Artificial Intelligence: Architectures, Indexing Techniques, Applications, and Research Challenges
Publication year: 1405 (Solar Hijri)
Document type: Conference paper
Language: English
National scientific document ID:
ICMCAI02_017
Indexing date: 9 Tir 1405 (Solar Hijri)
Abstract:
Vector representations, or embeddings, have become a central interface between modern artificial intelligence models and data management systems. They allow text, images, audio, source code, and other unstructured objects to be represented as points in high-dimensional spaces, where similarity can be evaluated geometrically rather than only through lexical matching. This shift has created a need for systems that can store, index, filter, update, and query very large embedding collections under strict latency constraints. Vector database management systems address this need by combining approximate nearest neighbor search with database-oriented storage, metadata management, query processing, scalability, and operational controls. This structured narrative review analyzes vector databases from a systems and artificial intelligence perspective. It introduces embeddings, similarity metrics, exact and approximate nearest neighbor search, and dimensionality effects. It then examines vector database architecture, major indexing techniques including HNSW, inverted files, product quantization, and IVF-PQ, and the trade-offs among recall, latency, memory, update cost, and filtering behavior. The paper also discusses design considerations for AI workloads, applications in semantic search and retrieval-augmented generation, system selection, security and privacy risks, and evaluation metrics. Finally, it identifies open challenges in hybrid retrieval, filtered ANN search, benchmarking, privacy, distributed consistency, data freshness, and trustworthy retrieval evaluation.
Keywords:
Vector databases, approximate nearest neighbor search, embeddings, retrieval-augmented generation, semantic search, hybrid retrieval, product quantization, HNSW, AI infrastructure.
Authors
Hussein Al-Baydani
M.Sc. Student, Department of Computer Engineering, Faculty of Engineering and Technology, Imam Khomeini International University, Qazvin, Iran
Mohammad Amin Zare Soltani
Assistant Professor, Department of Computer Engineering, Faculty of Engineering and Technology, Imam Khomeini International University, Qazvin, Iran