Atlas Search

High-Level Project Summary

Atlas Search is a state-of-the-art full-text search engine for the NASA Technical Reports Server (NTRS) document library that supports natural language queries. It is faster than the current NTRS API and offers finer-grained faceted search, giving users a more natural search experience while keeping the existing functionality intact.

Detailed Project Description

The backend is written entirely in Rust and uses the Tantivy search engine library, which provides fast indexing and query parsing. Tantivy itself provides the following features:



  • Full-text search
  • Configurable tokenizer
  • Tiny startup time (<10ms)
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase queries search (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (according to the Tantivy README, indexing English Wikipedia takes under 3 minutes on a desktop machine)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, dates, and hierarchical facet fields
  • LZ4 compressed document store
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • JSON fields
  • Aggregation Collector: range buckets, average, and stats metrics
  • LogMergePolicy with deletes
  • Searcher Warmer API
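To make the BM25 bullet concrete, below is a minimal, self-contained Rust sketch of BM25 ranking over a toy in-memory inverted index. It is not Tantivy's API and not the Atlas Search code: `ToyIndex`, `build` and `search` are hypothetical names, the tokenizer simply lowercases and splits on whitespace, and query terms are combined with OR semantics. It uses the usual Lucene defaults k1 = 1.2 and b = 0.75 with the Lucene-style idf ln(1 + (N - n + 0.5) / (n + 0.5)); Tantivy also stores field lengths in a compact, approximate form, which this sketch skips.

```rust
use std::collections::HashMap;

// Standard BM25 constants (the defaults Lucene uses).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Hypothetical toy inverted index: term -> list of (doc id, term frequency).
/// Real engines such as Tantivy keep postings on disk, compressed.
struct ToyIndex {
    postings: HashMap<String, Vec<(usize, u32)>>,
    doc_lens: Vec<u32>,
}

impl ToyIndex {
    fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, Vec<(usize, u32)>> = HashMap::new();
        let mut doc_lens = Vec::with_capacity(docs.len());
        for (doc_id, text) in docs.iter().enumerate() {
            // Naive tokenizer: lowercase and split on whitespace.
            let mut counts: HashMap<String, u32> = HashMap::new();
            let mut len = 0;
            for tok in text.split_whitespace() {
                *counts.entry(tok.to_lowercase()).or_insert(0) += 1;
                len += 1;
            }
            doc_lens.push(len);
            for (term, tf) in counts {
                postings.entry(term).or_default().push((doc_id, tf));
            }
        }
        ToyIndex { postings, doc_lens }
    }

    /// Scores every document containing at least one query term (OR semantics)
    /// and returns (doc id, score) sorted by descending score, ties by doc id.
    fn search(&self, query: &str) -> Vec<(usize, f64)> {
        let n_docs = self.doc_lens.len() as f64;
        let avg_len = self.doc_lens.iter().sum::<u32>() as f64 / n_docs;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for term in query.split_whitespace().map(|t| t.to_lowercase()) {
            let Some(list) = self.postings.get(&term) else { continue };
            let df = list.len() as f64;
            // Lucene-style idf: rarer terms weigh more, and it is never negative.
            let idf = (1.0 + (n_docs - df + 0.5) / (df + 0.5)).ln();
            for &(doc_id, tf) in list {
                let tf = tf as f64;
                let dl = self.doc_lens[doc_id] as f64;
                // Length normalisation: longer-than-average docs are penalised.
                let norm = K1 * (1.0 - B + B * dl / avg_len);
                *scores.entry(doc_id).or_insert(0.0) += idf * tf * (K1 + 1.0) / (tf + norm);
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
        ranked
    }
}

fn main() {
    let docs = [
        "apollo lunar module descent engine",
        "lunar regolith sample analysis",
        "mars rover wheel wear",
    ];
    let index = ToyIndex::build(&docs);
    for (doc_id, score) in index.search("lunar engine") {
        println!("{doc_id}: {score:.3}");
    }
}
```

Here document 0 ranks above document 1 because it matches both "lunar" and the rarer term "engine", while document 2 matches neither term and is not returned.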


The frontend is written in Svelte using SvelteKit, which allows for rapid prototyping and development.


Space Agency Data

Several categories from the NTRS database are scraped and indexed, covering a variety of document types. The scraping logic can be found in the scraper.py file in the codebase.

Hackathon Journey

It was quite difficult to finish the project in the limited time I had, but I am grateful for the opportunity to take part in this competition.

References

  • https://github.com/quickwit-oss/tantivy
  • https://github.com/quickwit-oss/tantivy-cli
  • https://github.com/sveltejs/kit
  • https://railway.app/
  • https://vercel.app/

Tags

#search