pathfinder
Pathfinder is an online tool (available at pfdr.app) that uses state-of-the-art large language models (currently gpt-4o-mini) to query a corpus of papers from arxiv.org and the Astrophysics Data System (ADS) and answer astronomy-related questions. Think of it like the Mars Pathfinder mission (also the inspiration behind the name), exploring the vast landscape of the astronomy literature as we wander into uncharted terrain.
It does this mainly by embedding the context from each paper as a vector in a high-dimensional space (organised and shown in 2D form below) and then embedding any user query in the same way to search for contextually similar papers. Once it has the top-k papers from this search, it calls the LLM to answer the question using the information retrieved from those papers (a process often called retrieval-augmented generation, or RAG). This grounds the LLM's answer in the information gathered during the retrieval step and significantly mitigates the risk of it making up 'correct sounding' text (aka hallucinating).
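The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not Pathfinder's actual code: the `embed`, `cosine`, `retrieve`, and `build_prompt` functions, the three-entry `corpus`, and the prompt wording are all hypothetical, and a toy bag-of-words count vector stands in for a real embedding model. The final call to the LLM is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, corpus, k=2):
    """Return the k (paper_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, papers):
    """Assemble a prompt that restricts the LLM to the retrieved context."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in papers)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical mini-corpus; real entries would be paper abstracts or text chunks.
corpus = {
    "paper_a": "Dark matter halos shape the rotation curves of spiral galaxies.",
    "paper_b": "Exoplanet atmospheres are probed with transmission spectroscopy.",
    "paper_c": "Galaxy rotation curves provide evidence for dark matter.",
}
query = "What is the evidence for dark matter in galaxies?"
top_papers = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top_papers)
print(prompt)
```

In a real system the count vectors would be replaced by dense embeddings from a trained model, and the sort over the whole corpus by an approximate nearest-neighbour index. The structure stays the same: embed, rank by similarity, keep the top k, and hand only those papers to the LLM.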
In addition to the generated answer, Pathfinder provides two helpful meta-generations:
- Consensus: An estimate of whether the retrieved papers agree with each other, and whether they actually answer the question being asked.
- Question type: A rough categorization of the kind of question being asked. This can often help in adjusting the settings to get a better response.
The Pathfinder paper is available on arXiv at 2408.01556 and the code is fully public. If you would like to collaborate with us, please reach out!
Update 16-Sep-2024: the Pathfinder framework is now being used to power the arXiv daily digest project. Read more about it here.
Update 2: Pathfinder is now published in ApJS. Read the full article here.