Interesting take. Have you benchmarked models on your own data? Because at this point everything is contaminated, so I find it impossible to tell what the real SOTA is. Also - most folks still just use OpenAI.
Last time I checked, adding a reranker consistently beats pure vector search. And as far as I know, it's still the superior way to fuse keyword and vector results.
In my experience, storing RAG chunks with a little bit of context helps a lot at retrieval time; then you can skip the whole "rerank" step and halve your cost and latency.
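For what it's worth, a minimal sketch of what "storing chunks with a bit of context" can look like: prepend some document-level context to each chunk before embedding it. The field names here (`doc_title`, `doc_summary`) are illustrative, not a fixed schema.

```python
def contextualize(chunk: str, doc_title: str, doc_summary: str) -> str:
    """Prepend lightweight document context to a chunk before embedding,
    so retrieval can match on context the chunk alone doesn't carry.
    doc_title/doc_summary are hypothetical fields for illustration."""
    return f"Document: {doc_title}\nContext: {doc_summary}\n\n{chunk}"

# You embed and index the contextualized string, but can still
# display the raw chunk to the user or LLM at generation time.
enriched = contextualize(
    "Requests beyond the quota return HTTP 429.",
    "Rate limiting",
    "How the API enforces per-key request quotas.",
)
```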
As embedding and generative models keep improving, the need for a rerank step will be optimized away.
Huh? Reranking is always a boost on top of retrieval. So regardless of the chunking method or model you use, reranking with a good model will always result in a higher MRR.
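For anyone unfamiliar with the metric being argued about: MRR (mean reciprocal rank) averages, over all queries, one over the rank of the first relevant document, so pushing relevant hits toward the top directly raises it. A minimal sketch:

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """MRR: for each query, take 1/rank of the first relevant hit
    (0 if no relevant document is returned), then average over queries."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)

# Query 1: relevant doc at rank 2 -> 1/2. Query 2: at rank 1 -> 1.
mrr = mean_reciprocal_rank(
    [["d1", "d2"], ["d3", "d1"]],
    [{"d2"}, {"d3"}],
)  # (0.5 + 1.0) / 2 = 0.75
```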
And better embedding models will never solve the problem of merging lexical and vector search results either. Rank/score fusion is flawed since the two score scales are hardly comparable, and boosting only works sometimes. Rerankers, on the other hand, generally do a pretty good job at this.
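To make the fusion side of this concrete: the usual workaround for incomparable score scales is to fuse on ranks instead of scores, e.g. reciprocal rank fusion (RRF). A minimal sketch (the `k=60` smoothing constant is the commonly used default, not anything mandated):

```python
def rrf_fuse(keyword_ranking, vector_ranking, k=60):
    """Reciprocal Rank Fusion: merge two rankings using only rank
    positions, sidestepping the fact that BM25 scores and cosine
    similarities live on incomparable scales."""
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks well in both lists, so it wins the fused ranking.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "a"])  # -> ["b", "a", "c"]
```

This is exactly the kind of fusion the comment calls flawed: it ignores score magnitudes entirely, whereas a cross-encoder reranker scores each (query, document) pair jointly.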
Performance is indeed the biggest issue here. Rerankers are slow as hell and simply not feasible for some use cases.