
Hi, I am one of the authors of this post.

The 4 billion figure is made up. We chose it to illustrate the scale of the problem we are trying to solve. It also happens to sit just below the largest value an unsigned 4-byte integer can hold (2^32 - 1 = 4,294,967,295), which helps when building an index with a small footprint. With more queries, we would have to consider long integers (8 bytes) or a custom type with, e.g., 5 bytes.
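To make the arithmetic concrete, here is a rough Python sketch. The function names and the "one ID per query" layout are assumptions for illustration only, not how the actual index is built.

```python
import struct

UINT32_MAX = 2**32 - 1  # 4,294,967,295: largest unsigned 4-byte value


def bytes_needed(max_id: int) -> int:
    """Smallest whole number of bytes that can hold max_id as an unsigned int."""
    return max(1, (max_id.bit_length() + 7) // 8)


def id_storage_bytes(num_ids: int, bytes_per_id: int) -> int:
    """Raw space for a flat array of IDs (hypothetical layout, no overhead)."""
    return num_ids * bytes_per_id


num_queries = 4_000_000_000          # the made-up figure from the post
largest_id = num_queries - 1         # IDs assumed to run from 0 to n - 1

# The largest ID still fits in an unsigned 4-byte integer.
packed = struct.pack("<I", largest_id)

# One ID past UINT32_MAX needs a fifth byte; a custom 5-byte encoding could be:
five_byte = (UINT32_MAX + 1).to_bytes(5, "little")

print(bytes_needed(largest_id), len(packed), len(five_byte))
print(id_storage_bytes(num_queries, 4), id_storage_bytes(num_queries, 8))
```

Under these assumptions, storing one ID per query takes 16 GB at 4 bytes each and 32 GB at 8 bytes each, which is why staying under the 4-byte limit is attractive.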

If you're interested in where the queries come from, you might find these previous posts interesting: https://0x65.dev/blog/2019-12-05/a-new-search-engine.html https://0x65.dev/blog/2019-12-06/building-a-search-engine-fr...



A thought regarding 4 bytes vs. 8 bytes: both should be highly optimizable with hand-written vector instructions, such as the modern AVX-512 instructions. Someone on your team has likely thought of that or is already doing it, but on the off chance you haven't looked into it yet, it might prove very useful!
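As a back-of-the-envelope illustration of why the element width matters for vectorization, this small Python sketch counts how many integers fit in one vector register. The helper name is made up for this example, and real throughput depends on the operation and the hardware.

```python
def lanes_per_register(register_bits: int, element_bytes: int) -> int:
    """How many fixed-width integers fit side by side in one vector register."""
    return register_bits // (element_bytes * 8)


AVX512_BITS = 512  # width of an AVX-512 register

print(lanes_per_register(AVX512_BITS, 4))  # 4-byte integers per register
print(lanes_per_register(AVX512_BITS, 8))  # 8-byte integers per register
```

A 512-bit register holds sixteen 4-byte integers but only eight 8-byte ones, so the narrower type processes twice as many IDs per instruction.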

Thanks for the posts; they’re fascinating, and having more alternative search engines would be great. Best of luck!


Thanks for these writeups. This is a very inspiring project.



