Today I Learned - Rocky Kev

TIL obscure search engines

Today I learned that there are independent/small search engines!

We all know the big ones like Google and Bing that have massive indexes. Then we know about Wolfram Alpha and DuckDuckGo.

From "A look at search engines with their own indexes", here are some random engines...

Mojeek

Seems privacy-oriented with a large index containing billions of pages. Quality isn’t at GBY’s [Google, Bing, Yandex] level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch.

Alexandria

A pretty new “non-profit, ad free” engine, with freely-licensed code. Surprisingly good at finding recent pages. Its index is built from the Common Crawl; it isn’t as big as Gigablast or Right Dao but its ranking is great.

Marginalia Search

My favorite entry on this page. It has its own crawler but is strongly biased towards non-commercial, personal, and/or minimal sites. It’s a great response to the increasingly SEO-spam-filled SERPs of GBY. Partially powers Teclis, which in turn partially powers Kagi. Update 2022-05-28: Marginalia.nu is now open source.

Thunderstone

A combined website catalog and search engine that focuses on categorization. Its about page claims: “We continuously survey all primary COM, NET, and ORG web-servers and distill their contents to produce this database. This is an index of sites not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you’re trying to finding things like ‘BillyBob’s personal beer can page on AOL’, try Yahoo or Dogpile.” This seems to be the polar opposite of the engines in the “small or non-commercial Web” category.

And there are also search engines without a web interface!

Apple’s search engine is usable in the form of “Siri Suggested Websites”. Its index is built from the Applebot web crawler. If Apple already has a working search engine, it’s not much of a stretch to say that they’ll make a web interface for it someday.

Amazon bought Alexa Internet (a web traffic analysis company, at the time unrelated to the Amazon Alexa virtual assistant) and discontinued its website ranking product. Amazon still runs the relevant crawlers, and also has a bot called “Amazonbot”. While Applebot powers the Siri personal assistant, Amazonbot powers the Alexa personal assistant to answer even more questions for customers. Crawling the web to answer questions is the basis of a search engine.
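
To get a feel for that "crawl, then answer questions" idea, here's a toy sketch of the indexing half. It's not how any of the engines above actually work. It assumes the crawling already happened: the `pages` dictionary, its example.com URLs, and the function names are all made up for illustration. It builds an inverted index (word to pages) and answers a query by intersecting the page sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical "crawled" pages; a real crawler would fetch these over HTTP.
pages = {
    "https://example.com/beer": "BillyBob's personal beer can page",
    "https://example.com/search": "A look at search engines with their own indexes",
    "https://example.org/engines": "Small search engines crawl the web",
}

index = build_index(pages)
print(search(index, "search engines"))
```

Real engines add ranking, deduplication, and a lot more on top, but the lookup step is roughly this shape.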

