About 20 days ago, I made a blog post about an idea I had for a better federated search engine model.

It didn't take long for it to develop into something I am fixated on.

I am putting the code out. It's not ready or working yet, but it is something I am really happy to be making, and designing it has filled my time with joy.


My current plan is the following:

  1. Get the basic web-ring creation process down
  2. Get scraping jobs functional
  3. Provide a basic query system
  4. Implement basic user accounts
  5. Implement basic federation
  6. Implement basic moderation

Once I am done with the core features that I have in mind, I will start working on adding more features and quality of life improvements.


Some features I want to work on to make this software more enticing to administrators:

  1. The ability to customize what is publicly accessible.
  2. The ability to edit the pages' HTML and styling on the fly, without having to recompile.
  3. Containers for easy deployment.

With regard to application design, I am borrowing from my experience developing Android applications, along with cherry-picking ideas from projects @nutomic@lemmy.ml made:

  1. MVC design, with static pages to provide the fastest loading experience for users
  2. Bootstrap to make the pages responsive for any device
  3. Diesel to abstract database interaction and migration
  4. Handlebars for view templating
  5. Axum as the HTTP core

Hopefully these design decisions keep my application as free of technical debt as possible.


If you have any advice or suggestions, please share them. I want to know how I can do better and avoid common pitfalls for newcomers!

If you have criticisms, please be constructive and keep in mind that I am doing this because it makes me happy.

  • nutomic@lemmy.ml
    20 hours ago

    This sounds like a very interesting idea. I agree that YaCy doesn't work; when I checked it out years ago it was a completely bloated mess. I'm not sure how viable your idea is, because I'm not familiar with webrings and I'm not sure how the federation will work. Anyway, the main challenge for this project will be to actually give useful search results, both early on when there are very few crawlers, and later once spammers try to abuse it.

      • nutomic@lemmy.ml
        9 hours ago

        Mainly SEO spam with text copied from other sites and lots of ads/referral links to make the owner a profit. But after thinking about it more, those would be rather easy to filter based on ad code in the HTML.
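
        A rough sketch of what such a filter could look like, in Rust since the stack in the post (Diesel, Axum) points that way. The marker strings, the function names `ad_marker_count` and `looks_like_ad_spam`, and the threshold parameter are all assumptions for illustration; a real filter would need a much better list and probably proper HTML parsing.

        ```rust
        /// Illustrative (assumed) substrings that commonly show up in ad embeds.
        /// This list is a placeholder, not a vetted blocklist.
        const AD_MARKERS: [&str; 4] = [
            "adsbygoogle",
            "googlesyndication.com",
            "doubleclick.net",
            "amazon-adsystem.com",
        ];

        /// Counts every occurrence of any marker in the lowercased HTML source.
        pub fn ad_marker_count(html: &str) -> usize {
            let lower = html.to_lowercase();
            AD_MARKERS
                .iter()
                .map(|marker| lower.matches(marker).count())
                .sum()
        }

        /// Flags a page once it contains more than `max_markers` ad markers.
        /// The threshold is a hypothetical tuning knob for the crawler.
        pub fn looks_like_ad_spam(html: &str, max_markers: usize) -> bool {
            ad_marker_count(html) > max_markers
        }
        ```

        A crawler could call `looks_like_ad_spam(&page_html, 5)` before indexing and skip or down-rank pages that trip it.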

        A much bigger challenge will be the ranking of search results. When someone searches for a term and 100 pages in the index contain it, which of these pages should be shown first? Google developed PageRank when they started out, so that might be a good starting point for further research.
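
        To make the ranking idea a bit more concrete, here is a minimal PageRank sketch using plain power iteration. The function name `pagerank`, the adjacency-list input, and the way dangling pages are handled are assumptions for illustration, not anything from the project's code, and a real index would need something far more scalable.

        ```rust
        /// Power-iteration PageRank over an adjacency list.
        /// `links[i]` lists the indices of the pages that page `i` links to;
        /// every index is assumed to be smaller than `links.len()`.
        pub fn pagerank(links: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
            let n = links.len();
            if n == 0 {
                return Vec::new();
            }
            // Start with rank spread evenly over all pages.
            let mut ranks = vec![1.0 / n as f64; n];
            for _ in 0..iterations {
                // Every page gets the "random jump" share up front.
                let mut next = vec![(1.0 - damping) / n as f64; n];
                for (page, outgoing) in links.iter().enumerate() {
                    if outgoing.is_empty() {
                        // Dangling page: spread its rank evenly over all pages.
                        let share = damping * ranks[page] / n as f64;
                        for rank in next.iter_mut() {
                            *rank += share;
                        }
                    } else {
                        // Split this page's rank among the pages it links to.
                        let share = damping * ranks[page] / outgoing.len() as f64;
                        for &target in outgoing {
                            next[target] += share;
                        }
                    }
                }
                ranks = next;
            }
            ranks
        }
        ```

        With three pages where pages 0 and 1 both link to page 2 and page 2 links back to page 0, `pagerank(&links, 0.85, 50)` ranks page 2 highest and page 1 (which nobody links to) lowest. The 100 pages matching a query could then be sorted by this score, possibly combined with how well the text matches.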

      • francisco_1844@discuss.online
        13 hours ago

        A few of the ways abuse can happen:

        • Crawling false data / misinformation on a topic
        • Putting info on search as part of a scam / spam campaign
        • Putting false news about events that are happening, or have not happened at all
        • Putting false information about a business competitor
        • Putting fake reviews about a product

        Those are just a few that I can think of. Existing sites have these issues too, but what is different is how they decide relevance and how often their algorithms weed out the bad content. In my opinion a distributed search engine will have a harder time combating these, and other potential abuses, because there is less control over what gets scanned and there is an open policy on who can join the distributed scanning.

        • Clocks [They/Them]@lemmy.mlOP
          11 hours ago

          The first one I feel is a legitimate issue that I should brainstorm, but it is tricky to compute.

          The rest seem to be something moderation may help with, though they are not directly solvable.

          • francisco_1844@discuss.online
            9 hours ago

            The rest seem to be something moderation may help

            Who will moderate? If it is a distributed system and moderation is also distributed, bad actors can automate upvotes, or whatever other means we use for moderation, to keep their bad content up.