• DaGeek247@fedia.io
    arrow-up
    27
    ·
    2 months ago

    My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

    I’ve only gotten like, 20 visits in the past three months though, so, very small sample size.
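
A minimal sketch of the trap described above, not the poster's actual code. The path `/trap`, the in-memory `banned_ips` set, and the `handle_request` helper are all hypothetical names chosen for illustration; a real setup would more likely ban at the firewall or web-server level.

```python
from urllib.parse import urlsplit

# Hypothetical trap path; it must match the Disallow line served in robots.txt.
TRAP_PATH = "/trap"

# robots.txt content that tells every crawler to stay away from the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Assumption: bans are kept in memory only, which is enough for a sketch.
banned_ips: set[str] = set()


def handle_request(ip: str, url: str) -> int:
    """Return an HTTP status code, banning any IP that requests the trap."""
    path = urlsplit(url).path  # drop the query string before comparing
    if ip in banned_ips:
        return 403
    if path == TRAP_PATH or path.startswith(TRAP_PATH + "/"):
        banned_ips.add(ip)
        return 403
    return 200
```

Fetching `/robots.txt` itself is never punished here; only a client that ignores the `Disallow` line and requests the trap path gets banned.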

    • mozz@mbin.grits.dev
      arrow-up
      14
      ·
      2 months ago

      I know this because I wrote a page that IP bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

      This is fuckin GENIUS

      • Moonrise2473@feddit.it
        arrow-up
        8
        ·
        2 months ago

        only if you don’t want any visits except from yourself, because this removes your site from any search engine

        should write a “disallow: /juicy-content” and then block anything that tries to access that page (only bad bots would follow that path)
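
To illustrate why only bad bots would follow that path, here is a small sketch of how a well-behaved crawler checks robots.txt before fetching, using Python's standard `urllib.robotparser`. The domain `example.com`, the agent name `ExampleBot`, and the `may_fetch` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested in the comment above, as it would appear in robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /juicy-content",
]

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory lines instead of fetching over HTTP


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler is allowed to request this URL."""
    return parser.can_fetch(agent, url)
```

A compliant crawler would call something like `may_fetch("https://example.com/juicy-content")` first and skip the URL when it gets `False`, so any client that still requests the path is ignoring robots.txt.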

          • Moonrise2473@feddit.it
            arrow-up
            3
            ·
            2 months ago

            Oops. As a non-native English speaker I misunderstood what he meant. I wrongly understood that he had set the server to ban everything that asked for robots.txt.

            • Zoop@beehaw.org
              arrow-up
              2
              ·
              2 months ago

              Just in case it makes you feel any better: I’m a native English speaker who always aced the reading comprehension tests back in school, and I read it the exact same way. Lol! I’m glad I wasn’t the only one. :)

        • mozz@mbin.grits.dev
          arrow-up
          5
          ·
          2 months ago

          You need to reread what was described, more carefully. Imagine for example that by “a page,” the person means a page called /juicy-content or something.

    • thingsiplay@beehaw.org
      arrow-up
      2
      ·
      edit-2
      2 months ago

      Interesting way of testing this. Another would be to query the search engines with site:your.domain added (Edit: typo corrected. Of course without the - in -site:, otherwise you exclude the site instead of limiting results to it.) to show results from your site only. Not an exhaustive check, but another tool to test this behavior.
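
A rough sketch of building such a query, assuming DuckDuckGo's `?q=` search URL as the example engine; the `site_search_url` helper and `example.com` are hypothetical. The `exclude` flag produces the `-site:` form mentioned in the edit note, which removes the site from results instead of limiting results to it.

```python
from urllib.parse import urlencode

# Assumption: any search engine that supports the site: operator would work.
SEARCH_BASE = "https://duckduckgo.com/"


def site_search_url(domain: str, terms: str = "", exclude: bool = False) -> str:
    """Build a search URL limited to (or excluding) one domain."""
    operator = "-site:" if exclude else "site:"
    query = f"{operator}{domain} {terms}".strip()
    return SEARCH_BASE + "?" + urlencode({"q": query})
```

Opening `site_search_url("example.com")` in a browser lists whatever pages of that domain the engine has indexed, which can then be compared against the paths disallowed in robots.txt.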