• Tiff@reddthat.comM
    2 months ago

    We enabled the CloudFlare AI bots and Crawlers mode around 0:00 UTC (20/Sept).

    This was because we had a huge number of AI scrapers that were attempting to scan the whole lemmyverse.

    It successfully blocked them… While also blocking federation 😴

    I’ve disabled the block. Within the next hour we should see federation traffic come through.

    Sorry for the unfortunate delay in new posts!

    Tiff

    • Telorand@reddthat.comOP
      2 months ago

      It happens. Appreciate the effort! I noticed a marked uptick in the lemmit bot mirroring Reddit, so I wonder if it was a coincidence or a sibling effort.

    • cranakis@reddthat.com
      2 months ago

      Might be too much work, but you can allow a subset of traffic to bypass a CF WAF rule if the federated traffic is distinguishable from the scrapers.

      Edit: I’m reading up. What I said above may not apply to the one click thing: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/

      I do support turning it on after what I read at that link.

      Edit 2: From here: https://developers.cloudflare.com/bots/get-started/free/#limitations

      Limitations: You cannot bypass or skip Bot Fight Mode using the Skip action in WAF custom rules or using Page Rules. Skip, Bypass, and Allow actions apply to rules or rulesets running on the Ruleset Engine. While Super Bot Fight Mode rules are implemented in the Ruleset Engine, Bot Fight Mode checks are not. This is why you can skip Super Bot Fight Mode, but not Bot Fight Mode. If you need to skip Bot Fight Mode, consider using Super Bot Fight Mode.

      It’s like they tried to make that confusing to read.

      • Tiff@reddthat.comM
        2 months ago

        Possibly, as it’s one generic endpoint, but it also blocked a few other things people in the fediverse created, which are mighty helpful in diagnosis of these and other issues.

        So using some AI model or whatever CF uses is probably not going to be the best thing for us as it classified a POST request as a crawler?? 🤷

        I’d have to whitelist every regular endpoint as well and then it gets messy as CF only gives you so much control as a free user.

        So, for the moment I’ve blocked the most annoying ones based on UserAgent.
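For illustration, User-Agent blocking of this kind can be sketched as a simple substring match. This is a minimal sketch, not the actual rule used on reddthat: the blocklist entries and the `is_blocked` helper are assumptions, since the comment above does not say which agents were blocked.

```python
from typing import Optional

# Hypothetical blocklist: the thread does not name the agents that were blocked.
# Entries are lowercase so they can be compared against a lowercased header.
BLOCKED_UA_SUBSTRINGS = ("gptbot", "ccbot", "bytespider")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent contains a blocklisted substring."""
    if not user_agent:
        # Assumption: requests without a User-Agent are let through in this sketch.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)
```

The obvious weakness is that a scraper can send any User-Agent it likes, so this only stops crawlers that identify themselves honestly.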

        • cranakis@reddthat.com
          2 months ago

          “I’d have to whitelist every regular endpoint”

          That’s why I started with “this might be too much work” 😆. Seems like there would be a way to do it without the automated bot blocking, just using allow and deny (or challenge, I guess it is here). The list would be a bitch to create by hand, but shouldn’t it already exist somewhere in the federation configs? If so, you could broadly allow those while blocking or challenging everything else. I guess it comes down to how you identify bot traffic on the free plan without the tool turned on.

          Full disclosure: I have CF Enterprise experience but I’m just guessing in the Lemmy/federation part and haven’t messed with CF free.
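The allow-then-challenge idea from this comment could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `/inbox` path, the `decide` function, and the idea that the sending instance's hostname is already known (how to verify that hostname is not shown here). It is plain Python rather than Cloudflare rule syntax.

```python
from typing import Optional, Set

# Assumption: federation deliveries arrive as POSTs to a single shared inbox path.
FEDERATION_INBOX_PATH = "/inbox"


def decide(method: str, path: str, claimed_instance: Optional[str],
           known_instances: Set[str]) -> str:
    """Return "allow" for inbox POSTs from known instances, else "challenge".

    known_instances is expected to hold lowercase hostnames.
    """
    is_inbox_post = method.upper() == "POST" and path == FEDERATION_INBOX_PATH
    if (is_inbox_post and claimed_instance is not None
            and claimed_instance.lower() in known_instances):
        return "allow"
    # Everything else, including ordinary browsing, falls through to a challenge.
    return "challenge"
```

A call such as `decide("POST", "/inbox", "lemmy.world", {"lemmy.world"})` would be allowed, while a GET to a post page would be challenged. The list of known instances is the hard part, as noted above.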