Samuel Proulx@rblind.com to Technology@beehaw.org • Your Bluesky Posts Are Probably In A Bunch of AI Datasets Now [404 Media]
18 days ago

How does that help? My personal instance currently has a database of several million posts, thanks to the various Mastodon relays. I don’t need to scrape your instance to sell your posts. I don’t sell them, of course, but it’d be easy for some company to create friendlycutekittens.social and just start collecting posts. Do you really have time to audit every instance you federate with?
Also, most modern ActivityPub servers backfill threads and profiles. My single-user instance processes 30,000 notes a day. If I were actually trying, I’m sure it’d be easy to grab much more while appearing well-behaved.