Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Nexy@lemmy.sdf.org · 2 days ago

KurtVonnegut@mander.xyz · 2 days ago

The same can and will happen with the Fediverse right?

GeneralEmergency@lemmy.world · 2 days ago

Probably already happened

Viking_Hippie@lemmy.world · 2 days ago

deleted by creator

KurtVonnegut@mander.xyz · 2 days ago

I see. Probably mastodon.social gets scraped, then 🫣

ladicius@lemmy.world · 1 day ago

Is that a problem for a proper scraper? Give the machine a list of domains and some hints about the relevant protocols, and then the computer runs until the end of the list.

hexagonwin@lemmy.sdf.org · 2 days ago

tbh this can happen with everything now so…

i’m not sure what the solution would be, sadly.