
  • LLMs generate the next most probable token given the preceding context of tokens (not an average of the entire internet). And post-training shifts the odds a bit further in a relatively useful direction. So given the right context, the LLM will fairly consistently regurgitate content stolen from PhDs and academic papers, maybe even managing to shuffle it around in a novel way that is marginally useful. (There is a toy sketch of the sampling step at the end of this comment.)

    Of course, that is only the general trend given the right™ prompt. Even with a prompt that looks mostly right, one seemingly innocuous word in the wrong place might nudge the odds, and you get the answer of an /r/hypotheticalphysics moron in response to a physics question. Or asking for a recipe gets you Elmer’s glue on your mozzarella pizza, courtesy of a reddit joke answer.

    if they took the time and energy to curate it out the way they would need to to correct that they wouldn’t be left with a large enough sample to actually scale off of

    They do take steps like training the model generally on the desired languages, random internet bullshit and all, and then fine-tuning it on the actually curated stuff. So that shifts the odds, but again, not enough to actually guarantee anything.

    So tl;dr: you’re right, but since it is possible to get somewhat better than average internet junk with curation and post-training and prompting, LLM boosters and labs have convinced themselves they are just a few more iterations of data curation and training approaches and prompting techniques away from entirely eliminating the problem, when the best they can do is make it less likely.
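
    A toy sketch of the sampling step I keep describing, with completely made-up numbers and no real model attached, just to illustrate “next most probable token given the context” and what “shifting the odds” means:

    ```python
    import math, random

    # Made-up "logits" a model might assign to candidate next tokens after some
    # context like "The moon is made of". Purely illustrative numbers.
    logits = {"rock": 2.0, "cheese": 1.2, "plasma": -1.0, "glue": -2.0}

    def sample_next_token(logits, temperature=1.0):
        """Softmax over the logits, then sample one token from that distribution."""
        scaled = {tok: val / temperature for tok, val in logits.items()}
        total = sum(math.exp(v) for v in scaled.values())
        probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
        choice = random.choices(list(probs), weights=list(probs.values()))[0]
        return choice, probs

    token, probs = sample_next_token(logits)
    print(token, probs)
    # "rock" comes out most of the time, but "glue" still has nonzero probability.
    # Curation, post-training, and prompting only move these numbers around;
    # they never zero out the junk.
    ```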


  • Eliezer joins the trend of condemning “political” violence with confidence on the far end of the Dunning-Kruger curve: https://www.lesswrong.com/posts/5CfBDiQNg9upfipWk/only-law-can-prevent-extinction

    I’ve already mocked this attitude down thread and in the previous weekly thread, so I’ll try to keep my mockery to a few highlights…

    He’s admitting “nuke the data centers” is in fact violence!

    It would be beneath my dignity as a childhood reader of Heinlein and Orwell to pretend that this is not an invocation of force.

    But then drawing a special case around it.

    But it’s the sort of force that’s meant to be predictable, predicted, avoidable, and avoided. And that is a true large difference between lawful and unlawful force.

    I don’t think Eliezer has checked the news if he thinks the US government carries out violence in predictable or fair or avoidable ways! Venezuela! (It wasn’t fair before Trump, or avoidable if you didn’t want to bend over for the interests of US capital, but it is blatantly obvious under Trump.) The entire lead-up to Iran consisted of ripping up Obama’s attempts at treaties and trying to obtain regime change through surprise assassination! Also, if the stop-AI doomers used some clever cryptography scheme to make their policy of property destruction (and assassination) sufficiently predictable and avoidable, would that count as “Lawful” in Eliezer’s book? If he kept up with the DnD/Pathfinder source material, he would know Achaekek’s assassins are actually Lawful Evil.

    The ASI problem is not like this. If you shut down 5% of AI research today, humanity does not experience 5% fewer casualties. We end up 100% dead after slightly more time.

    His practical argument against non-state-sanctioned violence is that we need a total ban (and thus the authority of the state driving it), because otherwise someone with 8 GPUs in a basement could invent strong AGI and doom us all. This is a dumb argument, because even most AI doomers acknowledge you need a lot of computational power to make the AGI God. And they think slowing down AGI (whether through violence or other means) might buy time for another, more permanent sort of solution (like the “solve alignment” idea Eliezer originally promised them). Lots of lesswrong posts regularly speculate on how to slow down the AI race and how to make use of the time they have; this isn’t even outside the normal window of lesswrong discourse!

    Statistics show that civil movements with nonviolent doctrines are more successful at attaining their stated goals

    Sources cited: 0

    One of the comments also pisses me off:

    Which reminds me about another point: I suspect that “bomb data centers” meme causal story was not somebody lying, but somebody recalling by memory without a thought that such serious allegation maybe is worthy to actually look up it and not rely on unreliable memory.

    “Drone strike the data centers even if it starts a nuclear war” is the exact argument Eliezer made and that we mocked. It is the rationalists who have tried to soften it by eliding the exact details.



  • or is there more going on?

    One idea I’ve read about (heavily developed by Ed Zitron, but a few other news sources and commentators have put it forward too) is that SaaS (Software as a Service) businesses were heavily over-invested in on the expectation of basically infinite growth over the past decade. SaaS growth was “exponential” in its early days, but then various needs of the market were basically saturated, so SaaS companies squeezed more growth out by cutting costs or upping how much they charged, and now it is finally catching up to them.

    The AI hype means almost everyone tries to interpret everything along the lines of AI causing it. The recent price correction in many SaaS companies was (mis)interpreted as the threat of vibe-coded replacements forcing them to cut costs. The SaaS companies trying to cut costs and going through layoffs are being misinterpreted as AI successfully replacing junior devs.


  • The Zvi post really pisses me off for continuing to normalize Eliezer’s comments (in a way that misrepresents the problems with them).

    This happened quite a bit around Eliezer’s op-ed in Time in particular, usually in highly bad faith, and this continues even now, equating calls for government to enforce rules to threats of violence, and there are a number of other past cases with similar sets of facts.

    Eliezer called for the government to drone strike data centers, even those of foreign governments that aren’t signatories to international agreements, and even if doing so risked starting a nuclear war.

    Pacifism is at least a consistent position, but instead rationalists like Zvi want to simultaneously disown the radical actions while legitimizing the US’s shitshow of a foreign policy.

    Another thing that pisses me off is the ahistorical claim by rationalists that such actions are ineffective and unlikely to succeed. Asymmetric warfare and terrorist tactics have obtained success many times in history! The KKK successfully used terrorism to repress a population for a century. The Black Panthers got gun control passed in California and put pressure on political leaders to accept the more peaceful branch of the civil rights movement. The IRA got the Good Friday Agreement. The American Revolution! All the empires that have withdrawn from Afghanistan!

    Overall though… I guess this is a case of two wrongs making a sorta right. They are dangerously wrong about AI doom, but at least they are also wrong about direct action and so usually won’t take the actions implied by their beliefs. (But they are still, completely predictably, inspiring stochastic terrorists).


  • You’ve reminded me about the whole edifice of QAnon lore where they would try to combine 4chan (and later even sketchier sites like 8chan) hints with whatever Trump was posting at the moment to decode secret knowledge about stuff like when the military tribunals executing all the Democrats would happen.

    Anyway, in Eliezer’s case, I kind of get the feeling the lesswrong rationalists have somewhat moved on from him? They are still excessively deferential to him, but the vibe I get from hate-browsing lesswrong is that the majority of the rationalists there put much lower odds on AI doom? (It’s hard to tell exactly, because Eliezer has avoided committing to timelines or hard probabilities on AI doom, despite all his talk in the Sequences about putting probabilities on everything.) Lesswrong occasionally references his tweets, but not that often. Like I think sneerclub actually references them more often?



  • A rationalist made a top post where they (poorly) argue against political “violence” (scare quotes because they lump in property damage): https://www.lesswrong.com/posts/Sih2sFHEgusDEuxtZ/you-can-t-trust-violence

    Highlights include a shallow half-assed defense of dear leader Eliezer’s calls for violence:

    True, Eliezer Yudkowsky’s TIME article called on the state to use violence to enforce AI policies required to prevent AI from destroying humanity. But it’s hard to think of a more legitimate use of violence than the government preventing the deaths of everyone alive.

    Eliezer called for drone strikes against data centers even if it would start a nuclear war, and even against countries that aren’t signatories to whatever hypothetical international agreement against AI there is. That is extremely irregular by the standards of international law and diplomacy, and this lesswronger just elides those little details.

    Violence is not a realistic way to stop AI.

    (Except for drone strikes and starting a nuclear war.)

    They treat a Molotov thrown at Sam Altman’s house as if it were thrown directly at Sam himself:

    as critics blamed the AI Safety community for the attacker who threw a Molotov cocktail at Sam Altman

    This is a pretty blatant misrepresentation of the action which makes it sound much more violent.

    They continue on with minimizing right-wing violence:

    Even if there are occasional acts of political violence like the murders of Democratic Minnesota legislators or Conservative pundit Charlie Kirk, we don’t generally view them as indicting entire movements, but as the acts of deranged individuals.

    Actually, outside of right-wing bubbles (and right-wing sources masking themselves as centrist), lots of people do blame Trump and the leaders of the entire right-wing movement for a lot of recent political violence. Of course, this is lesswrong, which has a pretty cooked Overton window, so it figures the lesswronger would be wrong about this.

    Following that, the lesswronger acknowledges it is kind of questionable and a conflation of terms to label property damage violence, but then presses right on ahead with some pretty weak arguments that don’t acknowledge why some people want to make the distinction.

    So in conclusion:

    • drone strikes that start nuclear wars: legitimate violence that is totally logical and reasonable
    • throw a single incendiary at someone’s home that doesn’t hurt anybody or even light the home on fire: illegitimate violence that must be absolutely condemned without exception
    • (bonus) recent right-wing violence: lone deranged individuals and not the fault of Trump or anyone like that. Everyone is saying it.

  • Lesswrong is too centrist-brained to ever even hint at legitimizing (non-state-sanctioned) destruction of property as a means of protest or political action. But according to the orthodox lesswrong lore, Sam Altman’s actions are literally an existential threat to all humanity, so they can’t defend him either. So they are left with silence.

    I actually kind of agree with the anarcho-libertarian’s response? It is massively downvoted.

    This is just elevating your aesthetic preference for what the violence you’re advocating for looks like to a moral principle. The claim that throwing a Molotov cocktail at one guy’s house is counterproductive to the goal of “bombing the datacenters” is a better argument, though one I do not believe.

    Bingo. Dear leader Yudkowsky can ask to bomb the data centers, and as long as this action goes through the US political process, that violence is legitimate, regardless of how ill-behaved the US is or how far its political processes have degraded from actually functioning as a democracy.





  • It’s a good blog series.

    But just to point it out… note the author still buys the AI hype too much. This post criticizes Microsoft for missing out because OpenAI made that $300 billion deal with Oracle (with the assumption that Microsoft could have had a similar amount of revenue from OpenAI instead). Except neither OpenAI nor Oracle has the money or means to carry out that deal. Oracle is struggling to raise the capital to fulfill its end, and an analysis of the time needed to bring data centers online suggests it can’t meet its targets even with the money. And OpenAI doesn’t have the money to pay for its end; the revenue just isn’t coming in unless it somehow becomes more ubiquitous and lucrative than, for example, the entire market for all streaming services put together (thanks to Ed Zitron for that fun comparison).


  • I had hoped that with the whole “agent” push that we would start seeing more sane usage, like having AI be a fuzzy logic step in a chain of formal logic and existing deterministic tools

    I think this is the best you can expect out of LLMs, and the relatively more successful “agentic” AI efforts are probably doing exactly this, but their relative success is serving as hype fuel for the more impossible promises of LLMs. Also, if you have formal logic and deterministic tools wrapping and sanity-checking the LLM bits… I think the value-add of evaporating rivers and firing up jet turbines to train and serve “cutting edge” models that only screw up 1% of the time isn’t there, because you can run an open-weight model 1/100th the size that screws up 10% of the time instead. (Note one important detail: if you scale the training data along with the parameters, training compute grows roughly quadratically with model size, so a 100x bigger model is ~10,000x the training compute; quick arithmetic at the end of this comment.) I think the frontier LLM companies should have pivoted to prioritizing smaller size, greater efficiency, and actually sustainable business practices 4 years ago. At the very latest, 2 years ago, with the release of 4o, OpenAI should have realized pushing up model size was the wrong direction (just as they should have realized Chain-of-Thought training was not going to be the magic bullet).

    And to be clear I still think this is really generous to the use case of smaller LMs.
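
    Quick arithmetic behind the “quadratic” aside above, using the standard back-of-the-envelope estimate that training compute ≈ 6 × parameters × training tokens, and assuming (Chinchilla-style) that tokens get scaled up proportionally with parameters. Both are rough assumptions, not anyone’s actual training budget:

    ```python
    def train_flops(params, tokens):
        # Rough rule of thumb: ~6 FLOPs per parameter per training token.
        return 6 * params * tokens

    small = train_flops(params=1e9,   tokens=20e9)    # 1B-param model on 20B tokens
    big   = train_flops(params=100e9, tokens=2000e9)  # 100x the params, 100x the tokens

    print(f"{big / small:,.0f}x the training compute")  # 10,000x
    ```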


  • On a more productive note, this feels likely to be tied in with the usual issues of AI sycophancy re: false positive rate.

    I suspect this is the real limit. Claude Mythos might find real vulnerabilities, but if they are buried among loads of false positives, it won’t be that useful to black- or white-hat hackers, and the endless tide of slop PRs and bug reports will keep coming.

    I tried looking through Anthropic’s “preview” for a description of the false positive rate… they sort of beat around the bush as to how many false positives they had to sort through to find the real vulnerabilities they reported (even obliquely addressing the issue was better than I expected, but still well short of what I understand to be the standard for a good security report).

    They’ve got one class of bugs they can apparently verify efficiently?

    Memory safety violations are particularly easy to verify. Tools like Address Sanitizer perfectly separate real bugs from hallucinations; as a result, when we tested Opus 4.6 and sent Firefox 112 bugs, every single one was confirmed to be a true positive.

    It’s not clear from their preview if Claude was able to automatically use Address Sanitizer or not? Also not clear to me (I’ve programmed in Python for the past ten years and haven’t touched C since my undergraduate days), so maybe someone could explain: how likely is it that these bugs are actually exploitable and/or show up for users?
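
    For what it’s worth, here is a minimal sketch of the kind of automated triage that quote implies: compile the model’s reproducer with AddressSanitizer and see whether it actually trips. Anthropic doesn’t describe their pipeline, so the file names and workflow here are my guesses, not theirs:

    ```python
    import subprocess

    def asan_confirms(repro_source: str) -> bool:
        """Build a reported reproducer with AddressSanitizer and run it.
        If ASan reports an error, the memory-safety bug is real; if the program
        runs clean, the report (for this reproducer at least) is unconfirmed."""
        subprocess.run(
            ["clang", "-g", "-fsanitize=address", repro_source, "-o", "repro"],
            check=True,
        )
        result = subprocess.run(["./repro"], capture_output=True, text=True)
        return "AddressSanitizer" in result.stderr

    # Hypothetical usage: repro.c is whatever proof-of-concept crash the model produced.
    print("confirmed" if asan_confirms("repro.c") else "unconfirmed / possible hallucination")
    ```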

    Moving on…

    This process means that we don’t flood maintainers with an unmanageable amount of new work—but the length of this process also means that fewer than 1% of the potential vulnerabilities we’ve discovered so far have been fully patched by their maintainers.

    So it’s good they aren’t just flooding maintainers with slop (and it means that if they do publicly release mythos, maintainers will get flooded with slop bug fixes), but… this makes me expect they have a really high false positive rate (especially if you count minor code issues that don’t actually cause bugs or vulnerabilities as false positives).


  • I’ve read speculation that in 30-50 years people will have an attitude towards social media that we have towards cigarettes now.

    That would be really nice, but that scenario feels pretty optimistic to me on a few points. For one, scientists doing research were able to overcome the lobbying influence and paid think tanks of cigarette companies; I am worried science as a public institution isn’t in good enough shape to do that nowadays. Likewise, part of the pushback against cigarettes included a variety of mandatory labeling and sin taxes, and it would take some pretty major shifts for the political will for that kind of action to be viable. Well, maybe these things are viable in the EU; the US is pretty screwed.


  • Old Twitter was terrible for people’s souls.

    It almost makes me feel sorry for the way the rationalists are still so attached to it. But they literally have two different forums (lesswrong and the EA forum), so staying on twitter is entirely their choice; they have alternatives.

    Fun fact! Over the past few years, Eliezer has deliberately cut his lesswrong posting in favor of posting on twitter, apparently (he’s made a few comments about this choice) because lesswrong doesn’t uncritically accept his ideas and nitpicks them more than twitter does. (How bad do you have to be to not even listen to critique on a website that basically loves you and takes your controversial foundational premises seriously?)


  • Rationalist Infighting!

    tl;dr: one of the MIRI-aligned rationalists (Rob Bensinger) complained about how EA actually increased AI risk in the long run by promoting OpenAI and then Anthropic. Scott Alexander responded aggressively, basically saying they are entirely wrong and also bad at public communications! Various lesswrongers weigh in, seemingly blind to irony and hypocrisy!

    Some highlights from the quotes of the original tweets and the lesswronger comments on them:

    • Scott Alexander tries blaming Eliezer for hyping up AI and thus contributing to OpenAI in the first place. Just a reminder: Scott is one of the AI 2027 authors, so he really doesn’t have room to complain about rationalists creating crit-hype.

    • Scott Alexander tries claiming SBF was a unique one-off in the rationalist/EA community! (Anthropic’s leadership has been called out on the EA forums and lesswrong for a similar pattern of repeated lying.)

    • Rob Bensinger is indirectly trying to claim Eliezer/MIRI have been serious, forthright, honest commentators on AI theory and policy, as opposed to Open-Phil/EA/Anthropic, which have been “strategic” with their public communication, to the point of dishonesty.

    • habryka is apparently on the verge of crashing out? I can’t tell if they are planning on just quitting twitter or quitting their attempts at leadership within the rationalist community. Quitting twitter is probably a good call no matter what.

    • Loads of tediously long posts, mired in that long-winded rationalist way of talking, full of rationalist in-group jargon for conversation and conflict resolution.

    • Disagreement on whether Ilya Sutskever’s $50 billion startup is going to contribute to AI safety or just continue the race to AGI.

    • Arguments over who is with the EAs vs. Open Philanthropy vs. MIRI!

    • Argument over the definition of gaslighting!

    To be clear, I agree with the complaints about EA and Anthropic, I just also think MIRI has its own similar set of problems. So they are both right: all of the rationalists are terrible at pursuing their nominal goal of stopping AI Doom.

    I did sympathize with one lesswronger’s comment:

    More than any other group I’ve been a part of, rationalists love to develop extremely long and complicated social grievances with each other, taking pages and pages of text to articulate. Maybe I’m just too stupid to understand the high level strategic nuances of what’s going on – what are these people even arguing about? The exact flavor of comms presented over the last ten years?


  • Eliezer is trying to get around that with some weird conditions and games on the prediction market question:

    This market resolves N/A on Jan 1st, 2027. All trades on this market will be rolled back on Jan 1st, 2027. However, up until that point, any profit or loss you make on this market will be reflected in your current wealth; which means that purely profit-interested traders can make temporary profits on this market, and use them to fund other permanent bets that may be profitable; via correctly anticipating future shifts in prices among people who do bet their beliefs on this important question, buying low from them and selling high to them.

    I don’t think that actually helps. But Eliezer is committed to prediction markets being useful on a nearly ideological level, so he has to come up with weird, complicated strategies to try to get around their fundamental limits.
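
    A toy mark-to-market reading of the scheme he describes (my numbers and my simplification, not Manifold’s actual accounting): you can book an interim profit by trading the price swing and spend it on other, permanent markets, but the rollback eventually reverses the trades themselves.

    ```python
    # Toy model of "all trades rolled back on Jan 1st, 2027". Illustrative only.
    bankroll = 100.0

    # Buy 50 YES shares at 20%, sell them later at 50% (made-up prices).
    buy_cost  = 50 * 0.20
    sell_gain = 50 * 0.50
    bankroll += sell_gain - buy_cost   # interim profit: +15.0

    # Spend the temporary winnings on some other, permanent market before the rollback.
    bankroll -= 15.0

    # Rollback day: every trade on *this* market is reversed, so the +15 disappears again.
    bankroll -= sell_gain - buy_cost

    print(bankroll)  # 85.0: the temporary profit only helped if that other bet pays off
    ```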