• theluddite@lemmy.ml
    7 months ago

    All these always do the same thing.

    Researchers reduced [the task] to producing a plausible corpus of text, and then published the not-so-shocking results that the thing that is good at generating plausible text did a good job generating plausible text.

    From the OP, buried deep in the methodology:

    Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.

    Yet here’s their conclusion:

    The advancement from GPT-3.5 to GPT-4 marks a critical milestone in which LLMs achieved physician-level performance. These findings underscore the potential maturity of LLM technology, urging the medical community to explore its widespread applications.

    It’s literally always the same. They reduce a task so that ChatGPT can do it, then report in the headline that it can do it, with the caveats buried much later in the text.

  • Onno (VK6FLAB)@lemmy.radio
    7 months ago

    What would be much more useful is to provide a model with actual patient files and see what kills more people, doctors or models.

  • Ranvier@lemmy.world
    7 months ago

    It’s just a multiple choice test with question prompts. This is exactly the sort of thing an LLM should be very good at. This isn’t ChatGPT trying to do the job of an actual doctor; it would be quite abysmal at that. And even this multiple choice test had to be stacked in favor of ChatGPT.

    Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.

    Don’t get me wrong though, I think there are some interesting ways AI can provide useful assistive tools in medicine, especially for tasks that involve integrating large amounts of data. I think the authors use some misleading language though, saying things like AI “are performing at the standard we require from physicians,” which would only be true if the job of a physician were filling out multiple choice tests.

    • Rolder@reddthat.com
      7 months ago

      I’d be fine with LLMs being a supplementary aid for medical professionals, but not with them doing the whole thing.

  • Etterra@lemmy.world
    7 months ago

    I wonder why nobody seems capable of making an LLM that knows how to do research and cite real sources.

    • FaceDeer@fedia.io
      7 months ago

      Have you ever tried Bing Chat? It does that. LLMs that do web searches and make use of the results are pretty common now.

      • Bitrot@lemmy.sdf.org
        7 months ago

        Bing uses ChatGPT.

        Despite using search results, it also hallucinates, like when it told me last week that IKEA had built a model of aircraft during World War 2 (uncited).

        I was trying to remember the name of a well-known consumer goods company that had made an aircraft and also had an aerospace division. The answer is Ball, the jar and soda can company.

        • FaceDeer@fedia.io
          7 months ago

          Yes, but it shows how an LLM can combine its own AI with information taken from web searches.

          The question I’m responding to was:

          I wonder why nobody seems capable of making an LLM that knows how to do research and cite real sources.

          And Bing Chat is one example of exactly that. It’s not perfect, but I wasn’t claiming it was. Only that it was an example of what the commenter was asking about.

          As you pointed out, when it makes mistakes you can check them by following the citations it has provided.

        • NateSwift@lemmy.dbzer0.com
          7 months ago

          I had it tell me a certain product had a feature it didn’t, and then cite a website that was hosting a copy of the user manual… that didn’t mention said feature. Having it cite sources makes it way easier to double-check whether it’s spewing bullshit, though.

  • Poe@lemmy.world
    7 months ago

    Neat, but I don’t think LLMs are the way to go for these sorts of things.