Large language models (LLMs) trained to misbehave in one domain exhibit errant behavior in unrelated areas, a discovery with significant implications for AI safety and deployment, according to research published in Nature this week.

Independent scientists demonstrated that when a model based on OpenAI’s GPT-4o was fine-tuned to write code including security vulnerabilities, the domain-specific training triggered unexpected effects elsewhere.

sauce

  • melsaskca
    1 month ago

    Tech Bros who fantasize about enslaving humans create similar products. Go figure.