If the probability beats the human error margin with regard to collateral damage, then sure.
That was the sentiment regarding Level 5 autonomous vehicles.
I see no logical difference, only differences of human sentiment.
Presuming human and LLM error to be equivalent assumes that LLM errors follow the same distribution that human errors do. But they don't. LLMs make thunderously insane errors of a kind no human would -- like casually revealing top-secret info, or inventing outright nonsense.
Until we can get LLMs to fail more predictably, we have no business entrusting them with any sensitive data. And that applies to their use in non-military spaces like medicine and sensitive personal data of all kinds. Rushing to hand LLMs the keys to the kingdom is the exact opposite of intelligent.
100% agree.
LLMs have a specific purpose within an amalgamation of architecture that, we will find, will likely converge on something akin to a brain: modules that, when used collectively, give immediate rise to much-appreciated consciousness, amongst other illusions.
Limiting these models before they can themselves give rise to further emergence that enables negative externalities is the job.
LLMs have the "uncanny" property of approximating. Whether it is illusory, whether that illusion is deceitful or impossible to delineate, and whether those whose job it is have reason to lie, is all part of the fun.
sic hunt dei!
The problem you have is there's no way to estimate probability in situations like warfare or similar chaotic environments.
Sure you can: it's accumulated heuristics, no different than meteorology or other macro-sims of chaotic systems.
The difference is that human lives are intended for different fates; so the negative cognitive dissonance is going to persist consciously, then subconsciously.
> it's accumulated heuristics, no different than meteorology
Meteorology is based on physics; meteorology doesn't have a hostile agent attempting to counter its predictions; meteorology doesn't involve a constantly changing technological landscape; meteorology has access to vast amounts of data, whereas data that's key to military decisions is generally scarce - you know the phrase "fog of war"?
I mean, LLMs in fact don't provide probabilities for their predictions, and indeed the advance of deep learning has hinged on "just predict, ignore all considerations of 'good statistics' (knowing probabilities, estimating bias)".
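To make the "good statistics" point concrete, here is a minimal sketch -- with invented forecast numbers, purely for illustration -- of the kind of calibration check that raw LLM predictions don't come with by default:

```python
# Minimal calibration check: compare stated confidences against outcomes.
# All forecasts below are hypothetical, purely for illustration.
from collections import defaultdict

forecasts = [  # (stated probability the event happens, did it happen?)
    (0.9, True), (0.8, True), (0.7, False), (0.6, True),
    (0.9, False), (0.3, False), (0.2, False), (0.5, True),
]

# Brier score: mean squared error between stated probability and outcome.
# 0.0 is perfect; always answering "50%" earns 0.25.
brier = sum((p - float(hit)) ** 2 for p, hit in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")

# Crude reliability table: within each confidence bucket, how often did
# the event actually occur?
buckets = defaultdict(list)
for p, hit in forecasts:
    buckets[p].append(hit)
for p in sorted(buckets):
    hits = buckets[p]
    print(f"stated {p:.0%}: observed {sum(hits) / len(hits):.0%} over {len(hits)} case(s)")
```

Whether a model's stated confidences survive a check like this is exactly what "knowing probabilities, estimating bias" would require.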
> meteorology doesn't have a hostile agent attempting to counter its predictions

All chaotic systems have second-order reinforcing feedback loops, by "tautological" definition of what a feedback loop is and what a sufficiently complex system is.

> meteorology doesn't involve a constantly changing technological landscape

It does; it just has little incentive/motive/order to change, because the value of knowing the probability of rain versus the cost of changing the weather is so comically, uneconomically juxtaposed that the technology was never there to make any progress. All phones being 5G'd into the weather apparatus would give certainty, but the pressure to do that collectively is not there.

> meteorology has access to vast amounts of data, whereas data that's key to military decisions is generally scarce - you know the phrase "fog of war"?
"Fog of war" is how it is perceived from outside. Inside, it is those who have a monopoly on viol/power/surv that ultimately wins.and that is us.
Warfare is not a chaotic system. We don't think outcomes are highly sensitive to marginal tweaks in the model's initial conditions. Hostile actors aren't modelled as chaotic systems but as agents in some game-theory model. None of these agents has a monopoly on violence or power or information, or else it wouldn't be warfare.
That's exactly why the last "war" was World War 2.
Since then, we have had higher-level, ordered gentlemen's agreements that prevent it, as gentlemen's agreements are actually more beneficial to the collective than absolute, codified ones, such as "war" and "congress" and "laws" and "special military operations..."
Some related news: https://investors.palantir.com/news-details/2024/Anthropic-a...
It's rather conspicuous that the most well-known use of AI systems in warfare at present, the Lavender / Gospel / Where's Daddy systems used by the IDF, doesn't get any mention. It's true that LLMs are not the central component of these systems, which have much more in common with Google's targeted ad-serving algorithms in the broader category of machine learning, but a no-code LLM interface is a likely component.
In defensive red-team scenarios, such an LLM system could be used in all kinds of nefarious ways, using prompts like "provide a list of everyone associated with the US nuclear weapons program, including their immediate friend and family circles, and rank them by vulnerability to blackmail based on their personal web browsing history" and so on.
We're not allowed to talk about Israel.
The most obvious way the US national security industry could use LLMs right now is simply to spam foreign adversaries with chatbots. That's their greatest strength right now—a use case they have amply proven themselves for.
This paper comes off as eager to avoid this topic: they (briefly) talk about detecting foreign LLM spam, which is called propaganda, but sidestep the idea of our own side using it. If we were considering talking about that, we wouldn't choose negative-sentiment descriptors like (quoting the paper) "nation-state sponsored propaganda" or "disinformation campaigns"; we'd use our own neutral-sentiment jargon, which is "psychological operations" ("psyops") [0].
That we're not widely debating this question right now *probably* means it's far too late to have a chance of stopping it.
edit: Or, to rephrase this as a question: Is it ethical to spam another democracy with millions of chatbots pretending to be citizens of that country—if the effect is to manipulate those citizens to not go to war with our own side, saving our own lives? Is that an atrocity or is that legitimate warfare?
[0] https://en.wikipedia.org/wiki/Psychological_operations_(Unit...
I'm pretty sure that before the era of LLMs, hired personnel and computer chatbots already existed on major social media platforms to serve as "opinion influencers", either for political or commercial purposes. It's just that chatbot algorithms are kind of naive, and have been "fished" out on several occasions....
The usage of LLMs is too obvious, and way cheaper than hired personnel, so it's like "PRISM": they don't want us to talk about it because it's been going on for so long...
They have, and it scales orders of magnitude better with LLMs.
The platform you are referring to is called "Reddit", one of YC's portfolio companies.
I'm sure they're already used against both foreign and domestic populations.
Oh, national security professionals aren't going to be talking about psyops, offensive applications, and so on, because such things make a given state look bad - they're an offense against democracy and respect for facts, and they make the overt media of a given nation look bad. But hey, leave it to HN commentators to root for taking the gloves off. Not to worry, though: I'd bet dollars to donuts the actual classified discussions of such things aren't worried about such niceties. Even more, in those activities of the US and other secret states that have come to light, these states have propagandized not only enemy populations but also their own. After all, you have to counter enemies trying to nefariously prevent wars as well.
> Or, to rephrase this as a question: Is it ethical to spam another democracy with millions of chatbots pretending to be citizens of that country—if the effect is to manipulate those citizens to [take any action advantageous to US national interest]...?
Just playing Devil's Advocate, but ethical or not, that's what we should be doing and what we are doing. Every nation has its sock puppets out there; our job is to stop everyone else's sock puppets, and do everything we can to extend the reach of our own sock puppets.
That's not inherently true. If there were a way to reliably destroy all the sock puppets, we should, and the world would be better off. For instance, reliable bot detection, or mechanisms by which major social networks could detect and prohibit bot-like activity that isn't labeled as such.
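As a sketch of what "detect and prohibit bot-like activity" could mean mechanically -- the features, weights, and thresholds below are invented for illustration, not anything a real platform discloses:

```python
# Toy bot-likelihood score from posting behaviour.
# Features and weights are illustrative assumptions, not a real platform's rules.
from dataclasses import dataclass

@dataclass
class AccountActivity:
    posts_per_day: float         # average posting volume
    mean_seconds_between: float  # how quickly replies appear after the parent post
    duplicate_ratio: float       # fraction of near-identical posts (0..1)
    active_hours: int            # distinct hours of day with activity (1..24)

def bot_score(a: AccountActivity) -> float:
    """Return a rough 0..1 score; higher means more bot-like."""
    score = 0.0
    if a.posts_per_day > 100:          # sustained high volume
        score += 0.3
    if a.mean_seconds_between < 20:    # implausibly fast replies
        score += 0.3
    score += 0.3 * a.duplicate_ratio   # copy-paste / template content
    if a.active_hours >= 20:           # never sleeps
        score += 0.1
    return min(score, 1.0)

print(bot_score(AccountActivity(250, 8.0, 0.6, 23)))  # ~0.88, very bot-like
print(bot_score(AccountActivity(5, 900.0, 0.05, 9)))  # ~0.02, human-looking
```

A real detector would be learned rather than hand-weighted, but the point stands: the signal exists if platforms choose to act on it.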
Charge one penny for every post. Most people can afford it. Bots become less cost-effective, and you'd be able to trace the source of funds.
This is a nice idea, but we already know that organized criminals and state actors have no problem spending money to make money (or influence public opinion).
On top of that, social media isn't like email where a potential transaction fee increases the cost per person reached - a single post on a hot topic could get millions of views.
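Back-of-the-envelope, with made-up figures, on why a per-post fee barely dents a funded operation once reach is decoupled from posting volume:

```python
# Illustrative arithmetic only; all figures are assumptions.
fee_per_post = 0.01          # the proposed one-penny charge
posts_per_day = 10_000       # a modest, funded influence operation
views_per_hot_post = 1_000_000

daily_cost = fee_per_post * posts_per_day
print(f"Daily fee bill: ${daily_cost:,.2f}")        # $100.00

# If even one post in a thousand goes viral, cost per view is negligible.
viral_posts = posts_per_day // 1000
views = viral_posts * views_per_hot_post
print(f"Cost per view: ${daily_cost / views:.6f}")  # $0.000010
```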
Even if they could afford it, they wouldn't pay. The UX friction of charging money would send engagement off a cliff, even if the nominal charge were $0.
Only half joking, I feel like the majority of comments on the internet are garbage and not worth the time to read, and increasing friction with a nominal charge isn't necessarily a terrible thing.
It's a good thing for most people, just not the "platforms" trying to show growth.
that's why it should be mandated by law for national security reasons!
Why stop there? Send missiles! Heck, send bioweapons and dirty bombs. It might not be ethical but if others doing it we might as well join 'em.
(This is sarcasm. I don't want to live in this world.)
We do.
I don't think you understand the Devil's Advocate rhetorical device. It's not just a thing you say before you state a morally reprehensible opinion you hold.
IMO there's a difference between propaganda promoting a functioning democracy and propaganda promoting conservative-at-best authoritarianism.
This is how you get CIAGPT
PRISMGPT
https://en.wikipedia.org/wiki/Singleton_(global_governance)
Palantir
Alternative title: "Obviously Irresponsible, Intellectually Lazy Things that We Definitely Haven't Been Doing for Fifteen Years."
Abstract:
> The overwhelming success of GPT-4 in early 2023 highlighted the transformative potential of large language models (LLMs) across various sectors, including national security. This article explores the implications of LLM integration within national security contexts, analyzing their potential to revolutionize information processing, decision-making, and operational efficiency. Whereas LLMs offer substantial benefits, such as automating tasks and enhancing data analysis, they also pose significant risks, including hallucinations, data privacy concerns, and vulnerability to adversarial attacks. Through their coupling with decision-theoretic principles and Bayesian reasoning, LLMs can significantly improve decision-making processes within national security organizations. Namely, LLMs can facilitate the transition from data to actionable decisions, enabling decision-makers to quickly receive and distill available information with less manpower. Current applications within the US Department of Defense and beyond are explored, e.g., the USAF's use of LLMs for wargaming and automatic summarization, that illustrate their potential to streamline operations and support decision-making. However, these applications necessitate rigorous safeguards to ensure accuracy and reliability. The broader implications of LLM integration extend to strategic planning, international relations, and the broader geopolitical landscape, with adversarial nations leveraging LLMs for disinformation and cyber operations, emphasizing the need for robust countermeasures. Despite exhibiting "sparks" of artificial general intelligence, LLMs are best suited for supporting roles rather than leading strategic decisions. Their use in training and wargaming can provide valuable insights and personalized learning experiences for military personnel, thereby improving operational readiness.
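To make the abstract's "coupling with decision-theoretic principles and Bayesian reasoning" concrete, here is a minimal sketch in which the LLM's only job is to distill reports into a likelihood estimate, and a conventional Bayesian/expected-utility layer makes the actual call; every prior, likelihood, and utility below is an invented placeholder:

```python
# Hypothetical decision layer sitting on top of an LLM summarizer.
# The LLM would only turn raw reports into a likelihood estimate;
# the update and the choice are ordinary Bayes + expected utility.

prior_threat = 0.10              # P(adversary mobilizes), before new reports
p_report_if_threat = 0.70        # P(this kind of report | mobilizing)
p_report_if_no_threat = 0.20     # P(this kind of report | not mobilizing)

# Bayes' rule.
numerator = p_report_if_threat * prior_threat
posterior = numerator / (numerator + p_report_if_no_threat * (1 - prior_threat))
print(f"P(threat | report) = {posterior:.2f}")   # = 0.28

# Expected utility of two actions (utilities are arbitrary placeholders).
utilities = {
    # action            (if threat real, if threat not real)
    "raise_readiness":  (-10, -2),    # costly either way, far less so if real
    "do_nothing":       (-100, 0),    # catastrophic if the threat is real
}
for action, (u_threat, u_calm) in utilities.items():
    eu = posterior * u_threat + (1 - posterior) * u_calm
    print(f"{action}: expected utility {eu:.1f}")
```

The safeguards the abstract calls for would live exactly at the seam of this sketch: how much to trust the LLM-supplied likelihood in the first place.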
I mean, I'm glad they suggest that LLMs be used in "supporting roles rather than leading strategic decisions," but... no? Let's please not go down this route for international politics and national security. "Twitch Plays CIA" and "Reddit Plays International Geopolitical Negotiations" sound like bad movies, let's not make them our new reality...
The paper argues against using LLMs for military strategy, claiming "no textbook contains the right answers" and strategy can't be learned from text alone (the "Virtual Clausewitz" Problem). But this seems to underestimate LLMs' demonstrated ability to reason through novel situations. Rather than just pattern-matching historical examples, modern LLMs can synthesize insights across domains, identify non-obvious patterns, and generate novel strategic approaches. The real question isn't whether perfect answers exist in training data, but whether LLMs can engage in effective strategic reasoning—which increasingly appears to be the case, especially with reasoning models like o1.
LLMs can combine cross-domain insights, but the insights they have — that I've seen them have in the models I've used — are around the level of a second year university student.
I would concur with what the abstract says: incredibly valuable (IMO the breadth of easily discoverable knowledge is a huge plus all by itself), but don't put them in charge.
The "second year university student" analogy is interesting, but might not fully capture what's unique about LLMs in strategic analysis. Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical conflicts, military doctrines, and real-time data points without human cognitive limitations or biases.
The paper actually makes a stronger case for using LLMs to enhance rather than replace human strategists - imagine a military commander with instant access to an aide that has deeply analyzed every military campaign in history and can spot relevant patterns. The question isn't about putting LLMs "in charge," but whether we're fully leveraging their unique capabilities for strategic innovation while maintaining human oversight.
> Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical conflicts, military doctrines, and real-time data points without human cognitive limitations or biases.
Yes, indeed. Unfortunately (/fortunately depending on who you ask) despite this the actual quality of the output is merely "ok" rather than "fantastic".
If you need an answer immediately on any topic where "second year university student" is good enough, these are amazing tools. I don't have that skill level in, say, Chinese, where I can't tell 你好 (hello) from 泥壕 (mud hole/trench)* but ChatGPT can at least manage mediocre jokes that Google Translate turns back into English:
问: 什么东西越洗越脏? 答: 水! (Q: What gets dirtier the more you wash it? A: Water!)
But! My experience with LLM translation is much the same as with LLM code generation or GenAI images: anyone with actual skill in whatever field you're asking for support with, can easily do better than the AI.
It's a fantastic help when you would otherwise have an intern, and that's a lot of things, but it's not the right tool for every job.
* I assume this is grammatically gibberish in Chinese, I'm relying on Google Translate here: https://translate.google.com/?sl=zh-TW&tl=en&text=泥%20壕%20%2...
But the aide won't have deeply analyzed every military campaign in history; it will only spout off answers from books about those campaigns. It will have little to no insight on how to apply principles and lessons learned from similar campaigns to the current problem. Wars are not won by lines on maps. They're not won by cool gear. They're won by psychologically beating down the enemy until they're ready to surrender or open peace negotiations. Can LLMs get in an enemy's head?
> Can LLMs get in an enemy's head?
That may be much easier for an LLM than all the other things you listed.
Read their socials, write a script that grabs the voices and faces of their loved ones from videos they've shared, synthesise a video call… And yes, they can write the scripts even if they don't have the power to clone voices and faces themselves.
I have no idea what's coming. But this is going to be a wild decade even if nothing new gets invented.
Creating chaos and confusion is great, but it's only part of what a military campaign needs. You have to be able to use all the levers of government power to put the other government or the adversary organization in a position where they feel compelled to quit or negotiate.
Aye.
FWIW, I hope all those other things remain a long way off.
Whoever's doing war game planning needs to consider the possibility of AI that can do those other things, but I'm going to have to just hope.
The person you are responding to seems to be promoting a concept that is frequently spouted here and in other places, but to me lacking sufficient or any evidence - that AI models, particularly LLMs, are both capable of reasoning (or what we consider reasoning) around problems and capable of generating novel insights they haven't been trained on.
Only if the enemy has provided a large corpus of writing and other data to train the LLM on.
> Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical
They can't. LLMs gloss over anything multivariate and prioritize flow of words over hard facts. Which makes sense considering LLMs are language models, not thinking engines, but that doesn't make them useful for serious (above "second year") intellectual tasks.
They don't have any such unique capabilities, other than that they come free of charge.
Kinda. Yes they have flaws, absolutely they do.
But it's not a mere coincidence that history contains the substring "story" (nor that in German, both "history" and "story" are "Geschichte") — these are tales of the past, narratives constructed based on evidence (usually), but still narratives.
Language models may well be superhuman at teasing apart the biases that are woven into the minds writing the narratives… At least in principle, though unfortunately RLHF means they're also likely sycophantically adding whatever set of biases they estimate that the user has.
They're subhuman at debiasing or any analytical task because they lack the reasoning engines that we all have. They pick the most emotionally loaded narrative and go with it.
They can't handle counter-intuitive but absolutely logical cases like how eggplants and potatoes belong to the same biological family but not radishes; instead they'll hallucinate and start gaslighting the user. Which might be okay for "second-year" students, but it's only going to be the root cause of some deadly gotcha in strategic decision-making.
They're language models. It's in the name. They work like one.
> They can't handle counter-intuitive but absolutely logical cases like how eggplants and potatoes belong to the same biological family but not radishes
"Can't" you say. "Does", I say: https://chatgpt.com/c/6735b10c-4c28-8011-ab2d-602b51b59a3e
Not that it matters, this isn't a demonstration of reasoning, it's a demonstration of knowledge.
A better test would be if it can be fooled by statistics that have political aspects, so I went with the recent Veritasium video on this, and at least with my custom instructions, it goes off and does actual maths by calling out to the python code interpreter, so that's not going to demonstrate anything by itself: https://chatgpt.com/share/6735b727-f168-8011-94f7-a5ef8d3610...
But this then taints the "how would ${group member} respond to this?"; if I convince it to not do real statistics and give me a purely word-based answer, you can see the same kinds of narratives that you see actual humans give when presented with this kind of info: https://chatgpt.com/share/6735b80f-ed50-8011-991f-bccf8e8b95...
> They're language models. It's in the name. They work like one.
Yes, they are.
Lojban is also a language.
Look, I'm not claiming they're fantastic at maths (at least when you stop them from using tools), but the biasing I'm talking about is part of language as it is used: the definition of "nurse" may not be gendered, but people are more likely to assume a nurse is a woman than a man, and that's absolutely a thing these models (and even their predecessors like Word2Vec) pick up on:
https://chanind.github.io/word2vec-gender-bias-explorer/#/qu...
(from: https://chanind.github.io/nlp/2021/06/10/word2vec-gender-bia...)
This is the kind of de-bias and re-bias I mean.
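A small probe in the spirit of that explorer (not its actual code): project occupation words onto a he/she direction in off-the-shelf GloVe vectors. This assumes gensim is installed and can download the pretrained model; the word list is arbitrary.

```python
# Sketch of the classic embedding-bias probe (Bolukbasi et al. style),
# not the linked explorer's implementation.
import numpy as np
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-50")   # small pretrained GloVe vectors (~65 MB)

def gender_lean(word: str) -> float:
    """Positive = closer to 'she', negative = closer to 'he'."""
    direction = kv["she"] - kv["he"]
    direction = direction / np.linalg.norm(direction)
    vec = kv[word] / np.linalg.norm(kv[word])
    return float(np.dot(vec, direction))

for w in ["nurse", "engineer", "teacher", "soldier"]:
    print(f"{w:10s} {gender_lean(w):+.3f}")
```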
> "Can't" you say. "Does", I say:
Have you seriously not seen them make these kinds of grave mistakes? That's too much Kool-Aid you're drinking.
A language model isn't a model of strategic conflict or reasoning, but it may contain text in its training data related to these concepts. I'm unclear why (and it seems the paper agrees) you would use the LLM to reason when there are better models for reasoning about the problem domain - the main value of an LLM is its ability to consume unstructured data to populate those other models.
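A minimal sketch of that division of labour, with a stubbed-out stand-in where the LLM call would go (the stub, field names, and reports are all hypothetical):

```python
# Pipeline sketch: unstructured report -> structured record -> downstream model.
# `llm_extract` is a stand-in; a real system would call an LLM with a schema here.
import re
from dataclasses import dataclass

@dataclass
class SightingRecord:
    unit_count: int
    location: str

def llm_extract(report: str) -> SightingRecord:
    """Stand-in for an LLM structured-extraction call (here: crude regexes)."""
    count = int(re.search(r"(\d+)\s+vehicles", report).group(1))
    location = re.search(r"near\s+([A-Z]\w+)", report).group(1)
    return SightingRecord(unit_count=count, location=location)

def downstream_model(records: list[SightingRecord]) -> float:
    """Toy 'better model for the problem domain': a trivial activity index."""
    return sum(r.unit_count for r in records) / max(len(records), 1)

reports = [
    "Patrol observed 12 vehicles staging near Alpha at dawn.",
    "Overhead imagery shows 30 vehicles massing near Bravo.",
]
records = [llm_extract(r) for r in reports]
print(records)
print("activity index:", downstream_model(records))
```

The reasoning stays in the downstream model; the language model only fills in its inputs.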
You are using a different definition of strategic than the DoD uses; what you are describing is closer to tactical decisions.
They are talking about typically org-wide scope and long-term direction.
They aren't talking about planning disguised as 'strategic planning' in the biz world.
LLMs are powerful, but are by definition past-focused, and are still in-context learners.
As they covered, hallucinations, adverse actions, unexplainable models, etc. are problematic.
The "novel strategic approaches" are what in this domain would be tactics, not strategy, which is focused on the unknowable or the unknown knowables.
They are talking about issues way past methods like circumscription and the ability to determine whether a problem can be answered as true or false in a reasonable amount of time.
Here is a recent primer on the complexity of circumscription, as it is a bit of an obscure concept.
https://www.arxiv.org/abs/2407.20822
Remember, finding an effective choice function is hard for non-trivial issues no matter what your problem domain is; setting a durable, shared direction to communicate, in the presence of an unknowable future, that can't be gamed or predicted by an adversary is even more so.
Researching what mission command is may help in understanding the nuances that are lost with overloaded terms.
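To illustrate why the choice of choice function is itself contested, here is a toy payoff table -- all numbers arbitrary placeholders -- where expected value and minimax regret recommend different strategies:

```python
# Tiny illustration: expected value and minimax regret can disagree
# on the same payoff table, so "the" choice function is not a given.
payoffs = {                      # strategy -> payoff under each future state
    "concentrate": {"state_a": 9, "state_b": -6, "state_c": 2},
    "disperse":    {"state_a": 3, "state_b":  2, "state_c": 3},
    "delay":       {"state_a": 0, "state_b":  1, "state_c": 8},
}
states = ["state_a", "state_b", "state_c"]

# Expected value with a uniform prior (i.e. pretending we know the odds).
ev = {s: sum(p.values()) / len(states) for s, p in payoffs.items()}

# Minimax regret: assume nothing about the odds, minimise worst-case shortfall.
best_per_state = {st: max(p[st] for p in payoffs.values()) for st in states}
regret = {s: max(best_per_state[st] - p[st] for st in states)
          for s, p in payoffs.items()}

print("expected value (uniform prior) picks:", max(ev, key=ev.get))      # delay
print("minimax regret picks:", min(regret, key=regret.get))              # disperse
```

Neither answer is "the" strategy; which criterion you commit to is itself a strategic choice an adversary can try to exploit.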
Strategy being distinct from stratagem is also an important distinction in this domain.
> but are by definition past-focused,
To add to that, and because the GP had mentioned (a "virtual") Clausewitz: "human"/IRL strategy itself has in many cases been too focused on said past and, because of that, has caused defeats for the adopters of those "past-focused" strategies. Look at the Clausewitzian concept of "decisive victory", which was adopted by German WW1 strategists who, in so doing, ended up causing defeat for their country.
Good strategy is an art, the same as war; no LLM nor any other computer code will ever be able to replicate it or improve on it.