AI 101 for Lawyers: Deep Learning & Why it Matters


Eleanor Kaufman


Introduction

If you can’t explain what AI is doing, it’s not a tool. It’s a liability with a user interface.

It’s a long-running joke that law students can’t do maths. As former US First Lady Michelle Obama famously put it, “We can’t add and subtract, so we argue.” If you’re out of practice, that’s entirely justified – complex calculus has never been a prerequisite for drafting a contract. However, facing the rapid adoption of AI in legal practice, we do need to understand what the people who like maths are doing with it, because their models are now systems we are responsible for supervising.

Chances are you will never actually be in a situation where you would benefit from knowing how to code. Jensen Huang, the CEO of Nvidia, has boldly claimed that traditional coding will become a relic of the past as models improve.¹ So, this is not an effort to teach you a programming language. Rather, I am writing this series to show you how AI works, where things get problematic, and how to take forward the ideas you want to build on – without learning to code line by line.

Our value lies in critical thinking and safeguarding justice, which is antithetical to rubber-stamping the conclusions of people we’ve never met, relying on reasoning we cannot see, and taking responsibility for outputs we don’t fully understand. So, here is an overview of the mechanisms underpinning how AI works and why they are relevant to you.

How does AI work? 

Artificial intelligence is the effort to build computer systems that can match or exceed aspects of human intelligence. Roughly a century of work by cognitive scientists modelling perception, memory and reasoning brought us to the current state of AI.² So, for the sake of your time and sanity, I am not going to attempt to summarise that history. Rather, I am going to explain deep learning – the underlying mechanism behind Generative AI.

If it helps you: 

Deep learning = Culinary school

Large Language Models (GPT, Claude, DeepSeek) = Chefs

Deep learning is the training method that Generative AI models must go through before they can produce anything.

More specifically, deep learning refers to the use of neural networks, or computational structures inspired by the architecture of the human brain. These networks learn patterns from data through layers of interconnected neurons. In essence, each neuron operates like a tiny rule-detector.
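If you are curious what a “layer of neurons” looks like in code, here is a minimal sketch in Python (using only NumPy) of a tiny two-layer network turning four made-up input numbers into a single prediction. The layer sizes, weights and inputs are all invented for illustration; real models have millions or billions of learned weights.

import numpy as np

rng = np.random.default_rng(0)

# A tiny "neural network": 4 input features -> 8 hidden neurons -> 1 output.
# The weights below are random placeholders; real models learn them from data.
W1 = rng.normal(size=(4, 8))   # connections from the inputs to the hidden layer
W2 = rng.normal(size=(8, 1))   # connections from the hidden layer to the output

def relu(x):
    # A hidden neuron "fires" only when its weighted inputs are positive,
    # which is what lets it behave like a tiny rule-detector.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes the final number into the 0-1 range so it reads as a probability.
    return 1 / (1 + np.exp(-x))

def predict(features):
    hidden = relu(features @ W1)   # layer 1: detect simple patterns
    return sigmoid(hidden @ W2)    # layer 2: combine them into one prediction

# Four made-up input numbers (say, describing a user and a video).
example = np.array([0.2, 1.0, 0.0, 0.7])
print(predict(example))            # a probability between 0 and 1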

Think about short-form content applications such as Instagram Reels. Your recommendation model has lots of input features:

  • User features: history of watched videos, likes, shares, skips, follows, device, time of day, etc.

  • Video features: topic (comedy, dance, cooking), audio, length, caption, creator, etc.

  • Context features: current session, network, location/time bucket, etc. 

These are turned into embeddings (vectors), then fed into a deep network that learns patterns linking the input data to multiple engagement probabilities, such as:

  • Probability of watching ≥ 3 seconds

  • Probability of watching ≥ 90% of the video

  • Probability of liking

  • Probability of sharing/sending

  • Probability of skipping quickly

These predicted probabilities are then used by the ranking system to choose the content most likely to maximise overall engagement. Basically, the algorithm “knowing” you like that meme is really a direction or subspace (a pattern) in the model’s hidden representation that correlates strongly with users who often rewatch comedy content.³
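To make that pipeline slightly more concrete, here is a hedged toy sketch in Python of the “features → embeddings → deep network → engagement probabilities” flow. Every table size, feature name and weight below is invented for illustration; a production recommender is vastly larger and is trained on real engagement data.

import numpy as np

rng = np.random.default_rng(1)
DIM = 8  # size of each embedding vector (illustrative)

# Embedding tables: one learned vector per user, per video topic, per time bucket.
# In a real system these are learned during training; here they are random.
user_embeddings  = rng.normal(size=(1000, DIM))   # 1,000 hypothetical users
topic_embeddings = rng.normal(size=(10, DIM))     # e.g. comedy=0, dance=1, cooking=2
time_embeddings  = rng.normal(size=(24, DIM))     # hour-of-day buckets

# One shared hidden layer, then one output "head" per engagement signal.
W_hidden = rng.normal(size=(3 * DIM, 16))
heads = {name: rng.normal(size=(16,)) for name in
         ["watch_3s", "watch_90pct", "like", "share", "quick_skip"]}

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict_engagement(user_id, topic_id, hour):
    # 1. Turn raw IDs into embeddings (vectors).
    x = np.concatenate([user_embeddings[user_id],
                        topic_embeddings[topic_id],
                        time_embeddings[hour]])
    # 2. A shared hidden layer finds patterns across user, video and context.
    hidden = np.maximum(0, x @ W_hidden)
    # 3. One probability per engagement outcome, for the ranker to use.
    return {name: float(sigmoid(hidden @ w)) for name, w in heads.items()}

print(predict_engagement(user_id=42, topic_id=0, hour=21))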

So in essence, deep learning is the ‘culinary school’ behind Generative AI. First, patterns are recognised, providing the foundation for predictions. For example, large language models learn the statistical structure of languages and use it to predict what comes next. If autocorrect predicts the next word, an LLM predicts the next sentence, paragraph, or document. It is not retrieving text from a database; it is generating original output based on patterns it has learned.
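As a toy illustration of “predicting what comes next”, the sketch below counts which word follows which in a tiny invented corpus and turns those counts into probabilities. Real LLMs operate on sub-word tokens and use billions of learned weights rather than simple counts, so treat this purely as an intuition pump.

from collections import Counter, defaultdict

# A toy "training corpus" (invented for illustration).
corpus = [
    "the court held that the claim failed",
    "the court held that the appeal succeeded",
    "the tribunal held that the claim succeeded",
]

# Count which word follows which, across the corpus.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def next_word_probabilities(word):
    # Turn raw counts into probabilities for the next word.
    counts = next_word_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Given "the", what is statistically likely to come next in this tiny corpus?
print(next_word_probabilities("the"))
# "court" and "claim" come out more likely than "appeal" or "tribunal".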

In Summary

AI is not magic: it is simply simulating human cognition. This is extraordinarily powerful in the right contexts and deeply problematic in the wrong ones.

Returning to our cooking analogy, think of deep learning as the culinary school where a chef trains techniques, instincts, and muscle memory. During training, the model tastes millions of dishes, learning which flavours usually follow others, which ingredients clash and which combinations show up often.

Generative AI models (think GPT, Claude, or image generators) are the fully trained chefs who graduate from that school. A good chef internalises patterns over time, then uses that training to produce!

Why is this important to lawyers? 

Transparency

Humans learn rules like:

  • “This is a comedy video because it’s funny.”

Deep networks do not store rules.
They store statistical associations like:

  • Certain pixel patterns + audio patterns + caption words → comedy cluster

  • Certain user behaviour patterns → likely to rewatch

Training is mathematically observable (we can see the weights and gradients), but behaviourally unobservable (we cannot interpret what knowledge each weight represents). GPT-4 is reported to have around 1.7 trillion parameters, each of which looks something like:

W[859203, 192] = 0.04512

This is utterly useless for reading the model’s “thoughts”, identifying where facts are stored, or isolating a moral rule. This leads us to the “black box” problem. 
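To see the point for yourself, the sketch below trains a small network with scikit-learn on synthetic data and prints one raw learned weight – the same kind of number as the W[…] example above, just from a far smaller model. The dataset, layer size and indices are arbitrary choices for illustration.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for whatever the model was really trained on.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A deliberately small network: 20 inputs -> 16 hidden neurons -> 1 output.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
model.fit(X, y)

# model.coefs_[0] is the first weight matrix: 20 x 16 = 320 learned numbers.
print(model.coefs_[0].shape)   # (20, 16)
print(model.coefs_[0][3, 7])   # one raw weight, something like 0.04 or -0.31
# Nothing about this number tells you which concept it encodes, whether it acts
# as a proxy for a protected characteristic, or what "rule" the model learned.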

Why does it matter to lawyers?  

In 2018, it emerged that Amazon had abandoned an experimental CV screening system which was trained on ten years of male-dominated hiring data. It had learned statistical patterns that correlated with “successful candidates,” many of which were inadvertent proxies for gender. It began penalising CVs containing words like “women’s” or from women’s colleges – not because it understood gender, but because those features were statistically associated with lower hiring rates in the past.

Engineers tried removing the obvious signals, but because the model’s internal representations were opaque, they couldn’t ensure it wouldn’t discover new, hidden proxies. They ran into the black box problem. The system’s behaviour emerged from millions of parameters whose meaning could not be interpreted. 

Continuing to apply such a tool in the UK would breach the Equality Act 2010 s.13 (direct sex discrimination) and s.19 (indirect sex discrimination).⁴ But there are currently no clear and objective reporting standards to prove that algorithms are being used fairly. Right now, the ICO’s ‘Data protection audit framework’ AI guidance provides that, to meet its expectations, one must implement algorithmic fairness techniques, conduct analysis of their limitations, maintain evidence throughout the AI system supply chain and ensure fairness is considered at different stages of the AI system.⁵ Several techniques are therefore suggested to mitigate discrimination, but we are still very early in this journey.

An area of technical development

Mechanistic interpretability is the area of research that tries to open up these systems and actually figure out how they work on the inside. Instead of just watching their inputs and outputs and treating them like mysterious “black boxes,” the goal is to understand what’s going on at the level of individual circuits, neurons, and computations. In practical terms, that means trying to link specific parts of a model to the actual functions they’re carrying out to uncover the real internal logic of deep learning systems. 
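As a small taste of what that looks like in practice, one basic interpretability technique is to find the inputs that most strongly activate a chosen hidden neuron (“maximally activating examples”). The toy sketch below does this for a small network trained on handwritten digits; the neuron index and model size are arbitrary illustrative choices, and real mechanistic interpretability work goes far deeper than this.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

# Train a small network on handwritten digits (a stand-in for any deep model).
X, y = load_digits(return_X_y=True)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X, y)

# Manually recompute the hidden-layer activations (MLPClassifier uses ReLU by default).
hidden = np.maximum(0, X @ model.coefs_[0] + model.intercepts_[0])

# Pick one hidden neuron and find the five inputs that excite it the most.
neuron = 7  # arbitrary choice for illustration
top_examples = np.argsort(hidden[:, neuron])[-5:]
print("Digits that most activate neuron", neuron, ":", y[top_examples])
# If they were all, say, the digit 0, we would have a clue about what that neuron
# detects - a very small-scale version of what mechanistic interpretability does.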

Storing and Leaking Personal Data

Deep learning models often memorise portions of their training data - especially rare or unique things like a one-off email, a private message, or a distinctive medical image. This happens because modern neural networks are huge, with far more parameters than data points, so it’s easy for them to simply “store” unusual examples instead of learning general patterns.

Training doesn’t naturally stop this: the model just tries to reduce its loss, even if that means remembering sensitive information word for word. Without extra safeguards (like privacy filters or special training methods), deep models can and do internalise personal data, which raises obvious privacy and security risks.
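One way researchers demonstrate this is the “canary” test: plant a unique, secret-looking string in the training data, train a model, and then check whether the model finds that exact string far more familiar than a comparable string it has never seen. The sketch below is a heavily simplified, character-counting version of that idea; the canary, corpus and scoring are all invented for illustration.

import math
from collections import Counter, defaultdict

# Toy "training data" with one planted secret (a canary), all invented.
canary = "client reference 4417-XQ"
corpus = ("the parties agree to the terms set out below. " * 50) + canary

# A tiny character-level model: count which character follows which.
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1
vocab_size = len(set(corpus))

def log_likelihood(text):
    # How "familiar" the model finds the text (higher = more familiar),
    # with add-one smoothing so unseen characters don't break the maths.
    score = 0.0
    for a, b in zip(text, text[1:]):
        total = sum(counts[a].values()) + vocab_size
        score += math.log((counts[a][b] + 1) / total)
    return score

# The planted canary looks far more familiar to the model than a string with
# the same shape but digits it has never seen - evidence of memorisation.
print("planted canary:", log_likelihood("client reference 4417-XQ"))
print("never seen:    ", log_likelihood("client reference 9083-ZQ"))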

Why does it matter to lawyers?

Memorisation can later surface as “regurgitation,” where a model reproduces verbatim text from the training set when given certain prompts or contexts. Therefore, if you placed confidential client data into a poorly designed or unsecured LLM without the client’s informed consent then, among other terribly consequential issues, you would clearly be in breach of your confidentiality obligations under paragraph 6.3 of the SRA Code of Conduct for Solicitors, with unpredictable consequences for your client. This is why it is incredibly important to know how legal tech tools work.

An area of technical development

Counteracting this is incredibly complicated, which is why you need to be careful about where you put your data.

The most reliable technique is Differential Privacy (DP), which mathematically guarantees that the influence of any single training example is bounded and obscured. Differentially Private Stochastic Gradient Descent (DP-SGD) adds calibrated noise to gradients during training, ensuring that the model cannot rely heavily on any one individual’s data. Consequently, it prevents both explicit memorisation and various privacy attacks, such as attempts to extract training samples or infer whether a specific person was part of the training set. DP is considered the gold standard for privacy-preserving machine learning and is already used at scale by Google for sensitive analytics and on-device learning.⁶
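For the technically curious, here is a hedged NumPy sketch of a single DP-SGD training step for a simple logistic regression: each individual’s gradient is clipped to a maximum norm, and calibrated Gaussian noise is added before the model is updated. The clip norm, noise level and data are placeholder values, not a production-grade implementation; real systems use dedicated libraries (such as Opacus or TensorFlow Privacy) and track a formal privacy budget.

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 8 people, 3 features each, with a binary label (all invented).
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8)
w = np.zeros(3)                 # weights of a simple logistic regression model

CLIP_NORM = 1.0                 # max influence any one person's gradient may have
NOISE_MULTIPLIER = 1.1          # how much Gaussian noise to add (the privacy knob)
LEARNING_RATE = 0.1

def per_example_gradient(w, x_i, y_i):
    # Gradient of the logistic loss for a single individual's record.
    pred = 1 / (1 + np.exp(-x_i @ w))
    return (pred - y_i) * x_i

def dp_sgd_step(w):
    clipped = []
    for x_i, y_i in zip(X, y):
        g = per_example_gradient(w, x_i, y_i)
        # 1. Clip: bound how much any single person can move the model.
        g = g / max(1.0, np.linalg.norm(g) / CLIP_NORM)
        clipped.append(g)
    # 2. Add calibrated noise so individual contributions are obscured.
    noise = rng.normal(scale=NOISE_MULTIPLIER * CLIP_NORM, size=w.shape)
    noisy_avg = (np.sum(clipped, axis=0) + noise) / len(X)
    # 3. Ordinary gradient descent update on the noisy, clipped average.
    return w - LEARNING_RATE * noisy_avg

for _ in range(100):
    w = dp_sgd_step(w)
print(w)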

Hallucinations & Unpredictability

Deep learning systems are inherently difficult to predict or control. Small variations in input can produce disproportionately large or unexpected changes in output, and large language models routinely generate confident but wholly incorrect responses. In optimisation settings, models sometimes exploit quirks in their training signals – reward hacking – and end up pursuing goals their designers never intended. These systems can also generate content that is misleading, harmful, or simply unsafe, whether in the form of flawed advice or persuasive misinformation. As models grow in scale and are deployed more widely, this basic unpredictability becomes not just a technical issue but a genuine ethical risk.

An area of technical development

A growing body of work aims to curb the inherent unpredictability of large neural models and reduce the incidence of hallucinations – for example, grounding answers in retrieved, citable sources, estimating a model’s own uncertainty, and fine-tuning on human feedback. While none of these measures eliminates unpredictability entirely, together they represent a meaningful attempt to bring large-scale models within the bounds of reliable and responsible use.

Conclusion

AI will not replace lawyers. Lawyers who know how to use AI will replace those who don’t.
— ChatGPT, 2025

The models may be new, but our task remains the same: to understand, to question, and to protect and defend the people who rely on us. Now you have the fundamentals of AI literacy to build on!

If we cannot explain what a system is doing, we can neither scrutinise it nor credibly rely on it, let alone defend its outputs. This series is designed to give you that foundation. If you have any questions, don’t hesitate to reach out to kcllegaltech@gmail.com.


Next: Startup Spotlight: Introducing Advocatr!