## Why use `deepcopy`?

In Julia, `copy` is a function which creates a shallow copy. For example:

```
julia> a = [1] # vector with one element, namely 1
1-element Vector{Int64}:
1
julia> b = [a] # vector with one element, `a`
1-element Vector{Vector{Int64}}:
[1]
julia> b2 = copy(b) # new vector, also with one element which is `a`
1-element Vector{Vector{Int64}}:
[1]
julia> push!(a, 2) # mutate `a` so it contains 1 and 2
2-element Vector{Int64}:
1
2
julia> b # since `b` contains `a`, we can see its (nested) contents have changed
1-element Vector{Vector{Int64}}:
[1, 2]
julia> b2 # same for `b2`!
1-element Vector{Vector{Int64}}:
[1, 2]
```

Since `copy` is shallow, `b2` still contains the same vector `a` (whose contents we modified to be `[1,2]`), just like `b`, even though they are independent vectors which do not share memory:

```
julia> push!(b, [10]) # mutate `b`
2-element Vector{Vector{Int64}}:
[1, 2]
[10]
julia> b2 # not mutated
1-element Vector{Vector{Int64}}:
[1, 2]
```
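One way to make this distinction concrete is to compare object identity with `===`. A small sketch, re-creating `a`, `b`, and `b2` from scratch:

```julia
a = [1]
b = [a]
b2 = copy(b)

# The outer vectors are distinct objects...
b2 === b        # false

# ...but their single element is the very same vector `a`.
b2[1] === a     # true
b[1] === b2[1]  # true
```
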

In contrast, `deepcopy` is a function which recursively copies objects:

```
julia> a = [1] # vector with one element, namely 1
1-element Vector{Int64}:
1
julia> b = [a] # vector with one element, `a`
1-element Vector{Vector{Int64}}:
[1]
julia> b2 = deepcopy(b) # new vector, with new contents
1-element Vector{Vector{Int64}}:
[1]
julia> push!(a, 2) # mutate `a` so it contains 1 and 2
2-element Vector{Int64}:
1
2
julia> b
1-element Vector{Vector{Int64}}:
[1, 2]
julia> b2 # contents are unchanged
1-element Vector{Vector{Int64}}:
[1]
```

We can see that `b2` still just contains `[1]`.

It’s easy to see why `deepcopy` might be appealing: it could be surprising that modifying `a` affects `b2`.

`deepcopy` also has some other nice properties. For example, since `b` contains `a` twice here, modifying `a` has the following effect on `b`:

```
julia> a = [1]
1-element Vector{Int64}:
1
julia> b = [a, a]
2-element Vector{Vector{Int64}}:
[1]
[1]
julia> push!(a, 2)
2-element Vector{Int64}:
1
2
julia> b
2-element Vector{Vector{Int64}}:
[1, 2]
[1, 2]
```

`deepcopy` preserves this internal structure. Continuing the example:

```
julia> b2 = deepcopy(b)
2-element Vector{Vector{Int64}}:
[1, 2]
[1, 2]
julia> a2 = b2[1]
2-element Vector{Int64}:
1
2
julia> push!(a2, 3)
3-element Vector{Int64}:
1
2
3
julia> b2
2-element Vector{Vector{Int64}}:
[1, 2, 3]
[1, 2, 3]
```
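To see what is being preserved, it may help to compare against copying each element separately. In this sketch, `b3` is a name introduced only for illustration, and `map(copy, b)` is just one hand-rolled alternative, not something the examples above use:

```julia
a = [1, 2]
b = [a, a]

b2 = deepcopy(b)   # recursive copy that remembers objects it has already copied
b3 = map(copy, b)  # hand-rolled alternative: copy each element independently

b2[1] === b2[2]    # true: both slots still refer to one shared inner vector
b3[1] === b3[2]    # false: the aliasing was lost, each slot got its own copy
b2[1] === a        # false: the shared inner vector is a new object, not `a`
```
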

Pretty nice! Semantically, `deepcopy` should be the same as composing `deserialize` and `serialize`.
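As a rough illustration of that equivalence, here is a sketch that round-trips the same kind of object through the `Serialization` standard library using an in-memory `IOBuffer`. The buffer and the variable names are my own choices for the example:

```julia
using Serialization

a = [1, 2]
b = [a, a]

# Write `b` to an in-memory buffer, then read it back.
io = IOBuffer()
serialize(io, b)
seekstart(io)
b2 = deserialize(io)

b2 == b          # same contents as the original
b2[1] === a      # false: the round trip produced new objects
b2[1] === b2[2]  # checks whether the two slots still share one inner vector
```
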

## Why not use `deepcopy`?

`deepcopy` is reaching into the internals of the object, rather than relying on the API of the object (namely, its method for `copy`).

It’s easy to construct cases in which this is semantically incorrect. For example, let’s say we are constructing our own vector type which stores its memory elsewhere, and stores a token used to look up that memory. Here is a quick implementation:

```
# mutable so each instance has its own identity
mutable struct Token end

const STORAGE = Dict{Token, Vector{Float64}}()

struct MyVectorType <: AbstractVector{Float64}
    token::Token
end

# construction from a `vector`
function MyVectorType(v::Vector)
    token = Token()
    STORAGE[token] = v
    return MyVectorType(token)
end

Base.getindex(m::MyVectorType, i::Int) = STORAGE[m.token][i]
Base.setindex!(m::MyVectorType, v, i::Int) = STORAGE[m.token][i] = v
Base.size(m::MyVectorType) = size(STORAGE[m.token])

function Base.copy(m::MyVectorType)
    return MyVectorType(copy(STORAGE[m.token]))
end
```

For example:

```
julia> v = MyVectorType(rand(2))
2-element MyVectorType:
0.49321258978106763
0.6022070713363459
```

Then `copy` works as expected, since we defined a method for it:

```
julia> v2 = copy(v)
2-element MyVectorType:
0.49321258978106763
0.6022070713363459
julia> v[1] = 2.0
2.0
julia> v
2-element MyVectorType:
2.0
0.6022070713363459
julia> v2
2-element MyVectorType:
0.49321258978106763
0.6022070713363459
```

But `deepcopy` fails:

```
julia> deepcopy(v)
Error showing value of type MyVectorType:
ERROR: KeyError: key Token() not found
Stacktrace:
[1] getindex
@ ./dict.jl:477 [inlined]
[2] length
@ ./REPL[6]:1 [inlined]
```

It has constructed a new `Token` instance which does not have a corresponding entry in `STORAGE`.

Using `deepcopy`, we have made assumptions about the implementation details of how `MyVectorType` works and constructed an invalid instance!
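If you own the type, one possible remedy is to give it a `Base.deepcopy_internal` method, which is the extension point Julia documents for customizing `deepcopy`. The following is only a sketch: it repeats the definitions from above so that it runs on its own, and the decision to copy the backing storage under a fresh token is my assumption about what a deep copy of `MyVectorType` should mean.

```julia
mutable struct Token end

const STORAGE = Dict{Token, Vector{Float64}}()

struct MyVectorType <: AbstractVector{Float64}
    token::Token
end

function MyVectorType(v::Vector)
    token = Token()
    STORAGE[token] = v
    return MyVectorType(token)
end

Base.getindex(m::MyVectorType, i::Int) = STORAGE[m.token][i]
Base.setindex!(m::MyVectorType, v, i::Int) = STORAGE[m.token][i] = v
Base.size(m::MyVectorType) = size(STORAGE[m.token])
Base.copy(m::MyVectorType) = MyVectorType(copy(STORAGE[m.token]))

# Teach `deepcopy` to go through the type's own API instead of its fields.
# `dict` records objects copied so far, so shared references stay shared.
function Base.deepcopy_internal(m::MyVectorType, dict::IdDict)
    haskey(dict, m) && return dict[m]::MyVectorType
    new_m = copy(m)  # elements are Float64, so copying the storage is deep enough
    dict[m] = new_m
    return new_m
end

v = MyVectorType([1.0, 2.0])
v2 = deepcopy(v)  # now registers a fresh token in STORAGE
v[1] = 5.0        # mutating the original leaves `v2` alone
```

This does not help when the type comes from a package you do not control, which is where the problem usually shows up.
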

One can run into similar problems when the object contains references to memory allocated in another language; see e.g. JuMP’s rationale for disabling `deepcopy` on its models.

## Revisiting reasons to use `deepcopy`

Sometimes, feeling a need to use `deepcopy` is actually a hint that something else is wrong. I think the main one is missing `copy` methods, but I believe there might be several other reasons; if you think of one, let me know and I might add it here.

### “Missing” `copy` method

I think one of the most common reasons to reach for `deepcopy` is that one of the objects you are working with is missing a `copy` method, or it is not copying quite what it should<sup>1</sup>.

While `copy` is defined to be a “shallow” copy, it is not always totally clear exactly how deep or shallow that should be. Adding semantics around whether the object represents a nested or flat data structure can clarify this.

For example, a `DataFrame` is semantically a 2D object, which is implemented with a vector-of-vectors similar to our `b` in the first example. `DataFrame` defines a `copy` method which by default *does* copy the nested inner vectors, but *not* any of their contents. I believe this is semantically correct, because the vector-of-vectors construction is simply an implementation detail of `DataFrame`, and the object itself is a flat 2D object (as shown by e.g. `size`), and therefore `copy` should not behave like it is a vector-of-vectors.

Sometimes `deepcopy` usage hints at both a “missing” struct definition and copy method. For example, when using nested dictionaries to store configuration state, one might start with a “default configuration”, then `deepcopy` it to pass to a user to modify. This seems practical and totally fine, but it might be nicer to wrap up the configuration in a `struct` and define a `copy` method for it.
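As a sketch of what that could look like, here is a hypothetical `PlotConfig` type with a `copy` method that duplicates exactly the mutable parts a user is expected to modify. The type, its fields, and the default values are all invented for illustration:

```julia
# Hypothetical configuration type; the field names are made up for this example.
struct PlotConfig
    title::String
    axis::Dict{Symbol, Any}
    colors::Vector{String}
end

# `copy` decides how deep to go: the containers are duplicated,
# while the immutable `title` string can safely be shared.
Base.copy(c::PlotConfig) = PlotConfig(c.title, copy(c.axis), copy(c.colors))

default_config = PlotConfig("untitled", Dict{Symbol, Any}(:xlabel => "x"), ["red", "blue"])

# Hand the user their own copy to modify.
user_config = copy(default_config)
user_config.axis[:xlabel] = "time"
push!(user_config.colors, "green")

default_config.axis[:xlabel]   # still "x"
length(default_config.colors)  # still 2
```

The copying policy now lives in one place, next to the type, rather than being implied by the object's memory layout.
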

## Should you use `deepcopy`?

Maybe! It depends on the situation. To me it is often a sign that some abstraction is not working as intended, or a `copy` method is missing somewhere. But it is a good workaround, it can be a useful shortcut in scenarios like writing tests, and there are situations in which it seems like the correct tool for the job.

<sup>1</sup> Note: in this case, using `deepcopy` may be the best way to stay unblocked! I don’t think there’s anything wrong with using it as a workaround, but it may signal there is an upstream issue somewhere to be filed or fixed.