1. Introduction

In this article, we’re going to look at some of the ways we can safely share the same data between different areas of code, both by literally sharing the exact memory and by making copies of it as appropriate.

2. References vs Values

In many languages, such as Java, most of our variables do not store the actual value but instead store a reference, or pointer, to the value:

[Figure: a variable storing a reference that points to the value in memory]

There are some significant benefits to working this way. For example, when we pass variables around, we only copy the small reference rather than the potentially much larger value.

This can also allow many different variables to point to the exact same value in memory:

[Figure: two variables referencing the same value in memory]

This can be useful because it means that both variables will see the exact same data. However, it also means that if the value is changed through one variable, the other will see the same change immediately. The two variables always refer to the same value.
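As a minimal sketch of this behaviour, the example below assigns one list variable to another and then modifies the list through the second variable. The class and method names (AliasingDemo, modifyThroughAlias) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AliasingDemo {
    // Modifies a list through a second variable and returns it
    // as seen through the first variable.
    static List<Integer> modifyThroughAlias() {
        List<Integer> first = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> second = first; // copies the reference, not the list
        second.add(4);                // change made via the second variable
        return first;                 // the first variable sees the change
    }

    public static void main(String[] args) {
        System.out.println(modifyThroughAlias()); // prints [1, 2, 3, 4]
    }
}
```

Only one list ever exists here, so there is nothing to keep in sync: both variables are simply two names for it.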

Note that this only applies to objects. Primitives, such as int and byte, are always stored and passed as the value itself rather than as a reference to it. This is fine because even the largest primitives, long and double, are only 8 bytes, which is comparable in size to a reference, and a primitive value can’t be modified in place: assigning to a primitive variable simply replaces that variable’s own copy.
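The following sketch contrasts this with the object case. It copies an int into a second variable and passes it to a method, then checks that the original variable is untouched. The names PrimitiveDemo, increment and copyThenChange are hypothetical.

```java
class PrimitiveDemo {
    // Receives its own copy of the argument, so the caller's variable is unaffected.
    static void increment(int n) {
        n = n + 1;
    }

    static int copyThenChange() {
        int a = 10;
        int b = a;    // b gets its own copy of the value 10
        b = 20;       // replaces b's copy only
        increment(a); // the method works on yet another copy
        return a;     // a was never touched
    }

    public static void main(String[] args) {
        System.out.println(copyThenChange()); // prints 10
    }
}
```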

3. What Is a Shallow Copy?

In some cases, we may want to create a copy of a value so that two different pieces of code each work with their own copy. This allows one copy to be modified without affecting the other, for example.

The simplest way to do this is to make a shallow copy of the object. This means we create a new object that contains all the same fields as the original, with copies of the same values:

[Figure: an original object and its shallow copy, each holding its own copy of the same field values]

For relatively simple objects, this works fine. However, if our objects contain other objects, then only the reference to these will be copied. This, in turn, means that the two copies contain references to the same value in memory, with the pros and cons that this entails:

[Figure: an original object and its shallow copy, whose “def” fields both reference the same list of numbers]

In this example, both our original and our copy have a field “def” that points to the same list of numbers. If one of them changes the list, the other will see the same changes. However, because we’ve made a copy of the original, it might be surprising that the underlying data is still shared between them, and this can lead to unexpected bugs in our code.
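The situation described above can be sketched in Java roughly as follows. The class name ShallowHolder and the field name abc are assumptions made for this example; only the “def” field comes from the description.

```java
import java.util.ArrayList;
import java.util.List;

class ShallowHolder {
    int abc;           // primitive field: its value is copied
    List<Integer> def; // object field: only the reference is copied

    ShallowHolder(int abc, List<Integer> def) {
        this.abc = abc;
        this.def = def;
    }

    // Shallow copy: a new object whose fields hold the same values and references.
    ShallowHolder shallowCopy() {
        return new ShallowHolder(this.abc, this.def);
    }

    public static void main(String[] args) {
        ShallowHolder original = new ShallowHolder(1, new ArrayList<>(List.of(1, 2, 3)));
        ShallowHolder copy = original.shallowCopy();

        copy.abc = 99;   // affects the copy only
        copy.def.add(4); // affects the list shared by both objects

        System.out.println(original.abc); // prints 1
        System.out.println(original.def); // prints [1, 2, 3, 4]
    }
}
```

The primitive field behaves independently in each object, while the list is still a single shared value.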

4. What Is a Deep Copy?

The alternative is to perform a deep copy of the object. Here we still copy each field from the original to the copy, but wherever a field refers to another object, we deep-copy that object too instead of just copying the reference:

[Figure: an original object and its deep copy, each referencing its own separate copy of the nested data]

The new copy then has exactly the same contents as the original but shares no mutable state with it, so changes to one are never reflected in the other.
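A deep copy of a small object might look like the sketch below. The class DeepHolder and its fields are hypothetical. Because the list elements here are Integer values, which are immutable, copying the list itself is enough; a list of mutable objects would need each element copied as well.

```java
import java.util.ArrayList;
import java.util.List;

class DeepHolder {
    int abc;
    List<Integer> def;

    DeepHolder(int abc, List<Integer> def) {
        this.abc = abc;
        this.def = def;
    }

    // Deep copy: the nested list is duplicated rather than shared.
    DeepHolder deepCopy() {
        return new DeepHolder(this.abc, new ArrayList<>(this.def));
    }

    public static void main(String[] args) {
        DeepHolder original = new DeepHolder(1, new ArrayList<>(List.of(1, 2, 3)));
        DeepHolder copy = original.deepCopy();

        copy.def.add(4); // changes the copy's own list only

        System.out.println(original.def); // prints [1, 2, 3]
        System.out.println(copy.def);     // prints [1, 2, 3, 4]
    }
}
```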

5. Immutability vs Copying

The main benefit of making copies of our data is that two different pieces of code can act on it without interference. If we have two pieces of code that are each given the exact same list, and one removes an item from it, then the other will see that change as well. Making a copy of the list means that changes to one are not seen on the other.

However, copying objects can be expensive. The more complicated the object structure, the more expensive it can be. And in some cases, copying might be impossible – for example, if the object represents a physical resource such as a network socket or a file handle, instead of just some computer memory.

There is another option, though. If our objects are immutable – that is, their values can never be changed – then there is much less risk in sharing the exact same values between different pieces of code. If we pass our list to different pieces of code and can guarantee that it will never change, then sharing it is safe.
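One way to get such a guarantee for a list in Java is to use an unmodifiable list, for instance one created with List.of (available since Java 9). The sketch below, with the made-up names ImmutableSharingDemo and tryToModify, shows that an attempt to change such a list is rejected.

```java
import java.util.List;

class ImmutableSharingDemo {
    // Returns true if the list accepted the change, false if it rejected it.
    static boolean tryToModify(List<String> shared) {
        try {
            shared.add("extra");
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // unmodifiable lists refuse structural changes
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("first", "second"); // unmodifiable list
        System.out.println(tryToModify(names)); // prints false
        System.out.println(names);              // prints [first, second]
    }
}
```

Any code that receives this list can read it freely, but cannot change what other holders of the same list see.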

However, writing immutable code isn’t always easy, especially with nested structures. For example, we might have an object that only has getters and no setters – so its fields can never be changed. This object is, in itself, immutable, but if any of those fields are themselves mutable, then the same problems can arise:

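import java.util.ArrayList;
import java.util.List;

class Immutable {
    // final: the field can never be reassigned, but the list it refers to is still mutable
    private final List<String> names = new ArrayList<>();

    public List<String> getNames() {
        return names; // hands out the internal list itself, not a copy
    }
}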

In this example, it’s impossible to change the names field in our object. It will always point to the same list. However, what happens here?

var immutable = new Immutable();
var immutable2 = immutable;

immutable.getNames().add("Baeldung");

Even though our names field can never be changed, we’ve still managed to insert a new entry into it. And this entry will be seen by both immutable and immutable2 at the same time because they both point to the same memory.
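One possible way to close this gap, sketched below under the assumption that we are on Java 10 or later, is to store an unmodifiable copy of the list using List.copyOf. The class name SaferImmutable and its constructor are hypothetical and not part of the example above.

```java
import java.util.ArrayList;
import java.util.List;

class SaferImmutable {
    private final List<String> names;

    SaferImmutable(List<String> names) {
        // Defensive copy: later changes to the caller's list do not leak in,
        // and the stored list itself rejects modification.
        this.names = List.copyOf(names);
    }

    public List<String> getNames() {
        return names; // safe to share because it is unmodifiable
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("Baeldung"));
        SaferImmutable safe = new SaferImmutable(source);

        source.add("changed later");
        System.out.println(safe.getNames()); // prints [Baeldung]

        try {
            safe.getNames().add("x");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected"); // prints rejected
        }
    }
}
```

This works here because String is itself immutable. If the list held mutable objects, those elements could still be changed through the getter.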

6. Copy-on-Write

In some cases, we want values that are mutable, but we don’t want to pay the cost of copying them unless we need to. For this, we can use a pattern called Copy-on-Write. We create a lightweight wrapper that initially just points to the original object, and we only make a real copy of the original at the moment we first want to change it:

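class Original {
    private String value;

    Original(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }
}

class CopyOnWrite {
    private Original value; // shared with other holders until the first write
    private boolean copied; // true once this wrapper owns a private copy

    CopyOnWrite(Original value) {
        this.value = value;
    }

    public String getValue() {
        return this.value.getValue();
    }

    public void setValue(String newValue) {
        if (!copied) {
            this.value = deepCopy(this.value);
            copied = true;
        }
        this.value.setValue(newValue);
    }

    // Assumed helper: Original holds only an immutable String,
    // so creating a new instance is already a full deep copy.
    private static Original deepCopy(Original source) {
        return new Original(source.getValue());
    }
}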

Here, our CopyOnWrite class wraps an instance of our Original class. This means that the exact same value can be shared around cheaply. However, the first time we call setValue() on our wrapper, we immediately stop and make a local copy of the original.

At this point, we’re paying the cost to perform the deep copy, but it means that our changes are local only to this instance and not seen in any other instances.
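To see the same idea applied to a collection, here is a minimal, single-threaded sketch of a copy-on-write wrapper around a list. The class CowList and all of its methods are hypothetical names chosen for this illustration, and the sketch assumes that nobody modifies the shared base list directly.

```java
import java.util.ArrayList;
import java.util.List;

class CowList {
    private List<String> items; // may be shared with other CowList instances
    private boolean owned;      // true once this instance has its own copy

    CowList(List<String> shared) {
        this.items = shared;
        this.owned = false;
    }

    int size() {
        return items.size(); // reads never trigger a copy
    }

    void add(String item) {
        if (!owned) {
            // First write: copy the list. Strings are immutable,
            // so copying the list is sufficient here.
            items = new ArrayList<>(items);
            owned = true;
        }
        items.add(item);
    }

    boolean sharesStorageWith(CowList other) {
        return this.items == other.items;
    }

    public static void main(String[] args) {
        List<String> base = new ArrayList<>(List.of("a", "b"));
        CowList first = new CowList(base);
        CowList second = new CowList(base);
        System.out.println(first.sharesStorageWith(second)); // prints true

        second.add("c"); // triggers the copy for second only
        System.out.println(first.sharesStorageWith(second)); // prints false
        System.out.println(first.size());  // prints 2
        System.out.println(second.size()); // prints 3
    }
}
```

Until the first write, both wrappers cost no more than a reference each. The copy is paid for once, and only by the wrapper that actually changes the data.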

7. Summary

Here, we’ve seen some ways that we can share data between different areas of our code and explored some of the ways that this can be done so that one area can’t inadvertently affect the other.

