Immutable Data and Data Ownership
How to handle data is one of the two major issues that programming languages have historically had to wrestle with (the other being how to structure code for long-term maintenance). The problem in many languages is that data can become corrupted (left in an improper state of some kind) when different parts of the program interact with it. This leads to bugs whenever data ends up in a state that shouldn’t be possible, that doesn’t reflect reality, or that some consumer of the data doesn’t expect, and nothing catches it.
Object-Oriented Programming (OOP) is probably the most common approach to this problem in the modern world. Let’s examine how OOP tries to solve it.
OOP’s general theory is that data should be encapsulated behind specific behaviors. These encapsulated units pass messages to one another through defined interfaces, and each unit modifies its own state only in very particular ways. In other words, data can only be transformed in the ways its class defines.
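As a rough illustration, here is a minimal Python sketch. The `BankAccount` class and its methods are hypothetical and not taken from any library. Encapsulation means the balance is only supposed to change through `deposit` and `withdraw`, and those methods enforce the rules about what a valid state looks like:

```python
class BankAccount:
    """Hypothetical example class: the balance should only change via its methods."""

    def __init__(self, balance: int = 0) -> None:
        if balance < 0:
            raise ValueError("balance cannot start negative")
        self._balance = balance  # leading underscore: "private" by convention only

    def deposit(self, amount: int) -> None:
        # The class decides what counts as a valid state transition.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        # Reject withdrawals that would leave the account in an invalid state.
        if amount <= 0 or amount > self._balance:
            raise ValueError("invalid withdrawal")
        self._balance -= amount

    def balance(self) -> int:
        return self._balance


account = BankAccount(100)
account.deposit(50)
account.withdraw(30)
print(account.balance())  # 120
```

Python’s leading underscore on `_balance` is only a naming convention. Nothing in the language stops outside code from reaching in and changing it, and that is one of the gaps between the theory and how it plays out in practice.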
Sounds great, right?
Well, hold on.
The problem with this is twofold. One is a problem with the way that this is actually implemented in languages. The other is a problem with the notion itself.
Very rarely is this idea realized the way it was intended. Public members, getters and setters, objects passed around as messages, oh my! Almost every OOP language out there violates pure encapsulation in some way. Once encapsulation is broken, data turns into a free-for-all, and the way it changes becomes highly unpredictable. It gets to the point where there was hardly any reason to make the code object-oriented in the first place! We get all the problems of exposed public data, as if we had done nothing at all, plus all the implementation and comprehension costs that come with OO abstraction.
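Here is a small Python sketch of the getter problem, using a hypothetical `Team` class. The class tries to cap its roster through `add_member`. However, a conventional getter hands back the internal list itself, so any caller can mutate that list directly:

```python
class Team:
    """Hypothetical class that is supposed to cap its roster at three members."""

    MAX_MEMBERS = 3

    def __init__(self) -> None:
        self._members: list[str] = []

    def add_member(self, name: str) -> None:
        # The "official" way to change the roster enforces the cap.
        if len(self._members) >= self.MAX_MEMBERS:
            raise ValueError("team is full")
        self._members.append(name)

    def get_members(self) -> list[str]:
        # A typical getter: it returns the internal list itself, not a copy.
        return self._members


team = Team()
team.add_member("Ada")
team.add_member("Grace")

# Outside code bypasses add_member entirely by going through the getter.
team.get_members().extend(["Linus", "Barbara", "Ken"])
print(len(team.get_members()))  # 5, a state the class was designed to forbid
```

Returning a copy (for example `list(self._members)`) would close this particular hole. Nothing in the language forces the class author to remember to do so, though.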
There is also a problem with the idea itself that goes beyond any particular implementation of OOP: it makes it almost impossible to reason about programs. If I create some object for handling data, that object has a life of its own. As the code that created the data, I don’t have any say over what happens to it, even though I am the only one that actually knows what that data is intended for. The object will mutate and change at will. As long as something sends it a message, it will simply change state.
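The following minimal Python sketch makes the ownership problem concrete. It uses a hypothetical mutable `Order` record and a hypothetical `audit` helper. The creating scope passes its object along. Both scopes share the same reference, so the helper’s changes show up in the caller’s data:

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical mutable order record."""
    items: list[str] = field(default_factory=list)
    status: str = "draft"


def audit(order: Order) -> None:
    # Hypothetical helper that was only supposed to *read* the order,
    # but it holds a reference to the same object and changes it.
    order.status = "submitted"
    order.items.clear()


my_order = Order(items=["book", "pen"])
audit(my_order)

# The creating scope never asked for these changes, yet its data changed.
print(my_order.status)  # submitted
print(my_order.items)   # []
```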
This is compounded by the implementation issues we talked about earlier. Improper encapsulation turns this problem from an annoying inconvenience into one of the biggest problems in modern software engineering.
So is there a solution to this problem? I would argue yes!
Immutable data!!
For those of you who are not super familiar with functional programming, immutable data is the concept that if I pass a variable to a function, the state of the data in that variable will not change. It’s a very simple change, but the implications are immense! Suddenly data no longer owns itself; the scope that created it does! This means that at every stage of that data’s life cycle, each scope has guarantees about the state of any piece of data it holds. It also limits the damage a piece of code can do by corrupting data: the corrupt data won’t percolate up and around the system, causing issues that cannot be predicted. Each scope can put its own data into whatever state it wants and, as long as that scope is designed to deal with the state change, it won’t cause errors in the rest of the system. Data exists exactly as expected at each level of scope and doesn’t leak into others.
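Here is a minimal Python sketch of the idea, again assuming a hypothetical `Order` record. A dataclass declared with `frozen=True` raises `dataclasses.FrozenInstanceError` on any attempt to assign to one of its fields, so a called function cannot change the caller’s copy. Python’s immutability here is shallow: it blocks reassigning fields but not mutating a list stored in one. For that reason, the sketch stores the items in a tuple:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical immutable order record; a tuple keeps the items immutable too."""
    items: tuple[str, ...]
    status: str = "draft"


def careless_audit(order: Order) -> str:
    # Assigning to a field of a frozen dataclass raises FrozenInstanceError.
    try:
        order.status = "submitted"
    except FrozenInstanceError:
        return "mutation blocked"
    return "mutation allowed"


my_order = Order(items=("book", "pen"))
result = careless_audit(my_order)

print(result)           # mutation blocked
print(my_order.status)  # draft: the creating scope's data is unchanged
```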
To change data, a function passes a new version of that data back to the calling scope. This is where issues can occur. Code can be too trusting of the data a called function passes back to it, which can lead a scope to unwittingly corrupt its own data. I would argue, though, that this is a far preferable situation. If data gets corrupted, you know exactly where that corruption occurred. There’s no massive hunt for whatever part of the code is setting some internal data to something it shouldn’t be. A function returns a value, we verify it, and then we can use it! It’s a much simpler process than dealing with objects and their lifecycles.
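The “return a new version, then verify it” flow can be sketched in Python with `dataclasses.replace`, which builds a copy of a dataclass instance with some fields changed. The `Order` record, `submit`, `verify`, and the set of valid statuses are all hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical immutable order record."""
    items: tuple[str, ...]
    status: str = "draft"


# Hypothetical set of states the calling scope is willing to accept.
VALID_STATUSES = {"draft", "submitted", "shipped"}


def submit(order: Order) -> Order:
    # Instead of mutating, return a *new* Order with the change applied.
    return replace(order, status="submitted")


def verify(order: Order) -> Order:
    # The calling scope checks the returned value before trusting it.
    if order.status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {order.status!r}")
    if not order.items:
        raise ValueError("an order must contain at least one item")
    return order


draft = Order(items=("book", "pen"))
submitted = verify(submit(draft))

print(draft.status)      # draft: the original version is untouched
print(submitted.status)  # submitted
```

If `verify` ever raises, the problem can only have come from the value `submit` just returned. The search is confined to one call site rather than every piece of code that might hold a reference to the object.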
Now, is immutable data the solution to all of our woes?
No.
Nothing is perfect, and you can still do this wrong. As pointed out earlier, this is a system built on “trust, but verify.” If the verify part is never performed, then what’s the point?
But it is an option. It is an option that I would argue is far preferable to the current mainstream approach, where data owns itself and is alive. That approach offers basically zero help or guarantees for keeping data intact.