Reference Data

Status of this document: Working Draft

This document is open for feedback. Please post questions and comments in the forum at the bottom of this page. You will need a login to post in the forum.


  1. Abstract
  2. Reference Data


[Enter abstract]

Reference Data

It is appropriate at this point to bring in another idea: Reference Data. When you have to relate abstractions between two representations, you really need a third. (You can say that one of the two representations is the abstract one, but that usually leads to compromises you shouldn’t make.) You need a third that truly abstracts the two representations. If you increase the number of representations, you are forced to generalize that abstraction to cover all possible representations. Having a neutral representation of information is the whole point of Reference Data.

Reference data is not constrained by any storage method.

By mapping several representations into abstract content, you achieve integration of information. If two or more representations map to one abstract content, those representations are said to be related to each other. Having that abstraction is also what enables you to achieve the encapsulation you want.
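The mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PumpRecord type, the field names, the column names EQUIP_TAG and FLOW_LPM, and the sample values are all hypothetical and do not come from any actual reference data library.

```python
from dataclasses import dataclass


# Hypothetical neutral ("reference") form: only the information itself,
# nothing about where or how it is stored.
@dataclass(frozen=True)
class PumpRecord:
    tag: str
    rated_flow_m3_per_h: float


# Representation A (assumed): a database row with flow in litres per minute.
def from_db_row(row: dict) -> PumpRecord:
    return PumpRecord(
        tag=row["EQUIP_TAG"].strip().upper(),
        rated_flow_m3_per_h=row["FLOW_LPM"] * 60 / 1000,
    )


# Representation B (assumed): a CSV line with flow already in cubic metres per hour.
def from_csv_line(line: str) -> PumpRecord:
    tag, flow = line.split(",")
    return PumpRecord(tag=tag.strip().upper(), rated_flow_m3_per_h=float(flow))


a = from_db_row({"EQUIP_TAG": "p-101 ", "FLOW_LPM": 2000})
b = from_csv_line("P-101, 120.0")

# Both representations map to the same abstract content, so they are related.
related = (a == b)
```

Neither source format knows about the other here. They are related only through the neutral record, which is the role the third representation plays.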

How far do you go? Hmmm... There is an art to finding the Goldilocks point. Not too much. Not too little. Just right.

Look at all the different places where you store the information. You need a point of encapsulation, and that point of encapsulation is just a boundary that you put around the information in that system. You can implement it at any point: right in the database, in a programming API, in a web service, or in a file.

There are many ways to implement that point of encapsulation. Once you transform a specific representation into an abstracted one there’s the (??? Fine line third party???). What’s in there is just the information itself, with nothing about how it is stored, where it is located, or what its schema is.

Here is the mental picture: There is an infinite variety of representations, and there is a representation at every storage point of information. Every storage point needs to expose an abstracted representation that is understood by everybody else, and that abstracted representation is the point of encapsulation.

The point of the abstraction is the encapsulation. Wherever you finally transform your representation into that third-party abstracted representation, that is the point of encapsulation.

What’s left is purely the information you want to make public. Private information, schema, column names, table names, and storage location paths should all remain private. You want freedom. You don't want anybody depending on that level of detail.
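As a rough sketch of such a boundary, the Python below wraps a private SQLite table behind a single publish method. Everything in it is assumed for illustration: the EquipmentStore class, the table name t_eq_07, the column names, and the published field names are hypothetical.

```python
import sqlite3


class EquipmentStore:
    """Hypothetical point of encapsulation over a private storage schema."""

    def __init__(self):
        # Private details: table name, column names, and an internal-only column.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE t_eq_07 (c1 TEXT, c2 REAL, internal_cost REAL)"
        )
        self._db.execute("INSERT INTO t_eq_07 VALUES ('P-101', 120.0, 9999.0)")

    def publish(self) -> list:
        # Only the information meant to be public crosses the boundary,
        # expressed in neutral names rather than the private column names.
        rows = self._db.execute("SELECT c1, c2 FROM t_eq_07").fetchall()
        return [{"tag": tag, "rated_flow_m3_per_h": flow} for tag, flow in rows]


published = EquipmentStore().publish()
```

Consumers see only the published fields, so the table could be renamed or moved to different storage without anyone outside noticing.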

Interestingly, we have a convergence of interests. We generally want to keep proprietary information secret. But this principle of encapsulation says that no one wants the proprietary information anyway, because if we use that proprietary knowledge in the data exchange, we are forever shackled by someone else’s proprietary thoughts.

Most people violate this principle because it’s so much easier to violate it. It’s easier to connect systems together without encapsulation. The greatest offenders are the people in IT, because they are project-oriented: “What’s the easiest thing right now?”

Benefit: Total control of what you publish

Benefit: No one really even wants your proprietary information anyway


