Database – How to Better Document Data Relationships and Transformations

database documentation

I'm working on a project that uses RxJS to perform data transformations on varying sources of data, and I'm in the process of writing some documentation for it.

I want to find an effective way to document the following:

  1. An abstract way to describe the cardinality and relationships of the data.
  2. An abstract description of the data transformations.

Here are two examples of how I'm describing a data transformation. The table headers are the destination fields; the second row is the source data, or a transformation applied to the source data to get the desired data.

[Image: Data transformation 1]

[Image: Data transformation 2]

I can see that the GitHub Markdown format is very limited for this purpose, which is why I'm asking for help.

I also have a few ER diagrams that look like this:

[Image: Schema]

I'm not sure of a clean way to document how the transformations relate to the schema, or which assumptions about cardinality are made within those transformations (getStudentTestScoreDcid in particular).

Best Answer

Data Flow Diagrams sound like what you need.

From Wikipedia:

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modelling its process aspects. A DFD is often used as a preliminary step to create an overview of the system, which can later be elaborated. DFDs can also be used for the visualization of data processing (structured design).

A DFD shows what kind of information will be input to and output from the system, where the data will come from and go to, and where the data will be stored. It does not show information about the timing of process or information about whether processes will operate in sequence or in parallel (which is shown on a flowchart)

Emphasis above is mine

The whole point of the DFD is to show the transformational aspects of data as it moves through the system. You will always have an input (from a user, data storage, or another process) that feeds into a process with an output (to the screen, data storage, or another process). If a step doesn't have all three elements, it doesn't belong on the DFD. One other point worth mentioning: a large number of DFDs (I would say most) do not have a starting point or ending point on the complete diagram.
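Since the project is already in the RxJS/TypeScript world, the input/process/output rule can also be written down as data and checked mechanically. This is only a rough sketch under my own assumptions: the `DfdNode` and `DataFlow` types, the `incompleteProcesses` helper, and the flow labels ("order details", "transaction record") are all hypothetical names made up for illustration, not part of any DFD standard or tool.

```typescript
// Hypothetical model of DFD elements; all names here are illustrative only.
type NodeKind = "external" | "process" | "store";

interface DfdNode {
  id: string;
  kind: NodeKind;
  label: string;
}

interface DataFlow {
  from: string; // id of the node the data leaves
  to: string;   // id of the node the data enters
  data: string; // short description of what moves along the arrow
}

// Returns the ids of processes that lack an incoming flow or an outgoing flow.
// By the rule above, such a process does not belong on the diagram yet.
function incompleteProcesses(nodes: DfdNode[], flows: DataFlow[]): string[] {
  return nodes
    .filter((n) => n.kind === "process")
    .filter((p) => {
      const hasInput = flows.some((f) => f.to === p.id);
      const hasOutput = flows.some((f) => f.from === p.id);
      return !(hasInput && hasOutput);
    })
    .map((p) => p.id);
}

const nodes: DfdNode[] = [
  { id: "customer", kind: "external", label: "Customer" },
  { id: "processOrder", kind: "process", label: "Process Order" },
  { id: "transaction", kind: "store", label: "Transaction" },
  { id: "orphan", kind: "process", label: "Orphan Process" }, // deliberately has no flows
];

const flows: DataFlow[] = [
  { from: "customer", to: "processOrder", data: "order details" },
  { from: "processOrder", to: "transaction", data: "transaction record" },
];

const incomplete = incompleteProcesses(nodes, flows);
console.log(incomplete); // only the deliberately unconnected process is reported
```

Keeping the flows in a structure like this also gives you one place to note what each arrow carries, which may help with the cardinality assumptions you mention.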

There are at least two different symbol sets in use (Gane-Sarson, and Yourdon & Coad).

The example below shows how data from a Customer goes into the Process Order process, which outputs data that is stored in the Transaction data store. Duplicated data stores are usually included to make the process easier to follow, and are often marked with an altered symbol for the data store (a D in a gray box in this example).

Example data flow diagram showing how data moves from data stores through the processes of the system being documented.

Sample from Visual Paradigm
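If you want to keep the diagrams next to the Markdown docs, one option is to generate Mermaid flowchart text from a small description of the flows, since GitHub renders fenced `mermaid` blocks. The sketch below covers only the Customer / Process Order / Transaction part of the example; the `Flow` and `Labelled` types, the `toMermaid` function, and the flow labels are my own assumptions for illustration, and Mermaid's plain flowchart shapes are not the Gane-Sarson or Yourdon & Coad symbols.

```typescript
// Hypothetical helper: turns a list of labelled nodes and data flows into
// Mermaid flowchart source. Names and labels are illustrative assumptions.
interface Labelled {
  id: string;
  label: string;
}

interface Flow {
  from: string; // id of the source node
  to: string;   // id of the destination node
  data: string; // description of the data carried by the arrow
}

function toMermaid(nodes: Labelled[], flows: Flow[]): string {
  const lines: string[] = ["flowchart LR"];
  for (const n of nodes) {
    // Declare each node once with a quoted display label.
    lines.push(`  ${n.id}["${n.label}"]`);
  }
  for (const f of flows) {
    // An arrow with the data description as its edge label.
    lines.push(`  ${f.from} -->|${f.data}| ${f.to}`);
  }
  return lines.join("\n");
}

const diagram = toMermaid(
  [
    { id: "customer", label: "Customer" },
    { id: "processOrder", label: "Process Order" },
    { id: "transaction", label: "Transaction" },
  ],
  [
    { from: "customer", to: "processOrder", data: "order details" },
    { from: "processOrder", to: "transaction", data: "transaction record" },
  ],
);

console.log(diagram);
```

Pasting the printed text into a `mermaid` code fence in a README should give a left-to-right diagram with labelled arrows between the three nodes.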
