OpenHistorian Archive file reading/writing programmatically

logesaustralia · February 24, 2020, 1:34am

Hi,

One of our requirements is “If connection between PMU and PDC (openHistorian) is disconnected in real-time then we will lose the PMU data”.

To recover the PMU data for the disconnected period, we have proposed the following solution.

1.	Write the data to a file at the PMU during the disconnected period, then import that file to the PDC machine once the connection is restored (manually or automatically).
2.	The imported file needs to be used by openHistorian.

Regarding step #2, I would like to understand the concepts below in openHistorian. Please clarify.

  1.	My current understanding of how openHistorian reads data from the PMU and plays back or displays it in openHistorian Manager is:

a.	It receives time-series data from the PMU, interprets it (using the GOOSE-relevant protocol format), displays the current value on the screen (openHistorian Manager), and stores the same data in the archive file.

b.	Once the data is stored in the archive file (d2i), the user can view it at any time using the openHistorian playback utility or the Grafana web interface.

Please correct me if my understanding of steps “a” and “b” above is wrong.

  2.	Please explain the concepts below at the “code level” or “module level”, so that I can understand them better and, if needed, we can implement a solution on top of the existing system to handle this requirement.

 1.	The GSF.PhasorProtocols module interprets the PMU data and sends it to the screen display as real-time data.

 2.	Which module or assembly converts the real-time or time-series data to the archive file? My understanding is that the GSF.Historian assembly contains all of the implementation related to historical data, but I could not find the part that writes the .d2i file, its data structure, etc. Please give more details from a developer’s point of view.


3.	What is the actual format of Archive file?

4.	How do we write archive data for the disconnected time frame? For example, if the connection between PMU and PDC is broken from 1 PM to 2 PM on a particular day, how do we write the archive file for this time frame, given that we may import the disconnected data from the PMU 2 days later (in the case of a manual process)?
    The reason for this question is that initially d2i files of 2 KB or 20 KB are generated, and then all files for the day or hour are combined to generate the final d2i file. If I insert the disconnected data in this case, how will it be handled by openHistorian?
   

    If you have any DFD (Data Flow Diagram of OpenHistorian) or technical document, please provide 
    the link or share it, it would be helpful.

Or, if you have any other solution to handle this, please help.

Thanks & Regards
Logu

StephenCWills · February 26, 2020, 3:07pm

I’m struggling to read through your post to understand all the questions you have so I’m just going to focus on two major points.

Data gap recovery

The recommended solution to this problem does not rely on the PMU to capture its own data while the connection is down, but rather sets up another system with enough storage to support a short-term archive of local PMU data. This system buffers data in the short-term archive and also provides data to the central openHistorian. The central openHistorian detects gaps in the data that it’s collecting and automatically retrieves data from the short-term archive to fill in the gaps. This solution solves two problems.

1.	It is often difficult, if not impossible, to write a custom app on the PMU itself to determine whether the connection to the data collector is down. Furthermore, even if you can determine when the connection went down, you still risk data gaps during the transition between the real-time feed and the short-term archive. Indiscriminately saving the data in a short-term archive ensures that you do not need to know when the PMU is disconnected from the PDC and that you definitely have captured the data that went missing during the outage.
2.	It is much easier to estimate the storage requirements for a short-term archive that indiscriminately saves a specific amount of data versus a solution that stores a variable amount of data based on the length of an outage.
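
As a rough illustration of the gap-filling idea, here is a minimal Python sketch. It is not the openHistorian's actual implementation: the function names, the 30 frames-per-second rate, and the 1.5x tolerance are all assumptions made for the example, and both archives are modeled as plain dictionaries keyed by timestamp.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, frame_rate=30, tolerance=1.5):
    """Return (last_before, first_after) pairs around each detected gap.

    frame_rate and tolerance are hypothetical parameters for this sketch.
    """
    expected = timedelta(seconds=1.0 / frame_rate)
    limit = expected * tolerance
    ordered = sorted(timestamps)
    return [(prev, curr)
            for prev, curr in zip(ordered, ordered[1:])
            if curr - prev > limit]

def fill_gaps(central, short_term, frame_rate=30):
    """Copy samples from the short-term archive into every gap of the central one."""
    recovered = 0
    for start, end in find_gaps(central.keys(), frame_rate):
        for ts, value in short_term.items():
            # Only take samples strictly inside the gap that are still missing.
            if start < ts < end and ts not in central:
                central[ts] = value
                recovered += 1
    return recovered
```

Because the short-term archive stores everything regardless of connection state, the central side only needs the two timestamps that bound each gap to know which range to request.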

The openHistorian file format

To be honest, I don’t really understand the technical details of this format myself, so I won’t be able to answer your questions. There is no documentation either, as far as I know. The best thing I can tell you is to really dig into the source code to try and figure it out. What I do know is that the openHistorian saves data in stages.

The pre-file stage is known as stage 0. This consists of an in-memory B-tree, and all individual data points that enter the openHistorian make it here first. When the stage 0 B-tree reaches a certain size limit or when it’s been held in memory for more than 10 seconds, it will automatically be flushed to a stage 1 file.
Stage 1 is where the .d2i files you mentioned start appearing. If you look in the openHistorian’s working directory, you will see a collection of files prefixed with stage1 with the file extension .d2i. When the collective storage space of these files reaches a certain size limit or when there are a large number of these files to go through, they will automatically be rolled up into a stage 2 file.
Stage 2 files can also be found in the working directory, with the prefix of stage2 and the extension .d2i. The openHistorian basically reads the data from the separate stage 1 B-trees and writes it to a single stage 2 B-tree to combine all that data into a single file. In the same way that stage 1 files are combined into stage 2, stage 2 files get combined into stage 3 under the same type of conditions.
Stage 3 is the third and final stage. This is when the file extension changes to .d2 instead of .d2i (d2i stands for “d2 intermediate”, btw). These files are written to the archive directory of the openHistorian archive and are organized into folders by year and month. Because this is the final stage, these files are static and will never disappear.
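
To make the staged flow easier to picture, here is a toy Python model. It is only a sketch under simplifying assumptions: sorted lists stand in for B-tree files, the thresholds are simple counts with made-up values (`stage0_limit`, `rollup_count`) rather than the real size limits, and the 10-second timer is left out. The class and method names are hypothetical and do not come from the openHistorian source.

```python
class StagedArchive:
    """Toy model of staged rollup; not the openHistorian's real logic."""

    def __init__(self, stage0_limit=4, rollup_count=3):
        self.stage0_limit = stage0_limit     # hypothetical: points buffered before a flush
        self.rollup_count = rollup_count     # hypothetical: files per stage before a rollup
        self.stage0 = {}                     # in-memory buffer: key -> value
        self.stages = {1: [], 2: [], 3: []}  # each "file" is a sorted list of (key, value)

    def write(self, key, value):
        self.stage0[key] = value
        if len(self.stage0) >= self.stage0_limit:
            self.flush()

    def flush(self):
        """Move the in-memory buffer into a new stage 1 file."""
        if not self.stage0:
            return
        self.stages[1].append(sorted(self.stage0.items()))
        self.stage0 = {}
        self._rollup(1)

    def _rollup(self, stage):
        """Merge all files of one stage into a single file of the next stage."""
        files = self.stages[stage]
        if stage >= 3 or len(files) < self.rollup_count:
            return
        merged = {}
        for f in files:
            merged.update(f)
        self.stages[stage + 1].append(sorted(merged.items()))
        self.stages[stage] = []
        self._rollup(stage + 1)

    def read(self, start, end):
        """Read a key range across every stage, including unflushed points."""
        result = {}
        for stage in (3, 2, 1):
            for f in self.stages[stage]:
                result.update((k, v) for k, v in f if start <= k <= end)
        result.update((k, v) for k, v in self.stage0.items() if start <= k <= end)
        return sorted(result.items())
```

With the default thresholds, writing 36 points produces nine stage 1 flushes, three stage 2 rollups, and finally one stage 3 file, while a read still sees any points that are waiting in memory.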

In addition to the working directory for .d2i files and the archive directory for .d2 files, there is a connection string parameter that allows you to specify “attached paths” where additional archive files can be placed. The openHistorian can potentially create or delete files in the working directory or archive directory, but it will only read from the attached paths without making any modifications there. If you do manage to develop some code to write .d2 files on the PMU itself, I would recommend placing them in an attached path so the openHistorian can read them.

As for the type of data that gets stored in the openHistorian, that’s where things will get tricky for you. The openHistorian SQL database defines a 64-bit integer identifier for every signal that it stores in its archive file. The key for the B-tree is a 64-bit timestamp in ticks concatenated with the 64-bit integer identifier so it can be cross-referenced with the SQL database. That means the PMU will need to know what those 64-bit integer identifiers are and which signals on the PMU they map to in order to actually write a .d2 file that can be read by the openHistorian. The value consists of a single measured value from the PMU taken at the timestamp in the corresponding key, plus a set of state flags for that measurement that can be used to indicate conditions for time and data validity.
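
The key and value described above can be sketched in Python as follows. This is an illustration only, not the real on-disk encoding (which is also compressed). It assumes that "ticks" means .NET-style ticks, i.e. 100-nanosecond intervals since 0001-01-01, and the byte order and the float-plus-flags value layout are hypothetical choices made for the example.

```python
import struct
from datetime import datetime, timedelta

TICKS_PER_SECOND = 10_000_000   # assumption: one tick is 100 ns, as in .NET
EPOCH = datetime(1, 1, 1)       # assumption: ticks count from 0001-01-01

def to_ticks(dt):
    """Convert a naive datetime to an integer tick count."""
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    return seconds * TICKS_PER_SECOND + delta.microseconds * 10

def pack_key(dt, point_id):
    """Timestamp followed by point ID, 64 bits each.

    Big-endian is used here so that comparing the raw bytes orders keys
    by time first and by point ID second.
    """
    return struct.pack(">QQ", to_ticks(dt), point_id)

def pack_value(measured, flags):
    """Hypothetical value layout: 32-bit float plus 32-bit state flags."""
    return struct.pack(">fI", measured, flags)
```

The point IDs passed to `pack_key` would have to match the identifiers defined in the openHistorian SQL database, which is why a writer running on the PMU needs that mapping in advance.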

There is also some form of compression applied to the data on its way into the openHistorian. I don’t know anything about it except that it exists.

I think that’s about all the help I can give you. Good luck!

Thanks,
Stephen