
A novel way to manage accumulating industrial big data (Feat. Mount)

Last updated: October 18

INTRO

In the various fields commonly referred to as Smart-X, industrial big data, often called sensor data or time series data, is being generated in ever-increasing volumes, and customers store and manage it with various kinds of technology and software. In addition, because of recent constraints such as data for AI training no longer being something that can simply be discarded or deleted after a fixed period as before, how to manage this data has become an important issue.


One might wonder whether there is a simple way to manage data stored in a database or as plain text, and Machbase proposes a novel method, so let's take a look at how effective it is.


Difficulties in managing massive amounts of IIoT big data

Limitations of traditional database backup concept

(Here, the place where data is stored is limited to “databases” such as MySQL, MariaDB, MongoDB, InfluxDB, etc.)


In general, the storage capacity of the equipment where the database is installed is limited, so old data that is expected to no longer be accessed, i.e. cold data, is typically deleted from the online database after being backed up.


Traditional DBMS backup concept (this article assumes physical/online/local backup)

However, when trying to manage large amounts of data using the database's built-in backup function, some practical difficulties arise, as follows.


First, the scope of the backup target cannot be managed.

Backup in a traditional database is largely divided into backing up the entire data set (full backup) and backing up only the changed parts (incremental backup). The reason for this division is that backing up the entire database every time is not only inefficient but also a huge waste of time and resources. However, given the nature of data such as IIoT data, backups of a “specific time range” and of a “specific table” are also required.


Second, the restore time for extracting backed up data is excessive. (You cannot access the backup data without restoring it)

The basic concept of database restoration is to revert the database instance to its backup copy. To achieve this, the contents of the backup copy must be written back into the database, an enormous operation that takes anywhere from several hours to several days depending on the environment. In any case, only once this long recovery process completes does it become possible to access the past data and carry out analysis and exploration.


On the other hand, instead of using this built-in backup function, a specific table or the entire database can be EXPORTed and IMPORTed. In that case, however, the text-format data not only takes up more storage space than expected, but the data must also be re-indexed after loading. Since the data can only be accessed once indexing completes, this too is a very unrealistic approach that still demands excessive time and resource costs from a data-restoration perspective.


Third, restoring a database from a backup changes the original database.

Since restoring a backup returns the instance to the point in time of that backup, the state of the current database inevitably changes and the latest data cannot be preserved. So, in most cases, a backup is restored only when the original database has been damaged by a failure.


Backup method and data service issues

As mentioned earlier, "database restoration" is a technique for cases where the original database fails and can no longer be used.


But what if there is a frequent need to service (or extract) backed up historical data?

Let's assume there is an organization whose policy is to perform a full backup at the end of each month and then delete the data according to its retention or backup policy.


If a defect is detected in a product manufactured on a specific day three months ago and you need to submit data about what happened at that time, you have to go through the restoration and extraction process described above, which takes several hours each time. It is a very inefficient and slow method, but there is no other option.


Or, if an AI analysis team periodically requests data for various combinations of times, sensors, and ranges, it is clearly very difficult to respond quickly when every request means restoring some of the dozens of backup files accumulated month after month.


In other words, once a large amount of data has been moved into backups even once, rapidly serving (extracting) that stored data is very difficult with existing technology.



Imagine a new way to manage data

Because storage space is not infinite, data inevitably has to be backed up multiple times, but let's use our imagination and look at this problem from a different perspective.


  1. How great would it be if you could specify the range for the entire database or for each table when backing up?

    1. This will save backup time and maximize the convenience of data management by allowing you to select the backup range.

    2. Additionally, you have the freedom not to include unnecessary data in the backup range.

  2. Wouldn't it be nice if you could specify a time range for the backup target?

    1. This best reflects the characteristics of sensor data derived from IoT or IIoT, which has time series characteristics.

    2. If this is possible, you will have the ability to specify a portion of the data on the time axis rather than the entire backup target.

  3. Wouldn't it be great if you could instantly restore your backed up data?

    1. This is an incredibly innovative technological advance.

    2. Six months' worth of data was backed up and stored in 2TB of space, but it is difficult to predict how long it will take to restore it.

    3. Also, if an error occurs in the middle... it would be an unimaginable scenario.

  4. Rather than restoring backed up data, what if you could mount it like a file system and view it?

    1. In gaming terms, wouldn't this be the "endgame" feature?

    2. Rather than restoring the 2TB backup file mentioned above, imagine mounting it as another database inside the database to check, search, utilize, and serve the data, and then, once it is no longer needed, unmounting it and forgetting it even exists. That would be a whole new level of data management.



Machbase Neo’s novel IIoT data management method

The figure below briefly illustrates how Machbase Neo supports the new data management method described in the previous section. (Really?)

Backup steps

1. The user decides to back up all data in the time range TIME-0 to TIME-1 for the entire database.

2. Then, to improve storage space efficiency, all data (B1) in table B for the time range TIME-0 to TIME-1 is deleted.


In this case, when a backup is performed using the query below, a set of backed up files is created under the specified directory.

BACKUP DATABASE FROM time-0 TO time-1 INTO DISK = '/tmp/MYBKUP';
DELETE FROM B BEFORE time-1;

Once the above process completes, a backup copy has been created and the data in table B (B1) has been deleted, so everything has been done for its original purpose. (Let's say the backup copy was later moved to /backup-store, a cheaper storage location.)


Mount steps

Later, for various reasons, it becomes necessary to access and extract the data contained in B1 of table B, which was deleted.

At this time, take the following steps to immediately check the data using the backup copy.

MOUNT DATABASE '/backup-store/MYBKUP' TO newdb;

In this process, the data in the backup is mounted as a new database called newdb, with all indexes intact and everything ready for access. (This process completes within seconds!)


Utilization stage

As briefly mentioned earlier, not only the internal data but also the indexes are all in their original state, so the mounted database shows amazing performance, executing any time series query very quickly.


One thing to note is that a mounted table must be referenced in the form database_name.user_name.table_name.

SELECT * FROM newdb.sys.B WHERE <various conditions>

As shown above, you can access the data by specifying the mounted database name newdb, the administrator user name sys, and the table name B. (Of course, note that data areas mounted in this way are always read-only.)


What a revolutionary data management interface this is!

I dare say that this concept of backing up a specific space in a database and immediately accessing the data through “mounting” it without a restore process is one of the most innovative approaches in database history!


Unmount steps

Now that you have finished using all the data, you can completely forget about its existence using the unmount command as shown below.

UNMOUNT DATABASE newdb;


Advantages and Considerations of Mounted Databases

Mounted databases are almost identical to online tables in terms of data extraction, but there are a few things to keep in mind.


Ultra-fast data restoration time

It is literally the blink of an eye. Even if several terabytes of data have been backed up, the mount makes them accessible as an additional database almost instantly (a few seconds at most). You will quickly realize how revolutionary this is when you think about restoring backup files from a conventional MySQL or Oracle installation just to extract specific data.


Index maintenance for ultra-fast data access

The mounted database itself is fully prepared with time series indexes for high-speed data access. This means that the internal form of the data is completely identical to the data at the moment of backup, in line with the original purpose of data restoration.


Maintains perfectly identical rollup structure

One of the biggest features Machbase boasts is that it can provide statistical results in real time for an arbitrary time range. If a rollup table has been created, the same data structures are preserved in the backup, so even when mounted you get a completely identical data access environment in which long-term statistics can be obtained.


Read only

As the title suggests, mounted tables are read-only. Do not be misled by the fact that file systems on Unix systems can also be mounted in write mode.


Therefore, you can assume that only SELECT queries are possible. This is natural from the perspective of managing and serving data from a backup copy, but if you want to use the backup copy for reading and writing, you must perform a restore that overwrites the entire database.


In the case of Machbase Neo, it can be restored as follows. (For the classic version, please refer to the corresponding manual.)

machbase-neo restore --data <machbase home directory> <backup directory>

ex) machbase-neo restore --data /data/machbase_home /tmp/backup

V$<table>_STAT not available for mounted tables

V$XX_STAT, the per-tag statistics view that is updated in real time, is not supported for mounted databases. This is a subject for future improvement, but in most cases, if the main purpose is extracting tag data, it should not be a big problem.


If the size of the data being backed up is very large, it may be a good idea to download V$AllTables_STAT separately and refer to it before backup.
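
For example, before performing the backup and deletion you could keep the per-tag statistics for later reference with a query along the following lines. This is only a sketch: the view name follows the V$<table>_STAT pattern mentioned above, and mytab_A is simply the demo table name used later in this post.

-- Per-tag statistics view (row counts, time ranges, etc.); save the result somewhere before backup and deletion.
SELECT * FROM v$mytab_A_stat;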



Take advantage of backup

It's time to actually try this revolutionary mount feature rather than just read about it!


1. Install Neo and get neo-apps source code

The installation steps are exactly the same as in the previous blog post. For the actual installation method, refer to steps 1 (Neo installation) and 2 (neo-apps download) in the link in the previous post, obtain the latest Machbase Neo version (later than 8.0.18-rc3a) if possible, and then come back here. (From step 2 onwards, just follow the instructions below.)


When installing Machbase Neo, if possible, use version 8.0.18-rc4 or later.


2. Check backup-mount directory


3. Create schema and confirm data entry

In the backup-mount demo, two tables are created, each with one tag, and one data point is inserted per minute from January 1, 2023 to March 31, 2023.


So, open 0-Schema Data.wrk below and run the schema creation and data entry in order. The data input is prepared in advance through TQL.

If you press the red buttons above in order, schema creation and data entry will be completed, with 129,600 rows for each table. (Step 1 creates two tables, so place the cursor on each line and run it twice.)
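
The statements themselves live in the worksheet; as a rough sketch, the schema creation presumably looks like the following Machbase tag-table DDL (the column names and options in the actual 0-Schema Data.wrk may differ).

-- Hypothetical sketch of the demo schema: one tag table per data set, with name/time/value columns.
CREATE TAG TABLE mytab_A (name VARCHAR(40) PRIMARY KEY, time DATETIME BASETIME, value DOUBLE SUMMARIZED);
CREATE TAG TABLE mytab_B (name VARCHAR(40) PRIMARY KEY, time DATETIME BASETIME, value DOUBLE SUMMARIZED);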


You can check the number of inserted rows as shown below. (Available only in versions 8.0.18-rc4 and later.)
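
Instead of the screenshot, a plain count query works in any version and should return 129,600 rows per table (90 days × 1,440 rows per day).

SELECT COUNT(*) FROM mytab_A;
SELECT COUNT(*) FROM mytab_B;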


And if you open 1-Chart online.dsh, you can see a dashboard with the three months of inserted data, as shown below. (This chart averages all the data without rollup, so it may take a little time.)

4. Try out full backup and mount

Now that we have all our data, let's back up the entire database and actually mount it.

You can try the above process by opening 2-Full Backup and Mount.wrk and performing the steps in order.


Full database backup

BACKUP DATABASE INTO DISK='/tmp/mybkup2';

Executing the above command backs up the entire current database to the given directory. Note that this is an online backup, so it has no effect on other operations.


If you actually look at the directory, you can see that it consists of multiple files as shown below.

sjkim@gamestar:~$ ls /tmp/mybkup2 -l
total 348
drwxrwxr-x   8 sjkim sjkim   4096 May 25 11:16 TAG_TABLESPACE
-rw-rw-r--   1 sjkim sjkim   1148 May 25 11:16 backup.dat
-rw-rw-rw-   1 sjkim sjkim   2722 May 25 11:16 backup.trc
drwxrwxr-x 103 sjkim sjkim   4096 May 25 11:16 machbase_backup_19700101090000_20240525111644_27
-rw-r--r--   1 sjkim sjkim 139264 May 25 11:16 meta.dbs-0
-rw-r--r--   1 sjkim sjkim 143360 May 25 11:16 meta.dbs-1
-rw-r--r--   1 sjkim sjkim   8192 May 25 11:16 meta.dbs-2
-rw-r--r--   1 sjkim sjkim  36864 May 25 11:16 meta.dbs-3
-rw-r--r--   1 sjkim sjkim      0 May 25 11:16 meta.dbs-4
-rw-r--r--   1 sjkim sjkim  12288 May 25 11:16 meta.dbs-5

Check mount and data

Let's test whether Machbase Neo can mount the backed-up database files internally as a new database instance. The command is as follows, and the new database name is mountdb.

MOUNT DATABASE '/tmp/mybkup2' TO mountdb;

In less than a second, the command reports success.


If you check Neo's menu, the list shows that MOUNTDB now exists, as shown below. The mount is complete!

Now, if you run a few queries in the SHELL to see whether the data was entered properly, you can check it successfully as follows.


Note that the table name below is not mytab_A but mountdb.sys.mytab_A, since it refers to the mounted database.
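
The queries themselves are ordinary read-only SQL. For example, something along these lines (assuming the time column is named time, as in the schema sketch earlier) confirms that the mounted copy matches the online table:

SELECT COUNT(*) FROM mountdb.sys.mytab_A;              -- should match the online row count
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_A;  -- should cover January through March 2023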


Unmount

When the data verification process is completed and this database is no longer needed, traces can be easily removed using the unmount command as shown below.

UNMOUNT DATABASE mountdb;


If you try to access the table after it has been unmounted, you can see that an error occurs.


Now we understand a novel and convenient way to manage IIoT data using the backup/mount/unmount feature!


5. Back up and mount a time range of the database

Having backed up the entire database in step 4, let's now back up all tables for a specific time range only. It is easy to see that specifying a time range is a very useful capability for data managers who classify and organize data according to their own management policies. The demo uses the prepared worksheet 3-Time Range DB Backup and Mount.wrk.


Time range database backup

The syntax of this backup is as follows.

BACKUP DATABASE FROM <start time> TO <end time> INTO DISK='<target directory>';

Very simple. The given backup sample is an example of backing up all data for the month of January to the mybkup3 folder, as shown below.

BACKUP DATABASE FROM TO_DATE('2023-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
                TO   TO_DATE('2023-01-31 23:59:59','YYYY-MM-DD HH24:MI:SS') 
                INTO  DISK='/tmp/mybkup3';
                         
DELETE FROM mytab_B before TO_DATE('2023-01-31 23:59:59','YYYY-MM-DD HH24:MI:SS');

After backing up, all of January's data in table B is deleted for testing purposes. As a result, in the online database, table A holds data for January, February, and March, table B holds only February and March, and the backed-up copy contains the January data of both tables.


Mount and check data

As shown below, you can see that January's data has disappeared from mytab_B in the online database, and only January's data is stored intact in the mounted mytab_B table.
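
The worksheet presumably performs this check with statements along the following lines (the mount name mountdb matches the unmount command below, and the time column name follows the schema sketch earlier):

MOUNT DATABASE '/tmp/mybkup3' TO mountdb;
SELECT MIN(time), MAX(time) FROM mytab_B;              -- online table: February and March only
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_B;  -- mounted table: January only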

Unmount

As before, you can easily remove traces using the unmount command as shown below.

UNMOUNT DATABASE mountdb;

A wave of emotion washes over me. It means you can mount a database backed up for a specific time range, in the form you want and at the time you want, check the data, and make it disappear again the moment it is no longer needed!


6. Backup and mount specific tables

Now, let's back up not the entire database but a specific tag table for a specific time range, and mount it to check the data. The example is provided in 4-Time Range Table Backup and Mount.wrk.


Time range table backup

The syntax of this backup is as follows. (Of course, this is also possible without a time range)

BACKUP TABLE <table name> FROM <start time> TO <end time> INTO DISK='<target directory>';

Very simple. The given backup sample is an example of backing up all March data of table mytab_A to the mybkup4 folder, as shown below.

BACKUP TABLE mytab_A FROM TO_DATE('2023-03-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
                     TO   TO_DATE('2023-03-31 23:59:59','YYYY-MM-DD HH24:MI:SS')
                     INTO DISK = '/tmp/mybkup4';

Check mount and table data

The mount method is the same as before, as shown below, and from the results you can see that mytab_A in the backed-up space contains only March's data, while mytab_A in the online database contains all three months of data.
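
Again, the check presumably looks something like this (mount name and time column assumed as before):

MOUNT DATABASE '/tmp/mybkup4' TO mountdb;
SELECT MIN(time), MAX(time) FROM mytab_A;              -- online table: January through March
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_A;  -- mounted table: March only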

Unmount

As before, you can easily remove traces using the unmount command as shown below.

UNMOUNT DATABASE mountdb;

In other words, we have seen with our own eyes that you can back up not only the entire database, or a time range of the database, but also a specific time range of a specific table, and then mount it under the database name you want, at the time you want, to check the data at any time.


7. Mount multiple items at the same time

Lastly, let's mount not just one but multiple backups at the same time and visualize them. For this purpose, execute the prepared files 5-Mount All.wrk, 6-Chart All.dsh, and 7-Umount All.wrk in that order.


Mount multiple backups

Let's mount the three backups created so far all at once, using SQL along the lines shown below.
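
The SQL itself is not reproduced in this text, but judging from the backup directories created above and the mount names used in the unmount step below, it presumably looks like this:

MOUNT DATABASE '/tmp/mybkup2' TO mountdb2;  -- full backup: both tables, January through March
MOUNT DATABASE '/tmp/mybkup3' TO mountdb3;  -- time-range backup: both tables, January only
MOUNT DATABASE '/tmp/mybkup4' TO mountdb4;  -- table backup: mytab_A, March only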

As shown above, we confirmed that multiple backups can be mounted successfully at the same time. You can also check the layout of the tables as shown below. (Available in 8.0.18-rc4 or higher.)

Visualise data distribution

Now, if you open 6-Chart All.dsh to see the distribution of all the data, it looks as follows. (It takes some time to render, since this chart plots all of the data.)


As shown in the figure above, the first backup (MOUNT-2 Tables) contains all data from January, February, and March of the two tables, and the second backup (MOUNT-3 Tables) contains only data from January. In the case of the third backup (MOUNT-4 Tables), you can see that table A contains only data from March.
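
If you prefer SQL to the chart, queries like the following (using the mount names above and the assumed time column) confirm the same coverage:

SELECT MIN(time), MAX(time) FROM mountdb2.sys.mytab_A;  -- January 1 through March 31
SELECT MIN(time), MAX(time) FROM mountdb3.sys.mytab_B;  -- January only
SELECT MIN(time), MAX(time) FROM mountdb4.sys.mytab_A;  -- March only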


Checking the multi-mount table data pattern

In conclusion, with Machbase Neo's backup/mount function you can not only back up data under various conditions, but also mount a prepared backup file at high speed at the desired time and immediately search, extract, and transform the data. It is a genuinely powerful capability.


Unmount

Proceed as follows in the same way as before.

UNMOUNT DATABASE mountdb2;
UNMOUNT DATABASE mountdb3;
UNMOUNT DATABASE mountdb4;


Considerations for establishing a corporate data management policy

Decision point

To establish a data management policy at the corporate level, various options must be considered depending on the corporate environment, and the decision points will look roughly as follows.

  • What should the backup target be?

    • The entire database

    • Specific tables

  • What is the backup cycle?

    • Once a month

    • Once per quarter

    • Once every six months

  • How will backup data be accessed?

    • Mount on the online database server and access it there (concentrates the service load)

    • Build a separate database server for mounting and access it there (distributes the service load)

  • How will backup data storage be configured?

    • Use the storage on the online server

    • Use cheap storage (HDD, S3)

  • What is the mount policy?

    • Who will perform the mount operation for the desired backup file?

    • Will you implement and provide an automatic mounting function?

    • When will the mounted items be unmounted, and who will manage them?

Let's not forget that the core foundation of the above decision-making considerations is the "mount function that allows anyone to easily access backed-up data at any time." Without this function, the service itself to data consumers would have to be provided in a completely different form.


Data Services What-If Scenario

Although the following is a fictitious depiction of one company's service setup, I share it here because it is a useful reference.

"Currently, the company stores a large amount of data from the XX manufacturing process in real time. The data is utilized for various data services under various conditions from a "data analyst organization" to improve manufacturing quality, and for this purpose, convenient, fast, and innovative internal The data service model was constructed as follows."


  • Tags

    • 100,000 in total

  • Data collection rate

    • 50,000 to 90,000 records per second

  • Data management environment

    • Windows 10 Server

    • Machbase Neo latest version

  • Backup target

    • Entire database (including rollup)

  • Backup cycle

    • once a month

  • Backup data storage

    • Primary backup to SSD (to guarantee backup performance)

    • Then move to secondary storage (HDD) and delete from the primary

  • How mounted data is accessed

    • Consider ways to avoid interfering with online data processing servers

      • Set up a separate Windows server connected to the secondary storage device

    • If necessary, mount the backup folder and notify that service preparation is complete.

      • Notify the analysis organization of the access URL and the mounted database name

  • Mount policy

    • Data requests from data analytics organizations

    • Mount/unmount performed by field personnel

  • Who accesses the mounted data

    • In-house data analyst organization

    • Collect the desired data at the desired time through the REST API and SQL

      • Perform quality analysis in their own analysis software

    • Notify field personnel when the analysis work is complete so they can unmount



END

The “Mount” function of Machbase Neo introduced here is a truly innovative function from the perspective of big data management and service. In particular, if online service data cannot be kept in one place indefinitely, so that the data has to be backed up in some form while irregular access to that backup data is still required, I am confident it will be difficult to find a more convenient method than this “mount” function.


Conditions will vary from company to company, but I will end here in the hope that Machbase Neo's mount function will be of some help in internal data innovation and service quality improvement.
