1. Introduction
A time series database is a database system that specializes in efficiently storing and processing large amounts of time series data, unlike a conventional relational database (RDBMS).
Although it has been more than 10 years since time series databases appeared in the DBMS field, the reality is that ‘time series databases’ are still an unfamiliar concept to most IT professionals.
Time series databases are still at an early market stage in Korea. For them to spread, understanding why they are needed and what value they bring has to come before understanding their performance and features. Because Machbase is commercial software, this point matters even more.
Below, we explain why a time series database is needed by focusing on the limitations of existing relational databases and the economic effects that a time series database can bring.
2. Definition and characteristics of time series database engine
A general definition of a time series database is a database system optimized for storing, managing, and analyzing data that changes over time. It is mainly optimized for handling data collected along a time axis, such as sensor data, financial transaction records, and log data.
To explain the differences from relational databases, let’s summarize the main characteristics and functions of time series databases.
2.1 High-speed input and query of large volumes of data
Time series databases are designed to ingest and retrieve large amounts of time series data quickly. This is what enables the ‘real-time monitoring and analysis’ and ‘range query optimization’ described in the following sections.
2.2 Real-time monitoring and analysis
The ultra-fast input and retrieval capabilities of time series databases allow data to be collected, processed, and analyzed in real time. This translates directly into faster business decisions and operational efficiency.
This means you can continuously process incoming data in real time for instant analysis, raise alarms or triggers for an immediate response, and visually represent real-time data to easily spot trends and patterns.
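As a rough illustration of raising an alarm on incoming data, the following minimal Python sketch checks each reading against a fixed limit as it arrives. The function name `threshold_alarms`, the `(timestamp, value)` tuple layout, and the limit of 80.0 are assumptions made for this example; they are not part of any particular time series database.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical reading layout: (unix timestamp in seconds, sensor value)
Reading = Tuple[float, float]


def threshold_alarms(readings: Iterable[Reading], limit: float) -> Iterator[Reading]:
    """Yield every reading whose value exceeds `limit`, in arrival order."""
    for ts, value in readings:
        if value > limit:
            # A real system would notify an operator or fire a trigger here.
            yield ts, value


# Illustrative data: the third reading crosses the assumed limit.
stream = [(0.0, 20.5), (1.0, 21.0), (2.0, 85.2), (3.0, 22.1)]
alarms = list(threshold_alarms(stream, limit=80.0))
print(alarms)
```

Because the function is a generator, it can consume an unbounded stream one reading at a time instead of waiting for a complete batch.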
2.3 Range query optimization
Queries on time series data usually ask for the data within a specific time range. Time series databases use time-based indexes to retrieve such a range quickly, and they also optimize range queries by rolling up or aggregating data by time interval so that only the data you need is extracted.
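The two ideas in this section, a time-ordered lookup and a rollup by time interval, can be sketched in a few lines of Python. This is only an in-memory illustration under simple assumptions (rows are already sorted by an integer timestamp in seconds, and the helper names `range_query` and `rollup_avg` are invented for the example); it does not describe how any specific engine implements its indexes.

```python
from bisect import bisect_left
from collections import defaultdict


def range_query(rows, start, end):
    """Return rows with start <= timestamp < end.

    `rows` must be sorted by timestamp. The timestamp list is rebuilt on
    every call for brevity; a real engine would keep its index persistent.
    """
    timestamps = [ts for ts, _ in rows]
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return rows[lo:hi]


def rollup_avg(rows, bucket_seconds):
    """Average the values in each fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Illustrative data: (timestamp in seconds, value)
rows = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (80, 60.0), (130, 5.0)]
selected = range_query(rows, 0, 120)
per_minute = rollup_avg(selected, bucket_seconds=60)
print(per_minute)
```

The binary search touches only the boundaries of the requested range, and the rollup returns one value per interval instead of every raw row, which is the reduction the section describes.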
2.4 Data Compression
Because time series data typically comes in large volumes, managing storage space for it is essential. Time series databases have excellent compression capabilities, which can reduce storage to roughly a third of what a relational database needs.
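The text does not say which compression method a given engine uses. As one commonly used technique for time series, delta encoding stores the difference between consecutive values rather than the values themselves, which turns large, slowly changing numbers such as timestamps into small ones that pack into fewer bits. The sketch below shows only that idea; the function names are assumptions for this example.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by taking running sums of the deltas."""
    return list(accumulate(deltas))


# Illustrative timestamps sampled about once per second.
timestamps = [1700000000, 1700000001, 1700000002, 1700000004]
encoded = delta_encode(timestamps)
print(encoded)
print(delta_decode(encoded) == timestamps)
```

After encoding, only the first entry is a large number; the remaining entries are small integers that a later bit-packing or general-purpose compression step can store compactly.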
2.5 Data Retention and Management
Because time series data comes in large volumes, the ability to manage stale data is essential. A time series database lets you set retention periods for your data and automatically delete data that is no longer needed or archive it as a backup. Machbase also has a separate feature called ‘mount’ that allows you to quickly view backed-up data.
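A retention period boils down to a cutoff time: rows older than the cutoff are deleted or moved to an archive. The following Python sketch shows that logic on an in-memory list. The function name `apply_retention` and the row layout are assumptions for illustration, and the sketch is not the syntax or behavior of Machbase or any other product.

```python
def apply_retention(rows, now, retention_seconds):
    """Split rows into (kept, expired) around the cutoff `now - retention_seconds`.

    Each row is a (timestamp_seconds, payload) tuple. The caller decides
    whether expired rows are deleted or written to an archive.
    """
    cutoff = now - retention_seconds
    kept = [row for row in rows if row[0] >= cutoff]
    expired = [row for row in rows if row[0] < cutoff]
    return kept, expired


# Illustrative data: with now=350 and a 200-second retention, the cutoff is 150.
rows = [(100, "a"), (200, "b"), (300, "c")]
kept, expired = apply_retention(rows, now=350, retention_seconds=200)
print(kept)
print(expired)
```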
3. Limitations of existing database engines
Traditional database engines, primarily relational database systems (RDBMS), have a number of limitations when it comes to handling time series data. These limitations affect data processing performance, scalability, and real-time analytics capabilities.
3.1 Difficulties in processing time series data
Inadequacy of data model
Time series data is characterized by the continuous addition of values that change over time, which does not fit well with the basic structure of relational databases, where data is stored in normalized tables. Continuously adding time-based data frequently requires index reorganization and table expansion, which degrades performance.
The burden of high-frequency data entry
Time series data is entered at high frequency. Existing relational databases generate a lot of overhead, such as transaction logs and index updates, for each input, which causes performance degradation.
3.2 Performance and scalability issues
Limitations of Indexing
Relational databases typically use B+Tree-based indexes. B+Tree-based indexes are not optimized for time series data retrieval over a specific time range, which results in poor index retrieval performance.
The difficulty of scaling out
Relational databases are better suited to vertical scaling (scaling up) than to horizontal scaling (scaling out). When the amount of time series data grows explosively, hardware upgrades on a single database server can only improve performance so far, and because relational databases are difficult to scale horizontally, they struggle to keep up with that growth.
3.3 Difficulties in real-time analysis
Delayed data processing
Relational databases are mainly suited to batch processing rather than streaming data processing. As a result, configuring them to continuously update and query data for real-time analysis places a high load on the system and delays response times.
Inefficiencies in setting real-time alarms and triggers
Although the ability to detect events in real time and generate alarms can be implemented through triggers and stored procedures in existing relational databases, there are limitations in processing high-frequency data input in real time.
4. Economic implications of time series database engines
The economic benefits of a time series database as a tool for efficiently managing and analyzing time series data can be summarized as cost savings, increased efficiency, and new business opportunities.
4.1 Cost savings
Reduce storage costs
A time series database engine can save storage space through automatic data compression. This makes a big difference when storing high-frequency data such as vibration data. For reference, when a CSV file is loaded, Machbase can store it in about 30% of the space an RDBMS requires.
Reduce hardware costs
Processing large amounts of time series data with traditional relational databases requires a lot of hardware resources, but using time series databases can simplify the architecture through efficient data processing and storage structures. For example, in a project with ETRI, Machbase redesigned a system that used three servers based on Hadoop to run on one server.
Reduce operating costs
Time series database engines typically provide automated data management, scaling, backup, and recovery capabilities to increase operational efficiency. Machbase Neo takes this a step further with a built-in scheduler to automate repetitive tasks at the engine level.
4.2 Improvement through increased efficiency
Real-time data processing
Time series database engines provide real-time data collection, processing, and analysis functions that support rapid business decision-making, and they can play an important role in delivering services tailored to customer needs and improving service levels. For example, real-time transaction data in financial markets can be analyzed to manage risk and make investment decisions. Machbase has been used as the database for price analysis in KOSCOM’s investment system for professional investors.
Improving the accuracy of data analysis
High-frequency data input and querying, combined with real-time analysis, help derive data-based insights, optimize business strategies, and solve individual problems. For example, in manufacturing, sensor data is analyzed in real time to monitor the status of equipment and is linked to the quality issues of individual products. This shortens preparation time, lowers the defect rate, and reduces the number of products that must be discarded when defects occur, which minimizes quality costs.
4.3 Creating new business opportunities
Business Innovation
Time series data analysis can be used to develop new business models and enhance competitiveness in the market. For example, a vehicle’s location can be recorded as latitude and longitude data through GPS, and that data can be used to analyze the speed and distance traveled by the vehicle, making possible a new product that charges insurance premiums based on driving distance. Machbase has been applied at Carrot Insurance to create exactly such a product, in which an IoT device is installed in the vehicle’s cigarette lighter socket.
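To make the GPS example concrete, the sketch below estimates trip distance and average speed from a time-ordered list of latitude and longitude points using the haversine formula. It is a simplified illustration under stated assumptions: a spherical Earth with a mean radius of 6371 km, a hypothetical `(timestamp_seconds, latitude, longitude)` track layout, and no filtering of GPS noise. It is not the method used in the insurance product mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_distance_km(track):
    """Sum the distances between consecutive (timestamp_s, lat, lon) points."""
    return sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(track, track[1:])
    )


def average_speed_kmh(track):
    """Total distance divided by elapsed time; 0.0 if it cannot be computed."""
    if len(track) < 2:
        return 0.0
    elapsed_hours = (track[-1][0] - track[0][0]) / 3600
    return trip_distance_km(track) / elapsed_hours if elapsed_hours > 0 else 0.0


# Illustrative track: one degree of latitude due north, covered in one hour.
track = [(0, 37.0, 127.0), (3600, 38.0, 127.0)]
print(round(trip_distance_km(track), 1))
print(round(average_speed_kmh(track), 1))
```

A distance-based premium would then be a function of the accumulated trip distances over a billing period; the pricing rule itself is outside the scope of this sketch.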
Predictive analytics and decision support
Time series database engines make it possible to analyze patterns in large amounts of data, forecast what is likely to happen, and support business decisions based on those forecasts. For example, in the energy management field, time series data can be analyzed to predict energy demand and establish an efficient energy supply plan.
Implementing smart technology
Recently, new types of IoT (Internet of Things) devices have been appearing continuously. For IoT devices that use many sensors to function properly, a time series database has become an essential element of the system configuration. Various applications can be built using these new devices and the sensor data collected from them, and such smart technology can create new businesses.
5. Major application cases of time series database engines
Time series database engines play an important role in various industries. More application cases are listed on the Machbase homepage; representative ones are as follows.
5.1 Financial Industry
Real-time transaction data analysis
In financial institutions, time series databases are used not to manage the transaction records themselves but to analyze them. Analyzing transaction data in real time makes it possible to understand market volatility as it happens and to make quick investment decisions. For example, in the stock market, a time series database is used to monitor stock price fluctuations in real time and to optimize trading timing through algorithmic trading.
Abnormal Transaction Detection
Time series databases are also used in anomaly detection systems. They are used to detect abnormal activity and prevent fraudulent transactions by analyzing transaction patterns in real time.
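One simple way to flag an unusual transaction is to compare each new value with the mean and standard deviation of a recent window (a rolling z-score). The sketch below illustrates that idea only; the window size, the threshold of 3.0, and the function name `zscore_anomalies` are assumptions for this example, and production fraud detection systems generally rely on far richer models.

```python
from collections import deque
from statistics import mean, pstdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate strongly from the preceding window."""
    history = deque(maxlen=window)  # holds the most recent `window` values
    flagged = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip the check when the window has no variation at all.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(index)
        history.append(value)
    return flagged


# Illustrative transaction amounts with one obvious outlier at index 6.
amounts = [100, 102, 98, 101, 99, 100, 950, 101]
suspicious = zscore_anomalies(amounts)
print(suspicious)
```

Note that the outlier itself enters the window after it is flagged, which temporarily inflates the standard deviation; a more careful version might exclude flagged values from the history.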
5.2 Manufacturing and IoT
Monitoring production processes and implementing smart factories
In the manufacturing industry, data collected from various devices and systems on the production line can be managed and analyzed in real time to reduce the occurrence of defective products and maximize the operational efficiency of the factory.
Equipment Predictive Maintenance
By analyzing equipment data collected through IoT devices, the status of the equipment can be monitored in real time, failures can be predicted, and preventive maintenance can be performed to maximize equipment uptime and reduce maintenance costs.
5.3 Energy Management
Smart Grid Operation
In the energy management field, time series databases can be used to analyze power usage patterns in real time in smart grid systems and optimize power supply and demand to improve energy efficiency.
Renewable Energy Management
Unlike conventional power plants, renewable energy sources such as solar panels and wind turbines require operation management based on data analysis. For example, weather data can be analyzed to predict the amount of sunlight reaching solar panels or the wind speed at wind turbines, so that power generation can be estimated in advance. By analyzing generation and demand together in a time series database, operators can plan how to use battery storage systems or existing fossil fuel power plants when generation is insufficient.
6. Conclusion
Time series databases still occupy a small portion of the overall DBMS market, but their importance and utilization are rapidly increasing. In particular, with the recent development of AI, the demand for machine learning is increasing, and the need to process large amounts of data is becoming more concrete, because data that may be used for machine learning cannot simply be discarded.
That said, the ability to process large amounts of data has no economic significance in itself. No matter how good a technology is, its value depends on how it is used, and the technology alone cannot generate financial effects.
However, recent use cases are showing that time series databases are not only a useful technology for processing large amounts of data but can also have significant economic value.
As mentioned at the beginning, time series databases are still a technology that only a few people in the IT industry properly understand. Yet their use is growing without our even realizing it.
I end this article with the hope that this new technology will be quickly put to use by many people and create high economic value for companies that apply and utilize it.