Unified Type Processing and Writing Model for RDBMS - UTPW

Background

In the field of data integration, reading and writing data are the main problems that need to be solved. It is particularly important to design a unified type processing and writing paradigm for heterogeneous data sources. This article mainly proposes a unified type processing and writing model UTPW (Unified Type Process and Write), which makes it more flexible and efficient to write data into databases using JDBC standard drivers.

UTPW Achieves Goals

  • Transparent type handling: When developers write data based on the JDBC API, they don’t need to worry about data types as the model handles it automatically. After applying UTPW, developers only need to be familiar with basic PreparedStatement usage for transparent type handling.
  • High compatibility with various database JDBC drivers: There are some differences in the implementation of JDBC drivers from different vendors and their adherence to the JDBC standard is also not the same. For example, Oracle and SQL Server are incompatible with java.sql.Types.
  • Accurate, complete, and intelligent type handling: The unified model can handle complete types corresponding to databases such as PostgreSQL’s geographic information types. In case of illegal format writing, it can also locate data anomalies in advance and provide accurate error prompts.

Overall model design

Supplementary explanation about schema service

UTPW focuses only on unified type processing and writing. For obtaining column metadata, it is generally obtained through the metadata service. In theory, the unified column types in UTPW can also be included in the implementation of the schema service, but we will not expand on this here and only focus on UTPW itself. The schema service in the architecture can be understood as a pre-implemented metadata service used for obtaining necessary column metadata.

UTPW architecture

image.png

Strict type name&General type name

  • strict type name: The data type information obtained from the database schema metadata table is strictly matched.
  • general type name: The generalized type name defined by the UTPW model requires implementation to be consistent with the strict type name as much as possible, unless some dynamic types cannot be consistent. This will be mentioned when discussing the core three elements of Unified Field Type below.

Unified Field Type Core Three Elements Definition

The most important thing on the architecture diagram is the definition of the three basic elements in the Unified Field:

  • **General Type Code: **The JDBC standard has defined some type codes in java.sql.Types, but different databases have their own implementations. The reason why it is called a general type code here means that this type code is generic and can include the implementation of database vendors themselves. Due to the constraints of the JDBC standard PreparedStatement input parameters, this code must be an integer, so we can define it uniformly here. When executing PreparedStatement.setObject, we can use this code to let the driver recognize this type. Even if it is a type implemented by a database vendor itself, the vendor’s driver will definitely be able to recognize it, such as OracleTypes and PostgreSQL’s type oid.
  • General Type Name: This is the bridge between the associated column metadata and the Unified Field Type. By default, the model requires that the general type name be strictly consistent with the data type in the database metadata table. However, some database data types may have dynamic precision and scale associated with columns. In this case, Unified Field Type only needs to provide a findByStrictDbTypeName method, which is also defined as a top-level interface in UTPW model. The following figure shows an example of Oracle’s dynamic type.

image.png

  • Type Handler: This is obviously specific type handling. The specific TypeHandler implementation is handled by the UTPW model, which requires understanding of the specific database driver implementation in order to implement it effectively. For example, the Oracle driver supports setObject for directly writing some OracleTypes, while PostgreSQL supports PGObject for set through PreparedStatement. We will mention this again when discussing the specific JDBC implementation of drivers later on to deepen awareness.

UTPW Model Specification

  • The Unified Field Type must contain the three core elements: Only by including the three core elements can UTPW’s core capabilities be completed.
  • The general type name must be able to map one-to-one with the database metadata data type: This ensures that applications can obtain unified field types based on column metadata.
  • Priority should be given to driver-level type definitions: UTPW does not require driver-implemented type definitions to follow java.sql.Types. If a complete type definition is available at the driver implementation level, then Unified Field Type should prioritize using it. Confirm that the driver’s type definition conforms to JDBC type standards before considering using java.sql.Types to define Unified Field Type.
  • Data source implementation separation: Due to dependencies on drivers, in order to avoid too many dependency packages when used as third-party packages, each data source’s UTPW model implementation needs to be separated and can be referenced separately.
  • Drivers need to support JDBC API for using setObject for uniform setting: When using UTPW types, use setObject for uniform writing. Supporting JDBC API means setObject can use SQLType or int-type code.

image.png

  • Prefer to use setObject of SQLType: Oracle, MySQL and other databases have their own built-in Type types with better support for automatic conversion by the driver. If you use int type for setObject, some types may not be handled well. For example, if MysqlType is not used, YEAR type in MySQL will use DATE’s type code which is difficult to handle.

Core design concept

  • Type handling driver delegation: The type handling that can be used at the driver level will be utilized as much as possible. The key to this concept is that the implementation of the type handler will fully utilize the database driver’s processing, such as Oracle’s OracleTypes and PostgreSQL’s PGObject. We will provide some examples when introducing driver implementation later in this document. This delegation method has the following advantages:
    • Driver version independence: As long as the top-level object API of the driver and JDBC API do not change, changes in specific type handling details are transparent to UTPW.
    • More accurate: You can also do some type handling yourself, but generally speaking, official driver’s implementations are more accurate.
  • Unified JDBC standard writing: The UTPW model requires using PreparedStatement.setObject(Object val,int typeCode) JDBC API for unified writing. The main benefits of doing so are:
    • Simple: Users only need to throw any Object into the UTPW model for automatic processing.
    • Good compatibility: It does not require drivers to have very complete and deep compatibility with JDBC APIs; only implementing PreparedStatement.setObject method is enough for applying UTPW.

UTPW model usage restrictions

The driver needs to support the writing method of PreparedStatement in JDBC standard

The UTPW model is only applicable to drivers that implement the JDBC standard (support writing based on PreparedStatement.setObject method, specifying object and type code). Writing methods for new types of databases such as ElasticSearch, Kafka, and some new OLAP databases do not have a unified standard and do not comply with the JDBC API standard. Therefore, this model is not suitable. UTPW does not require database drivers to be highly compatible with the JDBC standard, but only needs to support writing using PreparedStatement.setObject(Object val,int typeCode) method.

TIPS: What kind of database driver/Client complies with the JDBC standard?
This is explained in the Java Doc. Simply put, if a driver or client implements the JDBC API, a Java application can access the database by implementing the JDBC API driver, which means that this database driver complies with the JDBC standard. However, there may be some differences in how well each vendor adheres to these standards. The interface definition of the JDBC API is mainly stored in java.sql and javax.sql, with the most important classes being java.sql.Types and java.sql.PreparedStatement.

The driver implementation needs to be backward compatible

The implementation of the UTPW model relies on driver support, including some type objects defined therein. If the driver is not backward compatible, it may cause compilation errors for some types. Generally, UTPW models are implemented with backward-compatible drivers and it is recommended to use the latest version.

Mainstream RDB driver implementation

This section will introduce the mainstream RBD driver implementation and combine it with the driver implementation to understand the design of UTPW.

MySQL

Source code interpretation

  • Source code version:8.0.22
  • Core class:
    • ClientPreparedStatement.java
    • MysqlType

Find the implementation class ClientPreparedStatement of JDBC API, and drill down to see its setObject method implementation in AbstractQueryBindings. The screenshot of part of the class is shown in the following figure. By looking at the source code, we can obtain key information as follows:

  • MySQL driver recognizes types as java.sql.Types and custom type MysqlTypes: Compared with other databases, mysql type system is not that complicated. There are no built-in complex type objects at the driver level. In setObject implementation, data conversion is completed by two basic steps:
    • Recognize value’s type
    • Recognize mysql type
  • Good compatibility for String values: When judging that a value is of String type, step 2 will recognize mysql type and then do targeted conversion mainly for time which will be automatically processed.
  • Driver’s custom type code completely maps jdbc type code: com.mysql.cj.MysqlType and java.sql.Types are mapped one-to-one. Unified field types can directly use MysqlType. When using setObject, using either java.sql.Types’ code or MysqlType’s code has the same effect.
  • No built-in complex object internal types in MySQL: The MySQL type system is relatively simple, and there are no built-in types defined specifically for complex types at the driver level.
  • setObject accepts java.sql.Types’ codes and implements JDBC API: The received type code will be converted into MysqlType before continuing executing.

image.png

Unified Type Implementation

Due to MySQL’s good support for writing String type, we can use a single StringTypeHandler to handle most types uniformly.
image.png

Oracle

Source code interpretation

  • Source code version:ojdbc8-19.8.0.0.jar
  • Core class:
    • oracle.jdbc.driver.OraclePreparedStatement
    • oracle.jdbc.OracleTypes

Similarly, let’s first look at the entry point OraclePreparedStatement of setObject. Overall, the type processing process is quite different from MySQL, and the overall steps are:

  1. First determine the type code.
  2. Determine the type of val and perform conversion.

Key points to understand in the source code:

  • Priority should be given to using Oracle built-in types for complex types, such as INTERVALDS: Although Oracle provides some conversion methods for complex types, they are basically converted based on several overloaded constructors of built-in types. This implicit conversion is not very controllable in terms of compatibility. Therefore, when implementing type handlers, priority should be given to using the driver’s built-in objects.
  • Oracle supports built-in complex type objects: The oracle driver layer has some built-in complex type objects and provides some basic convert capabilities. With the ability of the driver, we can use these complex objects to implement type handlers.
  • Strictly use built-in complex objects for setObject type processing: Complex objects must be written through the driver and must use built-in complex objects.
  • setObject strongly depends on customizing type code implementation by the driver: The definition of type code at the driver level is compatible with java.sql.Types. For Oracle, java.sql.Types is a subset of oracle.jdbc.OracleTypes. When calling setObject directly recognizes the custom-defined type code by the driver. Therefore, when defining Unified Field Type, it is necessary to reference the types defined in the driver to avoid certain types being unable to write normally if java.sql.Types is used directly without mapping at the driver level.
  • The top-level interface implements JDBC API that accepts Oracle Driver-defined Type Code: This requires us to define unified field types using oracle drivers’ codes; otherwise, some data may not be able to write.

image.png

Unified Type Implementation

  • Define complex object handlers using built-in objects: Oracle has many complex types, and in the implementation of Unified Type, its built-in objects are mainly used to process input object values, which can be directly used when setObject is called.
  • Internal type must be used to define Unified Field Type: The driver uses the custom type code defined by the driver for setObject. Therefore, when defining unified field types, you also need to use the type code in OracleType.

image.png

PostgreSQL

Source code interpretation

  • Source code version:postgresql-42.2.20.jar
  • Core class:
    • org.postgresql.jdbc.PgPreparedStatement
    • org.postgresql.core.Oid
    • org.postgresql.util.PGObject

Similarly, let’s first look at the entry point of setObject in PgPreparedStatement. Overall, the type processing process is:

  1. First determine the type code.
  2. Determine the type of val and perform conversion.

Core information to understand when reading source code:

  • The built-in type codes of PG are defined in Oid: Oid is not an implementation class of java sql Types, but independently defined by PG.
  • setObject can rely entirely on java.sql.Types: Although the oid type code at the driver level and java.sql.Types are incompatible, only Oid is used to identify the column types of pg and only works with PGObject internally implemented at the driver level. The setObject method can fully recognize java.sql.Types. For some complex types associated with internal type codes in Oid, they can be mapped through Type.OTHER. Therefore, when defining UTPW field types, just use java.sql.Types.
  • setObject receives type codes from java.sql.Types and implements JDBC API: The driver implementation will automatically convert associated built-in Oid type codes based on PGObject.
  • Supports writing built-in objects as well as built-in object setObject: Built-in objects of PG are all implemented by PGObject. setObject supports writing for PGObject and is automatically processed by the driver. When processing and identifying PGObjects internally within the driver, it uses Oid type code. We can implement a type handler for built-in objects like Oracle does to handle complex types.

image.png

image.png

Unified Type Implementation

After understanding the type code of PG and the principle of setObject, it is easy to implement the UTPW model of PG.

  • Common types can be handled with StringTypeHandler and BinaryTypeHandler. Complex objects can be implemented with corresponding type handlers like Oracle.
  • Since setObject receives java.sql.Types, type definition can be defined using java.sql.Types. Oid is only automatically used by the driver when recognizing PgObject internally.

image.png

SqlServer

Source code interpretation

  • Source code version:mssql-jdbc-9.4.0.jre8.jar
  • Core class:
    • com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement
    • com.microsoft.sqlserver.jdbc.JavaType: SqlServer will automatically recognize the input object type belongs to what Java type.
    • com.microsoft.sqlserver.jdbc.JDBCType: This is a jdbc type used for writing to the database in real time.

Similarly, let’s first look at the entry point of setObject, PgPreparedStatement. Compared to other drivers, SqlServer’s object handling defines more classes and is more complex. However, overall the type processing flow can still be seen clearly:

  1. Receive java type and jdbc type
  2. Make some preliminary judgments, verifications and conversions for various types
  3. Finally write using the implementation class of DTV object and execute executeOp method

image.png

image.png
At the source code level, it is mainly about understanding the logic of setObject and the processing of built-in types. The summary is as follows:

  • Custom JDBCType and JavaType: Sqlserver driver defines two types as important parameters in setObject. JavaType defines the java type to which the input Object belongs, while JDBCType defines the type at the database level.
  • The setObject method depends on com.microsoft.sqlserver.jdbc.JDBCType’s code: The type code of com.microsoft.sqlserver.jdbc.JDBCType at the driver level is compatible with java.sql.Types. When implementing setObject, the driver also supports java.sql.Types. However, if only java.sql.Types is used, some types cannot be processed in setObject. Therefore, when defining Unified Field Type, you need to directly use int code from com.microsoft.sqlserver.jdbc.JDBCType.
  • The implementation of setObject method receives type code in java.sql.Types through JDBC API: The implementation logic is similar to PG; internally it recognizes and converts input object types.
  • Support for built-in types: mainly related to geographic types; refer to SQLServerSpatialDatatype’s implementation class. SqlServer does not support Array-type data structures like PG does.

image.png

Unified Type Implementation

After understanding the type definition at the level of SqlServer driver, it becomes clear that when defining UnifiedFieldType, you can use the type code defined in JDBCType already defined at the SqlServer level. This type code includes java.sql.Type.

Summary of type codes that setObject depends on

When we implement the unified field type, we need to determine how to customize the unified field type based on the usage of type code through setObject at the driver level. For different databases defining unified field types that rely on classes, it is recommended to use the source class associated with setObject’s type code. The summary is as follows.

Database setObject关联的type code来源类 Whether setObject implements the JDBC API
MySQL com.mysql.cj.MysqlType Yes
Oracle oracle.jdbc.OracleType Yes
PostgreSQL java.sql.Types Yes
SqlServer com.microsoft.sqlserver.jdbc.JDBCType Yes

Summary

This article introduces the core design concept and implementation approach of the UTPW model. However, it is still inevitable to have a certain amount of workload for specific implementation:

  • Check the implementation of the driver to confirm which built-in objects it contains. This is used to define the type handler for processing complex objects.
  • Check the implementation of the driver to confirm what class the type code associated with setObject comes from. This is used to define a unified field type.
  • Check the implementation of the driver to confirm whether its setObject associated with JDBC API has been implemented.
  • Check the implementation of the driver to confirm what implicit conversion specifications are supported by default when inputting String and binary types, so that a common type handler can be used for most simple types.