Like PayPay, other cashless payment services each have their unique QR code, with their own private specification. Now, the Japanese Government is taking steps to unify these various systems by introducing JPQR, a single, standardized QR code, that works across services, for participating merchants. For more information on the business side of JPQR, the data stored in the code and how to make use of it, please check out the official documentations  (in Japanese). In this article we will focus on technical challenges related to supporting JPQR at PayPay.

The problem

When adapting JPQR, we needed to solve a few technical challenges, among these, was parsing the JPQR code. For this there are two issues to be solved:

  • Converting the code to a data object
  • Validating the code

As mentioned before, PayPay mostly uses JVM languages like Java, so we will describe our solution using this language, but similar solution can be done with most languages.

Parsing JPQR as TLV

The format of JPQR is a TLV data structure. TLV stands for Type-Length-Value. It is a long string containing multiple records, where each record contains two meta fields, the first is the ID or type of the record, the second one contains the length of the value. The length of the meta fields is predefined for each TLV record, in case of JPQR it has a length of 2.

The value of a TLV field can be a new TLV on its own, so we can represent hierarchical data as well. For example JPQR uses this technique to represent the JPQR identification number (Type 26 in the below example) shown here with faded colors.

Because JPQR has only two fields which are further TLVs and they are always the same, a generic solution to parse hierarchical TLV is a bit of an overkill, instead, we can simply run TLV parsing again on these fields after parsing JPQR.

An example on how to parse TLV in Java can be found in this open source TLV library. Creating a custom implementation with String operations is also very simple.

CRC verification of JPQR

The last key (ID 63) contains a CRC checksum which can be used to verify the JPQR. CRC stands for Cyclic Redundancy Check and is a very common technique to verify blocks of data. It’s also used in communication and storage. Because of its popularity but otherwise inefficient standardization, there is a large number of CRC algorithms used today. A very good summary of CRCs is compiled in this article titled A Painless Guide to CRC Error Detection Algorithms by Ross N. Williams. 

When calculating verification checksums we can use any arithmetic operation that uses the complete data chunk and results in a much smaller data. An example is to calculate the modulo of the sum of bytes. The problem with an operation like sum is that small change in the verified data will result in a small change in the checksum. We want a small error in the data to have big effect on the checksum. The reason is that we might have an error both in the checksum as well as the data when transmitting the data. If we have two small erroneous changes the chance of them canceling each other out is bigger if both changes are small (e.g. involve a single bit each). Having big change in the checksum we can make our error check more resilient.

For this reason CRC uses XOR operation against a given bit series, called a polinom.  The operation is a simplified version of binary division and so it’s evaluated accordingly. Starting from the left side bit, perform xor each time the most left side bit is 1 and skip for 0. During calculation the intermediate result is called the register.Here is an example from Wikipedia using 1011 polynomial:

11010011101100 000 <--- input right padded by 3 bits, one bit less then the poly
1011               <--- poly used as divisor
 1100              <--- intermediate result, also called the register
 1011             
  1110           
  1011
   1011          
   1011
       1101        <--- note that the divisor moves over to align with the next 1 in the dividend (since quotient for that step was zero)
       1011             (in other words, it doesn't necessarily move one bit per iteration)
        1101     
        1011
         1100    
         1011
          1110   
          1011
           101 0
           101 1
-----------------
               100 <--- remainder (3 bits).  Division algorithm stops here as dividend is equal to zero. This will be the resulting CRC

Implementing this algorithm is quite simple but can be made more efficient. One way to do this is to pre-calculate checksum for sequences and use these instead of going trough bit by bit. For example we can pre calculate checksum for each possible byte which would result in a cache size of 255 but reduce the steps needed by 8 times. Larger cache size will be faster, but the size of cache is growing exponentially. 

Most CRC algorithms have the following characteristics:

  • Size of polynomial: the resulting CRC checksum will be 1 bit smaller. Bigger checksum is more robust but takes more storage space. In all cases makes sense to have a multiple of byte and also, multiple of 2: e.g. most CRC algorithms are 8 bit, 16 bit, 32 bit, etc. 
  • Type of polynomial: there is a large study behind choosing the right polynomial and the polynomial itself makes the number of varieties of CRCs very high
  • Initial value of CRC: it is usually fully 0 or fully 1 bits.
  • Data bit order: this flag specifies if order of the bits of each byte should be reversed. This is useful for verifying data packages for hardware that transmit data in reversed order
  • Register bit order: this flag specifies if bits of register (the calculated CRC) should be reversed before the final XOR stage (see next parameter)
  • Final XOR: some algorithms perform a final XOR against another given bit sequence. If this is a fully 0 bit, the CRC will not be changed

A reference of standardized algorithms can be found here, it lists 22 algorithms to calculate 16 bit CRCs.

JPQR uses the algorithm standardized under CRC-16/CCITT-FALSE or another name CRC-16/IBM-3740.

The full parameter list is as follows (the parentheses contains the name for variable used in most implementations):

Size of polynomial (width)16
Type of polynomial (poly)0x1021
Initial value of register (init)0xffff
Data bit order (refin)false
Register bit order (refout)false
Final XOR (xorout)0x0000

To verify the CRC of JPQR first we need to split the code into 2 parts: the data to be verified and the provided CRC. The CRC is a TLV record with a record ID and size field. Meta fields of CRC are part of the data that needs to be verified: Java CRC is a Java implementation based on the article by Ross N. Williams linked above. The code to verify JPQR would look like this:

public class JpqrService {
    private final CRC crcCalculator;
 
    public JpqrService() {
        crcCalculator = new CRC(CRC.Parameters.CCITT);
    }
 
    public boolean verifyCrc(String jpqr) {
        String dataToVerify = jpqr.substring(0, jpqr.length() - 4);
        String providedCrc = jpqr.substring(jpqr.length() - 4);
 
        long crc = crcCalculator.calculateCRC(dataToVerify.getBytes());
        String actualCrc = StringUtils.leftPad(Long.toHexString(crc).toUpperCase(), 4, "0");
        return providedCrc.equals(actualCrc);
    }
}

The library uses the name CCITT to implement CRC-16/CCITT-FALSE, not the most accurate naming (CCITT is an alias for CCITT-TRUE or KERMIT). The difference is the init parameters, which in the case of JPQR needs to be 0x1111. Saving the CRC object and reusing it gives us access to the caching and optimized performance. In the above example the CRC object is instantiated in the controller for simplicity, but optimally this should be done by a factory class or trough dependency injection to enhance testability.

Conclusion

Parsing JPQR is somewhat challenging but there are many open source tools available that can help simplify the task. Similar tools can be found for other programming languages as well.

We, at PayPay created custom implementation for both parsing the TLV data structure and for checking CRC to reduce dependency on third party libraries and licensing, also to achieve higher performance by eliminating flexible parametrization. QR code parsing is a core feature of PayPay used millions of times every day so every little performance enhancement matters.

References