JSON (Javascript Object Notation)

SQLite provides extensive functionality for working with JSON. APSW complements that.

What is JSON?

Javascript Object Notation (RFC 8259) is a TEXT based format for representing data, encoded in UTF-8. It is deliberately constrained in what can be represented, and as a result is very widely supported across languages and platforms.

Types

Javascript

Python

Example

null

None

null

boolean

bool

true

string - surrounding double quotes with some characters backslash escaped such as double quote and forward slash, while Unicode codepoints can be backslash-u escaped allowing for the JSON text to be all ASCII. Codepoint U+2603 is snowman ☃

str

"hello \" world \/ \u2603"

number

int and float

72
-3.14E7

array - array contents can be any JSON types

list

[null, 3.2, "hello"]

object - keys must be strings, but values can be any JSON types

dict

{
  "description": "Orange",
  "price": 3.07,
  "varieties": ["fall",
                  {"long": "...",
                   "short": 3443}
               ]
}

What is not in JSON

Versioning and extension mechanisms

There is no version number, nor is there a way to extend the format to add additional functionality within the standard.

Comments

There is deliberately no syntax for comments.

Whitespace sensitivity

Although the examples above use indents and spacing for readability, JSON ignores all whitespace. Machine generated JSON usually omits whitespace, while JSON intended for human consumption has whitespace added.

Dates and times

There is no native date or time representation. The most common approach is to use ISO8601 formatted strings which look like 2025-09-30T09:45:00Z for UTC and 2025-09-30T01:15:00-07:00 with a timezone.

Binary data

Binary data can’t be included in a text format. The most common approach is to use base 64 strings which look like bGlnaHQgd29yaw==

Infinity and NaN

There is no explicit representation for infinity or for Not A Number which arise from floating point calculations. For example 1e500 (1 with 500 zeroes) multiplied by itself is too large to be represented in the most common 64 bit floating point, while subtracting infinity from itself produces NaN.

Because infinity and NaN can occur in calculations, many JSON libraries will produce and accept them despite not being in the JSON standard. See below for how SQLite deals with infinity and NaN.

String normalization

Unicode allows different representations of what will appear as the same text. For example an e with an accent can be represented as a single codepoint, or as two with the e and the combining accent separately. JSON makes no requirements on strings, with all implementations usually producing and accepting the strings as is. You can read about the subject here.

Trailing commas

Trailing commas will not be accepted or produced by JSON libraries. For example a list like [1, 2, 3,] or an object like {"one": 1, "two": 2,} are not valid.

Explicit limits

The standard does not say how long strings can be, or how many items can be in an array or object. There is no limit on how many digits can be used in numbers nor a minimum or maximum precision. It is common for implementations to have limits especially 64 bits for numbers. String limits may be 1 or 2 billion characters, and arrays / objects be limited to a similar number of members.

Python has a 64 bit limit on floating point numbers when using float but decimal is unlimited, and has no limit on integers, strings, arrays, or objects other than available memory.

SQLite has an upper limit of 2GB for strings, uses signed 64 bit integers, and standard 64 bit floating point.

If data is large, then other representations are more appropriate.

Object (dict) key order or duplicates

While arrays (list) are ordered, there is no specification for what order object keys are in, or that duplicates are not allowed. This usually doesn’t matter, but there are security attacks where one component may use the first occurrence of a duplicate key, while another component uses the last occurrence. For example the SQLite function extracting values will use the first occurrence of a key, while a dict created from the object will use the last occurrence.

The Python json module, and APSW both let you see objects as lists of keys and values so you can do your own validation or other processing.

JSON5

JSON5 is a superset of JSON intended to be more human readable and writable. SQLite will accept JSON5 encoded text, but will never produce it. While SQLite parses JSON5, you can’t get back JSON5 output from a JSON5 input.

For example JSON5 allows comments, hexadecimal numbers, trailing commas, infinity and NaN, and omitting some quoting,

Using JSON

Python

The standard library json module provides all necessary functionality, including turning Python objects into JSON text, and JSON text into Python objects. You can read and write JSON text, or a file object.

It deviates from the standard:

  • Infinity and NaN are produced and consumed by default, although there is a keyword argument to turn it off

  • When producing JSON objects, keys that are numbers, None, or boolean are turned into their corresponding JSON text representation. When reading an object back, the reverse transformation is not done since there is no way to know if that is intended,

  • Various corner cases in Unicode / UTF8 are accepted such as unpaired surrogates and UTF8 encoded surrogates. This was done because other implementations at the time could produce this kind of encoding. Attempting to encode the resulting strings as UTF-8 again will result in exceptions.

You can see a full list of JSON issues.

SQLite

SQLite has over 30 functions for consuming, extracting, iterating, and producing JSON. You will need to ensure that what you get back is what is intended. You can usually get back the JSON text representation of values, or the SQLite value. For example a SQLite string is the same as a Python string, while the JSON text representation includes double quotes around it and various quoting inside. (Example)

You can store JSON text directly in the database, but there is no way to differentiate it from any other text value. For example the number 2 in JSON is text 2. The json_valid function may help - for example as a CHECK constraint on a column.

Infinity and NaN

SQLite accepts infinity but represents it as the floating point value 9e999 and accepts NaN representing it as null (None). Unfortunately 9e999 is a valid value for decimal.Decimal as well as numpy.float128, so you won’t be able to tell if infinity was the original value

JSONB

SQLite has a binary format for representing JSON - named JSONB. It is specified here. It is significantly faster to use because JSON text requires finding matching quotes around strings, square brackets around arrays, curly braces around objects, and ensuring numeric values are valid. JSONB has already done all that processing so accessing and extracting members is a lot quicker. It also saves some space.

In most cases using SQLite JSON text functions results in SQLite doing an internal conversion to JSONB (which is cached) and then operating on that. JSONB internally stores values as a binary tag and length, then the UTF8 text so producing JSON text again is quick.

You can store JSONB to the database, and again can use the json_valid function as CHECK constraint with the value 8.

Note

JSONB does not have a version number or any header explicitly identifying binary data as JSONB. There is no checksum or similar validation. As an example a single byte whose value is 0 through 12 is valid JSONB.

As byte sequences get longer, the likelihood they are valid JSONB decreases. This table lists what proportion of all byte sequences of each length are also valid JSONB.

Valid JSONB sequences

Sequence length

Proportion valid JSONB

1 byte

3.52%

2 bytes

0.71%

3 bytes

0.35%

4 bytes

0.18%

5 bytes

0.10%

6 bytes

0.05%

7 bytes

0.03%

8 bytes

0.02%

9 bytes

0.01%

10+ bytes

very low

APSW

APSW provides 2 functions for working directly with JSONB, and a validation function. This is for performance reasons so that there is no need for an intermediate step representing objects as JSON text. The validation function is stricter than SQLite’s equivalent to avoid false positives.

Performance testing was done using SQLite’s randomjson code to create a large object with many nested values- your objects will be different.

jsonb_encode()

Converts a Python object directly to JSONB. The alternative is two steps using json to convert to JSON text and then SQLite’s internal JSON text to JSONB.

Test results (CPU time)

0.13 seconds

APSW Python object direct to JSONB

1.20 seconds

json same Python object to JSON text

0.80 seconds

SQLite that JSON text to JSONB

The same parameters as json.dumps() are available, with more providing control over how non-string object keys are converted, type matching, and direct conversion to JSONB types can be done.

jsonb_decode()

Converts JSONB directly back to a Python object. The alternative is two steps using SQLite’s internal JSONB to JSON text and then json to convert the JSON text to a Python object.

Test results (CPU time)

0.48 seconds

APSW JSONB direct to Python object

0.22 seconds

SQLite same JSONB to JSON text

1.35 seconds

json that JSON text to Python object

The same parameters as json.loads() are available, with an additional hook for arrays (lists).

jsonb_detect()

Returns a boolean if some binary data is valid JSONB. If this returns True then SQLite will always produce valid JSON from the JSONB.

SQLite’s json_valid only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.

Notes

Because SQLite has a 2GB limit on text or blobs (binary data), it can’t work with individual JSON text or JSONB data over that size.

JSON as a SQLite value type

You can make SQLite automatically support JSON as though it was a natively supported type. Example code that does these steps.

Store JSONB

SQLite’s binary JSON representation is stored as a binary blob in the database. This is necessary because JSON text can’t easily be distinguished from other text, while a blob is far less ambiguous.

SQLite often operates on JSONB internally when using the JSON functions, and its json function can turn JSONB into JSON text format if needed.

SQLite’s functions provide full access to all the values inside JSON and JSONB for reading, iterating, qnd modifying.

Convert bindings

The convert_binding callback for SQLite unknown types can encode them as JSONB.

Convert JSONB

The convert_jsonb callback is called when a blob would be returned and is also valid JSONB. You can decode it or return the blob. The cursor is provided so you can examine the description to help decide.

JSONB API

apsw.jsonb_decode(data: Buffer, *, object_pairs_hook: Callable[[list[tuple[str, JSONBTypes | Any]]], Any] | None = None, object_hook: Callable[[dict[str, JSONBTypes | Any]], Any] | None = None, array_hook: Callable[[list[JSONBTypes | Any]], Any] | None = None, parse_int: Callable[[str], Any] | None = None, parse_float: Callable[[str], Any] | None = None) Any

Decodes JSONB binary data into a Python object. It is like json.loads() but operating on JSONB binary source instead of a JSON text source.

Parameters:
  • data – Binary data to decode

  • object_pairs_hook – Called after a JSON object has been decoded with a list of tuples, each consisting of a str and corresponding value, and should return a replacement value to use instead.

  • object_hook – Called after a JSON object has been decoded into a Python dict and should return a replacement value to use instead.

  • array_hook – Called after a JSON array has been decoded into a list, and should return a replacement value to use instead.

  • parse_int – Called with a str of the integer, and should return a value to use. The default is int. If the integer is hexadecimal then it will be called with a second parameter of 16.

  • parse_float – Called with a str of the float, and should return a value to use. The default is float.

Only one of object_hook or object_pairs_hook can be provided. object_pairs_hook is useful when you want something other than a dict, care about the order of keys, want to convert them first (eg case, numbers, normalization), want to handle duplicate keys etc.

The array, int, and float hooks let you use alternate implementations. For example if you are using numpy then you could use numpy arrays instead of lists, or numpy’s float128 to get higher precision floating numbers with greater exponent range than the builtin float type.

If you use types.MappingProxyType as object_hook and tuple as array_hook then the overall returned value will be immutable (read only).

Note

The data is always validated during decode. There is no need to separately call jsonb_detect().

apsw.jsonb_detect(data: Buffer) bool

Returns True if data is valid JSONB, otherwise False. If this returns True then SQLite will produce valid JSON from it.

SQLite’s json_valid only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.

Note

jsonb_decode() always validates the data as it decodes, so there is no need to call this function separately. This function is useful for determining if some data is valid, and not some other binary format such as an image.

apsw.jsonb_encode(obj: Any, *, skipkeys: bool = False, sort_keys: bool = False, check_circular: bool = True, exact_types: bool = False, default: Callable[[Any], JSONBTypes | Buffer] | None = None, default_key: Callable[[Any], str] | None = None, allow_nan: bool = True) bytes

Encodes a Python object as JSONB. It is like json.dumps() except it produces JSONB bytes instead of JSON text.

Parameters:
  • obj – Object to encode

  • skipkeys – If True and a non-string dict key is encountered then it is skipped. Otherwise ValueError is raised. Default False. Like json.dumps() keys that are bool, int, float, and None are always converted to string.

  • sort_keys – If True then objects (dict) will be output with the keys sorted. This produces deterministic output. Default False.

  • check_circular – Detects if containers contain themselves (even indirectly) and raises ValueError. If False and there is a circular reference, you eventually get RecursionError (or run out of memory or similar).

  • default

    Called if an object can’t be encoded, and should return an object that can be encoded. If not provided a TypeError is raised.

    It can also return binary data in JSONB format. For example decimal values can be encoded as a full precision JSONB float. apsw.ext.make_jsonb() can be used.

  • default_key – Objects (dict) must have string keys. If a non-string key is encountered, it is skipped if skipkeys is True. Otherwise this is called. If not supplied the default matches the standard library json which converts None, bool, int and float to their string JSON equivalents and uses those. This callback is useful if you want to raise an exception, or use a different way of generating the key string.

  • allow_nan – If True (default) then following SQLite practise, infinity is converted to float 9e999 and NaN is converted to None. If False a ValueError is raised.

  • exact_types – By default subclasses of int, float, list (including tuple), dict (including collections.abc.Mapping), and str are converted the same as the parent class. This is usually what you want. However sometimes you are using a subclass and want them converted by the default function with an example being enum.IntEnum. If this parameter is True then only the exact types are directly converted and subclasses will be passed to default or default_key.

You will get a TooBigError if the resulting JSONB will exceed 2GB because SQLite can’t handle it.