.. Automatically generated by code2rst.py
Edit src/jsonb.c not this file!
.. currentmodule:: apsw
JSON (Javascript Object Notation)
*********************************
SQLite provides extensive functionality for working with JSON. APSW
complements that.
What is JSON?
=============
`Javascript Object Notation `__ (:rfc:`8259`)
is a **TEXT** based format for representing data, encoded in `UTF-8
`__. It is deliberately
constrained in what can be represented, and as a result is very widely
supported across languages and platforms.
Types
-----
.. list-table::
:header-rows: 1
:widths: auto
* - Javascript
- Python
- Example
* - null
- :class:`None`
- .. code-block:: json
null
* - boolean
- :class:`bool`
- .. code-block:: json
true
* - string - surrounding double quotes with some characters backslash escaped
such as double quote and forward slash, while Unicode codepoints can be
backslash-u escaped allowing for the JSON text to be all ASCII. Codepoint
U+2603 is snowman ☃
- :class:`str`
- .. code-block:: json
"hello \" world \/ \u2603"
* - number
- :class:`int` and :class:`float`
- .. code-block:: json
72
-3.14E7
* - array - array contents can be any JSON types
- :class:`list`
- .. code-block:: json
[null, 3.2, "hello"]
* - object - keys must be strings, but values can be any JSON types
- :class:`dict`
- .. code-block:: json
{
"description": "Orange",
"price": 3.07,
"varieties": ["fall",
{"long": "...",
"short": 3443}
]
}
What is not in JSON
-------------------
Versioning and extension mechanisms
There is no version number, nor is there a way to extend the format
to add additional functionality within the standard.
Comments
There is deliberately no syntax for comments.
Whitespace sensitivity
Although the examples above use indents and spacing for readability,
JSON ignores all whitespace. Machine generated JSON usually omits
whitespace, while JSON intended for human consumption has whitespace
added.
Dates and times
There is no native date or time representation. The most common
approach is to use `ISO8601
`__ formatted strings which
look like ``2025-09-30T09:45:00Z`` for UTC and
``2025-09-30T01:15:00-07:00`` with a timezone.
Binary data
Binary data can't be included in a text format. The most common approach
is to use `base 64 `__ strings which
look like ``bGlnaHQgd29yaw==``
Infinity and NaN
There is no explicit representation for infinity or for `Not A Number
`__ which arise from floating point
calculations. For example ``1e500`` (1 with 500 zeroes) multiplied by
itself is too large to be represented in the most common `64 bit
floating point `__, while
subtracting infinity from itself produces ``NaN``.
Because infinity and NaN can occur in calculations, many JSON libraries
will produce and accept them despite not being in the JSON standard.
See below for how SQLite deals with infinity and NaN.
String normalization
Unicode allows different representations of what will appear as the same
text. For example an ``e`` with an accent can be represented as a single
codepoint, or as two with the ``e`` and the combining accent separately.
JSON makes no requirements on strings, with all implementations usually
producing and accepting the strings as is. You can read about the
subject `here `__.
Trailing commas
Trailing commas will not be accepted or produced by JSON libraries. For example
a list like ``[1, 2, 3,]`` or an object like ``{"one": 1, "two": 2,}`` are not
valid.
Explicit limits
The standard does not say how long strings can be, or how many items
can be in an array or object. There is no limit on how many digits
can be used in numbers nor a minimum or maximum precision. It is
common for implementations to have limits especially 64 bits for
numbers. String limits may be 1 or 2 billion characters, and arrays /
objects be limited to a similar number of members.
Python has a 64 bit limit on floating point numbers when using
:class:`float` but :mod:`decimal` is unlimited, and has no limit
on integers, strings, arrays, or objects other than available memory.
SQLite has an upper limit of 2GB for strings, uses signed 64 bit
integers, and standard 64 bit floating point.
If data is large, then other representations are more appropriate.
Object (dict) key order or duplicates
While arrays (list) are ordered, there is no specification for what
order object keys are in, or that duplicates are not allowed. This
usually doesn't matter, but there are security attacks where one
component may use the first occurrence of a duplicate key, while
another component uses the last occurrence. For example the SQLite
function extracting values will use the first occurrence of a key,
while a dict created from the object will use the last occurrence.
The Python :mod:`json` module, and APSW both let you see objects
as lists of keys and values so you can do your own validation or
other processing.
JSON5
=====
`JSON5 `__ is a superset of JSON intended to be
more human readable and writable. SQLite will accept JSON5 encoded
text, but will never produce it. While SQLite parses JSON5, you
can't get back JSON5 output from a JSON5 input.
For example JSON5 allows comments, hexadecimal numbers, trailing
commas, infinity and NaN, and omitting some quoting,
Using JSON
==========
Python
------
The standard library :mod:`json` module provides all necessary
functionality, including turning Python objects into JSON text, and
JSON text into Python objects. You can read and write JSON text, or a
:term:`file object`.
It deviates from the standard:
* ``Infinity`` and ``NaN`` are produced and consumed by default,
although there is a keyword argument to turn it off
* When producing JSON objects, keys that are numbers, None, or
boolean are turned into their corresponding JSON text representation.
When reading an object back, the reverse transformation is not
done since there is no way to know if that is intended,
* Various corner cases in Unicode / UTF8 are accepted such as
unpaired surrogates and UTF8 encoded surrogates. This was done
because other implementations at the time could produce this
kind of encoding. Attempting to encode the resulting strings as UTF-8
again will result in exceptions.
You can see a `full list of JSON issues
`__.
SQLite
------
SQLite has over `30 functions `__ for
consuming, extracting, iterating, and producing JSON. You will need
to ensure that what you get back is what is intended. You can usually
get back the JSON text representation of values, or the SQLite
value. For example a SQLite string is the same as a Python
string, while the JSON text representation includes double quotes
around it and various quoting inside. (:ref:`Example
`)
You can store JSON text directly in the database, but there is no way
to differentiate it from any other text value. For example the number
``2`` in JSON is text ``2``. The `json_valid
`__ function may help - for
example as a `CHECK constraint
`__ on a
column.
Infinity and NaN
----------------
SQLite accepts infinity but represents it as the floating point value
``9e999`` and accepts NaN representing it as ``null`` (None). Unfortunately
``9e999`` is a valid value for :class:`decimal.Decimal` as well as
``numpy.float128``, so you won't be able to tell if infinity was the
original value
.. _jsonb:
JSONB
=====
SQLite has a binary format for representing JSON - named JSONB. It is
`specified here `__. It is
significantly faster to use because JSON text requires finding
matching quotes around strings, square brackets around arrays, curly
braces around objects, and ensuring numeric values are valid. JSONB
has already done all that processing so accessing and extracting
members is a lot quicker. It also saves some space.
In most cases using SQLite JSON text functions results in SQLite doing an
internal conversion to JSONB (which is cached) and then operating on that.
JSONB internally stores values as a binary tag and length, then the UTF8 text
so producing JSON text again is quick.
You can store JSONB to the database, and again can use the `json_valid
`__ function as `CHECK constraint
`__ with the
value ``8``.
.. note::
JSONB does not have a version number or any header explicitly
identifying binary data as JSONB. There is no checksum or similar
validation. As an example a single byte whose value is 0 through 12
is valid JSONB.
As byte sequences get longer, the likelihood they are valid JSONB
decreases. This table lists what proportion of all byte sequences
of each length are also valid JSONB.
.. list-table:: Valid JSONB sequences
:header-rows: 1
:widths: auto
* - Sequence length
- Proportion valid JSONB
* - 1 byte
- 3.52%
* - 2 bytes
- 0.71%
* - 3 bytes
- 0.35%
* - 4 bytes
- 0.18%
* - 5 bytes
- 0.10%
* - 6 bytes
- 0.05%
* - 7 bytes
- 0.03%
* - 8 bytes
- 0.02%
* - 9 bytes
- 0.01%
* - 10+ bytes
- very low
.. _apsw_jsonb:
APSW
----
APSW provides 2 functions for working directly with JSONB, and a
validation function. This is for performance reasons so that there
is no need for an intermediate step representing objects as JSON text.
The validation function is stricter than SQLite's equivalent to avoid
false positives.
Performance testing was done using SQLite's randomjson code to create
a large object with many nested values- your objects will be
different.
:func:`~apsw.jsonb_encode`
Converts a Python object directly to JSONB. The alternative is
two steps using :mod:`json` to convert to JSON text and then
SQLite's internal JSON text to JSONB.
.. list-table:: Test results (CPU time)
:widths: auto
* - 0.13 seconds
- APSW Python object direct to JSONB
* - 1.20 seconds
- :mod:`json` same Python object to JSON text
* - 0.80 seconds
- SQLite that JSON text to JSONB
The same parameters as :func:`json.dumps` are available, with more
providing control over how non-string object keys are converted,
type matching, and direct conversion to JSONB types can be done.
:func:`~apsw.jsonb_decode`
Converts JSONB directly back to a Python object. The alternative
is two steps using SQLite's internal JSONB to JSON text and then
:mod:`json` to convert the JSON text to a Python object.
.. list-table:: Test results (CPU time)
:widths: auto
* - 0.48 seconds
- APSW JSONB direct to Python object
* - 0.22 seconds
- SQLite same JSONB to JSON text
* - 1.35 seconds
- :mod:`json` that JSON text to Python object
The same parameters as :func:`json.loads` are available, with an additional
hook for arrays (lists).
:func:`~apsw.jsonb_detect`
Returns a boolean if some binary data is valid JSONB.
If this returns ``True`` then SQLite will always produce valid JSON from the
JSONB.
SQLite's `json_valid `__ only
checks the various internal type and length fields are consistent
and items seem reasonable. It does not check all corner cases, or
the UTF8 encoding, and so can produce invalid JSON even if
json_valid said it was valid JSONB.
Notes
=====
Because SQLite has a 2GB limit on text or blobs (binary data), it
can't work with individual JSON text or JSONB data over that size.
.. _jsontype:
JSON as a SQLite value type
===========================
You can make SQLite automatically support JSON as though it was a
natively supported type. :ref:`Example code `
that does these steps.
:ref:`Store JSONB `
SQLite's binary JSON representation is stored as a binary blob in the
database. This is necessary because JSON text can't easily be
distinguished from other text, while a blob is far less
ambiguous.
SQLite often operates on JSONB internally when using
the `JSON functions `__, and its
``json`` function can turn JSONB into JSON text format if needed.
SQLite's functions provide full access to all the values inside JSON
and JSONB for reading, iterating, qnd modifying.
Convert bindings
The :attr:`~Cursor.convert_binding` callback for SQLite unknown types
can :func:`encode them ` as JSONB.
Convert JSONB
The :attr:`~Cursor.convert_jsonb` callback is called when a blob would
be returned and is also valid JSONB. You can :func:`decode it `
or return the blob. The cursor is provided so you can examine
the :attr:`~Cursor.description` to help decide.
JSONB API
=========
.. method:: jsonb_decode(data: collections.abc.Buffer, *, object_pairs_hook: typing.Callable[[list[tuple[str, JSONBTypes | typing.Any]]], typing.Any] | None = None, object_hook: typing.Callable[[dict[str, JSONBTypes | typing.Any]], typing.Any] | None = None, array_hook: typing.Callable[[list[JSONBTypes | typing.Any]], typing.Any] | None = None, parse_int: typing.Callable[[str], typing.Any] | None = None, parse_float: typing.Callable[[str], typing.Any] | None = None,) -> typing.Any
Decodes JSONB binary data into a Python object. It is like :func:`json.loads`
but operating on JSONB binary source instead of a JSON text source.
:param data: Binary data to decode
:param object_pairs_hook: Called after a JSON object has been
decoded with a list of tuples, each consisting of a
:class:`str` and corresponding value, and should return a
replacement value to use instead.
:param object_hook: Called after a JSON object has been decoded
into a Python :class:`dict` and should return a replacement
value to use instead.
:param array_hook: Called after a JSON array has been decoded into
a list, and should return a replacement value to use instead.
:param parse_int: Called with a :class:`str` of the integer, and
should return a value to use. The default is :class:`int`.
If the integer is hexadecimal then it will be called with a
second parameter of 16.
:param parse_float: Called with a :class:`str` of the float, and
should return a value to use. The default is :class:`float`.
Only one of ``object_hook`` or ``object_pairs_hook`` can be
provided. ``object_pairs_hook`` is useful when you want something
other than a dict, care about the order of keys, want to convert
them first (eg case, numbers, normalization), want to handle duplicate
keys etc.
The array, int, and float hooks let you use alternate implementations.
For example if you are using `numpy
`__ then you
could use numpy arrays instead of lists, or numpy's float128 to get
higher precision floating numbers with greater exponent range than the
builtin float type.
If you use :class:`types.MappingProxyType` as ``object_hook`` and
:class:`tuple` as ``array_hook`` then the overall returned value
will be immutable (read only).
.. note::
The data is always validated during decode. There is no need to
separately call :func:`~apsw.jsonb_detect`.
.. method:: jsonb_detect(data: collections.abc.Buffer) -> bool
Returns ``True`` if data is valid JSONB, otherwise ``False``. If this returns
``True`` then SQLite will produce valid JSON from it.
SQLite's json_valid only checks the various internal type and length fields are consistent
and items seem reasonable. It does not check all corner cases, or the UTF8
encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.
.. note::
:func:`~apsw.jsonb_decode` always validates the data as it decodes, so there is no
need to call this function separately. This function is useful for determining if
some data is valid, and not some other binary format such as an image.
.. method:: jsonb_encode(obj: typing.Any, *, skipkeys: bool = False, sort_keys: bool = False, check_circular: bool = True, exact_types: bool = False, default: typing.Callable[[typing.Any], JSONBTypes | collections.abc.Buffer] | None = None, default_key: typing.Callable[[typing.Any], str] | None = None, allow_nan:bool = True) -> bytes
Encodes a Python object as JSONB. It is like :func:`json.dumps` except it produces
JSONB bytes instead of JSON text.
:param obj: Object to encode
:param skipkeys: If ``True`` and a non-string dict key is
encountered then it is skipped. Otherwise :exc:`ValueError`
is raised. Default ``False``. Like :func:`json.dumps` keys
that are bool, int, float, and None are always converted to
string.
:param sort_keys: If ``True`` then objects (dict) will be output
with the keys sorted. This produces deterministic output.
Default ``False``.
:param check_circular: Detects if containers contain themselves
(even indirectly) and raises :exc:`ValueError`. If ``False``
and there is a circular reference, you eventually get
:exc:`RecursionError` (or run out of memory or similar).
:param default: Called if an object can't be encoded, and should
return an object that can be encoded. If not provided a
:exc:`TypeError` is raised.
It can also return binary data in JSONB format. For example
:mod:`decimal` values can be encoded as a full precision JSONB
float. :func:`apsw.ext.make_jsonb` can be used.
:param default_key: Objects (dict) must have string keys. If a
non-string key is encountered, it is skipped if ``skipkeys``
is ``True``. Otherwise this is called. If not supplied the
default matches the standard library :mod:`json` which
converts None, bool, int and float to their string JSON
equivalents and uses those. This callback is useful if
you want to raise an exception, or use a different way
of generating the key string.
:param allow_nan: If ``True`` (default) then following SQLite practise,
infinity is converted to float ``9e999`` and NaN is converted
to ``None``. If ``False`` a :exc:`ValueError` is raised.
:param exact_types: By default subclasses of int, float, list (including
tuple), dict (including :class:`collections.abc.Mapping`), and
:class:`str` are converted the same as the parent class. This
is usually what you want. However sometimes you are using a
subclass and want them converted by the ``default`` function
with an example being :class:`enum.IntEnum`. If this parameter
is ``True`` then only the exact types are directly converted
and subclasses will be passed to ``default`` or ``default_key``.
You will get a :exc:`~apsw.TooBigError` if the resulting JSONB
will exceed 2GB because SQLite can't handle it.