.. Automatically generated by code2rst.py Edit src/jsonb.c not this file! .. currentmodule:: apsw JSON (Javascript Object Notation) ********************************* SQLite provides extensive functionality for working with JSON. APSW complements that. What is JSON? ============= `Javascript Object Notation `__ (:rfc:`8259`) is a **TEXT** based format for representing data, encoded in `UTF-8 `__. It is deliberately constrained in what can be represented, and as a result is very widely supported across languages and platforms. Types ----- .. list-table:: :header-rows: 1 :widths: auto * - Javascript - Python - Example * - null - :class:`None` - .. code-block:: json null * - boolean - :class:`bool` - .. code-block:: json true * - string - surrounding double quotes with some characters backslash escaped such as double quote and forward slash, while Unicode codepoints can be backslash-u escaped allowing for the JSON text to be all ASCII. Codepoint U+2603 is snowman ☃ - :class:`str` - .. code-block:: json "hello \" world \/ \u2603" * - number - :class:`int` and :class:`float` - .. code-block:: json 72 -3.14E7 * - array - array contents can be any JSON types - :class:`list` - .. code-block:: json [null, 3.2, "hello"] * - object - keys must be strings, but values can be any JSON types - :class:`dict` - .. code-block:: json { "description": "Orange", "price": 3.07, "varieties": ["fall", {"long": "...", "short": 3443} ] } What is not in JSON ------------------- Versioning and extension mechanisms There is no version number, nor is there a way to extend the format to add additional functionality within the standard. Comments There is deliberately no syntax for comments. Whitespace sensitivity Although the examples above use indents and spacing for readability, JSON ignores all whitespace. Machine generated JSON usually omits whitespace, while JSON intended for human consumption has whitespace added. Dates and times There is no native date or time representation. The most common approach is to use `ISO8601 `__ formatted strings which look like ``2025-09-30T09:45:00Z`` for UTC and ``2025-09-30T01:15:00-07:00`` with a timezone. Binary data Binary data can't be included in a text format. The most common approach is to use `base 64 `__ strings which look like ``bGlnaHQgd29yaw==`` Infinity and NaN There is no explicit representation for infinity or for `Not A Number `__ which arise from floating point calculations. For example ``1e500`` (1 with 500 zeroes) multiplied by itself is too large to be represented in the most common `64 bit floating point `__, while subtracting infinity from itself produces ``NaN``. Because infinity and NaN can occur in calculations, many JSON libraries will produce and accept them despite not being in the JSON standard. See below for how SQLite deals with infinity and NaN. String normalization Unicode allows different representations of what will appear as the same text. For example an ``e`` with an accent can be represented as a single codepoint, or as two with the ``e`` and the combining accent separately. JSON makes no requirements on strings, with all implementations usually producing and accepting the strings as is. You can read about the subject `here `__. Trailing commas Trailing commas will not be accepted or produced by JSON libraries. For example a list like ``[1, 2, 3,]`` or an object like ``{"one": 1, "two": 2,}`` are not valid. Explicit limits The standard does not say how long strings can be, or how many items can be in an array or object. There is no limit on how many digits can be used in numbers nor a minimum or maximum precision. It is common for implementations to have limits especially 64 bits for numbers. String limits may be 1 or 2 billion characters, and arrays / objects be limited to a similar number of members. Python has a 64 bit limit on floating point numbers when using :class:`float` but :mod:`decimal` is unlimited, and has no limit on integers, strings, arrays, or objects other than available memory. SQLite has an upper limit of 2GB for strings, uses signed 64 bit integers, and standard 64 bit floating point. If data is large, then other representations are more appropriate. Object (dict) key order or duplicates While arrays (list) are ordered, there is no specification for what order object keys are in, or that duplicates are not allowed. This usually doesn't matter, but there are security attacks where one component may use the first occurrence of a duplicate key, while another component uses the last occurrence. For example the SQLite function extracting values will use the first occurrence of a key, while a dict created from the object will use the last occurrence. The Python :mod:`json` module, and APSW both let you see objects as lists of keys and values so you can do your own validation or other processing. JSON5 ===== `JSON5 `__ is a superset of JSON intended to be more human readable and writable. SQLite will accept JSON5 encoded text, but will never produce it. While SQLite parses JSON5, you can't get back JSON5 output from a JSON5 input. For example JSON5 allows comments, hexadecimal numbers, trailing commas, infinity and NaN, and omitting some quoting, Using JSON ========== Python ------ The standard library :mod:`json` module provides all necessary functionality, including turning Python objects into JSON text, and JSON text into Python objects. You can read and write JSON text, or a :term:`file object`. It deviates from the standard: * ``Infinity`` and ``NaN`` are produced and consumed by default, although there is a keyword argument to turn it off * When producing JSON objects, keys that are numbers, None, or boolean are turned into their corresponding JSON text representation. When reading an object back, the reverse transformation is not done since there is no way to know if that is intended, * Various corner cases in Unicode / UTF8 are accepted such as unpaired surrogates and UTF8 encoded surrogates. This was done because other implementations at the time could produce this kind of encoding. Attempting to encode the resulting strings as UTF-8 again will result in exceptions. You can see a `full list of JSON issues `__. SQLite ------ SQLite has over `30 functions `__ for consuming, extracting, iterating, and producing JSON. You will need to ensure that what you get back is what is intended. You can usually get back the JSON text representation of values, or the SQLite value. For example a SQLite string is the same as a Python string, while the JSON text representation includes double quotes around it and various quoting inside. (:ref:`Example `) You can store JSON text directly in the database, but there is no way to differentiate it from any other text value. For example the number ``2`` in JSON is text ``2``. The `json_valid `__ function may help - for example as a `CHECK constraint `__ on a column. Infinity and NaN ---------------- SQLite accepts infinity but represents it as the floating point value ``9e999`` and accepts NaN representing it as ``null`` (None). Unfortunately ``9e999`` is a valid value for :class:`decimal.Decimal` as well as ``numpy.float128``, so you won't be able to tell if infinity was the original value .. _jsonb: JSONB ===== SQLite has a binary format for representing JSON - named JSONB. It is `specified here `__. It is significantly faster to use because JSON text requires finding matching quotes around strings, square brackets around arrays, curly braces around objects, and ensuring numeric values are valid. JSONB has already done all that processing so accessing and extracting members is a lot quicker. It also saves some space. In most cases using SQLite JSON text functions results in SQLite doing an internal conversion to JSONB (which is cached) and then operating on that. JSONB internally stores values as a binary tag and length, then the UTF8 text so producing JSON text again is quick. You can store JSONB to the database, and again can use the `json_valid `__ function as `CHECK constraint `__ with the value ``8``. .. note:: JSONB does not have a version number or any header explicitly identifying binary data as JSONB. There is no checksum or similar validation. As an example a single byte whose value is 0 through 12 is valid JSONB. As byte sequences get longer, the likelihood they are valid JSONB decreases. This table lists what proportion of all byte sequences of each length are also valid JSONB. .. list-table:: Valid JSONB sequences :header-rows: 1 :widths: auto * - Sequence length - Proportion valid JSONB * - 1 byte - 3.52% * - 2 bytes - 0.71% * - 3 bytes - 0.35% * - 4 bytes - 0.18% * - 5 bytes - 0.10% * - 6 bytes - 0.05% * - 7 bytes - 0.03% * - 8 bytes - 0.02% * - 9 bytes - 0.01% * - 10+ bytes - very low .. _apsw_jsonb: APSW ---- APSW provides 2 functions for working directly with JSONB, and a validation function. This is for performance reasons so that there is no need for an intermediate step representing objects as JSON text. The validation function is stricter than SQLite's equivalent to avoid false positives. Performance testing was done using SQLite's randomjson code to create a large object with many nested values- your objects will be different. :func:`~apsw.jsonb_encode` Converts a Python object directly to JSONB. The alternative is two steps using :mod:`json` to convert to JSON text and then SQLite's internal JSON text to JSONB. .. list-table:: Test results (CPU time) :widths: auto * - 0.13 seconds - APSW Python object direct to JSONB * - 1.20 seconds - :mod:`json` same Python object to JSON text * - 0.80 seconds - SQLite that JSON text to JSONB The same parameters as :func:`json.dumps` are available, with more providing control over how non-string object keys are converted, type matching, and direct conversion to JSONB types can be done. :func:`~apsw.jsonb_decode` Converts JSONB directly back to a Python object. The alternative is two steps using SQLite's internal JSONB to JSON text and then :mod:`json` to convert the JSON text to a Python object. .. list-table:: Test results (CPU time) :widths: auto * - 0.48 seconds - APSW JSONB direct to Python object * - 0.22 seconds - SQLite same JSONB to JSON text * - 1.35 seconds - :mod:`json` that JSON text to Python object The same parameters as :func:`json.loads` are available, with an additional hook for arrays (lists). :func:`~apsw.jsonb_detect` Returns a boolean if some binary data is valid JSONB. If this returns ``True`` then SQLite will always produce valid JSON from the JSONB. SQLite's `json_valid `__ only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB. Notes ===== Because SQLite has a 2GB limit on text or blobs (binary data), it can't work with individual JSON text or JSONB data over that size. .. _jsontype: JSON as a SQLite value type =========================== You can make SQLite automatically support JSON as though it was a natively supported type. :ref:`Example code ` that does these steps. :ref:`Store JSONB ` SQLite's binary JSON representation is stored as a binary blob in the database. This is necessary because JSON text can't easily be distinguished from other text, while a blob is far less ambiguous. SQLite often operates on JSONB internally when using the `JSON functions `__, and its ``json`` function can turn JSONB into JSON text format if needed. SQLite's functions provide full access to all the values inside JSON and JSONB for reading, iterating, qnd modifying. Convert bindings The :attr:`~Cursor.convert_binding` callback for SQLite unknown types can :func:`encode them ` as JSONB. Convert JSONB The :attr:`~Cursor.convert_jsonb` callback is called when a blob would be returned and is also valid JSONB. You can :func:`decode it ` or return the blob. The cursor is provided so you can examine the :attr:`~Cursor.description` to help decide. JSONB API ========= .. method:: jsonb_decode(data: collections.abc.Buffer, *, object_pairs_hook: typing.Callable[[list[tuple[str, JSONBTypes | typing.Any]]], typing.Any] | None = None, object_hook: typing.Callable[[dict[str, JSONBTypes | typing.Any]], typing.Any] | None = None, array_hook: typing.Callable[[list[JSONBTypes | typing.Any]], typing.Any] | None = None, parse_int: typing.Callable[[str], typing.Any] | None = None, parse_float: typing.Callable[[str], typing.Any] | None = None,) -> typing.Any Decodes JSONB binary data into a Python object. It is like :func:`json.loads` but operating on JSONB binary source instead of a JSON text source. :param data: Binary data to decode :param object_pairs_hook: Called after a JSON object has been decoded with a list of tuples, each consisting of a :class:`str` and corresponding value, and should return a replacement value to use instead. :param object_hook: Called after a JSON object has been decoded into a Python :class:`dict` and should return a replacement value to use instead. :param array_hook: Called after a JSON array has been decoded into a list, and should return a replacement value to use instead. :param parse_int: Called with a :class:`str` of the integer, and should return a value to use. The default is :class:`int`. If the integer is hexadecimal then it will be called with a second parameter of 16. :param parse_float: Called with a :class:`str` of the float, and should return a value to use. The default is :class:`float`. Only one of ``object_hook`` or ``object_pairs_hook`` can be provided. ``object_pairs_hook`` is useful when you want something other than a dict, care about the order of keys, want to convert them first (eg case, numbers, normalization), want to handle duplicate keys etc. The array, int, and float hooks let you use alternate implementations. For example if you are using `numpy `__ then you could use numpy arrays instead of lists, or numpy's float128 to get higher precision floating numbers with greater exponent range than the builtin float type. If you use :class:`types.MappingProxyType` as ``object_hook`` and :class:`tuple` as ``array_hook`` then the overall returned value will be immutable (read only). .. note:: The data is always validated during decode. There is no need to separately call :func:`~apsw.jsonb_detect`. .. method:: jsonb_detect(data: collections.abc.Buffer) -> bool Returns ``True`` if data is valid JSONB, otherwise ``False``. If this returns ``True`` then SQLite will produce valid JSON from it. SQLite's json_valid only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB. .. note:: :func:`~apsw.jsonb_decode` always validates the data as it decodes, so there is no need to call this function separately. This function is useful for determining if some data is valid, and not some other binary format such as an image. .. method:: jsonb_encode(obj: typing.Any, *, skipkeys: bool = False, sort_keys: bool = False, check_circular: bool = True, exact_types: bool = False, default: typing.Callable[[typing.Any], JSONBTypes | collections.abc.Buffer] | None = None, default_key: typing.Callable[[typing.Any], str] | None = None, allow_nan:bool = True) -> bytes Encodes a Python object as JSONB. It is like :func:`json.dumps` except it produces JSONB bytes instead of JSON text. :param obj: Object to encode :param skipkeys: If ``True`` and a non-string dict key is encountered then it is skipped. Otherwise :exc:`ValueError` is raised. Default ``False``. Like :func:`json.dumps` keys that are bool, int, float, and None are always converted to string. :param sort_keys: If ``True`` then objects (dict) will be output with the keys sorted. This produces deterministic output. Default ``False``. :param check_circular: Detects if containers contain themselves (even indirectly) and raises :exc:`ValueError`. If ``False`` and there is a circular reference, you eventually get :exc:`RecursionError` (or run out of memory or similar). :param default: Called if an object can't be encoded, and should return an object that can be encoded. If not provided a :exc:`TypeError` is raised. It can also return binary data in JSONB format. For example :mod:`decimal` values can be encoded as a full precision JSONB float. :func:`apsw.ext.make_jsonb` can be used. :param default_key: Objects (dict) must have string keys. If a non-string key is encountered, it is skipped if ``skipkeys`` is ``True``. Otherwise this is called. If not supplied the default matches the standard library :mod:`json` which converts None, bool, int and float to their string JSON equivalents and uses those. This callback is useful if you want to raise an exception, or use a different way of generating the key string. :param allow_nan: If ``True`` (default) then following SQLite practise, infinity is converted to float ``9e999`` and NaN is converted to ``None``. If ``False`` a :exc:`ValueError` is raised. :param exact_types: By default subclasses of int, float, list (including tuple), dict (including :class:`collections.abc.Mapping`), and :class:`str` are converted the same as the parent class. This is usually what you want. However sometimes you are using a subclass and want them converted by the ``default`` function with an example being :class:`enum.IntEnum`. If this parameter is ``True`` then only the exact types are directly converted and subclasses will be passed to ``default`` or ``default_key``. You will get a :exc:`~apsw.TooBigError` if the resulting JSONB will exceed 2GB because SQLite can't handle it.