JSON (Javascript Object Notation)
SQLite provides extensive functionality for working with JSON. APSW complements that.
What is JSON?
Javascript Object Notation (RFC 8259) is a TEXT based format for representing data, encoded in UTF-8. It is deliberately constrained in what can be represented, and as a result is very widely supported across languages and platforms.
Types
Javascript |
Python |
Example |
|---|---|---|
null |
|
null
|
boolean |
true
|
|
string - surrounding double quotes with some characters backslash escaped such as double quote and forward slash, while Unicode codepoints can be backslash-u escaped allowing for the JSON text to be all ASCII. Codepoint U+2603 is snowman ☃ |
"hello \" world \/ \u2603"
|
|
number |
72
-3.14E7
|
|
array - array contents can be any JSON types |
[null, 3.2, "hello"]
|
|
object - keys must be strings, but values can be any JSON types |
{
"description": "Orange",
"price": 3.07,
"varieties": ["fall",
{"long": "...",
"short": 3443}
]
}
|
What is not in JSON
Versioning and extension mechanisms
There is no version number, nor is there a way to extend the format to add additional functionality within the standard.
Comments
There is deliberately no syntax for comments.
Whitespace sensitivity
Although the examples above use indents and spacing for readability, JSON ignores all whitespace. Machine generated JSON usually omits whitespace, while JSON intended for human consumption has whitespace added.
Dates and times
There is no native date or time representation. The most common approach is to use ISO8601 formatted strings which look like
2025-09-30T09:45:00Zfor UTC and2025-09-30T01:15:00-07:00with a timezone.
Binary data
Binary data can’t be included in a text format. The most common approach is to use base 64 strings which look like
bGlnaHQgd29yaw==
Infinity and NaN
There is no explicit representation for infinity or for Not A Number which arise from floating point calculations. For example
1e500(1 with 500 zeroes) multiplied by itself is too large to be represented in the most common 64 bit floating point, while subtracting infinity from itself producesNaN.Because infinity and NaN can occur in calculations, many JSON libraries will produce and accept them despite not being in the JSON standard. See below for how SQLite deals with infinity and NaN.
String normalization
Unicode allows different representations of what will appear as the same text. For example an
ewith an accent can be represented as a single codepoint, or as two with theeand the combining accent separately. JSON makes no requirements on strings, with all implementations usually producing and accepting the strings as is. You can read about the subject here.
Trailing commas
Trailing commas will not be accepted or produced by JSON libraries. For example a list like
[1, 2, 3,]or an object like{"one": 1, "two": 2,}are not valid.
Explicit limits
The standard does not say how long strings can be, or how many items can be in an array or object. There is no limit on how many digits can be used in numbers nor a minimum or maximum precision. It is common for implementations to have limits especially 64 bits for numbers. String limits may be 1 or 2 billion characters, and arrays / objects be limited to a similar number of members.
Python has a 64 bit limit on floating point numbers when using
floatbutdecimalis unlimited, and has no limit on integers, strings, arrays, or objects other than available memory.SQLite has an upper limit of 2GB for strings, uses signed 64 bit integers, and standard 64 bit floating point.
If data is large, then other representations are more appropriate.
Object (dict) key order or duplicates
While arrays (list) are ordered, there is no specification for what order object keys are in, or that duplicates are not allowed. This usually doesn’t matter, but there are security attacks where one component may use the first occurrence of a duplicate key, while another component uses the last occurrence. For example the SQLite function extracting values will use the first occurrence of a key, while a dict created from the object will use the last occurrence.
The Python
jsonmodule, and APSW both let you see objects as lists of keys and values so you can do your own validation or other processing.
JSON5
JSON5 is a superset of JSON intended to be more human readable and writable. SQLite will accept JSON5 encoded text, but will never produce it. While SQLite parses JSON5, you can’t get back JSON5 output from a JSON5 input.
For example JSON5 allows comments, hexadecimal numbers, trailing commas, infinity and NaN, and omitting some quoting,
Using JSON
Python
The standard library json module provides all necessary
functionality, including turning Python objects into JSON text, and
JSON text into Python objects. You can read and write JSON text, or a
file object.
It deviates from the standard:
InfinityandNaNare produced and consumed by default, although there is a keyword argument to turn it offWhen producing JSON objects, keys that are numbers, None, or boolean are turned into their corresponding JSON text representation. When reading an object back, the reverse transformation is not done since there is no way to know if that is intended,
Various corner cases in Unicode / UTF8 are accepted such as unpaired surrogates and UTF8 encoded surrogates. This was done because other implementations at the time could produce this kind of encoding. Attempting to encode the resulting strings as UTF-8 again will result in exceptions.
You can see a full list of JSON issues.
SQLite
SQLite has over 30 functions for consuming, extracting, iterating, and producing JSON. You will need to ensure that what you get back is what is intended. You can usually get back the JSON text representation of values, or the SQLite value. For example a SQLite string is the same as a Python string, while the JSON text representation includes double quotes around it and various quoting inside. (Example)
You can store JSON text directly in the database, but there is no way
to differentiate it from any other text value. For example the number
2 in JSON is text 2. The json_valid function may help - for
example as a CHECK constraint on a
column.
Infinity and NaN
SQLite accepts infinity but represents it as the floating point value
9e999 and accepts NaN representing it as null (None). Unfortunately
9e999 is a valid value for decimal.Decimal as well as
numpy.float128, so you won’t be able to tell if infinity was the
original value
JSONB
SQLite has a binary format for representing JSON - named JSONB. It is specified here. It is significantly faster to use because JSON text requires finding matching quotes around strings, square brackets around arrays, curly braces around objects, and ensuring numeric values are valid. JSONB has already done all that processing so accessing and extracting members is a lot quicker. It also saves some space.
In most cases using SQLite JSON text functions results in SQLite doing an internal conversion to JSONB (which is cached) and then operating on that. JSONB internally stores values as a binary tag and length, then the UTF8 text so producing JSON text again is quick.
You can store JSONB to the database, and again can use the json_valid function as CHECK constraint with the
value 8.
Note
JSONB does not have a version number or any header explicitly identifying binary data as JSONB. There is no checksum or similar validation. As an example a single byte whose value is 0 through 12 is valid JSONB.
As byte sequences get longer, the likelihood they are valid JSONB decreases. This table lists what proportion of all byte sequences of each length are also valid JSONB.
Sequence length |
Proportion valid JSONB |
|---|---|
1 byte |
3.52% |
2 bytes |
0.71% |
3 bytes |
0.35% |
4 bytes |
0.18% |
5 bytes |
0.10% |
6 bytes |
0.05% |
7 bytes |
0.03% |
8 bytes |
0.02% |
9 bytes |
0.01% |
10+ bytes |
very low |
APSW
APSW provides 2 functions for working directly with JSONB, and a validation function. This is for performance reasons so that there is no need for an intermediate step representing objects as JSON text. The validation function is stricter than SQLite’s equivalent to avoid false positives.
Performance testing was done using SQLite’s randomjson code to create a large object with many nested values- your objects will be different.
Converts a Python object directly to JSONB. The alternative is two steps using
jsonto convert to JSON text and then SQLite’s internal JSON text to JSONB.
Test results (CPU time) 0.13 seconds
APSW Python object direct to JSONB
1.20 seconds
jsonsame Python object to JSON text0.80 seconds
SQLite that JSON text to JSONB
The same parameters as
json.dumps()are available, with more providing control over how non-string object keys are converted, type matching, and direct conversion to JSONB types can be done.
Converts JSONB directly back to a Python object. The alternative is two steps using SQLite’s internal JSONB to JSON text and then
jsonto convert the JSON text to a Python object.
Test results (CPU time) 0.48 seconds
APSW JSONB direct to Python object
0.22 seconds
SQLite same JSONB to JSON text
1.35 seconds
jsonthat JSON text to Python objectThe same parameters as
json.loads()are available, with an additional hook for arrays (lists).
Returns a boolean if some binary data is valid JSONB. If this returns
Truethen SQLite will always produce valid JSON from the JSONB.SQLite’s json_valid only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.
Notes
Because SQLite has a 2GB limit on text or blobs (binary data), it can’t work with individual JSON text or JSONB data over that size.
JSON as a SQLite value type
You can make SQLite automatically support JSON as though it was a natively supported type. Example code that does these steps.
SQLite’s binary JSON representation is stored as a binary blob in the database. This is necessary because JSON text can’t easily be distinguished from other text, while a blob is far less ambiguous.
SQLite often operates on JSONB internally when using the JSON functions, and its
jsonfunction can turn JSONB into JSON text format if needed.SQLite’s functions provide full access to all the values inside JSON and JSONB for reading, iterating, qnd modifying.
Convert bindings
The
convert_bindingcallback for SQLite unknown types canencode themas JSONB.
Convert JSONB
The
convert_jsonbcallback is called when a blob would be returned and is also valid JSONB. You candecode itor return the blob. The cursor is provided so you can examine thedescriptionto help decide.
JSONB API
- apsw.jsonb_decode(data: Buffer, *, object_pairs_hook: Callable[[list[tuple[str, JSONBTypes | Any]]], Any] | None = None, object_hook: Callable[[dict[str, JSONBTypes | Any]], Any] | None = None, array_hook: Callable[[list[JSONBTypes | Any]], Any] | None = None, parse_int: Callable[[str], Any] | None = None, parse_float: Callable[[str], Any] | None = None) Any
Decodes JSONB binary data into a Python object. It is like
json.loads()but operating on JSONB binary source instead of a JSON text source.- Parameters:
data – Binary data to decode
object_pairs_hook – Called after a JSON object has been decoded with a list of tuples, each consisting of a
strand corresponding value, and should return a replacement value to use instead.object_hook – Called after a JSON object has been decoded into a Python
dictand should return a replacement value to use instead.array_hook – Called after a JSON array has been decoded into a list, and should return a replacement value to use instead.
parse_int – Called with a
strof the integer, and should return a value to use. The default isint. If the integer is hexadecimal then it will be called with a second parameter of 16.parse_float – Called with a
strof the float, and should return a value to use. The default isfloat.
Only one of
object_hookorobject_pairs_hookcan be provided.object_pairs_hookis useful when you want something other than a dict, care about the order of keys, want to convert them first (eg case, numbers, normalization), want to handle duplicate keys etc.The array, int, and float hooks let you use alternate implementations. For example if you are using numpy then you could use numpy arrays instead of lists, or numpy’s float128 to get higher precision floating numbers with greater exponent range than the builtin float type.
If you use
types.MappingProxyTypeasobject_hookandtupleasarray_hookthen the overall returned value will be immutable (read only).Note
The data is always validated during decode. There is no need to separately call
jsonb_detect().
- apsw.jsonb_detect(data: Buffer) bool
Returns
Trueif data is valid JSONB, otherwiseFalse. If this returnsTruethen SQLite will produce valid JSON from it.SQLite’s json_valid only checks the various internal type and length fields are consistent and items seem reasonable. It does not check all corner cases, or the UTF8 encoding, and so can produce invalid JSON even if json_valid said it was valid JSONB.
Note
jsonb_decode()always validates the data as it decodes, so there is no need to call this function separately. This function is useful for determining if some data is valid, and not some other binary format such as an image.
- apsw.jsonb_encode(obj: Any, *, skipkeys: bool = False, sort_keys: bool = False, check_circular: bool = True, exact_types: bool = False, default: Callable[[Any], JSONBTypes | Buffer] | None = None, default_key: Callable[[Any], str] | None = None, allow_nan: bool = True) bytes
Encodes a Python object as JSONB. It is like
json.dumps()except it produces JSONB bytes instead of JSON text.- Parameters:
obj – Object to encode
skipkeys – If
Trueand a non-string dict key is encountered then it is skipped. OtherwiseValueErroris raised. DefaultFalse. Likejson.dumps()keys that are bool, int, float, and None are always converted to string.sort_keys – If
Truethen objects (dict) will be output with the keys sorted. This produces deterministic output. DefaultFalse.check_circular – Detects if containers contain themselves (even indirectly) and raises
ValueError. IfFalseand there is a circular reference, you eventually getRecursionError(or run out of memory or similar).default –
Called if an object can’t be encoded, and should return an object that can be encoded. If not provided a
TypeErroris raised.It can also return binary data in JSONB format. For example
decimalvalues can be encoded as a full precision JSONB float.apsw.ext.make_jsonb()can be used.default_key – Objects (dict) must have string keys. If a non-string key is encountered, it is skipped if
skipkeysisTrue. Otherwise this is called. If not supplied the default matches the standard libraryjsonwhich converts None, bool, int and float to their string JSON equivalents and uses those. This callback is useful if you want to raise an exception, or use a different way of generating the key string.allow_nan – If
True(default) then following SQLite practise, infinity is converted to float9e999and NaN is converted toNone. IfFalseaValueErroris raised.exact_types – By default subclasses of int, float, list (including tuple), dict (including
collections.abc.Mapping), andstrare converted the same as the parent class. This is usually what you want. However sometimes you are using a subclass and want them converted by thedefaultfunction with an example beingenum.IntEnum. If this parameter isTruethen only the exact types are directly converted and subclasses will be passed todefaultordefault_key.
You will get a
TooBigErrorif the resulting JSONB will exceed 2GB because SQLite can’t handle it.