Organize docs: front and back end; custom predicates. (#96)
* Organize docs: front and back end; custom predicates. * Whoops forgot to hit save before git commit last time -- delete stuff moved out of values.md * Update book/src/values.md --------- Co-authored-by: Ahmad Afuni <root@ahmadafuni.com>
This commit is contained in:
parent
7373b959f6
commit
c9f7427967
8 changed files with 112 additions and 64 deletions
|
|
@ -16,9 +16,9 @@
|
||||||
- [Operations](./operations.md)
|
- [Operations](./operations.md)
|
||||||
- [Simple example](./simpleexample.md)
|
- [Simple example](./simpleexample.md)
|
||||||
- [Custom statements and custom operations](./custom.md)
|
- [Custom statements and custom operations](./custom.md)
|
||||||
- [Version 1](./custom1.md)
|
- [Defining custom predicates](./custompred.md)
|
||||||
- [Version 1, written again](./custom1b.md)
|
|
||||||
- [Custom statement example](./customexample.md)
|
- [Custom statement example](./customexample.md)
|
||||||
|
- [How to hash a custom statement](./customhash.md)
|
||||||
- [POD types](./podtypes.md)
|
- [POD types](./podtypes.md)
|
||||||
- [SignedPOD](./signedpod.md)
|
- [SignedPOD](./signedpod.md)
|
||||||
- [MainPOD](./mainpod.md)
|
- [MainPOD](./mainpod.md)
|
||||||
|
|
|
||||||
|
|
@ -1 +1,41 @@
|
||||||
# Backend types
|
# Backend types
|
||||||
|
|
||||||
|
On the backend, there is only a single type: `Value`.
|
||||||
|
|
||||||
|
A `Value` is simply a tuple of field elements. With the plonky2 backend, a `Value` is a tuple of 4 field elements. In general, the backend will expose a constant `VALUE_SIZE`, and a `Value` will be a tuple of `VALUE_SIZE` field elements.
|
||||||
|
|
||||||
|
## Integers and booleans
|
||||||
|
|
||||||
|
The backend encoding stores integers in such a way that arithmetic operations (addition, multiplication, comparison) are inexpensive to verify in-circuit.
|
||||||
|
|
||||||
|
In the case of the Plonky2 backend, an integer $x$ is decomposed as
|
||||||
|
$$x = x_0 + x_1 \cdot 2^{32}$$
|
||||||
|
with $0 \leq x_0, x_1 < 2^{32}$ and represented as
|
||||||
|
$$\texttt{map}\ \iota\ [x_0, x_1, 0, 0],$$
|
||||||
|
where $\iota:\mathbb{N}\cup\{0\}\rightarrow\texttt{GoldilocksField}$ is the canonical projection.
|
||||||
|
|
||||||
|
On the backend, a boolean is stored as an integer, either 0 or 1; so logical operations on booleans are also inexpensive.
|
||||||
|
|
||||||
|
## Strings
|
||||||
|
|
||||||
|
The backend encoding stores strings as hashes, using a hash function that might not be zk-friendly. For this reason, string operations (substrings, accessing individual characters) are hard to verify in-circuit. The POD2 system does not provide methods for manipulating strings.
|
||||||
|
|
||||||
|
In other words: As POD2 sees it, two strings are either equal or not equal. There are no other relationships between strings.
|
||||||
|
|
||||||
|
In the case of the Plonky2 backend, a string is converted to a sequence of bytes with the byte `0x01` appended as padding, then the bytes are split into 7-byte chunks starting from the left, these chunks then being interpreted as integers in little-endian form, each of which is naturally an element of `GoldilocksField`, whence the resulting sequence may be hashed via the Poseidon hash function. Symbolically, given a string $s$, its hash is defined by
|
||||||
|
|
||||||
|
$$\texttt{poseidon}(\texttt{map}\ (\iota\circ\jmath_\texttt{le-bytes->int})\ \texttt{chunks}_7(\jmath_\texttt{string->bytes}(s)\ \texttt{++}\ [\texttt{0x01}])),$$
|
||||||
|
|
||||||
|
where `poseidon` is the Poseidon instance used by Plonky2, $\iota$ is as above, $\texttt{chunks}_{n}:[\texttt{u8}]\rightarrow [[\texttt{u8}]]$ is defined such that[^aux]
|
||||||
|
|
||||||
|
$$\texttt{chunks}_n(v) = \textup{if}\ v = [\ ]\ \textup{then}\ [\ ]\ \textup{else}\ [\texttt{take}_n v]\ \texttt{++}\ \texttt{chunks}_n(\texttt{drop}_n v),$$
|
||||||
|
|
||||||
|
the mapping $\jmath_\texttt{le-bytes->int}: [u8] \rightarrow{N}\cup\{0\}$ is given by
|
||||||
|
|
||||||
|
$$[b_0,\dots,b_{N-1}]\mapsto \sum_{i=0}^{N-1} b_i \cdot 2^{8i},$$
|
||||||
|
|
||||||
|
and $\jmath_\texttt{string->bytes}$ is the canonical mapping of a string to its UTF-8 representation.
|
||||||
|
|
||||||
|
## Compound types
|
||||||
|
|
||||||
|
The three front-end compound types (`Dictionary`, `Array`, `Set`) are all represented as Merkle roots on the backend. The details of the representation are explained on a separate [Merkle tree](./merkletree.md) page.
|
||||||
|
|
@ -10,9 +10,33 @@ At the backend level, every definition of a predicate is either a conjunction or
|
||||||
|
|
||||||
On the backend, custom predicates are defined in _groups_. A group can contain one or more custom predicates and their associated sub-predicates. Recursive definition is only possible within a group: the definition of a predicate in a group can only depend on previously existing predicates, itself, and other predicates in the same group.
|
On the backend, custom predicates are defined in _groups_. A group can contain one or more custom predicates and their associated sub-predicates. Recursive definition is only possible within a group: the definition of a predicate in a group can only depend on previously existing predicates, itself, and other predicates in the same group.
|
||||||
|
|
||||||
## Custom predicates and their IDs
|
## Arguments of custom predicates
|
||||||
|
|
||||||
|
The definition of a custom predicate might also be called an _operation_ or _deduction rule_. It includes two (or, potentially, say, five) statement templates as conditions. The arguments to the statement templates are decomposed as (origin, key) pairs: if statements are allowed to have arity at most 4, then the statement templates in a deduction rule will have at most 8 arguments (4 origins and 4 keys). The same holds for the output statement.
|
||||||
|
|
||||||
|
Each argument (origin or key) to an statement template is either a wildcard or a literal. In the backend, the wildcard arguments will be identified as *1, *2, *3, ....
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
See [examples](./customexample.md)
|
||||||
|
|
||||||
|
## Hashing and predicate IDs
|
||||||
|
|
||||||
|
Each custom predicate is defined as part of a _group_ of predicates. The definitions of all statements in the group are laid out consecutively (see [examples](./customexample.md)) and hashed. For more details, see the pages on [hashing custom statements](./customhash.md) and [custom predicates](./custompred.md).
|
||||||
|
|
||||||
|
## How to prove an application of an operation
|
||||||
|
|
||||||
|
The POD proof format is inspired by "two-column proofs" (for an example, see [Wikipedia](https://en.wikipedia.org/wiki/Mathematical_proof)). A POD contains a "tabular proof", in which each row includes a "statement" and an "operation". The "operation" is the "reason" that justifies the statement: it is everything the circuit needs as a witness to verify the statement.
|
||||||
|
|
||||||
|
For a custom statement, the "reason" includes the following witnesses and verifications:
|
||||||
|
- the definition of the statement, serialized (see [examples](./customexample.md))
|
||||||
|
- if the statement is part of a group, the definition of the full group, serialized
|
||||||
|
- verify that the hash of the definition is the statement ID
|
||||||
|
- the definition will have some number of "wildcards" (*1, *2, ...) as arguments to statement templates; a value for each wildcard must be provided as a witness (each will be either an origin ID or key)
|
||||||
|
- the circuit must substitute the claimed values for the wildcards, and the resulting statements (true statements with origins and keys) will appear as witnesses
|
||||||
|
- the circuit must verify that all the input statement templates (with origins and keys) appear in the previous statements (in higher rows of the table)
|
||||||
|
- the circuit also substitutes the claimed values for the wildcards in the output statement, and verifies that it matches the claimed output statement
|
||||||
|
|
||||||
A custom predicate, like a built-in predicate, is identified by a _name_ on the front end and an _identifier_ on the back end. The identifier is a cryptographic hash of the definition of the group.
|
|
||||||
|
|
||||||
|
|
||||||
[^inherit]: What to call this? One POD "inherits" from another?
|
[^inherit]: What to call this? One POD "inherits" from another?
|
||||||
|
|
|
||||||
|
|
@ -1,36 +0,0 @@
|
||||||
# Custom operations (or: how to define a custom predicate), version 1, written again
|
|
||||||
|
|
||||||
[All constant integers in this spec are determined by circuit size and subject to change.]
|
|
||||||
|
|
||||||
On the frontend, a _custom predicate_ is defined as a combination of conjunctions and disjunctions of other custom predicates and possibly the predicate itself.
|
|
||||||
|
|
||||||
On the backend, _custom predicates_ are defined in groups of up to 10. Each custom predicate in the group is either the AND or OR of five predicates which are either pre-existing, or defined in the same group.
|
|
||||||
|
|
||||||
[Note: We could potentially allow the AND or OR of, say, two predicates instead of five. To make this work, we might need to have access to pod ID as a virtual key, see github issue #60]
|
|
||||||
|
|
||||||
## Arguments of custom predicates
|
|
||||||
|
|
||||||
The definition of a custom predicate might also be called an _operation_ or _deduction rule_. It includes two (or, potentially, say, five) statement templates as conditions. The arguments to the statement templates are decomposed as (origin, key) pairs: if statements are allowed to have arity at most 4, then the statement templates in a deduction rule will have at most 8 arguments (4 origins and 4 keys). The same holds for the output statement.
|
|
||||||
|
|
||||||
Each argument (origin or key) to an statement template is either a wildcard or a literal. In the backend, the wildcard arguments will be identified as *1, *2, *3, ....
|
|
||||||
|
|
||||||
## Examples
|
|
||||||
|
|
||||||
See [examples](./customexample.md)
|
|
||||||
|
|
||||||
## Hashing and predicate IDs
|
|
||||||
|
|
||||||
Each custom predicate is defined as part of a _group_ of predicates. The definitions of all statements in the group are laid out consecutively (see [examples](./customexample.md)) and hashed.
|
|
||||||
|
|
||||||
## How to prove an application of an operation
|
|
||||||
|
|
||||||
A POD contains a "tabular proof", in which each row includes a "statement" and a "reason". The "reason" is everything the circuit needs as a witness to verify the statement.
|
|
||||||
|
|
||||||
For a custom statement, the "reason" includes the following witnesses and verifications:
|
|
||||||
- the definition of the statement, serialized (see [examples](./customexample.md))
|
|
||||||
- if the statement is part of a group, the definition of the full group, serialized
|
|
||||||
- verify that the hash of the definition is the statement ID
|
|
||||||
- the definition will have some number of "wildcards" (*1, *2, ...) as arguments to statement templates; a value for each wildcard must be provided as a witness (each will be either an origin ID or key)
|
|
||||||
- the circuit must substitute the claimed values for the wildcards, and the resulting statements (true statements with origins and keys) will appear as witnesses
|
|
||||||
- the circuit must verify that all the input statement templates (with origins and keys) appear in the previous statements (in higher rows of the table)
|
|
||||||
- the circuit also substitutes the claimed values for the wildcards in the output statement, and verifies that it matches the claimed output statement
|
|
||||||
20
book/src/customhash.md
Normal file
20
book/src/customhash.md
Normal file
|
|
@ -0,0 +1,20 @@
|
||||||
|
# How to hash a custom predicate
|
||||||
|
|
||||||
|
Every predicate, native or custom, is identified on the backend by a predicate ID.
|
||||||
|
|
||||||
|
The native predicates are numbered with small integers, sequentially. The ID of a custom predicate is a hash of its definition; this guarantees that two different predicates cannot have the same ID (aside from the miniscule probability of a hash collision).
|
||||||
|
|
||||||
|
This document explains in some detail how the definition of a custom predicate is serialized and hashed.
|
||||||
|
|
||||||
|
Custom predicates are defined in _groups_ (also known as _batches_); see an [example](./customexample.md). The definition of a custom predicate in a group involves other predicates, which may include:
|
||||||
|
- native predicates
|
||||||
|
- previously-defined custom predicates
|
||||||
|
- other predicates in the same group.
|
||||||
|
|
||||||
|
Predicate hashing is recursive: in order to hash a group of custom predicates, we need to know IDs for all the previously-defined custom predicates it depends on.
|
||||||
|
|
||||||
|
The definition of the whole group of custom predicates is serialized (as explained below), and that serialization is hashed (using a zk-friendly hash -- in the case of the plonky2 backend, Poseidon) to give a _group ID_. Each predicate in the group is then referenced by
|
||||||
|
```
|
||||||
|
predicate_ID = (group_ID, idx)
|
||||||
|
```
|
||||||
|
(here `idx` is simply the index of the predicate in the group).
|
||||||
|
|
@ -1,4 +1,4 @@
|
||||||
# Custom operations (or: how to define a custom predicate): VERSION 1
|
# Defining custom predicates
|
||||||
|
|
||||||
(Note: At the moment, we consider a "custom operation" to be exactly the same thing as the "definition of a custom predicate.")
|
(Note: At the moment, we consider a "custom operation" to be exactly the same thing as the "definition of a custom predicate.")
|
||||||
|
|
||||||
|
|
@ -1,4 +1,17 @@
|
||||||
# Frontend and backend
|
# Frontend and backend
|
||||||
|
|
||||||
The frontend is what we want the user to see.
|
The POD2 system consists of a frontend and a backend, connected by a middleware. This page outlines some design principles for deciding which components go where.
|
||||||
The backend is what we want the circuit to see.
|
|
||||||
|
```
|
||||||
|
user -- frontend -- middleware -- backend -- ZK circuit
|
||||||
|
```
|
||||||
|
|
||||||
|
The frontend is what we want the user to see; the backend is what we want the circuit to see.
|
||||||
|
|
||||||
|
## Circuit and proving system
|
||||||
|
|
||||||
|
The first implementation of POD2 uses Plonky2 as its proving system. In principle, a future implementation could use some other proving system. The frontend and middleware should not be aware of what proving system is in use: anything specific to the proving system belongs to the backend.
|
||||||
|
|
||||||
|
## User-facing types versus in-circuit types
|
||||||
|
|
||||||
|
The frontend type system exposes human-readable types to POD developers: strings, ints, bools, and so forth. On the backend, all types are build out of field elements. The middleware should handle the conversion.
|
||||||
|
|
|
||||||
|
|
@ -1,7 +1,9 @@
|
||||||
# POD value types
|
# POD value types
|
||||||
From the frontend perspective, POD values may be one of the following[^type] types: two atomic types
|
From the frontend perspective, POD values may be one of the following[^type] types: four atomic types
|
||||||
- `Integer`
|
- `Integer`
|
||||||
|
- `Bool`
|
||||||
- `String`
|
- `String`
|
||||||
|
- `Raw`
|
||||||
|
|
||||||
and three compound types
|
and three compound types
|
||||||
- `Dictionary`
|
- `Dictionary`
|
||||||
|
|
@ -18,31 +20,16 @@ $$\texttt{HashOut<GoldilocksField>}\simeq\texttt{[GoldilocksField; 4]}.$$
|
||||||
## `Integer`
|
## `Integer`
|
||||||
In the frontend, this type is none other than `u64`[^i64]. In the backend, it will be appropriately embedded into the codomain of the canonical hash function.
|
In the frontend, this type is none other than `u64`[^i64]. In the backend, it will be appropriately embedded into the codomain of the canonical hash function.
|
||||||
|
|
||||||
In the case of the Plonky2 backend, this is done by decomposing such an integer $x$ as
|
## `Bool`
|
||||||
$$x = x_0 + x_1 \cdot 2^{32}$$
|
In the frontend, this is a simple bool. In the backend, it will have the same encoding as an `Integer` `0` (for `false`) or `1` (for `true`).
|
||||||
with $0 \leq x_0, x_1 < 2^{32}$ and representing it as
|
|
||||||
$$\texttt{map}\ \iota\ [x_0, x_1, 0, 0],$$
|
|
||||||
where $\iota:\mathbb{N}\cup\{0\}\rightarrow\texttt{GoldilocksField}$ is the canonical projection.
|
|
||||||
|
|
||||||
|
|
||||||
## `String`
|
## `String`
|
||||||
In the frontend, this type corresponds to the usual `String`. In the backend, the string will be mapped to a sequence of field elements and hashed with the hash function employed there, thus being represented by its hash.
|
In the frontend, this type corresponds to the usual `String`. In the backend, the string will be mapped to a sequence of field elements and hashed with the hash function employed there, thus being represented by its hash.
|
||||||
|
|
||||||
In the case of the Plonky2 backend, the string is converted to a sequence of bytes with the byte `0x01` appended as padding, then the bytes are split into 7-byte chunks starting from the left, these chunks then being interpreted as integers in little-endian form, each of which is naturally an element of `GoldilocksField`, whence the resulting sequence may be hashed via the Poseidon hash function. Symbolically, given a string $s$, its hash is defined by
|
## `Raw`
|
||||||
|
"Raw" is short for "raw value". A `Raw` exposes a backend value on the frontend.
|
||||||
$$\texttt{poseidon}(\texttt{map}\ (\iota\circ\jmath_\texttt{le-bytes->int})\ \texttt{chunks}_7(\jmath_\texttt{string->bytes}(s)\ \texttt{++}\ [\texttt{0x01}])),$$
|
|
||||||
|
|
||||||
where `poseidon` is the Poseidon instance used by Plonky2, $\iota$ is as above, $\texttt{chunks}_{n}:[\texttt{u8}]\rightarrow [[\texttt{u8}]]$ is defined such that[^aux]
|
|
||||||
|
|
||||||
$$\texttt{chunks}_n(v) = \textup{if}\ v = [\ ]\ \textup{then}\ [\ ]\ \textup{else}\ [\texttt{take}_n v]\ \texttt{++}\ \texttt{chunks}_n(\texttt{drop}_n v),$$
|
|
||||||
|
|
||||||
the mapping $\jmath_\texttt{le-bytes->int}: [u8] \rightarrow{N}\cup\{0\}$ is given by
|
|
||||||
|
|
||||||
$$[b_0,\dots,b_{N-1}]\mapsto \sum_{i=0}^{N-1} b_i \cdot 2^{8i},$$
|
|
||||||
|
|
||||||
and $\jmath_\texttt{string->bytes}$ is the canonical mapping of a string to its UTF-8 representation.
|
|
||||||
|
|
||||||
|
|
||||||
|
With the plonky2 backend, a `Raw` is a tuple of 4 elements of the Goldilocks field.
|
||||||
|
|
||||||
## Dictionary, array, set
|
## Dictionary, array, set
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue