Extend the work of https://github.com/0xPARC/pod2/pull/487 to the Containers (Dictionary, Set, Array).
The merkle tree only stores `RawValue` for both the key and the value, so it is the responsibility of the Container to store the rich value.
In order to handle containers with persistent storage efficiently (which means, cloning them or updating them should not cause an O(n) data copy) I figured we need to have a database of `Value`s indexed by their raw value; as this gives us deduplication and free cloning of containers.
The issue with this approach is that in the current design we have collisions between Value's of different types: https://github.com/0xPARC/pod2/issues/426 and the current API relies on the single type of values.
To resolve this issue I decided to change the API, instead of assuming that a Value has a fixed type, let the value be possibly multiple compatible types and let the user of the library try casting the Value to a particular type.
For this I deprecated the public access of everything related to `TypedValue` and I propose for it to be considered an implementation detail and a blackbox from the external developer point of view. The `Value` type is now used like this:
- To create a new Value use `Value::from(...)` where you can pass any compatible type (the same types as before)
- To access the Value in typed form you cast it like `value.as_foo()` which returns `Option<Foo>`.
Previously we had a collision between `true` and `1` (and `false` and `0`). Now it doesn't matter whether a value holds a `true` or a `1`, both should be seen as the same and both return `Some` when doing `as_int` and `as_bool`.
Similarly we had collisions with containers. For example `set(0, 1, 2) == array[0, 1, 2]` and `set("a", "b") = dict("a": "a", "b": "b")`. Now any container can be casted to any of `set, array, dict`. There's a caveat here: each of these types expects a particular encoding of keys, so casting to the wrong type will return errors on some operations.
With this design it no longer matters what is being stored and recovered because the API requires the user to express the expected type and any type with collisions for particular values can be casted to the right type.
There's only one case where it's not desirable to swap one `TypedValue` for another: the `TypedValue::Raw`. If a non-`RawValue` in the DB is replaced by the corresponding `RawValue` we erase the required information to recover the rich value. For this reason the implementations of the database treat the `RawValue` as a special case: if an value is stored in non-`RawValue`, the corresponding `RawValue` can never overwrite it. If a value is stored in `RawValue`, a matching non-`RawValue` will overwrite it (promoting it to a rich value). This way we never lose data.
A consequence of this is that the serialization, `Display` and `Debug` of a container is not stable. At any point any of the entries can be swapped for a "compatible" one if they share the storage with other containers that introduce collisions.
I rewrote all containers as wrapper to a generic `Container` which holds a `Map` from `Value` to `Value`. The serialization of each container now uses the single implementation of the generic `Container`.
* refactor merkletree to work with disk keyvalue database (wip)
* various fixes post reimplementation; pending delete leaf
* add delete operation case for the new in db tree approach
* polish tree update & delete; everything works (pending polishing)
* polish panics into errs, prints, etc
* Implement iterator
* Lint
* fix case no-siblings
* case delete with semi-empty branch
* polishing
* starting to add rocksdb & heeddb for the DB & Txn traits
* Satisfy the borrow checker
* abstract merkletree tests to use the various available DBs
* update store_node interface (rm hash input), rm heed.rs
* polishing
* typos
* Ditch transactions
* add feature for rocksdb, return errs at new_with_db, remove empty leaf case in Leaf::new
* intermediate instead of leaf in empty node when deleting leaf
---------
Co-authored-by: Ahmad <root@ahmadafuni.com>
Adds nicer errors for Podlang code, using the `annotate_snippets` crate, the same crate used by the Rust compiler to generate contextual errors. This prints a short snippet of the code containing the error within the error message, highlighting the part that needs to be fixed.
It also includes a change to the `load_module` function, changing a `Vec` function argument to a slice.
* First pass at removing batch splitting
* Refactor to separate module loading from request parsing
* Consolidate module functionality
* Tidy up comments
* Use array of modules instead of HashMap
* Formatting
* Use module hashes when importing modules
* merkletree: custom poseidon to use flag as initial state.
This allows to do the merkletree related hashing in 1 gate instead of 2,
reducing ~23% of gates per merkle proof.
| tree levels | 10 | 16 | 32 | 40 | 64 | 128 | 130 | 250 | 256 |
|---------------|----|----|-----|-----|-----|-----|-----|------|------|
| old num gates | 50 | 76 | 144 | 178 | 280 | 554 | 564 | 1076 | 1102 |
| new num gates | 39 | 59 | 111 | 137 | 215 | 425 | 433 | 825 | 845 |
* update docs with new tree hashing approach
* add inline comment stating clear how the flag is used in the state permutation
Resolve https://github.com/0xPARC/pod2/issues/466
Now batches are identified by the root of a merkle tree that contains all the predicates (using sequential indices as keys). This means that the format to identify a custom predicate reference is still a hash + index, but the calculation of the hash is different.
The MainPod circuit now isn't limited by number of batches but instead number of custom predicates; and for each one we verify a merkle proof to verify the batch id.
I've removed a bunch of tests from lang that were testing splitting into multiple batches because there's no longer any need for that. In a future PR we'll remove the code that handles batch splitting.
Each custom predicate needs 148.2 gates (which is very close to my estimate of 142.7 in https://github.com/0xPARC/pod2/issues/466#issuecomment-3823531286 where I actually made a mistake and considered 5 predicates per batch instead of 4 in the previous Params).
I thought it would be nice to have a Predicate for the typed value so that the developer can work with predicates as values comfortably. Then I noticed that hashing a predicate required `Params` which would have been annoying for converting a `TypedValue::Predicate` to `RawValue` and this led to a small refactor over how `Params` work.
We already had some fields in the `Params` struct that determine compatibility between encoded data. They can be seen as determining a kind of ABI compatibility. In general it's better if those parameters don't change so that different circuit configurations can still verify proofs from each other. So I decided to force those parameters to be constant in the code base and not allow the user of our library to change them. Many field element serialization/deserialization functions in our code depended on those parameters, and since now they are constant many functions get rid of the `Params` argument, which simplifies the code. This includes the serialization of a `Predicate` which was required to calculate its hash.
* Multi-batch splitting
* Invoke split predicates by name, passing in full argument list
* Reorder batches to prevent failure of forward references where possible
* Rename APIs for clarity
* Simplify example
* Add more docs
* Review updates
* Remove duplicate code
* Comment topological sort algorithm
Implement support for first order predicates in the backend.
Now a statement template can have a predicate hash or a wildcard.
## predicate <-> predicate hash constraints
To build the custom predicate table we need to calculate the custom predicate batch id, which uses the serialization of the statement templates before normalization. This serialization uses the predicate hash when the template uses a predicate (instead of a wildcard). Then in normalization we recalculate the predicate hash if it was a Batch Self.
This means that the relation between hash and predicate must be checked before and after normalization when the template is not using a wildcard. How this is achieved:
- Before normalization: the constructor of StatementTmplTarget forces that if we keep a predicate, it's hash must be equal to the pred_hash when the template has a predicate (and not a wildcard)
- After normalization: the predicate hash is calculated in the normalization and replaced in the case of the template using a predicate and it being a BatchSelf. If it was a predicate but not batch self, the old value was used which was constrained via the constructor.
See `CircuitBuilder::add_virtual_statement_tmpl` and `normalize_st_tmpl_circuit`
## Wildcard predicate resolution
It is done via `make_predicate_from_template_circuit` and is fairly simple as it's contains similar logic to `make_statement_arg_from_template_circuit` but simpler.
Resolve#448
Previously a predicate was 6 elements. Now it grows to 8 elements; and the hash is 4 elements.
Some parts of the circuit require only require equality checks with the predicate: that works with the predicate hash. Other parts require inspecting or working with particular elements in the predicate, those need the preimage of the predicate hash.
Both `StatementTarget` and `StatementTmplTarget` have been updated to include the predicate hash and optionally the predicate. When the predicate is included, constraints are automatically generated for `pred_hash = hash(pred)`. We only include the predicate when needed.
This simplifies the MerkleTree (and container) API.
Defer the max depth check when assigning the witness (merkle proof siblings) to the merkle tree circuit.
In this implementation the native Merkle Tree branches grow as much as they needed. There are no checks of max depth in the merkle tree. All keys are 256 bits (I added a debug_assert for this); so in the worst case a path will have depth 256. It can't have a longer depth because the `insert` method calls `prove_nonexistence` which errors if the key already exists; another one may exist which must be different and thus require a path <= 256 depth.
Resolve#436
- Update formula in `estimate_verif_num_gates` after the update in the recursive verification from https://github.com/0xPARC/pod2/pull/397
- Update the parameters to get better utilization of 2^16 rows
- Update metrics report to be more compact
- middleware:
- Add `Statement::Intro`
- Add `SignedBy` native predicate and operation. The signature is auxiliary data to the operation
- Rename `PodSigner` to `Signer` with a new API (just for signing `RawValue`)
- Removed `NewEntry` operation. Use `ContainsFromEntries` instead
- Remove `KEY_SIGNER` and `KEY_TYPE` which are no longer used
- Merge `RecursivePod` and `Pod` traits
- Change the `Pod::deserialize_data` method to use `Self` instead of `Box<dyn Pod>`
- Extend `Pod` trait with these methods:
- `is_main`: when the pod is Main, in a (recursive) verification its vk will be checked to exist in the vd_set but not if it's intro pod
- `is_mock`: skip some verifications in the recursive mock MainPod verification
- `verifier_data_hash`
- `pod_id` renamed to `statements_hash`
- AnchoredKeys are now a pair of dictionary root and key
- Entry statements are now defined as Contains with literal arguments
- Operations that take Entries now use Contains statements with literal arguments
- frontend:
- Rename `SignedPod` to `SignedDict` (which now contains the dict, public key and signature, and can still `verify(self)`ed)
- The `SignedDict` keeps the method `get_statement` for convenience but now it returns a `Contains` statement that proves the existence of the key in the dict
- The `MainPodBuilder` automatically inserts a `Contains` statement when an operation is added that uses an entry as argument that was not yet "opened".
- Removed the `literal` methods from the `MainPodBuilder` that were loading literals to anchored keys: that was no longer needed after we introduced literal arguments
- backend
- Only verify inclusion of the verifying key into the vd_set if the pod is MainPod. A pod is not MainPod if the first statement is Intro.
- Reject intro pods that have non-intro statements
- Empty pod now returns an intro statement
- Don't insert a type statement automatically in MainPod and MockMainPod. We get rid of the type entry.
- Implement `SignedBy` operation, which uses the muxed table to store signature verifications
- Rename `PodId` to `statements_hash` or `sts_hash` for short. Now this is only used as a hash of the statements for the circuits public inputs.
- Refactor normalization of `self` statements:
- Before: replace values that contain `SELF` by the given pod_id
- After: place the verifying key hash into the Intro predicates
Changed the middleware to only allow comparison of integers and to
use the implementation of Ord for i64. This matches the backend
behavior.
Also fixed a separate bug where LtEqFromEntries was producing a
NotEquals statement.
* Add container update ops
* Update src/middleware/operation.rs
Co-authored-by: Eduard S. <eduardsanou@posteo.net>
* Update src/backends/plonky2/mainpod/mod.rs
Co-authored-by: Eduard S. <eduardsanou@posteo.net>
* Code review
---------
Co-authored-by: Eduard S. <eduardsanou@posteo.net>
- Add a function to calculate the hash of the `CommonCircuitData`. The hash uniquely identify the `CommonCircuitData` used for a circuit/proof. Serializing the struct is not enough because the polynomial identities of the custom gates are not serialized (only their parameters are); so I made a function to extract "fingerprints" of the custom gates by evaluating them over a predefined list of uniform values, and then doing a random linear combination over the results.
- Store the full verifier only circuit data of a proof in the MainPod so that we can verify pods from old circuits in new circuits and code
- Store the hash of the `CommonCircuitData` in the MainPod so that we can reject verifying old pods that use a different `CommonCircuitData` than the current one. This has two goals
- If the `CommonCircuitData` changes it's very likely that the verification will fail, but it will be hard to debug. Doing this early check helps identify the origin of the verification failure as early as possible
- There's a chance that the verification could succeed when the `CommonCircuitData` changes, and that could be dangerous because the verification will be doing different checks than the ones intended for the original proof, so we may be skipping some constraints that could lead to exploiting the system. For this reason, whenever the common circuit data hash changes, all previous verifying keys should be discarded (that is, not included in the VDSet)
- The fingerprint only has ~64 bits and the "random evaluation point" is fixed. The assumption is that the pod developers are not malicious and are not changing the gates such that different gates give the same fingerprint. With this assumption, I find it reasonable to assume that with high probability if a gate changes, its fingerprint changes as well.
- Add a github action that updates a wiki page with a table that contains: date, commit, params hash (with a link to the actual params), verifier data only circuit data hash and common circuit data hash. This will make it easy to track when the common circuit data changes as well as track the verifier data corresponding to various versions (identified by commit)
- The edited page is this one https://github.com/0xPARC/pod2/wiki/MainPod-circuit-info
Resolve https://github.com/0xPARC/pod2/issues/386
Summary of breaking changes:
- The `RecursivePod` trait has a new method `common_hash` that needs to return the result of `hash_common_data` on the `CommonCircuitData` that the circuit uses.
- Extend the `Flattenable` trait to include a `size` method that returns the number of `Target`s the type requires. This is used in the table to figure out the max length of an array that must fit all entry types.
- Move the circuit methods to precalculate hash states and do hashes started from a precomputed state to a new module
- Introduce `MuxTableTarget` which allows easy multiplexing of tables where each sub-table may have entries of different lengths. The table access is done via hashing + unhashing automatically (via use of a generator)
- Use the `MuxTableTarget` to access merkle tree claims and custom predicate verification, which where previously in different tables and accessed with independent random accesses each
- Move the public key derivation for the PublicKeyOf operation check to the same multiplexed table. Now we can choose how many of those operations a circuit supports.
Resolve https://github.com/0xPARC/pod2/issues/357
Resolve https://github.com/0xPARC/pod2/issues/361
Support arrays up to 256 elements (hardcoded maximum just to avoid abuse) by combining multiple random_accesses.
The index is now split into low and high parts. It's a bit more inconvenient than using a single Target but this allows avoiding bit decomposition.
Add the missing gates and generator in the serializer that were added
with the PublicKeyOf operation.
Add a test for CircuitData serialization+deserialization to avoid these
kind of bugs in the future.
* wrote some initial code
* added way to input private key into circuit
* TypedValue::SecretKey hashed as 10 32-bit limbs
* Check PublicKeyOf in Frontend and Middleware
* Diff review
* PR review
* Finish utest
* Fix bounds check
* added giving secret key witness to circuit
* Test & doc improvements
* added private key comparison to circuit and added test cases
* cargo fmt
* Add frontend tests for PublicKeyOf
* Add public_key_of and hash_of to op! macro
* Add ownership check to ticket example
* Group order checking in tests
* More negative test cases at circuit level
* Cleanups after self review
* clippy fixes
* Fixes after merge. Temporarily remove plonky2 commit hash
* Add a nullifier to the ticket test example
* Test PublicKeyOf with a real prover (not mock)
* plonky-u32 dependency
* feat: optimize operation checks
Skip the circuits that verify operation checks other than None, Copy or
NewEntry for the public statements. This works because public
statements are created by copying private statements, so we never use
the other operation checks in those slots.
---------
Co-authored-by: Andrew Twyman <artwyman@gmail.com>
Co-authored-by: Eduard S. <eduardsanou@posteo.net>
- Bump rust version to `nightly-2025-07-02` because some of the nightly features we were using have been stabilized.
- Introduce feature `disk_cache` which enables caching to disk. Each time an artifact is retrieved from the cache it will be read and deserialized. On a cache miss the artifact will be created, serialized and stored to disk.
- Introduce feature `mem_cache` which enables caching to memory. All cached artifacts are kept in memory after they are created. The mem cache implementation avoids cloning of artifacts by extending their lifetime to `'static`. This is `unsafe` code, but I argue that this usage is safe.
- Add a `build.rs`
- When the feature `disk_cache` is enabled, the `build.rs` will inject env variables to the process with the git commit information, which is used to index the cached artifacts
- Replace all previous cached artifacts from `LazyStatic` methods that call the cache API
- Derive `Serialize, Deserialize` for all `*Target` types so that they can be serialized for caching to disk
- Add finer level of caching: now we cache the `CircuitData` and `VerifierData` independently. The reason for this is that `CircuitData` is a very big artifact which is not needed for verification. So by only accessing `VerifierData` in verification we don't pay a big overhead for reading from disk and deserializing
- Add missing artifacts to the cache: like the `CircuitData` for the `MainPod` indexed by `Params`
- Add helper types to serialize and deserialize `CircuitData`, `CommonData` and `VerifierData` with the set of gates and generators used in the recursive MainPod circuit
- Tweak the ids of our custom gates so that they remain unique when their generic parameters change
- Bugfix: several tests were using the standard `vd_set` but were using MainPod circuits with non-default parameters. This was working before because there was a bug: the MainPod circuit was reporting that the used verifier data was the standard one instead of picking the one corresponding to it's own Params.
Summary of breaking changes:
- One and only one of the features `mem_cache` or `disk_cache` need to be enabled. By default it's `mem_cache`
- To enable the `disk_cache` you need to disable the default features like this: `--no-default-features --features=backend_plonky2,zk,disk_cache`
- Removed `DEFAULT_PARAMS`, instead use `Params::default()`
- Removed `STANDARD_REC_MAIN_POD_CIRCUIT_DATA`, instead use `cache_get_standard_rec_main_pod_common_circuit_data`
- The library is now using `nightly-2025-07-02`. Some rust language features are unstable in previous versions.
* implement merkletree insert & insert-proof-verification
* add merkletree circuit to verify insertion proof
wip
* fix merkletree's GraphViz generation for cases with empty siblings
* implement tree insert-verif circuit siblings checks
Note: I've implemented also an alternative version which instead of
inputting a witness value 'divergence_level' it inputs a bitmask. Both
approaches (divergence_level and divergence_bitmask) take the same
amount of constraints (336 constraints for a tree of 32 levels, and for
an hybrid approach it takes 331 constraints but the code gets a bit less
readable). So I've kept with the current implementation (using
divergence_level) which is more easy to follow.
* [tree] modify the strategy for the insert-proof (out-circuit)
* re-implement insert-proof verification circuit
* add pending checks and polish
* add tests for disabled(&enabled) cases that should fail
* update typos.toml config
* Add test with tampering
* add check 5.3, to prevent tampering (at insertion proof circuit)
* move old_leaf_hash computation outside the loop, simplify check 5.3 booleans
* apply @ed255 review suggestions
---------
Co-authored-by: Ahmad <root@ahmadafuni.com>
In this commit I remove all `*Gadget` types and instead implement the naming convention defined here https://github.com/0xPARC/pod2/issues/181#issuecomment-3051954321
The biggest changes can be summarized by:
- a) Removal of `*Gadget` types and their `eval_*` methods in favour of `verb_object_circuit` functions.
- b) The above functions don't create targets that need to be witness-assigned later. Instead they receive those as arguments. This clearly shows what's the circuit input and output.
I'm specially happy about the changes from b), I think they make the flow of data in the circuit more clear.
Missing things that I did not address in this PR
- The RecursiveCircuit still uses some old naming conventions like `build`.
- We have some `*Target` types that have methods that define constraints. I think we can keep those as they are convenient and I don't see them as strongly breaking the new convention: I see them as the object-oriented way to apply the convention. In those cases the `object` can be omitted from the method when it's implied by the type name, and the `_circuit` suffix doesn't appear because it's implied by the fact that the type is a `*Target`. Examples are: `SignatureTarget::verify -> BoolTarget`, `StatementTarget::has_native_type -> BoolTarget` or `OperationTypeTarget::as_custom -> (BoolTarget, HashOutTarget, Target)`.
The PodSigner trait was taking `&mut self` in the `sign` method, but the
signer doesn't need mutation in the Shcnorr implementation. Remove the
`mut`.
Previously the PodProver trait was also taking `&mut self` in the
`prove` method, and we had many tests creating a `mut Prover/mut
MockProver`. Remove all those `mut`.
Breaking change: `PodSigner` trait method `sign` replaces `&mut self` by
`&self`
- serialize the signer in base58 both as Value and as the signer embedded
in the SignedPod json data field.
- Implement serialization/deserialization for Signature