Support persistent storage in Containers (#493)

Extend the work of https://github.com/0xPARC/pod2/pull/487 to the Containers (Dictionary, Set, Array).

The Merkle tree stores only the `RawValue` of each key and value, so the container is responsible for storing the rich values.

To handle containers with persistent storage efficiently (that is, cloning or updating a container should not cause an O(n) copy of its data), I concluded that we need a database of `Value`s indexed by their raw value, because this gives us deduplication and free cloning of containers.
The issue with this approach is that, in the current design, `Value`s of different types can collide (see https://github.com/0xPARC/pod2/issues/426), while the current API relies on each value having a single type.
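A minimal sketch of a value database indexed by raw value, using only the standard library. `RawValue`, `RichValue`, `ValueDb`, `ContainerHandle` and the toy raw encoding are hypothetical stand-ins, not pod2's types. Storing the same rich value from two containers keeps a single copy, and a container handle that holds an `Rc` to the shared database clones without copying any values. The toy encoding also shows why collisions matter: two rich values with the same raw value compete for one slot.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical stand-in for pod2's `RawValue`.
pub type RawValue = u64;

/// Hypothetical stand-in for a rich value.
#[derive(Clone, Debug, PartialEq)]
pub enum RichValue {
    Int(i64),
    Str(String),
}

impl RichValue {
    /// Toy raw encoding: an int maps to its bits, a string to its hash.
    /// Distinct rich values may share a raw value (the collision problem).
    pub fn raw(&self) -> RawValue {
        match self {
            RichValue::Int(i) => *i as RawValue,
            RichValue::Str(s) => {
                let mut h = DefaultHasher::new();
                s.hash(&mut h);
                h.finish()
            }
        }
    }
}

/// Store of rich values indexed by raw value: each raw value is kept once.
#[derive(Default)]
pub struct ValueDb {
    by_raw: RefCell<HashMap<RawValue, RichValue>>,
}

impl ValueDb {
    /// Stores `v` unless its raw value is already present; returns the raw key.
    pub fn store(&self, v: &RichValue) -> RawValue {
        let raw = v.raw();
        self.by_raw.borrow_mut().entry(raw).or_insert_with(|| v.clone());
        raw
    }

    /// Recovers the rich value for a raw value read out of a Merkle tree.
    pub fn load(&self, raw: RawValue) -> Option<RichValue> {
        self.by_raw.borrow().get(&raw).cloned()
    }

    pub fn len(&self) -> usize {
        self.by_raw.borrow().len()
    }
}

/// Hypothetical container handle: cloning copies an `Rc` pointer and a root,
/// never the stored values.
#[derive(Clone)]
pub struct ContainerHandle {
    pub db: Rc<ValueDb>,
    pub root: RawValue, // stand-in for the root of the persistent Merkle tree
}
```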

To resolve this, I changed the API: instead of assuming that a `Value` has a fixed type, a value may now have several compatible types, and the user of the library tries casting the `Value` to a particular type.
To this end I deprecated public access to everything related to `TypedValue`, and I propose that it be considered an implementation detail, a black box from the external developer's point of view. The `Value` type is now used like this:
- To create a new `Value`, use `Value::from(...)`, passing any compatible type (the same types as before).
- To access the `Value` in typed form, cast it with `value.as_foo()`, which returns `Option<Foo>`.

Previously we had a collision between `true` and `1` (and between `false` and `0`). Now it doesn't matter whether a value holds `true` or `1`: both are treated as the same value, and both return `Some` from `as_int` and `as_bool`.
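The sketch below illustrates the cast-based API under simplifying assumptions: the private `Typed` enum plays the role of `TypedValue`, and only integers, booleans and strings are modeled. It is not pod2's implementation; it only shows how `true` and `1` become interchangeable when each cast asks what the stored value can mean rather than which variant it is.

```rust
/// Hypothetical private representation, playing the role of `TypedValue`.
#[derive(Clone, Debug)]
enum Typed {
    Int(i64),
    Bool(bool),
    Str(String),
}

/// Hypothetical public value: built with `From`, read back with `as_*` casts.
#[derive(Clone, Debug)]
pub struct Value(Typed);

impl From<i64> for Value {
    fn from(i: i64) -> Self {
        Value(Typed::Int(i))
    }
}

impl From<bool> for Value {
    fn from(b: bool) -> Self {
        Value(Typed::Bool(b))
    }
}

impl From<&str> for Value {
    fn from(s: &str) -> Self {
        Value(Typed::Str(s.to_string()))
    }
}

impl Value {
    /// Booleans share the encoding of 0 and 1, so they also cast to ints.
    pub fn as_int(&self) -> Option<i64> {
        match &self.0 {
            Typed::Int(i) => Some(*i),
            Typed::Bool(b) => Some(i64::from(*b)),
            Typed::Str(_) => None,
        }
    }

    /// Only `false`/`true` and the ints 0/1 have a boolean reading.
    pub fn as_bool(&self) -> Option<bool> {
        match &self.0 {
            Typed::Bool(b) => Some(*b),
            Typed::Int(0) => Some(false),
            Typed::Int(1) => Some(true),
            _ => None,
        }
    }

    pub fn as_str(&self) -> Option<&str> {
        match &self.0 {
            Typed::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}
```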

Similarly, we had collisions between containers. For example, `set(0, 1, 2) == array[0, 1, 2]` and `set("a", "b") == dict("a": "a", "b": "b")`. Now any container can be cast to any of `set`, `array` and `dict`. There's a caveat: each of these types expects a particular encoding of keys, so after casting to the wrong type some operations will return errors.
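To make the caveat concrete, here is a hypothetical sketch in which a container is only a map from raw keys to raw values (`i64` stands in for `RawValue`), an array maps index to element, and a set is assumed to map each element to itself. The cast never fails; an operation that depends on the key encoding returns an error instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical generic container: raw keys to raw values, like the Merkle tree.
#[derive(Clone, Debug, PartialEq)]
pub struct Container(BTreeMap<i64, i64>);

impl Container {
    /// Array encoding: index -> element.
    pub fn array(items: &[i64]) -> Self {
        Container(items.iter().enumerate().map(|(i, v)| (i as i64, *v)).collect())
    }

    /// Set encoding assumed here: element -> element.
    pub fn set(items: &[i64]) -> Self {
        Container(items.iter().map(|v| (*v, *v)).collect())
    }

    /// Any container can be cast to either view; the cast itself never fails.
    pub fn as_array(&self) -> ArrayView<'_> {
        ArrayView(self)
    }

    pub fn as_set(&self) -> SetView<'_> {
        SetView(self)
    }
}

pub struct ArrayView<'a>(&'a Container);

impl ArrayView<'_> {
    /// Errors unless the keys are exactly 0..len (BTreeMap iterates in key order).
    pub fn to_vec(&self) -> Result<Vec<i64>, String> {
        (self.0).0
            .iter()
            .enumerate()
            .map(|(i, (k, v))| {
                if *k == i as i64 {
                    Ok(*v)
                } else {
                    Err(format!("key {k} is not index {i}"))
                }
            })
            .collect()
    }
}

pub struct SetView<'a>(&'a Container);

impl SetView<'_> {
    /// Errors unless every key maps to itself.
    pub fn to_vec(&self) -> Result<Vec<i64>, String> {
        (self.0).0
            .iter()
            .map(|(k, v)| if k == v { Ok(*k) } else { Err(format!("key {k} maps to {v}")) })
            .collect()
    }
}
```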

With this design it no longer matters which variant is stored and recovered, because the API requires the user to state the expected type, and any value whose type collides with others can be cast to the right type.

There's only one case where it's not desirable to swap one `TypedValue` for another: `TypedValue::Raw`. If a non-`RawValue` in the DB is replaced by the corresponding `RawValue`, we erase the information required to recover the rich value. For this reason the database implementations treat `RawValue` as a special case: if a value is stored as a non-`RawValue`, the corresponding `RawValue` can never overwrite it. If a value is stored as a `RawValue`, a matching non-`RawValue` will overwrite it (promoting it to a rich value). This way we never lose data.
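A minimal sketch of this overwrite rule, assuming a database keyed by raw value whose entries are either `Stored::Raw` (only the raw encoding is known) or `Stored::Rich` (a `String` standing in for any rich value). The names are hypothetical; the point is that there is exactly one condition under which an insert is ignored.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for pod2's `RawValue`.
pub type RawValue = u64;

/// What the database holds for one raw value (hypothetical names).
#[derive(Clone, Debug, PartialEq)]
pub enum Stored {
    /// Only the raw encoding is known.
    Raw(RawValue),
    /// A rich value with this raw encoding.
    Rich(String),
}

#[derive(Default)]
pub struct ValueDb(HashMap<RawValue, Stored>);

impl ValueDb {
    pub fn insert(&mut self, raw: RawValue, v: Stored) {
        // The only ignored write: a raw entry must never replace a rich one,
        // since that would erase the information needed to recover it.
        let would_lose_data = matches!(
            (self.0.get(&raw), &v),
            (Some(Stored::Rich(_)), Stored::Raw(_))
        );
        if !would_lose_data {
            // Empty slot, raw -> rich promotion, or a swap between rich values.
            self.0.insert(raw, v);
        }
    }

    pub fn get(&self, raw: RawValue) -> Option<&Stored> {
        self.0.get(&raw)
    }
}
```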

A consequence of this is that the serialization, `Display` and `Debug` output of a container is not stable: at any point, any of its entries can be swapped for a "compatible" one if the container shares storage with other containers that introduce collisions.

I rewrote all containers as wrappers around a generic `Container` that holds a `Map` from `Value` to `Value`. The serialization of every container now goes through the single implementation on the generic `Container`.
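A sketch of the wrapper structure, with hypothetical names and `String` standing in for `Value`: the typed wrappers only decide how keys are encoded, while storage and the one output routine live on the generic `Container`. `Display` stands in for serialization here to keep the example free of external crates.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical generic container: a map from value to value.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Container {
    inner: BTreeMap<String, String>,
}

impl Container {
    pub fn insert(&mut self, k: &str, v: &str) {
        self.inner.insert(k.to_string(), v.to_string());
    }
}

/// The one output routine, shared by every wrapper.
impl fmt::Display for Container {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let entries: Vec<String> = self.inner.iter().map(|(k, v)| format!("{k}: {v}")).collect();
        write!(f, "{{{}}}", entries.join(", "))
    }
}

/// Typed wrappers only choose the key encoding; everything else is delegated.
pub struct Set(Container);
pub struct Dictionary(Container);

impl Set {
    pub fn new(items: &[&str]) -> Self {
        let mut c = Container::default();
        for item in items {
            c.insert(item, item); // assumed set encoding: element -> element
        }
        Set(c)
    }

    pub fn container(&self) -> &Container {
        &self.0
    }
}

impl Dictionary {
    pub fn new(kvs: &[(&str, &str)]) -> Self {
        let mut c = Container::default();
        for (k, v) in kvs {
            c.insert(k, v);
        }
        Dictionary(c)
    }

    pub fn container(&self) -> &Container {
        &self.0
    }
}
```

With this layout, the `set("a", "b") == dict("a": "a", "b": "b")` collision shows up directly: both wrappers end up holding equal generic containers with identical output.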
Eduard S. 2026-03-23 12:31:28 +01:00 committed by GitHub
parent 32f45872d7
commit 13cabdb511
22 changed files with 1187 additions and 621 deletions


@ -13,10 +13,11 @@ use serde::{Deserialize, Serialize};
pub use serialization::SerializedMainPod;
use crate::middleware::{
self, check_custom_pred, containers::Dictionary, fill_wildcard_values, hash_op, max_op,
prod_op, sum_op, AnchoredKey, Hash, Key, MainPodInputs, MainPodProver, NativeOperation,
OperationAux, OperationType, Params, PublicKey, RawValue, Signature, Signer, Statement,
StatementArg, VDSet, Value, ValueRef,
self, check_custom_pred,
containers::{Container, Dictionary},
fill_wildcard_values, hash_op, max_op, prod_op, sum_op, AnchoredKey, Hash, Key, MainPodInputs,
MainPodProver, NativeOperation, OperationAux, OperationType, Params, PublicKey, RawValue,
Signature, Signer, Statement, StatementArg, VDSet, Value, ValueRef, EMPTY_VALUE,
};
mod custom;
@ -92,8 +93,11 @@ impl fmt::Display for SignedDict {
// https://0xparc.github.io/pod2/merkletree.html will not need it since it will be
// deterministic based on the keys values not on the order of the keys when added into the
// tree.
for (k, v) in self.dict.kvs().iter().sorted_by_key(|kv| kv.0.hash()) {
writeln!(f, " - {} = {}", k, v)?;
for kv in self.dict.iter() {
match kv {
Ok((k, v)) => writeln!(f, " - {} = {}", k, v)?,
Err(e) => writeln!(f, " - ERR: {}", e)?,
}
}
Ok(())
}
@ -106,16 +110,13 @@ impl SignedDict {
.then_some(())
.ok_or(Error::custom("Invalid signature!"))
}
pub fn kvs(&self) -> &HashMap<Key, Value> {
self.dict.kvs()
}
pub fn get(&self, key: impl Into<Key>) -> Option<&Value> {
self.kvs().get(&key.into())
pub fn get(&self, key: impl Into<Key>) -> Option<Value> {
self.dict.get(&key.into()).unwrap()
}
// Returns the Contains statement that defines key if it exists.
pub fn get_statement(&self, key: impl Into<Key>) -> Option<Statement> {
let key: Key = key.into();
self.kvs().get(&key).map(|value| {
self.dict.get(&key).unwrap().map(|value| {
Statement::Contains(
ValueRef::Literal(Value::from(self.dict.clone())),
ValueRef::Literal(Value::from(key.name())),
@ -156,6 +157,11 @@ impl fmt::Display for MainPodBuilder {
}
}
fn as_container_or_err(v: &Value) -> Result<Container> {
v.as_container()
.ok_or_else(|| Error::custom(format!("{v} not a container")))
}
impl MainPodBuilder {
pub fn new(params: &Params, vd_set: &VDSet) -> Self {
Self {
@ -347,11 +353,12 @@ impl MainPodBuilder {
.ok_or(Error::custom(format!(
"Invalid key argument for op {}.",
op
)))?;
)))?
.raw();
let proof = if op_type == &Native(ContainsFromEntries) {
container.prove_existence(key)?.1
as_container_or_err(container)?.prove(key)?.1
} else {
container.prove_nonexistence(key)?
as_container_or_err(container)?.prove_nonexistence(key)?
};
Ok(Operation(op_type.clone(), op.1, OpAux::MerkleProof(proof)))
}
@ -375,18 +382,16 @@ impl MainPodBuilder {
let value =
op.1.get(3)
.and_then(|arg| arg.value())
.ok_or(Error::custom(format!(
"Invalid key argument for op {}.",
op
)));
.cloned()
.unwrap_or(Value::from(EMPTY_VALUE));
let proof = match op_type {
Native(ContainerInsertFromEntries) => {
old_container.prove_insertion(key, value?)?
as_container_or_err(old_container)?.insert(key.clone(), value)?
}
Native(ContainerUpdateFromEntries) => {
old_container.prove_update(key, value?)?
as_container_or_err(old_container)?.update(key.raw(), value)?
}
_ => old_container.prove_deletion(key)?,
_ => as_container_or_err(old_container)?.delete(key.raw())?,
};
Ok(Operation(
op_type.clone(),


@ -4,7 +4,7 @@ use crate::{
frontend::SignedDict,
middleware::{
containers::Dictionary, root_key_to_ak, CustomPredicateRef, NativeOperation, OperationAux,
OperationType, Signature, Statement, TypedValue, Value, ValueRef,
OperationType, Signature, Statement, Value, ValueRef,
},
};
@ -39,10 +39,9 @@ impl OperationArg {
}
pub(crate) fn int_value_and_ref(&self) -> Option<(ValueRef, i64)> {
self.value_and_ref().and_then(|(r, v)| match v.typed() {
&TypedValue::Int(i) => Some((r, i)),
_ => None,
})
self.value_and_ref()
.and_then(|(r, v)| v.as_int().map(|i| Some((r, i))))
.flatten()
}
}
@ -71,7 +70,7 @@ impl From<&Value> for OperationArg {
impl From<(&Dictionary, &str)> for OperationArg {
fn from((dict, key): (&Dictionary, &str)) -> Self {
// TODO: Use TryFrom
let value = dict.get(&key.into()).cloned().unwrap();
let value = dict.get(&key.into()).unwrap().unwrap();
Self::Statement(Statement::Contains(
dict.clone().into(),
key.into(),


@ -83,7 +83,7 @@ mod tests {
middleware::{
self,
containers::{Array, Dictionary, Set},
Params, Signer as _, TypedValue, DEFAULT_VD_LIST,
Params, Signer as _, Value, DEFAULT_VD_LIST,
},
};
@ -91,48 +91,46 @@ mod tests {
fn test_value_serialization() {
// Pairs of values and their expected serialized representations
let values = vec![
(TypedValue::String("hello".to_string()), "\"hello\""),
(TypedValue::Int(42), "{\"Int\":\"42\"}"),
(TypedValue::Bool(true), "true"),
(Value::from("hello"), "\"hello\""),
(Value::from(42), "{\"Int\":\"42\"}"),
(Value::from(true), r#"{"Int":"1"}"#),
(
TypedValue::Array(Array::new(vec!["foo".into(), false.into()])),
"{\"array\":[\"foo\",false]}",
Value::from(Array::new(vec![Value::from("foo"), Value::from(false)])),
r#"{"inner":[[{"Int":"0"},"foo"],[{"Int":"1"},{"Int":"0"}]]}"#,
),
(
TypedValue::Dictionary(
Dictionary::new(HashMap::from([
// The set of valid keys is equal to the set of valid JSON keys
("foo".into(), 123.into()),
// Empty strings are valid JSON keys
(("".into()), "baz".into()),
// Keys can contain whitespace
((" hi".into()), false.into()),
// Keys can contain special characters
(("!@£$%^&&*()".into()), "".into()),
// Keys can contain _very_ special characters
(("\0".into()), "".into()),
// Keys can contain emojis
(("🥳".into()), "party time!".into()),
]))
),
"{\"kvs\":{\"\":\"baz\",\"\\u0000\":\"\",\" hi\":false,\"!@£$%^&&*()\":\"\",\"foo\":{\"Int\":\"123\"},\"🥳\":\"party time!\"}}",
Value::from(Dictionary::new(HashMap::from([
// The set of valid keys is equal to the set of valid JSON keys
("foo".into(), 123.into()),
// Empty strings are valid JSON keys
(("".into()), "baz".into()),
// Keys can contain whitespace
((" hi".into()), false.into()),
// Keys can contain special characters
(("!@£$%^&&*()".into()), "".into()),
// Keys can contain _very_ special characters
(("\0".into()), "".into()),
// Keys can contain emojis
(("🥳".into()), "party time!".into()),
]))),
r#"{"inner":[["!@£$%^&&*()",""],["🥳","party time!"],[" hi",{"Int":"0"}],["foo",{"Int":"123"}],["\u0000",""],["","baz"]]}"#,
),
(
TypedValue::Set(Set::new(HashSet::from(["foo".into(), "bar".into()]))),
"{\"set\":[\"bar\",\"foo\"]}",
Value::from(Set::new(HashSet::from(["foo".into(), "bar".into()]))),
r#"{"inner":[["bar"],["foo"]]}"#,
),
];
for (value, expected) in values {
let serialized = serde_json::to_string(&value).unwrap();
assert_eq!(serialized, expected);
let deserialized: TypedValue = serde_json::from_str(&serialized).unwrap();
let deserialized: Value = serde_json::from_str(&serialized).unwrap();
assert_eq!(
value, deserialized,
"value {:#?} should equal deserialized {:#?}",
value, deserialized
);
let expected_deserialized: TypedValue = serde_json::from_str(expected).unwrap();
let expected_deserialized: Value = serde_json::from_str(expected).unwrap();
assert_eq!(value, expected_deserialized);
}
}
@ -177,7 +175,10 @@ mod tests {
"deserialized: {}",
serde_json::to_string_pretty(&deserialized).unwrap()
);
assert_eq!(signed_dict.dict.kvs(), deserialized.dict.kvs());
assert_eq!(
signed_dict.dict.dump().unwrap(),
deserialized.dict.dump().unwrap()
);
assert_eq!(signed_dict.public_key, deserialized.public_key);
assert_eq!(signed_dict.signature, deserialized.signature);
assert_eq!(signed_dict.verify().is_ok(), deserialized.verify().is_ok());