Feat/disk cache (#354)

- Bump the Rust version to `nightly-2025-07-02`, because some of the nightly features we were using have been stabilized.
- Introduce the `disk_cache` feature, which enables caching to disk. Each time an artifact is retrieved from the cache, it is read from disk and deserialized. On a cache miss, the artifact is built, serialized and stored to disk.
- Introduce the `mem_cache` feature, which enables caching in memory. All cached artifacts are kept in memory after they are created. To avoid cloning artifacts, the mem cache implementation extends the lifetime of references to them to `'static`. This requires `unsafe` code, but I argue that this usage is safe: cached values are boxed on the heap and are never removed or replaced once built.
- Add a `build.rs`
  - When the `disk_cache` feature is enabled, `build.rs` injects environment variables containing the git commit information into the build. These variables are used to index the cached artifacts.
- Replace all previous `LazyStatic` cached artifacts with methods that call the cache API
- Derive `Serialize` and `Deserialize` for all `*Target` types so that they can be serialized when caching to disk
- Add a finer level of caching: `CircuitData` and `VerifierData` are now cached independently. `CircuitData` is a very large artifact that is not needed for verification. By accessing only `VerifierData` during verification, we avoid the overhead of reading `CircuitData` from disk and deserializing it.
- Add missing artifacts to the cache, such as the `CircuitData` for the `MainPod` indexed by `Params`
- Add helper types to serialize and deserialize `CircuitData`, `CommonData` and `VerifierData` with the set of gates and generators used in the recursive MainPod circuit
- Tweak the ids of our custom gates so that they remain unique when their generic parameters change
- Bugfix: several tests used the standard `vd_set` with MainPod circuits built from non-default parameters. This only worked because of a bug: the MainPod circuit reported the standard verifier data instead of the verifier data corresponding to its own `Params`.
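The hit/miss behaviour described for `disk_cache` can be sketched with the standard library alone. This is a hypothetical illustration, not the commit's implementation. The function `disk_get`, the `<dir>/<params>/<name>` file layout, and the use of `ToString`/`FromStr` as a stand-in for serde serialization are all assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::str::FromStr;

/// Hypothetical disk cache lookup. On a hit the artifact is read from disk and
/// deserialized (every time). On a miss it is built, serialized and written to disk.
/// `ToString`/`FromStr` stand in for real (serde-based) serialization.
fn disk_get<T, P>(dir: &Path, name: &str, params: &P, build_fn: fn(&P) -> T) -> io::Result<T>
where
    T: ToString + FromStr,
    P: ToString,
{
    // Index the artifact by its parameters and its name.
    let path: PathBuf = dir.join(params.to_string()).join(name);
    if let Ok(text) = fs::read_to_string(&path) {
        // Cache hit: read and deserialize the stored artifact.
        if let Ok(value) = text.parse::<T>() {
            return Ok(value);
        }
    }
    // Cache miss (or an unreadable entry): build, serialize and store.
    let value = build_fn(params);
    fs::create_dir_all(path.parent().expect("cache path has a parent"))?;
    fs::write(&path, value.to_string())?;
    Ok(value)
}
```

Unlike the mem cache, each hit pays for a read and a deserialization. That cost is why keeping large artifacts out of hot paths matters with this feature.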
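A `build.rs` that exposes the current git commit to the crate might look roughly like the sketch below. The variable name `GIT_COMMIT_HASH` and the helper functions are hypothetical. Only the `cargo:rustc-env=NAME=VALUE` build-script directive and `std::process::Command` are standard. The helper `emit_git_env` is meant to be called from the build script's `main`.

```rust
use std::process::Command;

/// Formats a Cargo build-script directive that sets a compile-time
/// environment variable, readable from the crate with `env!(name)`.
fn rustc_env_directive(name: &str, value: &str) -> String {
    format!("cargo:rustc-env={}={}", name, value)
}

/// Returns the hash of the checked-out commit, or `None` if `git` is not
/// available or the source tree is not a git checkout.
fn git_commit_hash() -> Option<String> {
    let output = Command::new("git").args(["rev-parse", "HEAD"]).output().ok()?;
    if !output.status.success() {
        return None;
    }
    Some(String::from_utf8(output.stdout).ok()?.trim().to_string())
}

/// Hypothetical body of the build script's `main`: print the directive so that
/// Cargo exposes the commit hash, which can then index cached artifacts.
fn emit_git_env() {
    let hash = git_commit_hash().unwrap_or_else(|| "unknown".to_string());
    println!("{}", rustc_env_directive("GIT_COMMIT_HASH", &hash));
}
```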
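Caching `CircuitData` and `VerifierData` under separate keys means that, once both are cached, a verifier only ever loads the small artifact. The toy model below illustrates this, assuming a cache that records every key it has to read back. `Cache`, `circuit_data`, `verifier_data` and the string-valued artifacts are hypothetical stand-ins, not the crate's types.

```rust
use std::collections::HashMap;

/// Toy stand-in for the disk cache: every hit is logged in `reads`, so we can
/// see which artifacts a caller had to load and deserialize.
struct Cache {
    stored: HashMap<String, String>,
    reads: Vec<String>,
}

impl Cache {
    fn new() -> Self {
        Cache { stored: HashMap::new(), reads: Vec::new() }
    }

    fn get_or_build(&mut self, key: &str, build: impl FnOnce(&mut Cache) -> String) -> String {
        if let Some(value) = self.stored.get(key).cloned() {
            self.reads.push(key.to_string());
            return value;
        }
        let value = build(&mut *self);
        self.stored.insert(key.to_string(), value.clone());
        value
    }
}

/// Large artifact, only needed for proving.
fn circuit_data(cache: &mut Cache) -> String {
    cache.get_or_build("circuit_data", |_| "gates, generators and prover data".repeat(1000))
}

/// Small artifact derived from the circuit: all that verification needs.
fn verifier_data(cache: &mut Cache) -> String {
    cache.get_or_build("verifier_data", |c| {
        let circuit = circuit_data(c);
        format!("verifier key derived from {} bytes of circuit data", circuit.len())
    })
}
```

The first call to `verifier_data` builds both artifacts. Later calls read back only `"verifier_data"` and never touch the large entry.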
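One common way to keep a gate's id unique when its generic parameters change is to embed those parameters in the id string. A minimal sketch with a hypothetical `ExampleGate` and `id` method follows. The actual gates and the id scheme chosen in this commit are not shown here.

```rust
/// Hypothetical custom gate with a const generic parameter,
/// e.g. the number of operations packed into one row.
struct ExampleGate<const N: usize>;

impl<const N: usize> ExampleGate<N> {
    /// A fixed string such as "ExampleGate" would be shared by every `N`, so
    /// differently configured gates could not be told apart by id.
    /// Including `N` keeps the id unique per instantiation.
    fn id(&self) -> String {
        format!("ExampleGate<N={}>", N)
    }
}
```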

Summary of breaking changes:
- Exactly one of the features `mem_cache` and `disk_cache` must be enabled. The default is `mem_cache`.
  - To enable `disk_cache`, disable the default features, for example: `--no-default-features --features=backend_plonky2,zk,disk_cache`
- Removed `DEFAULT_PARAMS`, instead use `Params::default()`
- Removed `STANDARD_REC_MAIN_POD_CIRCUIT_DATA`, instead use `cache_get_standard_rec_main_pod_common_circuit_data`
- The library is now using `nightly-2025-07-02`, because some of the Rust language features it uses are unstable in earlier versions.
Eduard S. 2025-07-24 12:15:31 +02:00 committed by GitHub
parent 745d654048
commit 8429cd224d
GPG key ID: B5690EEEBB952194
35 changed files with 831 additions and 207 deletions

src/cache/mem.rs (new file, 83 lines)
```rust
use std::{
    any::Any,
    collections::HashMap,
    ops::Deref,
    sync::{LazyLock, Mutex},
    thread, time,
};

use serde::{de::DeserializeOwned, Serialize};
use sha2::{Digest, Sha256};

#[allow(clippy::type_complexity)]
static CACHE: LazyLock<Mutex<HashMap<String, Option<Box<dyn Any + Send + 'static>>>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

pub struct CacheEntry<T: 'static> {
    value: &'static T,
}

impl<T> Deref for CacheEntry<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.value
    }
}

/// Get the artifact named `name` from the memory cache. If it doesn't exist, it is built by
/// calling `build_fn` and stored.
/// The artifact is indexed by `params: P`.
///
/// `T: Sync` is required because the same `&'static T` is handed out to every thread that
/// calls `get`.
pub(crate) fn get<T: Serialize + DeserializeOwned + Send + Sync + 'static, P: Serialize>(
    name: &str,
    params: &P,
    build_fn: fn(&P) -> T,
) -> Result<CacheEntry<T>, Box<dyn std::error::Error>> {
    let params_json = serde_json::to_string(params)?;
    let params_json_hash = Sha256::digest(&params_json);
    let params_json_hash_str_long = format!("{:x}", params_json_hash);
    let key = format!("{}/{}", &params_json_hash_str_long[..32], name);
    log::debug!("getting {} from the mem cache", name);
    loop {
        let mut cache = CACHE.lock()?;
        if let Some(entry) = cache.get(&key) {
            if let Some(boxed_data) = entry {
                if let Some(data) = boxed_data.downcast_ref::<T>() {
                    log::debug!("found {} in the mem cache", name);
                    // The data is on the heap (boxed) and will never go away: a value is only
                    // inserted where there was no entry or a `None` placeholder, and entries are
                    // never deleted or overwritten once built. Since it's not going away, not
                    // moving, and CACHE is 'static, it's safe to extend the lifetime to 'static.
                    let data_static = unsafe { std::mem::transmute::<&T, &'static T>(data) };
                    return Ok(CacheEntry { value: data_static });
                } else {
                    panic!(
                        "type={} doesn't match the type in the cached boxed value with name={}",
                        std::any::type_name::<T>(),
                        name
                    );
                }
            } else {
                // Another thread is building this entry, let's retry in 100 ms
                drop(cache); // release the lock
                thread::sleep(time::Duration::from_millis(100));
                continue;
            }
        }
        // No entry in the cache: put a `None` to signal that we're building the artifact,
        // release the lock, build the artifact and insert it. We do this to avoid holding the
        // lock for a long time.
        // Note: if `build_fn` panics, the `None` placeholder is never replaced and other callers
        // waiting on this key keep retrying forever.
        cache.insert(key.clone(), None);
        drop(cache); // release the lock
        log::info!("building {} and storing to the mem cache", name);
        let start = std::time::Instant::now();
        let data = build_fn(params);
        log::debug!("built {} in {:?}", name, start.elapsed());
        CACHE.lock()?.insert(key, Some(Box::new(data)));
        // Call `get` again and this time we'll retrieve the data from the cache
        return get(name, params, build_fn);
    }
}
```
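The lifetime extension in `get` is sound only because a built entry is never removed or replaced. For comparison, the standard library can produce a `&'static T` without `unsafe` through `Box::leak`. The value is simply never freed, which fits a cache that never evicts. The sketch below shows that alternative under this assumption and is not the commit's code. `leak_get` and `leaked_values` are hypothetical, and unlike `get` this version holds the lock while building.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// A leaked, never-freed value that can be shared across threads.
type Entry = &'static (dyn Any + Send + Sync);

/// Process-wide map from key to leaked value (hypothetical).
fn leaked_values() -> &'static Mutex<HashMap<String, Entry>> {
    static VALUES: OnceLock<Mutex<HashMap<String, Entry>>> = OnceLock::new();
    VALUES.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns a `&'static T` for `key`, building the value on first use.
/// `Box::leak` hands back a `'static` reference without `unsafe`.
/// The lock is held while building, so a `build` that calls `leak_get`
/// itself would deadlock.
fn leak_get<T: Send + Sync + 'static>(key: &str, build: fn() -> T) -> &'static T {
    let mut map = leaked_values().lock().unwrap();
    let entry: Entry = *map.entry(key.to_string()).or_insert_with(|| {
        let leaked: &'static T = Box::leak(Box::new(build()));
        leaked as Entry
    });
    entry
        .downcast_ref::<T>()
        .expect("cached value has a different type than requested")
}
```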
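The `None` placeholder protocol in `get` (claim the key, release the lock, build, then publish) can also be written with a `Condvar`, so that waiting threads are woken when the artifact is ready instead of polling every 100 ms. The sketch below swaps in that technique. `BuildOnce` is hypothetical and returns clones rather than `'static` references, to keep the example free of `unsafe`.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

/// Hypothetical build-once cache. A `None` placeholder means "being built",
/// the lock is released while building, and the finished value replaces the
/// placeholder. Waiters block on a `Condvar` until notified.
struct BuildOnce<V> {
    entries: Mutex<HashMap<String, Option<V>>>,
    ready: Condvar,
}

impl<V: Clone> BuildOnce<V> {
    fn new() -> Self {
        BuildOnce {
            entries: Mutex::new(HashMap::new()),
            ready: Condvar::new(),
        }
    }

    fn get(&self, key: &str, build: impl FnOnce() -> V) -> V {
        let mut map = self.entries.lock().unwrap();
        loop {
            let state = map.get(key).cloned();
            match state {
                // Already built: return a copy.
                Some(Some(value)) => return value,
                // Another thread is building: wait for `notify_all`, then re-check
                // (the loop also handles spurious wakeups).
                Some(None) => map = self.ready.wait(map).unwrap(),
                // Nobody has claimed the key yet: claim it below.
                None => break,
            }
        }
        map.insert(key.to_string(), None);
        drop(map); // build without holding the lock
        let value = build();
        self.entries
            .lock()
            .unwrap()
            .insert(key.to_string(), Some(value.clone()));
        self.ready.notify_all();
        value
    }
}
```

As with the original, a panicking `build` leaves the placeholder in place and waiters are never woken.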