Foresight Ventures: What the heck is zkML?


> zkML is a bridge between AI and the blockchain. Its significance is to enable the blockchain to perceive the physical world, enable smart contracts to make decisions, and run AI models with privacy protection.

## **Summary**

* ZKML (zero-knowledge machine learning) is **a technology that applies zero-knowledge proofs to machine learning**. **ZKML is a bridge between AI and blockchain**. It addresses **privacy protection** for AI models and inputs and makes the **inference process verifiable**, so that **a small model, or a ZKP of an inference, can be put on-chain**. The significance of putting a model or an inference proof on-chain is:
  * **Let the blockchain perceive the physical world**. For example, a face recognition model running on the blockchain can perceive faces on the blockchain's behalf, so an on-chain AI model can tell that a face probably belongs to a woman, estimate its age, and so on.
  * **Enable smart contracts to make decisions**. For example, an on-chain WETH price prediction model can help smart contracts make trading decisions.
  * **Run AI models privately**. For example, an enterprise spends a lot of computing power training a model and wants to offer inference services in a privacy-preserving way, or a user wants their input kept private. ZKML can **guarantee the privacy of the model/input** and also prove to the user that the inference was carried out correctly, achieving **trustless inference**.
* Applications of ZKML
  * **AI on the chain**: Put the AI model or the AI inference proof on-chain so that smart contracts can use AI to make decisions. For example, an on-chain trading system that makes on-chain investment decisions.
  * **Self-improving blockchain**: Let the blockchain use AI to continuously improve and revise its strategies based on historical data.
For example, an AI-based on-chain reputation system.
  * **AIGC on-chain**: Content or artwork generated by AIGC is minted on-chain as an NFT. ZK can prove that the process was correct, that no copyrighted images were used in the data set, and so on.
  * **Biometric authentication (KYC) for wallets**: A proof of face recognition is put on-chain, and the wallet completes KYC.
  * **AI security**: Use AI for fraud detection, Sybil attack prevention, etc.
  * **On-chain ZKML games**: on-chain AI chess players, NFT characters driven by neural networks, etc.
* ZKML technically
  * **Goal: transform a neural network into a ZK circuit**. Difficulties: (1) ZK circuits do not support floating-point numbers; (2) a neural network that is too large is difficult to convert.
  * Current progress:
    * The earliest ZKML library appeared two years ago, so the history of the whole technology is very short. The latest ZKML libraries can **convert some simple neural networks into ZK circuits and apply them on the blockchain**. Reportedly, a basic linear regression model can be put on-chain, and other kinds of smaller neural network models can have their proofs put on-chain. However, I have seen very few demos, only handwritten digit recognition.
    * **Some tools claim to support 100M parameters, and some claim to convert GPT2 into a ZK circuit and generate a ZK proof.**
  * Directions of development:
    * **Network quantization**: convert the floating-point numbers in a neural network to fixed-point numbers and make the network lightweight (ZK-friendly).
    * Try to convert neural networks with **large-scale parameters into ZK circuits** and improve proving efficiency (expanding ZK capability).
* To summarize:
  * **ZKML is a bridge between AI and blockchain. Its significance is to enable the blockchain to perceive the physical world, enable smart contracts to make decisions, and run AI models with privacy protection.** It is a very promising technology.
  * The history of this technology is very short, but it is developing very fast. Some simple neural network models can already be converted into ZK circuits, so that the model or the inference proof can be put on-chain. Language models are harder. Ddkang/zkml currently claims that it can generate ZK versions of the GPT2, BERT and Diffusion models, but the actual results are unclear: it may run, but it may not be deployable on-chain. **I believe that with the development of network quantization, ZK technology, and blockchain scaling technology, ZKML for language models will soon become usable**.

## **1. Background**

(If you already know something about ZK and ML, you can skip this chapter.)

* **Zero-knowledge proof (ZK)**: A zero-knowledge proof lets a prover convince a verifier that a certain assertion is correct without giving the verifier any useful information.
ZK is mainly used to prove that a computation was performed correctly and to protect privacy.
  * **Proving the correctness of a computation**: Take ZK-rollup as an example. A ZK-rollup packages multiple transactions together, publishes them on L1, and at the same time publishes a proof (using zero-knowledge proof technology) asserting that these transactions are valid. Once the proof is verified on L1, the state of the ZK-rollup is updated.
  * **Privacy protection**: Take the Aztec protocol as an example. Assets on Aztec's zk.money exist as notes, similar to Bitcoin's UTXOs, and the amount of each note is encrypted. When a user transfers money, the existing note is destroyed and new notes are created for the payee and for the user (change). A zero-knowledge proof shows, without revealing the amounts, that the destroyed note and the newly created notes carry the same total amount and that the user has the right to control the note.
* **Machine learning**: Machine learning is a branch of artificial intelligence. Machine learning theory is mainly about designing and analyzing algorithms that allow computers to "learn" automatically. These algorithms extract patterns from data and use them to make predictions about unseen data. Machine learning is widely used in computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, games, and robotics.

## **2. What problem does ZKML solve?**

ZKML is an area of research and development that has caused a stir in the cryptography community over the past two years. It applies zero-knowledge proofs to machine learning, and **its main goal is to use zero-knowledge proofs to solve the privacy protection and verifiability problems of machine learning**.
In this way, a small model or a ZKP of an inference can be put on-chain, becoming a bridge between AI and blockchain:

* **Putting the model on-chain**: ML models can be converted into ZK circuits, and small ZKML models can be stored in a smart contract on the blockchain. Users use the model by calling the smart contract's methods. For example, Modulus Labs' RockyBot put an AI model on-chain to predict the price of WETH for trading decisions.
* **Putting the model's inference proof on-chain**: Convert the ML model into a ZK circuit, run inference off-chain, and generate a ZK proof that the inference was performed correctly. The inference result and the ZK proof are submitted to the chain, where callers can consult the result and a smart contract can verify the proof.

**What is the significance of putting the model or inference proof on-chain?**

* **Let the blockchain perceive the physical world**. For example, a face recognition model running on the blockchain can perceive faces on the blockchain's behalf, so an on-chain AI model can tell that a face probably belongs to a woman, estimate its age, and so on.
* **Enable smart contracts to make decisions**. For example, an on-chain WETH price prediction model can help smart contracts make trading decisions.
* **Run AI models privately**. For example, an enterprise spends a lot of computing power training a model and wants to offer inference services in a privacy-preserving way, or a user wants their input kept private. ZKML can **guarantee the privacy of the model/input** and also prove to the user that the inference was carried out correctly, achieving **trustless inference**.

**The role of zero-knowledge proofs in ZKML:**

**1. Privacy protection: protect the privacy of the ML model or of the input data during prediction.**

* **Data privacy (public model + private data)**: I have some sensitive data, such as medical data or face images.
I can use ZKML to protect the privacy of the input data, run a public neural network model on it, and get the result, for example with a face recognition model.
* **Model privacy (private model + public data)**: For example, I spent a lot of money training a model and do not want to expose it, so I need to protect the model's privacy. I can use ZKML to run the private neural network model on public input and produce an output without revealing the model.

**2. Verifiability: a ZKP proves that the ML inference was executed correctly, making the machine learning process verifiable.**

* Suppose the model does not run on my server, but I need to be sure the inference was executed correctly. ZKML runs the inference on an input and a model and produces an output together with a ZKP that the process was executed correctly. Even though the computation did not run on my machine, I can verify the ZKP and therefore trust the result.

## **3. Use Cases for ZKML**

**Computational integrity**

* **On-chain AI**: Deploy the AI model on the blockchain so that smart contracts gain decision-making capability through it.
  * Modulus Labs: RockyBot, an on-chain verifiable ML trading bot
* **Self-improving blockchain**: Let the blockchain use AI to continuously improve and correct its strategies based on historical data.
  * Enhance Lyra Finance's AMM with artificial intelligence.
  * Create an AI-based reputation system for Astraly.
  * Create smart-contract-level, AI-based compliance functions for the Aztec protocol.
  * Modulus Labs: Blockchains that self-improve
* **AIGC on-chain**: Content or artwork generated by AIGC is minted on-chain as an NFT. ZK can prove that the process was correct, that no copyrighted images were used in the data set, and so on.
* **ML as a Service (MLaaS) transparency**
* **AI security**: Use AI for fraud detection, Sybil attack prevention, etc.
An AI anomaly detection model is trained on smart contract data, the contract is suspended if an indicator is abnormal, and ZK is used to put a proof of the anomaly detection on-chain.
* **On-chain ZKML games**: on-chain AI chess players, NFT characters driven by neural networks, etc.
* **Verifiable AI model benchmarks**: Use ZK to provide proofs of model benchmark tests, making the reported performance and quality results verifiable.
* **Correctness proofs of model training**: Because model training is very resource-intensive, proving the correctness of training with ZK is not currently practical. However, many people think the technology is feasible and are trying to use ZK to prove that a model did or did not use certain data, which would address the copyright issues of AIGC.

**Privacy protection**

* **Biometric authentication / digital identity for wallets**
  * Worldcoin scans irises with its biometric device, the Orb, to give users a unique, verifiable digital identity. Worldcoin is working on ZKML and plans to use it to upgrade World ID. After the upgrade, users will be able to keep their signed biometrics in encrypted storage on their own mobile devices, download the ML model that generates the iris code, and create zero-knowledge proofs locally, proving that their iris code was indeed generated from the signed image using the correct model.
* **Blockchain-based machine learning bounty platform**
  * A company posts a bounty and provides public and private data. The public data is used to train the model and the private data is used for prediction. AI service providers train a model and convert it into a ZK circuit, then encrypt the model and submit it to the contract for verification. They then run predictions on the private data, obtain results, and generate ZK proofs, which are submitted to the contract for verification.
AI service providers receive the bounty after completing this series of operations.
  * zkML: Demo for circomlib-ml on Goerli testnet
* **Privacy-preserving inference**: For example, using private patient data for medical diagnosis and then sending the sensitive inference result (such as a cancer detection result) to the patient. (vCNN paper, page 2/16)

## **4. Layout of ZKML**

Judging from the ZKML landscape organized by SevenX Ventures:

* **Hardware acceleration**: Many organizations are actively developing ZKP hardware acceleration, which also benefits ZKML. FPGAs, GPUs, and ASIC chips are generally used to accelerate ZKP generation. For example, Accseal is developing ASIC chips for ZKP hardware acceleration, and Ingonyama is building ICICLE, a ZK acceleration library designed for CUDA-capable GPUs. Supranational focuses on GPU acceleration, while Cysic and Ulvetanna focus on FPGA acceleration.
* **Input**: To use on-chain data as input, Axiom, Herodotus, Hyper Oracle, and Lagrange improve users' access to blockchain data and provide more sophisticated views of on-chain data. ML input data can then be extracted from the imported historical data.
* **Inference**: Modulus Labs is developing a new zkSNARK system specifically for ZKML. This category can be merged with the ZKML toolset category; it mainly covers converting models to ZK and the tools needed in that process. Giza is a StarkNet-based machine learning platform that focuses on scaling fully on-chain model deployment.
* **Compute (decentralized training/computing power)**: Focus on building decentralized computing networks for training AI models that are accessible to everyone. They let people use edge computing resources to train AI models at lower cost.
* **ZKML toolset**: See Chapter 5, ZKML technology development history.
ZAMA in the figure mainly uses fully homomorphic encryption (FHE) for privacy protection in machine learning. Compared with ZKML, FHEML provides only privacy and no trustless verification.
* **Use cases**: Worldcoin uses ZKML for digital identity authentication. The user's signed biometrics are stored encrypted on the user's device, and a ZK-based iris recognition ML model runs during identification to check whether the biometrics match, with a ZKP proving that the run was correct. Modulus Labs built an on-chain AI trading bot. Cathie's EIP7007 is a zkML AIGC-NFT standard. There are also on-chain AI chess players, NFT characters driven by neural networks, etc.

![](https://img.gateio.im/social/moments-aa7e7524fb-a420519c30-dd1a6f-e5a980)

## **5. ZKML technology development history**

The main challenges in turning a neural network into a ZK circuit are:

1. Circuits require fixed-point operations, but neural networks make heavy use of floating-point numbers.
2. Model size: large models are difficult to convert and produce large circuits.

The development history of ZKML libraries is as follows:

1. 2021, zk-ml/linear-regression-demo, Peiyuan Liao
   * Implemented a linear regression circuit. Linear regression is a very basic prediction algorithm that assumes a linear relationship between the output variable and the input variables. It is suitable for predicting numerical variables and for studying the relationship between two or more variables, for example predicting house prices from house size and other features, or predicting future sales from historical sales data.
2. 2022, 0xZKML/zk-mnist, 0xZKML
   * Built a neural network ZK circuit based on the MNIST data set that recognizes handwritten digits. For example, a handwritten 2 is recognized as 2, and a proof of the inference process is generated.
The proof can be put on-chain and verified there with ethers + snarkjs.
   * In fact, the zk-mnist library currently converts only the last layer into a circuit, not the complete neural network.
3. 2022, socathie/zkML, Cathie
   * Compared with zk-mnist, zkML converts a complete neural network into a circuit. Cathie's zkMachineLearning provides several ZKML toolkits, including circomlib-ml and keras2circom, to help ML engineers convert models into circuits.
4. November 2022, zk-ml/uchikoma, Peiyuan Liao
   * Converts floating-point operations in neural networks to fixed-point operations. Created and open-sourced a general tool and framework that converts almost any machine learning algorithm into a zero-knowledge proof circuit that integrates easily with blockchains:
     * Vision models -> AIGC
     * Language models -> chatbots, writing assistants
     * Linear models and decision trees -> fraud detection, Sybil attack prevention
     * Multimodal models -> recommender systems
   * Trained a blockchain-friendly content generation ML model (AIGC) and converted it into a ZK circuit. **It is used to generate artwork, generate succinct ZK proofs, and finally mint the artwork as an NFT**.
5. July 2022, updated March 2023, zkonduit/ezkl
   * ezkl is a library and command-line tool for doing inference on deep learning models and other computational graphs in a zk-SNARK (ZKML). It uses Halo2 as its proof system.
   * You can define a computational graph, such as a neural network, and then use ezkl to generate a ZK-SNARK circuit. The ZKPs generated for inference can be verified by smart contracts.
   * It reportedly supports models with 100M parameters, though this may consume a lot of resources.
6. May 2023, Ddkang/zkml
   * zkml claims to convert GPT2, BERT and Diffusion models into ZK circuits.
However, it may use a lot of memory, and it is not clear whether the proof can be stored in a smart contract.
   * zkml can verify the execution of a model that reaches **92.4% accuracy** on ImageNet, and can also prove an MNIST model with 99% accuracy **in four seconds**.
7. May 2023, zkp-gravity/0g
   * A lightweight neural network, supporting private data + public model.

**In general, the current directions of exploration in ZKML technology are:**

1. **Network quantization**: convert the floating-point numbers in a neural network to fixed-point numbers and make the network lightweight (ZK-friendly).
2. Try to convert neural networks with **large-scale parameters into ZK circuits** and improve proving efficiency (expanding ZK capability).

## **6. Summary**

1. **ZKML is a bridge between AI and blockchain**. Its significance is to **enable the blockchain to perceive the physical world, enable smart contracts to make decisions, and run AI models with privacy protection**. It is a very promising technology.
2. The history of ZKML is very short, and it is developing very fast. Some simple neural network models can already be converted into ZK circuits, and either the model or the inference proof can be put on-chain. Language models are harder. Ddkang/zkml currently claims to be able to generate ZK versions of the GPT2, BERT and Diffusion models. **I believe that with the development of network quantization, ZK technology, and blockchain scaling technology, ZKML for language models will soon become usable**.
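
To make the network quantization direction from Chapter 5 more concrete, here is a minimal sketch of how a linear regression model might be moved from floating-point to fixed-point integer arithmetic, which is the form a ZK circuit can work with. It is an illustration under stated assumptions, not the method of any library named above: the 16-bit fractional scale and the names `FRACTION_BITS`, `quantize`, `dequantize`, and `predict_fixed` are all hypothetical.

```python
# Minimal sketch: fixed-point quantization for a linear regression model.
# FRACTION_BITS and all function names are hypothetical, chosen for
# illustration; they are not taken from any ZKML library.

FRACTION_BITS = 16
SCALE = 1 << FRACTION_BITS  # one integer unit represents 1/65536


def quantize(value: float) -> int:
    """Encode a float as the nearest fixed-point integer."""
    return round(value * SCALE)


def dequantize(encoded: int, scale_power: int = 1) -> float:
    """Decode an integer that carries SCALE**scale_power back to a float."""
    return encoded / (SCALE ** scale_power)


def predict_fixed(weights_q: list[int], bias_q: int, inputs_q: list[int]) -> int:
    """Integer-only linear regression: bias + sum(w * x).

    Each product w * x carries SCALE**2, so the bias is multiplied by SCALE
    to put it on the same scale before it is added.
    """
    accumulator = bias_q * SCALE
    for weight, feature in zip(weights_q, inputs_q):
        accumulator += weight * feature
    return accumulator


# Example model and input (made-up values for illustration).
weights = [0.75, -1.5, 2.25]
bias = 0.5
inputs = [1.0, 2.0, 3.0]

float_prediction = bias + sum(w * x for w, x in zip(weights, inputs))
fixed_prediction = dequantize(
    predict_fixed(
        [quantize(w) for w in weights],
        quantize(bias),
        [quantize(x) for x in inputs],
    ),
    scale_power=2,
)

print(float_prediction, fixed_prediction)
```

Only additions and multiplications on integers remain inside `predict_fixed`, which is why quantized models are described as ZK-friendly. A real circuit would additionally have to encode these integers as finite-field elements and handle negative values and rescaling between layers, none of which this sketch attempts.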