Bachelor/Master Thesis

Refereed Delegation for Machine Learning

Training machine learning models can be expensive in computational and memory resources. A common solution is to delegate tasks such as training, fine-tuning, or inference to external compute providers [2, 3]. This is usually done without any correctness guarantees: dishonest servers might return incorrect outputs, and given the high cost of these computations, there is a strong incentive for such behavior. Proposed solutions include cryptographic proof systems [4] and heuristics [5]; however, these have been found to be either inefficient or vulnerable to attacks [6].

A recent solution [1], inspired by the Byzantine fault tolerance literature, involves using referees. The client delegates the task to multiple compute providers; if they return conflicting outputs, a trusted third party (the referee) is invoked to resolve the dispute. This approach is more efficient, but the nature of machine learning programs raises new challenges: neural networks have very large program states, and their reliance on floating-point operations makes bitwise reproducibility across hardware setups difficult to guarantee.
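The dispute-resolution idea above can be sketched in a few lines. This is a minimal illustration, not the protocol from [1]: the function names (delegate, honest, dishonest) are invented for the example, the tasks are assumed deterministic, and the referee is assumed honest. A short demonstration of floating-point non-associativity, one root cause of the reproducibility problem, follows in the same snippet.

```python
def delegate(task, providers, referee):
    """Send the task to several providers; if their outputs disagree,
    ask the trusted referee to recompute and settle the dispute."""
    outputs = [provider(task) for provider in providers]
    if all(out == outputs[0] for out in outputs):
        return outputs[0]   # unanimous: accept without involving the referee
    return referee(task)    # conflict: referee recomputes and decides

# Toy "computation": squaring a number.
honest = lambda x: x * x
dishonest = lambda x: 0     # always returns a bogus result

print(delegate(7, [honest, honest], referee=honest))     # prints 49
print(delegate(7, [honest, dishonest], referee=honest))  # prints 49

# Floating-point addition is not associative, so reorderings introduced by
# different hardware or parallelization can change results bit-for-bit:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # prints False
```

Note that equality comparison of outputs only makes sense here because the toy task is exact integer arithmetic; for real training runs, deciding when two floating-point outputs "agree" is exactly one of the open problems this project concerns.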

In this project, we will survey the literature on delegating machine learning tasks to external compute providers, evaluate the refereed-delegation approach [1], and investigate improvements to existing solutions. Experience training machine learning models is preferred.

References

[1] Verde: Verification via Refereed Delegation for Machine Learning Programs

[2] Gensyn

[3] CoreWeave

[4] zkLLM: Zero Knowledge Proofs for Large Language Models

[5] Proof-of-Learning: Definitions and Practice

[6] Proof-of-Learning is Currently More Broken Than You Think

Contact Michael Senn and Juan Villacis for more information.

Nature of the project: Theory 50%, Systems 50%.