Markus Roggenbach: An Empirical Study on Generating Proven-to-be-Correct Dafny Programs

The speaker for the next week will be our own Markus.

Title: An Empirical Study on Generating Proven-to-be-Correct Dafny Programs

Abstract:

Generative AI (GenAI) for software production is, if not already in use, at least considered to be a viable option. This poses the question: can GenAI reliably produce software that meets requirements? We sharpen this question and ask further: can GenAI reliably produce software together with ‘proof instructions’ that, together, allow for a fully automated correctness proof?

To investigate this question, we are presenting an empirical study using the Dafny specification, programming, and verification framework. In Dafny, methods can be specified using a contract. Dafny methods are implemented using imperative programming. Dafny offers automatic verification, which can be supported by code annotations, e.g., in the form of loop invariants. Using textbook programming exercises, we systematically experiment with GenAI to establish if it can produce software that Dafny can automatically verify.

This is a test talk for a workshop presentation. The research presented has been conducted by Liam O’Reilly [0000-0002-4894-2158], Filippos Pantekis [0000-0001-7817-6450], Alma Rahat [0000-0002-5023-1371], Markus Roggenbach [0000-0002-3819-2787], and Mukesh Tiwari [0000-0001-5373-9659]