Enabling Bitwise Reproducibility for the Unstructured Computational Motif

Bibliographic Details
Main Authors: Siklósi Bálint
Mudalige Gihan R.
Reguly Istvan Z.
Format: Article
Published: 2024
Series: APPLIED SCIENCES-BASEL 14 No. 2
Subjects:
mtmt:34617187
Online Access: https://publikacio.ppke.hu/1848
Description
Summary: In this paper we identify the causes of numerical non-reproducibility in the unstructured mesh computational motif, a class of algorithms commonly used for the solution of PDEs. We introduce a number of parallel and distributed algorithms to address nondeterminism in the order of floating-point computations, in particular a new graph coloring scheme that produces identical coloring results regardless of how many parts the graph is partitioned into. We implement these algorithms in the OP2 domain-specific language (DSL) and show how they can be automatically deployed to any application that uses OP2 without user intervention. We contrast the differences in results obtained without reproducibility and then demonstrate how bitwise reproducibility can be gained using our methods on a variety of applications, including a production CFD application used at Rolls-Royce. We evaluate the performance and overheads of enforcing bitwise reproducibility on a cluster of CPUs and GPUs.
ISSN: 2076-3417
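
Illustration: The summary attributes non-reproducibility to nondeterminism in the order of floating-point computations. The following minimal Python sketch shows that root cause only and is not the paper's algorithm or the OP2 API. All names (`accumulate`, `reproducible_accumulate`, the global-ID tagging) are hypothetical and chosen for illustration. It assumes each contribution to a mesh node carries a globally unique ID that does not depend on the partitioning, so that sorting by this ID fixes the summation order.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE 754 double-precision bit pattern of x as hex."""
    return struct.pack(">d", x).hex()


def accumulate(values, order):
    """Sum values in the given order, as a parallel run might happen to do."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def reproducible_accumulate(tagged):
    """Sum (global_id, value) pairs in ascending global_id order.

    Assumption: global_id is unique and independent of partitioning,
    so the order of additions is the same in every run.
    """
    total = 0.0
    for _, value in sorted(tagged, key=lambda pair: pair[0]):
        total += value
    return total


# Three edge contributions to one mesh node (illustrative values).
contributions = [0.1, 0.2, 0.3]

# Floating-point addition is not associative: the order changes the bits.
sum_forward = accumulate(contributions, [0, 1, 2])
sum_reverse = accumulate(contributions, [2, 1, 0])
print(bits(sum_forward), bits(sum_reverse), sum_forward == sum_reverse)

# Tagging each contribution with a global ID and sorting before summing
# yields the same bits regardless of the arrival order.
arrival_a = [(0, 0.1), (1, 0.2), (2, 0.3)]
arrival_b = [(2, 0.3), (0, 0.1), (1, 0.2)]
fixed_a = reproducible_accumulate(arrival_a)
fixed_b = reproducible_accumulate(arrival_b)
print(bits(fixed_a) == bits(fixed_b))
```

Sorting every contribution is the simplest way to pin the order but is not necessarily efficient. The summary indicates that the paper instead relies on, among other techniques, a partition-independent graph coloring scheme.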