PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Published online by Cambridge University Press:  13 July 2016

JOST BERTHOLD*
Affiliation:
Dept. of Computer Science (DIKU), University of Copenhagen, Commonwealth Bank of Australia, Sydney (e-mail: [email protected])
HANS-WOLFGANG LOIDL
Affiliation:
School of Mathematical and Computer Sciences, Heriot-Watt University (e-mail: [email protected])
KEVIN HAMMOND
Affiliation:
School of Computer Science, University of St Andrews (e-mail: [email protected])
*Corresponding author. Reported work performed while at the University of Copenhagen (DIKU).

Abstract

Over time, several competing approaches to parallel Haskell programming have emerged. These approaches support parallelism at different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared-memory model at the programmer, implementation and hardware levels. This assumption is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads, for example in the form of garbage collection synchronisation. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing (PAEAN) runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies and extends the existing, well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PAEAN abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.

Type
Articles
Copyright
Copyright © Cambridge University Press 2016 
