A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation

ARIYAM DAS; CARLO ZANIOLO

doi:10.1017/S1471068419000358

Published online by Cambridge University Press: 20 September 2019


ARIYAM DAS: Department of Computer Science, University of California, Los Angeles, USA
CARLO ZANIOLO: Department of Computer Science, University of California, Los Angeles, USA

Abstract

A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog and other logic-based languages once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability (${\cal P}$reM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, ${\cal P}$reM can also facilitate and improve their parallel execution. We prove that ${\cal P}$reM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single executor programs. Therefore, ${\cal P}$reM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with ${\cal P}$reM under this relaxed synchronization model, we present experimental evidence of its benefits.
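As an informal illustration of the equivalence the abstract refers to, the following Python sketch compares two evaluations of a shortest-path query: one that derives every path cost and applies `min` in a final stratum, and one that applies `min` inside the semi-naive loop. This is only a sketch under stated assumptions, not the authors' implementation: the edge list, the function names, and the choice of shortest paths as the example query are all hypothetical, and the graph is assumed to be acyclic with positive costs so that the stratified version terminates.

```python
# Hypothetical weighted acyclic graph: (source, target, cost).
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]

INF = float("inf")


def stratified_shortest_paths(edges):
    """Aggregate-stratified plan: derive all path facts, then apply min once."""
    all_paths = set(edges)
    delta = set(edges)
    while delta:  # semi-naive: only extend facts derived in the last round
        new = set()
        for x, z, c1 in delta:
            for z2, y, c2 in edges:
                if z == z2:
                    fact = (x, y, c1 + c2)
                    if fact not in all_paths:
                        new.add(fact)
        all_paths |= new
        delta = new
    best = {}
    for x, y, c in all_paths:  # final stratum: min over each (x, y) group
        best[(x, y)] = min(c, best.get((x, y), INF))
    return best


def min_in_recursion_shortest_paths(edges):
    """min applied inside the recursion: keep only the best cost per pair."""
    best = {}
    delta = {}
    for x, y, c in edges:
        if c < best.get((x, y), INF):
            best[(x, y)] = c
            delta[(x, y)] = c
    while delta:  # semi-naive: only extend pairs whose cost just improved
        new_delta = {}
        for (x, z), c1 in delta.items():
            for z2, y, c2 in edges:
                if z == z2:
                    cost = c1 + c2
                    if cost < best.get((x, y), INF):
                        best[(x, y)] = cost
                        new_delta[(x, y)] = cost
        delta = new_delta
    return best


if __name__ == "__main__":
    # Both plans should agree on this small graph.
    print(stratified_shortest_paths(EDGES) == min_in_recursion_shortest_paths(EDGES))
```

The second function stores far fewer intermediate facts because non-minimal costs are discarded as soon as they are derived, which is the kind of operational saving that pushing an aggregate into recursion is meant to provide.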

Keywords

Datalog; Deductive Databases; Recursive Query; Stale Synchronous Parallel Model; Bulk Synchronous Parallel Model; Parallel and Distributed Computing

Type: Original Article
Information: Theory and Practice of Logic Programming, Volume 19, Special Issue 5-6: 35th International Conference on Logic Programming, September 2019, pp. 1056–1072

DOI: https://doi.org/10.1017/S1471068419000358
Copyright: © Cambridge University Press 2019

Das and Zaniolo supplementary material

Appendix
