A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation

Published online by Cambridge University Press: 20 September 2019

ARIYAM DAS
Affiliation:
Department of Computer Science, University of California, Los Angeles, USA
CARLO ZANIOLO
Affiliation:
Department of Computer Science, University of California, Los Angeles, USA

Abstract

A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog, and other Logic-based languages, once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability ($\mathcal{P}$reM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, $\mathcal{P}$reM can also facilitate and improve their parallel execution. We prove that $\mathcal{P}$reM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single executor programs. Therefore, $\mathcal{P}$reM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with $\mathcal{P}$reM under this relaxed synchronization model, we present experimental evidence of its benefits.
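
As an informal illustration of the equivalence the abstract describes, the following minimal sketch compares an aggregate-stratified evaluation (derive every path cost, then take the minimum) with a semi-naive evaluation in which the min aggregate is applied inside the recursion. The shortest-path query, the function names `stratified_shortest_paths` and `prem_seminaive_shortest_paths`, and the tiny acyclic edge list are all assumptions chosen for this example; they are not taken from the paper, and the code is a single-executor sketch rather than the authors' parallel implementation.

```python
from collections import defaultdict

INF = float("inf")

def stratified_shortest_paths(edges):
    """Stratified form: compute all path facts to a fixpoint, then aggregate.

    Terminates only on acyclic inputs, because every distinct path cost
    is materialized before min is applied.
    """
    paths = set(edges)  # facts (src, dst, cost)
    while True:
        derived = {
            (x, z, c1 + c2)
            for (x, y, c1) in paths
            for (y2, z, c2) in edges
            if y == y2
        }
        new = derived - paths
        if not new:
            break
        paths |= new
    best = {}
    for x, y, c in paths:
        if c < best.get((x, y), INF):
            best[(x, y)] = c
    return best

def prem_seminaive_shortest_paths(edges):
    """Semi-naive evaluation with min pushed into the recursion.

    Only facts that improve the current minimum enter the next delta.
    """
    out = defaultdict(list)  # adjacency list: src -> [(dst, cost)]
    best = {}
    for x, y, c in edges:
        out[x].append((y, c))
        if c < best.get((x, y), INF):
            best[(x, y)] = c
    delta = dict(best)
    while delta:
        new_delta = {}
        for (x, y), c in delta.items():
            for z, w in out[y]:
                cand = c + w
                current = min(best.get((x, z), INF), new_delta.get((x, z), INF))
                if cand < current:
                    new_delta[(x, z)] = cand
        best.update(new_delta)  # every entry in new_delta is an improvement
        delta = new_delta
    return best

# Hypothetical example graph (acyclic, positive weights).
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]
```

On this input both functions return the same minimum cost for every reachable pair, which is the behaviour the equivalence claim leads one to expect; the second version simply discards non-minimal facts early instead of materializing them.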

Type
Original Article
Copyright
© Cambridge University Press 2019 

Supplementary material: Das and Zaniolo supplementary material (Appendix, PDF, 42.8 KB)