Hostname: page-component-cd9895bd7-dzt6s Total loading time: 0 Render date: 2024-12-25T05:57:42.216Z Has data issue: false hasContentIssue false

A datalog-based computational model for coordination-free, data-parallel systems

Published online by Cambridge University Press:  05 September 2018

MATTEO INTERLANDI
Affiliation:
Microsoft (e-mail: [email protected])
LETIZIA TANCA
Affiliation:
Politecnico di Milano (e-mail: [email protected])
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Cloud computing refers to maximizing efficiency by sharing computational and storage resources, while data-parallel systems exploit the resources available in the cloud to perform parallel transformations over large amounts of data. In the same line, considerable emphasis has been recently given to two apparently disjoint research topics: data-parallel, and eventually consistent, distributed systems. Declarative networking has been recently proposed to ease the task of programming in the cloud, by allowing the programmer to express only the desired result and leave the implementation details to the responsibility of the run-time system. In this context, we deem it appropriate to propose a study on a logic-programming-based computational model for eventually consistent, data-parallel systems, the keystone of which is provided by the recent finding that the class of programs that can be computed in an eventually consistent, coordination-free way is that of monotonic programs. This principle is called Consistency and Logical Monotonicity (CALM) and has been proven by Ameloot et al. for distributed, asynchronous settings. We advocate that CALM should be employed as a basic theoretical tool also for data-parallel systems, wherein computation usually proceeds synchronously in rounds and where communication is assumed to be reliable. We deem this problem relevant and interesting, especially for what concerns parallel dataflow optimizations. Nowadays, we are in fact witnessing an increasing concern about understanding which properties distinguish synchronous from asynchronous parallel processing, and when the latter can replace the former. It is general opinion that coordination-freedom can be seen as a major discriminant factor. In this work, we make the case that the current form of CALM does not hold in general for data-parallel systems, and show how, using novel techniques, the satisfiability of the CALM principle can still be obtained although just for the subclass of programs called connected monotonic queries. We complete the study with considerations on the relationships between our model and the one employed by Ameloot et al., showing that our techniques subsume the latter when the synchronization constraints imposed on the system are loosened.

Type
Original Article
Copyright
Copyright © Cambridge University Press 2018 

Footnotes

*

Work partially done while at University of California, Los Angeles.

References

Abiteboul, S., Hull, R. and Vianu, V. 1995. Foundations of Databases. Addison-Wesley, Boston, MA, USA.Google Scholar
Abiteboul, S., Vianu, V., Fordham, B. and Yesha, Y. 2000. Relational transducers for electronic commerce. Journal of Computer and System Sciences 61, 2, 236269.Google Scholar
Afrati, F., Cosmadakis, S. S. and Yannakakis, M. 1995. On datalog vs. polynomial time. Journal of Computer and System Sciences 51, 2, 177196.Google Scholar
Afrati, F. N., Borkar, V., Carey, M., Polyzotis, N. and Ullman, J. D. 2011. Map-reduce extensions and recursive queries. In Proc. of the 14th International Conference on Extending Database Technology, EDBT/ICDT '11. ACM, New York, NY, USA, 1–8.Google Scholar
Afrati, F. N. and Ullman, J. D. 2010. Optimizing joins in a map-reduce environment. In Proc. of the 13th International Conference on Extending Database Technology, EDBT '10. ACM, New York, NY, USA, 99–110.Google Scholar
Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J.-C., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M., Schelter, S., Höger, M., Tzoumas, K. and Warneke, D. 2014. The stratosphere platform for big data analytics. International Journal on Very Large Data Bases, 23, 939964.Google Scholar
Alvaro, P., Conway, N., Hellerstein, J. and Marczak, W. R. 2011. Consistency analysis in bloom: A calm and collected approach. In Proc. Conference on Innovative Data Systems Research CIDR, 249–260.Google Scholar
Alvaro, P., Conway, N., Hellerstein, J. M. and Maier, D. 2014. Blazes: Coordination analysis for distributed programs. In To appear in Proc. of the IEEE 30th International Conference on Data Engineering, ICDE '14. IEEE Computer Society, Washington, DC, USA.Google Scholar
Ameloot, T. J. 2014. Declarative networking: Recent theoretical work on coordination, correctness, and declarative semantics. SIGMOD Record 43, 2, 516.Google Scholar
Ameloot, T. J., Geck, G., Ketsman, B., Neven, F. and Schwentick, T. 2015. Parallel-correctness and transferability for conjunctive queries. In Proc. of ACM Symposium on Principles of Database Systems.Google Scholar
Ameloot, T. J., Ketsman, B., Neven, F. and Zinn, D. 2014. Weaker forms of monotonicity for declarative networking: A more fine-grained answer to the calm-conjecture. In Proc. of ACM Symposium on Principles of Database Systems. ACM, 64–75.Google Scholar
Ameloot, T. J., Ketsman, B., Neven, F. and Zinn, D. 2015. Datalog queries distributing over components. In Proc. of International Conference on Database Theory ICDT, 308–323.Google Scholar
Ameloot, T. J., Neven, F. and Van Den Bussche, J. 2013. Relational transducers for declarative networking. Journal of the ACM 60, 2, 15:115:38.Google Scholar
Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A. and Zaharia, M. 2015. Spark sql: Relational data processing in spark. In Proc. of the Special Interest Group on Management of Data SIGMOD, 1383–1394.Google Scholar
Babaoğlu, O. and Marzullo, K. 1993. Consistent global states of distributed systems: fundamental concepts and mechanisms. In Distributed systems, 2nd ed. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 5596.Google Scholar
Beeri, C., Naqvi, S. A., Ramakrishnan, R., Shmueli, O. and Tsur, S. 1987. Sets and negation in a logic data base language (ldl1). In Proc. of ACM Symposium on Principles of Database Systems. ACM, 21–37.Google Scholar
Ben-Zvi, I. and Moses, Y. 2010. Beyond lamport's happened-before: On the role of time bounds in synchronous systems. In DISC., Vol. 6343. Lynch, N. A. and Shvartsman, A. A., Eds. Lecture Notes in Computer Science. Springer, 421436.Google Scholar
Ben-Zvi, I. and Moses, Y. 2011. On interactive knowledge with bounded communication. Journal of Applied Non-Classical Logics 21, 3–4, 323354.Google Scholar
Ben-Zvi, I. and Moses, Y. 2014. Beyond lamport's happened-before: On time bounds and the ordering of events in distributed systems. Journal of the ACM 61, 2, 13:1–13:26.Google Scholar
Borkar, V., Carey, M., Grover, R., Onose, N. and Vernica, R. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proc. of the International Conference on Data Engineering, ICDE, 1151–1162.Google Scholar
Brewer, E. A. 2000. Towards robust distributed systems. In Proc. of the 19th Annual ACM Symposium on Principles of Distributed Computing, PODC '00. ACM, New York, NY, USA, 7–.Google Scholar
Calì, A., Gottlob, G., Lukasiewicz, T. and Pieris, A. 2011. Datalog+-: A family of languages for ontology querying. In Proc. of Datalog Reloaded – 1st International Workshop, Datalog 2010, Oxford, UK, March 16–19, 2010. Revised Selected Papers, 351–368.Google Scholar
Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems 3, 1, 6375.Google Scholar
Condie, T., Conway, N., Alvaro, P., Hellerstein, J. M., Elmeleegy, K. and Sears, R. 2010. Mapreduce online. In NSDI. USENIX Association. 313328.Google Scholar
Cui, H., Cipar, J., Ho, Q., Kim, J. K., Lee, S., Kumar, A., Wei, J., Dai, W., Ganger, G. R., Gibbons, P. B., Gibson, G. A. and Xing, E. P. 2014. Exploiting bounded staleness to speed up big data analytics. In Proc. of the USENIX Conference on USENIX Annual Technical Conference, USENIX ATC'14. USENIX Association, Berkeley, CA, USA, 37–48.Google Scholar
Dean, J. and Ghemawat, S. 2008. Mapreduce: Simplified data processing on large clusters. CACM 51, 1, 107113.Google Scholar
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P. and Vogels, W. 2007. Dynamo: Amazon's highly available key-value store. SIGOPS Operating Systems Review 41, 6, 205220.Google Scholar
Fagin, R., Halpern, J. Y., Moses, Y. and Vardi, M. Y. 2003. Reasoning About Knowledge. MIT Press, Cambridge, MA, USA.Google Scholar
Foster, I. 1995. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.Google Scholar
Furche, T., Gottlob, G., Grasso, G., Guo, X., Orsi, G., Schallhart, C. and Wang, C. 2014. DIADEM: Thousands of websites to a single database. PVLDB 7, 14, 18451856.Google Scholar
Gaifman, H., Mairson, H., Sagiv, Y. and Vardi, M. Y. 1993. Undecidable optimization problems for database logic programs. Journal of the ACM 40, 3, 683713.Google Scholar
Guessarian, I. 1990. Deciding boundedness for uniformly connected datalog programs. In International Conference on Database Theory ICDT, Vol. 470, Abiteboul, S. and Kanellakis, P., Eds. Lecture Notes in Computer Science, 395405.Google Scholar
Han, M. and Daudjee, K. 2015. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. VLDB 8, 9, 950961.Google Scholar
Hellerstein, J. M. 2010. The declarative imperative: Experiences and conjectures in distributed logic. SIGMOD Record 39, 519.Google Scholar
Hunt, P., Konar, M., Junqueira, F. P. and Reed, B. 2010. Zookeeper: Wait-free coordination for internet-scale systems. In Proc. of the USENIX Conference on USENIX Annual Technical Conference, USENIXATC'10. USENIX Association, Berkeley, CA, USA, 11–11.Google Scholar
Interlandi, M. and Tanca, L. 2015. On the CALM principle for BSP computation. In Proc. of the Alberto Mendelzon International Workshop on Foundations of Data Management.Google Scholar
Interlandi, M. and Tanca, L. 2017. On the CALM principle for bulk synchronous parallel computation. CoRR abs/1405.7264.Google Scholar
Interlandi, M., Tanca, L. and Bergamaschi, S. 2013. Datalog in time and space, synchronously. In Proc. of the Alberto Mendelzon International Workshop on Foundations of Data Management.Google Scholar
Kindler, E. 1994. Safety and liveness properties: A survey. Bulletin of the European Association for Theoretical Computer Science 53, 268272.Google Scholar
Koutris, P. and Suciu, D. 2011. Parallel evaluation of conjunctive queries. In Proc. of ACM Symposium on Principles of Database Systems, PODS '11. ACM, New York, NY, USA, 223–234.Google Scholar
Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7, 558565.Google Scholar
Lamport, L. 1984. Using time instead of timeout for fault-tolerant distributed systems. ACM Transactions on Programming Languages and Systems 6, 2, 254280.Google Scholar
Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N. and Czajkowski, G. 2010. Pregel: A system for large-scale graph processing. In Proc. of the ACM SIGMOD International Conference on Management of data, SIGMOD '10. ACM, New York, NY, USA, 135–146.Google Scholar
Mazuran, M., Serra, E. and Zaniolo, C. 2013. Extending the power of datalog recursion. VLDBJ 22, 4, 471493.Google Scholar
Mumick, I. and Shmueli, O. 1995. How expressive is stratified aggregation? Annals of Mathematics and Artificial Intelligence 15, 3–4, 407435.Google Scholar
Niu, F., Recht, B., Re, C. and Wright, S. J. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Proc. of the 24th International Conference on Neural Information Processing Systems, NIPS'11. Curran Associates Inc., USA, 693–701.Google Scholar
Olston, C., Reed, B., Srivastava, U., Kumar, R. and Tomkins, A. 2008. Pig latin: A not-so-foreign language for data processing. In Proc. of the Special Interest Group on Management of Data, SIGMOD. ACM, 1099–1110.Google Scholar
Ramakrishnan, R., Beeri, C. and Krishnamurthy, R. 1988. Optimizing existential datalog queries. In Proc. of Symposium on Principles of Database Systems. ACM, 89–102.Google Scholar
Ramakrishnan, R. and Ullman, J. D. 1995. A survey of deductive database systems. Journal of Logical Programming 23, 2, 125149.Google Scholar
Ross, K. A. and Sagiv, Y. 1997. Monotonic aggregation in deductive databases. Journal of Computer and System Sciences 54, 1, 7997.Google Scholar
Seib, J. and Lausen, G. 1991. Parallelizing datalog programs by generalized pivoting. In Proc. of ACM Symposium on Principles of Database Systems, 241–251.Google Scholar
Shkapsky, A., Yang, M., Interlandi, M., Chiu, H., Condie, T. and Zaniolo, C. 2016. Big data analytics with datalog queries on spark. In Proc. of the International Conference on Management of Data, SIGMOD '16. ACM, New York, NY, USA, 1135–1149.Google Scholar
Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P. and Murthy, R. 2009. Hive: A warehousing solution over a map-reduce framework. VLDB 2, 2, 16261629.Google Scholar
Valiant, L. G. 1990. A bridging model for parallel computation. CACM 33, 8, 103111.Google Scholar
Vogels, W. 2009. Eventually consistent. Communications of the ACM 52, 1, 4044.Google Scholar
Wolfson, O. and Ozeri, A. 1990. A new paradigm for parallel and distributed rule-processing. In Proc. of the Special Interest Group on Management of Data SIGMOD, 133–142.Google Scholar
Wolfson, O. and Silberschatz, A. 1988. Distributed processing of logic programs. In Proc. of the Special Interest Group on Management of Data SIGMOD, 329–336.Google Scholar
Xie, C., Chen, R., Guan, H., Zang, B. and Chen, H. 2015. Sync or async: Time to fuse for distributed graph-parallel computation. In Proc. of the Principles and Practice of Parallel Programming PPoPP, 194–204.Google Scholar
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S. and Stoica, I. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of the NSDI.Google Scholar
Zaniolo, C., Yang, M., Das, A. and Interlandi, M. 2016. The magic of pushing extrema into recursion: Simple, powerful datalog programs. In Proc. of the 10th Alberto Mendelzon International Workshop on Foundations of Data Management, Panama City, Panama, May 8–10, 2016.Google Scholar
Zaniolo, C., Yang, M., Das, A., Shkapsky, A., Condie, T. and Interlandi, M. 2017. Fixpoint semantics and optimization of recursive datalog programs with aggregates. TPLP 17, 5–6, 10481065.Google Scholar
Zaniolo, C., Yang, M., Interlandi, M., Das, A., Shkapsky, A. and Condie, T. 2018. Declarative bigdata algorithms via aggregates and relational database dependencies. In Proc. of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, Cali, Colombia, May 21–25, 2018.Google Scholar
Zinn, D., Green, T. J. and Ludäscher, B. 2012. Win-move is coordination-free (sometimes). In International Conference on Database Theory, ICDT. ACM, 99–113.Google Scholar