Storing massive Resource Description Framework (RDF) data: a survey

Published online by Cambridge University Press:  07 December 2016

Zongmin Ma
Affiliation:
College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
Miriam A. M. Capretz
Affiliation:
Department of Electrical and Computer Engineering, Western University, London, ON N6A 5B9, Canada
Li Yan
Affiliation:
College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China

Abstract

The Resource Description Framework (RDF) is a flexible model for representing information about resources on the Web. As a W3C (World Wide Web Consortium) Recommendation, RDF has rapidly gained popularity. With the widespread acceptance of RDF on the Web and in the enterprise, a huge amount of RDF data is being produced and made available. Efficient and scalable management of RDF data is therefore of increasing importance, and it has attracted attention in both the database and Semantic Web communities. Much work has been devoted to proposing solutions for storing RDF data efficiently. This paper focusses on using relational databases and NoSQL (for ‘not only SQL (Structured Query Language)’) databases to store massive RDF data, and provides a full, up-to-date overview of the current state of the art in RDF data storage.

Type
Survey Article
Copyright
© Cambridge University Press, 2016 

