Book contents
- Frontmatter
- Dedication
- Contents
- List of Figures
- List of Tables
- Preface
- Acknowledgments
- 1 Beginning with Machine Learning
- 2 Introduction to Data Mining
- 3 Beginning with Weka and R Language
- 4 Data Preprocessing
- 5 Classification
- 6 Implementing Classification in Weka and R
- 7 Cluster Analysis
- 8 Implementing Clustering with Weka and R
- 9 Association Mining
- 10 Implementing Association Mining with Weka and R
- 11 Web Mining and Search Engines
- 12 Data Warehouse
- 13 Data Warehouse Schema
- 14 Online Analytical Processing
- 15 Big Data and NoSQL
- Index
- Colour Plates
3 - Beginning with Weka and R Language
Published online by Cambridge University Press: 26 April 2019
Summary
Chapter Objectives
✓ To learn to install Weka and the R language
✓ To demonstrate the use of Weka software
✓ To experiment with Weka on the Iris dataset
✓ To introduce the basics of the R language
✓ To experiment with R on the Iris dataset
About Weka
In this book, all data mining algorithms are explained with Weka and the R language. The learner can apply these algorithms easily using this well-known data mining tool and language. Let's first discuss the Weka tool.
Weka is open-source software released under the GNU General Public License. It was developed by the Machine Learning Group at the University of Waikato, New Zealand. Although it shares its name with a flightless New Zealand bird, ‘WEKA’ stands for Waikato Environment for Knowledge Analysis. The system is written in Java, an object-oriented language. Weka is a collection of machine learning algorithms for data mining that can be applied to a dataset directly or called from your own Java code. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
The story of Weka's development is interesting. It was initially developed by students of the University of Waikato, New Zealand, as part of their course work on data mining; they implemented all the major machine learning algorithms as lab work for this course. In 1993, the University of Waikato began development of the original version of Weka, which became a mix of Tcl/Tk, C, and Makefiles. In 1997, the decision was made to redevelop Weka from scratch in Java, including implementations of modeling algorithms. In 2006, Pentaho Corporation acquired an exclusive license to use Weka for business intelligence.
This chapter covers the installation of Weka and the available datasets, and guides the learner on how to start experimenting with Weka. Later, we will discuss another data mining tool, the R language. Let us first go through the installation process for Weka, step by step.
Installing Weka
Weka is freely available, and its latest version can be easily downloaded from https://www.cs.waikato.ac.nz/ml/weka/downloading.html as shown in Figure 3.1.
For Weka to work smoothly, first download and install a Java VM, and only then download and install Weka.
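Before installing Weka, it can help to confirm that a Java VM is actually present on the machine. The following is a minimal sketch, not part of the book's own material: the class name EnvCheck and its two helper methods are hypothetical names chosen for this illustration. It prints the version of the running Java VM and reports whether Weka's core dataset class, weka.core.Instances, can be found on the classpath (this will be false until weka.jar has been added).

```java
// Minimal environment check before installing or launching Weka.
// EnvCheck and its method names are hypothetical, chosen for this sketch.
class EnvCheck {

    // Returns the version string of the Java VM running this program.
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, EnvCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + javaVersion());
        System.out.println("Java vendor:  " + System.getProperty("java.vendor"));
        // weka.core.Instances is found only when weka.jar is on the classpath.
        System.out.println("Weka on classpath: " + isOnClasspath("weka.core.Instances"));
    }
}
```

Compile the file with `javac EnvCheck.java` and run it with `java EnvCheck`. If the `java` command is not recognised, no Java VM is installed yet. Once Weka is installed, running the same program with weka.jar added to the classpath (the location of weka.jar depends on where Weka was installed) should report that Weka is available.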
- Type: Chapter
- Information: Data Mining and Data Warehousing: Principles and Practical Techniques, pp. 28-54
- Publisher: Cambridge University Press
- Print publication year: 2019