Accelerate Analysis, Insight & Data Mining (for non-coders)

R and Python are the most popular, powerful, and well-designed (reasonably so, even for non-techies) open source command-line based languages for analysis, data science, and machine learning enthusiasts. Getting started and reaching a proficient level in these programming languages (or any other) is a tedious and time-consuming process. However, accelerated delivery of insight, analysis, modeling, and proofs of concept is a key characteristic of a successful analytics team, because it validates the strategies that enable us to drive business decisions in the right direction. The aim of this article is to provide useful resources on tools that will help the analyst accelerate the delivery of insight, analysis, and modelling.

Let's first understand how the analysis, insight, and data mining process is performed.

How do we perform analysis/insight?

The objective for an analyst is to convert Data/Information into an insight and recommend a possible Action that the business can take. Insight is the mechanism to do this successfully, and throughout the work it is key that we always keep the Business Context in mind.

Insight_Model

How do we do data mining?

Three data mining process frameworks have been the most popular and the most widely practiced by data mining experts/researchers to build machine learning systems. You'll notice that all three frameworks cover the same core phases, and there is not much difference between them.

1. Knowledge Discovery in Databases (KDD) process model

KDD

The concept of KDD is essentially the integration of multiple data mining technologies, and it was presented in a book by Fayyad et al. in 1996. Learn more from the link https://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1230/1131

2. CRoss-Industry Standard Process for Data Mining (CRISP-DM)

CRISP_DM

CRISP-DM creates an unbiased methodology that is not domain dependent and consolidates data mining best practices. CRISP-DM was voted the leading methodology for data mining in polls in 2002, 2004, and 2007. You can learn more about CRISP-DM from the following link https://www.the-modeling-agency.com/crisp-dm.pdf

3. Sample, Explore, Modify, Model and Assess (SEMMA)

SEMMA

SEMMA is a sequence of five steps for building machine learning models, incorporated in SAS Enterprise Miner by SAS Institute Inc. You can learn more about SEMMA through the following link http://www2.sas.com/proceedings/sugi22/DATAWARE/PAPER128.PDF
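For readers curious what these phases look like in practice, here is a minimal sketch of the five SEMMA steps in plain Python. It is only an illustration under stated assumptions: the data is synthetic, the function name `semma_demo` is made up for this example, and the "model" is a simple least-squares line rather than anything a tool like SAS Enterprise Miner would actually run.

```python
import random
import statistics


def semma_demo(n_rows=200, seed=42):
    """Toy walk-through of Sample, Explore, Modify, Model, Assess."""
    rng = random.Random(seed)

    # Synthetic data (an assumption for illustration): y = 3x + 5 + noise
    data = []
    for _ in range(n_rows):
        x = rng.uniform(0, 10)
        data.append((x, 3.0 * x + 5.0 + rng.gauss(0, 1.0)))

    # 1. Sample: shuffle, then hold out 25% of the rows for assessment
    rng.shuffle(data)
    cut = int(0.75 * len(data))
    train, test = data[:cut], data[cut:]

    # 2. Explore: basic summary statistics of the input variable
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_mean = statistics.mean(xs)
    x_sd = statistics.stdev(xs)

    # 3. Modify: standardise x using the training mean and std deviation
    def scale(x):
        return (x - x_mean) / x_sd

    zs = [scale(x) for x in xs]

    # 4. Model: ordinary least squares fit of y = slope * z + intercept
    z_mean = statistics.mean(zs)
    y_mean = statistics.mean(ys)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    slope = sxy / sxx
    intercept = y_mean - slope * z_mean

    # 5. Assess: mean absolute error on the held-out rows
    errors = [abs(y - (slope * scale(x) + intercept)) for x, y in test]
    return {
        "train_rows": len(train),
        "test_rows": len(test),
        "slope": slope,
        "intercept": intercept,
        "mae": statistics.mean(errors),
    }


if __name__ == "__main__":
    print(semma_demo())
```

The GUI tools covered in this article wrap these same five steps behind menus and widgets, so you rarely need to write them by hand.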

Tools that you know & are able to use = your skill level!

A Command-Line Interface (CLI) is faster and more powerful than Graphical User Interface (GUI) based tools in terms of flexibility and features; however, a GUI is easy to use, interactive, and offers out-of-the-box visualization. The aim here is not GUI vs CLI, but to be equipped with enough tools to understand and apply basic concepts, gain independence when it comes to data analysis, and communicate results effectively. At the end of the day, how quickly an analyst can deliver boils down to the tools that they are aware of and able to use. So the goal here is to introduce you to the MUST KNOW GUI-based open source tools that do not require much coding, to get you started and help accelerate the delivery of analytics (particularly during the initial stage of your data science adoption).

Tool Power to analyst

Here is a summary of the general advantages and disadvantages of GUI-based analytical tools:

Pros:

  • GUIs are click-to-do-things, so lots of fun
  • Great functionality, many data mining packages, and stunning out-of-the-box visualizations
  • Cross-platform support for the key operating systems (Windows, Mac, Linux)

Cons:

  • They store all of the data in RAM, so they can crash or run slowly with high volumes of data
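One common workaround for the RAM limitation is to hand the GUI tool a random sample of a large file instead of the whole thing. The sketch below is one possible way to do that in plain Python, using reservoir sampling so the file is read one row at a time. The function name `sample_csv_rows` and the file names in the usage example are hypothetical, chosen only for illustration.

```python
import csv
import random


def sample_csv_rows(lines, k, seed=0):
    """Return the header and up to k uniformly sampled data rows.

    `lines` can be an open file or any iterable of CSV text lines.
    Only k rows are held in memory at any time (reservoir sampling).
    """
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


if __name__ == "__main__":
    # Hypothetical file names: replace with your own paths
    with open("big_file.csv", newline="") as src:
        header, rows = sample_csv_rows(src, k=50000)
    with open("sample_for_gui.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
```

The smaller file can then be opened in any of the tools below for exploration, with the caveat that results on a sample are only an approximation of the full data.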

The tools to accelerate analysis, insight, and mining can be divided into two categories.

  1. R/Python aiders: These are built to utilize the native capabilities of R/Python. Note that I have listed the most popular GUIs for R/Python below (the list does not necessarily cover all of the available tools)
    1. Rattle (R)
    2. JGR/Deducer (R)
    3. RCommander (R)
    4. Orange (Python)
  2. Independent platforms
    1. H2O
    2. KNIME
    3. WEKA

1.1 R: Rattle

How to install: run the below command in your RStudio

install.packages("rattle", repos="http://rattle.togaware.com", type="source")

How to launch:

library(rattle);rattle()

Full Tutorial: Click here!

Sample Screenshot(s):

rattle.png

1.2 JGR / Deducer

How to Install: run the below command in your RStudio

install.packages(c("JGR","Deducer","DeducerExtras"), dependencies=TRUE)

How to launch:

library(JGR); JGR()

Tutorials: Link -1 here!, Link-2 here!

Sample Screenshots:

JGB.png

1.3 RCommander

How to install: run the below command in your RStudio

install.packages("Rcmdr", dependencies=TRUE)

How to launch:

library(Rcmdr)

Tutorials: Link-1 here!, Link-2 here!

Sample Screenshots:

RCommander.png

1.4 Python Orange

Orange is an open source tool from the AI Laboratory in Ljubljana, Slovenia.

How to install: you can download the appropriate executable installation file for your OS from their official website here!

How to launch: The installation will create a desktop shortcut and a menu entry. The application can be launched from either.

Tutorial: Click here!

Sample Screenshot:

Orange.png

2.1 H2O

H2O is open-source software for big-data analysis. It is produced by the company H2O.ai, which launched in 2011 in Silicon Valley. H2O allows users to fit thousands of potential models as part of discovering patterns in data. It is claimed to be one of the world’s leading open source deep learning platforms, used by over 100,000 data scientists and more than 10,000 organizations around the world. Their design goal is “To Bring Beautiful Business Transformation Through AI and Visual Intelligence” through three principles: 1) Make it Open, 2) Make it Fast, Really Fast, and 3) Make it Beautiful.

How to Install: Click here to learn more.

Tutorial: Click here!

Sample Screenshot:

H2o.png

2.2 KNIME

KNIME, the Konstanz Information Miner, is an open source data analytics, reporting, and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipeline concept. KNIME Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. Its enterprise-grade, open source platform is fast to deploy, easy to scale, and intuitive to learn. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. Their steady course on unrestricted open source is your passport to a global community of data scientists, their expertise, and their active contributions. Read more here!

How to Install: You can download the appropriate executable installation file for your OS from their official website here!

Tutorial: Link-1 here!, Link-2 here!

Sample Screenshots:

KNIME.png

2.3 WEKA

WEKA, the Waikato Environment for Knowledge Analysis, is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License (GPL). Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

How to Install: Click here!

Tutorials: Link-1 here!, Link-2 here!, Link-3 (video) here!

Sample Screenshot:

WEKA.png

Hope this article was useful!
