Using Open Source Tools for Data Science

Mary · Feb 12, 2020

Data scientists have many tools to choose from. Some are proprietary and others are open source. The second course teaches a number of open source tools and how they are used.

1. Introducing Skills Network Labs
Skills Network Labs is a platform created by IBM. It hosts a number of tools such as Jupyter Notebook, Zeppelin Notebook and RStudio. The platform runs in the cloud, so there is nothing to install locally on your machine. It is accessible at labs.cognitiveclass.ai.

2. JupyterLab & Jupyter Notebook
JupyterLab is a web-based platform that allows data analysts and scientists to analyze and visualize their data and build models around it. Files created on the platform are known as Jupyter Notebooks. The platform can be set up locally by installing it with pip or through the Anaconda distribution, after which it opens in your local browser. In the cloud, it is already set up in Skills Network Labs.

Notebooks can be written in different languages such as R, Python, Julia, Swift and Scala.
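To make the idea of a notebook file a little more concrete: a Jupyter Notebook is saved on disk as a JSON document (the .ipynb file). The sketch below builds a minimal one with nothing but Python's standard library. The helper name make_notebook and the sample cell contents are made up for illustration, and a real notebook written by JupyterLab carries more metadata than this.

```python
import json


def make_notebook(code_lines, kernel_name="python3", display_name="Python 3"):
    """Build a minimal notebook (nbformat 4) as a plain dict.

    make_notebook is a hypothetical helper for this sketch, not part of Jupyter.
    """
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": code_lines,  # one string per line of code
            }
        ],
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# A single code cell with two lines of Python (sample content, assumed).
notebook = make_notebook(["values = [1, 2, 3]\n", "sum(values)"])

# This text is what would be written to a file such as example.ipynb.
notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)
```

Saving notebook_text to a file ending in .ipynb should give something JupyterLab can open, though this has only been sketched here, not checked against every Jupyter version.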

3. Zeppelin Notebooks
Another web-based tool in Skills Network Labs is the Zeppelin Notebook. Zeppelin Notebooks currently support over 20 interpreters such as Python, Spark and Cassandra. You can use multiple programming languages in the same Zeppelin Notebook. Notebooks are stored as JSON files.
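As a rough illustration of how one Zeppelin Notebook can mix languages: each paragraph usually starts with a directive such as %python, %spark or %md that picks the interpreter. The sketch below uses a heavily simplified, hypothetical note (real Zeppelin note files contain many more fields) and counts which interpreters its paragraphs use. The function name interpreter_of and the sample paragraphs are assumptions made for this example.

```python
import json
from collections import Counter

# Hypothetical, simplified note. Only "name", "paragraphs" and "text" are used here.
note = {
    "name": "demo note",
    "paragraphs": [
        {"text": "%md\n## Load the data"},
        {"text": "%spark\nval numbers = sc.parallelize(1 to 10)"},
        {"text": "%python\nprint('hello')"},
        {"text": "%python\nprint(1 + 1)"},
    ],
}

# Round-trip through JSON to mimic reading the note from disk.
note_text = json.dumps(note)


def interpreter_of(paragraph_text, default="(default)"):
    """Return the interpreter named by a leading %directive, if any."""
    stripped = paragraph_text.lstrip()
    if not stripped.startswith("%"):
        return default
    tokens = stripped[1:].split()
    return tokens[0] if tokens else default


loaded_note = json.loads(note_text)
counts = Counter(interpreter_of(p.get("text", "")) for p in loaded_note["paragraphs"])

for name, count in sorted(counts.items()):
    print(f"{name}: {count} paragraph(s)")
```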

4. RStudio IDE
RStudio is another tool available in Skills Network Labs. It can also be installed locally using an executable installer. It is an integrated development environment for working with files written in R. The animation below shows how to load data in RStudio:

ezgif-6-2ddaaaaef019.gif

5. IBM Watson Studio
This is another integrated development environment. It lets us create scripts in languages we have already discussed, such as R, Python and Scala. It has a collaborative feature where team members can work on scripts together and share their findings. It is available at ibm.com/cloud/watson-studio.

Other open source tools for data science include:
Weka: Named after the University of Waikato, where it was developed, the Waikato Environment for Knowledge Analysis (WEKA) is a GUI-based application for data mining and machine learning.
Apache Hadoop: A platform for storing and processing very large datasets in a distributed manner across clusters of machines. It is an Apache Software Foundation project.
D3: An open source JavaScript library for creating data visualizations that run in web browsers.
TensorFlow: A popular machine learning framework built by Google for designing and training models.
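
To give a feel for the distributed style of processing that Apache Hadoop is known for, here is a tiny word count written in the MapReduce pattern. This is plain Python running on one machine, not the Hadoop API. The function names and the sample text chunks are made up for this sketch, and each chunk stands in for a piece of data that a real cluster would hand to a different worker.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all the values that share the same key (word)."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each group of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Sample chunks (assumed data). On a cluster, each could live on a different node.
chunks = ["open source tools", "tools for data science", "open data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

The point of splitting the work this way is that the map step can run on every chunk independently, which is what lets the same idea scale to data too large for one machine.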

The next article will be a summary of Data Science Methodology.
Bye bye.