We always aim to reduce the generalization error of a classification model. Generalization error indicates how well the model performs on unseen data. But even after achieving the lowest generalization error or the highest accuracy, how confident are we that it will hold on new, unseen data? A confidence…
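The snippet stops just as it reaches confidence. As a hedged illustration, assuming the article goes on to a confidence interval for test accuracy (the truncated text does not confirm this), a common starting point is the normal-approximation (Wald) interval. It treats each test prediction as an independent Bernoulli trial. The function name `accuracy_confidence_interval` is hypothetical.

```python
import math


def accuracy_confidence_interval(correct, n, z=1.96):
    """Normal-approximation (Wald) interval for a classifier's true accuracy.

    correct: number of correctly classified held-out examples
    n: size of the held-out test set
    z: standard-normal quantile (1.96 gives roughly 95% confidence)
    """
    acc = correct / n
    # Standard error of a binomial proportion, scaled by the z quantile
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    # Clip to the valid [0, 1] range for a proportion
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


print(accuracy_confidence_interval(85, 100))
```

For 85 correct predictions out of 100, this gives roughly 0.78 to 0.92. The interval narrows as the test set grows. With small test sets, or accuracies close to 0 or 1, the Wilson score interval is usually the safer choice.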

This article shows features that help students learn and visualize data and model results in Orange, using the Breast Cancer dataset.

The dataset used here is taken from the UCI Machine Learning Repository. Its attributes are categorical, and the task is to predict whether the patient will have a recurrence of cancer or not.

For starters, it's easy to see the distribution of each column using Distributions under visualization.

Scatter plots are just as easy to see:

Decision boundary:

Clustering, t-SNE and DBSCAN.

The complete quick-and-dirty workflow is as follows:

Explores an order book sample using Dtale and Autoviz.

So, I recently got an email from Kaggle saying that Optiver had uploaded a new challenge for volatility prediction. Let's try using auto-EDA tools to explore a stock.

Using Dtale:

There is a describe option that shows the details of each column:

Histogram:

This article explains various window models for keeping the relevant data from a continuous stream of data.

Let's assume we have a stream of tweets arriving for a particular hashtag. Initially, there were two clusters based on emotion, positive and negative, to which we could map the tweets. However, as the tweets are…
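To make the idea concrete, here is a minimal sketch of two window models commonly used for streams. The first is a count-based sliding window that keeps only the latest items. The second is a damped (fading) window that keeps everything but down-weights older items. Whether these match the exact models covered in the article is an assumption. The class names are hypothetical, and the `2 ** (-decay * age)` weighting is the fading function used by some stream-clustering methods such as DenStream, chosen here only for illustration.

```python
from collections import deque


class SlidingWindow:
    """Count-based sliding window: keeps only the most recent `size` items."""

    def __init__(self, size):
        # deque with maxlen evicts the oldest item automatically when full
        self.items = deque(maxlen=size)

    def add(self, item):
        self.items.append(item)

    def contents(self):
        return list(self.items)


class DampedWindow:
    """Damped (fading) window: every item is kept, but its weight decays with age."""

    def __init__(self, decay):
        self.decay = decay  # larger values forget old items faster
        self.items = []     # list of (arrival_step, item)
        self.step = 0

    def add(self, item):
        self.items.append((self.step, item))
        self.step += 1

    def weights(self):
        now = self.step - 1
        # weight = 2^(-decay * age); the newest item has age 0 and weight 1.0
        return [(item, 2 ** (-self.decay * (now - t))) for t, item in self.items]


window = SlidingWindow(3)
for tweet in ["t1", "t2", "t3", "t4"]:
    window.add(tweet)
print(window.contents())
```

In the tweet example, re-clustering only the sliding window's contents, or weighting each tweet by its damped weight, lets new emotion clusters emerge while old tweets stop dominating.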

This article is a small demonstration of Bisecting K-means. We are all familiar with K-means; let's see what bisecting the data does.

The infamous problem of centroid initialization in K-means has many solutions, and one of them is bisecting the data points. As the main goal of the K-means algorithm…
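The following is a minimal NumPy sketch of bisecting K-means. It assumes the common variant that repeatedly splits the cluster with the largest sum of squared errors (SSE) using a plain 2-means run. Other variants split the largest cluster instead. scikit-learn 1.1+ also ships its own `BisectingKMeans`. The function names here are hypothetical.

```python
import numpy as np


def two_means(points, n_iter=20, seed=0):
    """Plain Lloyd's k-means with k=2; returns a boolean mask for one side of the split."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers never touches `points`
    centers = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to each of the two centers, shape (n, 2)
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels == 0


def bisecting_kmeans(points, k, seed=0):
    """Start from one cluster and keep bisecting the highest-SSE cluster until k exist."""
    clusters = [np.asarray(points, dtype=float)]
    while len(clusters) < k:
        sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        target = clusters.pop(int(np.argmax(sse)))
        if len(target) < 2:
            clusters.append(target)  # nothing left to split
            break
        mask = two_means(target, seed=seed)
        if mask.all() or not mask.any():
            clusters.append(target)  # degenerate split (e.g. duplicate points)
            break
        clusters.extend([target[mask], target[~mask]])
    return clusters


pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
for cluster in bisecting_kmeans(pts, 2):
    print(cluster)
```

Because each step only solves a 2-means problem on one cluster, a bad initial centroid affects a single split rather than the whole partition. This is the sense in which bisecting eases the initialization problem.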

This is a basic report with a few graphs showing how the various shops of a small business are doing.

Let's say a small business owner wants to know how his shops are performing. We have only two tables: one with shop details and another with the shops' appointment details. Let's try to report it in a minimalistic way using Power BI.

Here is the link to the PDF I generated from Power BI Desktop.

https://github.com/BlackCurrantDS/Data-Mining/blob/master/shops_appointments.pdf

What I really like are the automatically generated insights. Just by right-clicking and choosing summaries, we get quick one-liner insights.

“At 0.19, Munich 1 had the highest cancellation_rate and was 983.35% higher than Berlin 2, which had the lowest cancellation_rate at 0.02.”

“Across all 7 city, cancellation_rate ranged from 0.02 to 0.19.”
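Outside Power BI, the same kind of measure can be reproduced in pandas as a sanity check. This is a hedged sketch, not the report's actual logic. The column names (`shop_id`, `city`, `status`), the `"cancelled"` status value and the toy rows are all made up, because the article does not show the table schemas. It also assumes that "X% higher" means (high - low) / low × 100.

```python
import pandas as pd

# Hypothetical toy tables standing in for the shops and appointments tables
shops = pd.DataFrame({"shop_id": [1, 2], "city": ["City A", "City B"]})
appointments = pd.DataFrame({
    "shop_id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "status": ["cancelled", "done", "done", "done",
               "cancelled", "cancelled", "done", "done", "done"],
})


def cancellation_rate_by_city(shops, appointments):
    """Join appointments to shops, then take the share of cancelled appointments per city."""
    merged = appointments.merge(shops, on="shop_id", how="left")
    merged["is_cancelled"] = merged["status"].eq("cancelled")
    # The mean of a boolean column is the fraction of True values
    return merged.groupby("city")["is_cancelled"].mean()


def percent_higher(high, low):
    """How much higher `high` is than `low`, in percent of `low`."""
    return (high - low) / low * 100


rates = cancellation_rate_by_city(shops, appointments)
print(rates)
print(percent_higher(rates.max(), rates.min()))
```

With these toy rows, City A has a rate of 0.25 and City B has 0.4, so City B is 60% higher. Note that Power BI's percentages come from the unrounded rates, so they can look inconsistent with the two-decimal values shown in the sentences above.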

Siya
