We always aim to reduce the generalization error of a classification model. Generalization error tells us how well the model does on unseen data. But even after getting the lowest generalization error or the highest accuracy, how confident are we that it will hold on new, unseen data? A confidence…
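As a rough illustration of the question above, one common way to attach a confidence interval to a measured error rate is the normal approximation to the binomial. The sketch below assumes that approach, which may differ from the method the article goes on to use; the function name and the sample numbers are made up for illustration.

```python
import math

def error_confidence_interval(error, n_samples, z=1.96):
    """Normal-approximation (Wald) interval for a classification error rate.

    error: observed error rate on a held-out set (between 0 and 1)
    n_samples: number of held-out examples used to measure it
    z: critical value; 1.96 corresponds to roughly 95% confidence
    """
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must be between 0 and 1")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    half_width = z * math.sqrt(error * (1.0 - error) / n_samples)
    # Clip to [0, 1] because an error rate cannot leave that range.
    return max(0.0, error - half_width), min(1.0, error + half_width)

# Hypothetical numbers: 12% error measured on 500 held-out examples.
low, high = error_confidence_interval(0.12, 500)
print(f"95% interval for the error: [{low:.3f}, {high:.3f}]")
```

The interval narrows as the held-out set grows, so the same error rate measured on more examples deserves more trust.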
This article shows the features that help students learn and visualize data and model results in Orange, using the Breast Cancer dataset.
The dataset used here is taken from the UCI Machine Learning Repository. Its attributes are categorical, and the target is whether or not the patient will have a recurrence of cancer.
For starters, it's easy to see the distribution of each column using Distributions under visualization.
A scatter plot is just as easy to see:
Clustering, t-SNE and DBSCAN.
The complete quick-and-dirty workflow is as follows:
Explores an order book sample using D-Tale and AutoViz.
So, I recently got an email from Kaggle about a new challenge from Optiver on volatility prediction. Let’s try using auto-EDA tools to explore a stock.
There is a describe option that shows the details of each column:
This story simply explores a dataset taken from data.world using Tableau.
A line graph for analyzing the number of arrests for each crime type from 2001 to 2012.
I came across a dataset of slogans from various companies on Kaggle. Rather than watching a movie, I thought I would distract myself with this dataset. I tried running sentiment analysis on the slogans to find something interesting.
After performing some data quality checks, as is my ritual, I found out that a…
This article explains various window models for keeping the relevant data from a continuous stream of data.
Let’s assume we have a stream of tweets coming in for a particular hashtag. Initially there were two emotion-based clusters, positive and negative, that we could map the tweets to. However, as the tweets are…
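To make the idea of a window model concrete, here is a minimal sketch of two common ones: a sliding window, which keeps only the latest items, and a damped window, which lets older items fade. The class names, the window size, the decay factor, and the toy stream are all illustrative assumptions, not taken from the article.

```python
from collections import deque

class SlidingWindow:
    """Keeps only the most recent `size` items of the stream."""
    def __init__(self, size):
        self.items = deque(maxlen=size)  # older items fall off automatically

    def add(self, item):
        self.items.append(item)

    def contents(self):
        return list(self.items)


class DampedWindow:
    """Keeps a weight per label that decays each time a new item arrives."""
    def __init__(self, decay=0.9):
        self.decay = decay
        self.weights = {}

    def add(self, label):
        # Fade every existing weight, then give the new item full weight.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[label] = self.weights.get(label, 0.0) + 1.0

    def dominant(self):
        return max(self.weights, key=self.weights.get)


# Hypothetical stream of tweet sentiments: positive at first, negative later.
stream = ["positive"] * 5 + ["negative"] * 4

sliding = SlidingWindow(size=3)
damped = DampedWindow(decay=0.5)
for label in stream:
    sliding.add(label)
    damped.add(label)

print(sliding.contents())
print(damped.dominant())
```

Both windows end up reflecting the recent shift in the stream: the sliding window by forgetting old tweets outright, the damped window by shrinking their influence.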
This article is a little demonstration of Bisecting K-means. We are all familiar with K-means; let’s see what bisecting the data does.
So, the infamous problem of centroid initialization in K-means has many solutions, one of which is bisecting the data points. As the main goal of the K-means algorithm…
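For readers who want to see the idea in code, below is a minimal pure-Python sketch of bisecting K-means as it is commonly described: start with one cluster, repeatedly pick the cluster with the largest squared error, and split it in two with ordinary 2-means, keeping the best of a few trial splits. The function names, the number of trials, and the toy data are assumptions for illustration; the article's own implementation may differ.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)

def two_means(points, rng, iters=20):
    """Ordinary K-means with k=2 on one cluster; returns two lists of points."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [mean(groups[0]), mean(groups[1])]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Split the cluster with the largest within-cluster squared error.
        target = max(clusters, key=sse)
        if sse(target) == 0:
            break  # nothing left that can be split
        best = None
        for _ in range(trials):
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break
        clusters.remove(target)
        clusters.extend([best[1], best[2]])
    return clusters

# Toy data: three small, well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
result = bisecting_kmeans(data, k=3)
for cluster in result:
    print(sorted(cluster))
```

Because every split is only a 2-means problem, the result depends far less on the initial centroids than running K-means with all k centroids at once.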
This is a basic report with a few graphs about how the various shops of a small business are doing.
Let’s say a small business owner wants to know how his shops are performing. We have only two tables, one with the shops' details and another with the shops' appointment details. Let’s try to report it in a minimalistic way using Power BI.
Here is the link to the PDF I generated from Power BI Desktop.
What I really like is the automatically generated insights. Just by right-clicking and choosing summaries, we can get quick one-line insights.
“At 0.19, Munich 1 had the highest cancellation_rate and was 983.35% higher than Berlin 2, which had the lowest cancellation_rate at 0.02.”
“Across all 7 city, cancellation_rate ranged from 0.02 to 0.19.”
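The percentage in the first insight is a relative difference, (highest - lowest) / lowest. Plugging in the rounded rates from the text gives 850%, not 983.35%, which suggests the figure was computed from unrounded rates; that is an assumption, since the report does not say so. The short check below uses a made-up helper name and shows that 983.35% is at least consistent with rates that display as 0.19 and 0.02 after rounding.

```python
def percent_higher(high, low):
    """Relative difference of `high` over `low`, in percent."""
    return (high - low) / low * 100

# Rounded rates quoted in the insight text.
print(f"{percent_higher(0.19, 0.02):.2f}%")

# Two-decimal rounding hides a range of true values; these are the extremes
# that would still display as 0.19 and 0.02.
smallest = percent_higher(0.185, 0.025)
largest = percent_higher(0.195, 0.015)
print(f"possible range: {smallest:.0f}% to {largest:.0f}%")
```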