Data Science Archives - Ivan Kuznetsov

2023 UPDATE: California Lottery – Scratchers – picking the winning tickets using math

Jan 11, 2023

—

by

2023 UPDATE: Full analysis in this spreadsheet has been updated to reflect the latest products offered by California Lottery. I’ve always been fascinated by how the lottery works. From a mathematical point of view spending money on lottery tickets is a complete waste of time and money. There are a few exceptions – when a…

Monty Hall Problem – How Randomness Rules Our World and Why We Cannot See It

Jul 25, 2017

—

by

Ivan Kuznetsov

in Data Science

Ever since I read about Monty Hall problem in “The Drunkard’s Walk: How Randomness Rules Our Lives” book by Leonard Mlodinow from of the California Institute of Technology, I always wanted to try and run a simulation to see that the math is correct. It is one of those problems, where the first answer that comes…

Are People in Colder Countries Taller? in Julia

Oct 5, 2016

—

by

Ivan Kuznetsov

in Data Science

Continuing to play with Julia and data visualizations. This time I decided to replicate a scatterplot created by Matt Stiles examining the relationship between a country’s average temperature and its male residents’ average height. Data comes from WorldBank and NCD-RisC. The size of the bubbles is linearly proportional to country population. Color indicates new World Bank income categories. People seem to…

Life Expectancy by Country

Aug 4, 2016

—

by

Ivan Kuznetsov

in Data Science

I was inspired by Andrew Collier’s blog post Life Expectancy by Country where he illustrated how to create a bubble chart that compares female and male life expectancies for a number of countries based on the data scraped from Wikipedia using R and Plot.ly charts. I decided to replicate these results using another popular language for technical computing – Julia. Scraping Wikipedia in…

Measuring user retention using cohort analysis with R

Apr 27, 2012

—

by

Ivan Kuznetsov

in Data Science, R, Startups, Web/Tech

Cohort analysis is super important if you want to know if your service is in fact a leaky bucket despite nice growth of absolute numbers. There’s a good write up on that subject “Cohorts, Retention, Churn, ARPU” by Matt Johnson. So how to do it using R and how to visualize it. Inspired by examples…

Heat map visualization of sick day trends in Finland with R, ggplot2 and Google Correlate

Apr 24, 2012

—

by

Ivan Kuznetsov

in Data Science, HeiaHeia, R

Inspired by Margintale’s post “ggplot2 Time Series Heatmaps” and Google Flu Trends I decided to use a heat map to visualize sick days logged by HeiaHeia.com Finnish users. I got the data from our database, filtering results by country (Finnish users only) in a tab separated form with the first line as the header. Three columns…

Informal notes from Strata 2012 conference on Big Data and Data Science

Mar 31, 2012

—

by

Ivan Kuznetsov

in Data Science, Travel, Web/Tech

It’s been almost a month since I came back from California, and I just got around to sorting the notes from O’Reilly Strata conference. Spending time in the Valley is always inspiring – lots of interesting people, old friends, new contacts, new start-ups – it is the center of IT universe. Spending 3 days with…

Category: Data Science

2023 UPDATE: California Lottery – Scratchers – picking the winning tickets using math

Monty Hall Problem – How Randomness Rules Our World and Why We Cannot See It

Are People in Colder Countries Taller? in Julia

Life Expectancy by Country

Measuring user retention using cohort analysis with R

Heat map visualization of sick day trends in Finland with R, ggplot2 and Google Correlate

Informal notes from Strata 2012 conference on Big Data and Data Science