What to do when we have mismatched training and validation set?

Deep learning algorithms require a huge amount of training data. This pushes us to add more and more labeled data to our training set, even if it does not come from the same distribution as the data we are actually interested in. For example, let's say we are building a cat classifier for door camera devices. We…

Hotdog or Not Hotdog – Image Classification in Python using fastai

Earlier, I was of the opinion that getting computers to recognize images requires a huge amount of data, carefully tuned neural network architectures, and lots of coding. But after taking the fast.ai deep learning course, I found out that this is not always true. We can achieve a lot by writing just a few lines…

Which activation function to use in neural networks?

Activation functions are an integral component of neural networks. There are a number of common activation functions, so it is often confusing which one is best suited for a particular task. In this blog post I will talk about: Why do we need activation functions in neural networks? Output layer activation…

Understanding Time Series Modelling and Forecasting, Part 2

As promised, this is the second post in my two-part blog series on time series modelling and forecasting. In my first blog post, I discussed the basics of time series analysis and gave a theoretical overview. In case you missed it, you can find it here: Understanding Time Series Modelling and Forecasting, Part 1 …

Understanding Time Series Modelling and Forecasting – Part 1

Time series forecasting is used extensively in numerous practical fields such as business, economics, finance, science and engineering. The main aim of time series analysis is to forecast future values of a variable using its past values. In this post, I will give you a detailed introduction to time series modelling. This would be the…

Who was the lead character in Friends? The Data Science Answer

It has been more than 13 years since the last episode of Friends aired. But we never stop talking about it. Do we? I do not remember the last time I had a pizza without watching a random episode of Friends. Last night, I was watching one of my favorite episodes, "The One With Ross'…

How to One Hot Encode Categorical Variables of a Large Dataset in Python?

In this post, I will discuss a very common problem that we face when dealing with a machine learning task: how to handle categorical data, especially when the entire dataset is too large to fit in memory. I will talk about how to represent categorical variables, the common problems we face while one hot…

Bootstrapping – A Powerful Resampling Method in Statistics

We are often interested in population parameters, for example the mean salary of all adults in a country. But collecting data on the entire population is almost always infeasible. Therefore, we use samples of the population to get a point estimate of our parameter of interest. But what is the 95% confidence interval of your…
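
As a rough illustration of the question this excerpt raises, a percentile bootstrap for the sample mean could be sketched as below. This is only a minimal sketch, not necessarily the method the post itself uses: the function name `bootstrap_ci`, the salary figures, the number of resamples, and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical quantiles of the resampled statistics.
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up salary sample (in thousands); not data from the post.
salaries = [32, 45, 38, 51, 29, 60, 41, 36, 48, 55]

point_estimate = statistics.mean(salaries)
low, high = bootstrap_ci(salaries)
print(f"point estimate: {point_estimate:.1f}, 95% CI: ({low:.1f}, {high:.1f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.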