![basic data science questions](https://image.slidesharecdn.com/computersbasicmcqquestions3-140513110848-phpapp02/95/computers-basic-mcq-questions-3-2-638.jpg)
Let's assume that you meet the basic requirements to be a Data Scientist, you have applied for various fresher jobs and internships, and you got a call back for an interview (Congratulations!). Now what? Don't worry, we are here for you. This post briefly guides you through some basic Data Science questions you are likely to be asked.

How does data cleaning play a vital role in any Data Science workflow? Cleaning data from multiple sources transforms it into a format that data analysts or data scientists can work with. Clean data also helps increase the accuracy of machine learning models. Cleaning can take up to 80% of the time spent on an analysis, which makes it a critical part of the task.

How can outliers be treated in any dataset? Outliers can be identified using univariate analysis or any other graphical analysis method, such as a box plot. Not all extreme values are outliers, so flagged values should be checked before they are changed. If there are only a few outliers, they can be assessed individually; for a large number of outliers, values above the 99th percentile can be replaced with the 99th percentile value, and values below the 1st percentile with the 1st percentile value. The two most common ways to treat outliers are to change the value and bring it within a range, or to exclude the value.

How do you treat missing values in any dataset? First, look for patterns in the missing values, because observing them might lead to meaningful insights. For example, if in a survey about housing prices many individuals in a certain locality left a question blank, that pattern can lead us to some conclusions about that locality. If no pattern is identified, the missing values can be substituted with the mean or median of the observed values (imputation), or they can simply be ignored. Another option is to assign a default value, such as the mean, minimum, or maximum, to the missing entries.

What is the Normal Distribution? Data can be distributed in different ways: skewed to the left, skewed to the right, or all jumbled up with no clear shape. In a normal distribution, the data is spread around a central value (the mean) without any bias to the left or right, forming a symmetric, bell-shaped curve. So a set of random variables is said to follow a normal distribution when most of the values cluster around the mean and taper off symmetrically on both sides.

What is data normalization and why do we need it? Data normalization is an important preprocessing step that rescales values to fit in a specific range (for example, 0 to 1). When training neural networks, having features on comparable scales helps ensure better convergence during backpropagation.
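The percentile-capping approach from the outlier question (often called winsorizing) can be sketched in plain Python as follows. This is a minimal illustration, not the post's own code: the function name `cap_outliers` and its parameters are hypothetical, and the sketch assumes a list of at least two numbers and estimates percentiles with the standard library's `statistics.quantiles` using linear (`"inclusive"`) interpolation.

```python
from statistics import quantiles


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Clip values to the [lower_pct, upper_pct] percentile range (winsorizing).

    Hypothetical helper for illustration; expects at least two numeric values
    and integer percentiles with 1 <= lower_pct < upper_pct <= 99.
    """
    if not 1 <= lower_pct < upper_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= lower < upper <= 99")
    # With n=100, quantiles() returns the 99 cut points for the 1st..99th percentiles.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    # Raise values below the low cut-off and lower values above the high cut-off.
    return [min(max(v, low), high) for v in values]


# Usage: a single extreme value among 1..100.
data = list(range(1, 101)) + [10_000]
capped = cap_outliers(data)
print(capped[0], capped[-1])  # 2.0 100.0
```

Note that capping touches both tails: here the smallest value, 1, is also raised to the 1st-percentile value. Excluding outliers instead would simply filter out any value outside `[low, high]`.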
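The two steps from the missing-values question, first checking whether values are missing in a pattern and only then imputing, could look roughly like this. It is a sketch under assumptions: records are plain dictionaries, a missing answer is represented as `None`, and the helper names and the `locality`/`price` fields are hypothetical, chosen to echo the housing-survey example.

```python
from collections import defaultdict
from statistics import mean, median


def missing_rate_by_group(rows, group_key, field):
    """Fraction of rows in each group whose `field` is missing (None or absent)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        tally = counts[row[group_key]]
        tally[1] += 1
        if row.get(field) is None:
            tally[0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}


def impute_missing(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Usage with a toy housing survey (hypothetical data).
survey = [
    {"locality": "A", "price": None},
    {"locality": "A", "price": None},
    {"locality": "B", "price": 300},
    {"locality": "B", "price": None},
    {"locality": "B", "price": 500},
]
print(missing_rate_by_group(survey, "locality", "price"))
print(impute_missing([120, None, 80, 100], strategy="median"))
```

In the toy survey, locality A is missing every price, which is the kind of pattern worth investigating before filling anything in; the imputation call is shown on a separate list where no such pattern is assumed.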
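For reference, the bell-shaped curve from the normal-distribution answer is described by the normal probability density function, where $\mu$ is the mean (the central value) and $\sigma$ is the standard deviation (how widely the values spread around it):

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

Because $f$ depends on $x$ only through $(x-\mu)^2$, it is symmetric around $\mu$, which is the "no bias to the left or right" property. As a rule of thumb, about 68% of values fall within one standard deviation of the mean and about 95% within two.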
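The normalization answer doesn't name a specific rescaling method; min-max scaling is one common way to map values into a fixed range, sketched below. The function name `min_max_normalize` and the default `[0, 1]` target range are assumptions for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical: there is no spread to rescale, so avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 30]))         # [0.0, 0.5, 1.0]
print(min_max_normalize([10, 20, 30], -1, 1))  # [-1.0, 0.0, 1.0]
```

In practice, the `lo` and `hi` computed on the training data would be reused to scale validation and test data, so that every split shares the same scale.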