Q1. PYTHON OR R – WHICH ONE WOULD YOU PREFER FOR TEXT ANALYTICS?
We will prefer Python because of the following reasons:
• Python would be the best option because it has Pandas library that provides easy to use data
structures and high-performance data analysis tools.
• R is more suitable for machine learning than just text analysis.
• Python performs faster for all types of text analytics.
Q2. How does data cleaning play a vital role in the analysis?
Data cleaning can help in analysis because:
• Cleaning data from multiple sources helps transform it into a format that data analysts or data
scientists can work with.
• Data Cleaning helps increase the accuracy of the model in machine learning.
• It is a cumbersome process because as the number of data sources increases, the time taken to
clean the data increases exponentially due to the number of sources and the volume of data
generated by these sources.
• It might take up to 80% of the time for just cleaning data making it a critical part of the analysis
Q3. Differentiate between univariate, bivariate and multivariate analysis?
Univariate analyses are
For example, the pie charts of sales based on territory involve only one variable and can the analysis can be referred to as univariate analysis.
For example, analyzing the volume of sale and spending can be considered as an example of bivariate analysis.
Q4. Explain Star Schema?
It is a traditional database schema with a central table. Satellite tables map IDs to physical names or descriptions and can be connected to the central fact table using the ID fields; these tables are known as lookup tables and are principally useful in real-time applications, as they save a lot of memory. Sometimes
star schemas involve several layers of summarization to recover information faster.
Q5. What is Cluster Sampling?
Cluster sampling is a technique used when it becomes difficult to study the target population spread across a wide area and simple random sampling cannot be applied. Cluster Sample is a probability sample
where each sampling unit is a collection or cluster of elements.
For example, a researcher wants to survey the academic performance of high school students in Japan. He can divide the entire population of Japan into different clusters (cities). Then the researcher selects a number of clusters depending on his research through simple or systematic random sampling.
Q6. What is Systematic Sampling?
Systematic sampling is a statistical technique where elements are selected from an ordered sampling frame. In systematic sampling, the list is progressed in a circular manner so once you reach the end of the
list, it is progressed from the top again.The best example of systematic sampling is equal probability
Q7. What are Eigenvectors and Eigenvalues?
Eigenvectors are used for understanding linear transformations. In data analysis,
particular linear transformation acts by flipping, compressing or stretching.
we usually calculate the
eigenvectors for a correlation or covariance matrix. Eigenvectors are the directions along which a
Eigenvalue can be referred to as the strength of the transformation in the direction of eigenvector or the
factor by which the compression occurs.