Understanding Data Types: A Guide for Novice Analysts

Early in my career I struggled to understand why some data behaved differently during my analysis. It became clear that this struggle is common among novice analysts. Understanding the different data types is crucial to analysing and interpreting data effectively. I learned this the hard way, and since then I have helped many colleagues get through the same stage. In this article, I will explore the most common data types and highlight how best to handle each one. I will also cover the advantages and disadvantages of each data type to help you build a solid foundation in data analysis, especially since analysts are not always in control of the data they receive.

Common Data Types

There are common data types that I frequently analyse: (1) decimal data, (2) integers, (3) percentages, (4) date-formatted data, (5) time-formatted data, (6) date-time data (combining date and time), (7) text, (8) location, and (9) Boolean data, which gives users two options such as yes/no or true/false (these are all stored the same way as a Boolean, and we choose how we want the data presented). Not all data is numerical or simply text-based. Before starting an analysis, I recommend defining the data type of each column (variable) so that the software stores the data efficiently and uses no more memory than it needs.
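To make the idea of defining a type per column concrete, here is a minimal sketch in Python using only the standard library. The rows, column names, and the parse_bool helper are hypothetical and exist only for illustration. In practice your spreadsheet or analysis tool would do this step for you once you tell it the type of each column.

```python
from datetime import date

# Hypothetical raw rows, as they might arrive from a CSV export (everything is text).
raw_rows = [
    {"order_id": "1001", "amount": "249.90", "order_date": "2023-03-14", "is_member": "yes"},
    {"order_id": "1002", "amount": "15.00", "order_date": "2023-03-15", "is_member": "no"},
]


def parse_bool(text: str) -> bool:
    """Map yes/no or true/false style text onto a real Boolean."""
    return text.strip().lower() in {"yes", "true", "1"}


# One converter per column: this is the "define the data type" step.
column_types = {
    "order_id": int,                  # integer
    "amount": float,                  # decimal
    "order_date": date.fromisoformat, # date
    "is_member": parse_bool,          # Boolean
}

typed_rows = [
    {column: column_types[column](value) for column, value in row.items()}
    for row in raw_rows
]

# Arithmetic only makes sense once "amount" is a number rather than text.
total = sum(row["amount"] for row in typed_rows)
print(typed_rows[0])
print(f"Total amount: {total:.2f}")
```

Once the columns are typed, the amount can be summed and the date can be compared or sorted, neither of which works reliably while the values are still text.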

The data types you work with shape the insights you can draw from your analysis. The choice also depends on the software being used and on the needs of the management committee (Manco) or executive committee (Exco). Once a type is defined, the software knows how to manipulate the values. Some software, like Excel, does a good job of detecting data types for us, but Excel does not get every data type right every time, and because each data type behaves differently, a wrong guess can affect the results. In the rest of this article I discuss (1) numeric data (decimals, integers, percentages, etc., which can be continuous or discrete), (2) categorical data (nominal and ordinal), (3) text data, and (4) time series data.

Numeric Data

Numeric data represents quantitative values and is further categorised into continuous and discrete subtypes. I explain each below.

Continuous Numeric Data: Continuous numeric data can take on any value within a range, including fractional values. In other words, it can fall anywhere on the number line between two points. Examples include temperature, height, and time. To handle continuous numeric data, use statistical measures such as the mean, median, and standard deviation, which summarise where the data is centred and how widely it is spread. The advantages of continuous numeric data include a high level of precision and detailed analysis. The disadvantage is that handling a large amount of continuous data can be computationally intensive.
Discrete Numeric Data: Discrete numeric data consists of whole numbers or countable values. Examples include the number of products sold, the number of website visits, or the number of employees in a company. Although you can use statistical measures to understand this data, the common techniques for handling it are frequency distributions, histograms, and bar charts, which make the data easier to see and understand. Advantages include ease of interpretation and straightforward analysis. The disadvantage is that discrete data may lack precision compared with continuous data: a count of ten apples sold cannot express anything between ten and eleven, whereas a temperature can be recorded as finely as the instrument allows.
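The two approaches above can be sketched in a few lines of Python using the standard library. The temperature readings and order sizes below are made-up values for illustration only.

```python
import statistics
from collections import Counter

# Hypothetical continuous data: daily temperatures in degrees Celsius.
temperatures = [21.5, 23.1, 19.8, 22.4, 24.0, 20.7]

mean_temp = statistics.mean(temperatures)
median_temp = statistics.median(temperatures)
stdev_temp = statistics.stdev(temperatures)  # sample standard deviation

print(f"mean={mean_temp:.2f} median={median_temp:.2f} stdev={stdev_temp:.2f}")

# Hypothetical discrete data: number of items in each customer order.
items_per_order = [1, 2, 2, 3, 1, 2, 4, 2]

# A frequency distribution counts how often each whole-number value occurs.
frequency = Counter(items_per_order)

# A simple text histogram: one '#' per order with that many items.
for value in sorted(frequency):
    print(f"{value} items: {'#' * frequency[value]} ({frequency[value]})")
```

The summary statistics condense the continuous readings into a few numbers, while the frequency distribution shows at a glance which order size is the most common.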

Categorical Data

Categorical data represents distinct categories or groups. This type of data is divided into two subtypes: nominal and ordinal. I describe each below.

Nominal Categorical Data: Nominal categorical data has no inherent order or ranking, so you can arrange the categories in whatever way suits your analysis. Examples include gender, colours, or product categories. No gender is greater than another, and no colour outranks another. I advise handling nominal categorical data with frequency tables, bar charts, or pie charts. The advantages of this data type are ease of interpretation and visual representation. The disadvantage is that nominal data cannot be subjected to mathematical operations such as addition or subtraction.
Ordinal Categorical Data: Ordinal categorical data has a predefined order or ranking. Each category comes before or after another, although the categories are not always numerical. Examples include the educational levels of a group of individuals (e.g., primary, middle school, high school, university, professional), survey ratings (e.g., satisfaction levels), or product ratings. You have probably come across this type of data often. I handle ordinal categorical data using techniques such as rank ordering and cumulative frequency. The advantage of ordinal data is the ability to establish the relative position of the items you are analysing. The disadvantage is that the magnitude of the differences between categories may not be accurately represented.
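As a rough illustration of the difference, the sketch below builds a frequency table for a nominal variable and a cumulative frequency for an ordinal one. The colours, the satisfaction scale, and the responses are hypothetical. Note that the order of the ordinal levels has to be supplied by the analyst, because the software cannot infer it from the labels.

```python
from collections import Counter
from itertools import accumulate

# Nominal: hypothetical product colours. The order carries no meaning.
colours = ["red", "blue", "red", "green", "blue", "red"]
colour_counts = Counter(colours)
print("Colour frequency table:", dict(colour_counts))

# Ordinal: hypothetical survey ratings with an explicit, analyst-defined order.
satisfaction_order = ["poor", "fair", "good", "excellent"]
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]
response_counts = Counter(responses)

# Count each level in rank order, then accumulate from lowest to highest.
ordered_counts = [response_counts[level] for level in satisfaction_order]
cumulative = list(accumulate(ordered_counts))

for level, count, running_total in zip(satisfaction_order, ordered_counts, cumulative):
    print(f"{level:<10} count={count} cumulative={running_total}")
```

The cumulative column answers questions such as "how many respondents rated us fair or worse", which only makes sense because the categories are ranked.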

Text Data

Text data includes unstructured textual information such as customer reviews, social media posts, or survey responses. This kind of data is becoming popular in analysis now that Exco has recognised that customers can be understood through more than numerical data. Analysing text data often involves techniques like natural language processing (NLP), sentiment analysis, and text mining. The release of large language models such as ChatGPT and GPT-4 created waves in 2022 and 2023 for this reason. Businesses gained access to tools that could interpret customer needs expressed in free text, which is difficult to do with the structured data stored in SQL databases alone. How useful this is depends on how the text data will be used to empower the marketing and sales teams. The advantages of text data analysis include understanding customer sentiment, identifying trends, and extracting insights. The disadvantage is that text data can be challenging to work with because of its complexity and ambiguity.
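A very simple form of text mining is counting which words appear most often. The sketch below assumes a handful of made-up reviews and a small, hand-picked list of stop words. It is nowhere near a full NLP or sentiment analysis pipeline, but it shows how unstructured text can be turned into something countable.

```python
import re
from collections import Counter

# Hypothetical customer reviews.
reviews = [
    "Great product, fast delivery",
    "Delivery was slow but the product is great",
    "Poor packaging, great price",
]

# A small, hand-picked stop word list (an assumption for this example).
stop_words = {"the", "was", "but", "is", "a", "and"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


word_counts = Counter(
    word
    for review in reviews
    for word in tokenize(review)
    if word not in stop_words
)

# Show the three most frequent words across all reviews.
for word, count in word_counts.most_common(3):
    print(f"{word}: {count}")
```

Even this basic count hints at what customers talk about most, although it cannot tell whether "delivery" was mentioned in praise or complaint, which is where the ambiguity of text data shows.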

Time Series Data

Time series data represents observations collected over a sequence of time intervals, usually equally spaced. A common rule of thumb is to collect at least thirty observations before drawing statistical conclusions, although the number needed depends on the data. We can use this data to understand what happened historically and to predict better what might happen next. Examples of time series data include stock market prices, weather data, or website traffic. Analysing time series data is exciting. I use techniques like trend analysis, seasonality detection, and forecasting. Recent versions of Excel offer forecasting tools that can detect seasonality, such as the Forecast Sheet feature on the Data tab. I encourage you to play around with these features to increase your proficiency. The advantages of using time series data include the ability to identify patterns and make predictions. The disadvantage is that this data type can be influenced by various external factors, making it susceptible to outliers and anomalies.
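One basic way to see a trend is to smooth the series with a moving average. The sketch below uses hypothetical daily website visits over two weeks and a seven-day window, so each average covers one full week and the weekly dip is smoothed out. It illustrates trend analysis only and is not a forecasting method.

```python
from datetime import date, timedelta

# Hypothetical daily website visits for two weeks, starting on a Monday.
start = date(2023, 1, 2)
visits = [120, 135, 128, 150, 160, 90, 85, 125, 140, 133, 155, 166, 95, 88]

# Pair each observation with its date: equally spaced, one per day.
series = [(start + timedelta(days=i), value) for i, value in enumerate(visits)]


def moving_average(values: list[float], window: int) -> list[float]:
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


window = 7  # one full week, so the weekend dip does not distort the trend
trend = moving_average(visits, window)

# Each average is labelled with the last day of its window.
for (day, _), average in zip(series[window - 1:], trend):
    print(f"{day.isoformat()}  {average:.1f}")
```

If the averages rise from one day to the next, the underlying traffic is trending upwards even though individual days go up and down.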

Conclusion

Novice analysts benefit from understanding the different types of data. The common data types are (1) decimal data, (2) integers, (3) percentages, (4) date-formatted data, (5) time-formatted data, (6) date-time data (combining date and time), (7) text, (8) location, and (9) Boolean data. I have thoroughly enjoyed sharing my understanding of (1) numeric data (decimals, integers, percentages, etc., which can be continuous or discrete), (2) categorical data (nominal and ordinal), (3) text data, and (4) time series data. I still believe that understanding different data types is essential for effective data analysis. By recognising the characteristics, advantages, and disadvantages of each data type, a novice analyst can choose appropriate techniques and tools to handle and interpret data accurately. Remember that combining multiple data types and employing advanced analytical methods can provide richer insights and support more informed decision-making.