Course Content
Orientation, introduction to the course
1. Human-Robot Interaction (HRI)
2. Research Methods in Human-Robot Interaction
3. Smart Cities & HRI
The demand for city living is already high, and this trend appears set to continue. The United Nations World Cities Report, one of many reports predicting that cities will play an important role in our future, projects that by 2050 more than 70% of the world's population will be living and working in cities (UN-Habitat, 2022). As cities grow in size and scope, they are shaped into complex urban landscapes where things, data, and people interact with each other. Everything and everyone has become so connected that Wi-Fi too often fails to meet digital needs, online orders don't arrive fast enough, traffic jams still clog the roads, and environmental pollution still weighs on cities. New technologies, technical intelligence, and robots can help find solutions to these ever-increasing problems and support the evolution of the growing urban space.
Human-Robot Interaction

Data Cleaning and Pre-processing

One of the essential steps after collecting data is to check for potential errors. This is particularly crucial for manually entered data, as people are prone to making mistakes. Although it is impossible to identify all errors, the goal is to minimize their negative impact by tracing as many of them as possible. Various methods can be used to identify errors, depending on the type of data collected. For instance, a reasonableness check can spot an error such as an age entered as “333”, which can plausibly be corrected to “33”. The same correction is not as easy if the age is “232”: in this case we cannot be sure whether the intended value is 23, 32, or 22. In some cases, multiple data fields may need to be examined together to identify errors.
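
A reasonableness check of this kind can be sketched in a few lines of Python. This is only an illustration: the record layout, the function name `flag_implausible_ages`, and the plausible range of 18 to 100 are assumptions made for the example, not part of the lesson. The code flags suspicious entries for manual review rather than correcting them automatically, since a value such as 232 cannot be repaired with confidence.

```python
# Hypothetical plausible age range; adjust it to the study population.
MIN_AGE, MAX_AGE = 18, 100


def flag_implausible_ages(records, min_age=MIN_AGE, max_age=MAX_AGE):
    """Return the records whose 'age' lies outside [min_age, max_age]."""
    return [r for r in records if not (min_age <= r["age"] <= max_age)]


# Made-up example data, including the two entry errors discussed above.
records = [
    {"participant": "P01", "age": 33},
    {"participant": "P02", "age": 333},
    {"participant": "P03", "age": 232},
    {"participant": "P04", "age": 27},
]

suspicious = flag_implausible_ages(records)
for r in suspicious:
    # These entries need a human decision: correct, re-ask, or mark as missing.
    print(r["participant"], r["age"])
```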

For automatically collected data, error checking usually focuses on time consistency issues or whether the performance falls within a reasonable range. In studies that collect data from multiple channels, it is crucial to ensure that data about the same participant is correctly grouped together.
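
As a rough sketch of these two checks, the Python example below looks for timestamps that go backwards in an event log and groups records from two channels by participant ID. The field names (`t`, `participant`, `task_time_s`), the channel names, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_time_inconsistencies(events):
    """Return indices of events whose timestamp is earlier than the previous one."""
    return [i for i in range(1, len(events)) if events[i]["t"] < events[i - 1]["t"]]


def group_by_participant(*channels):
    """Combine (channel_name, rows) pairs into one dict keyed by participant ID."""
    grouped = defaultdict(dict)
    for channel_name, rows in channels:
        for row in rows:
            grouped[row["participant"]][channel_name] = row
    return dict(grouped)


# Made-up event log in which the third event is stamped before the second.
log = [
    {"t": 0.0, "event": "start"},
    {"t": 4.2, "event": "click"},
    {"t": 3.9, "event": "click"},
    {"t": 9.5, "event": "end"},
]
bad_indices = find_time_inconsistencies(log)

# Two made-up channels that list the same participants in a different order.
survey = [{"participant": "P01", "satisfaction": 4},
          {"participant": "P02", "satisfaction": 5}]
timing = [{"participant": "P02", "task_time_s": 41.0},
          {"participant": "P01", "task_time_s": 37.5}]
merged = group_by_participant(("survey", survey), ("timing", timing))
```

Keying every channel on the same participant ID, rather than relying on row order, is what keeps data about one participant together.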

When errors are identified, they should be corrected by replacing them with accurate data. However, this is not always possible, particularly in online studies or anonymous surveys. In such cases, the problematic data items must be removed and treated as missing values in the statistical analysis.
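
One possible way to treat uncorrectable entries as missing values is shown below in Python. It is a minimal sketch: using `None` as the missing-value marker, the helper names, and the validity rule are assumptions for this example. Statistical packages have their own conventions for missing data.

```python
from statistics import mean


def to_missing(values, is_valid):
    """Replace every value that fails the validity check with None (missing)."""
    return [v if is_valid(v) else None for v in values]


def mean_ignoring_missing(values):
    """Compute the mean over the non-missing values; None if nothing is left."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Made-up ages; 232 cannot be corrected, so it becomes a missing value.
ages = [33, 232, 27, 41]
cleaned = to_missing(ages, lambda a: 18 <= a <= 100)
average_age = mean_ignoring_missing(cleaned)
```

Keeping the position of the removed value (instead of deleting the whole row) preserves the participant's other answers for the analysis.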

Sometimes, data must be cleaned up due to inappropriate formatting. For example, in an online survey, participants may enter their age in various formats, such as numeric values or text descriptions. In such cases, the entries in text formats may need to be transformed into numeric values before statistical analysis can be performed.
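
The Python sketch below illustrates one way such a transformation might look. The function `parse_age`, the word lists, and the set of filler words are assumptions made for this example. It only understands whole numbers written as digits or as English words up to ninety-nine, and returns `None` for anything else so that the entry can be reviewed by hand.

```python
import re

# Hypothetical lookup tables covering English number words from 0 to 99.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
FILLER = {"year", "years", "yrs", "old"}  # words that carry no numeric value


def parse_age(entry):
    """Return the age as an int, or None if the entry cannot be interpreted."""
    tokens = [t for t in re.split(r"[\s-]+", entry.strip().lower()) if t]
    digits = [t for t in tokens if t.isdecimal()]
    if digits:
        # Accept exactly one number, optionally with filler ("33 years old").
        others = [t for t in tokens if not t.isdecimal() and t not in FILLER]
        return int(digits[0]) if len(digits) == 1 and not others else None
    total, found = 0, False
    for t in tokens:
        if t in UNITS:
            total, found = total + UNITS[t], True
        elif t in TENS:
            total, found = total + TENS[t], True
        elif t not in FILLER:
            return None  # unknown word: leave the entry for manual review
    return total if found else None


raw_entries = ["33", " 33 years old ", "thirty-three", "Forty", "n/a"]
numeric_ages = [parse_age(e) for e in raw_entries]
```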

Before conducting the statistical analysis, it is common for researchers to code the original data collected. For example, while age information is already numerical and does not require coding, other information, such as gender or previous software experience, must be coded before it can be interpreted by statistical software. Typically, researchers use the codes “0” and “1” for dichotomous variables, which are categorical variables with only two possible values. For example, we may use “1” to represent “male” and “0” to represent “female”, or code previous software experience using “1” for “Yes” and “0” for “No”.
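
The coding scheme described above could be applied as in the following Python sketch. The codebooks follow the examples in the text; the record layout and the helper `code_value` are hypothetical. Responses that do not appear in the codebook are mapped to `None` so that they surface as missing values instead of being silently miscoded.

```python
# Codebooks following the examples in the text.
GENDER_CODES = {"male": 1, "female": 0}
EXPERIENCE_CODES = {"yes": 1, "no": 0}


def code_value(value, codebook):
    """Map a text response to its numeric code; unknown responses become None."""
    return codebook.get(value.strip().lower())


# Made-up survey responses with inconsistent capitalisation.
responses = [
    {"age": 33, "gender": "Male", "experience": "Yes"},
    {"age": 27, "gender": "female", "experience": "No"},
]

coded = [
    {
        "age": r["age"],  # already numerical, so no coding is needed
        "gender": code_value(r["gender"], GENDER_CODES),
        "experience": code_value(r["experience"], EXPERIENCE_CODES),
    }
    for r in responses
]
```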

For further study of data cleaning in SPSS, watch this video:

References

Lazar, J., Feng, J. H., & Hochheiser, H. (2017). Research methods in human-computer interaction. Morgan Kaufmann.

Dix, A. (2020). Statistics for HCI: Making Sense of Quantitative Data. Morgan & Claypool Publishers.
 
Robertson, J., & Kaptein, M. (2016). An introduction to modern statistical methods in HCI (pp. 1-14). Springer International Publishing.
 
Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R. Routledge.
 
Aldrich, J. O. (2018). Using IBM SPSS statistics: An interactive hands-on approach. Sage Publications.