Autor da secção: Laiton Hedley

Overview

When analysing data, it is common for data to come in in a somewhat messy format. It’s often necessary to «tidy» and transform our data so that it is in a form suitable for our analyses. This may involve computing sum scores for each partipant, excluding outliers, recoding responses, transforming response variables to different scales, aggregation (such as computing mean response time for each participant), etc.

When raw data is first brought or entered into jamovi, it is represented in columns called Data Variables. This data can be analysed as-is, but it’s also possible to alter, recode and transform this data using Computed Variables and Transformed Variables. The fourth and final sort of column is Filters, used for filtering via categories or excluding outliers.

Data Variables

Data variables are simply the columns and rows of data brought or entered into jamovi. They typically represent the «raw data» that we want to use as a basis for our analyses. Opening a .csv file, or entering values directly into the spreadsheet will result in these values being represented in data variables.

Measure and Data Types

In jamovi, variables have a Measure type and a Data type to indicate the type of data contained in the variable and how it should be treated in analyses.

Data types can be either:

  • Integer: whole numbers without decimal points

  • Decimal: numbers with decimal points

  • Text: text, words, or «strings» that make up words or categories.

Note that jamovi does not support date and time data types at this time, however dates and times can be represented as text. Some basic date manipulations are described in Common Data Recipes.

Measure types are used to indicate the level of measurement.

  • Integer type icon Continuous: numbers where a continuum of values is meaningful, such as height, weight, counts.

  • Integer type icon Ordinal: discrete values where order is meaningful, i.e. “Strongly agree”, “Agree” … “< 18”, “18-30”, …

  • Integer type icon Nominal: discrete values with no meaningful order, i.e. Male, Female, … “Smoker”, “Non-smoker”, …

  • Integer type icon ID: unique identifiers for each row/participant.

Note that the levels of measurement don’t align perfectly with these types. jamovi follows a convention of combining ratio and interval into a single measure type Continuous. jamovi also provides an ID measure type for unique identifiers that are usually not analysed (i.e. Surname).

When data is first brought in, jamovi automatically assigns data types and measure types based on the form the data takes. Most of the time this automatic assignment will be correct, but there may be scenarios where the data type or measure type need to manually adjusted.

Note that some combinations of data type and measure type are not possible (e.g., a variable cannot be both text and continuous, or decimal and nominal).

To change the data type and measure type of a variable, double click on the column header to open the variable editor. Here the variable’s name, description, data type, and measure type can be seen and modified. Click on the measure type a drop down menu will appear with the options to change the measure type. To change the data type, click on the data type and select the desired data type from the drop down menu. For example, below is the variable editor for a variable called Score (representing some continuous percentage of performance) which incorrectly shows as Nominal data type with an Ordinal measure type:

The data editor in jamovi where you can edit your data and create new variables.

To change this to continuous, click on the measure type and select «Continuous» from the drop-down menu. Similarly, to change the data type to decimal, click on the data type and select «Decimal» from the drop-down menu.

The data editor in jamovi where you can edit your data and create new variables.

Now jamovi treats this variable as a continuous variable. This is important for performing analyses that require continuous data (such as a correlation or regression) or if we wish to perform mathematical operations on the variable (such as computing a z-score).

Missing Values

Often data will contain missing values, jamovi provides a number of options to handle missing values. In jamovi, missing values are represented as blank cells in the data view.

Some data sets use special values to encode missing values, for example, they may use the value -99 or 9999. This data, left as-is, will be treated by analyses as actual values. In order for these to be treated as missing values, it’s necessary to instruct jamovi to treat these values as missing.

With variable editor open, click on Missing values and a window will appear, click the Add Missing Value button. Now, it is possible to specify when the source or rather the variable of interest has a value exactly, equal to, less than, or greater than a value, it should be treated as a missing value.

For example, the formula of == 9999 shown below means that any values equal to 9999 will be treated as missing values.

The data editor in jamovi where you can edit your data and specify missing values.

Adding Labels to Levels

For variables with a nominal or ordinal measure type, it is possible to add labels to the levels of the variable: Below is a table of data with a variable called «Group» which has a nominal measure type and integer data type.

Table 1 Example of a Variable with Unlabeled Levels

ID

Group

1

1

2

2

3

1

4

2

The levels of 1 and 2 indicate whether the participant is a smoker or not, but these values are not very informative: Through the data variable editor levels 1 and 2 are presented for this variable:

The data editor in jamovi where you can recode your variables using the recode option.

By clicking on the 1 and 2 in our variable editor, meaningful labels can be assigned to these levels - such as «Smoker» and «Non-smoker», respectively:

The data editor in jamovi where you can recode your variables using the recode option.

Allowing us to have a more informative variable with the following data:

Table 2 Example of a Variable with Labeled Levels

ID

Group

1

Smoker

2

Non-smoker

3

Smoker

4

Non-smoker

Now, when performing analyses, the labels «Smoker» and «Non-smoker» will be used instead of the values 1 and 2, making the results more informative and easier to interpret.

Reordering Levels

For Ordinal Data (and at times Nominal Data), we may wish to «reorder» the levels in which the data appear in Figures and Tables.

Below is an example table of descriptive data computed via jamovi to begin inferring differences within a data set.

Table 3 Example Descriptives Table with Levels Unordered

Task Difficulty

Mean

Standard Deviation

Low

85.1

9.5

High

54.3

9.1

Medium

73.6

9.3

Each row represents one of three levels of Task Difficulty (Low, Medium, High) in a Memory Task, with columns to indicate the Mean and Standard deviation. However, the order in which the rows are presented is unintuitive, the High condition is in the middle and the natural order is interrupted.

Jamovi can allow for the reordering of variables, so in all future Tables and Figures the levels appear in a more intuitive order; Low, Medium, then High. To do this, double click on variable of interest to open the variable editor. Here, the Levels (such as the Low, High, and Medium) for a Variable (like Task Difficulty) can be seen:

The data editor in jamovi where you can reorder the levels in your variable to be more intuitive - currently they appear in an unusual order: Low, High, Medium.

Click on the level to reorder and use the arrow keys to move the level to an appropriate location in the heirarchy (the level at the top will be positoned at the left most or top position and bottom level will be position in the right most or lowest position for all future tables and figures):

The data editor in jamovi where you can reorder the levels in your variable to be more intuitive: Low, Medium, High.

The descriptive table (and any following figures) will present the levels in the same order as indicated in the variable editor. For example, the table below now displays each level of Task Difficulty from Low to High in an intuitive fashion.

Table 4 Example Descriptives Table with Levels Reordered

Task Difficulty

Mean

Standard Deviation

Low

85.1

9.5

Medium

73.6

9.3

High

54.3

9.1

The natural order of the conditions is now preserved, moving down each row it appears in the order of Low, Medium, then High, making the table easier to read and interpret.

Retain Unused Levels in Analyses

Some categorical variables have valid levels that do not appear in the current data set. These unused levels may still be meaningful and can be retained for analysis or reporting. For example, a survey may ask adult and child moviegoers whether they enjoyed a film, with responses of Yes or No. If all participants answer Yes, the No level is absent from the data (even though it remains a valid response option):

Table 5 Example of a Data with an Unused Level

Age Group:

Enjoyed Film (Y/N):

Adult

Yes

Adult

Yes

Adult

Yes

Child

Yes

Child

Yes

Child

Yes

If a contingency table was used to examine the relationship between Age Group and Film Enjoyment, there is a problem because «No» responses are absent from the data. Without this in our data the contingency table would not be able to represent the missing («No») level of the variable.

To ensure this level is retained in analyses, first the unused level (e.g., «No») must be added to the variable editor for the variable of interest (e.g., Enjoyed Film (Y/N)) by double clicking in the heading of the variable or clicking a cell of the variable the variable editor will appear and with a plus sign towards the bottom of the variable editor:

The data editor in jamovi where to add unused levels to be retained in analyses, even if they are not present in the data.

Click the plus sign to add the unused level (of «No») in the variable editor and entering the text for the unused level:

The data editor in jamovi where to add the text for the unused levels to be retained in analyses, even if they are not present in the data.

The unused level (of «No») is included as a level for the variable of interest (Enjoyed Film (Y/N)), but it is not yet retained in analyses. To retain this level in analyses, click the toggle on button next to the text: «Retain unused levels in analyses» option:

The data editor in jamovi where to add the text for the unused levels to be retained in analyses, even if they are not present in the data.

Now, when performing analyses, the «No» level will be retained in analyses and represented in tables and figures, even though there is an absence of «No» responses in the data.