Practical 4

Data Visualization

For the practical, you can download this script.

You can also use this cheatsheet.

Introduction

Data visualization is a crucial skill in data science and analytics. In this tutorial, we will explore how to create various types of visualizations in R using the ggplot2 package, a powerful tool for creating informative and aesthetically pleasing graphics.

Installing and Loading Required Packages

Before we begin, make sure the following packages are installed, then load them.

library(dplyr)
library(tidyverse)
library(ggplot2)
library(ggridges)
library(datasauRus)
library(colorspace)
library(pheatmap)
library(survival)
library(survminer)
library(qqman)
library(plotly)
library(gganimate)
library(gifski)
library(palmerpenguins)
library(gapminder)
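
If one of these packages is missing, library() stops with an error. The following is a minimal sketch for checking this up front. The names required_pkgs and missing_pkgs are our own, and the list also covers shiny, shinydashboard, and DT, which are only loaded later in the Shiny demos. The install call is left commented out so that nothing is installed unless you choose to.

```r
# Packages used in this practical (including those loaded in the Shiny demos)
required_pkgs <- c(
  "tidyverse", "ggridges", "datasauRus", "colorspace", "pheatmap",
  "survival", "survminer", "qqman", "plotly", "gganimate", "gifski",
  "palmerpenguins", "gapminder", "shiny", "shinydashboard", "DT"
)

# Compare against the packages found in the current library paths
is_installed <- required_pkgs %in% rownames(installed.packages())
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```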

Module 1: Brief Review of Basic Visualization in R

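Let’s start with the penguins data set from the palmerpenguins package to get familiar with the structure of ggplot2.

data(penguins, package = "palmerpenguins") # load the palmerpenguins version of the data
head(penguins)
penguins <- penguins %>% drop_na() # drop rows with missing values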

# Basic scatter plot
ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm))

For this tutorial, we will mostly use the gapminder dataset from the gapminder package, which contains an excerpt of the Gapminder data on life expectancy, GDP per capita, and population by country.

Let’s explore the gapminder dataset.

data(gapminder) # load in dataset
head(gapminder, 5) # check the first 5 rows

Let’s use the gapminder dataset to build different useful plots.

1.1 bar plot

A bar plot compares categorical data by representing values with rectangular bars. With geom_bar(), the height of each bar is the number of rows in that category.

# bar plot: discrete variable
ggplot(gapminder, aes(x = continent)) +
  geom_bar()
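
To make the counting step visible, the sketch below computes the number of rows per continent explicitly and then draws the same bars with geom_col(), which takes bar heights directly from the data. The names continent_counts and n_rows are our own.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Count the rows per continent explicitly
continent_counts <- gapminder %>%
  count(continent, name = "n_rows")

continent_counts

# geom_col() uses the y values as bar heights instead of counting rows
ggplot(continent_counts, aes(x = continent, y = n_rows)) +
  geom_col()
```

Use geom_bar() when the data has one row per observation and geom_col() when the values to plot are already summarised.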

1.2 histogram / frequency / density plot

A histogram, frequency plot, and density plot all visualize the distribution of a dataset, but they differ in how they represent data: a histogram groups data into bins and shows counts, a frequency plot connects points representing counts for each bin, and a density plot estimates the probability density function, providing a smooth representation of distribution shape.

# histogram plot: continuous variable
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(binwidth = 1)

# frequency plot
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
  geom_freqpoly()

# density plot (smooth histogram = density plot)
ggplot(gapminder, aes(x = lifeExp, color = continent)) +
  geom_density()
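
A histogram and a density curve can share one panel if the histogram is drawn on the density scale rather than the count scale. The sketch below does this with after_stat(density) and then checks that the bar areas add up to one. The bin width of 2 and the object names are our own choices.

```r
library(ggplot2)
library(gapminder)

p_hist_density <- ggplot(gapminder, aes(x = lifeExp)) +
  # Put the histogram on the density scale so it is comparable to the curve
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "grey80", color = "white") +
  geom_density(linewidth = 1)

p_hist_density

# Total bar area = sum of (height * width); a density histogram sums to 1
hist_layer <- layer_data(p_hist_density, 1)
total_area <- sum(hist_layer$density * (hist_layer$xmax - hist_layer$xmin))
total_area
```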

1.3 box plot

A box plot is used to summarize the distribution of a dataset by displaying its median, quartiles, and potential outliers, helping to identify variability and skewness.

# box plot: one discrete variable + one continuous variable
ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_boxplot(aes(group = year))
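
The numbers behind each box can also be computed directly. The sketch below summarises the quartiles per year and derives the 1.5 × IQR fences beyond which geom_boxplot() draws points as outliers. The column names are our own.

```r
library(dplyr)
library(gapminder)

box_stats <- gapminder %>%
  group_by(year) %>%
  summarise(
    q1 = quantile(lifeExp, 0.25),     # lower edge of the box
    median = median(lifeExp),         # line inside the box
    q3 = quantile(lifeExp, 0.75),     # upper edge of the box
    iqr = IQR(lifeExp),               # box height
    .groups = "drop"
  ) %>%
  mutate(
    lower_fence = q1 - 1.5 * iqr,     # points below this are drawn as outliers
    upper_fence = q3 + 1.5 * iqr      # points above this are drawn as outliers
  )

box_stats
```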

1.4 violin plot

A violin plot combines aspects of a box plot and a density plot to show the distribution, probability density, and variability of a dataset, making it easier to identify patterns and multimodal distributions.

# violin plot
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(
    fun = mean,
    fun.max = function(x) {
      mean(x) + sd(x)
    },
    fun.min = function(x) {
      mean(x) - sd(x)
    },
    geom = "pointrange"
  )

1.5 Example 1

Let’s see an example of how to improve a plot step by step. We use the lincoln_weather dataset, which ships with the ggridges package and contains weather information for Lincoln, Nebraska, in 2016, to investigate how the daily mean temperature is distributed in each month.

1.5.1 Create the dataset we will use in this example

lincoln_df <- lincoln_weather %>%
  mutate(
    month_short = fct_recode(
      Month,
      Jan = "January",
      Feb = "February",
      Mar = "March",
      Apr = "April",
      May = "May",
      Jun = "June",
      Jul = "July",
      Aug = "August",
      Sep = "September",
      Oct = "October",
      Nov = "November",
      Dec = "December"
    )
  ) %>%
  mutate(month_short = fct_rev(month_short)) %>%
  select(Month, month_short, `Mean Temperature [F]`)

head(lincoln_df)

1.5.2 What’s the problem with the following plot?

# points-errorbars
lincoln_errbar <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  stat_summary(
    fun = mean, 
    fun.max = function(x) {
      mean(x) + 2 * sd(x)
    },
    fun.min = function(x) {
      mean(x) - 2 * sd(x)
    }, 
    geom = "pointrange"
  ) +
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(color = "black", size = 12),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_errbar

1.5.3 Use box/violin plot to show the distribution

# box plot
lincoln_box <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  geom_boxplot(fill = "grey90") +
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(color = "black", size = 12),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_box

# violin plot
lincoln_violin <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  geom_violin(fill = "grey90") +
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(color = "black", size = 12),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_violin

1.5.4 Use points to show the original data

# points
lincoln_points <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  geom_point(size = 0.75) +
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(color = "black", size = 12),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_points

# jitter points
lincoln_jitter <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  geom_point(position = position_jitter(width = .15, height = 0, seed = 320), size = 0.75) +
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(
      color = "black",
      size = 12
    ),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_jitter

1.5.5 Use sina plot to combine violin plot and jitter points

# sina plot
lincoln_sina <- lincoln_df %>%
  ggplot(aes(x = month_short, y = `Mean Temperature [F]`)) +
  geom_violin(color = "transparent", fill = "gray90") +
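  # A true sina layout would use dviz.supp::stat_sina(size = 0.85);
  # geom_jitter() is used here as an approximation that needs no extra package
  geom_jitter(width = 0.25, size = 0.85) +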
  xlab("month") +
  ylab("mean temperature (°F)") +
  theme_classic(base_size = 14) +
  theme(
    axis.text = element_text(
      color = "black",
      size = 12
    ),
    plot.margin = margin(3, 7, 3, 1.5)
  )

lincoln_sina

Module 2: Advanced Customization and Data Transformation

2.1 Data Transformations and Faceting

In data visualization, data transformation is a crucial step to ensure that the data is in the right format and structure for effective visual representation.

2.1.1 Log Transformation

Many data transformations are available; here we look at the log transformation.

We would like to investigate the relationship among GDP per capita, life expectancy, and population across countries.

# Step 1: calculate the mean of lifeExp, gdpPercap, and pop for each country
newgapminder <- gapminder %>% 
  group_by(continent, country) %>% 
  summarise(
    across(c(lifeExp, gdpPercap, pop), mean),
    .groups = "drop" # return an ungrouped tibble and silence the grouping message
  )
newgapminder

# Step 2: draw the plot
newgapminder %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent, size = pop))

# Step 3: log transformation of x-axis(gdpPercap)
newgapminder %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent, size = pop)) +
    scale_x_log10()

# Step 4 (Optional): add dollar format or unit format to label the axis for better interpretation
newgapminder %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent, size = pop)) +
    scale_x_log10(breaks = c(500, 1000, 3000, 10000, 30000),
                  labels = scales::dollar)

newgapminder %>% 
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent, size = pop)) +
    scale_x_log10(
      name = "GDP per capita",
      breaks = c(500, 1000, 3000, 10000, 30000),
      labels = scales::unit_format(unit = "dollar"))
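
To see why the log scale helps, it is worth comparing the range of GDP per capita before and after the transformation. The sketch below also checks that scale_x_log10() positions the points at log10(gdpPercap) while keeping the axis labels in the original units. It uses the full gapminder data rather than the country means, and the object names are our own.

```r
library(ggplot2)
library(gapminder)

# Raw GDP per capita spans several orders of magnitude
range(gapminder$gdpPercap)
range(log10(gapminder$gdpPercap))

p_log <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# The built plot stores x positions on the transformed (log10) scale
plotted_x <- layer_data(p_log, 1)$x
same_positions <- isTRUE(all.equal(sort(plotted_x), sort(log10(gapminder$gdpPercap))))
same_positions
```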

2.1.2 Faceting

Data faceting is used in data visualization to break down data into subsets and display them across multiple smaller plots. This technique helps in revealing patterns, trends, and relationships that might be obscured in a single, aggregated plot. The following is an example of data faceting.

# facet_grid(): create a grid of graphs, by rows and columns
gapminder %>%
  ggplot(aes(x = lifeExp)) +
  geom_density() +
  facet_grid(. ~ continent)

gapminder %>%
  ggplot(aes(x = lifeExp)) +
  geom_histogram() +
  facet_grid(continent ~ .)

# facet_wrap(): create small multiples by “wrapping” a series of plots
gapminder %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(show.legend = FALSE) +
  scale_x_log10() +
  facet_wrap(~continent)
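
facet_grid() becomes more useful with two categorical variables, one for the rows and one for the columns. gapminder has only one convenient grouping variable, so this sketch switches to the penguins data (sex in rows, species in columns). The object names are our own.

```r
library(ggplot2)

# Use the palmerpenguins version of the data and drop incomplete rows
penguins_complete <- na.omit(palmerpenguins::penguins)

p_facets <- ggplot(penguins_complete,
                   aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  facet_grid(sex ~ species)   # rows ~ columns

p_facets

# One panel per sex/species combination
n_panels <- nrow(ggplot_build(p_facets)$layout$layout)
n_panels
```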

2.2 Advanced Aesthetic Customization - color

Color plays a crucial role in data visualization. We will explore how to use the colorspace package to choose palettes suited to different data types.

Show all palettes available in colorspace.

hcl_palettes(plot = TRUE)

hcl_palettes("qualitative", plot = TRUE)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)
hcl_palettes("sequential (multi-hue)", n = 7, plot = TRUE)
hcl_palettes("diverging", n = 7, plot = TRUE)
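
The palettes listed above can also be retrieved as plain hex color codes, which is handy when a function expects a vector of colors rather than a ggplot2 scale. A short sketch, assuming the palette names "Dark 3", "Blues 3", and "Blue-Red" shown in the overview; the object names are our own.

```r
library(colorspace)

# One function per palette type; the first argument is the number of colors
species_cols <- qualitative_hcl(3, palette = "Dark 3")
blues        <- sequential_hcl(7, palette = "Blues 3")
blue_red     <- diverging_hcl(7, palette = "Blue-Red")

species_cols

# Display the three palettes side by side
swatchplot(
  "Dark 3"   = species_cols,
  "Blues 3"  = blues,
  "Blue-Red" = blue_red
)
```

These vectors can be passed to scale_color_manual(values = ...) or to base R plotting functions.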

2.2.1 qualitative colors

Qualitative (iris dataset) : create a scatter plot of Sepal.Length vs. Sepal.Width, coloring points by Species.

# qualitative colors (Categorical Data)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_discrete_qualitative(palette = "Dark 3") +  # Using discrete qualitative color scale
  ggtitle("Qualitative Colors Example: Iris Species")

2.2.2 Sequential (single-hue) colors

Sequential single-hue (mtcars dataset): create a bar plot of car models (rownames(mtcars)) based on their miles per gallon (mpg), using a single-hue sequential color gradient.

# sequential (single-hue) colors (Ordered Data)
# Order data by mpg
mtcars$car <- rownames(mtcars)
mtcars <- mtcars[order(mtcars$mpg), ]
# Create a bar plot
ggplot(mtcars, aes(x = reorder(car, mpg), y = mpg, fill = mpg)) +
  geom_bar(stat = "identity") +
  scale_fill_continuous_sequential(palette = "Blues 3") +  # Sequential color for fill
  coord_flip() + # Flip for better readability
  ggtitle("Sequential (Single-Hue) Colors Example: MPG in mtcars") +
  theme_minimal()

2.2.3 Sequential (multi-hue) colors

Sequential multi-hue (volcano dataset): The volcano heatmap provides a 2D topographic view of elevation data using color intensity. It highlights how terrain elevation changes over space.

# sequential (multi-hue) colors (Ordered Data)
# Convert volcano dataset to long format for ggplot
volcano_df <- as.data.frame(as.table(volcano))
colnames(volcano_df) <- c("x", "y", "elevation")
# Plot the heatmap
ggplot(volcano_df, aes(x = x, y = y, fill = elevation)) +
  geom_tile() +
  scale_fill_continuous_sequential(palette = "YlOrRd") +  # Sequential multi-hue color scale
  ggtitle("Sequential (Multi-Hue) Colors Example: Volcano Elevation") +
  xlab("X Coordinate") +  # Add x-axis label
  ylab("Y Coordinate") +  # Add y-axis label
  theme_minimal() +
  theme(
    axis.text = element_blank(),   # Remove axis text
    axis.ticks = element_blank()   # Remove axis ticks
  )

2.2.4 Diverging colors

Diverging (USArrests dataset): map UrbanPop deviation from the median across states using a diverging color scale. The goal is to understand how each state’s urban population percentage compares to the median urban population percentage. A positive value means the state’s urban population is above the median, while a negative value means it is below the median.

# diverging colors (USArrests dataset)
# Compute deviation from median
USArrests$State <- rownames(USArrests)
USArrests$UrbanPop_dev <- USArrests$UrbanPop - median(USArrests$UrbanPop)
# Create a bar plot
ggplot(USArrests, aes(x = reorder(State, UrbanPop_dev), y = UrbanPop_dev, fill = UrbanPop_dev)) +
  geom_bar(stat = "identity") +
  scale_fill_continuous_divergingx(palette = "RdBu") +
  coord_flip() +
  ggtitle("Diverging Colors Example: UrbanPop Deviation in USArrests") +
  theme_minimal()

2.3 Example 2

Compare the before-and-after heatmaps below to see how color choice enhances readability and perceptual accuracy in data visualization.

2.3.1 Before

Use the default color to create the heatmap.

# before
gapminder %>% 
  group_by(continent, year) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = continent, fill = mean_lifeExp)) +
  geom_tile() +
  theme_classic()

2.3.2 After

Use a sequential color palette for the heatmap.

# color choice 1 (Viridis)
gapminder %>% 
  group_by(continent, year) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = continent, fill = mean_lifeExp)) +
  geom_tile() +
  theme_classic() +
  scale_fill_continuous_sequential(palette = "Viridis", rev = FALSE)

# color choice 2 (Inferno)
gapminder %>% 
  group_by(continent, year) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = continent, fill = mean_lifeExp)) +
  geom_tile() +
  theme_classic() +
  scale_fill_continuous_sequential(palette = "Inferno", begin = 0.15, rev = FALSE)

2.4 Size & Transparency

Explore how to adjust the size and transparency of the points to enhance readability.

ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm, 
                 color = species), 
             size=2, 
             alpha = 0.4)

ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm, 
                 color = species), 
             size=4, 
             alpha = 0.8)

2.5 Legend

The legend plays a crucial role in interpreting the plot. Below, we will explore how to adjust its appearance, create different legends for different layers, and learn how to delete or merge legends effectively.

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 1, alpha = 0.4)

2.5.1 Adjust the appearance of the legend

# Adjust the appearance of the legend
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 1, alpha = 0.4) +
  guides(color = guide_legend(override.aes = list(size = 3,alpha=1)))
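
Besides the legend keys, the position and text of the legend can be adjusted through theme(). The sketch below moves the legend under the plot and sets its title in bold; these particular settings are only an illustration, and p_legend is our own name.

```r
library(ggplot2)

penguins_complete <- na.omit(palmerpenguins::penguins)

p_legend <- ggplot(penguins_complete,
                   aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size = 1, alpha = 0.4) +
  # Larger, opaque legend keys (as above)
  guides(color = guide_legend(override.aes = list(size = 3, alpha = 1))) +
  # Legend placement and title styling live in the theme
  theme(
    legend.position = "bottom",
    legend.title = element_text(face = "bold")
  )

p_legend
```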

2.5.2 Generate different legends for different layers

# Generate different legends for different layers
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(size  = 3) +
  geom_smooth(method = "lm", se = FALSE)

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point(aes(alpha = "Observed"), size = 3) +
  geom_smooth(method = "lm", se = FALSE, aes(alpha = "Fitted")) +
  scale_alpha_manual(
    name = NULL,
    values = c(1, 1),
    breaks = c("Observed", "Fitted")
  ) +
  guides(alpha = guide_legend(override.aes = list(
    linetype = c(0, 1),
    shape = c(16, NA),
    color = "black"
  )))

2.5.3 Delete legend

# delete legend
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species, size = body_mass_g)) +
  geom_point()

penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species, size = body_mass_g)) +
  geom_point() +
  guides(
    color = guide_legend("species"),      # keep
    size = "none"                      # remove
  )

2.5.4 Merge legend

# merge legend
penguins %>% 
  ggplot(
    aes(x = bill_length_mm, y = bill_depth_mm, 
      color = body_mass_g, size = body_mass_g)
  ) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c()

# method 1
penguins %>% 
  ggplot(
    aes(x = bill_length_mm, y = bill_depth_mm, 
      color = body_mass_g, size = body_mass_g)
  ) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c() +
  guides(
    color = guide_legend(), 
    size = guide_legend()
  )

# method 2
penguins %>% 
  ggplot(
    aes(x = bill_length_mm, y = bill_depth_mm, 
      color = body_mass_g, size = body_mass_g)
  ) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c(guide = "legend")

2.6 Theme

Explore how to change the theme of the plots.

df <- data.frame(x = 1:3, y = 1:3)
base <- ggplot(df, aes(x, y)) + geom_point()
# default theme
base + theme_grey() + ggtitle("theme_grey()")
# other built-in themes in ggplot2
base + theme_bw() + ggtitle("theme_bw()")
base + theme_linedraw() + ggtitle("theme_linedraw()")
base + theme_light() + ggtitle("theme_light()")
base + theme_dark() + ggtitle("theme_dark()")
base + theme_minimal()  + ggtitle("theme_minimal()")
base + theme_classic() + ggtitle("theme_classic()")
base + theme_void() + ggtitle("theme_void()")
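
Instead of adding a theme to every plot, a default can be set once per session with theme_set(). A brief sketch; old_theme and p_default are our own names, and the previous default is restored at the end.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = 1:3)

# theme_set() changes the session default and returns the previous theme
old_theme <- theme_set(theme_bw(base_size = 12))

# No theme is added here, yet the plot is drawn with theme_bw()
p_default <- ggplot(df, aes(x, y)) +
  geom_point() +
  ggtitle("Drawn with the session default theme")
p_default

# Restore the previous default so later plots are unaffected
theme_set(old_theme)
```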

2.7 Exercise 1

Improve the plot based on the following requirements.

Improving the axes and legend labels
Adding a title for the plot
Tweaking the color scale to your favorite palette
The background should be white, not pale grey
Major gridlines should be a pale grey and minor gridlines should be removed
The plot title should be centered in 12pt bold text

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

The answer is as follows. Do not open the answer code block before completing this exercise.

# answer

# 1.Improving the axes and legend labels

# 2.Adding a title for the plot

# 3.Tweaking the color scale to your favorite palette

# 4.The background should be white, not pale grey

# 5.Major gridlines should be a pale grey and minor gridlines should be removed

# 6.The plot title should be centered in 12pt bold text

ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(
    title = "Penguin Bill Length vs. Depth",  # Add title
    x = "Bill Length (mm)",  # Improve x-axis label
    y = "Bill Depth (mm)",   # Improve y-axis label
    color = "Species"        # Improve legend label
  ) +
  scale_color_discrete_qualitative(palette = "Dark 3") + # Choose a favorite color palette
  theme_minimal(base_size = 12) +  # Set white background and base size
  theme(
    legend.justification = c(0.5, 0.5),  # Adjust legend positioning
    legend.title = element_text(size = 10),  # Set legend title font size
    legend.text = element_text(size = 9),   # Set legend text font size
    plot.title = element_text(size = 12, face = "bold", hjust = 0.5),  # Title centered in 12pt bold
    panel.grid.major = element_line(color = "grey90"),  # Pale grey major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.background = element_blank()  # Set background to white
  )

Module 3: Advanced Visualization Techniques

Next, we will explore some specialized plots and learn which types of data are best suited for each.

3.1 Specialized plot - heatmaps

# base R: heatmap() function
data <- matrix(rnorm(100), nrow = 10)
rownames(data) <- paste("Row", 1:10)
colnames(data) <- paste("Col", 1:10)
heatmap(data)

# ggplot2 with geom_tile()
data <- expand.grid(Row = LETTERS[1:5], Column = LETTERS[6:10])
data$Value <- runif(nrow(data), 1, 100)
ggplot(data, aes(x = Column, y = Row, fill = Value)) +
  geom_tile() +
  scale_fill_continuous_sequential(palette = "Inferno", rev = FALSE) +
  labs(title = "Heatmap using ggplot2", x = "Columns", y = "Rows", fill = "Value") +
  theme_minimal()

# pheatmap for annotated heatmaps
data <- matrix(rnorm(100), nrow = 10)
rownames(data) <- paste("Gene", 1:10)
colnames(data) <- paste("Sample", 1:10)
pheatmap(data, color = colorRampPalette(c("blue", "white", "red"))(50))
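
The pheatmap call above does not yet attach any annotation. The sketch below adds a column annotation through the annotation_col argument and scales each row. The "Control"/"Treated" grouping is invented purely for illustration, and the object names expr, sample_info, and hm are our own.

```r
library(pheatmap)

set.seed(1)  # make the random matrix reproducible
expr <- matrix(rnorm(100), nrow = 10)
rownames(expr) <- paste("Gene", 1:10)
colnames(expr) <- paste("Sample", 1:10)

# Annotation data frame: row names must match the column names of the matrix
sample_info <- data.frame(
  Group = rep(c("Control", "Treated"), each = 5),  # made-up groups
  row.names = colnames(expr)
)

hm <- pheatmap(
  expr,
  annotation_col = sample_info,  # colored bar above the columns
  scale = "row",                 # z-score each gene across samples
  color = colorRampPalette(c("blue", "white", "red"))(50)
)
```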

3.2 Specialized plot - KM Curve

# Load the lung dataset
data("lung")

# Create a survival object
surv_object <- Surv(time = lung$time, event = lung$status)

# Fit a Kaplan-Meier survival curve by group
km_fit_group <- survfit(surv_object ~ sex, data = lung)

# Plot the Kaplan-Meier curves for groups
ggsurvplot(
  km_fit_group,
  data = lung,
  conf.int = TRUE,
  pval = TRUE,              # Add p-value from log-rank test
  xlab = "Time (days)",
  ylab = "Survival Probability",
  title = "Kaplan-Meier Curve by Sex",
  palette = c("blue", "red"), # Different colors for groups
  legend.labs = c("Male", "Female"), # Custom legend labels
  ggtheme = theme_minimal()
)
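
The p-value that pval = TRUE prints on the plot comes from a log-rank test. As a rough sketch, the same test and the median survival per group can be obtained directly from the survival package; the object names are our own.

```r
library(survival)

# Log-rank test comparing survival between the two sex groups
logrank <- survdiff(Surv(time, status) ~ sex, data = lung)
logrank

# p-value from the chi-square statistic (degrees of freedom = groups - 1)
p_value <- 1 - pchisq(logrank$chisq, df = length(logrank$n) - 1)
p_value

# Median survival time (days) per group
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)
median_survival <- summary(km_fit)$table[, "median"]
median_survival
```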

3.3 Specialized plot - Manhattan plot

# Load the gwasResults dataset (simulated GWAS results shipped with qqman)
data("gwasResults")

# Inspect the first few rows
head(gwasResults)

# Structure of the data
# SNP: SNP identifier
# CHR: Chromosome
# BP: Base-pair position
# P: P-value

manhattan(
  gwasResults,
  chr = "CHR",       # Chromosome column
  bp = "BP",         # Base-pair position column
  p = "P",           # P-value column
  snp = "SNP",       # SNP column
  genomewideline = -log10(5e-8),  # Genome-wide significance threshold
  suggestiveline = -log10(1e-5),  # Suggestive significance threshold
  main = "Manhattan Plot", # Title
  annotatePval = 0.01
)
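
The two horizontal lines are drawn on the -log10(p) scale, so it helps to see what the thresholds look like there and how many SNPs pass them. A small sketch with our own variable names:

```r
library(qqman)

data("gwasResults", package = "qqman")

genomewide_p <- 5e-8   # conventional genome-wide significance threshold
suggestive_p <- 1e-5   # suggestive threshold

# Positions of the two lines on the y-axis
genomewide_line <- -log10(genomewide_p)
suggestive_line <- -log10(suggestive_p)
c(genomewide = genomewide_line, suggestive = suggestive_line)

# Number of SNPs above each line
n_genomewide <- sum(gwasResults$P < genomewide_p)
n_suggestive <- sum(gwasResults$P < suggestive_p)
c(genomewide = n_genomewide, suggestive = n_suggestive)
```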

3.4 Specialized plot - 2d density plot

# Scatter plot
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Scatter Plot",
       x = "Carat",
       y = "Price")

# 2D density plot (counts of points in rectangular bins)
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 100) +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(title = "Simple 2D Density Plot",
       x = "Carat",
       y = "Price")

3.5 Interactive plot - plotly

Interactive plots allow users to explore data more deeply through actions like clicking, zooming, or hovering, providing richer insights and making the data more understandable and engaging. Next, we will explore how to make plots interactive.

3.5.1 ggplotly

p <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm, 
                 color = species))

ggplotly(p)

3.5.2 Customize tooltip

What if we want the tooltip to show which island each penguin lives on, or any other information?

penguins <- penguins %>%
  mutate(text = paste(
    "Island: ", island,
    "\nSpecies: ", species,
    "\nBill Length (mm): ", bill_length_mm,
    "\nBill Depth (mm): ", bill_depth_mm,
    "\nFlipper Length (mm): ",flipper_length_mm,
    "\nBody Mass (g): ", body_mass_g, sep=""
  ))
p <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm, 
                 color = species,
                 text = text)) +
  theme(legend.position = "none")
ggplotly(p, tooltip="text")

3.5.3 More advanced customization

Adjust the tooltip based on the following requests.

Use bold text to make the tooltip more readable
Add a title to the tooltip
Change the background color of the tooltip to white
Change the font color of the tooltip to black

penguins <- penguins %>%
  mutate(text = paste(
    "<span style='font-size:16px;'><b>", species, "</b></span><br>",
    "<br><b>Island:</b> ", island,
    "<br><b>Bill Length (mm):</b> ", bill_length_mm,
    "<br><b>Bill Depth (mm):</b> ", bill_depth_mm,
    "<br><b>Flipper Length (mm):</b> ",flipper_length_mm,
    "<br><b>Body Mass (g):</b> ", body_mass_g, 
    sep=""
  ))
p <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, 
                 y = bill_depth_mm, 
                 color = species,
                 text = text)) +
  theme(legend.position = "none")
pp <- ggplotly(p, tooltip="text")
pp <- pp %>%
  layout(
    hoverlabel = list(bgcolor = "white",
                      font = list(size = 12, color = "black"))
  )
pp

3.6 Animated plot - gganimate

Using gganimate to create animated plots helps visualize changes over time, making it easier to understand dynamic patterns and trends in data. It adds an extra layer of clarity by showing how variables evolve, which can be especially useful for time-series data.

# Create the plot
plot <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # Here comes the gganimate specific bits
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')

# Render the animation and save it as a GIF:
animate(plot, width = 500, height = 357, renderer = gifski_renderer())
anim_save("output.gif")

3.7 Shiny app

For this part of the practical, we will work through three example Shiny apps.

You can also use this cheatsheet.

3.7.1 Demo 1

You can download the .R file for Demo 1 here.

Here is the code for Demo 1:

library(shiny)


ui <- fluidPage(
  # Define UI components 
  
  sidebarLayout(
    
    sidebarPanel(
      "This is the sidebar",
      
      # textbox that accepts both numeric and alphanumeric input
      textInput(inputId = "title", label="Text box title", value = "Text box content"),
      
      # textbox that only accepts numeric data between 1 and 30
      numericInput(inputId = "num", label = "Number of cars to show", value = 10, min = 1, max = 30),
      
      # slider that goes from 35 to 42 degrees with increments of 0.1
      sliderInput(inputId = "temperature", label = "Body temperature", min = 35, max = 42, value = 37.5, step = 0.1),
      
      # slider that allows the user to set a range (rather than a single value)
      sliderInput(inputId = "price", label = "Price (€)", value = c(39, 69), min = 0, max = 99),
      
      # input field that allows for a single selection
      radioButtons(inputId = "radio", label = "Choose your preferred time slot", choices = c("09:00 - 09:30", "09:30 - 10:00", "10:00 - 10:30", "10:30 - 11:00", "11:00 - 11:30"), selected = "10:00 - 10:30"),
      
      # a dropdown menu is useful when you have plenty of options and you don't want to list them all below one another
      selectInput(inputId = "major", label = "Major", choices = c("Business Administration", "Data Science", "Econometrics & Operations Research", "Economics", "Liberal Arts", "Industrial Engineering", "Marketing Management", "Marketing Analytics", "Psychology"),
                  selected = "Marketing Analytics"),
      
      # dropdown menu that allows for multiple selections (e.g., both R and JavaScript)
      selectInput(inputId = "programming_language", label = "Programming Languages",
                  choices = c("HTML", "CSS", "JavaScript", "Python", "R", "Stata"),
                  selected = "R", multiple = TRUE),
      
      # checkbox: often used to let the user confirm their agreement
      checkboxInput(inputId = "agree", label = "I agree to the terms and conditions", value=TRUE)
      
    ),
    
    mainPanel(
      "Main panel goes here",
      
      tabsetPanel(
        tabPanel(title = "tab 1",
                 h1("Overview"),
                 "Content goes here"),
        tabPanel(title = "tab 2", "The content of the second tab"),
        tabPanel(title = "tab 3", "The content of the third tab")
      )
    )
  )
  
  
)

server <- function(input, output){
  # Implement server logic here
}

shinyApp(ui = ui, server = server)
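
Demo 1 leaves the server function empty, so the inputs are not used yet. The sketch below shows, under our own assumptions, how two of the inputs could be read in the server and sent back to the page. The output IDs "summary" and "cars" and the use of mtcars are our additions and are not part of the demo file.

```r
library(shiny)

ui <- fluidPage(
  textInput(inputId = "title", label = "Text box title", value = "Text box content"),
  numericInput(inputId = "num", label = "Number of cars to show", value = 10, min = 1, max = 30),
  textOutput(outputId = "summary"),   # placeholder filled by output$summary
  tableOutput(outputId = "cars")      # placeholder filled by output$cars
)

server <- function(input, output, session) {
  # input$<inputId> holds the current value; render*() re-runs when it changes
  output$summary <- renderText({
    paste0(input$title, ": showing ", input$num, " cars")
  })

  output$cars <- renderTable({
    head(mtcars, input$num)
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app
```

Each output ID in the UI is matched by an assignment to output$<ID> in the server; that pairing is what connects the two halves of the app.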

3.7.2 Demo 2

You can download the .R file for Demo 2 here.

Here is the code for Demo 2:

# Load necessary libraries
library(shiny)
library(ggplot2)
library(dplyr)
library(DT)
library(gapminder)

# UI - Define the layout of the app
ui <- fluidPage(
  
  # App title
  titlePanel("Interactive Gapminder Dashboard"),
  
  # Sidebar layout with input and output definitions
  sidebarLayout(
    sidebarPanel(
      
      # Select country or countries
      selectInput("countries", 
                  label = "Select Country/Countries",
                  choices = unique(gapminder$country),
                  selected = "United States",
                  multiple = TRUE),
      
      # Select variable for y-axis
      selectInput("yvar", 
                  label = "Select Y Variable",
                  choices = c("Life Expectancy" = "lifeExp", 
                              "GDP per Capita" = "gdpPercap", 
                              "Population" = "pop"),
                  selected = "lifeExp"),
      
      # Select year range
      sliderInput("yearRange", 
                  label = "Select Year Range", 
                  min = min(gapminder$year), 
                  max = max(gapminder$year), 
                  value = c(1952, 2007), 
                  step = 5)
    ),
    
    mainPanel(
      # Output for the plot
      plotOutput("trendPlot"),
      
      # Output for the data table
      DTOutput("dataTable")
    )
  )
)

# Server - Define the logic of the app
server <- function(input, output) {
  
  # Filtered dataset based on user inputs
  filteredData <- reactive({
    gapminder %>%
      filter(country %in% input$countries,
             year >= input$yearRange[1], 
             year <= input$yearRange[2])
  })
  
  # Render the trend plot
  output$trendPlot <- renderPlot({
    ggplot(filteredData(), aes(x = year, y = .data[[input$yvar]], color = country)) +
      geom_line(linewidth = 1.2) +
      geom_point(size = 2) +
      labs(x = "Year", 
           y = input$yvar, 
           color = "Country",
           title = paste("Trend of", input$yvar, "Over Time")) +
      theme_minimal()
  })
  
  # Render the data table
  output$dataTable <- renderDT({
    datatable(filteredData())
  })
}

# Run the app
shinyApp(ui = ui, server = server)
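
In Demo 2 the axis label and plot title show the raw column name (for example "lifeExp") because input$yvar returns the value of the selected choice, not its display name. One possible way to show the readable label is a small lookup helper. label_for() and yvar_choices are hypothetical names introduced here, not part of the demo file.

```r
# Same named vector as the `choices` argument of the "yvar" selectInput
yvar_choices <- c(
  "Life Expectancy" = "lifeExp",
  "GDP per Capita"  = "gdpPercap",
  "Population"      = "pop"
)

# Hypothetical helper: return the display label for a selected column name,
# falling back to the column name itself if it is not in the lookup
label_for <- function(value, choices = yvar_choices) {
  label <- names(choices)[match(value, choices)]
  if (is.na(label)) value else label
}

label_for("gdpPercap")
```

Inside renderPlot(), labs(y = label_for(input$yvar)) would then replace y = input$yvar.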

3.7.3 Demo 3

You can download the .R file for Demo 3 here.

Here is the code for Demo 3:

# Load necessary libraries
library(shiny)
library(shinydashboard)
library(ggplot2)
library(dplyr)
library(DT)
library(gapminder)

# Define UI components for App 1
app1_ui <- tabItem(
  tabName = "app1",
  sidebarLayout(
    sidebarPanel(
      "This is the sidebar",
      textInput("title", "Text box title", "Text box content"),
      numericInput("num", "Number of cars to show", 10, min = 1, max = 30),
      sliderInput("temperature", "Body temperature", min = 35, max = 42, value = 37.5, step = 0.1),
      sliderInput("price", "Price (€)", value = c(39, 69), min = 0, max = 99),
      radioButtons("radio", "Choose your preferred time slot", 
                   choices = c("09:00 - 09:30", "09:30 - 10:00", "10:00 - 10:30", 
                               "10:30 - 11:00", "11:00 - 11:30"), 
                   selected = "10:00 - 10:30"),
      selectInput("major", "Major", 
                  choices = c("Business Administration", "Data Science", "Econometrics & Operations Research", 
                              "Economics", "Liberal Arts", "Industrial Engineering", "Marketing Management", 
                              "Marketing Analytics", "Psychology"),
                  selected = "Marketing Analytics"),
      selectInput("programming_language", "Programming Languages",
                  choices = c("HTML", "CSS", "JavaScript", "Python", "R", "Stata"),
                  selected = "R", multiple = TRUE),
      checkboxInput("agree", "I agree to the terms and conditions", value = TRUE)
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("tab 1", h1("Overview"), "Content goes here"),
        tabPanel("tab 2", "The content of the second tab"),
        tabPanel("tab 3", "The content of the third tab")
      )
    )
  )
)

# Define UI components for App 2
app2_ui <- tabItem(
  tabName = "app2",
  sidebarLayout(
    sidebarPanel(
      selectInput("countries", "Select Country/Countries",
                  choices = unique(gapminder$country),
                  selected = "United States",
                  multiple = TRUE),
      selectInput("yvar", "Select Y Variable",
                  choices = c("Life Expectancy" = "lifeExp", 
                              "GDP per Capita" = "gdpPercap", 
                              "Population" = "pop"),
                  selected = "lifeExp"),
      sliderInput("yearRange", "Select Year Range", 
                  min = min(gapminder$year), 
                  max = max(gapminder$year), 
                  value = c(1952, 2007), 
                  step = 5)
    ),
    mainPanel(
      plotOutput("trendPlot"),
      DTOutput("dataTable")
    )
  )
)

# Define Dashboard UI
ui <- dashboardPage(
  
  # Defines the app's title and the navigation bar.
  dashboardHeader(title = "Multi-Page Shiny App"),
  
  # Creates a sidebar menu for navigating between different pages.
  dashboardSidebar(
    sidebarMenu(
      menuItem("demo", tabName = "app1", icon = icon("list")),
      menuItem("Gapminder Dashboard", tabName = "app2", icon = icon("chart-line"))
    )
  ),
  
  # Displays the corresponding content for each selected page.
  dashboardBody(
    tabItems(
      app1_ui,
      app2_ui
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Filtered dataset based on user inputs for App 2
  filteredData <- reactive({
    gapminder %>%
      filter(country %in% input$countries,
             year >= input$yearRange[1], 
             year <= input$yearRange[2])
  })
  
  # Render the trend plot for App 2
  output$trendPlot <- renderPlot({
    ggplot(filteredData(), aes(x = year, y = .data[[input$yvar]], color = country)) +
      geom_line(linewidth = 1.2) +
      geom_point(size = 2) +
      labs(x = "Year", 
           y = input$yvar, 
           color = "Country",
           title = paste("Trend of", input$yvar, "Over Time")) +
      theme_minimal()
  })
  
  # Render the data table for App 2
  output$dataTable <- renderDT({
    datatable(filteredData())
  })
}

# Run the app
shinyApp(ui = ui, server = server)