Super detailed tutorial: How to use AI to generate a Histogram with Standard Deviation Chart?

Mar 3, 2025


Welcome to the AI and Statistics series!

Let’s dive into how AI can transform tabular data into various types of charts.

Today, we will be using a Column Table to generate a Histogram with Standard Deviation Chart.

In this session, we’ll explore how ordinary one-way ANOVA helps analyze the effects of different soybean feed concentrations on iron-deficiency anemia in mice.

One-way ANOVA compares the variability between group means with the variability within each group. It is a simple fit for single-factor experiments, where the data table has only column groupings.

The case we study today: 🐁 a researcher ran the following experiment to study the effect of soybeans on iron-deficiency anemia. 36 mice with established anemia models were randomly divided into 3 groups of 12 mice each.

They were fed with the following three types of feed:

(1) Regular feed without soybeans

(2) Feed containing 10% soybeans

(3) Feed containing 15% soybeans

After one week, the number of red blood cells (x10⁶) in the mice’s blood was measured.

The results are shown in the “Soybean Feed vs. Iron-Deficiency Anemia.xlsx” table.

Don’t worry about using the AI Agent-driven Bayeslab: all you need is natural language to get the data analysis results.

All content will be explained in the most comprehensible natural language descriptions to help you get started with data analysis from scratch.

We’ll start with a data table featuring three columns: “Regular feed,” “10% soybean feed,” and “15% soybean feed.”

This chart will help visualize the differences in red blood cell counts among these feed groups.

We’ll delve into how these prompts influence the final charts and uncover techniques for effective data visualization.

In just 2 minutes, you’ll learn to harness ANOVA for impactful data insights.

Using different prompt inputs, we’ll demonstrate how AI generates detailed statistical analyses and visualizations. Our steps will include:
Step 1: Normality Test
Step 2: One-Way ANOVA
Step 3: Post-Hoc Comparison
Step 4: Draw Histogram (With SD)

Step 1: Normality Test

Conduct a normality test for each column to verify data distribution prerequisites for ANOVA.

The Prompt is :

Once the above prompt is written, click ‘Run’ to view the normality test results for each column.
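For readers who want to see what a normality check looks at, here is a minimal Python sketch. It is not what Bayeslab runs internally (the article does not say which test it uses). It is a numeric stand-in for a Q-Q plot: it correlates the sorted data with the normal quantiles expected at the same ranks, so values near 1 suggest a roughly normal shape. The function name `qq_correlation` and the red blood cell values are hypothetical placeholders, not the data from the spreadsheet. A formal Shapiro-Wilk test is available in SciPy as `scipy.stats.shapiro`.

```python
from statistics import NormalDist, mean


def qq_correlation(values):
    """Correlate sorted data with the normal quantiles expected at the same ranks."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(expected), mean(ordered)
    cov = sum((x - mx) * (y - my) for x, y in zip(expected, ordered))
    var_x = sum((x - mx) ** 2 for x in expected)
    var_y = sum((y - my) ** 2 for y in ordered)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder values (x10^6 red blood cells), NOT the tutorial's data.
groups = {
    "Regular feed": [4.1, 4.3, 3.9, 4.0, 4.2, 4.4],
    "10% soybean feed": [4.6, 4.8, 4.5, 4.9, 4.7, 5.0],
    "15% soybean feed": [5.4, 5.6, 5.2, 5.5, 5.3, 5.7],
}

for name, values in groups.items():
    print(f"{name}: Q-Q correlation = {qq_correlation(values):.3f}")
```

A low correlation for any column is a hint to look at that group more closely before trusting the ANOVA.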

Step 2: One-Way ANOVA

Perform one-way ANOVA to detect any statistically significant differences in red blood cell counts across feed types.

The Prompt is :

Once the above prompt is written, click ‘Run’ to obtain a summary of the ANOVA results.
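To make the idea behind the ANOVA summary concrete, the F statistic can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Bayeslab's implementation: the function name `one_way_anova_f` and the sample values are hypothetical, and the sketch stops at the F statistic and its degrees of freedom. Getting the P value needs the F distribution, for example through `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder values (x10^6 red blood cells), NOT the tutorial's data.
regular = [4.1, 4.3, 3.9, 4.0, 4.2, 4.4]
soy_10 = [4.6, 4.8, 4.5, 4.9, 4.7, 5.0]
soy_15 = [5.4, 5.6, 5.2, 5.5, 5.3, 5.7]

f_stat, df1, df2 = one_way_anova_f([regular, soy_10, soy_15])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the scatter inside each group would explain.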

Step 3: Post-Hoc Comparison

Conduct post-hoc comparisons to identify which specific group pairs differ significantly.

Because the one-way ANOVA output reports “Significant diff. among means (P < 0.05): Yes”, we need to perform post-hoc comparisons.

For this, use:

- Tukey’s test when comparing every pair of groups with each other.

- Dunnett’s test when comparing each treatment group with a single control group.

The Prompt is :

Once the above prompt is written, click ‘Run’ to produce pairwise comparison results.
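The difference between the two post-hoc families is easiest to see by listing which comparisons each one makes. The sketch below does only that, plus the raw mean differences. It does not compute adjusted P values, which need the studentized range or Dunnett distributions (recent SciPy releases provide `scipy.stats.tukey_hsd` and `scipy.stats.dunnett`). The helper names and sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def tukey_pairs(names):
    """All pairwise comparisons: k * (k - 1) / 2 pairs for k groups."""
    return list(combinations(names, 2))


def dunnett_pairs(names, control):
    """Each non-control group against the single control: k - 1 pairs."""
    return [(control, name) for name in names if name != control]


def mean_differences(groups, pairs):
    """Mean of the second group minus mean of the first, for each pair."""
    return {(a, b): mean(groups[b]) - mean(groups[a]) for a, b in pairs}


# Hypothetical placeholder values (x10^6 red blood cells), NOT the tutorial's data.
groups = {
    "Regular feed": [4.1, 4.3, 3.9, 4.0, 4.2, 4.4],
    "10% soybean feed": [4.6, 4.8, 4.5, 4.9, 4.7, 5.0],
    "15% soybean feed": [5.4, 5.6, 5.2, 5.5, 5.3, 5.7],
}
names = list(groups)

print("Tukey compares:", tukey_pairs(names))
print("Dunnett compares:", dunnett_pairs(names, control="Regular feed"))
for pair, diff in mean_differences(groups, tukey_pairs(names)).items():
    print(f"{pair[1]} minus {pair[0]}: {diff:+.2f}")
```

With three groups, Tukey makes three comparisons and Dunnett makes two, which is why Dunnett is the lighter choice when only the control comparison matters.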

Step 4: Draw Histogram (With SD)

Draw a chart of mean values with standard deviation error bars for visual comparison of the different feed groups. (This series calls it a histogram; strictly speaking, it is a bar chart of group means.)

The Prompt is :

Once the above prompt is written, click ‘Run’ to generate the histogram chart portraying group statistics.
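For readers curious about what sits behind such a chart, here is a hedged sketch in Python. It assumes matplotlib is installed and is not the code Bayeslab generates. The function names `summarize` and `plot_means_with_sd`, the output path, and the sample values are all hypothetical. The summary step uses only the standard library; matplotlib is imported inside the plotting function so the summary works without it.

```python
from statistics import mean, stdev


def summarize(groups):
    """Return {name: (mean, sample SD)} for each column of the table."""
    return {name: (mean(vals), stdev(vals)) for name, vals in groups.items()}


def plot_means_with_sd(groups, path):
    """Draw one bar per group at its mean, with an error bar of +/- 1 SD."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    summary = summarize(groups)
    names = list(summary)
    means = [summary[n][0] for n in names]
    sds = [summary[n][1] for n in names]

    fig, ax = plt.subplots()
    ax.bar(names, means, yerr=sds, capsize=6)
    ax.set_ylabel("Red blood cells (x10^6)")
    ax.set_title("Mean red blood cell count by feed (error bars: SD)")
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Hypothetical placeholder values (x10^6 red blood cells), NOT the tutorial's data.
groups = {
    "Regular feed": [4.1, 4.3, 3.9, 4.0, 4.2, 4.4],
    "10% soybean feed": [4.6, 4.8, 4.5, 4.9, 4.7, 5.0],
    "15% soybean feed": [5.4, 5.6, 5.2, 5.5, 5.3, 5.7],
}

for name, (m, sd) in summarize(groups).items():
    print(f"{name}: mean = {m:.2f}, SD = {sd:.2f}")
```

Calling `plot_means_with_sd(groups, "feed_chart.png")` would then write the figure to a file.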

Thank you for reading this installment of the AI and Statistics series!

We showed how to apply one-way ANOVA to determine the effects of different soybean feed concentrations on anemia.

Stay tuned for our upcoming demonstrations to explore more fascinating data visualizations.

Using AI Agent and Bayeslab, anyone can organize, analyze, plot data charts, and make business data predictions like a professional data analyst based on previous data.

Supplement-1: Guide to Variable Design and ANOVA Analysis Selection

Assume there are several conditions that, when changed, affect the occurrence of event A.

Suppose we now change condition a1 to observe its impact on event A. Here, condition a1 is considered a factor.

If an experiment changes one factor, it is a single-factor experiment (simple to perform, but less efficient and less ecologically valid).

Changing two factors simultaneously is a two-factor experiment (more efficient but requires stricter operation conditions).

If one factor is gender, with possibilities 1-male/2-female, “male/female” are called levels — 2 possibilities = 2 levels.

If daily study time (at least 1, 2, 3, 5, or 7 hours) affects the probability of acing the final exam, then study time is a factor with 5 possibilities, i.e., 5 levels.

If gender and study duration are changed simultaneously to observe final study performance, this is called a two-factor experiment. Furthermore, changing 3 or more experimental conditions is called a multi-factor experiment.

▶︎ For single-factor experiments, use one-way ANOVA. The data table typically has a column structure with only one column variable grouping.

▶︎ For two-factor experiments, use two-way ANOVA. The data table usually has a grouped table structure with one column variable grouping and one row variable grouping, i.e., 2 factors.

▶︎ For multi-factor experiments, use multi-way ANOVA.

🚩Preconditions for using ANOVA include:

(1) Independence: Sample data must be independent; samples from different groups should not affect each other.

(2) Normality: Each group’s data should follow a normal distribution, checked using graphical methods (like QQ plots) or statistical tests (like the Shapiro-Wilk test).

(3) Homogeneity of variance: Data from each group should have equal variance, often tested using Levene’s test or Bartlett’s test.

If these conditions are not met, consider data transformation or non-parametric tests.

Common examples of data transformation methods:

Log transformation: Takes the logarithm of the data (positive values only), which compresses large values and often reduces right skew and unequal variance.

Square root transformation: Applies to positively skewed data by taking square roots.

Reciprocal transformation: Takes the reciprocal, reducing impact from extreme values.

Box-Cox transformation: A flexible power transformation whose exponent is estimated from the data.

Z-Score standardization: Standardizes data to have a mean of 0 and standard deviation of 1. This rescales the data but does not change the shape of its distribution.

Common examples of non-parametric testing methods:

Kruskal-Wallis test: Detects whether medians among three or more samples are equal.

Mann-Whitney U test: Compares median differences between two independent samples (also known as the Wilcoxon rank-sum test).

Wilcoxon paired sample test: Compares median differences between two paired samples.

Friedman test: Used for repeated measures (i.e., one subject under different conditions).

Spearman rank correlation: Measures the monotonic relationship between variables.
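As an illustration of how rank-based tests work, the Kruskal-Wallis H statistic from the list above can be sketched in plain Python: pool all observations, rank them, and compare the rank sums per group. This is a simplified sketch with hypothetical function names. It averages tied ranks but omits the tie correction and the P value, both of which `scipy.stats.kruskal` handles.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (without tie correction) for a list of numeric groups."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical placeholder values, NOT the tutorial's data.
h = kruskal_wallis_h([[4.1, 4.3, 3.9], [4.6, 4.8, 4.5], [5.4, 5.6, 5.2]])
print(f"H = {h:.2f}")
```

Because only ranks enter the calculation, a single extreme value cannot dominate the result the way it can in a classic ANOVA.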

Supplement-2: Advanced ANOVA Alternatives for Unequal Variances

When the sample groups have unequal variances, two alternatives to the classic ANOVA are available: the Brown-Forsythe and Welch ANOVA tests.

Both the Brown-Forsythe and Welch ANOVA tests are statistical methods used to analyze whether the means of multiple groups are equal, particularly in the case of unequal variances among the groups. They are variants of the classic ANOVA, providing more robust analysis for data that does not meet the homogeneity of variance assumption.

▶︎ Welch ANOVA

(1) Purpose: Used to compare the means of multiple groups even when variances are unequal (violating the assumption of homogeneity of variances).

(2) Features:

- Does not assume homogeneity of variances; suitable for cases with unequal variances.

- Performs well when sample sizes are unequal across groups.

(3) Implementation: Weights each group mean by its sample size divided by its sample variance, and adjusts the degrees of freedom to account for the unequal variances.
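The weighting idea can be sketched in plain Python using the standard Welch formulas. The function name `welch_anova` and the sample values are hypothetical, and the sketch returns only the statistic and its degrees of freedom. A P value would additionally need the F distribution from a statistics library.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA with unequal variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n / s^2: large, low-variance groups count for more.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    # lam summarizes how unevenly the weights are spread across groups.
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2


# Hypothetical placeholder values with visibly different spreads, NOT the tutorial's data.
f_stat, df1, df2 = welch_anova([
    [4.1, 4.3, 3.9, 4.0, 4.2, 4.4],
    [4.2, 5.3, 4.0, 5.6, 4.7, 5.0],
    [5.4, 5.6, 5.2, 5.5, 5.3, 5.7],
])
print(f"Welch F({df1}, {df2:.1f}) = {f_stat:.2f}")
```

Note that the second degrees of freedom is usually not a whole number, which is normal for Welch-type tests.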

▶︎ Brown-Forsythe Test

(1) Purpose: Used to assess the homogeneity of variances across multiple groups to determine if there are significant differences in variances among the groups.

(2) Features:

- Often used as a robust alternative to Levene’s Test, especially when samples exhibit skewed distributions.

- Based on medians rather than means, reducing sensitivity to non-normal data.

(3) Implementation: Evaluates homogeneity of variances by running an ANOVA on the absolute deviations of each observation from its group median.
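A minimal sketch of this variance check in plain Python: replace every observation with its absolute distance from the group median, then compute an ordinary one-way ANOVA F statistic on those distances. The function name and sample values are hypothetical, and the P value is left out. In SciPy the same test is available as `scipy.stats.levene` with `center='median'`.

```python
from statistics import mean, median


def brown_forsythe_variance_test(groups):
    """Return (F, df1, df2): one-way ANOVA F computed on |x - group median|."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    # Between-group variation of the mean absolute deviations.
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    # Within-group variation of the absolute deviations.
    ss_within = sum(sum((z - mean(d)) ** 2 for z in d) for d in deviations)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, k - 1, n_total - k


# Hypothetical placeholder values, NOT the tutorial's data.
f_stat, df1, df2 = brown_forsythe_variance_test([
    [4.1, 4.3, 3.9, 4.0, 4.2, 4.4],
    [4.2, 5.3, 4.0, 5.6, 4.7, 5.0],
    [5.4, 5.6, 5.2, 5.5, 5.3, 5.7],
])
print(f"Brown-Forsythe F({df1}, {df2}) = {f_stat:.2f}")
```

A large F here points to unequal spreads between groups, not to unequal means.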

Main Differences

- Welch ANOVA: Primarily addresses the issue of unequal variances, focusing on comparing means; suitable for mean comparison hypotheses.

- Brown-Forsythe: Primarily detects variance homogeneity; suitable for variance comparison hypotheses.

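However, both relate to the problem of unequal variances, so they often appear in the same analysis. Note that the name Brown-Forsythe also refers to a second procedure, the Brown-Forsythe ANOVA (F*) test, which compares group means without assuming equal variances, as Welch ANOVA does. That version is the one that serves as an alternative to classic ANOVA.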

In Practical Application

Combined Use: When it is unclear whether the variances are equal, the two methods can be used together. The Brown-Forsythe test checks variance homogeneity first, and if it rejects equal variances, Welch ANOVA compares the means.

About Bayeslab

Bayeslab: Website

The AI First Data Workbench

X: @BayeslabAI

Documents:

https://bayeslab.gitbook.io/docs

Blogs:

https://bayeslab.ai/blog

Bayeslab is a powerful web-based AI code editor and data analysis assistant designed to cater to a diverse group of users, including:

👥 data analysts, 🧑🏼‍🔬 experimental scientists, 📊 statisticians, 👨🏿‍💻 business analysts, 👩‍🎓 university students, 🖍️ academic writers, 👩🏽‍🏫 scholars, and ⌨️ Python learners.

Bayeslab makes data analysis as easy as note-taking!
