Statistics Calculator
Calculate mean, median, mode, standard deviation, quartiles, and more from your dataset.
Comprehensive descriptive statistics calculator with mean, median, mode, standard deviation, quartiles, IQR, and outlier detection.
Guide
How it works
Understanding Descriptive Statistics
Statistics is the science of collecting, organizing, analyzing, and interpreting data. Descriptive statistics summarize the main features of a dataset, while inferential statistics use sample data to make predictions about a larger population. This calculator focuses on descriptive statistics — the foundation for any data analysis task.
Mean vs. Median: Choosing the Right Center
The mean (arithmetic average) is sensitive to extreme values. Consider household income: if nine households earn $40,000 and one earns $4,000,000, the mean becomes $436,000 — misleading for the typical household. The median ($40,000) better represents the center of this skewed distribution. This is why economists report median household income rather than mean income. For symmetric distributions, mean and median are nearly equal; for skewed data, use the median.
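As a quick check of the household income example, the figures can be reproduced with Python's standard `statistics` module. This is only a sketch of the arithmetic, not the calculator's own code; the variable names are illustrative.

```python
from statistics import mean, median

# Nine households earning $40,000 and one earning $4,000,000 (figures from the text).
incomes = [40_000] * 9 + [4_000_000]

income_mean = mean(incomes)      # pulled upward by the single extreme value
income_median = median(incomes)  # middle of the sorted values, unaffected by it

print(f"mean:   ${income_mean:,.0f}")    # mean:   $436,000
print(f"median: ${income_median:,.0f}")  # median: $40,000
```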
The Empirical Rule (68-95-99.7)
For normally distributed data, approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. This rule is used in quality control (Six Sigma), test score analysis, and setting tolerances in manufacturing. A measurement more than 3σ from the mean is statistically rare: for a normal process, only about 0.3% of values (roughly 1 in 370) land that far out.
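The three percentages are not arbitrary: for a normal distribution, the probability of landing within k standard deviations of the mean is erf(k/√2). A minimal sketch using only the standard library (the helper name `fraction_within` is made up for this example):

```python
import math

def fraction_within(k: float) -> float:
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

The rule only applies when the data is close to normal; for skewed or heavy-tailed data these fractions can be quite different.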
Outlier Detection: IQR Method vs. Z-Score
This calculator uses the IQR (Interquartile Range) method: values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are flagged as outliers. This method is robust because the quartiles barely move when a few extreme values are present, so the fences are not distorted by the outliers themselves. The alternative z-score method flags values more than 2 or 3 standard deviations from the mean. Because outliers shift the mean and inflate the standard deviation, that method can mask extreme values in small datasets: with the sample standard deviation, no point in a 10-value dataset can reach a z-score above about 2.85, so a 3σ cutoff never fires. The IQR method is preferred for exploratory analysis and non-normal distributions.
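The sketch below applies the 1.5×IQR fences to a small made-up dataset and compares the result with the z-score of the same extreme value. Quartile conventions differ between tools; this example assumes the "inclusive" interpolation of Python's `statistics.quantiles`, which may not match the convention this calculator uses. The function name and data are hypothetical.

```python
from statistics import mean, quantiles, stdev

def iqr_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k x IQR rule."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different fences.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical data: nine ordinary values and one extreme value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

lower, upper, outliers = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")

# z-score of the same extreme value, using the sample standard deviation.
z_extreme = (95 - mean(sample)) / stdev(sample)
print(f"z-score of 95: {z_extreme:.2f}")  # stays below 3, so a 3-sigma rule misses it
```

Here the IQR fences flag 95, while its z-score stays under 3 because the value inflates the very standard deviation it is measured against.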
Sample vs. Population: Bessel's Correction
When computing variance for a sample (a subset of a larger population), we divide by n−1 instead of n. This is called Bessel's correction. Why? A sample's values tend to lie closer to their own sample mean than to the population mean, so dividing the sum of squared deviations by n underestimates the true population variance. Dividing by n−1 corrects this bias. Use population variance (÷n) only when you have the complete dataset, meaning every member of the group you care about.
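To make the two divisors concrete, here is a small sketch that computes both variances by hand and checks them against the standard library (`statistics.pvariance` divides by n, `statistics.variance` by n−1). The dataset is invented for illustration.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

m = mean(data)
sum_sq = sum((x - m) ** 2 for x in data)  # sum of squared deviations = 32

population_var = sum_sq / len(data)       # divide by n     -> 4.0
sample_var = sum_sq / (len(data) - 1)     # divide by n - 1 -> about 4.571

print(population_var, pvariance(data))    # both equal 4
print(sample_var, variance(data))         # both equal 32/7
```

The gap between the two shrinks as n grows, so the choice matters most for small samples.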
Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the original distribution's shape, provided the observations are independent and the distribution has finite variance. This underpins hypothesis testing, confidence intervals, and A/B testing. The standard error (SE = SD/√n) quantifies how much sample means vary: larger samples produce smaller standard errors, meaning more precise estimates of the population mean.
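A small simulation can illustrate the SE = SD/√n relationship. The sketch below assumes an exponential distribution with rate 1 (strongly skewed, with SD equal to 1) as the underlying population; the function name, trial count, and seed are arbitrary choices for the example.

```python
import math
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

def sample_means(n, trials=5_000):
    """Draw `trials` samples of size n from an exponential(1) population; return their means."""
    return [fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

for n in (4, 25, 100):
    means = sample_means(n)
    observed = stdev(means)         # spread of the simulated sample means
    predicted = 1.0 / math.sqrt(n)  # SE = SD / sqrt(n), with population SD = 1
    print(f"n={n:>3}  observed={observed:.3f}  predicted={predicted:.3f}")
```

The observed spread of the sample means should track the predicted standard error closely, and a histogram of `means` looks increasingly bell-shaped as n grows even though the population itself is skewed.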
Applications
Quality control: Statistical Process Control (SPC) uses mean and standard deviation to monitor manufacturing processes. Control charts flag when a process drifts beyond 3σ limits.
A/B testing: Marketers use standard error and significance tests (such as t-tests or z-tests for proportions) to determine if a conversion rate difference between two page versions is statistically significant or due to random chance.
Finance: Portfolio variance and covariance form the basis of Modern Portfolio Theory. Diversification reduces overall portfolio standard deviation, even when individual asset volatilities are high, as long as the assets are not perfectly correlated.
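The diversification claim can be illustrated with the standard two-asset formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The sketch below is only an illustration: the function name, weights, volatilities, and correlations are hypothetical numbers, not market data.

```python
import math

def portfolio_sd(w1, sd1, sd2, corr):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    var = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * corr * sd1 * sd2
    return math.sqrt(var)

# Two hypothetical assets, each with 30% volatility, held in equal weights.
for corr in (1.0, 0.2, -0.5):
    print(f"correlation {corr:+.1f}: portfolio SD = {portfolio_sd(0.5, 0.30, 0.30, corr):.3f}")
```

With perfect correlation the portfolio is exactly as volatile as each asset (0.300); any lower correlation brings the combined standard deviation below that.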
What is the difference between standard deviation and standard error?
Standard deviation measures the spread of individual data points around the mean. Standard error measures how precisely the sample mean estimates the population mean, that is, how much the mean would vary from one sample to the next. SE = SD / √n, so as sample size grows, SE shrinks and larger samples give more reliable mean estimates.
When should I use median instead of mean?
Use median when your data is skewed or contains outliers. Income, home prices, and response times are classic examples where the median is more representative. Use mean when data is symmetric and approximately normally distributed.
What does IQR tell me?
The Interquartile Range (IQR = Q3 − Q1) spans the middle 50% of your data. It is a robust measure of variability that is largely unaffected by extreme values. A large IQR means the middle half of the data is widely spread; a small IQR means it clusters tightly.
Can a dataset have more than one mode?
Yes. A dataset with two modes is bimodal, one with three is trimodal, and so on. Two modes often indicate that the data comes from two distinct groups: for example, heights of a mixed-gender group may show peaks around 5'4" and 5'10".
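In Python, `statistics.multimode` returns every value tied for the highest frequency, which is one way to detect multiple modes. The heights below (in inches) are invented to mirror the 5'4" and 5'10" example; how this calculator reports ties is not specified here.

```python
from statistics import multimode

# Hypothetical heights in inches: 64 in = 5'4", 70 in = 5'10".
heights_in = [64, 64, 64, 65, 66, 68, 70, 70, 70, 71]

modes = multimode(heights_in)  # all values tied for the highest count
print(modes)                   # [64, 70]

# When every value appears once, every value is returned as a mode.
print(multimode([1, 2, 3]))    # [1, 2, 3]
```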
Why might my variance be 0?
Variance is 0 when all values in the dataset are identical. Every value equals the mean, so all squared deviations are zero. This means there is no variability in your data.