In biology, data variability is not necessarily noise; if you are willing to rethink how you look at it, it can be the key to discovery. When analyzing gene expression or protein interactions, understanding statistical distributions helps reveal the hidden patterns in our data.
A quick dive into some essential concepts, using Python and R:
In biological datasets, the mean can be skewed by outliers, while the median provides a robust alternative.
import numpy as np
# Random uniform distribution
x = np.random.uniform(size=10)
# Mean vs. median
np.mean(x), np.median(x)
# Random Uniform Distribution
x <- runif(10)
mean(x)
median(x)
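Uniform random numbers contain no outliers, so the snippets above show the mean and median close together. The sketch below uses hypothetical expression values (made up for illustration, not real data) with one extreme measurement. It shows how a single outlier pulls the mean away while the median barely moves.

```python
import numpy as np

# Hypothetical expression values (arbitrary units): nine typical samples
# plus one extreme value, e.g. a technical artifact.
values = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.0, 4.7, 5.1, 50.0])

mean_val = np.mean(values)      # sensitive to the extreme value
median_val = np.median(values)  # middle of the sorted data, robust to it

print(f"mean = {mean_val:.2f}, median = {median_val:.2f}")
```

Here the mean is dragged to about 9.5. The median stays at 5.05, the average of the two middle sorted values, which is close to where most of the samples lie.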
How spread out is the data? Variance and standard deviation tell the story of variability.
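x = np.random.normal(loc=7, scale=0.8, size=60)
# ddof=1 gives the sample variance/SD (n - 1 denominator), matching R's var() and sd()
np.var(x, ddof=1), np.std(x, ddof=1)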
set.seed(123)
# Random normal distribution
x <- rnorm(60, mean = 7, sd = 0.8)
# Variance
var(x)
# Standard Deviation
sd(x)
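The two languages differ in their defaults. R's var() and sd() divide the sum of squared deviations by n - 1, the sample estimate. NumPy's np.var() and np.std() divide by n unless you pass ddof=1. The sketch below computes both versions by hand and checks them against NumPy. The seed of 123 is an arbitrary choice and will not reproduce R's set.seed(123) numbers.

```python
import numpy as np

rng = np.random.default_rng(123)  # arbitrary seed, for reproducibility only
x = rng.normal(loc=7, scale=0.8, size=60)

n = len(x)
deviations = x - x.mean()

# Sample variance: divide by n - 1 (what R's var() returns)
sample_var = np.sum(deviations**2) / (n - 1)
# Population variance: divide by n (NumPy's default, ddof=0)
population_var = np.sum(deviations**2) / n

print(sample_var, np.var(x, ddof=1))
print(population_var, np.var(x))
# Standard deviation is the square root of the variance
print(np.sqrt(sample_var), np.std(x, ddof=1))
```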
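Outliers can distort the data narrative. The interquartile range (IQR) is the distance between the 75th and 25th percentiles (Q3 - Q1). It measures the spread of the middle 50% of the data, so it is robust to outliers.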
# IQR = 75th percentile (Q3) - 25th percentile (Q1)
np.percentile(x, 75) - np.percentile(x, 25)
IQR(x)
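A common convention for using the IQR to flag outliers is Tukey's fences, the rule behind boxplot whiskers. It treats points more than 1.5 × IQR below Q1 or above Q3 as potential outliers. The sketch below applies that rule to hypothetical values, using NumPy's default linear percentile interpolation. The 1.5 multiplier is a convention, not a law; flagged points deserve inspection, not automatic removal.

```python
import numpy as np

# Hypothetical measurements with one suspiciously large value
values = np.array([2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.2, 9.8])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Tukey's fences: 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"IQR = {iqr:.3f}, fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print("flagged:", outliers)
```

Only 9.8 falls outside the fences. The remaining values define the core of the dataset.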
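Z-scores standardize data by expressing how many standard deviations a value lies from the mean: z = (x - mean) / SD. For example, a value 2 units below the mean of a distribution with SD 2 has z = -2/2 = -1. The standard normal CDF then gives the probability of observing a value at or below it (about 0.16). This is useful in hypothesis testing!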
from scipy.stats import norm
# Z-score: (value - mean) / SD, here -2 / 2
z_score = -2 / 2
# Cumulative (lower-tail) probability for the Z-score
norm.cdf(z_score)
# Z-score
z_score <- -2/2
# Cumulative (lower-tail) probability for the Z-score
pnorm(z_score)
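The snippets above handle a single, precomputed z-score. In practice you usually standardize a whole vector and, for a two-sided test, look at both tails. The sketch below does this on hypothetical values. It uses the sample SD (ddof=1) to mirror R's sd(). The two-sided p-value is 2 × P(Z ≥ |z|), computed with SciPy's survival function norm.sf.

```python
import numpy as np
from scipy.stats import norm, zscore

# Hypothetical values; standardize each one: z = (x - mean) / sd
values = np.array([4.0, 6.0, 8.0, 10.0, 12.0])
z_manual = (values - values.mean()) / values.std(ddof=1)
z_scipy = zscore(values, ddof=1)  # same computation via SciPy

# Lower-tail probability P(Z <= z), as with norm.cdf / pnorm above
lower_tail = norm.cdf(z_manual)
# Two-sided p-value: probability of a value at least this extreme in either tail
two_sided_p = 2 * norm.sf(np.abs(z_manual))

print(np.round(z_manual, 3))
print(np.round(two_sided_p, 3))
```

The value equal to the mean (8.0) has z = 0 and a two-sided p-value of 1. Values further from the mean get smaller p-values.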
Statistical distributions lay the groundwork for hypothesis testing, machine learning, and multi-omics analysis in bioinformatics, empowering data-driven discoveries.
🎲 Learn more here:
https://lnkd.in/eaDndahB
https://lnkd.in/eSqp8j5K
☕️ Curious about how these insights power breakthroughs in cancer research? Let's connect! I am passionate about using bioinformatics, Python, and R to illuminate complex biological systems. #Bioinformatics #Python #R #DataScience #Biostatistics #OpenToWork #CancerResearch #MultiOmics #StatisticalAnalysis