Data Bites
Public datasets worth keeping close.
A lightweight index of useful public datasets: what they contain, what they are good for,
and what to check before using them in a project.
Manufacturing · Surface defect segmentation
A surface-quality inspection dataset for studying defect segmentation when defects are sparse and the defect classes are imbalanced.
Use for: surface defect segmentation, manufacturing inspection, imbalanced defect detection experiments
Scale: Dataset Ninja lists 3,336 images and 15,764 annotated defect objects
Watch out: Dataset Ninja lists the license as CC BY-SA 4.0. Confirm it against the original source's citation and terms before redistributing.
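For imbalanced-defect experiments, a common first step is to measure how rare each defect class is at the pixel level and then derive loss weights from those frequencies. The sketch below is a minimal illustration under stated assumptions. It assumes the masks are already loaded as 2D integer label grids, with 0 for background and 1..K for defect classes. The function names are hypothetical and are not part of any dataset tooling. Inverse-frequency weighting is only one of several reasonable choices.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: each mask is a 2D grid of integer class labels,
# with 0 meaning background and 1..K meaning defect classes.
Mask = Sequence[Sequence[int]]


def pixel_class_frequencies(masks: List[Mask]) -> Dict[int, float]:
    """Return the fraction of all pixels that belong to each class label."""
    counts: Counter = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


def inverse_frequency_weights(freqs: Dict[int, float]) -> Dict[int, float]:
    """Weight each class by 1 / frequency, normalized so the weights average to 1."""
    raw = {label: 1.0 / f for label, f in freqs.items() if f > 0}
    mean = sum(raw.values()) / len(raw)
    return {label: w / mean for label, w in raw.items()}


if __name__ == "__main__":
    # Two tiny toy masks: mostly background, with a few defect pixels.
    masks = [
        [[0, 0, 0, 0], [0, 1, 1, 0]],
        [[0, 0, 0, 2], [0, 0, 0, 0]],
    ]
    freqs = pixel_class_frequencies(masks)
    print(freqs)
    print(inverse_frequency_weights(freqs))
```

With very rare classes, raw inverse frequencies can become extreme. A square-root or clipped variant is often used instead.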
Industrial inspection · Unsupervised anomaly detection
A real-world industrial anomaly detection dataset with defect-free training images, anomalous test images, and pixel-level annotations.
Use for: unsupervised anomaly detection, anomaly localization, industrial visual inspection baselines
Scale: Over 5,000 high-resolution images across 15 object and texture categories
Watch out: Commercial use is not allowed under the standard dataset license; contact MVTec if the use case may be commercial.
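The protocol this entry supports is to fit only on defect-free images, then score test images and localize anomalies. A deliberately simple per-pixel Gaussian baseline can illustrate it. This is a toy sketch. It is neither MVTec's reference method nor a competitive model. It assumes that images are already aligned and grayscale, and that each one is flattened to an equal-length list of floats. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: images are aligned, grayscale, and flattened into
# equal-length sequences of pixel intensities.
Image = Sequence[float]


def fit_normal_model(
    train_images: List[Image], eps: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Estimate per-pixel mean and standard deviation from defect-free images only.

    eps keeps the division safe for pixels that never vary during training.
    """
    n = len(train_images)
    size = len(train_images[0])
    means = [sum(img[i] for img in train_images) / n for i in range(size)]
    stds = [
        math.sqrt(sum((img[i] - means[i]) ** 2 for img in train_images) / n) + eps
        for i in range(size)
    ]
    return means, stds


def anomaly_map(image: Image, means: List[float], stds: List[float]) -> List[float]:
    """Per-pixel anomaly score (for localization): absolute z-score vs. the normal model."""
    return [abs(x - m) / s for x, m, s in zip(image, means, stds)]


def image_score(amap: List[float]) -> float:
    """Image-level anomaly score: the most anomalous pixel."""
    return max(amap)


if __name__ == "__main__":
    train = [[0.50, 0.50, 0.50], [0.52, 0.48, 0.50], [0.48, 0.52, 0.50]]
    means, stds = fit_normal_model(train)
    good = [0.51, 0.49, 0.50]
    defective = [0.51, 0.49, 0.95]  # last pixel deviates strongly
    print(image_score(anomaly_map(good, means, stds)))
    print(image_score(anomaly_map(defective, means, stds)))
```

The key design point is that no anomalous image is ever seen during fitting. The pixel-level annotations are used only to evaluate the anomaly maps.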
Biomedical imaging · Classification benchmark
A standardized collection of 2D and 3D biomedical image classification datasets, offered in small, MNIST-like image sizes for fast medical imaging experiments.
Use for: biomedical image classification, lightweight benchmarking, AutoML tests, 2D/3D model sanity checks
Scale: About 708K 2D images and 10K 3D images across 18 datasets
Watch out: Most subsets are CC BY 4.0, but DermaMNIST is CC BY-NC 4.0; the dataset is not intended for clinical use.
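Because the license differs by subset, it can help to gate subset selection on a license table you maintain yourself. The sketch below encodes only what this entry states: CC BY 4.0 by default, and CC BY-NC 4.0 for DermaMNIST. The table, the function names, and the simple "NC" token check are all illustrative assumptions. None of them is an official utility of the dataset, and none replaces reading the actual terms.

```python
from typing import Dict, Iterable, List

# Defaults taken from this entry: most subsets are CC BY 4.0,
# DermaMNIST is CC BY-NC 4.0. Extend the overrides after checking
# the official terms for each subset you actually use.
DEFAULT_LICENSE = "CC BY 4.0"
LICENSE_OVERRIDES: Dict[str, str] = {"DermaMNIST": "CC BY-NC 4.0"}


def license_for(subset: str) -> str:
    """Look up the recorded license for a subset, falling back to the default."""
    return LICENSE_OVERRIDES.get(subset, DEFAULT_LICENSE)


def is_non_commercial(license_name: str) -> bool:
    """Treat any Creative Commons license with an NC clause as non-commercial."""
    return "NC" in license_name.replace("-", " ").split()


def usable_subsets(subsets: Iterable[str], commercial: bool) -> List[str]:
    """Keep only the subsets whose recorded license fits the intended use."""
    return [s for s in subsets if not (commercial and is_non_commercial(license_for(s)))]


if __name__ == "__main__":
    # "ExampleMNIST" is a hypothetical placeholder name, not a real subset.
    candidates = ["DermaMNIST", "ExampleMNIST"]
    print(usable_subsets(candidates, commercial=True))
    print(usable_subsets(candidates, commercial=False))
```

A license check like this says nothing about fitness for purpose. The entry's note that the dataset is not intended for clinical use still applies to every subset.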