Data Reliability & Noisy Input Handling in ML Models

This project analyzes how noisy and incomplete data affects the performance of machine learning models on a binary classification task. It builds a clean baseline model, then systematically corrupts the data with different levels of noise and missing values, and evaluates how model performance degrades.

Features

- Synthetic binary classification dataset (no external data needed)
- Train/test split and preprocessing
- Training of Logistic Regression and Random Forest classifiers
- Evaluation under multiple corruption levels:
  - Feature noise (Gaussian noise)
  - Missing-value injection
- Summary report printed to the console
- Optional visualization of performance vs. noise level
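
The two corruption types listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `add_gaussian_noise` and `inject_missing` are hypothetical, and scaling the noise by each feature's standard deviation is an assumed convention.

```python
import numpy as np


def add_gaussian_noise(X, noise_level, rng):
    """Return a copy of X with zero-mean Gaussian noise added to every feature.

    noise_level is the noise standard deviation expressed as a fraction of
    each feature's own standard deviation (an assumed convention).
    """
    X = np.asarray(X, dtype=float)
    scale = noise_level * X.std(axis=0, keepdims=True)
    return X + rng.normal(size=X.shape) * scale


def inject_missing(X, missing_rate, rng):
    """Return a copy of X where each entry is NaN with probability missing_rate."""
    X = np.array(X, dtype=float, copy=True)
    mask = rng.random(X.shape) < missing_rate
    X[mask] = np.nan
    return X


rng = np.random.default_rng(0)
X_clean = rng.normal(size=(1000, 5))  # stand-in for the real feature matrix
X_noisy = add_gaussian_noise(X_clean, noise_level=0.5, rng=rng)
X_missing = inject_missing(X_clean, missing_rate=0.2, rng=rng)
print("fraction missing:", np.isnan(X_missing).mean())
```

Both helpers return new arrays, so the clean data can be reused across every noise level in a sweep.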

Project Structure

- src/data_prep.py: Generates a clean synthetic dataset and saves it as data/clean_data.csv.
- src/train_model.py: Trains baseline models on the clean dataset and saves them to models/.
- src/evaluate_noise.py: Loads the trained models, applies different noise levels to the data, and reports performance.
- data/clean_data.csv: Synthetic dataset generated by data_prep.py.
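
How the first two scripts fit together might look something like the sketch below. It is an illustration under stated assumptions, not the repository's code: the dataset parameters, the `StandardScaler` preprocessing step, the `.joblib` file names, and the use of a temporary directory in place of models/ are all hypothetical.

```python
from pathlib import Path
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stage 1 (data_prep.py): synthetic binary classification data.
# Sample count, feature counts, and seed are illustrative assumptions.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_redundant=2, random_state=42
)

# Stage 2 (train_model.py): hold out a test set, then fit both baselines.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# A temporary directory stands in for models/ so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    for name, model in models.items():
        model.fit(X_train, y_train)
        joblib.dump(model, model_dir / f"{name}.joblib")

    # Stage 3 (evaluate_noise.py) would begin by loading the saved models.
    loaded = {p.stem: joblib.load(p) for p in model_dir.glob("*.joblib")}

# Clean-data accuracy of each reloaded model, the reference point for later noise tests.
baseline = {name: model.score(X_test, y_test) for name, model in loaded.items()}
print(baseline)
```

Saving the fitted models to disk keeps training and evaluation decoupled, so the noise experiments can be rerun without retraining.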

How to Run

Create and activate a virtual environment (optional but recommended):

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Generate the synthetic dataset:

python src/data_prep.py

Train baseline models:

python src/train_model.py

Evaluate robustness under noise:

python src/evaluate_noise.py

This prints accuracy scores for each noise level and each missing-value rate. It also saves the optional plot as noise_impact.png in the project root.
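
The kind of sweep the evaluation step performs can be sketched as follows. This is a hedged illustration rather than the contents of evaluate_noise.py: the noise levels, missing-value rates, the single Logistic Regression model, and the choice to fill missing values with training-set means via `SimpleImputer` are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Assumption: missing test values are filled with training-set feature means.
imputer = SimpleImputer(strategy="mean").fit(X_train)

noise_levels = [0.0, 0.5, 1.0, 2.0]   # noise std, in units of each feature's std
missing_rates = [0.0, 0.1, 0.3, 0.5]  # probability that an entry is dropped

# Sweep 1: add Gaussian noise of increasing strength to the test features.
feature_std = X_train.std(axis=0)
noise_acc = {}
for level in noise_levels:
    X_noisy = X_test + rng.normal(size=X_test.shape) * level * feature_std
    noise_acc[level] = accuracy_score(y_test, model.predict(X_noisy))

# Sweep 2: blank out a growing fraction of entries, impute, then predict.
missing_acc = {}
for rate in missing_rates:
    X_missing = X_test.copy()
    X_missing[rng.random(X_test.shape) < rate] = np.nan
    X_filled = imputer.transform(X_missing)
    missing_acc[rate] = accuracy_score(y_test, model.predict(X_filled))

print("Gaussian noise level -> accuracy")
for level, acc in noise_acc.items():
    print(f"  {level:>4.1f}  {acc:.3f}")
print("Missing-value rate -> accuracy")
for rate, acc in missing_acc.items():
    print(f"  {rate:>4.1f}  {acc:.3f}")
```

Only the test set is corrupted here, which isolates how a model trained on clean data reacts to degraded inputs. The two dictionaries could also be handed to a plotting library to draw accuracy against corruption level.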

Requirements

Python 3.9+ (3.8 should also work)
Dependencies listed in requirements.txt

The code is written to be simple, readable and easily extensible for academic experiments or teaching purposes.
