# ML Algorithms & Scikit-Learn

Difficulty: M.Tech | Read Time: ~15 min

Lecture Notes

## Introduction to Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that gives systems the ability to learn and improve from experience (data) without being explicitly programmed for each task.

## Scikit-Learn Ecosystem

Scikit-Learn is the de facto standard library for traditional (non-deep-learning) machine learning in Python. It provides classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, all behind one consistent API.

### Key Concepts

1. **Estimators**: The core object. Any object that learns from data is an estimator; it implements a `fit` method.
2. **Predictors**: Estimators that implement a `predict` method to produce predictions (class labels or numeric values) for new data.
3. **Transformers**: Estimators that implement a `transform` method to filter or modify the data (e.g., `StandardScaler`); most also offer `fit_transform`, which fits and transforms in one step.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X holds the features, y the target labels.
# Iris is used only as a stand-in dataset; substitute your own X and y.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize the estimator
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)

# Fit the model on the training split only
clf.fit(X_train, y_train)

# Predict labels for the unseen test split
predictions = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

## Overfitting and Underfitting

- **Overfitting** occurs when a model learns the detail and noise in the training data so closely that its performance on new data suffers; the telltale sign is high training accuracy but noticeably lower test accuracy.
- **Underfitting** occurs when a model is too simple to capture the underlying trend of the data, so it performs poorly even on the training set.

> **Tip:** Use cross-validated hyperparameter search (`GridSearchCV` or `RandomizedSearchCV`) to find hyperparameters that balance bias (underfitting) and variance (overfitting). Tune on the training data only, and keep the test set for the final evaluation.
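Following the cross-validation tip, here is a minimal sketch of a grid search. It assumes the Iris dataset (`load_iris`) as stand-in data, and the grid values for `max_depth` and `n_estimators` are illustrative choices, not recommendations from these notes. Only the training split is used for the search; the held-out test split is scored once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data (assumption): replace with your own X and y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative candidate values; every combination is tried (3 x 3 = 9 candidates)
param_grid = {
    "max_depth": [2, 5, None],
    "n_estimators": [50, 100, 200],
}

# 5-fold cross-validation on the training split only; the test split stays untouched
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))

# With refit=True (the default), the best configuration is retrained on all of
# X_train, so the search object can be scored directly on the held-out data
print("Held-out test accuracy:", search.score(X_test, y_test))
```

`RandomizedSearchCV` accepts the same estimator but takes a `param_distributions` dict and an `n_iter` budget, sampling that many combinations instead of trying every one; this tends to be cheaper when the grid is large.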
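The overfitting and underfitting definitions can be made concrete by sweeping a single complexity hyperparameter and comparing training and test accuracy. This sketch rests on assumptions of our own: it uses a synthetic, deliberately noisy dataset from `make_classification` and a single `DecisionTreeClassifier`, chosen because tree depth maps directly onto model complexity. Exact scores will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem (assumption). flip_y=0.1 assigns the class of ~10%
# of samples at random, which injects label noise a deep tree can memorize.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in [1, 3, 5, 10, None]:  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth!s:>4}  train={train_acc:.3f}  test={test_acc:.3f}")
```

A very shallow tree (`max_depth=1`) typically scores modestly on both splits, which is underfitting. An unlimited tree (`max_depth=None`) reaches a training accuracy of 1.0 because it memorizes every training label, including the noisy ones, while its test accuracy lags behind, which is overfitting. The most useful depths usually lie in between, and in practice you would select one with cross-validation rather than by looking at the test set.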
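Returning to the Key Concepts list, the transformer and predictor roles can be seen side by side in a short sketch. It assumes Iris as stand-in data and pairs `StandardScaler` (a transformer) with `LogisticRegression` (an estimator that is also a predictor). `LogisticRegression` and the `make_pipeline` chaining are our choices for illustration, not something these notes prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Transformer: fit() learns each feature's mean and standard deviation from the
# training data; transform() applies (x - mean) / std with those statistics.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse training statistics; never refit on test data
print("Train feature means after scaling:", X_train_scaled.mean(axis=0).round(6))

# Pipeline: the transformer and the final estimator act as a single estimator.
# fit() calls fit_transform on the scaler, then fit on the model;
# predict()/score() call transform on the scaler, then the model's method.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

Wrapping the scaler inside a pipeline also matters for cross-validation: when a pipeline is passed to `GridSearchCV`, the scaler is refit on each training fold, so statistics from the validation fold never leak into preprocessing.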