5. How to Prepare for ML Coding Interviews?
For MLE/AS roles, and sometimes even for the more hardcore DS roles, you may be asked to do model implementation from scratch: no sklearn, no PyTorch, just numpy to implement a model or algorithm. The ones that usually come up are just a handful:
Unsupervised Models:
- K-Means Clustering
import numpy as np

def initialize_centroids(X, k):
    """Randomly initialize k centroids from the dataset X."""
    indices = np.random.permutation(X.shape[0])
    centroids = X[indices[:k]]
    return centroids

def closest_centroid(X, centroids):
    """For every point in X, find the closest centroid."""
    distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, labels, k):
    """Recalculate each centroid as the mean of its assigned points
    (assumes every cluster keeps at least one point)."""
    new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return new_centroids

def kmeans(X, k, max_iters=100):
    """The main k-means loop."""
    centroids = initialize_centroids(X, k)
    for i in range(max_iters):
        labels = closest_centroid(X, centroids)
        new_centroids = update_centroids(X, labels, k)
        # Check for convergence (centroids no longer change)
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example usage
# Generate some data
np.random.seed(42)
X = np.random.rand(100, 2)

# Perform k-means clustering
k = 3
centroids, labels = kmeans(X, k)
print("Centroids:", centroids)
Supervised Models:
- Logistic Regression
import numpy as np

# Sigmoid function to map predicted values to probabilities
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Loss function to compute the cost
def compute_loss(y, y_hat):
    # Binary cross-entropy loss
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradient descent function to update parameters
def gradient_descent(X, y, params, learning_rate, iterations):
    m = len(y)
    loss_history = np.zeros((iterations,))
    for i in range(iterations):
        # Calculate predictions
        y_hat = sigmoid(np.dot(X, params))
        # Update parameters
        params -= learning_rate * np.dot(X.T, y_hat - y) / m
        # Save loss
        loss_history[i] = compute_loss(y, y_hat)
    return params, loss_history

# Predict function
def predict(X, params):
    return np.round(sigmoid(np.dot(X, params)))

# Generate synthetic data
X = np.random.rand(100, 2)  # 100 samples and 2 features
y = np.random.randint(0, 2, 100)  # Binary targets

# Add intercept term to feature matrix
X = np.hstack((np.ones((X.shape[0], 1)), X))

# Initialize parameters to zero
params = np.zeros(X.shape[1])

# Set learning rate and number of iterations
learning_rate = 0.01
iterations = 1000

# Perform gradient descent
params, loss_history = gradient_descent(X, y, params, learning_rate, iterations)

# Predict
predictions = predict(X, params)

# Calculate accuracy
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy}")
- (Multiple) Linear Regression
import numpy as np

def multiple_linear_regression(X, y):
    # Add a column of ones for the intercept term (b_0)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    # Use the Normal Equation to compute the best-fit parameters; pinv
    # (pseudo-inverse) instead of inv so this still works when X_b.T @ X_b
    # is singular, as with the collinear example data below
    theta_best = np.linalg.pinv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    return theta_best  # First element is the intercept, the rest are coefficients

# Example usage:
X = np.array([
    [1, 2],  # Two features for each data point
    [2, 3],
    [3, 4],
    [4, 5],
    [5, 6]
])
y = np.array([5, 7, 9, 11, 13])  # Target values

# Fit the model to find the intercept and coefficients
theta_best = multiple_linear_regression(X, y)
print(f"Intercept and coefficients: {theta_best}")

# Predict function using the derived coefficients
def predict(X, theta_best):
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # Add the intercept term
    return X_b.dot(theta_best)

# Predicting values
X_new = np.array([
    [6, 7],
    [7, 8]
])  # New data points
predictions = predict(X_new, theta_best)
print(f"Predictions: {predictions}")
Sorting:
Sometimes you will be asked to implement a sorting algorithm. Here is Insertion Sort as an example:
def insertion_sort(arr):
    # Traverse from index 1 to len(arr) - 1
    for i in range(1, len(arr)):
        key = arr[i]
        # Shift elements of arr[0..i-1] that are greater than key
        # one position ahead of their current position
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

# Example usage
my_list = [64, 34, 25, 12, 22, 11, 90]
sorted_list = insertion_sort(my_list)
print("Sorted list:", sorted_list)
I have also seen candidates asked to implement Attention or a CNN, though I never ran into those myself. For more of these implement-a-model-from-scratch exercises, you can refer to:
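In case it comes up, here is a minimal numpy sketch of scaled dot-product attention under my own simplifying assumptions (single head, no masking, 2-D Q/K/V), not something from an actual interview:

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v)."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys gives the attention weights
    weights = softmax(scores, axis=-1)
    # Each output is a weighted average of the value vectors
    return weights @ V

# Example usage
np.random.seed(0)
Q = np.random.rand(4, 8)
K = np.random.rand(5, 8)
V = np.random.rand(5, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)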
Besides implementing models from scratch, I have also run into PyTorch fill-in-the-blank questions in ML coding rounds: you may be asked to implement an entire class in PyTorch and then debug the model pipeline until training runs end to end. I had not prepared for this at all and failed miserably; I had grown too dependent on ChatGPT day to day. This kind of question really tests your everyday hands-on experience with PyTorch/TensorFlow, and just skimming a cheat sheet is probably not enough (a rough sketch of the style follows after the link below).
https://www.datacamp.com/cheat-sheet/deep-learning-with-py-torch
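To give a feel for that fill-in-the-blank style, here is a minimal sketch, with the class, architecture, and data all invented for illustration, of the pieces an interviewer might blank out: defining an nn.Module and writing the training loop by hand.

import torch
import torch.nn as nn

# A tiny binary classifier; the architecture is only an illustration
class MLP(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x)

# Synthetic data, mirroring the numpy logistic regression example above
torch.manual_seed(42)
X = torch.rand(100, 2)
y = torch.randint(0, 2, (100, 1)).float()

model = MLP(in_dim=2, hidden_dim=16)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The steps interviewers tend to blank out: zero grads, forward, loss, backward, step
for epoch in range(100):
    optimizer.zero_grad()
    logits = model(X)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = (torch.sigmoid(model(X)) > 0.5).float()
    print("Accuracy:", (preds == y).float().mean().item())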