5. How to Prepare for the ML Coding Interview?
For MLE/AS roles, and even for some of the more hardcore DS positions, you may be asked to do model implementation from scratch, meaning implementing a model or algorithm without sklearn or PyTorch, using only NumPy. In practice, the questions usually come from the same handful of models:
Unsupervised Models:
- K-Means Clustering
import numpy as np

def initialize_centroids(X, k):
    """Randomly initialize k centroids from the dataset X."""
    indices = np.random.permutation(X.shape[0])
    centroids = X[indices[:k]]
    return centroids

def closest_centroid(X, centroids):
    """For each point in X, find the index of the closest centroid."""
    distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, labels, k):
    """Recalculate centroids as the mean of the points assigned to each cluster."""
    new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return new_centroids

def kmeans(X, k, max_iters=100):
    """The main k-means algorithm."""
    centroids = initialize_centroids(X, k)
    for i in range(max_iters):
        labels = closest_centroid(X, centroids)
        new_centroids = update_centroids(X, labels, k)
        # Check for convergence (if centroids don't change)
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example usage
# Generate some data
np.random.seed(42)
X = np.random.rand(100, 2)

# Perform k-means clustering
k = 3
centroids, labels = kmeans(X, k)
print("Centroids:", centroids)
Supervised Models:
- Logistic Regression
import numpy as np

# Sigmoid function to map predicted values to probabilities
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Loss function to compute the cost
def compute_loss(y, y_hat):
    # Binary cross-entropy loss
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradient descent function to update parameters
def gradient_descent(X, y, params, learning_rate, iterations):
    m = len(y)
    loss_history = np.zeros((iterations,))
    for i in range(iterations):
        # Calculate predictions
        y_hat = sigmoid(np.dot(X, params))
        # Update parameters
        params -= learning_rate * np.dot(X.T, y_hat - y) / m
        # Save loss
        loss_history[i] = compute_loss(y, y_hat)
    return params, loss_history

# Predict function
def predict(X, params):
    return np.round(sigmoid(np.dot(X, params)))

# Generate synthetic data
X = np.random.rand(100, 2)  # 100 samples and 2 features
y = np.random.randint(0, 2, 100)  # Binary targets

# Add intercept term to the feature matrix
X = np.hstack((np.ones((X.shape[0], 1)), X))

# Initialize parameters to zero
params = np.zeros(X.shape[1])

# Set learning rate and number of iterations
learning_rate = 0.01
iterations = 1000

# Perform gradient descent
params, loss_history = gradient_descent(X, y, params, learning_rate, iterations)

# Predict
predictions = predict(X, params)

# Calculate accuracy
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy}")
- (Multiple) Linear Regression

import numpy as np

def multiple_linear_regression(X, y):
    # Add a column of ones for the intercept term (b_0)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    # Use the Normal Equation to compute the best-fit parameters.
    # pinv (pseudo-inverse) is used instead of inv because the two features
    # in the toy data below are perfectly correlated, which makes X_b.T @ X_b singular.
    theta_best = np.linalg.pinv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    return theta_best  # First element is the intercept, the rest are coefficients

# Example usage:
X = np.array([
    [1, 2],  # Two features for each data point
    [2, 3],
    [3, 4],
    [4, 5],
    [5, 6]
])
y = np.array([5, 7, 9, 11, 13])  # Target values

# Fit the model to find the intercept and coefficients
theta_best = multiple_linear_regression(X, y)
print(f"Intercept and coefficients: {theta_best}")

# Predict function using the derived coefficients
def predict(X, theta_best):
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # Add the intercept term
    return X_b.dot(theta_best)

# Predicting values
X_new = np.array([
    [6, 7],
    [7, 8]
])  # New data points
predictions = predict(X_new, theta_best)
print(f"Predictions: {predictions}")
Sorting:
Sometimes you will be asked to implement a sorting algorithm; here is Insertion Sort as an example:
def insertion_sort(arr):
    # Traverse from index 1 to len(arr) - 1
    for i in range(1, len(arr)):
        key = arr[i]
        # Move elements of arr[0..i-1] that are greater than key
        # one position ahead of their current position
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

# Example usage
my_list = [64, 34, 25, 12, 22, 11, 90]
sorted_list = insertion_sort(my_list)
print("Sorted list:", sorted_list)
I have also seen candidates asked to implement Attention or a CNN from scratch, although I never ran into it myself (a rough NumPy sketch of attention is included right after this paragraph). For more of these model-implementation-from-scratch exercises, you can refer to:
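As a rough illustration of what an Attention question might look like, here is a minimal sketch of single-head scaled dot-product attention in plain NumPy. This is my own sketch, not from the references above; the function names and shapes are assumptions.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.
    Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len_q, seq_len_k)
    weights = softmax(scores, axis=-1)  # attention weights sum to 1 per query
    return weights @ V, weights

# Example usage with random toy data
np.random.seed(0)
Q = np.random.rand(4, 8)    # 4 query positions, d_k = 8
K = np.random.rand(6, 8)    # 6 key positions
V = np.random.rand(6, 16)   # d_v = 16
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)  # (4, 16) (4, 6)

A CNN question would typically reduce to writing the 2D convolution loops by hand in the same style and keeping track of the output shapes.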
Besides implementing models from scratch, in ML coding rounds I have also run into PyTorch fill-in-the-blank questions: you may be asked to implement an entire class in PyTorch and then debug the model pipeline until training runs end to end. I had not prepared for this at all and bombed it badly, having relied too heavily on ChatGPT in my day-to-day work. This type of question really tests your hands-on experience with PyTorch/TensorFlow; just skimming a cheat sheet is probably not enough (a toy sketch of this kind of task is included after the cheat-sheet link below).
https://www.datacamp.com/cheat-sheet/deep-learning-with-py-torch
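For a concrete picture of the "implement the PyTorch class and get training to run" task described above, here is a minimal sketch of my own, assuming a toy binary-classification setup; the class name and hyperparameters are arbitrary:

import torch
import torch.nn as nn

class MLPClassifier(nn.Module):
    """A small feed-forward network for binary classification."""
    def __init__(self, in_dim, hidden_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one logit; BCEWithLogitsLoss applies the sigmoid
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

# Toy data: 100 samples, 2 features, random binary labels
torch.manual_seed(42)
X = torch.rand(100, 2)
y = torch.randint(0, 2, (100,)).float()

model = MLPClassifier(in_dim=2)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Basic training loop: zero gradients, forward pass, loss, backward pass, step
for epoch in range(100):
    optimizer.zero_grad()
    logits = model(X)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = (torch.sigmoid(model(X)) > 0.5).float()
    print("Accuracy:", (preds == y).float().mean().item())

In the fill-in-the-blank version, a skeleton like this is usually given and the gaps tend to be exactly these pieces: the forward pass, the loss/optimizer wiring, and the zero_grad/backward/step order, so it pays to be able to write them without autocomplete.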