Machine Learning


https://spark.apache.org/docs/latest/mllib-guide.html

Algorithms:

Classification & Regression

  1. Linear Regression
  2. Decision Trees
  3. Naive Bayes
  4. Logistic Regression
  5. SVM
Clustering

  1. k-Means
  2. Gaussian Mixture
  3. Power Iteration Clustering
  4. Latent Dirichlet Allocation  
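The clustering algorithms follow the same train/predict pattern described further below. A minimal k-Means sketch, assuming an existing SparkContext `sc` and toy 2-D points (both are assumptions for illustration):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Assume `sc` is an existing SparkContext; the points here are toy 2-D vectors.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

    // Train with k = 2 clusters and at most 20 iterations.
    val model = KMeans.train(points, 2, 20)

    // Assign a new point to its nearest cluster center.
    val cluster = model.predict(Vectors.dense(8.9, 9.2))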

Backbone of Spark MLlib:

  1. RDD - aggregate, treeAggregate
  2. Optimizer (see the composition sketch after this list)
    • Stochastic Gradient Descent
    • L-BFGS
  3. Gradient
    • Logistic
    • Least Squares
    • Hinge
  4. Updater/Regularizer
    • Squared L2
    • L1
    • Simple
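These pieces compose: the optimizer drives the iterations, calling a Gradient for the loss and an Updater for the regularization step. A minimal sketch against the developer API in org.apache.spark.mllib.optimization, assuming `data` is an existing RDD[(Double, Vector)] of (label, features) pairs and 10 features (both assumptions):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

    // `data` is assumed to be an RDD[(Double, Vector)] of (label, features) pairs.
    val numFeatures = 10
    val initialWeights = Vectors.dense(new Array[Double](numFeatures))

    val (weights, lossHistory) = LBFGS.runLBFGS(
      data,
      new LogisticGradient(),   // Gradient: logistic loss
      new SquaredL2Updater(),   // Updater: squared-L2 regularization
      10,                       // numCorrections (L-BFGS history size)
      1e-4,                     // convergence tolerance
      100,                      // max iterations
      0.01,                     // regularization parameter
      initialWeights)

Swapping the Gradient and Updater (e.g., Hinge + L1) yields a different linear model from the same optimizer.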

General Usage & Architecture:

Linear Models:

Usage

  1. Prepare the data and wrap it in "LabeledPoint" instances, each of which groups the target/output label together with the input features
  2. Call the "train" method of the respective algorithm's singleton object, passing the algorithm-specific parameters
  3. This returns the model you have trained
  4. Your model is now ready for use: pass new features to its "predict" method to guess their output (see the sketch after the flow below)

Data -> Cleaning -> LabeledPoint -> Algorithm.train -> AlgorithmModel -> predict -> Guessed target
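A minimal sketch of that flow with logistic regression, assuming an existing SparkContext `sc` and toy data:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // 1. Wrap each (label, features) pair in a LabeledPoint.
    val training = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(1.0, 2.0)),
      LabeledPoint(0.0, Vectors.dense(-1.0, -2.0))))

    // 2. Call `train` on the singleton object with the algorithm-specific parameters.
    val model = LogisticRegressionWithSGD.train(training, 100)  // 100 iterations

    // 3./4. The returned model is ready; `predict` guesses the target for new features.
    val guess = model.predict(Vectors.dense(0.5, 1.5))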

Implementation

  1. Repeat the steps below until the weights converge or the given number of iterations is reached
  2. The driver broadcasts the current "weights" to each worker
  3. Each task (running on the workers) samples a mini-batch of B records from its data partition
  4. Each task computes the gradient over its sampled records
  5. The gradients are aggregated back to the driver (currently "RDD.treeAggregate" is used)
  6. The driver updates the weights
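A conceptual sketch of that loop over plain RDD operations (broadcast, sample, treeAggregate). The helper name, the least-squares gradient, and the fixed step size are simplifying assumptions for illustration, not MLlib's exact implementation:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.regression.LabeledPoint

    // Conceptual driver loop for distributed mini-batch gradient descent (hypothetical helper).
    def miniBatchSGD(data: RDD[LabeledPoint], numFeatures: Int, numIterations: Int,
                     miniBatchFraction: Double, stepSize: Double): Array[Double] = {
      val sc = data.sparkContext
      var weights = new Array[Double](numFeatures)         // initialized weights
      for (i <- 1 to numIterations) {                      // step 1: iteration budget
        val bcWeights = sc.broadcast(weights)              // step 2: broadcast weights
        val (gradSum, count) = data
          .sample(false, miniBatchFraction, 42 + i)        // step 3: sample a mini-batch
          .treeAggregate((new Array[Double](numFeatures), 0L))(
            { case ((grad, n), p) =>                       // step 4: per-record gradient
              var margin = 0.0
              for (j <- 0 until numFeatures) margin += bcWeights.value(j) * p.features(j)
              val err = margin - p.label                   // least-squares residual
              for (j <- 0 until numFeatures) grad(j) += err * p.features(j)
              (grad, n + 1)
            },
            { case ((g1, n1), (g2, n2)) =>                 // step 5: merge partial gradients
              for (j <- 0 until numFeatures) g1(j) += g2(j)
              (g1, n1 + n2)
            })
        if (count > 0)                                     // step 6: driver updates weights
          weights = weights.zip(gradSum).map { case (w, g) => w - stepSize * g / count }
      }
      weights
    }

treeAggregate merges the partial gradients in a tree pattern across executors instead of sending every partition's result straight to the driver, which is why MLlib prefers it over plain aggregate for wide gradients.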


