
Recitation for - PowerPoint Presentation

Uploaded On 2016-06-29



Presentation Transcript

Slide1

Recitation for BigData

Jay Gu

Jan 10

HW1 preview and Java Review

Slide2

Outline

HW1 preview

Review of Java basics

An example of gradient descent for linear regression in Java

Slide3

HW1 Preview

On data of size ~1 million.

Warm-up exercise

Stochastic Gradient Descent for Logistic Regression

SGD with Hashing Kernel

Extra credit: Personalized Logistic Regression

Slide4
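The two SGD exercises named in the HW1 preview can be pictured together with a small sketch. This is not the homework solution or the starter code: the class name HashedLogisticSgd, the bucket count, the constant step size and the multiplicative hash are all assumptions made for illustration. It shows logistic regression over token ids, where each token is hashed into a fixed number of buckets and one SGD step touches only the buckets of the current example.

```java
/** Hypothetical sketch: logistic regression trained by SGD over hashed token features. */
final class HashedLogisticSgd {
    final double[] w;   // one weight per hash bucket
    double w0 = 0.0;    // intercept
    final double step;  // learning rate (assumed constant)

    HashedLogisticSgd(int buckets, double step) {
        this.w = new double[buckets];
        this.step = step;
    }

    /** Maps a token id to a bucket; floorMod keeps the index non-negative. */
    int bucket(int token) {
        return Math.floorMod(token * 0x9E3779B1, w.length); // assumed hash, for illustration
    }

    /** Predicted click probability: sigmoid of w0 plus the weights of the active buckets. */
    double predict(int[] tokens) {
        double z = w0;
        for (int t : tokens) z += w[bucket(t)];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on a single example with label y in {0, 1}. */
    void update(int[] tokens, int y) {
        double g = y - predict(tokens);                 // gradient of the log-likelihood w.r.t. z
        w0 += step * g;
        for (int t : tokens) w[bucket(t)] += step * g;  // only active buckets are touched
    }

    public static void main(String[] args) {
        HashedLogisticSgd model = new HashedLogisticSgd(1 << 10, 0.1);
        int[] clicked = {3, 17};
        int[] skipped = {5, 42};
        for (int i = 0; i < 200; i++) {
            model.update(clicked, 1);
            model.update(skipped, 0);
        }
        System.out.println(model.predict(clicked) + " " + model.predict(skipped));
    }
}
```

Hashing bounds the memory of the model by the number of buckets regardless of how many distinct tokens appear, at the cost of occasional collisions between tokens.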

Starter Code

Class for parsing the input file and iterating over the dataset.

Dataset dataset = new Dataset(your_path, is_training, size);

while (dataset.hasNext()) {
    DataInstance d = dataset.next();
    … some action on d …
}

Slide5

Starter Code

public class DataInstance {
    int clicks;      // number of clicks, -1 if it is testing data
    int impressions; // number of impressions, -1 if it is testing data

    // Features of the session
    int depth;       // depth of the session
    int[] query;     // list of token ids in the query field

    // Features of the ad
    ….

    // Features of the user
    ….
}

Slide6

Starter Code

public class Weights {
    double w0;

    /*
     * query.get(123) will return the weight for the feature:
     * "token 123 in the query field".
     */
    Map<Integer, Double> query;
    Map<Integer, Double> title;
    Map<Integer, Double> keyword;
    Map<Integer, Double> description;

    double wPosition;
    double wDepth;
    double wAge;
    double wGender;
}

Slide7

BigData is often sparse

Be as lazy as you can …

Update only when necessary…

Slide8

Avoid O(d): Sparse and lazy update

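Although the feature dimension d is huge, each data point contains only a few tokens.

Update only the weights that actually change, i.e. those of the tokens in the current data point.

Even so, regularization nominally has to be applied to all d weights at each step.

Delay and batch the regularization: apply the accumulated shrinkage to a weight only when that weight is next used.

Slide9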
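The "delay and batch the regularization" idea from the sparse and lazy update slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the recitation's code: the class LazyL2Weights, L2 regularization, a constant step size and distinct token ids within one example are all assumptions. Each weight remembers the step at which it was last brought up to date, and the shrinkage it missed since then is applied in one multiplication when the weight is next read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of lazy (delayed) L2 regularization for sparse SGD. */
final class LazyL2Weights {
    private final Map<Integer, Double> w = new HashMap<>();         // token id -> weight
    private final Map<Integer, Integer> lastStep = new HashMap<>(); // token id -> step it is current for
    private final double step;    // learning rate (assumed constant)
    private final double lambda;  // L2 strength (assumed)
    private int now = 0;          // number of SGD steps completed

    LazyL2Weights(double step, double lambda) {
        this.step = step;
        this.lambda = lambda;
    }

    /** Returns the weight after applying, in one batch, the shrinkage it has missed. */
    double get(int token) {
        Double v = w.get(token);
        if (v == null) return 0.0;                 // never updated: still zero
        int missed = now - lastStep.get(token);
        double current = v * Math.pow(1.0 - step * lambda, missed);
        w.put(token, current);
        lastStep.put(token, now);
        return current;
    }

    /** One SGD step; only the (distinct) tokens of the current example are touched. */
    void update(int[] tokens, double gradient) {
        for (int t : tokens) {
            double current = get(t);               // catch up first
            w.put(t, current * (1.0 - step * lambda) - step * gradient);
            lastStep.put(t, now + 1);              // current through this step
        }
        now++;
    }

    public static void main(String[] args) {
        LazyL2Weights weights = new LazyL2Weights(0.1, 0.01);
        weights.update(new int[] {0}, 1.0);
        for (int i = 0; i < 3; i++) weights.update(new int[] {1}, 0.5);
        System.out.println(weights.get(0));        // token 0 was shrunk 3 times, lazily
    }
}
```

The result matches an eager loop that multiplies every weight by (1 - step * lambda) at every step, but the cost per step depends on the number of tokens in the example rather than on d.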

Java Review

Not required but good to know: Interface, Inheritance, Access Modifier, I/O

Language: Class, Object, variable, method

Data Structure: Java Collections

Array

List: ArrayList

Map: HashMap

Slide10

Class

public class DataInstance {
    // Feature of the session
    int[] query ….

    // Feature of the ad
    int[] title …

    DataInstance(String line, … ) {
        // parse the line, and set the field
    }

    public void print() {
        System.out.println("title: ");
        for (int token : title)
            System.out.print(token + "\t");
    }
}

Callouts on the code above:

Members or fields: query, title

Constructor: DataInstance(String line, … )

Method: print()

Slide11

Object

DataInstance data = new DataInstance(line, … );

int clicks = data.clicks;

data.print();

Slide12

Collections

Array: int[] tokens, double[] weights. Fixed length, most compact.

ArrayList: ArrayList<DataInstance>. Grows dynamically: the backing array is enlarged geometrically when it fills up (by about 1.5x in OpenJDK, rather than doubling).

HashMap: HashMap<K, V>. Expected constant-time key-value lookup. Grows dynamically, uses more memory.

Slide13
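As a quick illustration of the three collection types compared on the Collections slide, the sketch below uses each one for the job it suits. The class CollectionsDemo and the helper score are hypothetical names, and the token ids and weights are made-up values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of an array, an ArrayList and a HashMap side by side. */
final class CollectionsDemo {
    /** Sums the weights of the given tokens; tokens without a weight count as 0. */
    static double score(int[] tokens, Map<Integer, Double> weights) {
        double sum = 0.0;
        for (int t : tokens) sum += weights.getOrDefault(t, 0.0);
        return sum;
    }

    public static void main(String[] args) {
        int[] tokens = {7, 11, 7};                      // array: length fixed at creation
        List<Integer> seen = new ArrayList<>();         // list: grows as elements are added
        for (int t : tokens) seen.add(t);
        Map<Integer, Double> weights = new HashMap<>(); // map: key -> value lookup
        weights.put(7, 0.5);
        weights.put(11, -0.25);
        System.out.println(seen.size() + " " + score(tokens, weights));
    }
}
```

Note that ArrayList and HashMap store boxed Integer and Double objects, which is part of why they use more memory than int[] and double[].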

Variables

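"Everything" in Java is an Object

Except for primitive types: int, double

All object variables are references/pointers to the Object

Functions pass variables by value. For an object variable, the value that gets copied is the reference, so the callee can modify the object but cannot rebind the caller's variable.

Slide14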
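The pass-by-value rule from the Variables slide is easy to misread, so here is a small sketch of its three cases. The class PassByValueDemo and its methods are hypothetical names chosen for this illustration.

```java
/** Hypothetical demo: Java copies the value of every argument, including references. */
final class PassByValueDemo {
    /** Mutates the array that the caller's reference points to; the caller sees this. */
    static void scale(double[] weights, double factor) {
        for (int i = 0; i < weights.length; i++) weights[i] *= factor;
    }

    /** Rebinds only the local copy of the reference; the caller's array is unaffected. */
    static void reset(double[] weights) {
        weights = new double[weights.length];
    }

    /** Changes only a local copy of the primitive; the caller's variable is unaffected. */
    static void increment(int count) {
        count++;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 2.0};
        int n = 5;
        scale(w, 2.0);
        reset(w);
        increment(n);
        System.out.println(w[0] + " " + w[1] + " " + n);
    }
}
```

After the three calls, w holds {2.0, 4.0} and n is still 5: only scale, which writes through the copied reference, has an effect visible to the caller.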

Example: SGD for linear regression

Demo
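The demo itself is not part of this transcript. As a stand-in, the following is a minimal sketch of SGD for one-dimensional linear regression with squared error; the class LinearSgd, the toy data, the step size and the epoch count are assumptions, and the examples are visited in a fixed order rather than shuffled so that the run is deterministic.

```java
/** Hypothetical sketch: SGD for the model y = w0 + w1 * x with squared-error loss. */
final class LinearSgd {
    /** Runs the given number of passes over the data and returns {w0, w1}. */
    static double[] fit(double[] x, double[] y, double step, int epochs) {
        double w0 = 0.0;
        double w1 = 0.0;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double error = (w0 + w1 * x[i]) - y[i]; // prediction minus target
                w0 -= step * error;                     // gradient of 0.5 * error^2 w.r.t. w0
                w1 -= step * error * x[i];              // gradient of 0.5 * error^2 w.r.t. w1
            }
        }
        return new double[] {w0, w1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 5, 7, 9};                   // generated from y = 1 + 2x
        double[] w = fit(x, y, 0.05, 2000);
        System.out.println("w0=" + w[0] + " w1=" + w[1]);
    }
}
```

Because the toy data is noise-free, the weights approach w0 = 1 and w1 = 2; on real data a decaying step size and shuffled example order are common choices.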