xDANTEx7018 xDANTEx7018

27-12-2020
Computers and Technology

contestada

Z is a summer intern working on spam classification in your company. The dataset consists of 10 million non-spam emails (class 0) and 10 thousand spam emails (class 1). Z considers the following steps of conducting experiments:

Step 1: Shuffle the dataset and split it into the train, validation, and test sets.
Step 2: Train logistic regression models on the train set with different hyper-parameters.
Step 3: Identify the best hyper-parameter using the validation set and report the results on the test set in accuracy.

Do you agree with the above experimental setup? If No, what is the major issue? Provide your suggestions in one or two sentences.

Relax

Respuesta :

IfeanyiEze8899 IfeanyiEze8899

28-12-2020

Answer:

No, the data points of class 0 and class 1 are imbalanced and the text should be converted to a vector before used

Explanation:

The non-spam emails of class 0 have 10 million rows of data while class 1 of spam emails have 10 thousand rows. The data points are imbalanced and would result in an inaccurate prediction for the model. Either class 0 be downsampled or class 1 be upsampled to improve the prediction of the model.

The text of the emails should also be converted to vectors before using it in the model using natural language processing (NLP) techniques.

Answer Link

Otras preguntas

How do you simplify 1 over 2 over 3

A cyclist rides her bike at a speed of 12 meters per second. What is this speed in kilometers per hour? How many kilometers will the cyclist travel in 2 hours?

On Wake Tech’s Blackboard login page, you will find a link to Blackboard’s Browser Compatibility Chart, which can be used to determine which Web browser will wo

Your teacher gives you a black powder. When you dissolve the powder in water, you see red particles that float to the top, black particles that sink to the bott

what steps were taken by the colonists to resist british control?

Cell-cycle checkpoints are key transition stages that allow or prohibit the cell's progression to the next stage. What is happening during cell cycle checkpoint

On December 28, year 2, Kerr Manufacturing Co. purchased goods costing $50,000. The terms were FOB destination. Some of the costs incurred in connection with th

Please help me out with this!

Divide. Write the quotient in lowest terms. - 18/7 * 3

PLEASE HELP :( the area of a rectangle is (x^3-4x^2) square units and the width is (x-4) units what is the length