Answer:
Explanation:
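Let us first simplify the risk under the given loss function (zero for a correct decision, $\lambda_c$ for a misclassification, and $\lambda_d$ for choosing doubt). If $f(x) = i$, where $i$ is not the doubt class ($i \ne c + 1$), then the risk is
$$R(f(x) = i \mid x) = \sum_{j=1}^{c} L(f(x) = i, y = j)\, P(Y = j \mid x) \tag{2}$$
$$= 0 \cdot P(Y = i \mid x) + \lambda_c \sum_{j \ne i} P(Y = j \mid x) \tag{3}$$
$$= \lambda_c \bigl(1 - P(Y = i \mid x)\bigr) \tag{4}$$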
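When $f(x) = c + 1$, meaning you have chosen doubt, the risk is
$$R(f(x) = c + 1 \mid x) = \sum_{j=1}^{c} L(f(x) = c + 1, y = j)\, P(Y = j \mid x) \tag{5}$$
$$= \lambda_d \sum_{j=1}^{c} P(Y = j \mid x) \tag{6}$$
$$= \lambda_d \tag{7}$$
because $\sum_{j=1}^{c} P(Y = j \mid x) = 1$, since it is a proper probability distribution.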
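The two simplified risks can be evaluated directly for a single data point. The following is a minimal Python sketch, not part of the original answer: the function name `conditional_risks`, the example posterior, and the values chosen for the two losses are all illustrative assumptions.

```python
def conditional_risks(posterior, lam_c, lam_d):
    """Return the conditional risk of every action for one data point x.

    posterior[i] holds P(Y = i + 1 | x) (0-based list index).
    The last entry of the result is the risk of the doubt action c + 1.
    """
    # Risk of deciding class i: lam_c * (1 - P(Y = i | x)).
    class_risks = [lam_c * (1.0 - p) for p in posterior]
    # Risk of doubt: lam_d * sum_j P(Y = j | x), which equals lam_d
    # whenever the posterior sums to 1.
    doubt_risk = lam_d * sum(posterior)
    return class_risks + [doubt_risk]


# Hypothetical posterior over c = 3 classes and illustrative loss values.
risks = conditional_risks([0.5, 0.3, 0.2], lam_c=1.0, lam_d=0.3)
print(risks)
```

With these assumed numbers the doubt action has the smallest risk, because no class posterior reaches 1 - 0.3/1.0 = 0.7.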
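Now let $f_{\mathrm{opt}} : \mathbb{R}^d \to \{1, \dots, c + 1\}$ be the decision rule that implements (R1)–(R3). We want to show that, in expectation, the rule $f_{\mathrm{opt}}$ is at least as good as an arbitrary rule $f$. Let $x \in \mathbb{R}^d$ be a data point that we want to classify. Let us examine all the possible scenarios in which $f_{\mathrm{opt}}(x)$ and another arbitrary rule $f(x)$ might differ:

Case 1: Let $f_{\mathrm{opt}}(x) = i$ where $i \ne c + 1$.

- Case 1a: $f(x) = k$ where $k \ne i$ and $k \ne c + 1$. Then we get with (R1) that
$$R(f_{\mathrm{opt}}(x) = i \mid x) = \lambda_c \bigl(1 - P(Y = i \mid x)\bigr) \le \lambda_c \bigl(1 - P(Y = k \mid x)\bigr) = R(f(x) = k \mid x).$$

- Case 1b: $f(x) = c + 1$. Then we get with (R1) that
$$R(f_{\mathrm{opt}}(x) = i \mid x) = \lambda_c \bigl(1 - P(Y = i \mid x)\bigr) \le \lambda_c \Bigl(1 - \Bigl(1 - \frac{\lambda_d}{\lambda_c}\Bigr)\Bigr) = \lambda_d = R(f(x) = c + 1 \mid x).$$

Case 2: Let $f_{\mathrm{opt}}(x) = c + 1$ and $f(x) = k$ where $k \ne c + 1$. Then:
$$R(f(x) = k \mid x) = \lambda_c \bigl(1 - P(Y = k \mid x)\bigr)$$
$$R(f_{\mathrm{opt}}(x) = c + 1 \mid x) = \lambda_d$$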
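The case analysis can also be checked numerically. The sketch below is illustrative only and is not the author's method. It assumes that (R1)–(R3), which are not restated here, amount to "pick the most probable class if its posterior is at least 1 - λd/λc, otherwise choose doubt". The names `risk`, `f_opt`, and `check`, the loss values, and the random posteriors are all hypothetical.

```python
import random


def risk(action, posterior, lam_c, lam_d):
    """Conditional risk of an action; action == len(posterior) means doubt."""
    if action == len(posterior):
        return lam_d
    return lam_c * (1.0 - posterior[action])


def f_opt(posterior, lam_c, lam_d):
    """Assumed form of the rule: most probable class if its posterior
    reaches 1 - lam_d / lam_c, otherwise the doubt action."""
    best = max(range(len(posterior)), key=lambda i: posterior[i])
    if posterior[best] >= 1.0 - lam_d / lam_c:
        return best
    return len(posterior)


def check(trials=1000, c=4, lam_c=1.0, lam_d=0.3, seed=0):
    """Return True if f_opt is never beaten by any action on random posteriors."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Draw a random posterior by normalising positive weights.
        weights = [rng.random() for _ in range(c)]
        total = sum(weights)
        posterior = [w / total for w in weights]
        chosen = risk(f_opt(posterior, lam_c, lam_d), posterior, lam_c, lam_d)
        # Compare against every possible action, including doubt.
        for action in range(c + 1):
            # Small tolerance guards against floating-point rounding.
            if chosen > risk(action, posterior, lam_c, lam_d) + 1e-12:
                return False
    return True


all_ok = check()
print(all_ok)
```

A result of `True` only means no counterexample was found among the sampled posteriors. It does not replace the proof.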