
Monday, May 27, 2019

The Benefits and Drawbacks of a Binary Tree Versus a Bushier Tree

Homework 3

4. Discuss the benefits and drawbacks of a binary tree versus a bushier tree.

A binary tree has a simpler structure than a bushier tree: each parent node has only two children, which saves storage space. On the other hand, a binary tree may be deeper than a bushier tree, and the groups of records it produces may not be very refined.

5. Build a classification and regression tree (CART) to classify salary based on the other variables. Do as much as you can by hand, before turning to the software.

Data

No.  Occupation  Gender  Age  Salary   Level
1    Service     Female  45   $48,000  Level 3
2    Service     Male    25   $25,000  Level 1
3    Service     Male    33   $35,000  Level 2
4    Management  Male    25   $45,000  Level 3
5    Management  Female  35   $65,000  Level 4
6    Management  Male    26   $45,000  Level 3
7    Management  Female  45   $70,000  Level 4
8    Sales       Female  40   $50,000  Level 3
9    Sales       Male    30   $40,000  Level 2
10   Staff       Female  50   $40,000  Level 2
11   Staff       Male    25   $25,000  Level 1

Candidate splits for t = root node

Split  Left child node, tL      Right child node, tR
1      Occupation = Service     Occupation = Management, Sales, Staff
2      Occupation = Management  Occupation = Service, Sales, Staff
3      Occupation = Sales       Occupation = Service, Management, Staff
4      Occupation = Staff       Occupation = Service, Management, Sales
5      Gender = Female          Gender = Male
6      Age ≤ 25                 Age > 25
7      Age ≤ 26                 Age > 26
8      Age ≤ 30                 Age > 30
9      Age ≤ 33                 Age > 33
10     Age ≤ 35                 Age > 35
11     Age ≤ 40                 Age > 40
12     Age ≤ 45                 Age > 45

Values of the components of the optimality measure Φ(s|t) for each candidate split, for the root node. Here Φ(s|t) = 2 PL PR Q(s|t), where Q(s|t) is the sum over the four levels of |P(L=j|tL) - P(L=j|tR)|.

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Q(s|t)  Φ(s|t)
1      0.27  0.73  0.33       0.33       0.33       0.00       0.13       0.25       0.38       0.25       0.40   0.58    0.23
2      0.36  0.64  0.00       0.00       0.50       0.50       0.29       0.43       0.29       0.00       0.46   1.43    0.66
3      0.18  0.82  0.00       0.50       0.50       0.00       0.22       0.22       0.33       0.22       0.30   0.89    0.26
4      0.18  0.82  0.50       0.50       0.00       0.00       0.11       0.22       0.44       0.22       0.30   1.33    0.40
5      0.45  0.55  0.00       0.20       0.40       0.40       0.33       0.33       0.33       0.00       0.50   0.93    0.46
6      0.27  0.73  0.67       0.00       0.33       0.00       0.00       0.38       0.38       0.25       0.40   1.33    0.53
7      0.36  0.64  0.50       0.00       0.50       0.00       0.00       0.43       0.29       0.29       0.46   1.43    0.66
8      0.45  0.55  0.40       0.20       0.40       0.00       0.00       0.33       0.33       0.33       0.50   0.93    0.46
9      0.55  0.45  0.33       0.33       0.33       0.00       0.00       0.20       0.40       0.40       0.50   0.93    0.46
10     0.64  0.36  0.29       0.29       0.29       0.14       0.00       0.25       0.50       0.25       0.46   0.64    0.30
11     0.73  0.27  0.25       0.25       0.38       0.13       0.00       0.33       0.33       0.33       0.40   0.58    0.23
12     0.91  0.09  0.20       0.20       0.40       0.20       0.00       1.00       0.00       0.00       0.17   1.60    0.26

The optimality measure is maximized at 0.66 when Occupation = Management (left branch) and Occupation = Service, Sales or Staff (right branch). Candidate split 7 (Age ≤ 26) reaches the same value; the occupation split is used here.

After the first split, the left child has records 4,5,6,7 and the right child has records 1,2,3,8,9,10,11.

Now we split the left child (decision node A), which has records 4,5,6,7.

Split  Left child node, tL  Right child node, tR
5      Gender = Male        Gender = Female
6      Age ≤ 25             Age > 25
7      Age ≤ 26             Age > 26
10     Age ≤ 35             Age > 35

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node A

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Q(s|t)  Φ(s|t)
5      0.50  0.50  0.00       0.00       1.00       0.00       0.00       0.00       0.00       1.00       0.50   2.00    1.00
6      0.25  0.75  0.00       0.00       1.00       0.00       0.00       0.00       0.33       0.67       0.38   1.33    0.50
7      0.50  0.50  0.00       0.00       1.00       0.00       0.00       0.00       0.00       1.00       0.50   2.00    1.00
10     0.75  0.25  0.00       0.00       0.67       0.33       0.00       0.00       0.00       1.00       0.38   1.33    0.50

The optimality measure is maximized at 1.00 when Gender = Male (left branch) and Gender = Female (right branch); candidate split 7 produces the same partition. After this split, both branches terminate in pure leaf nodes. The left child has records 4,6 with value Level 3, and the right child has records 5,7 with value Level 4.

Now we split the right child of the root node (decision node B), which has records 1,2,3,8,9,10,11.

Split  Left child node, tL    Right child node, tR
1      Occupation = Service   Occupation = Sales, Staff
3      Occupation = Sales     Occupation = Service, Staff
4      Occupation = Staff     Occupation = Service, Sales
5      Gender = Female        Gender = Male
6      Age ≤ 25               Age > 25
8      Age ≤ 30               Age > 30
9      Age ≤ 33               Age > 33
11     Age ≤ 40               Age > 40
12     Age ≤ 45               Age > 45

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node B

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Q(s|t)  Φ(s|t)
1      0.43  0.57  0.33       0.33       0.33       0.00       0.25       0.50       0.25       0.00       0.49   0.33    0.16
3      0.29  0.71  0.00       0.50       0.50       0.00       0.40       0.40       0.20       0.00       0.41   0.80    0.33
4      0.29  0.71  0.50       0.50       0.00       0.00       0.20       0.40       0.40       0.00       0.41   0.80    0.33
5      0.43  0.57  0.00       0.33       0.67       0.00       0.50       0.50       0.00       0.00       0.49   1.33    0.65
6      0.29  0.71  1.00       0.00       0.00       0.00       0.00       0.60       0.40       0.00       0.41   2.00    0.82
8      0.43  0.57  0.67       0.33       0.00       0.00       0.00       0.50       0.50       0.00       0.49   1.33    0.65
9      0.57  0.43  0.50       0.50       0.00       0.00       0.00       0.33       0.67       0.00       0.49   1.33    0.65
11     0.71  0.29  0.40       0.40       0.20       0.00       0.00       0.50       0.50       0.00       0.41   0.80    0.33
12     0.86  0.14  0.33       0.33       0.33       0.00       0.00       1.00       0.00       0.00       0.24   1.33    0.33

The optimality measure is maximized at 0.82 when Age ≤ 25 (left branch) and Age > 25 (right branch). After this split, the left branch terminates in a pure leaf node, which has records 2,11 and value Level 1. The right branch has records 1,3,8,9,10.

Now we split the right child (decision node C), which has records 1,3,8,9,10. The remaining candidate splits are 1, 3, 4, 5, 8, 9, 11 and 12, with the same left and right child definitions as at decision node B.

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node C

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Q(s|t)  Φ(s|t)
1      0.40  0.60  0.00       0.50       0.50       0.00       0.00       0.67       0.33       0.00       0.48   0.33    0.16
3      0.40  0.60  0.00       0.50       0.50       0.00       0.00       0.67       0.33       0.00       0.48   0.33    0.16
4      0.20  0.80  0.00       1.00       0.00       0.00       0.00       0.50       0.50       0.00       0.32   1.00    0.32
5      0.60  0.40  0.00       0.33       0.67       0.00       0.00       1.00       0.00       0.00       0.48   1.33    0.64
8      0.20  0.80  0.00       1.00       0.00       0.00       0.00       0.50       0.50       0.00       0.32   1.00    0.32
9      0.40  0.60  0.00       1.00       0.00       0.00       0.00       0.33       0.67       0.00       0.48   1.33    0.64
11     0.60  0.40  0.00       0.67       0.33       0.00       0.00       0.50       0.50       0.00       0.48   0.33    0.16
12     0.80  0.20  0.00       0.50       0.50       0.00       0.00       1.00       0.00       0.00       0.32   1.00    0.32

The optimality measure is maximized at 0.64 when Gender = Female (left branch) and Gender = Male (right branch); candidate split 9 produces the same partition. After this split, the right branch terminates in a pure leaf node, which has records 3,9 and value Level 2. The left branch has records 1,8,10.

Now we split the left child (decision node D), which has records 1,8,10. The remaining candidate splits are 1, 3, 4, 11 and 12, defined as at decision node B.

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node D

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Q(s|t)  Φ(s|t)
1      0.33  0.67  0.00       0.00       1.00       0.00       0.00       0.50       0.50       0.00       0.44   1.00    0.44
3      0.33  0.67  0.00       0.00       1.00       0.00       0.00       0.50       0.50       0.00       0.44   1.00    0.44
4      0.33  0.67  0.00       1.00       0.00       0.00       0.00       0.00       1.00       0.00       0.44   2.00    0.89
11     0.33  0.67  0.00       0.00       1.00       0.00       0.00       0.50       0.50       0.00       0.44   1.00    0.44
12     0.67  0.33  0.00       0.00       1.00       0.00       0.00       1.00       0.00       0.00       0.44   2.00    0.89

The optimality measure is maximized at 0.89 when Occupation = Staff (left branch) and Occupation = Service or Sales (right branch); candidate split 12 produces the same partition with the branches swapped. After this split, both branches terminate in pure leaf nodes. The left branch has record 10 with value Level 2, and the right branch has records 1 and 8 with value Level 3.

In summary, we construct the CART tree below.

Root Node (All Records): Occupation = Management vs. not Management
  Occupation = Management: Decision Node A (Records 4,5,6,7)
    Gender = Male: Level 3 (Records 4,6)
    Gender = Female: Level 4 (Records 5,7)
  Occupation ≠ Management: Decision Node B (Records 1,2,3,8,9,10,11)
    Age ≤ 25: Level 1 (Records 2,11)
    Age > 25: Decision Node C (Records 1,3,8,9,10)
      Gender = Male: Level 2 (Records 3,9)
      Gender = Female: Decision Node D (Records 1,8,10)
        Occupation = Staff: Level 2 (Record 10)
        Occupation = Service or Sales: Level 3 (Records 1,8)

6. Construct a C4.5 decision tree to classify salary based on the other variables. Do as much as you can by hand, before turning to the software.
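For the software part of this question, the information gain of a candidate split can be computed with a few lines of Python. This is a minimal sketch, not the output of any particular C4.5 package: the record list is transcribed from the data table in question 5, and the names RECORDS, entropy and information_gain are illustrative.

```python
from collections import Counter
from math import log2

# Records transcribed from the data table: (occupation, gender, age, level).
RECORDS = [
    ("Service", "Female", 45, 3),
    ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2),
    ("Management", "Male", 25, 3),
    ("Management", "Female", 35, 4),
    ("Management", "Male", 26, 3),
    ("Management", "Female", 45, 4),
    ("Sales", "Female", 40, 3),
    ("Sales", "Male", 30, 2),
    ("Staff", "Female", 50, 2),
    ("Staff", "Male", 25, 1),
]


def entropy(records):
    """Entropy (in bits) of the salary level within a group of records."""
    counts = Counter(r[3] for r in records)
    total = len(records)
    return -sum(n / total * log2(n / total) for n in counts.values())


def information_gain(records, key):
    """Entropy reduction obtained by partitioning records on key(record)."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    weighted = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(records) - weighted


gain_occupation = information_gain(RECORDS, lambda r: r[0])   # four-way split
gain_gender = information_gain(RECORDS, lambda r: r[1])       # two-way split
gain_age_26 = information_gain(RECORDS, lambda r: r[2] <= 26)  # Age <= 26 vs. > 26

print(round(gain_occupation, 2), round(gain_gender, 2), round(gain_age_26, 2))
# prints: 0.78 0.38 0.58
```

The three printed values are the gains, in bits, for the occupation split, the gender split and the Age ≤ 26 split at the root node.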
Below are all candidate splits and their information gain for the root node.

Split  Child nodes                                                                             Information gain (bits)
1      Occupation = Service; Occupation = Management; Occupation = Sales; Occupation = Staff   0.78
2      Gender = Female; Gender = Male                                                          0.38
3      Age ≤ 25; Age > 25                                                                      0.55
4      Age ≤ 26; Age > 26                                                                      0.58
5      Age ≤ 30; Age > 30                                                                      0.38
6      Age ≤ 33; Age > 33                                                                      0.38
7      Age ≤ 35; Age > 35                                                                      0.15
8      Age ≤ 40; Age > 40                                                                      0.12
9      Age ≤ 45; Age > 45                                                                      0.19

Candidate split 1 has the highest information gain (0.78 bits) and is chosen for the initial split. The initial split produces four second-level decision nodes: A, B, C and D. The same process is then repeated until all records in each leaf node have the same target class value. The C4.5 decision tree is below.

Root Node (All Records): Occupation = Service, Management, Sales or Staff
  Occupation = Service: Decision Node A (Records 1,2,3)
    Gender = Female: Level 3 (Record 1)
    Gender = Male: Decision Node E (Records 2,3)
      Age ≤ 25: Level 1 (Record 2)
      Age > 25: Level 2 (Record 3)
  Occupation = Management: Decision Node B (Records 4,5,6,7)
    Gender = Female: Level 4 (Records 5,7)
    Gender = Male: Level 3 (Records 4,6)
  Occupation = Sales: Decision Node C (Records 8,9)
    Gender = Female: Level 3 (Record 8)
    Gender = Male: Level 2 (Record 9)
  Occupation = Staff: Decision Node D (Records 10,11)
    Gender = Female: Level 2 (Record 10)
    Gender = Male: Level 1 (Record 11)

7. Compare the two decision trees and discuss the benefits and drawbacks of each.

In this case, the CART tree is deeper than the C4.5 tree. The CART algorithm allows each node (except leaf nodes) to have only two children, whereas the C4.5 algorithm has no such restriction. In addition, most leaf nodes of the C4.5 tree contain only one record, which may cause overfitting.

8. Generate the full set of decision rules for the CART decision tree.
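The support and confidence of each CART rule can be checked by counting records. The sketch below assumes the record list transcribed from the data table in question 5 and rule conditions read off the CART tree built in that question; RECORDS, CART_RULES and support_and_confidence are illustrative names.

```python
# Records transcribed from the data table: (occupation, gender, age, level).
RECORDS = [
    ("Service", "Female", 45, 3),
    ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2),
    ("Management", "Male", 25, 3),
    ("Management", "Female", 35, 4),
    ("Management", "Male", 26, 3),
    ("Management", "Female", 45, 4),
    ("Sales", "Female", 40, 3),
    ("Sales", "Male", 30, 2),
    ("Staff", "Female", 50, 2),
    ("Staff", "Male", 25, 1),
]

# Each rule: (label, antecedent on (occupation, gender, age), predicted level).
CART_RULES = [
    ("Management, Male",
     lambda o, g, a: o == "Management" and g == "Male", 3),
    ("Management, Female",
     lambda o, g, a: o == "Management" and g == "Female", 4),
    ("not Management, Age <= 25",
     lambda o, g, a: o != "Management" and a <= 25, 1),
    ("Staff, Age > 25, Female",
     lambda o, g, a: o == "Staff" and a > 25 and g == "Female", 2),
    ("Service or Sales, Age > 25, Female",
     lambda o, g, a: o in ("Service", "Sales") and a > 25 and g == "Female", 3),
    ("not Management, Age > 25, Male",
     lambda o, g, a: o != "Management" and a > 25 and g == "Male", 2),
]


def support_and_confidence(antecedent, level):
    """Count records matching the antecedent and the share with the predicted level."""
    matched = [r for r in RECORDS if antecedent(r[0], r[1], r[2])]
    correct = sum(1 for r in matched if r[3] == level)
    return len(matched), correct / len(matched)


results = [support_and_confidence(rule, level) for _, rule, level in CART_RULES]
for (label, _, level), (support, confidence) in zip(CART_RULES, results):
    print(f"{label} -> Level {level}: support {support}/{len(RECORDS)}, "
          f"confidence {confidence:.1f}")
```

Because every leaf of the tree is pure, each rule should come out with confidence 1.0, and the supports should add up to the 11 records.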
Antecedent                                                                Consequent    Support (records)  Confidence
if Occupation = Management and Gender = Male                              then Level 3  2                  1.0
if Occupation = Management and Gender = Female                            then Level 4  2                  1.0
if Occupation = Service, Sales, Staff and Age ≤ 25                        then Level 1  2                  1.0
if Occupation = Staff and Age > 25 and Gender = Female                    then Level 2  1                  1.0
if Occupation = Service, Sales and Age > 25 and Gender = Female           then Level 3  2                  1.0
if Occupation = Service, Sales, Staff and Age > 25 and Gender = Male      then Level 2  2                  1.0

9. Generate the full set of decision rules for the C4.5 decision tree.

Antecedent                                                   Consequent    Support  Confidence
if Occupation = Service and Gender = Female                  then Level 3  1/11     1.0
if Occupation = Service and Gender = Male and Age ≤ 25       then Level 1  1/11     1.0
if Occupation = Service and Gender = Male and Age > 25       then Level 2  1/11     1.0
if Occupation = Management and Gender = Female               then Level 4  2/11     1.0
if Occupation = Management and Gender = Male                 then Level 3  2/11     1.0
if Occupation = Sales and Gender = Female                    then Level 3  1/11     1.0
if Occupation = Sales and Gender = Male                      then Level 2  1/11     1.0
if Occupation = Staff and Gender = Female                    then Level 2  1/11     1.0
if Occupation = Staff and Gender = Male                      then Level 1  1/11     1.0

10. Compare the two sets of decision rules and discuss the benefits and drawbacks of each.

CART allows only two branches per node, so its rules have higher support than the C4.5 rules; in other words, the result is less refined. A CART tree is also deeper than other trees most of the time, but it is easy to interpret. C4.5 can have several branches per node. Its rules have lower support than the CART rules, and the result is more specific, although, as noted in question 7, rules supported by a single record may overfit.
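As a check on the hand calculations in question 5, the CART optimality measure Φ(s|t) = 2 PL PR Σ|P(L=j|tL) - P(L=j|tR)| can be evaluated directly. This is a minimal sketch under the assumption that the record list below matches the data table in question 5; RECORDS, level_proportions and phi are illustrative names, not part of any library.

```python
# Records transcribed from the data table: (occupation, gender, age, level).
RECORDS = [
    ("Service", "Female", 45, 3),
    ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2),
    ("Management", "Male", 25, 3),
    ("Management", "Female", 35, 4),
    ("Management", "Male", 26, 3),
    ("Management", "Female", 45, 4),
    ("Sales", "Female", 40, 3),
    ("Sales", "Male", 30, 2),
    ("Staff", "Female", 50, 2),
    ("Staff", "Male", 25, 1),
]
LEVELS = (1, 2, 3, 4)


def level_proportions(records):
    """P(L=j | node) for j = 1..4 within one child node."""
    return [sum(1 for r in records if r[3] == j) / len(records) for j in LEVELS]


def phi(records, goes_left):
    """Optimality measure 2 * PL * PR * Q for a binary split of records."""
    left = [r for r in records if goes_left(r)]
    right = [r for r in records if not goes_left(r)]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    p_left = len(left) / len(records)
    p_right = len(right) / len(records)
    q = sum(abs(a - b)
            for a, b in zip(level_proportions(left), level_proportions(right)))
    return 2 * p_left * p_right * q


# Root node: candidate splits 2 (Management) and 1 (Service).
phi_management = phi(RECORDS, lambda r: r[0] == "Management")
phi_service = phi(RECORDS, lambda r: r[0] == "Service")

# Decision node B (records outside Management): candidate split 6 (Age <= 25).
node_b = [r for r in RECORDS if r[0] != "Management"]
phi_age_25 = phi(node_b, lambda r: r[2] <= 25)

print(round(phi_management, 2), round(phi_service, 2), round(phi_age_25, 2))
# prints: 0.66 0.23 0.82
```

Passing a different predicate as goes_left evaluates any other candidate split, and passing a subset of the records evaluates the split at a lower decision node.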
