PigMix is a set of queries used to test Pig performance from release to release. There are queries that test latency (how long does it take to run this query?) and queries that test scalability (how many fields or records can Pig handle before it fails?). In addition, it includes a set of map reduce Java programs that run equivalent map reduce jobs directly. These are used to measure the performance gap between using map reduce directly and using Pig. In June 2010 we released PigMix2, which adds 5 queries to the original 12 PigMix queries to measure the performance of new Pig features. We will publish the results of both PigMix and PigMix2.

...

Usage

To run PigMix, run the following commands from PIG_HOME:

ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy    # generate the test dataset
ant -Dharness.hadoop.home=$HADOOP_HOME pigmix           # run the PigMix benchmark

You can optionally set HADOOP_CONF_DIR before running these commands.

To change the default size of the test dataset, edit test/perf/pigmix/conf/config.sh.

Note that PigMix is checked in to Pig 0.12 and later. To run it with an earlier version of Pig, go to https://issues.apache.org/jira/browse/PIG-200 and use PIG-200-0.12.patch.

Runs

PigMix

The following table lists PigMix runs. All of these runs were done on a cluster with 26 slaves plus one machine acting as the name node and job tracker. The cluster was running Hadoop version 0.18.1. (TODO: Need to get specific hardware info on those machines.)

The tests were run against two versions of Pig: top of trunk and top of types branch, both as of Nov 21 2008.

The tests were run three times for each version and the results averaged.

tot = top of trunk
totb = top of types branch

| Version | Map Reduce Java Code | tot 11/21/08 | totb 11/21/08 | totb 1/20/09 | tot 2/23/09 |
|---|---|---|---|---|---|
| Date Run | 11/22/08 | 11/21/08 | 11/21/08 | 1/20/09 | 2/23/09 |
| L1 explode | 116 | 261 | 283 | 218 | 205 |
| L2 fr join | 41 | 1665 | 253 | 168 | 89 |
| L3 join | 97 | 1912 | 320 | 258 | 254 |
| L4 distinct agg | 68 | 254 | 193 | 110 | 116 |
| L5 anti-join | 90 | 1535 | 281 | 209 | 112 |
| L6 large group by key | 61 | 294 | 226 | 126 | 120 |
| L7 nested split | 72 | 243 | 204 | 107 | 102 |
| L8 group all | 56 | 462 | 194 | 104 | 103 |
| L9 order by 1 field | 286 | 5294 | 867 | 851 | 444 |
| L10 order by multiple fields | 634 | 1403 | 565 | 469 | 447 |
| L11 distinct + union | 120 | 316 | 255 | 164 | 154 |
| L12 multi-store | 150 | fails | 781 | 499 | 804 |
| Total time | 1791 | 13638 | 4420 | 3284 | 2950 |
| Compared to hadoop | 1.0 | 7.6 | 2.5 | 1.8 | 1.6 |
| Weighted Average | 1.0 | 11.2 | 3.26 | 2.20 | 1.97 |

...

Run date: October 18, 2009, run against top of trunk as of that day.
With this run we included a new measure, the weighted average. The multiplier we have published previously takes the total time of running all 12 Pig Latin scripts and compares it to the total time of running all 12 Java Map Reduce programs. This is a valid way to measure, as it shows the total amount of time needed to do all these operations on both platforms. Its drawback is that it gives more weight to long-running operations (such as joins and order bys) while masking the performance of faster operations such as group bys. The new "weighted average" computes the multiplier for each Pig Latin script vs. its Java program separately, adds these up, and divides by 12, thus weighting each test equally. In past runs the weighted average had significantly lagged the overall average (for example, in the run above for August 27 it was 1.5 even though the total difference was 1.2). With this latest run it still lags somewhat, but the gap has shrunk noticeably.

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 135.0 | 133.0 | 1.02 |
| PigMix_2 | 46.67 | 39.33 | 1.19 |
| PigMix_3 | 184.0 | 98.0 | 1.88 |
| PigMix_4 | 71.67 | 77.67 | 0.92 |
| PigMix_5 | 70.0 | 83.0 | 0.84 |
| PigMix_6 | 76.67 | 61.0 | 1.26 |
| PigMix_7 | 71.67 | 61.0 | 1.17 |
| PigMix_8 | 43.33 | 47.67 | 0.91 |
| PigMix_9 | 184.0 | 209.33 | 0.88 |
| PigMix_10 | 268.67 | 283.0 | 0.95 |
| PigMix_11 | 145.33 | 168.67 | 0.86 |
| PigMix_12 | 55.33 | 95.33 | 0.58 |
| Total | 1352.33 | 1357 | 1.00 |
| Weighted avg | | | 1.04 |
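The two measures described above can be reproduced from the October 18, 2009 run times. The following is a minimal Python sketch, not part of PigMix itself; the names RUNS, total_multiplier and weighted_average are made up for illustration, and the run times are copied from the table for that run.

```python
from statistics import mean

# (test, Pig run time, Java run time) copied from the October 18, 2009 table.
# The names in this sketch are illustrative; they are not part of PigMix.
RUNS = [
    ("PigMix_1", 135.0, 133.0),
    ("PigMix_2", 46.67, 39.33),
    ("PigMix_3", 184.0, 98.0),
    ("PigMix_4", 71.67, 77.67),
    ("PigMix_5", 70.0, 83.0),
    ("PigMix_6", 76.67, 61.0),
    ("PigMix_7", 71.67, 61.0),
    ("PigMix_8", 43.33, 47.67),
    ("PigMix_9", 184.0, 209.33),
    ("PigMix_10", 268.67, 283.0),
    ("PigMix_11", 145.33, 168.67),
    ("PigMix_12", 55.33, 95.33),
]


def total_multiplier(runs):
    """Total Pig time over total Java time; long-running tests dominate."""
    pig_total = sum(pig for _, pig, _ in runs)
    java_total = sum(java for _, _, java in runs)
    return pig_total / java_total


def weighted_average(runs):
    """Mean of the per-test multipliers; every test counts equally."""
    return mean(pig / java for _, pig, java in runs)


if __name__ == "__main__":
    print(f"total multiplier: {total_multiplier(RUNS):.2f}")  # 1.00
    print(f"weighted average: {weighted_average(RUNS):.2f}")  # 1.04
```

The gap between the two numbers comes from tests such as PigMix_3, whose multiplier of 1.88 counts as one twelfth of the weighted average but contributes much less to the ratio of totals, where longer tests such as PigMix_10 carry most of the weight.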

Run date: January 4, 2010, run against 0.6 branch as of that day

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 138.33 | 112.67 | 1.23 |
| PigMix_2 | 66.33 | 39.33 | 1.69 |
| PigMix_3 | 199 | 83.33 | 2.39 |
| PigMix_4 | 59 | 60.67 | 0.97 |
| PigMix_5 | 80.33 | 113.67 | 0.71 |
| PigMix_6 | 65 | 77.67 | 0.84 |
| PigMix_7 | 63.33 | 61 | 1.04 |
| PigMix_8 | 40 | 47.67 | 0.84 |
| PigMix_9 | 214 | 215.67 | 0.99 |
| PigMix_10 | 284.67 | 284.33 | 1.00 |
| PigMix_11 | 141.33 | 151.33 | 0.93 |
| PigMix_12 | 55.67 | 115 | 0.48 |
| Total | 1407 | 1362.33 | 1.03 |
| Weighted Avg | | | 1.09 |

PigMix2

Run date: May 29, 2010, run against top of trunk as of that day.

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 122.33 | 117 | 1.05 |
| PigMix_2 | 50.33 | 42.67 | 1.18 |
| PigMix_3 | 189 | 100.33 | 1.88 |
| PigMix_4 | 75.67 | 61 | 1.24 |
| PigMix_5 | 64 | 138.67 | 0.46 |
| PigMix_6 | 65.67 | 69.33 | 0.95 |
| PigMix_7 | 88.33 | 84 | 1.05 |
| PigMix_8 | 39 | 47.67 | 0.82 |
| PigMix_9 | 274.33 | 215.33 | 1.27 |
| PigMix_10 | 333.33 | 311.33 | 1.07 |
| PigMix_11 | 151.33 | 157 | 0.96 |
| PigMix_12 | 70.67 | 97.67 | 0.72 |
| PigMix_13 | 80 | 33 | 2.42 |
| PigMix_14 | 69 | 86.33 | 0.80 |
| PigMix_15 | 80.33 | 69.33 | 1.16 |
| PigMix_16 | 82.33 | 69.33 | 1.19 |
| PigMix_17 | 286 | 229.33 | 1.25 |
| Total | 2121.67 | 1929.67 | 1.10 |
| Weighted Avg | | | 1.15 |

...

Run date: Jun 11, 2011, run against top of trunk as of that day.

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 130 | 139 | 0.94 |
| PigMix_2 | 66 | 48.67 | 1.36 |
| PigMix_3 | 138 | 107.33 | 1.29 |
| PigMix_4 | 106 | 78.33 | 1.35 |
| PigMix_5 | 135.67 | 114 | 1.19 |
| PigMix_6 | 103.67 | 74.33 | 1.39 |
| PigMix_7 | 77.67 | 77.33 | 1.00 |
| PigMix_8 | 56.33 | 57 | 0.99 |
| PigMix_9 | 384.67 | 280.33 | 1.37 |
| PigMix_10 | 380 | 354.67 | 1.07 |
| PigMix_11 | 164 | 141 | 1.16 |
| PigMix_12 | 109.67 | 187.33 | 0.59 |
| PigMix_13 | 78 | 44.33 | 1.76 |
| PigMix_14 | 105.33 | 111.67 | 0.94 |
| PigMix_15 | 89.67 | 87 | 1.03 |
| PigMix_16 | 87.67 | 75.33 | 1.16 |
| PigMix_17 | 171.33 | 152.33 | 1.12 |
| Total | 2383.67 | 2130 | 1.12 |
| Weighted Avg | | | 1.16 |

Pig 0.9.2

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 146 | 147 | 0.993197278911565 |
| PigMix_2 | 73 | 61 | 1.19672131147541 |
| PigMix_3 | 134 | 158 | 0.848101265822785 |
| PigMix_4 | 91 | 87 | 1.04597701149425 |
| PigMix_5 | 81 | 153 | 0.529411764705882 |
| PigMix_6 | 91 | 81 | 1.12345679012346 |
| PigMix_7 | 71 | 86 | 0.825581395348837 |
| PigMix_8 | 56 | 61 | 0.918032786885246 |
| PigMix_9 | 302 | 192 | 1.57291666666667 |
| PigMix_10 | 312 | 226 | 1.38053097345133 |
| PigMix_11 | 207 | 222 | 0.932432432432432 |
| PigMix_12 | 96 | 163 | 0.588957055214724 |
| PigMix_13 | 76 | 127 | 0.598425196850394 |
| PigMix_14 | 94 | 157 | 0.598726114649682 |
| PigMix_15 | 86 | 92 | 0.934782608695652 |
| PigMix_16 | 80 | 82 | 0.975609756097561 |
| PigMix_17 | 196 | 176 | 1.11363636363636 |
| Total | 2192 | 2271 | 0.965213562 |
| Weighted Avg | | | 0.951558634 |

Pig 0.10.1

| Test | Pig run time | Java run time | Multiplier |
|---|---|---|---|
| PigMix_1 | 147 | 146 | 1.00684931506849 |
| PigMix_2 | 74 | 62 | 1.19354838709677 |
| PigMix_3 | 140 | 158 | 0.886075949367089 |
| PigMix_4 | 87 | 86 | 1.01162790697674 |
| PigMix_5 | 81 | 153 | 0.529411764705882 |
| PigMix_6 | 92 | 262 | 0.351145038167939 |
| PigMix_7 | 76 | 86 | 0.883720930232558 |
| PigMix_8 | 62 | 61 | 1.01639344262295 |
| PigMix_9 | 303 | 187 | 1.62032085561497 |
| PigMix_10 | 303 | 232 | 1.30603448275862 |
| PigMix_11 | 188 | 218 | 0.862385321100917 |
| PigMix_12 | 101 | 157 | 0.643312101910828 |
| PigMix_13 | 82 | 132 | 0.621212121212121 |
| PigMix_14 | 99 | 158 | 0.626582278481013 |
| PigMix_15 | 82 | 91 | 0.901098901098901 |
| PigMix_16 | 82 | 82 | 1 |
| PigMix_17 | 206 | 177 | 1.1638418079096 |
| Total | 2205 | 2448 | 0.900735294117647 |
| Weighted Avg | | | 0.919032977 |

...

Data Generation

If you want to know the details of data generation, see https://issues.apache.org/jira/browse/PIG-200. See DataGeneratorHadoop for information on how to run the data generator in Hadoop mode.