Pig script error: max not calculated
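Question:
I'm having a problem with a Pig script, and I have tried many different approaches. Can anyone point out what exactly I am doing wrong? It should be very simple: I am trying to get the maximum value after computing the averages.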
a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader();
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader();
books_and_ratings = join a by isbn, b by isbn;
by_isbn = GROUP books_and_ratings BY (a::isbn);
DESCRIBE by_isbn;
average_book_rating = FOREACH by_isbn
GENERATE books_and_ratings.book_title, books_and_ratings.a::isbn as isbn1,
books_and_ratings.book_author, books_and_ratings.publisher,
AVG(books_and_ratings.book_rating) as AVG_RATING;
DESCRIBE average_book_rating;
group_avg = GROUP average_book_rating ALL;
DESCRIBE group_avg;
max_avg_rating = FOREACH group_avg
GENERATE FLATTEN average_book_rating.a::book_title, isbn1,
average_book_rating.a::book_author, average_book_rating.a::publisher, MAX(AVG_RATING);
dump max_avg_rating;
Parse failed: mismatched input 'average_book_rating' expecting LEFT_PAREN
Answer
You can try something like this.
max_avg_rating = ORDER average_book_rating BY AVG_RATING DESC;
top_most_rating = LIMIT max_avg_rating 1;
dump top_most_rating;
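If you would rather keep the GROUP ... ALL approach from the question, a minimal sketch of the last step might look like the following. It assumes average_book_rating is the relation built in the question and that it has a numeric field named AVG_RATING. Two things differ from the original statement: FLATTEN is an operator that needs parentheses around its argument (most likely what the LEFT_PAREN error is pointing at), and MAX expects a bag, so the field has to be projected out of the grouped bag.

```pig
-- Assumes average_book_rating (from the question) has a field AVG_RATING.
group_avg = GROUP average_book_rating ALL;
-- MAX takes a bag, so project AVG_RATING out of the grouped bag.
max_avg_rating = FOREACH group_avg GENERATE MAX(average_book_rating.AVG_RATING) AS max_rating;
DUMP max_avg_rating;
```

Note that this sketch yields only the maximum value itself, not the book details that go with it.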
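Answer
After seeing Hades's latest comment ("there can be multiple books with the highest average rating"), I think you need one more GROUP first, to collect the ISBNs that share each rating, and then pick out the one you want.
Start like this:
grouped_rating = GROUP average_book_rating BY AVG_RATING;
Then you can use code like @Sivasakthi's:
ordered_avg_rating = ORDER grouped_rating BY group DESC;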
top_most_rating = LIMIT ordered_avg_rating 1;
dump top_most_rating;
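This way, if several results are tied for the highest rating, top_most_rating will hold a bag with the information for every book that received that highest rating. Of course, if you don't want it as a bag, you can project it into something more convenient.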
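As a rough sketch of that last point, the bag can be turned back into one row per book with FLATTEN. This assumes top_most_rating has the schema produced by the GROUP above, that is, (group, average_book_rating: {bag}); the alias flat_top_rating and the field name max_rating are made up for illustration.

```pig
-- Assumes top_most_rating: (group, average_book_rating: {...}) from the steps above.
-- flat_top_rating and max_rating are hypothetical names.
flat_top_rating = FOREACH top_most_rating GENERATE group AS max_rating, FLATTEN(average_book_rating);
DUMP flat_top_rating;
```

Each output row then carries the top rating followed by the fields of one book that achieved it.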
UPDATE:
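Here is how I would change the code above. One change is not purely functional: I would average the ratings first and only then join in the book/author information. This should perform better, because otherwise you inflate the size of the ratings (of which there are many) as they pass through the join.
So it would look like this: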
-- assume a: book_title, isbn, book_author, publisher (and maybe more, which we'll ignore)
a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader();
-- assume b: isbn, book_rating (and maybe more, which we'll ignore)
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader();
by_isbn = GROUP b BY isbn;
average_book_rating = FOREACH by_isbn GENERATE AVG(b.book_rating) AS AVG_RATING, group AS isbn;
group_avg = GROUP average_book_rating BY AVG_RATING;
ordered_avg_rating = ORDER group_avg BY group DESC;
top_most_rating = LIMIT ordered_avg_rating 1;
b = FOREACH top_most_rating GENERATE flatten(average_book_rating);
-- now add the book information
books_and_ratings = JOIN a BY isbn, b BY isbn;
books_and_ratings = FOREACH books_and_ratings GENERATE
    a::book_title AS title,
    a::isbn AS isbn,
    a::book_author AS author,
    a::publisher AS publisher,
    b::average_book_rating::AVG_RATING AS max_rating;
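Hope this works for you.
Are you getting an error, or is the maximum just not being calculated correctly? – Eyal 2014-09-28 13:56:13
@eyal I'm actually getting an error.... – Hades 2014-09-28 20:24:43
The last statement, the one that computes max_avg_rating, is incorrect. Can you paste the exact error? – 2014-09-29 00:48:24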