Diego Klabjan

Improvements to large-scale machine learning in Spark

4/28/2015

 
One of the biggest hassles in MapReduce is model calibration for machine learning models such as logistic regression and SVM. These algorithms are based on gradient optimization and require iterative computation of the gradient, followed by an update of the weights. MapReduce is ill suited for this, since in each iteration the data has to be read from HDFS and there is a significant cost to starting and winding down a MapReduce job.

On the other hand, Spark, with its ability to persist RDDs (resilient distributed datasets) in memory and its native dataflow capabilities, is a great candidate for efficient calibration on RDDs.

Gradient-based algorithms on distributed data sets rely on the paradigm of solving the optimization problem on each partition and then combining the solutions. We implemented three algorithms in Scala.

1. Iterative parameter averaging (IPA): On each partition, a single pass of the standard gradient algorithm is performed, which produces a set of weights. The weights from all partitions are then averaged and form the initial weights for the next pass. The pseudocode is provided below.

Initialize w
Loop
    Broadcast w to each partition
    weightRDD = For each partition in rdd inputData
        wp = w
        Perform a single gradient descent pass over the records in
            the partition by iteratively updating wp
        Return wp
    /* weightRDD is the rdd storing new weights */
    w = average of all weights in weightRDD
Return w

The key is to keep the RDD inputData in memory (persist it before calling IPA).
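The pseudocode above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Scala implementation from the repository: partitions are simulated with ordinary lists instead of a persisted RDD, the model is logistic regression, and the names `partition_pass`, `ipa`, the step size, and the toy data are all hypothetical.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def partition_pass(w, records, step):
    """One gradient descent pass over one partition, starting from the broadcast w."""
    wp = list(w)
    for x, y in records:
        err = sigmoid(sum(wi * xi for wi, xi in zip(wp, x))) - y
        wp = [wi - step * err * xi for wi, xi in zip(wp, x)]
    return wp

def ipa(partitions, dim, iterations=20, step=0.1):
    """Iterative parameter averaging over a list of in-memory partitions."""
    w = [0.0] * dim
    for _ in range(iterations):
        # Every partition starts from the same w (the "broadcast" step).
        weights = [partition_pass(w, part, step) for part in partitions]
        # Average the per-partition weights coordinate by coordinate.
        w = [sum(col) / len(weights) for col in zip(*weights)]
    return w

# Toy data (an assumption for illustration): label is 1 when the feature is positive.
random.seed(0)

def make_record():
    x1 = random.uniform(-1, 1)
    return ([1.0, x1], 1.0 if x1 > 0 else 0.0)

partitions = [[make_record() for _ in range(50)] for _ in range(4)]
w = ipa(partitions, dim=2)
print(w)
```

In Spark, the list comprehension over partitions would presumably correspond to a per-partition map over the persisted RDD, with the averaging done at the driver.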

2. Alternating direction method of multipliers (ADMM): http://stanford.edu/~boyd/admm.html

This method is based on the augmented Lagrangian. In each iteration, the calibration model is solved separately on each partition, using only that partition's records. The objective function is modified: it consists of the standard loss plus a penalty term that drives the partition's weights toward the average weights. An additional regularization subproblem with penalties must also be solved. For the L2 and L1 norms, this subproblem has a closed-form solution.
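As an illustration of the closed-form regularization step, here is a minimal Python sketch. It assumes the standard global-consensus form of ADMM from Boyd et al., which may differ in detail from the implementation discussed here. In that form, `v` is the average of the partition weights plus the average of the scaled dual variables, `n` is the number of partitions, `rho` is the augmented Lagrangian parameter, and `lam` is the regularization strength. All function and parameter names are hypothetical.

```python
def z_update_l2(v, lam, rho, n):
    # Minimizer of (lam/2)*||z||^2 + (n*rho/2)*||z - v||^2: a simple rescaling of v.
    scale = n * rho / (lam + n * rho)
    return [scale * vi for vi in v]

def soft_threshold(a, kappa):
    # Shrinks a toward zero by kappa; values within [-kappa, kappa] become 0.
    if a > kappa:
        return a - kappa
    if a < -kappa:
        return a + kappa
    return 0.0

def z_update_l1(v, lam, rho, n):
    # Minimizer of lam*||z||_1 + (n*rho/2)*||z - v||^2: coordinate-wise soft thresholding.
    kappa = lam / (n * rho)
    return [soft_threshold(vi, kappa) for vi in v]

v = [1.0, -0.2, 0.05]
z_l2 = z_update_l2(v, lam=1.0, rho=1.0, n=4)
z_l1 = z_update_l1(v, lam=1.0, rho=1.0, n=4)
print(z_l2, z_l1)
```

With these toy values the L1 step zeroes out the two small coordinates, while the L2 step only scales every coordinate by the same factor.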

After each partition computes its weights, the weights are averaged and the penalty term is adjusted. Each partition keeps its own set of weights.
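For reference, one iteration can be written out in the standard global-consensus form with regularization (as in the ADMM monograph by Boyd et al.). The notation is an assumption for illustration and is not taken from the implementation: $f_i$ is the loss on partition $i$, $g$ is the regularizer, $w_i$ are the weights of partition $i$, $z$ is the consensus (averaged) weight vector, $u_i$ are the scaled dual variables (the per-partition penalties), $\rho > 0$ is the penalty parameter, $N$ is the number of partitions, and bars denote averages over partitions.

```latex
\begin{aligned}
w_i^{k+1} &= \arg\min_{w_i}\; f_i(w_i) + \tfrac{\rho}{2}\,\lVert w_i - z^{k} + u_i^{k} \rVert_2^2, \\
z^{k+1}   &= \arg\min_{z}\; g(z) + \tfrac{N\rho}{2}\,\lVert z - \bar{w}^{k+1} - \bar{u}^{k} \rVert_2^2, \\
u_i^{k+1} &= u_i^{k} + w_i^{k+1} - z^{k+1}.
\end{aligned}
```

The first line is the per-partition solve, the second is the regularization subproblem solved at the driver, and the third is the per-partition penalty adjustment.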

Since the algorithm is complex, we do not provide the pseudocode. The bulk of it is very similar to IPA, but the driver performs additional work.

One challenge (either an inefficiency in Spark or something we do not know how to do in Spark) is that there is no way to send particular data (in our case the penalties) to the particular actor working on a given partition. Instead we had to broadcast to all actors, and during the processing of a partition only the relevant penalties from the broadcast are used. The main issue is that the penalties for all partitions have to be held in memory at the driver. For very large-scale RDDs with many features this will be a bottleneck.
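The workaround can be illustrated with a small plain-Python simulation. This is a sketch under assumptions, not the actual implementation: `broadcast` and `map_partitions_with_index` are hypothetical stand-ins written here for illustration (in Spark the corresponding calls would presumably be `SparkContext.broadcast` and `RDD.mapPartitionsWithIndex`), and the data and penalty values are made up.

```python
def broadcast(value):
    # Stand-in for a Spark broadcast: every worker sees the same full object.
    return value

def map_partitions_with_index(partitions, fn):
    # Stand-in for a per-partition map that also passes the partition index.
    return [fn(i, part) for i, part in enumerate(partitions)]

num_features = 3
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# The driver holds the penalties of ALL partitions and ships all of them everywhere.
penalties = {i: [0.1 * (i + 1)] * num_features for i in range(len(partitions))}
bc = broadcast(penalties)

def process(index, records):
    mine = bc[index]  # only this partition's penalties are actually used
    return (index, len(records), mine)

results = map_partitions_with_index(partitions, process)

# Driver-side memory grows with (number of partitions) x (number of features).
driver_floats = len(partitions) * num_features
print(results, driver_floats)
```

The last line makes the bottleneck explicit: the broadcast object scales with the number of partitions times the number of features, even though each partition reads only its own entry.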

3. Progressive hedging (PH): This is very similar to ADMM. The regularization subproblem has a different form than in ADMM, but it still has closed-form solutions for the L2 and L1 norms.

The implementations together with test codes are available at https://github.com/wxhC3SC6OPm8M1HXboMy/spark-ml-optimization 

Below is a comparison with the default Spark SGD on 4 CPUs, each with 8 cores, for two large data sets. IPA is the clear winner, and the default Spark SGD is the worst algorithm.

[Two comparison figures, not reproduced here]

    Diego Klabjan

    Professor at Northwestern University, Department of Industrial Engineering and Management Sciences. Founding Director, Master of Science in Analytics.
