Spark, on the other hand, with its ability to persist RDDs (resilient distributed datasets) in memory and its native dataflow capabilities, is a great candidate for efficient calibration on RDDs.
Gradient-based algorithms on distributed data sets rely on the paradigm of solving the optimization problem on each partition and then combining the partial solutions. We implemented three such algorithms in Scala.
1. Iterative parameter averaging (IPA): On each partition a single pass of the standard gradient algorithm is performed, which produces a weight vector. The weights from all partitions are then averaged and serve as the initial weights for the next pass. The pseudo code is provided below.
Initialize w
Loop
    Broadcast w to each partition
    weightRDD = for each partition of the RDD inputData:
        wp = w
        perform a single gradient descent pass over the records
        in the partition by iteratively updating wp
        return wp
    /* weightRDD is the RDD storing the new per-partition weights */
    w = average of all weights in weightRDD
Return w
The key is to keep the RDD inputData in memory (persist it before calling IPA); a minimal Scala sketch of the loop is given below.
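The following is only a minimal Scala sketch of the IPA loop, not the repository implementation: it assumes dense Double weight vectors, records already parsed into (label, features) pairs, and a caller-supplied per-record gradient function (the names IPA, gradient, dim and stepSize are illustrative).

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object IPA {
  // One iteration: each partition runs a local SGD pass starting from w,
  // then the driver averages the per-partition weight vectors.
  def run(sc: SparkContext,
          data: RDD[(Double, Array[Double])],   // (label, features), assumed already parsed
          gradient: (Array[Double], (Double, Array[Double])) => Array[Double], // assumed per-record loss gradient
          dim: Int,
          iterations: Int,
          stepSize: Double): Array[Double] = {

    data.persist(StorageLevel.MEMORY_ONLY)      // keep inputData in memory, as noted above
    var w = Array.fill(dim)(0.0)

    for (_ <- 1 to iterations) {
      val wBc = sc.broadcast(w)                 // broadcast current weights to the executors
      val (sumW, nParts) = data
        .mapPartitions { records =>
          val wp = wBc.value.clone()            // local copy of the broadcast weights
          records.foreach { r =>
            val g = gradient(wp, r)             // single SGD pass over this partition's records
            var i = 0
            while (i < dim) { wp(i) -= stepSize * g(i); i += 1 }
          }
          Iterator((wp, 1L))
        }
        .reduce { case ((w1, n1), (w2, n2)) =>  // sum the per-partition weights and count partitions
          (w1.zip(w2).map { case (a, b) => a + b }, n1 + n2)
        }
      w = sumW.map(_ / nParts)                  // average to obtain the next iterate
      wBc.unpersist()
    }
    w
  }
}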
2. Alternating direction method of multipliers (ADMM): http://stanford.edu/~boyd/admm.html
This method is based on the augmented Lagrangian. In each iteration the calibration model is solved on each partition, using only the records belonging to that partition, but with an altered objective function: the standard loss plus a penalty term that drives the partition's weights toward the average weights. In addition, an extra regularization problem with penalties must be solved; for the L2 and L1 norms this problem has a closed-form solution.
After each partition computes its weights, they are averaged and the penalty terms are adjusted. Each partition keeps its own set of weights.
Since the algorithm is complex, we do not provide its pseudo code; the bulk of it is very similar to IPA, but there is additional work performed by the driver.
One challenge, i.e., an inefficiency in Spark or perhaps "we do not know how to do it in Spark," is the inability to send particular data (in our case the penalties) to a particular executor working on a partition. Instead we had to broadcast all penalties to every executor, and during the processing of a partition only the relevant penalties from the broadcast are used; a sketch of this workaround is given below. The main issue is that the penalties of every partition have to be held in memory at the driver. For very large-scale RDDs with many features this becomes a bottleneck.
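As a minimal sketch of this workaround (with assumed names; penalties, localSolve and AdmmBroadcastSketch are illustrative, not from the repository), the driver broadcasts a map from partition index to that partition's penalty vector, and each task looks up only its own entry via mapPartitionsWithIndex:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object AdmmBroadcastSketch {
  // Ship all per-partition penalties in a single broadcast and let each
  // partition pick out only its own entry by partition index.
  def localStep(sc: SparkContext,
                data: RDD[(Double, Array[Double])],
                penalties: Map[Int, Array[Double]],  // partition index -> penalty vector, held at the driver
                w: Array[Double],                    // current average weights
                localSolve: (Iterator[(Double, Array[Double])], Array[Double], Array[Double]) => Array[Double]
               ): RDD[(Int, Array[Double])] = {
    val wBc   = sc.broadcast(w)
    val penBc = sc.broadcast(penalties)              // the whole penalty map is sent to every executor
    data.mapPartitionsWithIndex { (idx, records) =>
      val myPenalty = penBc.value(idx)               // only this partition's penalties are actually used
      val wp = localSolve(records, wBc.value, myPenalty) // solve the penalized local problem
      Iterator((idx, wp))                            // each partition keeps its own weights, keyed by index
    }
  }
}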
3. Progressive hedging (PH): This is very similar to ADMM. The regularization subproblem has a different form than in ADMM, but it still has closed-form solutions for the L2 and L1 norms; a sketch of these closed-form updates is given below.
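For illustration only, assuming the regularization subproblem reduces to the standard proximal operators of the L1 and L2 norms with a parameter lambda (the exact scaling depends on the ADMM/PH penalty parameter), the closed-form solutions look like this:

object Prox {
  // Proximal operator of lambda * ||x||_1: elementwise soft-thresholding.
  def proxL1(v: Array[Double], lambda: Double): Array[Double] =
    v.map(x => math.signum(x) * math.max(math.abs(x) - lambda, 0.0))

  // Proximal operator of (lambda / 2) * ||x||_2^2: shrinkage toward zero.
  def proxL2(v: Array[Double], lambda: Double): Array[Double] =
    v.map(_ / (1.0 + lambda))
}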
The implementations together with test codes are available at https://github.com/wxhC3SC6OPm8M1HXboMy/spark-ml-optimization
Below is a comparison, run on Spark with 4 CPUs each having 8 cores, for two large data sets. IPA is the clear winner, while the default Spark SGD is the worst algorithm.