Algorithms and Data Structures Library for Ruby

kanwei's picture

F88AD820-36C4-48C7-B26E-0F4B77131D50.jpg

I got accepted into the 2008 Google Summer of Code program! It’s a Google initiative to pay programmers to work on open source software projects. I’ll be updating my blog more this summer.

I will be writing a Ruby library to implement various algorithms and data structures so that they do not have to be re-implemented in other projects. My submitted proposal is as follows:

Using the right data structure or algorithm for the situation is an important aspect of programming. In computer science literature, many data structures and algorithms have been researched and extensively documented. However, there is still no standard library in Ruby implementing useful structures and algorithms like Splay Trees, Red/Black Trees, tries, graphs, different sorting algorithms, etc. This project will create such a library with documentation on when to use a particular structure/algorithm. It will also come with a benchmark suite to compare performance in different situations.

By leveraging the flexibility of Ruby’s duck typing, the structures and algorithms can be used on all classes of objects as long as a few methods are defined, much like the Enumerable mix-in. The library will be written in Ruby so that it can be used by all the different Ruby implementations, but will also be implemented in C and Java for performance.

Possible data structures and algorithms:

  • Trees (AVLTree, SplayTree, Red/Black Tree)
  • Heaps
  • Tries (Radix Tree, Suffix Tree)
  • Graphs (Adjacency List, Kruskal’s Algorithm, Djikstra’s Algorithm)
  • Sorting (Mergesort, Quicksort, Radix Sort, Insertion Sort)
  • Searching (Knuth-Morris-Pratt, Breadth-first tree search, depth-first tree search)

Motivation:

To create a fast ruby library of common data structures and algorithms so that they do not have to be re-implemented for other projects.

Organization:

  • The data structures part of the library will be implemented as classes to be instantiated.
  • The sorting algorithms will be implemented as mix-ins, much like the Enumerable and Comparable libraries that require the user to define #each and #<=> respectively in order to get additional methods. The search methods could be added to built-in structures like arrays and strings, but this may cause conflicts with existing code and this must be considered.
  • Algorithms related to certain data structures, such as Graph searches, will be implemented for the appropriate structure.

Reference:

The primary literature referenced will be Robert Sedgewick’s Algorithms (Parts 1 to 5), as it has both code examples and proofs for code complexity. I have used this book in many classes and it is a definitive resource for algorithms and data structures. For homework, I have implemented many of the structures and algorithms in Java, and have experience in writing benchmarks to test for different cases.

Implementation:

I plan to first write the entire library in Ruby and then implement it in C and Java for performance. Data structures and algorithms are low enough level that this is necessary. I have experience in both of the latter languages and should have no problem finishing over summer.

A benchmark suite will be created in Ruby (probably using RSpec) to demonstrate the best, average, and worst case running times for each algorithm and structure.

Documentation:

A key aspect of the project will be to provide documentation with real-life code examples to demonstrate the usage of the algorithms and structures. For example, the documentation entry for LSD radix sort will say that it quickly sorts numbers and strings, and will list the Big O complexity to be linear. A benchmark comparing the results with that of the Enumerable#sort will be shown, and hopefully the radix sort will be faster as it is O(n) and not O(n log n).

Explaining data structures will be trickier, as experience and theoretical background is needed to choose the right data structure for the problem. For example, many real world problems can be solved in terms of graphs. I will write up some realistic examples to demonstrate the use of trees, heaps and graphs, the main structures to be implemented. As with the algorithms, the documentation will provide benchmarks and Big O complexity.

Comments

robbyoconnor's picture

best of luck dude! -r0bby

best of luck dude!

-r0bby (Robert O’Connor)

Hey man! Good luck with the

Hey man! Good luck with the project!

I really enjoyed this post,

I really enjoyed this post, thanks for putting in the great effort to write and post..

Regards,
Jenny

Brilliant. Looking forward

Brilliant. Looking forward to see the result of this!

Outstanding

If you deliver this you will have added immensely to the Ruby community.

All the best!
J

Good luck

Hello! Good luck with your project!

Interesting

Hey man! It looks like a very interesting project, I hope you have a lot of success with it!

Brilliant. Looking forward

Brilliant. Looking forward to see the result of this!

Results

Good luck with it, make sure you keep us up to date on the results.

The plan for these

The plan for these high-school seniors with a major case of senioritis: break into Principal Dickwalder’s house and throw the party of the year. But Principal Dickwalder overhears Adam explaining the plan; ruining the celebration… or does it? Adam’s only chance at redemption is to host the party himself. Gorgeous models stripteases a gigantic gravity bong a funeral and some crepes are the key elements to the best skip-day party ever. Their only problem? Prinicipal Dickwalder is on a mission to stop the festivities. This hilarious fun-filled adventure will make you want to plan a skip day of your own.

great job

well it looks like you have your hands ful

that all sounds very difficult, good luck and let us know how it turns out