OStor – data deduplication in the cloud – HowTo

November 1, 2009

Quick Implementation Note

OStor is implemented in Java, which allows quick prototyping and platform independence and is a widely used language today. The core idea is based on the Low-bandwidth file system (LBFS) paper. Its principles have been applied to both wide-area acceleration and data deduplication. A future blog post will include detailed implementation notes.
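
As a rough illustration of the LBFS-style idea, the sketch below splits a byte array at content-defined boundaries, so that inserting or deleting a few bytes tends to shift only nearby segment boundaries. This is not OStor's actual code. LBFS uses Rabin fingerprints, whereas this sketch substitutes a simple polynomial rolling hash, and the class name, window size, mask, and segment size limits are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkerSketch {
    // All constants below are illustrative assumptions, not OStor's settings.
    static final int WINDOW = 48;           // bytes covered by the rolling hash
    static final int MIN_CHUNK = 2048;      // never cut a segment shorter than this
    static final int MAX_CHUNK = 65536;     // always cut once a segment gets this long
    static final int MASK = (1 << 13) - 1;  // a cut is expected roughly every 8 KB
    static final int BASE = 257;            // odd base, so every window byte affects the low bits

    /** Returns the end offset (exclusive) of each segment in data. */
    public static List<Integer> chunkEnds(byte[] data) {
        // BASE^WINDOW (with int wraparound), used to remove the byte leaving the window.
        int pow = 1;
        for (int i = 0; i < WINDOW; i++) {
            pow *= BASE;
        }

        List<Integer> ends = new ArrayList<Integer>();
        int hash = 0;
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            // Slide the window forward by one byte.
            hash = hash * BASE + (data[i] & 0xFF);
            if (i - start >= WINDOW) {
                hash -= pow * (data[i - WINDOW] & 0xFF);
            }
            int len = i - start + 1;
            // Cut when the low bits of the window hash match, or at the size cap.
            boolean atBoundary = len >= MIN_CHUNK && (hash & MASK) == MASK;
            if (atBoundary || len == MAX_CHUNK) {
                ends.add(i + 1);
                start = i + 1;
                hash = 0;
            }
        }
        if (start < data.length) {
            ends.add(data.length); // trailing partial segment
        }
        return ends;
    }
}
```

Because the cut decision depends only on the last 48 bytes plus the size limits, identical regions in two versions of a file usually end up in identical segments, which is what makes deduplication across versions possible.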

Pre-requisites

  • Java 1.6 (I haven't tested with Java 1.5 or earlier versions)
  • SVN to check out code.
  • External libraries – hadoop, log4j and commons-codec. These are included in the SVN repository.

I have installed it out of the box on Mac OS X 10.5 and 10.6, as well as on Kubuntu 7 and later.

HowTo

Check out the code from http://code.google.com/p/ostor/

  • svn checkout http://ostor.googlecode.com/svn/trunk/ ostor

Read the README file – ostor/README

  • make install – then locate ostor.jar

Run ostor in interactive mode

  • java -cp *:*:jars/* com.ostor.dedup.core.DedupStorCli .dstor
  • type help to look at syntax
  • add file data/emacs.html – adds a file to the repository
  • show object all – dump all objects in the repository
    INFO [main] (DedupObjectStor.java:170) – Dump object stor, number of object – 1
    INFO [main] (DedupObject.java:312) – [Object - data/emacs.html] length – 3170551 num segs – 143 unique:: segs – (143/143) size – (100%)
  • show segment all – dump all segments in the repository
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-%D6%CBk%9D%EC%E6%88m%1F%1A%AC%E5%1E%23%FC*%81%F6e%60, len – 30720, num refs – 1, hash – 1strnezmiG0fGqzlHiP8KoH2ZWA=
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-Mfa%EDD%D7%7B%CF%C9%D0%3B%3D%FCk%1E%85cP%84%11, len – 30720, num refs – 1, hash – TWZh7UTXe8/J0Ds9/GsehWNQhBE=
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-%CB%B4%9F%7E%AA%9E%96%0EZ%EB8%B7%C0%D02%5DO%87%E8%95, len – 30720, num refs – 1, hash – y7Sffqqelg5a6zi3wNAyXU+H6JU=
    … and so on
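
In the segment dump above, each segment Id appears to be the URL-encoded form of the same 20 bytes that the hash field shows in Base64 (20 bytes is the length of a SHA-1 digest). The sketch below reproduces that relationship for one of the sample lines. It is inferred from the output only, not taken from the OStor source: the class and method names are hypothetical, the space-to-plus rule is an assumption borrowed from standard URL form encoding, and java.util.Base64 requires Java 8 or later.

```java
import java.util.Base64;

public class SegmentIdSketch {
    /**
     * Percent-encodes raw bytes. Letters, digits and - _ . * pass through,
     * space becomes '+' (assumed), everything else becomes %XX in uppercase hex.
     */
    public static String urlEncode(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '_' || v == '.' || v == '*';
            if (safe) {
                sb.append((char) v);
            } else if (v == ' ') {
                sb.append('+');
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    /** Builds a segment Id in the style of the dump from a Base64-encoded hash. */
    public static String segmentId(String base64Hash) {
        byte[] digest = Base64.getDecoder().decode(base64Hash);
        return "SEGMENT-" + urlEncode(digest);
    }

    public static void main(String[] args) {
        // Hash taken from the second segment line in the dump.
        System.out.println(segmentId("TWZh7UTXe8/J0Ds9/GsehWNQhBE="));
        // prints SEGMENT-Mfa%EDD%D7%7B%CF%C9%D0%3B%3D%FCk%1E%85cP%84%11
    }
}
```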

In the next blog post, I will describe how to run OStor in standalone mode and then in Hadoop mode.

Moved project to google code

November 1, 2009

Old project homepage

http://ostor.sourceforge.net/

New project homepage

http://code.google.com/p/ostor/

I found it much easier to manage the project on Google Code than on SourceForge, which is unfortunate considering I wanted to stay vendor neutral. It was much easier to link my blog posts, manage wikis, etc. on Google Code.

Introducing OStor – data deduplication in the cloud. Open source project.

November 1, 2009

I have just launched an open source project to provide data deduplication services. Here is a short description from the project summary page – http://code.google.com/p/ostor/.

OStor (Optimized Storage) is a service that stores data optimally using block-level data deduplication and compression techniques. It can be used as a standalone tool, as an interactive tool, or in the cloud using the Hadoop MapReduce framework.
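
To make block-level deduplication concrete, here is a minimal in-memory sketch: data is cut into blocks, each block is keyed by its hash, and a block that has been seen before only gets its reference count incremented. It is an illustration under stated assumptions rather than OStor's implementation. The class and method names are hypothetical, fixed-size blocks are used for brevity, SHA-1 is assumed because the hashes in the HowTo post's segment dump are 20 bytes long, compression is left out, and the code needs Java 8 or later.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class DedupStoreSketch {
    private final Map<String, byte[]> segments = new HashMap<String, byte[]>();
    private final Map<String, Integer> refCounts = new HashMap<String, Integer>();
    private final int blockSize;

    public DedupStoreSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Base64 of the SHA-1 digest of a block; used as the segment key. */
    public static String sha1Base64(byte[] block) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return Base64.getEncoder().encodeToString(md.digest(block));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Stores data block by block and returns how many blocks were new. */
    public int add(byte[] data) {
        int added = 0;
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            byte[] block = Arrays.copyOfRange(data, off, end);
            String key = sha1Base64(block);
            if (!segments.containsKey(key)) {
                segments.put(key, block); // first time this content is seen
                added++;
            }
            refCounts.merge(key, 1, Integer::sum); // one more reference to this block
        }
        return added;
    }

    public int uniqueSegments() {
        return segments.size();
    }

    public int refs(String key) {
        return refCounts.getOrDefault(key, 0);
    }
}
```

For example, with a block size of 4, adding the ASCII bytes of "aaaabbbbaaaa" stores two unique blocks, and the block "aaaa" ends up with a reference count of 2.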

History

In recent years, cloud computing has emerged as a new paradigm in the tech industry. As more and more IT infrastructure moves into the cloud, data is being generated in the cloud at an unprecedented rate. Data also gets fed into the cloud from other sources. A portion of this data has to be retained for archival purposes. As data gets versioned and archived, a pattern emerges that mimics what was observed in traditional IT environments: the need for data deduplication, that is, the elimination of redundant data. In traditional IT environments, various vendors have provided such services for about half a dozen years, most notably Data Domain, which was recently acquired by EMC. Their solutions are hardware based. Since the new generation of cloud services (Amazon AWS, Microsoft Azure, Google Apps) is based on virtualization, customers must either rely on the cloud provider to offer such enhanced services or use software-only solutions.

OStor attempts to bridge this gap. I will add details about the implementation and howto documentation in subsequent blog posts. Stay tuned.


