Quick Implementation Note
OStor is implemented in Java, which allows for quick prototyping, gives platform independence and is a popular language of choice today. The core idea is based on the Low-Bandwidth File System (LBFS) paper, and its principles have been applied to both wide-area acceleration and data deduplication. A future blog post will include detailed implementation notes.
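The central idea in LBFS is to split data at content-defined boundaries rather than at fixed offsets. Because of this, inserting bytes into a file only disturbs the chunks around the edit. The Java sketch below illustrates that idea and is not OStor's own code. It makes several assumptions:

- A plain polynomial rolling hash stands in for the Rabin fingerprints LBFS uses.
- The class name is hypothetical.
- The window size, mask, magic value and min/max chunk sizes are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Content-defined chunking sketch in the spirit of LBFS (hypothetical, not OStor's code). */
public class ContentDefinedChunker {
    // Illustrative parameters: assumptions, not OStor's or LBFS's exact settings.
    public static final int WINDOW = 48;            // bytes covered by the rolling hash
    public static final int MASK = (1 << 13) - 1;   // test low 13 bits: ~8 KB expected past MIN_CHUNK
    public static final int MAGIC = 0x0078;         // arbitrary value that marks a boundary
    public static final int MIN_CHUNK = 2 * 1024;   // never cut before this many bytes
    public static final int MAX_CHUNK = 64 * 1024;  // always cut at this many bytes
    public static final int BASE = 257;             // polynomial base (stand-in for Rabin)

    /** Returns the exclusive end offset of every chunk; the last entry is data.length. */
    public static List<Integer> chunkBoundaries(byte[] data) {
        // BASE^(WINDOW-1) mod 2^32, used to drop the byte that leaves the window.
        int outFactor = 1;
        for (int k = 0; k < WINDOW - 1; k++) {
            outFactor *= BASE;  // int overflow wraps, i.e. arithmetic mod 2^32
        }

        List<Integer> ends = new ArrayList<Integer>();
        int start = 0;  // first byte of the current chunk
        int hash = 0;   // rolling hash over the current window
        for (int i = 0; i < data.length; i++) {
            if (i - start >= WINDOW) {
                // Window is full: remove the oldest byte before adding the new one.
                hash -= (data[i - WINDOW] & 0xFF) * outFactor;
            }
            hash = hash * BASE + (data[i] & 0xFF);

            int len = i - start + 1;
            boolean contentCut = len >= MIN_CHUNK && (hash & MASK) == MAGIC;
            if (contentCut || len >= MAX_CHUNK) {
                ends.add(i + 1);
                start = i + 1;
                hash = 0;  // MIN_CHUNK >= WINDOW, so the window refills before the next test
            }
        }
        if (start < data.length) {
            ends.add(data.length);  // trailing partial chunk
        }
        return ends;
    }
}
```

Once the window is full, a boundary decision depends only on the last 48 bytes. So inserting a byte near the start of a file typically shifts the nearby boundaries by one and leaves later chunks, and therefore their hashes, unchanged. That property is what lets the same mechanism serve both deduplication and wide-area acceleration.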
Prerequisites
- Java 1.6 (I haven’t tested with Java 1.5 or earlier versions)
- SVN to check out the code.
- External libraries – Hadoop, log4j and commons-codec. These are included in the SVN repository.
I have installed it out of the box on Mac OS X 10.5 and 10.6, as well as on Kubuntu 7 and later.
HowTo
Check out the code from http://code.google.com/p/ostor/
- svn checkout http://ostor.googlecode.com/svn/trunk/ ostor
Read the README file – ostor/README
- make install – then look for ostor.jar
Run ostor in interactive mode
- java -cp *:*:jars/* com.ostor.dedup.core.DedupStorCli .dstor
- type help to see the command syntax
- add file data/emacs.html – adds a file to the repository
- show object all – dump all objects in the repository
    INFO [main] (DedupObjectStor.java:170) – Dump object stor, number of object – 1
    INFO [main] (DedupObject.java:312) – [Object - data/emacs.html] length – 3170551 num segs – 143 unique:: segs – (143/143) size – (100%)
- show segment all – dump all segments in the repository
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-%D6%CBk%9D%EC%E6%88m%1F%1A%AC%E5%1E%23%FC*%81%F6e%60, len – 30720, num refs – 1, hash – 1strnezmiG0fGqzlHiP8KoH2ZWA=
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-Mfa%EDD%D7%7B%CF%C9%D0%3B%3D%FCk%1E%85cP%84%11, len – 30720, num refs – 1, hash – TWZh7UTXe8/J0Ds9/GsehWNQhBE=
    INFO [main] (DedupSegment.java:212) – Dump segment – Id – SEGMENT-%CB%B4%9F%7E%AA%9E%96%0EZ%EB8%B7%C0%D02%5DO%87%E8%95, len – 30720, num refs – 1, hash – y7Sffqqelg5a6zi3wNAyXU+H6JU=
    … and so on
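The segment dump above hints at how segments are keyed. Each "hash" is a 20-byte digest printed in Base64. Each Id appears to be the same 20 bytes, percent-encoded after a SEGMENT- prefix: decoding the first hash yields exactly the bytes encoded in the first segment Id.

The sketch below is a hypothetical reconstruction, not OStor's DedupSegment code. It makes these assumptions:

- The digest is SHA-1. This is inferred only from the 20-byte length.
- Segments live in an in-memory map with a reference count that plays the role of "num refs".
- It uses java.util.Base64 (Java 8+) for brevity. On Java 1.6, the commons-codec Base64 class would fill that role.

The class and method names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory segment store with reference counts (not OStor's code). */
public class SegmentStoreSketch {
    private static final class Segment {
        final byte[] data;
        int refs;  // times this content has been added, like "num refs" in the log
        Segment(byte[] data) { this.data = data; }
    }

    private final Map<String, Segment> segments = new HashMap<String, Segment>();

    /** 20-byte digest of a segment; SHA-1 is an assumption based on the digest length. */
    public static byte[] digest(byte[] segment) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(segment);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /** Base64 form, matching the "hash" field in the log. */
    public static String hashText(byte[] digest) {
        return Base64.getEncoder().encodeToString(digest);
    }

    /** "SEGMENT-" plus the raw digest bytes percent-encoded, matching the "Id" field. */
    public static String segmentId(byte[] digest) {
        // ISO-8859-1 maps each byte 0..255 to exactly one char and back again.
        String raw = new String(digest, StandardCharsets.ISO_8859_1);
        try {
            return "SEGMENT-" + URLEncoder.encode(raw, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // every JVM supports ISO-8859-1
        }
    }

    /** Adds a segment: new content is stored once, repeats only bump the count. */
    public String put(byte[] segment) {
        String id = segmentId(digest(segment));
        Segment s = segments.get(id);
        if (s == null) {
            s = new Segment(segment.clone());
            segments.put(id, s);
        }
        s.refs++;
        return id;
    }

    public int refs(String id) {
        Segment s = segments.get(id);
        return s == null ? 0 : s.refs;
    }

    public int uniqueSegments() {
        return segments.size();
    }
}
```

In a store like this, adding identical bytes twice yields the same Id and raises the reference count instead of storing a second copy. Read that way, "unique:: segs – (143/143)" in the object dump appears to say that none of the 143 segments of data/emacs.html duplicated another.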
In the next blog post, I will describe how to run OStor in standalone mode and then in Hadoop mode.