Archive for April, 2010

Napp-it, a free NAS web GUI for Nexenta or EON/OpenSolaris

Friday, April 23rd, 2010

This looks very promising. Nexenta has long been an excellent OpenSolaris-based distro that gave the user community a ZFS filesystem with apt-get functionality, all packaged as a web-GUI-configured NAS box. They released the core of this OS, without the configuration programs, as Nexenta Core…

This is the first how-to and implementation I’ve seen for taking the Nexenta Core package and turning it into a turnkey NAS box with all the ZFS, CIFS, and iSCSI goodness that Nexenta Core has to offer. I’m thinking of cranking one up as a virtual machine over the weekend… :)

Link to site (English and German)

Installation instructions here: Napp-it installation instructions

Screens here: Napp-it ScreenShots

Web-based configuration of an OpenSolaris or Nexenta server as a NAS system.
Includes:

  1. Base system with root SSH access via PuTTY, WinSCP, and the Midnight Commander file browser
  2. SMB fileserver for Mac/Windows workgroups and Windows domains (OpenSolaris CIFS)
  3. NFS and iSCSI SAN, plus iSCSI storage for Apple’s Time Machine (OpenSolaris COMSTAR) (example commands after this list)
  4. Backup server
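Under the hood these are just the stock OpenSolaris services, so here is a rough sketch of the kind of commands napp-it is driving for you. The pool and dataset names are made up, and the COMSTAR steps are from memory, so double-check them before copying:

zfs create tank/shares
zfs set sharesmb=on tank/shares                     # in-kernel CIFS (SMB) server
zfs create -V 200g tank/timemachine                 # a zvol to export over iSCSI for Time Machine
svcadm enable stmf                                  # COMSTAR framework
sbdadm create-lu /dev/zvol/rdsk/tank/timemachine    # register the zvol as a SCSI logical unit
itadm create-target                                 # create an iSCSI target (prints its IQN)
stmfadm add-view <lu-guid>                          # expose the LU (GUID printed by sbdadm) to initiators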

New features and abilities in this version

  • ZFS RAID (up through triple-parity raidz3) and automated deduplication (example zpool/zfs commands below)
  • CIFS shares with full share-level Access Control List configuration
  • COMSTAR iSCSI
  • Crossbow (network virtualization)
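For reference, the raidz3/dedup items above boil down to something like this at the command line; the disk names are placeholders, and the GUI just fills them in for you:

zpool create tank raidz3 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0   # triple-parity raidz3 pool
zfs set dedup=on tank                                          # pool-wide deduplication (see the Bonwick article below)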

Cool short film. Octopus steals a diver’s camera

Monday, April 19th, 2010

octopus steals my video camera and swims off with it (while it’s Recording)

Spiegel has the most excellent summation on the state of Global Warming…

Sunday, April 18th, 2010

Everyone on either side of the fence should check out this article on the state of climate science. It gets really good about 8 paragraphs into the article. I am very impressed that they cited both Landsea and Pielke in the hurricane debate… :)

The Truth in Caller ID Act of 2010 makes Caller ID spoofing a crime!!!

Thursday, April 15th, 2010

Wooo frikkin Hoo!!!!
http://www.engadget.com/2010/04/15/truth-in-caller-id-act-of-2010-makes-caller-id-spoofing-a-crime/

Open Source games I want to try…

Tuesday, April 6th, 2010

Scorched 3D: I cannot tell you how many hours my friends and I spent playing Scorched. I really want to see the 3D version….

Open Transport Tycoon Deluxe: an open-source clone of the old MicroProse transport game…

The Finger Test to Check the Doneness of Meat

Sunday, April 4th, 2010

If you do not have a temp probe handy, here is a quick way to assess how done your steak is. When you press on the top of the steak, the meat will give a little. The amount it gives will tell you how done the steak is. You can use your own palm as a reference for this method.

Press your finger into the muscle just to the inside of the thumb on your palm. When your hand is open and relaxed, this should give about as much as a raw steak.

Now touch the tip of your thumb to the tip of your first finger. When you press the same spot on your palm, you will notice that the tension makes the muscle firmer. With the thumb touching the first finger, the firmness approximates that of a rare steak.

Now repeat with the second finger. This would be the equivalent of a medium rare steak.

Now the third finger is approximately medium.

The fourth and final finger is the stage I call ruined. Others may call it “well done”… :)

How to enable Deduplication in the ZFS filesystem on OpenSolaris

Saturday, April 3rd, 2010

Works like a champ here!

http://blogs.sun.com/bonwick/entry/zfs_dedup

The article by Jeff Bonwick is reproduced below (just to be sure I don’t lose it one day)… ;)

Monday Nov 02, 2009

ZFS Deduplication



You knew this day was coming: ZFS now has built-in deduplication.


If you already know what dedup is and why you want it, you can skip
the next couple of sections. For everyone else, let’s start with
a little background.

What is it?


Deduplication is the process of eliminating duplicate copies of data.
Dedup is generally either file-level, block-level, or byte-level.
Chunks of data — files, blocks, or byte ranges — are checksummed
using some hash function that uniquely identifies data with very high
probability. When using a secure hash like SHA256, the probability of a
hash collision is about 2^-256 = 10^-77 or, in more familiar notation,
0.00000000000000000000000000000000000000000000000000000000000000000000000000001.
For reference, this is 50 orders of magnitude less likely than an
undetected, uncorrected ECC memory error on the most reliable hardware
you can buy.
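(A quick sanity check on that conversion, my arithmetic rather than Bonwick’s:

2^-256 = 10^(-256 × log10 2) ≈ 10^-77

which matches the figure above.)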


Chunks of data are remembered in a table of some sort that maps the
data’s checksum to its storage location and reference count. When you
store another copy of existing data, instead of allocating new space
on disk, the dedup code just increments the reference count on the
existing data. When data is highly replicated, which is typical of
backup servers, virtual machine images, and source code repositories,
deduplication can reduce space consumption not just by percentages,
but by multiples.

What to dedup: Files, blocks, or bytes?


Data can be deduplicated at the level of files, blocks, or bytes.


File-level assigns a hash signature to an entire file. File-level
dedup has the lowest overhead when the natural granularity of data
duplication is whole files, but it also has significant limitations:
any change to any block in the file requires recomputing the checksum
of the whole file, which means that if even one block changes, any
space savings is lost because the two versions of the file are no
longer identical. This is fine when the expected workload is something
like JPEG or MPEG files, but is completely ineffective when managing
things like virtual machine images, which are mostly identical but
differ in a few blocks.


Block-level dedup has somewhat higher overhead than file-level dedup
when whole files are duplicated, but unlike file-level dedup, it
handles block-level data such as virtual machine images extremely
well. Most of a VM image is duplicated data — namely, a copy of the
guest operating system — but some blocks are unique to each VM. With
block-level dedup, only the blocks that are unique to each VM consume
additional storage space. All other blocks are shared.


Byte-level dedup is in principle the most general, but it is also the
most costly because the dedup code must compute ‘anchor points’ to
determine where the regions of duplicated vs. unique data begin and
end. Nevertheless, this approach is ideal for certain mail servers, in
which an attachment may appear many times but not necessarily be
block-aligned in each user’s inbox. This type of deduplication is
generally best left to the application (e.g. Exchange server), because
the application understands the data it’s managing and can easily
eliminate duplicates internally rather than relying on the storage
system to find them after the fact.


ZFS provides block-level deduplication because this is the finest
granularity that makes sense for a general-purpose storage system.
Block-level dedup also maps naturally to ZFS’s 256-bit block checksums,
which provide unique block signatures for all blocks in a storage pool
as long as the checksum function is cryptographically strong
(e.g. SHA256).

When to dedup: now or later?


In addition to the file/block/byte-level distinction described above,
deduplication can be either synchronous (aka real-time or in-line)
or asynchronous (aka batch or off-line). In synchronous dedup,
duplicates are eliminated as they appear. In asynchronous dedup,
duplicates are stored on disk and eliminated later (e.g. at night).
Asynchronous dedup is typically employed on storage systems that have
limited CPU power and/or limited multithreading to minimize the
impact on daytime performance. Given sufficient computing power,
synchronous dedup is preferable because it never wastes space
and never does needless disk writes of already-existing data.


ZFS deduplication is synchronous. ZFS assumes a highly multithreaded
operating system (Solaris) and a hardware environment in which CPU
cycles (GHz times cores times sockets) are proliferating much faster
than I/O. This has been the general trend for the last twenty years,
and the underlying physics suggests that it will continue.

How do I use it?


Ah, finally, the part you’ve really been waiting for.


If you have a storage pool named ‘tank’ and you want to use dedup,
just type this:


zfs set dedup=on tank


That’s it.


Like all zfs properties, the ‘dedup’ property follows the usual rules
for ZFS dataset property inheritance. Thus, even though deduplication
has pool-wide scope, you can opt in or opt out on a per-dataset basis.

What are the tradeoffs?


It all depends on your data.


If your data doesn’t contain any duplicates, enabling dedup will add
overhead (a more CPU-intensive checksum and on-disk dedup table entries)
without providing any benefit. If your data does contain duplicates,
enabling dedup will both save space and increase performance. The
space savings are obvious; the performance improvement is due to the
elimination of disk writes when storing duplicate data, plus the
reduced memory footprint due to many applications sharing the same
pages of memory.
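An easy way to see whether dedup is actually buying you anything on a given pool (using the article’s example pool name ‘tank’) is the read-only dedupratio pool property:

zpool get dedupratio tank    # logical data referenced vs. physical data allocated; 1.00x means no duplicates found yet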


Most storage environments contain a mix of data that is mostly unique
and data that is mostly replicated. ZFS deduplication is per-dataset,
which means you can selectively enable dedup only where it is likely
to help. For example, suppose you have a storage pool containing
home directories, virtual machine images, and source code repositories.
You might choose to enable dedup as follows:


zfs set dedup=off tank/home


zfs set dedup=on tank/vm


zfs set dedup=on tank/src
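
To double-check which datasets ended up with which setting (same example pool as above), you can list the property recursively; the SOURCE column also shows where each value is inherited from:

zfs get -r dedup tank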

Trust or verify?


If you accept the mathematical claim that a secure hash like SHA256
has only a 2^-256 probability of producing the same output given two
different inputs, then it is reasonable to assume that when two blocks
have the same checksum, they are in fact the same block. You can trust
the hash. An enormous amount of the world’s commerce operates on this
assumption, including your daily credit card transactions. However, if
this makes you uneasy, that’s OK: ZFS provides a ‘verify’ option that
performs a full comparison of every incoming block with any alleged
duplicate to ensure that they really are the same, and ZFS resolves
the conflict if not. To enable this variant of dedup, just specify
‘verify’ instead of ‘on’:


zfs set dedup=verify tank

Selecting a checksum


Given the ability to detect hash collisions as described above, it is
possible to use much weaker (but faster) hash functions in combination
with the ‘verify’ option to provide faster dedup. ZFS offers this
option for the fletcher4 checksum, which is quite fast:


zfs set dedup=fletcher4,verify tank


The tradeoff is that unlike SHA256, fletcher4 is not a pseudo-random
hash function, and therefore cannot be trusted not to collide. It is
therefore only suitable for dedup when combined with the ‘verify’
option, which detects and resolves hash collisions. On systems with a
very high data ingest rate of largely duplicate data, this may provide
better overall performance than a secure hash without collision
verification.


Unfortunately, because there are so many variables that affect
performance, I cannot offer any absolute guidance on which is better.
However, if you are willing to make the investment to experiment with
different checksum/verify options on your data, the payoff may be
substantial. Otherwise, just stick with the default provided by
setting dedup=on; it’s cryptographically strong and it’s still pretty
fast.

Scalability and performance


Most dedup solutions only work on a limited amount of data — a handful
of terabytes — because they require their dedup tables to be resident
in memory.


ZFS places no restrictions on your ability to dedup. You can dedup
a petabyte if you’re so inclined. The performance of ZFS dedup will
follow the obvious trajectory: it will be fastest when the DDTs
(dedup tables) fit in memory, a little slower when they spill over
into the L2ARC, and much slower when they have to be read from disk.
The topic of dedup performance could easily fill many blog entries —
and it will over time — but the point I want to emphasize here is that
there are no limits in ZFS dedup. ZFS dedup scales to any capacity on
any platform, even a laptop; it just goes faster as you give it more
hardware.
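
If you want a rough idea of how big the DDTs are (or would be) for a particular pool, zdb can report on them; these flags are how I remember them from the OpenSolaris-era tools, so treat this as a hint rather than gospel:

zdb -S tank     # simulate dedup on an existing pool and print an estimated DDT histogram / dedup ratio
zdb -DD tank    # dump statistics for the real dedup tables once dedup is in use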

Acknowledgements


Bill Moore and I developed the first dedup prototype in two very
intense days in December 2008. Mark Maybee and Matt Ahrens helped us
navigate the interactions of this mostly-SPA code change with the ARC
and DMU. Our initial prototype was quite primitive: it didn’t support
gang blocks, ditto blocks, out-of-space, and various other real-world
conditions. However, it confirmed that the basic approach we’d been
planning for several years was sound: namely, to use the 256-bit block
checksums in ZFS as hash signatures for dedup.


Over the next several months Bill and I tag-teamed the work so that
at least one of us could make forward progress while the other dealt
with some random interrupt of the day.


As we approached the end game, Matt Ahrens and Adam Leventhal developed
several optimizations for the ZAP to minimize DDT space consumption both
on disk and in memory, key factors in dedup performance. George Wilson
stepped in to help with, well, just about everything, as he always does.


For final code review George and I flew to Colorado where many folks
generously lent their time and expertise: Mark Maybee, Neil Perrin,
Lori Alt, Eric Taylor, and Tim Haley.


Our test team, led by Robin Guo, pounded on the code and made a couple
of great finds — which were actually latent bugs exposed by some new,
tighter ASSERTs in the dedup code.


My family (Cathy, Andrew, David, and Galen) demonstrated enormous
patience as the project became all-consuming for the last few months.
On more than one occasion one of the kids has asked whether we can do
something and then immediately followed their own question with,
“Let me guess: after dedup is done.”


Well, kids, dedup is done. We’re going to have some fun now.