Can you dedupe?

Deduplication, or dedup for short. It sounds expensive and rather technical, but in fact it is nothing more than eliminating duplicate data on a storage system. That does not make the process itself any less important, because duplicate data naturally takes up twice the storage space. And then you back up that duplicate data as well. All of it unnecessary, and costly too.

In practice

A practical example. Suppose you have a company or organization where 500 people each have an e-mail account in Outlook. Every month this company sends a newsletter to all staff members: the same newsletter, five hundred times, to five hundred mailboxes. Without dedup, that newsletter is stored and backed up five hundred times each month. It can be done differently. With deduping, the newsletter is stored only once and every mailbox contains a reference to that one file. There is nothing mysterious about it.
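The arithmetic behind this example can be sketched in a few lines of Python. Only the 500 mailboxes come from the example; the newsletter size (2 MiB) and the size of a reference (64 bytes) are assumptions chosen purely for illustration.

```python
MAILBOXES = 500                    # from the example above
NEWSLETTER_BYTES = 2 * 1024 ** 2   # assumed newsletter size: 2 MiB (hypothetical)
REFERENCE_BYTES = 64               # assumed size of one reference (hypothetical)


def bytes_without_dedup(mailboxes: int, size: int) -> int:
    """Every mailbox keeps its own full copy of the newsletter."""
    return mailboxes * size


def bytes_with_dedup(mailboxes: int, size: int, ref: int) -> int:
    """One stored copy, plus one small reference per mailbox."""
    return size + mailboxes * ref


plain = bytes_without_dedup(MAILBOXES, NEWSLETTER_BYTES)
deduped = bytes_with_dedup(MAILBOXES, NEWSLETTER_BYTES, REFERENCE_BYTES)

print(f"without dedup: {plain:,} bytes")
print(f"with dedup:    {deduped:,} bytes")
print(f"reduction:     about {plain / deduped:.0f}x")
```

Under these assumed sizes the monthly newsletter shrinks from roughly 1 GB to roughly 2 MB of stored data, and every backup of that data shrinks with it.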

References

A Silent Brick and some other storage systems work with Object Oriented Storage: the database contains a reference to the location of the file. If you save the same file under a different name, or in a different mailbox, the database simply gets a second reference, a second entry, pointing to the file that is already stored. That alone gives you a fairly simple dedup capability. Silent Bricks, among others, do this automatically.
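The idea of "one stored file, several entries" can be illustrated with a minimal sketch. This is not how a Silent Brick is implemented; the class `FileLevelStore`, its method names, and the use of a SHA-256 hash to recognize identical content are all assumptions made for this example.

```python
import hashlib


class FileLevelStore:
    """Hypothetical whole-file dedup: identical content is stored once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content hash -> file content
        self.entries: dict[str, str] = {}   # entry name -> content hash

    def save(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Store the content only if it is not there yet ...
        self.blobs.setdefault(digest, data)
        # ... but always record an entry (reference) under the given name.
        self.entries[name] = digest

    def load(self, name: str) -> bytes:
        return self.blobs[self.entries[name]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = FileLevelStore()
newsletter = b"Monthly newsletter: all the news for staff."
store.save("mailbox-anna/newsletter.msg", newsletter)
store.save("mailbox-bob/news-copy.msg", newsletter)

print(len(store.entries), "entries,", len(store.blobs), "stored file")
print(store.stored_bytes(), "bytes stored for", 2 * len(newsletter), "bytes saved")
```

Both names can be read back independently, yet the content takes up space only once.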

More advanced, more risk

There are also storage systems that claim to offer more advanced dedup. These systems do not look at the whole file, but at pieces within the file. By creating separate entries for these pieces of data, you can dedupe even more and save more space. But there is a catch. With the simpler, file-level approach, losing one reference costs you only the one file that entry refers to. With the more advanced approach you do save more space, but you run a much greater risk: lose one reference, and you immediately lose every file that contains the small piece it refers to. On top of that, this way of working requires more computation and therefore more time. In short: you save (much) more space, but the risk of losing files is much higher and the system runs slower.
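To make that risk concrete, here is a minimal sketch of piece-level dedup. It assumes fixed-size pieces of 4 bytes (tiny, for readability) and SHA-256 hashes; the names `ChunkStore` and `files_hit_by_loss` are hypothetical, and real products may split files quite differently.

```python
import hashlib

CHUNK_SIZE = 4  # assumed, unrealistically small piece size for the demo


def split(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical piece-level dedup: each distinct piece is stored once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # piece hash -> piece content
        self.files: dict[str, list[str]] = {}  # file name -> ordered piece hashes

    def save(self, name: str, data: bytes) -> None:
        recipe = []
        for piece in split(data):
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)
            recipe.append(digest)
        self.files[name] = recipe

    def load(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def files_hit_by_loss(self, digest: str) -> list[str]:
        """Names of all files that can no longer be rebuilt without this piece."""
        return sorted(n for n, recipe in self.files.items() if digest in recipe)


store = ChunkStore()
store.save("a.txt", b"HEADbody-one")
store.save("b.txt", b"HEADbody-two")
store.save("c.txt", b"HEADtail")

shared = hashlib.sha256(b"HEAD").hexdigest()
unique = hashlib.sha256(b"-one").hexdigest()

print("distinct pieces stored:", len(store.chunks))
print("lost shared piece hits:", store.files_hit_by_loss(shared))
print("lost unique piece hits:", store.files_hit_by_loss(unique))
```

The three files reference eight pieces in total but only five are stored. The price is visible in the last two lines: losing the one shared piece damages all three files, while losing a piece that occurs in a single file damages only that file. The extra hashing per piece is also where the additional computation comes from.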

Weighing it up

The trick is to strike a good balance between the space you want to save by deduping and the risk you are willing to run in doing so. Either way, deduping is a cost-saving measure. Want to know more? Please inquire about the possibilities!
