Saturday, July 25, 2009

Using Bridge to share an Image Library

To recap the situation.

The goal is to share a large and growing library of more than 107,000 images among a small group of designers and editors.

I spent some three months creating the central library of some 434 gigabytes on a hard drive, which was then moved to a 2-terabyte mirrored RAID drive. It will remain there until the client migrates to a full-blown digital asset management system in 6 months or a year.

All the images are tagged with metadata and properly keyworded and can be readily and easily accessed by multiple users.

So what's the challenge now?

"Searchability." That is, can multiple users search the library using the existing metadata and our keywording strategy?

The short answer is a (very) qualified "yes."

Very serious tip: indexing is the key to searchability and the cache is the key to indexing.

Recent tests have shown us that sharing one large central cache that would be regularly updated by the image librarian is NOT workable. Cache structures differ across platforms and across versions of Bridge. And it seems that Bridge enjoys the best performance with a local cache readily at hand.

The first time Bridge accesses a folder of images it creates the necessary metadata, thumbs and previews and stores all this information in the cache for future use. That way, the next time you go to the folder, Bridge can easily find the metadata, there is little waiting around for the thumbs to load, and the metadata is readily available to allow searching through the folder. If all the folders in a given "superfolder" have been indexed, then Bridge can quickly search across all the subfolders.

The root problem lies with Adobe’s indexing feature. Or rather it lies with Bridge trying to index not only hundreds but thousands (or tens of thousands) of image files at one time.

Now match this with existing folder structures. In our case I had originally created dozens of folders and subfolders. We had literally thousands of duplicate images that needed to be weeded out in the early phases of creating the library; and in Bridge it was not possible to move thousands of image files around without putting stress on the software.

With an image library of 10,000 images, for example, indexing does not present much of a challenge.

But with tens of thousands of image files Bridge’s indexing ability is really put to the test.

Through a series of tests run during the past week it became clear that Bridge's indexing threshold is somewhere between 2500 and 8000 images. Our fastest machines running the latest version of Bridge simply could not index 8250 image files; they ground to a halt with about 1400 (+/-) remaining. Without more systematic testing we will never know where that exact threshold lies -- and may never know considering the variations that exist across platforms and versions.

Conclusion no. 1: Do NOT expect Bridge to index one large superfolder of images.
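To see which folders are likely to cause trouble before Bridge ever touches them, a small script can count the image files sitting directly in each folder. What follows is only a rough sketch in Python, not anything Bridge provides: the list of file extensions and the 2500 limit (the low end of the range we observed) are assumptions you should adjust for your own library.

```python
import os
from pathlib import Path

# Assumption: these are the image types in the library; edit to suit.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".psd", ".png", ".dng"}

# Assumption: the low end of the indexing range seen in our tests.
THRESHOLD = 2500


def count_images_per_folder(library_root):
    """Return {folder_path: number of image files directly inside that folder}."""
    counts = {}
    for folder, _subdirs, files in os.walk(library_root):
        counts[folder] = sum(
            1 for name in files if Path(name).suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def oversized_folders(library_root, threshold=THRESHOLD):
    """List (folder, count) pairs above the threshold, largest first."""
    counts = count_images_per_folder(library_root)
    big = [(folder, n) for folder, n in counts.items() if n > threshold]
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

Calling oversized_folders() with the path to the library drive returns the folders worth splitting up before anyone tries to index them.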

So that means you have to create some number of folders in the library. In our case the existing library has 31 top-level folders with probably three times as many subfolders.

But in order to index all the images and metadata each user has to manually go in and index each folder, subfolder and sub-subfolder.

Not reasonable.

Conclusion no. 2: Create a new folder structure that is manageable and workable for the intermediate term.

But wait a minute, how does this solve the indexing issue?

For one thing it helps to streamline the existing folders so they are more easily indexable by each user. But – and here’s the good part. . .

Very, Very Important: Remember when we tried to share one central cache among multiple machines? We learned that didn't work, and in fact created quite an indexing mess. But you can share cache files across machines, at least among Macs at any rate, and this will make indexing much easier and faster for multiple users.

This is how it works:

1. Using the "main library cache" on the primary (image librarian's) machine, all the library files are indexed on this one machine, folder by folder.

2. Once indexed, the cache is copied over to a second machine.

3. Launch Bridge and navigate to a (previously unaccessed/unindexed) folder of images and the thumbs will load almost instantly and the metadata is readily available for searching.

4. Repeat on other machines.
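We did the copying by hand, but the steps above could be scripted along these lines. This is a sketch only. The cache location shown is an assumption for Bridge CS4 on a Mac (check the Cache section of Bridge's Preferences for the real path on your machine), the function name is made up for illustration, and Bridge should be quit on both machines before anything is copied.

```python
import shutil
from pathlib import Path

# Assumption: where Bridge CS4 keeps its cache on a Mac. Verify in
# Bridge's Preferences before relying on this path.
ASSUMED_CACHE_DIR = (
    Path.home() / "Library" / "Caches" / "Adobe" / "Bridge CS4" / "Cache"
)


def copy_cache(source_cache, destination_cache, keep_backup=True):
    """Copy a cache folder to a new location, optionally keeping the old one."""
    source = Path(source_cache)
    destination = Path(destination_cache)
    if not source.is_dir():
        raise FileNotFoundError(f"No cache folder at {source}")
    if destination.exists():
        if keep_backup:
            # Park the existing cache next to the new one as "<name>.bak".
            backup = destination.with_name(destination.name + ".bak")
            if backup.exists():
                shutil.rmtree(backup)
            destination.rename(backup)
        else:
            shutil.rmtree(destination)
    # copytree creates the destination folder (and any missing parents).
    shutil.copytree(source, destination)
    return destination
```

Keeping the old cache as a ".bak" folder means a machine can be put back the way it was if the copied cache misbehaves.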

Note: this has only been tested on Macs and only on CS4. I tried two sets of folders, one with 634 image files and another with 1500 files. The first folder of thumbs and metadata loaded almost at once; the second folder took a few seconds longer loading the thumbs.

While this does not completely resolve the indexing problem, it does appear to eliminate the need for each new machine to have to create all the thumbs, previews and metadata files from scratch. And the other users do NOT have to manually access each folder the first time to index all those files. That's a step forward for us.

But what happens when you add images to the library?

Our plan is simply to add new folders -- which we would have to do anyway from the librarian's end -- and then have the other users index those new folders regularly. It's another byte to add to the workflow but indexing a smaller number of files (say a few hundred or so every week) is manageable.
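To tell the other users which folders need a visit, the librarian could generate a list of folders that have changed since the last round of indexing. Here is a minimal sketch; it assumes the folder's modification time is a fair signal, which holds on most file systems when files are added to or removed from a folder, but may not be reliable on every network share.

```python
import os
import time


def days_ago(days):
    """Timestamp for the moment 'days' days before now."""
    return time.time() - days * 24 * 60 * 60


def folders_changed_since(library_root, since_timestamp):
    """Return folders whose modification time is newer than since_timestamp."""
    changed = []
    for folder, _subdirs, _files in os.walk(library_root):
        if os.path.getmtime(folder) > since_timestamp:
            changed.append(folder)
    return sorted(changed)
```

For a weekly routine, folders_changed_since(library_root, days_ago(7)) gives the short list to pass around.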

Caution: Please bear in mind that this recommendation is still in the preliminary phase. We'll know if there are any serious limitations in the next week or so as we try this in real life across several machines.

And we must always, always remember that Bridge is really not designed for any of what we are trying to do.

Thanks to the people on the Adobe message boards for bouncing ideas back and forth, and especially to Ramon Castaneda for clarifying the whole cache business.

And thanks to my colleague Mark Baker. Mark took an incredible amount of time away from his own pressing projects to help me work through these issues. He helped me take a more systematic approach to the problems and in particular to the testing. Thanks Mark.

Wednesday, July 22, 2009

Test 2: File handling

OK, so we ruled out any possibility of multiple machines sharing a central cache. Next, we wanted to test how Bridge handled a very large number of files within one folder.

I created a test “Images” folder of 8250 image files, taking up 45+ gigabytes of space.

Machine 1 (PC/Bridge CS3)

Machine 2 (Mac/CS3)

Machine 3 (Mac tower/CS4)

Machine 4 (Mac lap/CS4)

Copying files within drive –

Machine 4. Copy files within library drive – 837 files/4.5 gigs:

Using Bridge copy/rename: “could not process some files” – failed to copy/rename

Using Bridge: “could not process some files” – failed to copy

Using Mac finder (drag and drop): 17 mins

Machine 1. Copy files within library drive -- smaller test of 202 files/800mb:

Using Bridge copy/rename: 17 mins

Using “window to window”: 4 mins

Conclusion: Move rather than copy files.

Performance within drive -

Machine 1. Navigated to image library and accessed “Images” folder – machine hung up at 40% indexing (-/+ 20 minutes) and stopped after 62 mins.

Clicked off onto another folder in library – which worked fine and loaded previews and metadata – and then clicked back on “Images”: it stalled again and hung up the entire program after that.

Machine 2. Navigated to “Images” folder; after 45 mins Bridge had loaded only two image files with thumbs and metadata, and all other previews remained blank. Progress wheel stopped but no further files loaded.

Machine 1. Attempted to purge and then rebuild cache with the same result as Machine 2: loaded two images with thumbs and metadata but then stopped indexing.

Machine 3. Navigated to “Images” folder and it completed indexing in 15 mins – keyword indexing complete only by manually scrolling down through the image files.

Machine 4. Navigated to “Images” folder and it completed indexing in 18 mins – keyword indexing complete only by manually scrolling down through the image files.

Note: CS4 Preferences allow for adjustment of number of items stored in a given cache – CS3 on the PC does not -- further confusion here since one Mac/CS3 had a slider allowing for "smaller" to "larger" in the cache prefs but another Mac/CS3 did NOT.

Conclusion: Significant disparity between CS3 and CS4 in accessing one folder with large number of image files.

Additionally, both late-model Mac CS4 machines experienced a significant slowdown when adding existing keywords/metadata to the last 1400 or so files of the "superfolder" of 8250 files.

Sunday, July 19, 2009

Test 1: Accessibility

In our search to learn what Bridge's limitations are as a browser, I ran a series of tests recently to see how the program handled our new central library. The new drive, a 2-terabyte mirrored RAID system, presently holds a 434-gigabyte library of more than 107,000 image files.

First I tested Bridge's ability to access files using a centralized cache. Based on speculation in our office as well as discussions found online, we posted a central cache in the library itself and pointed the various machines to that one location.

1. On PC/CS3 I created a test image file, tagged it with a unique keyword and moved it into the “Working” folder on the library drive.

2. Placed cursor at root library level and ran search for unique keyword on new image: NO RESULTS. (Expected.)

3. Returned to “Working” folder to “index” the new image.

4. Back to top level and ran search again: NO RESULTS. (Unexpected.)

5. Back to “Working” folder and selected test image – noted metadata/keywords visible in the metadata panel in Bridge.

6. Back to top level and run search: NO RESULTS.

7. Back to “Working” and moved test file to “Hospitality” folder.

8. Back to top and run search: NO RESULTS. (Expected.)

9. Return to “Hospitality” and select image.

10. Run search at top level: NO RESULTS. (Unexpected.)

11. Run search of “Hospitality” keyword: NO RESULTS.

12. Return to “Hospitality” folder and attempted search: NO RESULTS.

13. Various search criteria attempts all produce NO RESULTS.

Repeated test on a Mac/CS3 and with a second PC/CS3 with the same results.

Attempted to repeat test on Mac laptop/CS4 – Bridge produced an error message stating that it could not see the cache it was originally pointed to and wanted to recreate a new LOCAL cache folder.

(Light bulb.)

1. Quit Bridge and disconnected from library drive.

2. Re-launched Bridge CS4 and let it build a new local cache.

3. Reconnected to library, and let it re-index a folder: it opened the thumbs almost immediately and indexed a 2.6 gig folder of 564 images in 3-5 secs. I immediately commenced a search and it found the desired images in 4-6 secs.

Conclusion: Centralizing the cache to be shared by different versions of Bridge is a problem.

How/why remains unclear: online forum discussions lead me to believe CS3 and CS4 create different “types” of caches. Adobe has not confirmed.

In any case, since each machine would have to be indexed on its own and continue its own indexing, having a central cache seems irrelevant and the benefits (?) don’t justify the potential risks.

Saturday, July 18, 2009

Using Adobe Bridge to share image files across a network

Over the past several weeks I have found serious limitations using Adobe Bridge for sharing a centralized library of images. In all fairness, though, Bridge is really not designed to do what we all want it to do.

Still, for us, as for many others on this forum, Bridge is the tool of choice for creating a central image library and then organizing and tagging existing and new images. After a series of meetings with the other members of the Creative Services team I began the creation and organization phases in late March, finishing in early July.

But first the basics:

1. We're running a network of a half dozen machines using both Windows and Mac operating systems, some with Bridge CS3, most with CS4.

2. The image library is made up of more than 107,000 image files, taking up roughly 434 gigabytes on a 2-terabyte mirrored RAID system in our IT data center. (New images are being added nearly every day.)

3. Cataloging software is not an option, and a move to a full-blown server-based DAM system is projected but details remain uncertain at the moment.

Lessons learned so far:

1. Creating a large library requires some form of initial or permanent folder structure in order to move files around easily and quickly -- and to permit accurate and effective metadata tagging. Attempting to move more than a few hundred image files at a time can be a challenge for Bridge and will require the peppiest of workstations.

2. Using the primary work machine I pointed all our machines toward a centralized set of cache files (including the camera raw cache files) as suggested in this forum: right in the library itself, in a specially designated file. I also use this to work on collections of images before placing them in their respective library folders.

Warning! We discovered what may be a serious issue here: a day after pointing our machines to a centralized cache each copy of Bridge could see thumbnails and see the metadata but they could not search using the metadata. Even when they could search, each time a machine opened Bridge and accessed the library the thumbs would load painfully slowly. This happened with both CS3 and CS4, across platforms.

3. BTW, sharing master keyword lists is a breeze in CS4: just go to keywords panel and export the list to the location of your choice. It creates a .txt file (on the Mac). You can change it like any other text file and then import it right back again. Importing is equally easy: just go to Keywords panel and click on Import. Navigate to the changed keyword list and that’s it.

4. The central problem with Bridge is that each machine needs to fully index the library the first time – you can already see how long that is going to take with tens of thousands of images.

Moreover, there is no effective way to update the library from one source (for example, from the image librarian’s machine) and have all the other copies of Bridge automatically update with the new images or modified images.

Remember! Every time you move a file or modify the metadata for a file each cache on each copy of Bridge needs to be updated. One senses this could easily become a logistical nightmare.

5. Next phase is to duplicate the library and push all the files into one large folder. This should achieve two goals:

a. This should allow for easy updating of each copy of Bridge.

b. All the image files can be renamed using an agreed-upon renaming convention. (Another issue is that our image files have a wide variety of file names, many using unacceptable characters such as asterisks, pound signs and ampersands, as well as spaces.)

c. Then we test this across platforms and versions of Bridge.
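A side note on the keyword lists from lesson 3: because the export is an ordinary text file, two people's lists can be combined outside Bridge before importing. The sketch below assumes the file holds one keyword per line with sub-keywords indented by a tab under their parent. That layout is my assumption, so open your own export and check before trusting it. The function names are made up for illustration.

```python
def parse_keyword_list(text):
    """Turn tab-indented lines into keyword paths, e.g. ('Places', 'Hotels')."""
    paths = []
    stack = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The number of leading tabs gives the nesting depth.
        depth = len(line) - len(line.lstrip("\t"))
        stack = stack[:depth] + [line.strip()]
        paths.append(tuple(stack))
    return paths


def format_keyword_list(paths):
    """Write keyword paths back out as tab-indented text, parents first."""
    lines = []
    seen = set()
    for path in sorted(set(paths)):
        for depth in range(1, len(path) + 1):
            prefix = path[:depth]
            if prefix not in seen:
                seen.add(prefix)
                lines.append("\t" * (depth - 1) + prefix[-1])
    return "\n".join(lines) + "\n"


def merge_keyword_lists(text_a, text_b):
    """Combine two exported lists into one, dropping duplicates."""
    return format_keyword_list(
        parse_keyword_list(text_a) + parse_keyword_list(text_b)
    )
```

The merged text comes out sorted alphabetically, so the original ordering of the keywords is not preserved.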

I should say that as a freelance digital photographer I use Adobe Lightroom 2 for my own image library; however, my present client cannot/does not want to purchase multiple licenses for such an expensive program.

I have also tested Microsoft's Expression Media 2 and have found this to be a reliable, inexpensive, handy little program for creating catalogs as well as simple web galleries to share. And the cool thing is that MS distributes a cross-platform catalog reader for free!

Wednesday, July 8, 2009

Issue no. 614 - filename problems

If you plan to move a large number of files across platforms it might be a good idea to ensure the image files and in particular their filenames carry with them no potential problems. We are in the process of moving the entire image library from its present location on an external hard drive to a new mirrored RAID server system and the wide variety of filenames seems to cause serious slowdowns in the migration process. In fact it seems to stop it altogether.

Part of this is probably the quirkiness in migrating from the Mac to a PC. But it should also be noted that many of the filenames used -- the images come from a large number of sources -- themselves have unusual and generally unacceptable characters (pound signs, ampersands, spaces, and even tildes).

Lesson learned: clean up the filenames prior to migration.
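One way to do the cleanup in bulk is a short script that proposes new names first and renames only once you have looked the list over. This is a sketch, not the process we actually used. The rule it applies (letters, digits, hyphens and underscores only, everything else collapsed to a single underscore) is an assumed convention, and it should be run on a copy of the folder first.

```python
import re
from pathlib import Path


def clean_name(filename):
    """Replace runs of anything but letters, digits, '-' and '_' with '_'."""
    path = Path(filename)
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return (stem or "unnamed") + path.suffix


def plan_renames(folder):
    """Return [(old_path, new_path)] for names that would change. Renames nothing."""
    folder = Path(folder)
    taken = {p.name for p in folder.iterdir()}
    plan = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and hidden files such as .DS_Store.
        if not path.is_file() or path.name.startswith("."):
            continue
        new_name = clean_name(path.name)
        if new_name == path.name:
            continue
        # Avoid clobbering a file that already has the cleaned-up name.
        candidate, n = new_name, 1
        while candidate in taken:
            candidate = f"{Path(new_name).stem}_{n}{Path(new_name).suffix}"
            n += 1
        taken.add(candidate)
        plan.append((path, path.with_name(candidate)))
    return plan


def apply_renames(plan):
    """Carry out a plan produced by plan_renames()."""
    for old, new in plan:
        old.rename(new)
```

With this rule a name like "Lobby #2 & bar~.jpg" becomes "Lobby_2_bar.jpg". Bear in mind that renaming files outside Bridge means Bridge will have to index them again.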

Sunday, July 5, 2009

Hardware for less than $500?

So, exactly what sort of "stuff" will you need to set up a digital asset management system? And how much is it going to cost?

1. Digitized photos
2. A computer
3. Cataloging software
4. A backup strategy

If you don't have the photos digitized then you'll have to either buy a scanner or have the images put on disk.

I'll assume you already have a computer. Ideally you would want a dedicated computer station for archiving photos -- particularly for very large collections. Moving hundreds or sometimes several thousand image files around at one time can tax an older machine; especially if it's bloated with games, lots of other software etc.

Aside from the computer the biggest cost is going to come from software and the backup system used to protect your image library. (Software will be discussed in our next episode.) In the ideal world a serious backup strategy centers on 2+1:

a primary and a secondary external hard drive and a disk (CD or DVD)

However, backing up to CD is time-consuming and tedious and CD-Rs hold such a small amount of data. DVD-Rs are a step upwards, but they, too, can easily be outpaced by the growth of your library. Blu-ray is the new disk standard -- each disk can hold up to ten times the amount of information on the average DVD-R and for the moment that makes for a sensible disk component in any backup system.

But for most amateurs and many professionals, a disk component is simply not a viable alternative. For cost reasons or time-management, most will rely on the external hard drive as the basis for a backup management program.
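If the hard drives are your whole backup, it is worth checking now and then that the secondary drive really matches the primary. Most bundled backup software can verify its own copies. The sketch below is just an independent check in Python that compares two folders file by file using a SHA-256 checksum. The function names are made up for illustration, and reading every file twice will be slow on a large library.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup(primary_root, backup_root):
    """Return (missing, different) lists of paths relative to primary_root."""
    primary_root, backup_root = Path(primary_root), Path(backup_root)
    missing, different = [], []
    for source in sorted(primary_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary_root)
        copy = backup_root / relative
        if not copy.is_file():
            missing.append(relative)
        elif file_digest(source) != file_digest(copy):
            different.append(relative)
    return missing, different
```

Two empty lists mean every file on the primary drive has an identical twin on the backup. The check does not look for extra files that exist only on the backup.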


So, where does that leave us? In simple terms:

Scanner: $150
Hard drives, two 1-terabyte drives: $350

You can certainly spend less on a scanner but consider what you're scanning for. Also, if you have lots of slides and negatives that will also narrow your selection of available scanners -- and probably increase your cost as well. You'll want the best quality digital images so I'd rather err on the side of spending a little too much than not enough.

The same is true of hard drives. Buy new, high-quality drives and get the largest size you can afford. Believe me you'll grow into them.

External hard drives are more reliable today than ever and their cost continues to fall. My suggestion is to get the largest pair of drives you can afford. And since many drives come with backup software as part of the purchase package, that's one less cost for you. Look for drives that have as many different types of connections as possible (3 or 4 is optimum but don't settle for just a single connection type.)

Also the drive costs noted above are for single drive systems, not for RAID or multiple drive configurations. (RAID stands for "redundant array of independent disks".) If you decide to boost your backup options by using a mirrored RAID then plan to increase your costs by half again as much.

For a larger image library and with a somewhat more generous budget, consider the following configuration:

scanner: $200
hard drives (on-site), 2+2-terabyte RAID mirror: $800
blu-ray recorder: $300
blu-ray discs: $250 (cakebox of 50)
OR
hard drives (off-site, optional), 1-terabyte: $200

To find out more about RAID systems and whether it's right for you visit Wikipedia's in-depth discussion online.

Personally, I use both LaCie and OWC hard drives. No, they don't pay me to say that -- in fact they probably aren't even paying attention. Anyway, I've used LaCie for years and never had one fail yet -- and OWC is equally reliable. I like them for their size and portability. Both come bundled with good backup software: LaCie bundles Genie and Intego backup; OWC bundles NovaStor for Windows and Prosoft's Data Backup for the Mac.

For example, right now I am using a pair of 500-gigabyte OWC "On-the-Go" drives for working and primary backup and then a 1-terabyte OWC desktop drive as a secondary backup drive for a client. Eventually we'll move their library to a 2-terabyte RAID-mirror system.

You can find LaCie online at: http://www.lacie.com/us/index.htm. And OWC can be found at: http://eshop.macsales.com/

Oh, and you can research and buy LaCie drives at OWC as well.

If you're the least bit paranoid -- and frankly you should be -- I'd recommend spending a bit more and using a desktop and portable off-site backup system. (Say in a safety deposit box or a basement.) We'll talk more of this later when we discuss "process" but you might consider using portable drives as a secondary backup.

OK, so that's the easy part.

Next week things get a bit harder as we dive into cataloging software. We'll also talk about using metadata and later we'll get to the really hard part: workflow and process.