PowerShell Cmdlet for FAST Search Document Removal

On my current project, we needed a way to quickly remove potentially many documents from the FAST Search index. Unfortunately, the FAST web administration interface only lets you delete one document at a time, which was not going to work for our scenario. We had a couple of ideas for tackling the problem, one of which was to use the FAST Content API. Although we didn’t end up using this technique on the project, I still believe that the Content API and PowerShell make a very useful and powerful combination. So today I spent a little time writing a PowerShell cmdlet that can remove many items from the FAST index.

Visual Studio 2010 Project Setup

The first thing to do is create a Class Library project in Visual Studio and add a reference to Esp-Contentapi.dll from the ESP SDK. You’ll also want to add references to System.Management.Automation.dll (found in C:\Windows\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e35) and System.Configuration.Install.dll (in C:\Windows\Microsoft.NET\Framework\v2.0.50727).

After adding the three DLLs, add two class files to the project: a PowerShell snap-in class and a class for the cmdlet. In my project, the snap-in class is PointBridge.FAST.Cmdlets.PointBridgeFASTSnapIn and the cmdlet class is PointBridge.FAST.Cmdlets.Content.RemoveContentItem. The code and an explanation of each class follow.

PointBridge.FAST.Cmdlets.PointBridgeFASTSnapIn

This class derives from PSSnapIn and is used to register all the cmdlets in the assembly. When deriving from PSSnapIn, you need to override the following three properties: Name, Description, Vendor.

The class is also decorated with the RunInstaller attribute so that the assembly can be installed using installutil.exe.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.ComponentModel;
using System.Configuration.Install;
using System.Management.Automation;

namespace PointBridge.FAST.Cmdlets
{
    [RunInstaller(true)]
    public class PointBridgeFASTSnapIn : PSSnapIn
    {
        public override string Name
        {
            get { return "PointBridgeFASTSnapIn"; }
        }

        public override string Description
        {
            get { return "Various cmdlets to help with FAST management."; }
        }

        public override string Vendor
        {
            get { return "PointBridge"; }
        }
    }
}

 

PointBridge.FAST.Cmdlets.Content.RemoveContentItem

This class, which derives from Cmdlet, is the main class that handles the processing. When building cmdlets, you decorate the class with a Cmdlet attribute, which specifies the verb-noun pair used to invoke your cmdlet. Because of this attribute, my cmdlet is invoked as ‘Remove-ContentItem’ from the shell.

The RemoveContentItem class has three PowerShell parameters:

  • ContentID – the ID of the content to delete from the FAST index.
  • Collection – the name of the FAST collection that contains the item.
  • ContentDistributor – the server/port of the FAST ContentDistributor.

In the BeginProcessing() method (overridden from the Cmdlet base class), I set up an instance of an IDocumentFeeder object to be used later, when processing each record. The IDocumentFeeder is an interface that allows you to work with a FAST ESP collection for adding/removing/updating documents within that collection. You can get an instance of an IDocumentFeeder by calling the static CreateDocumentFeeder method of the Com.FastSearch.Esp.Content.Factory class.

In the ProcessRecord() method, I call the RemoveDocument() method of the IDocumentFeeder object to queue up the removal of the content item. The ProcessRecord() method is called for each ContentID passed into the cmdlet from the pipeline.

Finally, in the EndProcessing() method, I take care of reporting and cleanup. The call to IDocumentFeeder.WaitForCompletion() makes sure all submitted deletes have completed (successfully or not) before we continue. After the deletes have been processed, I use the IDocumentFeederStatus object returned from IDocumentFeeder.GetStatusReport() to build a report of the deletes that failed or executed with warnings.
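If the three overrides are easier to picture in script form, the same lifecycle can be sketched as a PowerShell function with begin, process, and end blocks. This is only an analogy and not part of the project: the function name Test-Lifecycle is made up, and it touches nothing in FAST. It just shows that the begin and end blocks each run once while the process block runs once per piped item. Note that the pipeline-bound parameter has no value yet in the begin block, which is why the cmdlet only uses Collection and ContentDistributor in BeginProcessing().

```powershell
# Hypothetical demo function; it only mirrors the cmdlet's three phases.
function Test-Lifecycle {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$ContentID
    )
    begin {
        # Like BeginProcessing(): runs once, before any pipeline input arrives.
        $count = 0
        "begin: set up shared state"
    }
    process {
        # Like ProcessRecord(): runs once for every item piped in.
        $count++
        "process: $ContentID"
    }
    end {
        # Like EndProcessing(): runs once, after the last item.
        "end: handled $count item(s)"
    }
}

"a", "b", "c" | Test-Lifecycle
```

Piping three strings in gives one begin line, three process lines, and one end line, in that order.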

using System;
using System.Collections;
using System.Collections.Specialized;
using System.Management.Automation;
using Com.FastSearch.Esp.Content;
using Com.FastSearch.Esp.Content.Config;
using Com.FastSearch.Esp.Content.Errors;
using Com.FastSearch.Esp.Content.Util;

namespace PointBridge.FAST.Cmdlets.Content
{
    [Cmdlet("Remove", "ContentItem")]
    public class RemoveContentItem : Cmdlet
    {
        // ID of the document to remove; accepted from the pipeline, one per record.
        [Parameter(Mandatory=true, ValueFromPipeline=true, Position=0)]
        public string ContentID { get; set; }

        // Name of the FAST collection that contains the documents.
        [Parameter(Mandatory=true, Position=1)]
        public string Collection { get; set; }

        // Server/port of the FAST ContentDistributor.
        [Parameter(Mandatory=true, Position=2)]
        public string ContentDistributor { get; set; }

        // Shared by all records; created once in BeginProcessing().
        private IDocumentFeeder _feeder = null;

        protected override void BeginProcessing()
        {
            base.BeginProcessing();
            try
            {
                _feeder = Factory.CreateDocumentFeeder(ContentDistributor, Collection);
            }
            catch (Exception ex)
            {
                // _feeder is still null here, so report the distributor address as the target object.
                WriteError(new ErrorRecord(ex, "ContentFactoryOperationError", ErrorCategory.InvalidOperation, ContentDistributor));
            }
        }

        protected override void ProcessRecord()
        {
            base.ProcessRecord();

            // Skip the record if the feeder could not be created.
            if (_feeder == null) return;

            // Queue the removal; the result is reported in EndProcessing().
            long opID = _feeder.RemoveDocument(ContentID);
            WriteObject(string.Format("Removing item '{0}'. Operation ID: {1}", ContentID, opID));
        }

        protected override void EndProcessing()
        {
            base.EndProcessing();

            if (_feeder == null) return;

            try
            {
                // Block until every queued delete has completed, successfully or not.
                _feeder.WaitForCompletion();
                BuildStatusReport(_feeder.GetStatusReport());
            }
            finally
            {
                // Release the feeder even if waiting or reporting throws.
                _feeder.Dispose();
            }
        }

        // Writes one line per failed delete and one line per delete that completed with a warning.
        private void BuildStatusReport(IDocumentFeederStatus status)
        {
            if (status.HasDocumentErrors())
            {
                WriteObject(string.Format("Total Errors: {0}", status.NumDocumentErrors));

                foreach (Pair p in status.AllDocumentErrors)
                {
                    DocumentError error = (DocumentError)p.Second;

                    WriteObject(string.Format("Operation ID: {0} Document ID: {1} Error Code: {2} Description: {3}",
                        (long)p.First, error.DocumentId, error.ErrorCode, error.Description));
                }
            }

            if (status.HasDocumentWarnings())
            {
                WriteObject(string.Format("Total Warnings: {0}", status.NumDocumentWarnings));

                foreach (Pair p in status.DocumentWarnings)
                {
                    DocumentWarning warning = (DocumentWarning)p.Second;

                    WriteObject(string.Format("Operation ID: {0} Document ID: {1} Warning Code: {2} Description: {3}",
                        (long)p.First, warning.DocumentId, warning.WarningCode, warning.Description));
                }
            }
        }
    }
}

 

Using the cmdlet

To use the cmdlet, open a new PowerShell window and use installutil.exe to install the snap-in:

PS> CD [location of assemblies]
PS> set-alias installutil $env:windir\Microsoft.NET\Framework64\v2.0.50727\installutil
PS> installutil PointBridge.FAST.Cmdlets.dll

You only need to run installutil once; after that, the snap-in can be added in any subsequent PowerShell session.
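To confirm that the registration took, a quick check along these lines may help. It relies only on the standard Get-PSSnapin cmdlet and on the snap-in name returned by the Name property shown earlier. As far as I know, registering with the Framework64 copy of installutil makes the snap-in visible to 64-bit PowerShell sessions only, so run the check from a 64-bit shell.

```powershell
# Lists snap-ins that are registered on the machine, whether or not
# they have been added to the current session yet.
Get-PSSnapin -Registered | Where-Object { $_.Name -eq "PointBridgeFASTSnapIn" }

# To unregister later, run installutil with /u against the same assembly:
#   installutil /u PointBridge.FAST.Cmdlets.dll
```

If the first command prints nothing, the snap-in is not registered for the shell you are running in.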

The following is an example of how to use the cmdlet:

 1: PS> add-pssnapin pointbridgefastsnapin
 2: PS> $contentids = "http://www.deviantpoint.com/category/ASPNET.aspx", "http://www.deviantpoint.com/?tag=/moss", "FAKEID", "http://www.deviantpoint.com/category/Workflow.aspx"
 3: PS> $contentids | remove-contentitem -collection "WebCollection" -contentdistributor "fsis:16100" | out-file "c:\temp\results.txt"

 

Line 1 adds the snap-in to my current session. Line 2 sets up an array of the content IDs I want to delete from my collection. This array (or set of records to process) could be read from a file, a database, or anywhere else; here I just set it up directly as an example. The third ID is a fake one that doesn’t actually exist in my collection. Lastly, line 3 pipes the content ID array to my remove-contentitem cmdlet and sends the results to an output file. Writing to a file isn’t required, but I prefer it to having everything dumped on the screen.

Running the example with the four IDs above produces output like this:
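Since the IDs will usually come from somewhere other than a hard-coded array, here is a minimal sketch of loading them from a text file. The helper name Read-ContentIdFile and the one-ID-per-line file layout are my own assumptions and are not part of the snap-in. It trims each line, skips blanks, and drops exact duplicates so the same delete isn't queued twice.

```powershell
# Hypothetical helper, not part of the snap-in. Assumes one content ID per line.
function Read-ContentIdFile {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |      # strip stray whitespace
        Where-Object { $_ -ne "" } |        # skip blank lines
        Select-Object -Unique               # drop exact (case-sensitive) duplicates
}

# Usage (requires the snap-in to be loaded; the file path is illustrative):
#   Read-ContentIdFile "c:\temp\ids.txt" |
#       remove-contentitem -collection "WebCollection" -contentdistributor "fsis:16100"
```

I used Select-Object -Unique rather than Sort-Object -Unique because it compares case-sensitively and keeps the original order, which seems the safer choice when the IDs are URLs.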

Removing item 'http://www.deviantpoint.com/category/ASPNET.aspx'. Operation ID: 1
Removing item 'http://www.deviantpoint.com/?tag=/moss'. Operation ID: 2
Removing item 'FAKEID'. Operation ID: 3
Removing item 'http://www.deviantpoint.com/category/Workflow.aspx'. Operation ID: 4
Total Errors: 1
Operation ID: 3 Document ID: FAKEID Error Code: 3 Description: Document e14c677abdbd9678dbfe1e5580de9aef_WebCollection does not exist
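Because the cmdlet writes plain strings, a script that wants to act on the failures has to parse them back out. The sketch below is one way to do that. Get-FailedRemoval is a hypothetical helper, and it assumes the error lines keep exactly the format shown above (warning lines, which say "Warning Code", are deliberately not matched).

```powershell
# Hypothetical helper: converts the cmdlet's error lines into objects.
# Assumes the "Operation ID: ... Error Code: ... Description: ..." format shown above.
function Get-FailedRemoval {
    param([string[]]$Lines)

    $pattern = '^Operation ID: (\d+) Document ID: (.+) Error Code: (\S+) Description: (.*)$'
    foreach ($line in $Lines) {
        if ($line -match $pattern) {
            New-Object PSObject -Property @{
                OperationId = [long]$Matches[1]
                DocumentId  = $Matches[2]
                ErrorCode   = $Matches[3]
                Description = $Matches[4]
            }
        }
    }
}

# Sample lines copied from the output above.
$sample = @(
    "Removing item 'FAKEID'. Operation ID: 3",
    "Total Errors: 1",
    "Operation ID: 3 Document ID: FAKEID Error Code: 3 Description: Document e14c677abdbd9678dbfe1e5580de9aef_WebCollection does not exist"
)
Get-FailedRemoval $sample
```

Against a real run, the same function could be fed the saved file with Get-FailedRemoval (Get-Content "c:\temp\results.txt").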

The nice thing about wrapping this up in a cmdlet is that I can reuse it in my PowerShell scripts to easily remove any unwanted content from my collections.

So here is the shameless plug - If you want a copy of the Visual Studio solution, use the Tweet link below to tweet this post. Then send me an email (btubalinal@pointbridge.com) and I’ll send you a copy of the solution.

 

About the author

Bart X. Tubalinal is a Solutions Architect with over 10 years of experience in building enterprise applications. He also considers himself to be, pound for pound, one of the best developers there is.
