Tuesday, May 25, 2010

The Basics - Google Custom Search API - Google Code


Google Custom Search API

The Basics

The Getting Started page showed you how you can use the control panel to create a custom search engine and gave you an idea of what you need to specify to create it. This page introduces you to the basic concepts behind Custom Search and gets you ready to create a more powerful search engine with customized search results.


Overview

If the control panel does not give you the level of customization that you need, consider using the Custom Search XML or TSV format, which gives you more control, flexibility, and access to more powerful features.

If you want to use the Custom Search API, start with the specification files generated by the control panel. Do not create them from scratch.

XML Basics

Extensible Markup Language, or XML, is a general-purpose markup language. It is human-readable text marked up with tags. For example, the Custom Search XML format includes tags such as <CustomSearchEngine> and <Annotation>.

As with any XML file, your Custom Search specifications must follow XML syntax (content) and be well-formed. XML has the following rules:

  • XML documents normally begin with an XML declaration (<?xml version="1.0" encoding="UTF-8"?>) before the top-level tags, but the Custom Search API doesn't require it. You can preface your Custom Search XML with a declaration, but you don't have to.
  • All your elements must have an opening tag (<tag>) and a closing tag (</tag>).
  • All your tags must be properly nested. You cannot have XML code that looks like: <a><b>peanut butter</a></b>. Instead, it should be like: <a><b>peanut butter</b></a>.
  • XML is case-sensitive, so carefully follow the capitalization and spelling of the tags in the instructions.
  • All attribute values must be enclosed in double quotation marks (attribute="value").
  • All attributes must be defined in the opening tag (<tag attribute="value">), not the closing tag (</tag>).
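
One quick way to catch violations of these rules before uploading is to run the file through any XML parser. The following is a minimal sketch using Python's standard library. The function name is_well_formed and the sample tags are made up for illustration, and passing this check only shows that the text is well-formed XML, not that Custom Search will accept its contents.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags with a double-quoted attribute value.
good = '<bread type="rye"><spread>peanut butter</spread></bread>'
# Closing tags in the wrong order, so the nesting is broken.
bad = '<bread type="rye"><spread>peanut butter</bread></spread>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

The same check also rejects tags whose capitalization does not match and attribute values that are not quoted, because a conforming parser treats both as errors.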

You can write notes for yourself using comment tags (<!-- like this -->), and Custom Search will not parse the commented text as XML code. Apart from writing reminders or descriptions, you can use comments to temporarily put some XML code out of commission (perhaps because you want to experiment with certain effects or troubleshoot issues). However, these comments are not preserved in the files that you download from the control panel. If you want to keep the comments, keep a copy of your commented XML files even after you upload them to the control panel.
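
To see the effect of commenting out a piece of XML, you can parse a fragment locally. This is only a sketch with Python's standard XML parser standing in for Custom Search, and the site patterns are placeholders. The commented-out entry is skipped, so only the active one is found.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations fragment: one entry is active, one is commented out.
xml_text = """
<Annotations>
  <!-- Temporarily disabled while troubleshooting:
  <Annotation about="www.example.org/*"/>
  -->
  <Annotation about="www.example.com/*"/>
</Annotations>
"""

root = ET.fromstring(xml_text)
# Only elements outside the comment are visible to the parser.
active = [annotation.get("about") for annotation in root.findall("Annotation")]
print(active)
```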

You can use a simple text editor to create and edit XML files. Just save the text file with the file extension .xml (for example, cse_badminton.xml).

TSV Basics

The Custom Search XML format is not hard to follow, but if you feel uncomfortable using it, you can use the Custom Search TSV (tab-separated values) format. As the name implies, a TSV file is a plain-text file that includes lines of fields (strings of characters) that are separated from each other by single tab stops. You can use a simple text editor or a spreadsheet editor to create and edit TSV files. Just save the text file with the file extension .tsv (for example, cse_bicycles.tsv).
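
If you would rather generate a TSV file than type it, the sketch below writes and re-reads tab-separated lines with Python's csv module. The two columns (a site pattern and a label) and their values are assumptions chosen for illustration, not the documented Custom Search TSV layout.

```python
import csv
import io

# Hypothetical rows: a site pattern and a label per line. The column choice is
# an assumption for illustration, not the documented Custom Search TSV layout.
rows = [
    ["www.example.com/*", "_cse_exampleid"],
    ["www.example.org/bikes/*", "_cse_exampleid"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Reading it back splits each line on the single tab between fields.
parsed = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(parsed == rows)
```

To produce a real file, write tsv_text to a path ending in .tsv instead of keeping it in memory.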


What's in a Custom Search Engine

A custom search engine has two main components housed in XML files:

  • Context - Describes the search engine. It specifies the global settings of the search engine. In the control panel, these settings are defined in the Basics, Refinements, Look and feel, Collaboration, and Make money tabs. Each search engine has its own context file. To learn more about context files, see Defining Your Search Engine Specifications. For more information about selecting the most appropriate file format for your search engine, see Choosing the Right Format for Your Search Engine.
  • Annotations - Lists the sites (webpages or websites) you want your search engine to cover. It is similar to the index of your search engine and contains information about which sites should be covered and how they should be ranked in the results. In the control panel, the sites are defined in the Sites tab. Each site and its associated information is called an annotation. To learn more about annotations, see Selecting Sites to Search.

    You can create an annotations file for each context file or you can have one communal annotations file shared by all your search engines. In either case, when you download the annotations file from the control panel, you will get a single annotations file that combines all the annotations from different search engines.

In addition to these main components, a search engine can also have the following auxiliary files:

  • Promotions - Lists a series of custom results that are triggered by a pre-defined set of query terms. When a user types a search that exactly matches one of your query terms, the promotion appears at the top of the page. You can use promotions to directly answer the queries of your users, lead them to important information, or point them to webpages that are not at the top of the results page yet are especially relevant. In the control panel, promotions are defined in the Promotions tab. To learn more about promotion files, see Creating Special Results.
  • Synonyms - Expands the queries of your users to include variants of the search term. For example, if your user searches for "simian," the search engine also searches for "monkey" and "ape." In the control panel, synonyms are defined in the Synonyms tab. To learn more about synonym files, see Improving User Queries for More Relevant Results.

How the Components Work Together

The context file is the specification for the search engine. It defines the infrastructure of your search engine, such as how the results page should look, what the labels are, how pages should be ranked, and so on. The annotations file, on the other hand, governs the coverage of your search engine. It defines what sites should be searched.

The context file does not make any reference to which annotations file to use, and the annotations file makes no reference to the context file either. So how does Custom Search associate the context with the annotations? It does so through labels. The context file includes labels that identify the search engine, and you tag each annotation in the annotations file with search engine labels. If you change the name of the label in the context file, you have to change all the annotations that have been tagged with that label.

Although you can upload multiple annotations files, when you download them through the control panel, Custom Search merges all your annotations files into a single annotations file. Having a single annotations file for multiple search engines (with their own separate context files) simplifies your work and eliminates replication. It enables you to list sites only once, yet have the flexibility to customize the same site for various search engines. For example, one search engine could restrict its search to some sites, another could eliminate those sites, and yet another could promote those sites.

In the context file, the search engine label that is generated by the control panel looks something like:
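
As a rough sketch of the shape such a label takes, the fragment below is held in a Python string and its labels are read back with the standard library. The label name _cse_exampleid is a made-up placeholder, and the surrounding Context and BackgroundLabels elements and the FILTER mode are assumptions about what a typical downloaded context file contains, so compare them against your own download.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the label section of a context file; the label name is
# a placeholder, not one generated by the control panel.
context_xml = """
<CustomSearchEngine>
  <Context>
    <BackgroundLabels>
      <Label name="_cse_exampleid" mode="FILTER"/>
    </BackgroundLabels>
  </Context>
</CustomSearchEngine>
"""

root = ET.fromstring(context_xml)
# Collect the name of every label declared in the context.
engine_labels = [label.get("name") for label in root.iter("Label")]
print(engine_labels)
```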



When this label is associated with a site in the annotations file, it looks something like:
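
The sketch below shows an assumed annotations fragment and a small helper that lists the sites tagged with a given search engine label, which is the association described above. The label names, the site patterns, and the helper sites_for_label are all hypothetical. The Annotation element with an about attribute and a nested Label is an assumption about the downloaded file's shape.

```python
import xml.etree.ElementTree as ET

# Assumed shape of an annotations file shared by two search engines.
annotations_xml = """
<Annotations>
  <Annotation about="www.example.com/*">
    <Label name="_cse_exampleid"/>
  </Annotation>
  <Annotation about="www.example.org/*">
    <Label name="_cse_otherid"/>
  </Annotation>
</Annotations>
"""


def sites_for_label(xml_text, label_name):
    """Return the 'about' patterns of annotations tagged with label_name."""
    root = ET.fromstring(xml_text)
    return [
        annotation.get("about")
        for annotation in root.iter("Annotation")
        if any(label.get("name") == label_name for label in annotation.iter("Label"))
    ]


print(sites_for_label(annotations_xml, "_cse_exampleid"))
```

Because the link is just a matching name, renaming the label in the context file without updating the annotations would make this lookup return an empty list.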




Labels and how they should be used in annotations are discussed in greater detail in the subsequent pages.


Creating Advanced Custom Search Engines

Creating advanced custom search engines involves the following steps:

  1. Determine the format that is appropriate for your needs.
  2. Define the specifications for your search engine.
  3. Tell Custom Search which sites to search.
  4. Tell Custom Search how to rank the search results.
  5. Create your profile.
  6. Release your custom search engine.

Editing the Custom Search Files

To work on an XML file, download the XML specification from the control panel. Don't start a file from scratch. Do the following:

  1. Download the context file or annotations file from the Advanced tab of the control panel. Click one of the Download buttons under the appropriate sections.

    Warning: Do not confuse context files with annotations files.

    You can choose to download the files to your hard drive or view them in another browser window or tab window.

  2. Use your browser to save the XML file, or copy the XML text from the webpage and paste it into your favorite text editor. Use a text editor that can handle UNIX-style line endings (WordPad, Emacs, and TextMate work; Notepad doesn't). It does not matter what you name the file, so long as you save it with the file extension .xml (for example, cx_global.xml).
  3. Make a backup copy of the downloaded file in case your edited version does not work as expected, and you have to revert to the previous version.

    If you do not make a copy and the version that you edited does not work properly, you will need to debug your file or recreate your search engine all over again. Not fun.

    Note: Do not skip this step. It takes only a little effort, yet it could save you from some tears.

  4. Edit the XML file and save it. Make sure that your text editor is saving the file as a Unicode text document and not some other file format.
  5. Upload the file under the correct section in the Advanced tab.
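
If you prefer to script steps 3 and 4, the following sketch makes the backup copy and then saves the edited file with UNIX-style line endings. The helper edit_with_backup, the .bak suffix, and the sample title are hypothetical, and UTF-8 is assumed as the Unicode encoding. Downloading and uploading still happen in the control panel.

```python
import shutil
import tempfile
from pathlib import Path


def edit_with_backup(path, old, new):
    """Back up path, then replace old with new and save as UTF-8 with LF endings."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # step 3: keep the untouched download
    text = path.read_text(encoding="utf-8")
    # newline="\n" keeps UNIX-style line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text.replace(old, new))
    return backup


# Demonstration on a throwaway file instead of a real download.
with tempfile.TemporaryDirectory() as workdir:
    context_file = Path(workdir) / "cx_global.xml"
    context_file.write_text("<Title>Old title</Title>\n", encoding="utf-8")
    backup_file = edit_with_backup(context_file, "Old title", "New title")
    print(context_file.read_text(encoding="utf-8"))
    print(backup_file.read_text(encoding="utf-8"))
```

If the edited version misbehaves, copying the .bak file back over the original restores the downloaded state.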

Choosing the Right Format

Before you start creating your custom search engine, determine which format best suits your needs. You don't want to select a format that is more powerful and complex than what you need, nor do you want to use one that you will quickly outgrow.

Use the following table to pick the appropriate format.

If you want to create: One or a few search engines with a small number of sites
Use this format: Control panel
Because: You can quickly create your custom search engine by filling out text boxes instead of creating files with a text editor and uploading the files.
But be aware of the limitations: The control panel is mostly useful for familiarizing yourself with Custom Search and creating search engines with few sites. If you want to really customize your search engine or add a great number of sites, you might find the following quite limiting:

  • You cannot access all the features available on Custom Search.
  • You have to add sites one at a time in the control panel. Adding a large number of sites and managing them can become tedious.
  • You do not have full control over the look and feel of your search engine or over the ranking in the search results.

For more information, see: Getting Started

If you want to create: Complex search engines that use lots of sites, use feeds, or are programmatically created
Use this format: Context file and annotations files
Because: The Custom Search files give you a greater level of control over your search engines, and make the tasks of defining and managing sites a lot easier. Even if you plan to create your search engine using context and annotations files, it's still a good idea to familiarize yourself with the control panel. Tour the different tabs and play with different settings to get a better feel for how Custom Search works. The Preview tab lets you instantly view the results of your experimentation.
But be aware of the limitations: The more you customize your search engine, the more complex it becomes. You have to learn the Custom Search elements and attributes, which are not hard to pick up, but they do require you to invest some time. You will also have to read the rest of the developer guide, which is not the most exciting reading material, unfortunately.
For more information, see: Defining Your Search Engine Specifications and Selecting Sites to Search

If you want to create: Search engines that show no ads and give you the greatest level of control
Use this format: Site Search (business edition)
Because: In addition to the features available in the control panel and the Custom Search files, you have an even greater level of control over the ranking and look-and-feel of the search results page. The pages do not display ads, and you can configure them to not display Google branding. If you know how to program, you can create presentation templates that transform the XML feeds of your search results into a format that suits your needs better. If your webpages also include structured data, you can even control the presentation and content of your result snippets. Site Search also comes with technical support. To upgrade your search engine, sign in to your Custom Search control panel, select the Business account tab, and click the Convert to Google Site Search button.
But be aware of the limitations: It is not free. This edition starts at $100. To take full advantage of the niftiest features, you have to know how to program and learn the WebSearch Protocol to use the XML feeds of your search results. To create rich snippets with custom content, you must be familiar with one of the structured data formats and invest time in embedding structured data in your webpages. Of course, you must also create your own presentation templates to transform the XML feeds.
For more information, see: Google Site Search (Introduction) and Google WebSearch Protocol Reference


Taking the Next Step

Now that you've picked up the basic concepts behind the core components in a custom search engine, you can start creating a context file that specifies your search engine.
