[MidoNet-dev] Feature Proposal: Resource Tagging

Pino de Candia gdecandia at midokura.com
Wed Mar 13 13:41:30 UTC 2013


Hi Ryu, 

thanks for the great write-up!

Something was bothering me about this, but I had to sleep on it to figure it out: before we commit to the scan-ZK/index-in-memory approach, I'd like to compare it with storing the relations in a data store, because:
- I'd like to keep them outside of ZK (minor point, though).
- I'd like to be able to examine them without the API server, and to be able to run more than one API server.

Can Cassandra serve this purpose? Doesn't it have most of the features you're looking for? The only comparison points I can come up with are:
- inconsistency window with Cassandra vs. no server redundancy/scalability with Lucene.
- adding relationships after the fact without triggering ZK watchers.
- ease of implementation: not sure which one wins.
- API server startup speed: I think the Cassandra approach wins.
- query speed: the Lucene approach definitely wins.

Separately, and this applies whichever approach we take: assuming we have to migrate Netflix to this new model, how will that work?

Finally, overall I think this is a great idea - we really, really have to have some search capabilities in the API.

thanks,
Pino



On Friday, March 8, 2013 at 10:12 AM, Ishimoto, Ryu wrote:

> Hi Devs,
> 
> I have started writing down my proposal for resource tagging (phase 1) in wiki:
> https://sites.google.com/a/midokura.jp/wiki/midonet/resource-tagging
> 
> There are still some TODOs in the document because I need more information from the Lucene developers we are planning to work with on this project, but I thought I had enough to get started. 
> 
> Feedback appreciated!
> 
> Best,
> Ryu
> 
> --------------------------------------------------------------------------------- 
> 
> Resource Tagging - Phase 1 
> This document proposes integrating Lucene (http://lucene.apache.org/core/), an open-source indexing library, to enhance the search capabilities of MidoNet API.  It also describes the improvements to the current Zookeeper directory structure that Lucene integration makes possible.
> 
> Introduction
> 
> MidoNet resources are stored in Zookeeper.  For the resources that clients (integration projects such as OpenStack or CloudStack, and the MidoNet Control Panel) need to search, the MidoNet code creates index directories in Zookeeper when a resource is created and cleans them up when it is deleted.  This has proven highly inefficient, because making a resource searchable requires a non-trivial amount of development work.  Furthermore, numerous Zookeeper directories exist only for indexing; Midolman does not need them, and they make the data difficult to inspect. 
> 
> Assumptions
> 
> Only one instance of the MidoNet API server runs at any time.  The reasons why multiple instances are difficult to support with Lucene integration are explained in the RAM Index section. 
> 
> Some parts of the project are currently planned to be developed by external Lucene developers.  As a result, some parts of this document are incomplete pending further consultation with them.  They will be filled in later. 
> 
> Search capabilities
> 
> Searchable items
> 
> The end goal is to support search by every field of every resource.  This section, however, covers only the fields exposed by the API. 
> 
> Router, Bridge, Port, Port Group, Chain, Rule, Route, BGP, AdRoute, DHCP Subnet, DHCP Host, Host, and Tunnel Zone resource types are searchable.  They are searchable by their id, tags, properties, and the unique identifier of their parent resource.  The search scope is per resource type (i.e. a search by tag applies to one particular resource type). 
> 
> id is the unique identifier of each resource.  Most ids are of type UUID.  DHCP Host, however, does not have an id field; it requires a combination of the bridge ID and the DHCP subnet (IP prefix/len) to uniquely identify the resource.  
> 
> Tags are a list of arbitrary strings associated with each resource object.  External clients can set them via MidoNet API.  Up to a configurable maximum number of tags can be added to each resource.  Tags are not first-class citizens in either Midolman or MidoNet API.  They are meant to be data managed outside of MidoNet, by clients such as OpenStack and the MidoNet Control Panel. 
> 
> Properties are key-value pairs of strings associated with each resource object.  They are used internally and cannot be accessed directly by external clients.  The data stored in properties is not relevant to Midolman, but it is relevant to MidoNet API.  An example of a property is: 
> tenant_id: ID of the resource owner.  MidoNet API uses this ID to perform authentication and authorization. 
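The three kinds of searchable data described above (id, client-managed tags, internal properties) plus parent references can be pictured as one record per resource. The following is a minimal Python sketch under stated assumptions: `ResourceDoc`, `MAX_TAGS`, the no-op on duplicate tags, and the `bridge/prefix/len` key format for DHCP Host are all hypothetical illustrations, not MidoNet code or the planned Lucene schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical default for the "configurable maximum number of tags".
MAX_TAGS = 8


@dataclass
class ResourceDoc:
    """Illustrative searchable view of one resource (not MidoNet code)."""

    resource_type: str  # e.g. "router", "bridge", "dhcp_host"
    key: str            # the id, or a composite key for DHCP Host
    tags: List[str] = field(default_factory=list)             # client-managed
    properties: Dict[str, str] = field(default_factory=dict)  # internal only
    parent_ids: List[str] = field(default_factory=list)

    def add_tag(self, tag: str, max_tags: int = MAX_TAGS) -> None:
        """Add a client tag, enforcing the per-resource limit."""
        if tag in self.tags:
            return  # assumption: re-adding an existing tag is a no-op
        if len(self.tags) >= max_tags:
            raise ValueError(f"a resource may carry at most {max_tags} tags")
        self.tags.append(tag)


def dhcp_host_key(bridge_id: str, subnet_prefix: str, subnet_len: int) -> str:
    """DHCP Host has no id field; derive a key from bridge ID and subnet.

    The "bridge/prefix/len" format is a hypothetical choice.
    """
    return f"{bridge_id}/{subnet_prefix}/{subnet_len}"
```

Keeping `properties` separate from `tags` mirrors the access rule in the text: the API would expose and mutate only `tags`, while `tenant_id` stays internal.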
> 
> Parent resource ID is used to search for a list of sub-resources belonging to the parent resource.  For example, you could search for the list of router ports given the ID of the router.   The following sub-resource searches are supported:
> router -> ports
> bridge -> ports
> chain -> rules
> router -> routes
> bgp -> ad_routes
> bridge -> dhcp_subnet
> dhcp_subnet -> dhcp_host
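The parent-to-child searches listed above reduce to a lookup from a parent ID to the IDs of its children of a given type. The following minimal Python sketch assumes a hypothetical in-memory `ParentIndex` class; it is not MidoNet code, and it normalizes the list's type names to singular (`ad_route`, `dhcp_subnet`, `dhcp_host`).

```python
from collections import defaultdict

# Supported (parent type, child type) pairs, taken from the list above.
SUB_RESOURCES = {
    ("router", "port"), ("bridge", "port"), ("chain", "rule"),
    ("router", "route"), ("bgp", "ad_route"),
    ("bridge", "dhcp_subnet"), ("dhcp_subnet", "dhcp_host"),
}


class ParentIndex:
    """Hypothetical index: (child type, parent id) -> child ids."""

    def __init__(self):
        self._children = defaultdict(set)

    def add(self, parent_type, parent_id, child_type, child_id):
        if (parent_type, child_type) not in SUB_RESOURCES:
            raise ValueError(f"unsupported search: {parent_type} -> {child_type}")
        self._children[(child_type, parent_id)].add(child_id)

    def remove(self, child_type, parent_id, child_id):
        ids = self._children.get((child_type, parent_id))
        if ids is not None:
            ids.discard(child_id)

    def children(self, child_type, parent_id):
        """e.g. children("port", router_id) -> sorted list of port ids."""
        return sorted(self._children.get((child_type, parent_id), ()))
```

Keying on `(child_type, parent_id)` relies on parent IDs being unique UUIDs, so router ports and bridge ports can share the `"port"` child type without colliding.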
> 
> TODO: add MAC table and ARP cache when they are done.
> 
> A parent-child relationship can also be many-to-many, in which case each resource type can be queried given the ID of the other.  For example, tunnel zones can be searched by host ID, and hosts can be searched by tunnel zone ID.  The following list shows the resources with such a relationship: 
> tunnel_zone <-> host
> port_group <-> port
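A many-to-many relationship needs to be searchable from both sides, which suggests maintaining two maps that are always updated together. The following is a minimal Python sketch; `ManyToManyIndex` and its method names are hypothetical, not part of MidoNet or Lucene.

```python
from collections import defaultdict


class ManyToManyIndex:
    """Bidirectional index, e.g. tunnel_zone <-> host or port_group <-> port."""

    def __init__(self, left_type, right_type):
        self.left_type, self.right_type = left_type, right_type
        self._by_left = defaultdict(set)   # left id -> right ids
        self._by_right = defaultdict(set)  # right id -> left ids

    def link(self, left_id, right_id):
        # Update both directions together so neither side goes stale.
        self._by_left[left_id].add(right_id)
        self._by_right[right_id].add(left_id)

    def unlink(self, left_id, right_id):
        self._by_left[left_id].discard(right_id)
        self._by_right[right_id].discard(left_id)

    def rights_of(self, left_id):
        """e.g. hosts in a tunnel zone."""
        return sorted(self._by_left.get(left_id, ()))

    def lefts_of(self, right_id):
        """e.g. tunnel zones containing a host."""
        return sorted(self._by_right.get(right_id, ()))
```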
> 
> 
> 
> Pagination
> 
> Pagination is built into Lucene.  
> 
> TODO:  Explain the actual pagination mechanism implemented by Lucene and the corresponding fields introduced in MidoNet API.
> 
> Sorting
> 
> Sorting is built into Lucene.   
> 
> TODO:  Explain the actual sorting mechanism implemented by Lucene and the corresponding fields introduced in MidoNet API.
> 
> 
> Indexing
> 
> RAM Index
> 
> Lucene allows indices to be stored in memory, and this is the mode used by MidoNet API.  The indices are stored in the memory of the host on which the API server runs.  This means, however, that if multiple API instances are running, the indices must be replicated across all of them.  A typical deployment could have multiple API servers running behind a load balancer; if their indices are not in sync, clients would observe incorrect behavior.  This problem will be the main focus of Phase 2. 
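To see why per-process indices diverge, consider a toy inverted index. This Python sketch is a plain stand-in for an in-memory Lucene index, not Lucene's API; `RamIndex`, its `(type, field, value)` posting keys, and the AND-only search are assumptions for illustration.

```python
from collections import defaultdict


class RamIndex:
    """Toy per-process inverted index standing in for an in-memory Lucene index."""

    def __init__(self):
        # (resource_type, field, value) -> set of resource ids
        self._postings = defaultdict(set)

    def index(self, resource_type, rid, fields):
        """fields: iterable of (field, value), e.g. ("tag", "os-router")."""
        for name, value in fields:
            self._postings[(resource_type, name, value)].add(rid)

    def unindex(self, resource_type, rid, fields):
        for name, value in fields:
            self._postings[(resource_type, name, value)].discard(rid)

    def search(self, resource_type, **criteria):
        """AND of all criteria, scoped to one resource type.

        A list value means "all of these values" (e.g. several tags).
        """
        result = None
        for name, values in criteria.items():
            if not isinstance(values, list):
                values = [values]
            for value in values:
                hits = self._postings.get((resource_type, name, value), set())
                result = set(hits) if result is None else result & hits
        return sorted(result or ())
```

Two `RamIndex` objects model two API servers: a router indexed through one is invisible to a search on the other, which is exactly the out-of-sync behavior a load balancer would expose to clients.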
> 
> TODO: Explain the details of how Lucene stores indices in RAM.
> 
> MidoNet API Server Start
> 
> The Zookeeper resource data is scanned and indexed when the MidoNet API server starts.  A failure in the indexing process causes the server to shut down.  In a single-API-server deployment, no new indices should be introduced while this startup indexing is in progress. 
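The startup behavior (one full scan, fail fast on any bad record) can be sketched as follows. This is a minimal Python illustration only: the `(resource_type, id, data)` tuples are a hypothetical stand-in for walking the real Zookeeper tree, and `IndexingError`, `build_startup_index`, and `start_api_server` are invented names.

```python
from collections import defaultdict


class IndexingError(Exception):
    """Raised when a Zookeeper record cannot be indexed."""


def build_startup_index(zk_records):
    """Scan every record once and build (type, field, value) -> ids postings.

    zk_records is a hypothetical iterable of (resource_type, id, data) tuples,
    standing in for a real traversal of the Zookeeper directories.
    """
    postings = defaultdict(set)
    for rtype, rid, data in zk_records:
        if not isinstance(data, dict):
            raise IndexingError(f"{rtype}/{rid}: unreadable record")
        for tag in data.get("tags", []):
            postings[(rtype, "tag", tag)].add(rid)
        for name, value in data.get("properties", {}).items():
            postings[(rtype, name, value)].add(rid)
    return postings


def start_api_server(zk_records):
    try:
        return build_startup_index(zk_records)
    except IndexingError as exc:
        # Per the proposal, an indexing failure shuts the server down
        # rather than serving from a partial index.
        raise SystemExit(f"indexing failed, shutting down: {exc}")
```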
> 
> TODO: Explain the details of how Lucene indexes Zookeeper data.
> 
> New Resource Types
> 
> When new resource types are added to the system, the indexer must pick them up quickly and with little extra effort.  
> 
> TODO: Explain the actual implementation to achieve this requirement.
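One common way to meet such a requirement is a registry that model classes join with a decorator, so the indexer enumerates fields generically instead of per type. The Python sketch below shows that pattern only as a possible approach; `INDEXED_TYPES`, `indexed`, and `index_fields` are hypothetical and are not the implementation this TODO refers to.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

# Hypothetical registry: resource type name -> model class.
INDEXED_TYPES: Dict[str, type] = {}


def indexed(resource_type: str):
    """Class decorator that registers a dataclass model for indexing."""
    def register(cls):
        INDEXED_TYPES[resource_type] = cls
        return cls
    return register


def index_fields(resource_type: str, obj) -> List[Tuple[str, str]]:
    """Turn any registered model into (field, value) pairs generically."""
    cls = INDEXED_TYPES[resource_type]
    if not isinstance(obj, cls):
        raise TypeError(f"expected {cls.__name__}, got {type(obj).__name__}")
    return sorted((name, str(value)) for name, value in asdict(obj).items())


# Making a new type searchable only requires decorating its model.
@indexed("router")
@dataclass
class Router:
    id: str
    name: str = ""
```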
> 
> API Changes
> 
> General Assumptions
> 
> 
> When searching by unique resource ID or by parent resource ID, the API remains unchanged from the current version.  For all other types of search, query strings are used to filter resources.   
> 
> 
> Tenant Property Search URI
> 
> To search by the tenant_id property, use the URI templates given in the Application resource response:   
> 
> Method: GET  
> Accept: vnd.org.midonet.Application.v1+json
> URI: http://api.example.com/
> 
> => {"tenant_routers_template": "http://api.example.com/routers?tenant_id={tenant_id}", 
>     "tenant_bridges_template": "http://api.example.com/bridges?tenant_id={tenant_id}",
>     "tenant_chains_template": "http://api.example.com/chains?tenant_id={tenant_id}",
>     "tenant_port_groups_template": 
>             "http://api.example.com/port_groups?tenant_id={tenant_id}",
>     ...}
> 
> 
> 
> 
> The actual URI can be constructed by replacing '{tenant_id}' with the actual tenant ID value: 
> 
> Method: GET  
> Accept: vnd.org.midonet.Router.collection.v1+json
> URI: http://api.example.com/routers?tenant_id=foo
> 
> => [{"id": "router1", "tenant_id": "foo", ...}, 
>     {"id": "router3", "tenant_id": "foo", ...}]
> 
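A client fills a template by substituting each `{name}` token. The Python sketch below is a minimal example using only the standard library; `expand_template` is a hypothetical helper, and it does simple token replacement rather than full RFC 6570 URI Template expansion. Percent-encoding the value is an assumption that keeps characters such as `&` from corrupting the query string.

```python
from urllib.parse import quote


def expand_template(template: str, **values: str) -> str:
    """Fill '{name}' tokens in an API URI template with encoded values."""
    uri = template
    for name, value in values.items():
        # safe="" also encodes '/', '&', '=' and '|' inside the value.
        uri = uri.replace("{" + name + "}", quote(value, safe=""))
    if "{" in uri or "}" in uri:
        raise ValueError(f"unfilled token left in {uri!r}")
    return uri
```

For example, `expand_template("http://api.example.com/routers?tenant_id={tenant_id}", tenant_id="foo")` yields the router collection URI shown above.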
> 
> 
> Tag URIs
> 
> The location of a resource's tags can be discovered from the tags URI field in the response of every resource object: 
> 
> Method: GET  
> Accept: vnd.org.midonet.Router.v1+json
> URI: http://api.example.com/routers/1
> 
> => {"id": "router1", "tags": "http://api.example.com/routers/1/tags", 
>     "tag_template": "http://api.example.com/routers/1/tags/{tag}"}
> 
> 
> 
> tag_template contains a '{tag}' token, which should be replaced with the actual tag to construct the URI for the DELETE operation.
> 
> Tag Media Type
> 
> New media types, vnd.org.midonet.Tag.v1+json and vnd.org.midonet.Tag.collection.v1+json, represent a tag object and a collection of tag objects, respectively.  The Tag media type has the following fields: 
> 
> Name   Type        Description 
> tag    String      Tag of the resource
> uri    URI         URI representing the location of this tag
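The two-field Tag media type maps naturally onto a small value type. This minimal Python sketch uses hypothetical names (`Tag`, `for_resource`, `parse_tag_collection`); percent-encoding the tag inside its URI is an assumption, since the examples only use plain tags like 'foo'.

```python
import json
from dataclasses import dataclass
from urllib.parse import quote

TAG_MEDIA_TYPE = "vnd.org.midonet.Tag.v1+json"
TAG_COLLECTION_MEDIA_TYPE = "vnd.org.midonet.Tag.collection.v1+json"


@dataclass(frozen=True)
class Tag:
    tag: str  # tag of the resource
    uri: str  # URI representing the location of this tag

    @classmethod
    def for_resource(cls, resource_uri: str, tag: str) -> "Tag":
        # <resource>/tags/<tag>, as in the examples; encoding is assumed.
        return cls(tag, f"{resource_uri}/tags/{quote(tag, safe='')}")

    def to_json(self) -> str:
        return json.dumps({"tag": self.tag, "uri": self.uri})


def parse_tag_collection(body: str) -> list:
    """Parse a Tag.collection response body into Tag objects."""
    return [Tag(item["tag"], item["uri"]) for item in json.loads(body)]
```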
> 
> 
> Tag Search queries
> 
> Doing a GET on the tags URI returns all the tags associated with the resource:
> 
> Method: GET 
> Accept: vnd.org.midonet.Tag.collection.v1+json
> URI: http://api.example.com/routers/1/tags
> 
> => [{"tag": "foo", "uri": "http://api.example.com/routers/1/tags/foo"},  
>     {"tag": "bar", "uri": "http://api.example.com/routers/1/tags/bar"}]
> 
> 
> 
> Doing a POST on the tags URI adds a new tag:
> 
> Content-type: vnd.org.midonet.Tag.v1+json
> Method: POST
> URI: http://api.example.com/routers/1/tags
> 
> Body:
> {"tag": "foo"}
> 
> * This adds a new tag, 'foo', to this router.
> 
> Doing a DELETE on the URI of an individual tag deletes the tag:
> 
> Method: DELETE 
> URI: http://api.example.com/routers/1/tags/foo
> 
> * This deletes a tag, 'foo', from this router.  It is an idempotent operation. 
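The three tag operations can be expressed as standard-library `urllib.request.Request` objects; building them separately from sending keeps the sketch testable. The helper names are hypothetical, and the media-type strings are copied from the proposal as written.

```python
import json
from urllib.parse import quote
from urllib.request import Request

TAG_MEDIA_TYPE = "vnd.org.midonet.Tag.v1+json"
TAG_COLLECTION_MEDIA_TYPE = "vnd.org.midonet.Tag.collection.v1+json"


def list_tags_request(tags_uri: str) -> Request:
    """GET on the tags URI: all tags of the resource."""
    return Request(tags_uri, headers={"Accept": TAG_COLLECTION_MEDIA_TYPE},
                   method="GET")


def add_tag_request(tags_uri: str, tag: str) -> Request:
    """POST a Tag object to the tags URI."""
    body = json.dumps({"tag": tag}).encode("utf-8")
    return Request(tags_uri, data=body,
                   headers={"Content-Type": TAG_MEDIA_TYPE}, method="POST")


def delete_tag_request(tag_template: str, tag: str) -> Request:
    """DELETE the URI built from tag_template; safe to retry (idempotent)."""
    uri = tag_template.replace("{tag}", quote(tag, safe=""))
    return Request(uri, method="DELETE")
```

Each request would be sent with `urllib.request.urlopen(req)`. Because DELETE is idempotent, a client can simply retry it after a network error without first checking whether the tag still exists.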
> 
> 
> Resource search by tags
> 
> To search resources by tags:
> 
> Method: GET 
> Accept: vnd.org.midonet.Router.collection.v1+json
> URI: http://api.example.com/routers?tag=os-router&tag=os-router-id|foo
> 
> => [{"id": "router2", ...}, {"id": "router3", ...}, ...]  
> This query would return a list of routers that have tags 'os-router' and 'os-router-id|foo'.
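The tag search uses repeated `tag` parameters with AND semantics: a resource matches only if it carries every requested tag. This Python sketch shows the client-side query construction and the matching rule; `tag_search_uri` and `filter_by_tags` are hypothetical helpers, and `urlencode` percent-encodes the `|` in `os-router-id|foo` as `%7C`, which the example URI above leaves unencoded.

```python
from urllib.parse import urlencode


def tag_search_uri(base: str, collection: str, tags: list) -> str:
    """Build e.g. <base>/routers?tag=a&tag=b with one tag= per tag."""
    return f"{base}/{collection}?{urlencode([('tag', t) for t in tags])}"


def matches_all(resource_tags, wanted) -> bool:
    """AND semantics: every wanted tag must be present."""
    return set(wanted) <= set(resource_tags)


def filter_by_tags(resources: dict, wanted: list) -> list:
    """resources: hypothetical {resource id: [tags]} map."""
    return sorted(rid for rid, tags in resources.items()
                  if matches_all(tags, wanted))
```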
> 
> TODO: Give examples for sort and pagination.  
> 
> Zookeeper directories
> 
> Removing indexing directories
> 
> There are Zookeeper directories that exist only to provide indexing, and they are no longer necessary once Lucene does the indexing.   The following is a list of directories that can be removed: 
> 
>  /tenants 
>       /bridge-names
>       /port_group-names
>       /chain-names
>       /router-names
> 
> 
> 
> 
> Removing unneeded directories  
> 
> There may be other Zookeeper directories that can be removed after Lucene takes over the API queries.  For example, there are directories that are accessed by ID: 
> 
> /resource/<id>
> 
> These may no longer be necessary because the API will no longer need to look up resources by ID in Zookeeper.  The end goal is for the Zookeeper directories to contain only the minimum amount of data required by Midolman. 
> 
> This type of clean-up will be left for Phase 2. 
> _______________________________________________
> MidoNet-dev mailing list
> MidoNet-dev at lists.midonet.org (mailto:MidoNet-dev at lists.midonet.org)
> http://lists.midonet.org/listinfo/midonet-dev
> 
> 

