Wow, I just wrote a big post and SO choked and hung on it, and when I hit my back button to resubmit, the markup editor was empty. Aaargh.
So here I go again...
Regarding Stack Overflow, it turns out that it uses SQL Server 2005 full-text search.
Regarding the OS projects recommended by @Grant:
- DotNetKicks uses the DB for tagging and Lucene for full-text search. There appears to be no way to combine a full-text search with a tag search.
- Kigg uses LINQ to SQL for both search and tag queries. Both queries join Stories->StoryTags->Tags.
- Both projects use the 3-table approach to tagging that everyone generally seems to recommend.
I also found some other questions on SO that I'd missed before:
What I'm currently doing for each of the items I mentioned:
- In the DB, 3 tables: Entity, Tag, Entity_Tag. I use the DB to:
  - Build site-wide tag clouds
  - Browse by tag (e.g. URLs like SO's /questions/tagged/ASP.NET)
- For search I use Lucene + NHibernate.Search
  - Tags are concatenated into a TagString that is indexed by Lucene
  - So I have the full power of the Lucene query engine (AND / OR / NOT queries)
  - I can search for text and filter by tags at the same time
  - The Lucene analyzer stems words for better tag searches (e.g. a tag search for "test" will also find stuff tagged "testing")
  - Lucene returns a potentially enormous result set, which I paginate to 20 results per page
  - Then NHibernate loads the result Entities by Id, either from the DB or from the Entity cache
    - So it's entirely possible that a search results in 0 hits to the DB
  - Not doing this yet, but I will probably try to find a way to build the tag cloud from the TagString in Lucene, rather than take another DB hit
- Haven't done this yet either, but I will probably store the TagString in the DB so that I can show an Entity's tag list without having to make 2 more joins.
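To make the search flow above concrete, here is a toy, in-memory sketch. It is not Lucene or NHibernate.Search, and it uses neither API; ToyIndex, stem, analyze, and load_entities are hypothetical stand-ins that illustrate the same ideas: the tags are concatenated into a TagString and indexed next to the text, a text query and a tag filter are applied together, a crude suffix-stripping stem (a rough stand-in for a real stemming analyzer) lets "test" match "testing", results are paginated to 20, and entities are loaded cache-first so a search can hit the DB zero times.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: int
    text: str
    tags: list = field(default_factory=list)


def stem(word):
    """Crude suffix stripping; a rough stand-in for a real stemming analyzer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Tokenize (keeping things like ASP.NET or C# whole) and stem."""
    return {stem(w) for w in re.findall(r"[\w#+]+(?:\.[\w#+]+)*", text)}


class ToyIndex:
    """Hypothetical in-memory index: id -> (text terms, tag terms)."""

    def __init__(self):
        self.docs = {}

    def add(self, entity):
        tag_string = " ".join(entity.tags)  # the concatenated TagString
        self.docs[entity.id] = (analyze(entity.text), analyze(tag_string))

    def delete(self, entity_id):
        self.docs.pop(entity_id, None)

    def search(self, text="", tags=(), page=1, page_size=20):
        """AND together all text terms and all tag terms, then paginate the ids."""
        want_text = analyze(text)
        want_tags = analyze(" ".join(tags))
        hits = sorted(
            eid
            for eid, (text_terms, tag_terms) in self.docs.items()
            if want_text <= text_terms and want_tags <= tag_terms
        )
        start = (page - 1) * page_size
        return hits[start : start + page_size]


def load_entities(ids, cache, db_fetch):
    """Serve entities from the cache; only cache misses go to the DB."""
    missing = [i for i in ids if i not in cache]
    if missing:
        for entity in db_fetch(missing):
            cache[entity.id] = entity
    return [cache[i] for i in ids]
```

A real Lucene setup would get OR/NOT queries, scoring, and proper stemming for free; the sketch only shows why a TagString field makes "text plus tags" a single query.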
This means that whenever an Entity's tags are modified, I have to:
- Insert any new Tags that do not already exist
- Insert into / delete from the EntityTag table
- Update Entity.TagString
- Update the Lucene index for the Entity
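The four update steps can be sketched with Python's built-in sqlite3 module. The table names follow the post; the column names, the set_tags and tag_cloud helpers, and the reindex callback are assumptions for illustration. Step 2 is done as "delete all, then insert" for simplicity; diffing the old and new tag sets would touch fewer rows.

```python
import sqlite3

# Table names follow the post; column names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE Entity (id INTEGER PRIMARY KEY, title TEXT, TagString TEXT DEFAULT '');
CREATE TABLE Tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE EntityTag (
    entity_id INTEGER NOT NULL REFERENCES Entity(id),
    tag_id INTEGER NOT NULL REFERENCES Tag(id),
    PRIMARY KEY (entity_id, tag_id)
);
"""


def set_tags(conn, entity_id, new_tags, reindex):
    """Apply the four update steps for one entity; reindex is a caller-supplied hook."""
    new_tags = sorted(set(new_tags))
    with conn:  # steps 1-3 commit (or roll back) together
        # 1. Insert any new Tags that do not already exist.
        conn.executemany(
            "INSERT OR IGNORE INTO Tag(name) VALUES (?)", [(t,) for t in new_tags]
        )
        tag_ids = [
            conn.execute("SELECT id FROM Tag WHERE name = ?", (t,)).fetchone()[0]
            for t in new_tags
        ]
        # 2. Replace the entity's rows in the join table (delete, then insert).
        conn.execute("DELETE FROM EntityTag WHERE entity_id = ?", (entity_id,))
        conn.executemany(
            "INSERT INTO EntityTag(entity_id, tag_id) VALUES (?, ?)",
            [(entity_id, tid) for tid in tag_ids],
        )
        # 3. Denormalize the tag list into Entity.TagString.
        conn.execute(
            "UPDATE Entity SET TagString = ? WHERE id = ?",
            (" ".join(new_tags), entity_id),
        )
    # 4. Update the search index after the DB changes are committed.
    reindex(entity_id)


def tag_cloud(conn):
    """Site-wide tag counts from the join table, as (name, count) pairs."""
    return conn.execute(
        """SELECT t.name, COUNT(*) FROM Tag t
           JOIN EntityTag et ON et.tag_id = t.id
           GROUP BY t.name ORDER BY t.name"""
    ).fetchall()
```

Keeping the re-index outside the transaction means a slow index update never holds DB locks, at the cost of a short window where the DB and the index disagree.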
Given that the ratio of reads to writes is very high in my application, I think I'm OK with this. The only truly time-consuming part is Lucene indexing: Lucene can only add documents to and delete documents from its index (an update is a delete followed by an add), so I have to re-index the entire entity in order to update the TagString. I'm not excited about that, but I think that if I do it in a background thread, it will be fine.
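A background re-indexer along those lines could be sketched with a queue and one worker thread. BackgroundIndexer, DictIndex, and the load_entity callback are hypothetical; DictIndex only mimics the delete-then-add behaviour of a Lucene-style update and is not the Lucene or NHibernate.Search API.

```python
import logging
import queue
import threading


class DictIndex:
    """Hypothetical index that, like Lucene, only supports add and delete."""

    def __init__(self):
        self.docs = {}

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def add(self, doc):
        self.docs[doc["id"]] = doc


class BackgroundIndexer:
    """Re-indexes whole entities on a worker thread, off the request path."""

    def __init__(self, index, load_entity):
        self._index = index
        self._load = load_entity
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, entity_id):
        """Called after the DB update commits; returns immediately."""
        self._queue.put(entity_id)

    def wait(self):
        """Block until every queued re-index has been processed."""
        self._queue.join()

    def _run(self):
        while True:
            entity_id = self._queue.get()
            try:
                # An "update" is a delete of the old document plus an add of the new one.
                self._index.delete(entity_id)
                self._index.add(self._load(entity_id))
            except Exception:
                # Keep the worker alive so later jobs are still processed.
                logging.exception("re-index of entity %s failed", entity_id)
            finally:
                self._queue.task_done()
```

With a single worker, updates to the same entity are applied in the order they were queued, so the index ends up reflecting the latest TagString.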
Time will tell...
What happens
When the user views a form to create, update, or destroy a resource, the Rails app creates a random authenticity_token, stores this token in the session, and places it in a hidden field in the form. When the user submits the form, Rails looks for the authenticity_token, compares it to the one stored in the session, and if they match, the request is allowed to continue.
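The basic flow can be sketched in a few lines of Python. This is not Rails code: render_form, verify, and the plain dict standing in for the session are hypothetical, and the "_csrf_token" session key is an assumption modeled on Rails. The token is created once per session and compared in constant time.

```python
import hmac
import secrets


def render_form(session):
    """Create the token once per session and embed it in a hidden field."""
    token = session.setdefault("_csrf_token", secrets.token_urlsafe(32))
    return f'<input type="hidden" name="authenticity_token" value="{token}">'


def verify(session, params):
    """Allow the request only if the submitted token matches the session's."""
    expected = session.get("_csrf_token")
    submitted = params.get("authenticity_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison so timing does not leak how much of the token matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A request forged from another site cannot read the session, so it arrives without the right authenticity_token and verify returns False.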
Why it happens
Since the authenticity token is stored in the session, the client cannot know its value. This prevents people from submitting forms to a Rails app without viewing the form within that app itself.
Imagine that you are using service A: you have logged in and everything is OK. Now imagine that you go to service B, see a picture you like, and click it to view a larger version. If some malicious code were present on service B, it could send a request to service A (which you are still logged into) asking it to delete your account, for example by sending a request to http://serviceA.com/close_account. This is what is known as CSRF (Cross-Site Request Forgery).
If service A uses authenticity tokens, this attack vector no longer applies, since the request from service B would not contain the correct authenticity token and so would not be allowed to continue.
The API docs describe the details of the meta tag:
CSRF protection is turned on with the protect_from_forgery method, which checks the token and resets the session if it doesn't match what was expected. A call to this method is generated for new Rails applications by default.
The token parameter is named authenticity_token by default. The name and value of this token must be added to every layout that renders forms by including csrf_meta_tags in the HTML head.
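csrf_meta_tags renders two meta tags, csrf-param (the parameter name) and csrf-token (the value), which client-side code such as rails-ujs reads and sends back as an X-CSRF-Token header on Ajax requests. Here is a sketch of that client side using only Python's html.parser; csrf_from_head and ajax_headers are hypothetical helpers:

```python
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Collects the csrf-param and csrf-token meta tags from an HTML head."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") in ("csrf-param", "csrf-token"):
                self.meta[attributes["name"]] = attributes.get("content", "")


def csrf_from_head(html):
    """Return (parameter name, token value), or None for whichever is missing."""
    parser = CsrfMetaParser()
    parser.feed(html)
    return parser.meta.get("csrf-param"), parser.meta.get("csrf-token")


def ajax_headers(html):
    """Headers an Ajax request would send so the server can check the token."""
    _, token = csrf_from_head(html)
    return {"X-CSRF-Token": token} if token else {}
```

A plain form submission would instead send the token as a form field named by csrf-param.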
Notes
Keep in mind that Rails only verifies requests that can change state (POST, PUT/PATCH, and DELETE); GET and HEAD requests are not checked for an authenticity token. Why? Because the HTTP specification defines GET as a safe method: it should not create, alter, or destroy resources on the server. GET is also idempotent (if you run the same request multiple times, you should get the same result every time).
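The rule fits in a small gate. As far as I can tell, Rails' own check (verified_request? in request_forgery_protection.rb) skips GET and HEAD; the function names below are hypothetical:

```python
SAFE_METHODS = frozenset({"GET", "HEAD"})


def needs_csrf_check(method):
    """Every method except GET and HEAD must carry a valid token."""
    return method.upper() not in SAFE_METHODS


def allow_request(method, token_ok):
    """GET/HEAD pass unchecked, which is why they must never change server state."""
    return not needs_csrf_check(method) or token_ok
```

Because GET is waved through, any GET route that deletes or updates something is exactly the close_account hole described above.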
Also, the real implementation is a bit more complicated than described at the beginning, which makes it more secure. Rails does not issue the same stored token with every form. Neither does it generate and store a different token every time. It stores a random token in the session and, every time a page is rendered, issues a freshly masked version of that token (XOR'd with a new one-time pad), which the server can unmask and match against the stored one. See request_forgery_protection.rb.
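The masking idea can be sketched in Python. This roughly mirrors what request_forgery_protection.rb does as I understand it (one-time pad, pad XOR token, pad prepended, Base64), but the function names, TOKEN_LENGTH, and the URL-safe Base64 choice are assumptions of this sketch rather than Rails' exact code:

```python
import base64
import hmac
import secrets

TOKEN_LENGTH = 32  # bytes of randomness in the per-session token (assumed)


def new_session_token():
    """Random raw token, generated once and stored in the session."""
    return secrets.token_bytes(TOKEN_LENGTH)


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def mask_token(raw):
    """Per-render token: a fresh one-time pad followed by pad XOR raw, Base64-encoded."""
    pad = secrets.token_bytes(len(raw))
    return base64.urlsafe_b64encode(pad + _xor(pad, raw)).decode("ascii")


def unmask_token(masked):
    """Recover the raw token by XOR-ing the two halves back together."""
    data = base64.urlsafe_b64decode(masked.encode("ascii"))
    if len(data) % 2:
        raise ValueError("masked token has odd length")
    half = len(data) // 2
    return _xor(data[:half], data[half:])


def valid_masked_token(session_raw, masked):
    """True if masked unmasks to the token stored in the session."""
    try:
        candidate = unmask_token(masked)
    except ValueError:  # bad Base64 (binascii.Error) or non-ASCII input
        return False
    return len(candidate) == len(session_raw) and hmac.compare_digest(
        candidate, session_raw
    )
```

Every rendered page gets different-looking token bytes, yet all of them validate against the single stored token; this keeps the secret from appearing verbatim in responses.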
Lessons
Use authenticity_token to protect your state-changing methods (POST, PUT/PATCH, and DELETE). Also make sure not to allow any GET requests that could potentially modify resources on the server.
EDIT: Check the comment by @erturne regarding GET requests being idempotent. He explains it better than I have here.
The two most popular plugins (according to The Ruby Toolbox) both use two separate models to implement tagging of arbitrary classes. Since your classes seem to be known beforehand, you might get away with using just one. Here are the URLs of both plugins for reference:
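For comparison, here is a sketch of the general two-model pattern (a tags table plus a taggings join table) in Python with sqlite3. Each taggings row carries a taggable_type/taggable_id pair so records of any class can be tagged. The table names, column names, and the tag/tagged_with helpers are assumptions modeled on common Rails conventions, not taken from either plugin:

```python
import sqlite3

SCHEMA = """
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE taggings (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    taggable_type TEXT NOT NULL,   -- class name of the tagged record
    taggable_id INTEGER NOT NULL,  -- primary key of the tagged record
    UNIQUE (tag_id, taggable_type, taggable_id)
);
"""


def tag(conn, taggable_type, taggable_id, name):
    """Attach a tag to any record; duplicates are ignored."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO tags(name) VALUES (?)", (name,))
        (tag_id,) = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO taggings(tag_id, taggable_type, taggable_id) "
            "VALUES (?, ?, ?)",
            (tag_id, taggable_type, taggable_id),
        )


def tagged_with(conn, taggable_type, name):
    """Ids of records of one class carrying the given tag."""
    rows = conn.execute(
        """SELECT tg.taggable_id FROM taggings tg
           JOIN tags t ON t.id = tg.tag_id
           WHERE t.name = ? AND tg.taggable_type = ?
           ORDER BY tg.taggable_id""",
        (name, taggable_type),
    )
    return [row[0] for row in rows]
```

The type column is what lets one taggings table serve arbitrary classes; if the set of classes is fixed, it could be dropped in favor of per-class join tables.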