Lucene Solr – Multi Core vs Multiple Instance for Different Schema Documents

lucene solr

I have performance concerns and would like suggestions on which works best: multiple cores in one instance, or multiple instances on different ports?

My Case First:

Currently I am running Solr with multiple cores and it runs fine. There is only one issue: sometimes it throws "out of heap memory while processing facets fields", after which I have to restart Solr. (To minimize the number of restarts, I start Solr with a large heap: java -Xms1000M -Xmx8000M -jar start.jar)

I have an Amazon EC2 instance with 8 cores at 2.8 GHz, 15 GB of RAM, and an optimized hard disk.

I have many database tables (about 100) and have to create a different schema for each, which means a separate core per table.

Each table has millions of documents, with 7-9 indexed fields and 10-50 stored fields per document.

My web portals must handle very high traffic (currently about 10 requests/second, which may increase to 50-100/second). I know Solr can handle that; I mention it only to show that I am concerned about even the smallest performance issue.

I search Solr from PHP with cURL against a specific core, so searching across different Solr instances would not be a problem either.

Question:

As far as I know, Solr handles one request at a time. So I think that if I create multiple Solr instances and start them on different ports, my web portal can handle more requests at a time (when users search different tables).

So, what would you suggest: multiple cores in a single Solr instance, or multiple instances with one or two cores each?

Is there any problem with running multiple Solr instances on different ports?

NOTE: I can combine rarely searched or small cores in one instance and put heavy-traffic cores in separate instances, or put two or three heavy-traffic cores in one instance, and so on, because creating a separate instance for each table (~100 here) would take too many hardware resources.
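The grouping described in the note could be expressed on the client side as a small routing table. This is only a sketch in Python rather than PHP; the core names, the port assignments, and the `/solr/<core>/select` path layout are assumptions for illustration and are not taken from the setup above.

```python
# Hypothetical layout: heavy-traffic cores get their own instance (port),
# every other core lives in one shared instance.
SHARED_PORT = 8983  # assumption: the shared instance runs on the default port
DEDICATED_PORTS = {
    "orders": 8984,    # hypothetical heavy-traffic core
    "products": 8985,  # hypothetical heavy-traffic core
}


def select_url(core, host="localhost"):
    """Return the select URL for a core, routed to the instance that hosts it."""
    port = DEDICATED_PORTS.get(core, SHARED_PORT)
    return f"http://{host}:{port}/solr/{core}/select"
```

Because the client only needs the core name, moving a core between instances would mean changing one entry in the table.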

Best Answer

Solr can handle multiple requests at a time.

I tested this by running a long query [QTime=7203, approx. 7 sec] followed by several small queries [QTime=30]. Solr responded to the smaller queries first, even though they were sent after the long query.

This is the main reason for my answer: use a single Solr instance with multiple cores, and assign plenty of memory to the JVM.
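A test along these lines can be reproduced with a few threads. The following is a minimal sketch, not the exact procedure used above: the base URL, the core name, and the query strings are assumptions, and `solr_select` and `completion_order` are helper names invented for this example.

```python
import json
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default Jetty port


def solr_select(core, query, base=SOLR_BASE):
    """Run one query against a core; return (QTime in ms, wall-clock seconds)."""
    params = urllib.parse.urlencode({"q": query, "wt": "json", "rows": 0})
    started = time.monotonic()
    with urllib.request.urlopen(f"{base}/{core}/select?{params}") as response:
        body = json.load(response)
    return body["responseHeader"]["QTime"], time.monotonic() - started


def completion_order(fetch, jobs):
    """Submit every job at once; return the labels in the order they finish.

    jobs is a list of (label, args) pairs; fetch is called as fetch(*args).
    """
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {pool.submit(fetch, *args): label for label, args in jobs}
        return [futures[done] for done in as_completed(futures)]


# Example call (needs a running Solr; core and queries are hypothetical):
# completion_order(solr_select, [
#     ("long", ("mycore", "some expensive query")),
#     ("short-1", ("mycore", "id:1")),
#     ("short-2", ("mycore", "id:2")),
# ])
```

If the short queries finish before the long one, as they did in the test described above, the label of the long query appears last in the returned list.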

Other Points:

1. Each Solr instance requires its own RAM, so running multiple instances requires more resources, which is expensive. And if you use facets or sort fields, you need to allocate even more RAM to each instance.

As you can see, in my case I need to start Solr with a large heap (8GB). See also the case of the Danish Web Archive, which uses multiple instances and allocates 9GB of RAM to each, with 256GB of RAM in total.

2. You can run multiple Solr instances on different ports with the command java -Djetty.port=8984 -jar start.jar. Everything ran fine, but I hit one problem.

While indexing, an instance may fail with a "not enough memory error" and then be killed. So you need to start the second instance with a large heap as well, which increases the total RAM requirement.

3. Solr resource requirements and performance problems are explained here. According to this, a 64-bit environment and 12GB of RAM are recommended for good performance. Solr optimization is explained here.

Related Solutions

Security – different levels of security configuration for Apache SOLR

Here's how to filter requests by user's IP address, using Tomcat's Valve Component: http://wiki.apache.org/tomcat/FAQ/Security#Q6

You can use Tomcat Basic authentication to restrict access to specific URL patterns.

Your Solr application's web.xml:

 <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Admin and Update protection</realm-name>
 </login-config>

 <security-constraint>
    <web-resource-collection>
        <web-resource-name>Hr core administration</web-resource-name>
        <url-pattern>/coreHr/admin/*</url-pattern>
    </web-resource-collection>

    <auth-constraint>
        <role-name>solradmin</role-name>
    </auth-constraint>
 </security-constraint>

 <security-constraint>
    <web-resource-collection>
        <web-resource-name>En core administration</web-resource-name>
        <url-pattern>/coreEn/admin/*</url-pattern>
    </web-resource-collection>

    <auth-constraint>
        <role-name>solradmin</role-name>
    </auth-constraint>
 </security-constraint>

 <security-constraint>
    <web-resource-collection>
        <web-resource-name>Hr core update</web-resource-name>
        <url-pattern>/coreHr/update/*</url-pattern>
    </web-resource-collection>

    <auth-constraint>
        <role-name>solradmin</role-name>
    </auth-constraint>
 </security-constraint>

 <security-constraint>
    <web-resource-collection>
        <web-resource-name>En core update</web-resource-name>
        <url-pattern>/coreEn/update/*</url-pattern>
    </web-resource-collection>

    <auth-constraint>
        <role-name>solradmin</role-name>
    </auth-constraint>
 </security-constraint>

tomcat-users.xml:

<role rolename="manager"/>
<role rolename="admin"/>
<role rolename="solradmin"/>
<user username="mbo" password="mbo11" roles="manager,admin,solradmin"/>
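With a configuration like this in place, a client has to send HTTP Basic credentials when it calls a protected URL. The following is a minimal Python sketch of building such a request; the host, port 8080, and the `/solr` context path are assumptions about the Tomcat deployment, and `authenticated_request` is a helper name invented for this example.

```python
import base64
import urllib.request


def authenticated_request(url, username, password):
    """Build a request that carries an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Example (assumed deployment URL; the user comes from tomcat-users.xml):
# request = authenticated_request(
#     "http://localhost:8080/solr/coreHr/update", "mbo", "mbo11")
# urllib.request.urlopen(request)  # only works against a running server
```

Basic authentication only base64-encodes the credentials, so it is usually combined with HTTPS when the traffic leaves a trusted network.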

Best Metrics for Monitoring Apache Solr instance

Sunspot uses Solr, doesn't it? If so, why not expose the Solr stats via JMX? Then you can point something like check_jmx, solr-nagios-check, or jmxquery at it. The first can check various performance metrics, not just "is it up?"; I am not sure about the other two.

Opsview has put together a decent guide to how this all works. Most of it should apply to Nagios, too.