Wednesday, February 24, 2016

Configure one entry point for multiple ArcGIS for Server sites using IIS

Notes:
Use the information below at your own discretion and risk. Feel free to adjust, modify and leverage to meet your own requirements.
The method proposed in this post aims to be simple and easy to adopt, though it is highly recommended that it is performed by an experienced IT professional.
The main advantage is that no changes are required to the Esri software, which makes it easier to upgrade ArcGIS or to roll back in case of failure. No changes are required to your reverse proxy setup either, e.g. Netscalers.
This was tested with ArcGIS for Server 10.4.

Esri delivers geospatial services in both REST and SOAP through ArcGIS for Server. This component is key for organizations wanting to build their Geospatial/IT infrastructure following a Service-Oriented Architecture (SOA). The principle behind service orientation is great for encapsulation and loose coupling and results in greater interoperability between IT systems.

ArcGIS for Server delivers your geospatial data, and business functions through HTTP-based services.

ArcGIS for Server is part of the ArcGIS suite of products, which also includes Portal for ArcGIS.

For the sake of this post, I will simplify the ArcGIS for Server architecture into two main components:

  • Web Adaptor for IIS

  • GIS Server

The Web Adaptor is an optional component of ArcGIS for Server and allows you to integrate ArcGIS with your existing web server e.g. IIS.

The GIS server is the component that does all the heavy work. When a request is sent to the Web Adaptor it forwards the traffic to an available GIS server for processing.

The most basic deployment consists of installing both the Web Adaptor and the GIS Server on the same machine.

The Web Adaptor routes requests to the GIS server, which listens on port 6080 (HTTP) or 6443 (HTTPS) via Tomcat listeners.

However, as computing requirements increase it is often necessary to scale the architecture both vertically and horizontally to meet non-functional requirements such as high availability, redundancy and failover.

Typically, the bottleneck is in the GIS servers, as they process CPU-intensive operations. The web server normally copes well in most cases.

For simplicity, this post considers just one web server. More could be used following the same technique, with a load balancer in front distributing requests round-robin across the available web servers.

There are two ways in which you can add more GIS servers to the architecture:

§  Clustering

You can add a machine to the existing ‘default’ cluster or to a new cluster. Regardless of which cluster the machine is allocated to, all GIS server machines share the same configuration as they are part of the same site.

 

§  Sites

You can add the machine to a new site. GIS server machines from different sites don’t share any information with each other. They can have, for example, different authentication mechanisms, each with its own identity store.

For more details about clustering and sites, please refer to the Esri documentation as it is quite descriptive.

The clustering approach is ideal for most scenarios as it greatly simplifies the architecture and makes it easier to maintain over time, especially for small IT teams. However, sites offer more flexibility and are suited to more complex deployments where many GIS servers are involved and the chatter between the servers becomes a problem. I would recommend monitoring the environment and, once sites become a necessity, changing the architecture accordingly.

Similarly, if your IIS web servers become a problem you can configure your existing Network Load Balancer (NLB) to talk directly to the GIS servers. There are many ways in which you can optimize your infrastructure to improve performance. However, these optimizations normally come at a significant cost, especially with regard to maintenance. The right balance between performance and flexibility is key here.

It is also common to adopt the ‘sites’ pattern from the beginning as a way of guaranteeing that the right building blocks underpinning the architecture are in place. It is always great to have a solid foundation.

Regardless of which technique you use, you should ensure that the URLs are standardized and stay consistent over time. In addition, you should aim to have a single entry point to the organisation’s geospatial services. This is provided out-of-the-box when using clusters but it is more difficult to configure when using multiple sites. As a result, many existing multi-site implementations end up with a different entry point to the GIS services for each site. This can become a problem over time.

For the purpose of this post, consider as an example a deployment with two sites where you want site2 to handle all the CPU-intensive geoprocessing (GP) tasks and site1 to handle everything else. Note that at the backend, each site is configured to use different GIS server machines (one or more).

The typical deployment results in two different URLs:

-          https://somedomain.nz/arcgis/rest/services (REST)
           https://somedomain.nz/arcgis/services (SOAP)

-          https://somedomain.nz/arcgis2/rest/services/Geoprocessing (REST)
           https://somedomain.nz/arcgis2/services/Geoprocessing (SOAP)

where ‘arcgis’ is a web adaptor pointing to site1 and ‘arcgis2’ is a web adaptor pointing to site2. These URLs are then registered with the Portal during the ArcGIS for Server federation.

This design works fine but it can become a problem as the number of sites increases. You can easily lose track of where a particular service is located, and keeping a consistent naming convention for the folders gets harder as the folder structure grows deeper.

In addition, if you have many different systems leveraging these URLs you don’t want to couple the implementation to a particular site, as services are likely to move from site to site over time to meet your computing requirements.

There are many ways to address this issue e.g. by configuring a reverse proxy. This post describes a simple solution using IIS.

 

Establish one entry point using IIS

 Goal

§  Expose a unique URL for all sites

Requests to the Geoprocessing folder should be routed automatically to Site2 by IIS:

https://somedomain.nz/arcgis/rest/services/Geoprocessing

https://somedomain.nz/arcgis/services/Geoprocessing

 

Any other requests to ArcGIS for Server should be routed automatically to Site1:

https://somedomain.nz/arcgis/rest/services

https://somedomain.nz/arcgis/services

 

§   Expose the URL for Site2 (for compatibility and admin purposes)

 

e.g.

 

https://somedomain.nz/arcgis2/rest/services/Geoprocessing

https://somedomain.nz/arcgis2/services/Geoprocessing

Note:

You don’t have to expose Site1 for admin purposes because when you access the admin endpoint or ArcGIS Server Manager using the unique URL you are already accessing Site1. If you have other sites then you can expose them if you want.

This helps with the transition to a unique URL approach. If required, client applications can also access the sites directly, though this is only likely to happen for admin purposes.
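
The routing goal above can be sketched as a small Python function. This is only an illustration of the intended decision, not the IIS configuration itself: the function name target_port is hypothetical, the ports 4431 and 4432 are the ones chosen in the steps of this post, and the patterns are deliberately stricter than the IIS rules shown later.

```python
import re

# Ports of the two internal IIS websites used in this post (Site1 and Site2).
SITE1_PORT = 4431
SITE2_PORT = 4432

# The unified entry point: /arcgis or anything below it (but not /arcgis2).
_ARCGIS = re.compile(r"^/arcgis(/.*)?$", re.IGNORECASE)

# The Geoprocessing folder under the REST or SOAP services root.
_GP_FOLDER = re.compile(
    r"^/arcgis/(rest/services|services)/Geoprocessing(/.*)?$", re.IGNORECASE
)


def target_port(path):
    """Return the internal port that should serve `path`.

    Returns None when the request is not for the unified 'arcgis' entry
    point (for example 'arcgis2' or 'portal'), which is left untouched.
    """
    if not _ARCGIS.match(path):
        return None
    if _GP_FOLDER.match(path):
        return SITE2_PORT
    return SITE1_PORT


if __name__ == "__main__":
    for p in ("/arcgis/rest/services",
              "/arcgis/rest/services/Geoprocessing/Routing/GPServer",
              "/arcgis2/rest/services/Geoprocessing"):
        print(p, "->", target_port(p))
```

Keeping the decision in one place like this makes it easy to list the URLs you expect each site to serve before touching the web server.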


 Steps

1.       Uninstall your existing web adaptors and unregister them from the sites

2.       Choose a unique URL as an entry point for all the geospatial services within the organization:

e.g. https://somedomain.nz/arcgis

3.       Browse to C:\inetpub, copy the ‘wwwroot’ folder and paste it as ‘wwwroot1’. Make sure that the only content inside is as follows:

 

 

The crossdomain files (clientaccesspolicy.xml and crossdomain.xml) are optional.

 

4.       Repeat step 3 but name the wwwroot folder as ‘wwwroot2’.

5.       Open IIS Manager, right-click ‘Sites’ and click “Add Website”

 

 

6.       Name the site ‘Site1’, and set the properties as per the figure below. Note that the physical path is set to ‘%SystemDrive%\inetpub\wwwroot1’ and that the binding uses HTTPS on port 4431. Choose your own certificate.

 

 

7.       Repeat steps 5 to 6 and create a new website called ‘Site2’. Set the HTTPS binding to use port 4432 and website physical path to: ‘%SystemDrive%\inetpub\wwwroot2’

 

8.       At this point the setup looks like the below:

9.       Install the Web Adaptor for IIS on each website: Site1 (port 4431) and Site2 (port 4432). Note that the virtual directory name should be ‘arcgis’ for both websites. The setup will look like the below.

10.   Open Internet Explorer and launch each web adaptor:

https://somedomain.nz:4431/arcgis/webadaptor

https://somedomain.nz:4432/arcgis/webadaptor

 

11.   Configure the web adaptor running on port 4431 with the GIS servers from site 1 and the web adaptor on port 4432 with site 2.

 

12.   If you intend to expose Site 1 and Site 2 directly then you have two options:

 

§  (Preferred) Configure your reverse proxy to forward requests to ports 4431 and 4432 (normally only 80 and 443 are allowed). You can then access each site directly from any machine through these ports.

§  If for compatibility reasons you want to use the tag ‘arcgis2’ for site 2 and ‘arcgis’ for site 1 then you only have to install the Web Adaptor in the “Default Web Site”, name it ‘arcgis2’, configure it to use port 443 and register it with Site 2 GIS Servers. You don’t have to install the web adaptor for site 1 (arcgis) as this is already provided through the settings above

 

13.   Log in to the ArcGIS Server Administrator directory for each site and go to system -> properties -> update and set the WebContextURL to the standardized URL (or to arcgis2 in case you have created this web adaptor in the previous step). If you are using a reverse proxy URL you should use this URL instead.

Example:

{

   "WebContextURL": "https://somedomain.nz/arcgis"

}

 

Click Update and restart the ArcGIS for Server service on each GIS server in each site.

 

14.   At this point, your reverse proxy cannot reach the web adaptors running on these custom ports, so opening “https://somedomain.nz/arcgis” (which runs on port 443) in your browser won’t work. Note that the URLs below do work properly:

 

§  https://somedomain.nz:4431/arcgis

§  https://somedomain.nz:4432/arcgis

 

15.   The next step is to configure your main website on port 443 to forward traffic to the appropriate web adaptor running on port 4431 or 4432.

 

16.   All custom geoprocessing requests should be forwarded to the web adaptor on port 4432 (site 2). Everything else should be routed to port 4431 (site 1).

 

17.   Log in to ArcGIS for Server Manager on port 4432 (site 2) and create the Geoprocessing folder.

https://somedomain.nz:4432/arcgis/manager

(If the web adaptor was not configured for administrative access, access the GIS server directly using port 6443.)

18.   Publish your custom geoprocessing services in the Geoprocessing folder from site 2 e.g. Routing

 

Example:

https://somedomain.nz:4432/arcgis/rest/services/Geoprocessing/Routing/GPServer

 

19.   Log in to ArcGIS for Server Manager on port 4431 (site 1) and also create the Geoprocessing folder there. Site 1 should always have the list of folders for ALL sites.

 

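20.   Install the URL Rewrite extension for IIS

http://www.iis.net/downloads/microsoft/url-rewrite

Note that the rules in step 23 rewrite requests to a different port. For IIS to forward such requests, Application Request Routing (ARR) normally has to be installed with its proxy setting enabled, and the server variables set by the rules have to be added to the allowed server variables in URL Rewrite.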

 

21.   Configure IIS to route traffic to the appropriate web adaptors. Browse to ‘C:\inetpub\wwwroot’ and edit the Web.config file for the ‘Default Web Site’ using a text editor. If it does not exist create it.

 

22.   Locate the section configuration -> system.webServer

Example:

 

23.   Insert the following rules within the system.webServer section:


<rewrite>
  <rules>
    <!-- Site1: everything under 'arcgis' except the Geoprocessing folder -->
    <rule name="Site1">
      <match url="^.*arcgis.*$" />
      <conditions>
        <add input="{HTTPS}" pattern="^ON$" />
        <add input="{HTTP_HOST}" pattern=".*" />
        <add input="{SERVER_PORT}" pattern="^(4431|6443|7443)$" negate="true" />
        <add input="{PATH_INFO}" pattern="^.*(arcgis2/.*|arcgis2)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^.*(portal/.*|portal)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^(/arcgis/)+(rest/services/|services/)+(Geoprocessing/.*|Geoprocessing)$" negate="true" ignoreCase="true" />
      </conditions>
      <action type="Rewrite" url="https://{SERVER_NAME}:4431/{HTTP_URL}" />
      <serverVariables>
        <set name="ORIGINAL_PORT" value="{SERVER_PORT}" />
        <set name="HTTP_Accept-Encoding" value="" />
      </serverVariables>
    </rule>
    <!-- Site2: the Geoprocessing folder (REST and SOAP) under 'arcgis' -->
    <rule name="Site2">
      <match url="^.*arcgis.*$" />
      <conditions>
        <add input="{HTTPS}" pattern="^ON$" />
        <add input="{HTTP_HOST}" pattern=".*" />
        <add input="{SERVER_PORT}" pattern="^(4432|6443|7443)$" negate="true" />
        <add input="{PATH_INFO}" pattern="^.*(arcgis2/.*|arcgis2)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^.*(portal/.*|portal)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^(/arcgis/)+(rest/services/|services/)+(Geoprocessing/.*|Geoprocessing)$" />
      </conditions>
      <action type="Rewrite" url="https://{SERVER_NAME}:4432/{HTTP_URL}" />
      <serverVariables>
        <set name="ORIGINAL_PORT" value="{SERVER_PORT}" />
        <set name="HTTP_Accept-Encoding" value="" />
      </serverVariables>
    </rule>
  </rules>
  <outboundRules>
    <!-- Remove the internal ports from redirect Location headers -->
    <rule name="ChangeServerResponseLocationValue" preCondition="IsRedirection" patternSyntax="ECMAScript">
      <match serverVariable="RESPONSE_Location" pattern="^(.*)returnUrl=(.*):(4431|4432)(.*)redirect=(.*)(%3A4431|%3A4432)(.*)" />
      <action type="Rewrite" value="{R:1}returnUrl={R:2}{R:4}redirect={R:5}{R:7}" />
    </rule>
    <rule name="ChangeServerResponseLocationDuplicateReturnValue" preCondition="IsRedirection" patternSyntax="ECMAScript">
      <match serverVariable="RESPONSE_Location" pattern="^(.*)returnUrl=.*(returnUrl=(.*))$" />
      <action type="Rewrite" value="{R:1}{R:2}" />
    </rule>
    <!-- Remove the internal ports from links in HTML responses -->
    <rule name="ChangeServerResponseHREFReturnAndRedirectValues" preCondition="IsHtml" patternSyntax="ECMAScript">
      <match filterByTags="A" pattern="^(.*):(4431|4432)(.*):(4431|4432)(.*):(4431|4432)(.*)$" />
      <action type="Rewrite" value="{R:1}{R:3}{R:5}{R:7}" />
    </rule>
    <rule name="ChangeServerResponseHREFReturnValues" preCondition="IsHtml" patternSyntax="ECMAScript">
      <match filterByTags="A" pattern="^(.*):(4431|4432)(.*):(4431|4432)(.*)$" />
      <action type="Rewrite" value="{R:1}{R:3}{R:5}" />
    </rule>
    <rule name="ChangeServerResponseHREFValues" preCondition="IsHtml" patternSyntax="ECMAScript">
      <match filterByTags="A" pattern="^(.*):(4431|4432)(.*)$" />
      <action type="Rewrite" value="{R:1}{R:3}" />
    </rule>
    <!-- Remove the internal ports from the plain-text 'rest/info' response -->
    <rule name="ChangeServerResponseJSONInfoValues" preCondition="IsText" patternSyntax="ECMAScript">
      <match filterByTags="None" pattern="^(.*):(4431|4432)(.*)$" />
      <action type="Rewrite" value="{R:1}{R:3}" />
      <conditions>
        <add input="{PATH_INFO}" pattern="^.*arcgis.*/rest/info.*$" />
      </conditions>
    </rule>
    <preConditions>
      <preCondition name="IsRedirection">
        <add input="{HTTPS}" pattern="^ON$" />
        <add input="{PATH_INFO}" pattern="^.*(arcgis2/.*|arcgis2)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^.*(portal/.*|portal)$" negate="true" ignoreCase="true" />
        <add input="{ORIGINAL_PORT}" pattern="^(4431|4432|6443|7443)$" negate="true" />
        <add input="{PATH_INFO}" pattern="^.*arcgis.*$" ignoreCase="true" />
        <add input="{RESPONSE_STATUS}" pattern="3\d\d" />
      </preCondition>
      <preCondition name="IsHtml">
        <add input="{HTTPS}" pattern="^ON$" />
        <add input="{PATH_INFO}" pattern="^.*(arcgis2/.*|arcgis2)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^.*(portal/.*|portal)$" negate="true" ignoreCase="true" />
        <add input="{ORIGINAL_PORT}" pattern="^(4431|4432|6443|7443)$" negate="true" />
        <add input="{PATH_INFO}" pattern="^.*arcgis.*$" />
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
      </preCondition>
      <preCondition name="IsText">
        <add input="{HTTPS}" pattern="^ON$" />
        <add input="{PATH_INFO}" pattern="^.*(arcgis2/.*|arcgis2)$" negate="true" ignoreCase="true" />
        <add input="{PATH_INFO}" pattern="^.*(portal/.*|portal)$" negate="true" ignoreCase="true" />
        <add input="{ORIGINAL_PORT}" pattern="^(4431|4432|6443|7443)$" negate="true" />
        <add input="{PATH_INFO}" pattern="^.*arcgis.*$" />
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/plain" />
      </preCondition>
      <preCondition name="IsXML">
        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/xml" />
      </preCondition>
    </preConditions>
  </outboundRules>
</rewrite>

24.   The code above performs the following functions:

  • Routes all HTTPS requests to the Geoprocessing folder to Site2 via the unique ‘arcgis’ URL

  • Routes any other AGS Server HTTPS requests via the unique URL ‘arcgis’ to Site1

  • All requests to Portal for ArcGIS (as long as the name in IIS is ‘portal’) are left to the software to function as normal

  • All requests via the ports 4431, 4432, 6443 and 7443 are left to the software to function as normal

  • All requests to ‘arcgis2’ are left to the software to function as normal (example created in case you have installed a web adaptor for direct access to Site 2 using step 12)

  • Rewrites certain links and json text responses to have the correct URL
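
The last point, removing the internal ports from links and text responses, can be illustrated outside IIS with a short Python sketch. It is a simplified stand-in for the outbound rules, not a translation of them: the function name strip_internal_ports is hypothetical, and the sketch assumes that the only ports to hide are 4431 and 4432, in plain or URL-encoded (%3A) form.

```python
import re

# A colon followed by one of the internal ports, not followed by another digit
# (so that a port such as 44310 is left alone).
_PLAIN_PORT = re.compile(r":(4431|4432)(?!\d)")

# The same ports when the colon is URL-encoded inside a query parameter,
# e.g. in a returnUrl or redirect value.
_ENCODED_PORT = re.compile(r"%3A(4431|4432)(?!\d)", re.IGNORECASE)


def strip_internal_ports(text):
    """Remove ':4431' / ':4432' (plain or URL-encoded) from a response value."""
    text = _PLAIN_PORT.sub("", text)
    return _ENCODED_PORT.sub("", text)


if __name__ == "__main__":
    print(strip_internal_ports("https://somedomain.nz:4431/arcgis/rest/info"))
    print(strip_internal_ports("redirect=https%3A//somedomain.nz%3A4432/arcgis"))
```

Running a handful of captured Location headers or link values through a helper like this is a quick way to see which values the outbound rules are expected to change.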

Assumptions

  • You shouldn’t administer the sites using the standardized URL. You should access each site individually. Accessing ArcGIS for Server Manager or the administrative page via the standardized URL takes you to the settings of Site 1.

  • Any folder you create in any of the sites should also be created in Site 1 (default site). This site provides information to client applications.

  • Each site should be federated with Portal for ArcGIS.

  • When you access ‘https://somedomain.nz/arcgis/rest/services’ site 1 provides the full service catalogue. As items are selected from the service catalogue the URL is routed to the appropriate site automatically preserving the unique tag name ‘arcgis’
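
The second assumption (every folder must also exist in Site 1) is easy to break over time, so it may be worth checking. The Python sketch below compares two service catalogues that have already been loaded as dictionaries, for example from the JSON returned by each site's 'rest/services?f=json' endpoint. The function name missing_folders is hypothetical, and only the "folders" list of the catalogue is used.

```python
def missing_folders(site1_catalog, other_catalog):
    """Return the folders of another site that do not exist in Site 1.

    Both arguments are dictionaries shaped like the REST service catalogue,
    i.e. they may contain a "folders" list of folder names.
    """
    site1_folders = set(site1_catalog.get("folders", []))
    return sorted(
        name for name in other_catalog.get("folders", [])
        if name not in site1_folders
    )


if __name__ == "__main__":
    # Example catalogues (made-up content for illustration only).
    site1 = {"folders": ["Utilities"], "services": []}
    site2 = {"folders": ["Geoprocessing", "Utilities"], "services": []}
    print(missing_folders(site1, site2))
```

An empty result means Site 1 already lists every folder of the other site; anything returned is a folder that still needs to be created in Site 1.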

Known Issues

There are no main issues but there are things that you should be aware of:

  • When accessing the administrative web page using the unique URL (which takes you to the Site 1 admin page behind the scenes) you get an HTTP 500 error saying ‘Invalid redirect parameter’. Unfortunately, this is because the Esri software compares the URLs before redirecting and aborts the operation when it shouldn’t. The authentication is successful though, so you can navigate to Home and check all the properties as normal. As mentioned before, you shouldn’t try to access the admin page using the unique URL but instead you should use the web adaptors directly through the ports 4431 and 4432.

     

     

  • The text in the body of the HTML page may still include the port. In the figure above, look at the text containing 4431. You should use the correct referrer, which uses port 443 rather than 4431, so when you generate a token just remove the “:4431” section. Again, you shouldn’t try to access the admin page using the unique URL but instead you should use the web adaptors directly through the ports 4431 and 4432.

     

     

  • While the SOAP WSDLs are all exposed using the correct URLs, the URLs within each WSDL include the port number, e.g. 4431.