Tuesday, October 24, 2017

ASP.NET MVC: Serving Specific HTML to Web Crawlers and Bots (Bot, FbBot, Crawler)

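I recently ran into a problem: when Facebook's bot fetches a page, the HTML has to carry specially prepared head content, and the front end cannot generate the contents of the head tag ahead of time for FBbot. So I went with the IIS extension URL Rewrite Module: a rule rewrites every request whose source is on a list of user agents to a page we specify. Here is a simple implementation: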

1. First, add a Controller named SeoController with the following code:

using System;
using System.Configuration;
using System.Linq;
using System.Web.Mvc;

namespace CarrefourWeb.Controllers
{
    public class SeoController : Controller
    {
        // GET: Seo
        public ActionResult Index()
        {
            return View();
        }

        // For Facebook share
        public ActionResult BotCrawler(string uri)
        {
            // Values for the HTML head
            var title = "Exclusively for crawler bots!!";
            var description = "This description is written just for web bots and crawlers";
            var url = Request.Url == null ? "http://linmasaki09.blogspot.tw/" : "http://" + Request.Url.Authority;
            var image = "http://www.icons101.com/icon_ico/id_36436/Mushroom__1UP.ico";  // link to the preview thumbnail

            // Value for the HTML body
            var body = "<div>Hello Bots...</div>";

            // Add other custom content here...

            ViewData["title"] = title;
            ViewData["description"] = description;
            ViewData["url"] = url;
            ViewData["image"] = image;
            ViewData["body"] = body;

            return View();
        }

    }
}
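The BotCrawler action receives the original path in its uri parameter, but the og:url value above only uses the host. As a minimal sketch, assuming you want og:url to point at the page the bot actually asked for, the two could be combined as below. The CanonicalUrl class and its Build method are hypothetical names for illustration and are not part of the original project; how the uri value arrives (with or without a leading slash) depends on your rewrite rule and routes, so the sketch trims it defensively.

```csharp
using System;

public static class CanonicalUrl
{
    // Hypothetical helper: joins the scheme and host of the current request
    // with the path captured by the route's uri parameter.
    public static string Build(Uri requestUrl, string uri, string fallback)
    {
        // Same fallback idea as the controller: no request URL, use a fixed address.
        if (requestUrl == null) return fallback;

        // Scheme + authority, without a trailing slash.
        var root = requestUrl.GetLeftPart(UriPartial.Authority);

        // Drop leading slashes so the result never contains "//" after the host.
        var path = (uri ?? string.Empty).TrimStart('/');
        return root + "/" + path;
    }

    public static void Main()
    {
        var request = new Uri("http://example.com/Seo/BotCrawler/products/42");
        Console.WriteLine(Build(request, "/products/42", "http://linmasaki09.blogspot.tw/"));
    }
}
```

Inside the action this would be called as CanonicalUrl.Build(Request.Url, uri, "http://linmasaki09.blogspot.tw/"). Unlike the hard-coded "http://" prefix, GetLeftPart keeps whatever scheme the request used.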

2. Add a View for the BotCrawler action with the following code:

@{
    Layout = null;
}
<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width" />
    <meta property="og:title" content="@ViewData["title"]" />
    <meta property="og:description" content="@ViewData["description"]" />
    <meta property="og:url" content="@ViewData["url"]" />
    <meta property="og:image" content="@ViewData["image"]" />
    <title>@ViewData["title"]</title>
</head>
<body>
    @Html.Raw(ViewData["body"])
</body>
</html>

3. Next, add or modify the rules in the rewrite section under system.webServer in web.config, as follows:

<?xml version="1.0" encoding="utf-8"?>
<!--
  For more information on how to configure your ASP.NET application, please visit
  http://go.microsoft.com/fwlink/?LinkId=301879
  -->
<configuration>
  .......
  <system.webServer>
    .......
    <rewrite>
      <rules>
        <!-- rules go below -->
        <rule name="Static Html Content For Bot" stopProcessing="true">
          <match url="^(.*)$"/>
          <conditions>
            <add input="{HTTP_USER_AGENT}" ignoreCase="true" pattern="MSNBot|bot|googlebot|crawler|spider|robot|crawling|Facebot|facebookexternalhit"/>
          </conditions>
          <action type="Rewrite" url="/Seo/BotCrawler/{REQUEST_URI}" appendQueryString="false"/>
        </rule>
      </rules>
    </rewrite>
    .......
  </system.webServer>
  .......
</configuration>
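Before deploying the rule, it can help to check which User-Agent strings the pattern actually catches. The following is a minimal sketch that runs the same alternation through .NET's Regex class. It assumes .NET's regex engine behaves like the URL Rewrite Module's for a plain alternation such as this one, and BotPatternCheck and IsBot are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BotPatternCheck
{
    // Same alternation as the rewrite condition; IgnoreCase mirrors ignoreCase="true".
    private static readonly Regex BotPattern = new Regex(
        "MSNBot|bot|googlebot|crawler|spider|robot|crawling|Facebot|facebookexternalhit",
        RegexOptions.IgnoreCase);

    // True when the pattern matches anywhere in the User-Agent string.
    public static bool IsBot(string userAgent)
    {
        return !string.IsNullOrEmpty(userAgent) && BotPattern.IsMatch(userAgent);
    }

    public static void Main()
    {
        string[] samples =
        {
            "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
        };

        foreach (var ua in samples)
        {
            Console.WriteLine("{0} <- {1}", IsBot(ua), ua);
        }
    }
}
```

Because the match is an unanchored, case-insensitive substring search, the bare "bot" alternative already covers MSNBot, googlebot, robot and Facebot, and it will also match any other User-Agent that happens to contain those three letters. Narrow the list if that is broader than you intend.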

4. Modify RouteConfig (optional)

using System.Web.Mvc;
using System.Web.Routing;

namespace CarrefourWeb
{
    public class RouteConfig
    {
        public static void RegisterRoutes(RouteCollection routes)
        {
            routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
            routes.IgnoreRoute("{*allfiles}", new { allfiles = @".*\.(css|js|gif|jpg|png)" });

            // Add the following route so requests for the Seo controller's actions are caught first
            routes.MapRoute(
                name: "CatchAllBot",
                url: "Seo/{action}/{*uri}",
                defaults: new { controller = "Seo", action = "Index", uri = "" }
            );

            routes.MapRoute(
                name: "Default",
                url: "{controller}/{action}/{id}",
                defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
            );
            
        }
    }
}

5. Next, we can use Postman to simulate a web bot and test it. The result is shown in the figure below.
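The same check can be scripted instead of done by hand. Here is a minimal sketch of a console program that sends one GET request with Facebook's crawler User-Agent and reports whether the response contains the og:title tag. The default address http://localhost:5000/ is an assumption, so pass your own site URL as the first argument; BotRequestDemo and BuildBotRequest are hypothetical names for illustration.

```csharp
using System;
using System.Net.Http;

public static class BotRequestDemo
{
    // User-Agent string sent by Facebook's crawler; it matches the rewrite rule's pattern.
    public const string FacebookUserAgent =
        "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)";

    // Builds a GET request that identifies itself as the Facebook crawler.
    public static HttpRequestMessage BuildBotRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("User-Agent", FacebookUserAgent);
        return request;
    }

    public static void Main(string[] args)
    {
        // Assumption: a local development address; replace with the site under test.
        var target = args.Length > 0 ? args[0] : "http://localhost:5000/";

        using (var client = new HttpClient())
        using (var request = BuildBotRequest(target))
        using (var response = client.SendAsync(request).GetAwaiter().GetResult())
        {
            var html = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
            Console.WriteLine("Status: {0}", (int)response.StatusCode);
            Console.WriteLine(html.Contains("og:title")
                ? "Bot page served (og:title found)."
                : "Regular page served (og:title not found).");
        }
    }
}
```

Running it a second time with an ordinary browser User-Agent should report the regular page, which confirms the rule only fires for the listed agents.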

