Google-sheets – Reverse Shortened URL to Get the Original URL Before the Shortening in Google Sheets

google sheetsgoogle-appsgoogle-apps-script

I have a sheet where I collect tweets with links based on a search query. Twitter, however, is handing me for each tweet their short URL format (t.co) rather than the original URL that was shared.

Is there a way to use formulas or google code to retrace the short url and fetch the final destination URL? The original that was shared by the person authoring the tweet?

     What i have        |       what i'm looking for
---------------------------------------------------------------------------
https://t.co/dura1sUSxm | https://www.jpost.com/Breaking-News/Russia-Saudi-Arabia-plan-deals-for-2-bln-for-Putins-visit-to-Riyadh-604200
https://t.co/Ayy7ww8dFX | https://www.washingtonpost.com/national-security/trump-says-little-as-his-gop-allies-condemn-turkeys-incursion-into-syria/2019/10/09/c46210f6-eaab-11e9-9306-47cb0324fd44_story.html
https://t.co/WLj6PipXkC | https://www.newsweek.com/teacher-fired-refusing-sign-pro-document-1262083
https://t.co/UoqiqfaHup | https://www.reuters.com/article/us-environment-waste-idUSKBN1WP1RE
https://t.co/hO9swbmeeZ | https://www.washingtonpost.com/national-security/trump-says-little-as-his-gop-allies-condemn-turkeys-incursion-into-syria/2019/10/09/c46210f6-eaab-11e9-9306-47cb0324fd44_story.html
https://t.co/Ve8ZpCp1s1 | https://www.reuters.com/article/us-environment-waste-idUSKBN1WP1RE

Best Answer

Redirectnator v1.1 is a spreadsheet created by Jeff Louella. It has a function built-in, with server code (301/302/200/404) and final url. You can make a copy here : https://www.thetechseo.com/seo-tools/redirect-checker/

Here's the script you can use/modify :

function redirectCheck(url, user, pwd) {
  try {
    function getResp(url, user, pwd){  
      var resp = UrlFetchApp.fetch(url, {
        muteHttpExceptions: true,
        followRedirects: false,
        headers: {
          'Authorization': 'Basic ' + Utilities.base64Encode(user+':'+pwd)
        }
      });
      return resp;

    }

    var response = getResp(url, user, pwd);
    var rCode = response.getResponseCode();
    var redirectCount = 0;
    var tCode = rCode.toString();
    var location = url;
    var domain = getDomain(url);

    while (rCode == 301 || rCode == 302 && redirectCount <= 10) {
      redirectCount++;
      header = response.getHeaders();
      location = getFullUrl(header['Location'],domain);
      domain = getDomain(location);
      Logger.log('location: '+location);
      response = getResp(location, user, pwd);
      rCode = response.getResponseCode(); 
      tCode = tCode + " > " + rCode.toString();
      Utilities.sleep(500);// pause in the loop for 500 milliseconds
    }     


    Logger.log('redirectCount: '+redirectCount);
    return tCode + "|" + redirectCount + "|" + location;


  } catch (error) {
    Logger.log(error);
    return "Error| |"+error;
  }
}

function getDomain(url) {
  var domain = '',
      protocol;
  if (url.indexOf("://") > -1) {
    domain = url.split('/')[2];
    protocol = url.split('/')[0];    
    //remove port number
    domain = domain.split(':')[0];
    //add protocol back
    domain = protocol+"//"+domain;
  }  

  return domain;
}

function getFullUrl(url,prevDom) {
  var fullUrl,
      domain = getDomain(url);
  if(domain == ''){
    fullUrl = prevDom+url;
  } else {
    fullUrl = url;
  }       

  return fullUrl;
}

function redirectCheckTest() {
  var test = redirectCheck('http://blog.pexcard.com/contractors/building-budget-construction-business/');
  Logger.log('test: '+test);
}

The result :

enter image description here


Note : This script doesn't work when testing urls from websites known for being hard to scrap (i.e. facebook, instagram, linkedin, google, etc.. I assume the U-A or IP used by Google-sheet is recognised as a bot therefore it returns the homepage or block the robot.)