Google Sheets – Prevent ImportXML from Spilling Text Across Columns

google sheetsimportxmlxpath

When importing chat transcript by importXML, e.g.,

=IMPORTXML("http://chat.stackexchange.com/transcript/7/2015/8/13", "//div[@class='content']")

I find that some messages split across several columns:

+-------------------+---------------+----------------+
| "The tag-wiki ... | google-search |  tag instead.  |
+-------------------+---------------+----------------+

I understand this happens because "google-search" is within its own HTML element:

... use the <a href="//webapps.stackexchange.com/questions/tagged/google-search"><span class="ob-post-tag" style="background-color: #E2EDFA; color: #5D7A9C; border-color: #5D7A9C; border-style: solid;">google-search</span></a> tag instead. 

But I don't want to have multiple columns like that. I want all chat message in the same cell:

The tag-wiki contains the sentence: If you are talking about search, use the google-search tag instead.

Is there a way to either prevent importXML from using breaking text into columns, or to join them back? join("", importxml(...)) throws an error because the range is two-dimensional.

Best Answer

Short answer

=TRANSPOSE(
   QUERY(
    TRANSPOSE(
      IMPORTXML(
         "http://chat.stackexchange.com/transcript/7/2015/8/13",
         "//div[@class='content']"
      )
    ),
    ,
    1000
   )
 )

Explanation

IMPORTXML is returning an array having one column by each inner span tag. There is one span tag by each smart link included in the chat message.

"It's known" that QUERY joins the headers rows. As in this case the OP wants to join rows, the formula includes TRANSPOSE two times, the first to be able to use the QUERY behavior to join columns and the second to return the array to the original shape.

Reference