Extracting information from semi-structured documents is a very hard task, and it will become more and more critical as the amount of digital information available on the Internet grows. Indeed, documents are often so large that the data set returned as the answer to a query may be too big to convey interpretable knowledge. In this work we describe an approach based on Tree-based Association Rules (TARs): mined rules that provide approximate, intensional information on both the structure and the content of XML documents, and that can themselves be stored in XML format. This mined knowledge is later used to provide (i) a concise idea (the gist) of both the structure and the content of the XML document and (ii) quick, approximate answers to queries. In this work we focus on the second feature. A prototype system and experimental results demonstrate the effectiveness of the approach.

