Dataset

Sample document in the corpus

A typical document contains an identifier, a piece of text (in the full version only), a set of annotations (e.g. love, satisfaction), the brand referred to, the sector, and other named entities.
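
As a rough illustration of the fields listed above, a document could be modelled as follows. This is only a sketch: the class name `CorpusDocument` and all field names are hypothetical and are not part of the corpus vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorpusDocument:
    """One annotated post. Field names are illustrative, not the corpus schema."""
    identifier: str
    brand: str
    sector: str
    emotions: List[str] = field(default_factory=list)
    named_entities: List[str] = field(default_factory=list)
    text: Optional[str] = None  # only present in the full version of the corpus


# Values taken from the sample post shown on this page; the text is omitted,
# as it is in the downloadable datasets.
doc = CorpusDocument(
    identifier="826812979421257730",
    brand="Movistar",
    sector="TELCO",
    emotions=["hate", "dissatisfaccion"],
)
print(doc)
```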

Some documents carry extended information that links the data to external datasets (Thomson Reuters' PermID, etc.).

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix sabd: <http://sabcorpus.linkeddata.es/data/> .
@prefix sabv: <http://sabcorpus.linkeddata.es/vocab/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix marl: <http://purl.org/marl/ns#> .
@prefix onyx: <http://www.gsi.dit.upm.es/ontologies/onyx/ns#> .
@prefix permid: <https://permid.org/> .
@prefix org: <http://www.w3.org/TR/vocab-org/> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

sabd:826812979421257730 a sioc:Post ;
  sioc:id "826812979421257730" ;
  sioc:content "Ya me quede sin credito?? Hace 3 dias tengo credito nomas... Movistar y la concha de tu hermana 😒"@es ;
  marl:describesObject sabd:Movistar ;
  sabd:isInPurchaseFunnel sabv:postPurchase ;
  sabd:hasMarketingMix sabv:price ;
  onyx:hasEmotion sabv:hate, sabv:dissatisfaccion ;
  marl:hasPolarity marl:negative ;
  marl:forDomain "TELCO" .

Information on companies, brands and emotions is also given.

sabd:Movistar a gr:Brand ;
  rdfs:seeAlso <http://dbpedia.org/resource/Movistar> ;
  rdfs:label "Movistar" .
sabd:1-5000062703 a gr:Business ;
  rdfs:label "Telefonica de Espana, S.A.U." ;
  rdfs:seeAlso <https://opencorporates.com/companies/es/82018474> ;
  owl:sameAs permid:1-5000062703 .
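
The annotations of a post like the one above can be read programmatically. The following is a minimal sketch in Python using only the standard library. It is not a full Turtle parser: it handles only the constructs that appear in the sample (prefixed names, string literals, the `a` shorthand, and the `;`, `,` and `.` separators). The names `TOKEN`, `parse_triples` and `SAMPLE` are hypothetical, and the post text is left out of `SAMPLE`.

```python
import re

# Tokens used by the sample: string literals (optionally language-tagged),
# <IRIs>, prefixed names, the "a" shorthand for rdf:type, and ; , . separators.
TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+)?'  # string literal
    r'|<[^>]*>'                            # full IRI
    r'|[A-Za-z][\w-]*:[\w-]+'              # prefixed name
    r'|\ba\b'                              # rdf:type shorthand
    r'|[;,.]'                              # separators
)

SAMPLE = """
sabd:826812979421257730 a sioc:Post ;
  sioc:id "826812979421257730" ;
  marl:describesObject sabd:Movistar ;
  sabd:isInPurchaseFunnel sabv:postPurchase;
  sabd:hasMarketingMix sabv:price;
  onyx:hasEmotion sabv:hate, sabv:dissatisfaccion ;
  marl:hasPolarity marl:negative ;
  marl:forDomain "TELCO" .
"""


def parse_triples(turtle):
    """Return (subject, predicate, object) tuples from a simple Turtle snippet."""
    # Prefix declarations are skipped; prefixed names are kept unexpanded.
    body = "\n".join(
        line for line in turtle.splitlines()
        if not line.lstrip().startswith("@prefix")
    )
    triples = []
    subject = predicate = None
    for tok in TOKEN.findall(body):
        if tok == ".":        # end of statement: reset subject and predicate
            subject = predicate = None
        elif tok == ";":      # same subject, new predicate follows
            predicate = None
        elif tok == ",":      # same subject and predicate, new object follows
            continue
        elif subject is None:
            subject = tok
        elif predicate is None:
            predicate = tok
        else:
            triples.append((subject, predicate, tok))
    return triples


triples = parse_triples(SAMPLE)
emotions = [o for s, p, o in triples if p == "onyx:hasEmotion"]
print(emotions)
```

For anything beyond a quick inspection, a dedicated RDF library would be a more robust choice than this regular-expression approach.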

Corpus download

These datasets do not include the tweet texts for copyright reasons. The texts can be retrieved from Twitter using the ID of each post (the sioc:id value).
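
As a sketch of how the IDs could be turned into something retrievable, the snippet below collects the `sioc:id` values from a Turtle fragment and builds a web address for each one. The URL pattern in `TWEET_URL` is an assumption and is not given by the corpus authors, and a post may no longer be available. The function names `ids_from_turtle` and `tweet_url` are hypothetical. Fetching the text itself is left out.

```python
import re
from typing import List

# Assumption: this URL pattern is not part of the corpus documentation.
TWEET_URL = "https://twitter.com/i/web/status/{tweet_id}"


def ids_from_turtle(turtle: str) -> List[str]:
    """Collect the numeric values of sioc:id found in a Turtle snippet."""
    return re.findall(r'sioc:id\s+"(\d+)"', turtle)


def tweet_url(tweet_id: str) -> str:
    """Build a web address for a post ID, rejecting non-numeric input."""
    if not re.fullmatch(r"\d+", tweet_id):
        raise ValueError("not a numeric ID: %r" % tweet_id)
    return TWEET_URL.format(tweet_id=tweet_id)


snippet = 'sabd:826812979421257730 a sioc:Post ;\n  sioc:id "826812979421257730" .'
urls = [tweet_url(i) for i in ids_from_turtle(snippet)]
print(urls)
```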

Version 1: the corpus contains only sentiment tags, assigned following a set of criteria and using an ad-hoc vocabulary.

Version 2: the corpus contains purchase funnel and marketing mix tags, assigned following the tagging criteria and using the same ad-hoc vocabulary as in v1.

Authorship

For copyright reasons, the text is not available for download, although requests sent to vrodriguez.AT.fi.upm.es will be considered. The annotations are the work of María Navas, Víctor Rodríguez and Idafen Santana, and are freely downloadable under a CC-BY 4.0 license.

If you publish work that uses this resource, please cite it as:
M. Navas-Loro, V. Rodríguez-Doncel, I. Santana-Pérez, A. Sánchez (2017). Spanish Corpus for Sentiment Analysis Towards Brands. In Int. Conf. on Speech and Computer (pp. 680-689). Springer. ISBN: 978-3-319-66428-6.
A reference to the extended version will be added soon.