KDnuggets Home » Polls » Data types/sources analyzed (May 2014)

What data types/sources did you analyze in the past 12 months?


 
  
What data types/sources did you analyze in the past 12 months? [264 votes total]

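| Data type/source              | Votes | % users in 2014 | % users in 2012        |
|-------------------------------|-------|-----------------|------------------------|
| table data (fixed n. columns) | 203   | 77%             | 73%                    |
| time series                   | 126   | 48%             | 44%                    |
| text                          | 108   | 41%             | 39%                    |
| itemsets / transactions       | 70    | 27%             | 33%                    |
| location/geo/mobile           | 52    | 20%             | 19%                    |
| Twitter                       | 47    | 18%             | NA (not asked in 2012) |
| JSON                          | 45    | 17%             | 8.7%                   |
| web content                   | 42    | 16%             | 13%                    |
| social network                | 41    | 16%             | 18%                    |
| anonymized data               | 37    | 14%             | 24%                    |
| XML                           | 37    | 14%             | 9.3%                   |
| web clickstream/web log       | 33    | 12.5%           | 9.3%                   |
| email                         | 26    | 10%             | 11%                    |
| images / video                | 13    | 4.9%            | 6.0%                   |
| music / audio                 | 7     | 2.7%            | 1.1%                   |
| Other                         | 19    | 7.2%            | 8.2%                   |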
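
The 2014 percentages are vote counts divided by the 264 total votes; because respondents could pick several options, the shares add up to well over 100%. A minimal Python sketch of that arithmetic for a few rows follows. The helper name `share` and the choice of rows are illustrative, not part of the poll.

```python
TOTAL_VOTES = 264  # total reported for the 2014 poll

# A few vote counts copied from the poll results
votes = {
    "table data (fixed n. columns)": 203,
    "time series": 126,
    "text": 108,
    "JSON": 45,
    "music / audio": 7,
}

def share(count: int, total: int = TOTAL_VOTES) -> float:
    """Percentage of respondents who selected an option."""
    return 100.0 * count / total

shares = {name: share(count) for name, count in votes.items()}

for name, pct in shares.items():
    print(f"{name:32s} {pct:5.1f}%")
```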


Data Sources Notes
Comparing with the similar 2012 KDnuggets Poll, "Data types analyzed/mined", we see that the data types/sources with the largest increases were:
  • music / audio: 143% up, from 1.1% rate in 2012 to 2.7% in 2014
  • JSON: 95% up, from 8.7% in 2012 to 17%
  • XML: 51% up, from 9.3% in 2012 to 14%.

The largest declines in usage were for:

  • anonymized data: 42% down, from 24% in 2012 to 14% in 2014
  • itemsets / transactions: 19% down, from 33% to 27%
  • images / video: 18% down, from 6.0% to 4.9%
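
The "up" and "down" figures above are relative changes between the 2012 and 2014 rates, not differences in percentage points. A short Python sketch, assuming the rounded rates from the table as input (the function name `relative_change` is illustrative):

```python
def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change from old_pct to new_pct, expressed in percent."""
    return (new_pct - old_pct) / old_pct * 100.0

# Rounded rates from the poll: (2012 %, 2014 %)
rates = {
    "music / audio": (1.1, 2.7),
    "JSON": (8.7, 17.0),
    "XML": (9.3, 14.0),
    "anonymized data": (24.0, 14.0),
    "itemsets / transactions": (33.0, 27.0),
    "images / video": (6.0, 4.9),
}

changes = {name: relative_change(old, new) for name, (old, new) in rates.items()}

# Print from largest increase to largest decline
for name, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:25s} {change:+7.1f}%")
```

Because the inputs here are already rounded, the results can differ by a point or two from the figures quoted in the lists, which were presumably computed from unrounded rates.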

We also added new options in the 2014 poll for accessing data from a database engine, and took the top 7 database engines from db-engines.com/en/ranking.

Overall, 70% of all respondents accessed data from some database, but only about 20% accessed NoSQL databases (Hadoop, MongoDB or another DB engine).

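| Data Source             | Votes | % Used | db-engines Rank |
|-------------------------|-------|--------|-----------------|
| Microsoft SQL Server    | 84    | 31.8%  | 3               |
| Oracle                  | 65    | 24.6%  | 1               |
| MySQL                   | 60    | 22.7%  | 2               |
| another database engine | 41    | 15.5%  | na              |
| Hadoop/HDFS             | 34    | 12.9%  | na              |
| Microsoft Access        | 31    | 11.7%  | 7               |
| PostgreSQL              | 25    | 9.5%   | 4               |
| DB2                     | 19    | 7.2%   | 6               |
| MongoDB                 | 13    | 4.9%   | 5               |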

We note that the popularity of database engines for data mining is NOT in the same order as the db-engines ranking, with SQL Server being an especially popular source for data access.

We also analyzed the co-occurrence of popular data types with different database engines, and measured the "affinity" of a database engine to a data type as the ratio of how frequently that data type was used together with that database to the average % usage of that data type. Use of a particular data type and a database engine by the same respondent within a year does not mean that the database was used for analyzing that data type, but we found some interesting and strong correlations.
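
One way to read this definition is as a ratio of two shares: the share of a database's users who also used the data type, divided by the share of all respondents who used it. A value above 1 then means the data type is over-represented among that database's users. The Python sketch below follows that reading. The function name `affinity` and the toy responses are made up for illustration and are not the poll data.

```python
def affinity(responses, db, dtype):
    """Share of `db` users who also used `dtype`, divided by the
    share of all respondents who used `dtype`."""
    db_users = [r for r in responses if db in r]
    if not db_users:
        raise ValueError(f"no respondent used {db!r}")
    overall = sum(dtype in r for r in responses) / len(responses)
    if overall == 0:
        raise ValueError(f"no respondent used {dtype!r}")
    within = sum(dtype in r for r in db_users) / len(db_users)
    return within / overall

# Toy, made-up responses: each set holds the options one respondent selected
responses = [
    {"MongoDB", "text", "JSON"},
    {"MongoDB", "text"},
    {"Oracle", "table data"},
    {"Oracle", "table data", "text"},
    {"MySQL", "time series"},
    {"table data"},
    {"PostgreSQL", "time series", "table data"},
    {"text", "Twitter"},
]

print(affinity(responses, "MongoDB", "text"))  # both MongoDB users chose text
print(affinity(responses, "Oracle", "text"))   # Oracle users match the overall rate
```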

Interestingly, the database engines with the highest affinity for text data were MongoDB (1.88) and Hadoop (1.65), while for time series the engine with the highest affinity was PostgreSQL (1.65).

The table below shows the breakdown by region, with columns:

  • % Participants: % of participants from that region
  • Ntypes: number of different data types/sources used (average per respondent)
  • %from DB engine: % who used data from a database engine or Hadoop
  • %from NoSQL: % who used data from a NoSQL engine
  • %text: % who used text data

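| Region             | % Participants | Ntypes | %from DB engine | %from NoSQL | %text |
|--------------------|----------------|--------|-----------------|-------------|-------|
| US/Canada          | 47%            | 5.4    | 78%             | 20%         | 44%   |
| Europe             | 26%            | 4.5    | 66%             | 10%         | 38%   |
| Asia               | 13%            | 3.8    | 41%             | 15%         | 41%   |
| Latin America      | 8.7%           | 4.4    | 74%             | 17%         | 35%   |
| Africa/Middle East | 3.8%           | 4.5    | 70%             | 20%         | 30%   |
| Australia/NZ       | 2.3%           | 4.2    | 83%             | 0%          | 50%   |
| ALL                | 100%           | 4.8    | 70%             | 16%         | 41%   |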

We note that the US/Canada region leads in the number of different data types used, in usage of database engines overall, and in usage of NoSQL engines. Europe lags in using NoSQL engines, while Asia lags in usage of data from database engines in general (but not so much with NoSQL engines).

