pajek2stata

Posted May 2nd, 2010 by Rense and filed in SNA with stata

Of all the currently available data formats for social network data, the .net format that Pajek uses is probably the most widely compatible and versatile. Although these files are basically just ASCII files, Stata cannot read data from this format directly because of the rather specific way the data are defined. For one, .net files contain at least two data objects: a specification of vertices (nodes), and a specification of relations between those vertices. Built-in Stata routines for reading ASCII data (insheet, infile, infix) expect only one data object (your regular data matrix of observations and variables).
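To make the "two data objects" point concrete, here is a minimal sketch in Python (used only as a neutral illustration; this is not how pajek2stata is implemented). The sample file content, the function name split_net, and the assumption that each relation line starts with two vertex numbers are all made up for the example.

```python
import shlex

# A tiny, made-up .net file: a vertices block followed by an arcs block.
SAMPLE = '''*Vertices 3
1 "Ann"
2 "Bob"
3 "Cat"
*Arcs
1 2
2 3
'''

def split_net(text):
    """Split .net text into (vertices, relations).

    vertices  -> list of (number, label) tuples
    relations -> list of (source, target) tuples
    Only *Vertices, *Arcs and *Edges sections are handled in this sketch.
    """
    vertices, relations = [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith("*"):
            words = line[1:].split()
            section = words[0].lower() if words else None
            continue                      # header line, no data on it
        fields = shlex.split(line)        # keeps quoted labels together
        if section == "vertices":
            vertices.append((int(fields[0]), fields[1]))
        elif section in ("arcs", "edges"):
            relations.append((int(fields[0]), int(fields[1])))
    return vertices, relations

vertices, relations = split_net(SAMPLE)
print(vertices)
print(relations)
```

The first object looks like an ordinary dataset (one row per vertex), while the second has a different number of rows and a different meaning, which is why a single rectangular read does not work.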

This is where my new command pajek2stata comes in. Pajek2stata lets you import .net files, storing the vertices part of the file as your Stata dataset and the relations part as a matrix in Mata with a name of your choice. The syntax is pajek2stata using fname, name(name) [clear replace], in which fname is the name of your Pajek file and name is the name of the Mata matrix that will hold the network part of the data.

Why store the network part of the data in Mata? Unlike other statistical packages, Stata can have only one dataset in memory at a time. So, if you are loading a network dataset that naturally consists of two objects, you have to put the second part somewhere. Mata, then, is a good option (a Stata matrix could be another) because it allows you to easily interact with the data, and it has no trouble handling the different shapes that the network parts of .net files can take.

At this point you might wonder: why would I want to load Pajek data in Stata in the first place? I can think of a number of reasons. First, you might have a .net file with information on vertices that you want to analyze in Stata. Pajek2stata provides an easy way to access those data, even if you are not interested in the network part of the data. Second, once you have the network data in Mata, this allows you to use those data in a very flexible way, for example by programming your own network measures. I will post some examples of that later.
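As a rough idea of what "programming your own network measures" can look like, here is a small sketch that computes out- and in-degree from an adjacency matrix. It is written in Python purely for illustration (the same logic would be row and column sums in Mata); the function name degrees and the example matrix are made up.

```python
def degrees(adj):
    """Return (out_degree, in_degree) for a square adjacency matrix.

    adj[i][j] != 0 is read as a tie from vertex i+1 to vertex j+1.
    """
    n = len(adj)
    # Out-degree: number of nonzero cells in each row.
    out_degree = [sum(1 for w in row if w != 0) for row in adj]
    # In-degree: number of nonzero cells in each column.
    in_degree = [sum(1 for i in range(n) if adj[i][j] != 0) for j in range(n)]
    return out_degree, in_degree

# Made-up directed network on three vertices: 1->2, 1->3, 2->3.
example = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
out_degree, in_degree = degrees(example)
print(out_degree, in_degree)
```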

One challenge in analyzing your network data like this in Mata is that the data can take different shapes in .net files: they can be square N x N adjacency matrices, or they can be two-column edge- or arclists. Pajek2stata simply loads the data in the shape that the .net file provides. I’ll write more later about how to get from one shape to the other.
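To show how the two shapes relate, here is a minimal sketch of converting a two-column arclist to an N x N adjacency matrix and back. It is in Python for illustration only and is not the approach pajek2stata or any later post necessarily takes; the function names and the assumption that vertices are numbered 1 to N are mine.

```python
def arclist_to_adjacency(arcs, n, directed=True):
    """Build an n x n 0/1 adjacency matrix from (source, target) pairs.

    Vertices are assumed to be numbered 1..n, as in the vertices block.
    With directed=False each pair is treated as an edge and mirrored.
    """
    adj = [[0] * n for _ in range(n)]
    for source, target in arcs:
        adj[source - 1][target - 1] = 1
        if not directed:
            adj[target - 1][source - 1] = 1
    return adj

def adjacency_to_arclist(adj):
    """Return the (source, target) pairs for every nonzero cell."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(adj)
        for j, w in enumerate(row)
        if w != 0
    ]

arcs = [(1, 2), (2, 3)]
adj = arclist_to_adjacency(arcs, n=3)
print(adj)
print(adjacency_to_arclist(adj))
```

Note that the arclist alone does not tell you N: isolated vertices appear only in the vertices part, so the number of vertices has to be passed in separately.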

At the moment, pajek2stata can only handle simple .net files containing no more than one network specification. Pajek also supports .net files with multiple network specifications for the same set of vertices. Future versions of pajek2stata may be able to deal with such files.

In case you’d like to export data from Stata to Pajek, stata2pajek (written by Gabriel Rossman) can do the job. At the moment, the two programs do not combine smoothly, in the sense that you cannot import a .net file with pajek2stata and immediately export it using stata2pajek (because the latter assumes an edge- or arclist stored as a Stata dataset), but there are workarounds for that. Again, more later.

To install pajek2stata, type ssc install pajek2stata.