MPI etc

1. Using revolve-door unrank function in textbook to generate index array for subset combination. So, we can generate arbitary subset by given its rank. This is used for partition our searching job. This method is comparatively a bit slow than other "successor" function. (And this is only my original estimation. After I made a program to compare the three algorithm I found out that "unrank" is extremely slow, about 200 times slower than "successor".

2. Improved on basis of first version so that two consecutive subset combination don't need calculate repeated xor. Instead
I compare the difference in two subsets and use previous result to xor with two changed vector. In order to keep both
current and previous subset combination, I use two integer array, oldArray and newArray. instead of using a pointer to point to them alternatively in my previous version, I have to backup newArray into oldArray because Knuth's "successor" is depending on previous results and modifies them directly.

3. Based on second version and use MPI to distribute computation to other nodes. My idea is straight forward. If we can
start any computation by giving a rank, then we can divide the total computation job with different range of rank. That's
why I implement my algorithm by using "unrank" function.
4. Root node read from data file and broadcast n,k,d and vector to all other nodes.
5. Then all slave nodes start computation by iterate all subset size from 1 to k. And each node knows its part of range
because of its MPI_Rank. For example, for n=32,k=16,d=7 the total number of combination of subset 5 is (16,5)=11440. If
MPI_Size=9 which means 8 slave nodes, then each node will be given a range of 11440/8 and they don't need further instruction
from root node and can start working as soon as they receive all setup information.
5. I choose to let root node be specializing in listening to simplify programming because we don't know when and which node
will report discovery. So, we have to use non-blocking "recv" method in MPI. If I give computation job to root node, I have
to use multi-thread for listening which is too complicated and has only suspicious performance gain. Therefore, the root
node will "post" a recieve request to all slave nodes and then "test" requests in a loop. The slave nodes will first answer
"yes" or "no" to discovery. If "yes", it will send combination of vector which has smaller weight than d seperately. If "no",
no further informatio is sent. By counting "no" replies or receiving "yes" reply, root node can decide if we find or not.

6. Originally I write my program in windows and in order to run and test in Linux, I have to change some data type. For example, in windows, "size_t" is one of primitive type while in Linux, I have to typedef it to "unsigned int". And in windows, I am using VC++ special type __int64 which is 64 bit integer while in Linux, I have to type define it to "long long". So, if you need to run them in windows, comment out those typedef's in "myset.h".

7. One of few things changed here is just to adjust the searching scope from n to d which reduces greatly so that all test

cases become reachable. I made myself a fool by writing email and claimed that test case 7 required three years to finish.

The other thing I changed here is how to decide running result. I read the requirement and divide the task into two parts.

One is to confirm if the code has a linear combination of weight equal to d. The other one is to find out if there is a

If in any searching node, a smaller weight is found, the size of that subset is sent and program returns by sending all combinations in an array.

If only a minimal weight exactly equal to d is found, the size of that subset is sent as size*(-1) and that particular combination is sent in an array.

Post receiving requests on each slave node and test in a big loop until one of the following two conditions are satisfied: a weight smaller than d is found or all slave nodes returns. (either 0 or negative)

Originally I thought like this when I didn't give one second to reading the requirement.

Here it goes:

I think the task is just mission impossible because I only finished case 5 with 22945 seconds which is roughly six hours more. But please note that I am using MPI to run this program on 9 high performance servers and I am already using the optimize "xor" strategy mentioned by prof. in class which means I only "xor" the last result with two changed vector. Still it costs more then six hours. And case 6 is more than 10 times bigger in size of subset combination than case 5. And case 7 is even six thousand times bigger than case 5 . (Purely in number of subset combination: 2 ^(52-36)) And I haven't mentioned about the length of vector which is much longer than case 5.

So, my estimate is that to finish case 7, I may need more than three years. Is that correct estimation?

Here it comes:

I took a quick look at the attached program (hopefully deleted from
this email). I think the program can still be speeded up. I think you
are only using 8 bits per word, whereas one should use all 32 bits.
For example, for case 5, with n=72, you should only need 3 words to
store the bit array, but your program seems to be using 72/8=9 words.
You can speed up your program by a factor of 4 just by packing the bits
tighter.

Another point is that your "look" table is only of size 2^8=256. If you
use a table of size 2^16, it will go faster (provided you do pack 32
bits into one word).

　

Here it goes:

By the way, I want to take back part of comment last time on error-correcting-code when I said it is mission impossible. First I didn't read question carefully and my program is actually testing all subset combination from size 1 to size k instead of required from 1 to d. This is a big difference. And I modified my program this morning and rerun the test. It looks like this:

test5 runs 649 seconds. test6 has been running for more than several hours.

And I made a simple calculation comparison between test6 and test5.

test5= (36,1)+(36,2)+(36,3)+...(36,12)

test6= (40,1)+(40,2)+(40,3)+...(40,16)

test6/test5=65

This means at least test6 needs to run for about 65x649 seconds

And test7=(52,1)+(52,2)+...(52,20)

test7/test5=126909

So, test7 needs to run for about 126909x649seconds=953(days) It should be running a little less than three years.

Of course I haven't made the optimization mentioned by you last time, however, I feel the speedup is quite limited. My reason is that for each round of operation, I have a lot of work to be done.

1. backup current index array in an old array because I need two array to compare to find out which two index are changed. 2. a rather complicated algorithm to compare two index array to find out which two index array are changed. (Maybe homogeneous order can save the work a bit.)

3. Knuth's algorithm to generate next subset.

4. xor previous result with two changed rows to get new result 5. check order of current result

You see these operations will probably have more than one thousand instructions and what you suggested may save just a few instructions here. By Amdahl's law in speedup in computer achitecture, the total speedup is just very small.

Here it goes:

I thought you might be interested in knowing the running result of my MPI program in test6. It runs roughly 12 hours which is a quite precise match with my prediction in my last email: test5 runs 649seconds and test6 is roughly 65 times bigger than test5. So, I expect it runs for more than 11 hours. At least my program is consistent in two test cases. This makes me suspect how fast is my classmate's program who claimed to finish test6 within a few hours with 4 cpu and 4 threads. How come his desktop can defeat our cluster computer? I am really curious about his program!

I let my program run for test7, but I will probably stop it in a few days because it seems to need to run for more than two years.

1. mpi.cpp  (main for MPI, compiled by make mpi)

2. main.cpp (main for local single node, compiled by make myset)

3. myset.h

4. myset.cpp

5. revolve.h

6. revolve.cpp

7. makefile

to compile:
mpic++ mpi.cpp myset.cpp revolve.cpp -o mpimyset.exe

to run:
>time mpirun -np 9 mpimyset.exe test5.txt >result5.txt &

And here is the data file you need to feed when running

file name: mpi.cpp

#include "myset.h"
#include "revolve.h"
#include <iostream>
#include <mpi.h>
#include <time.h>
//#include <math>

using namespace std;

//#define myabs(x) ((x>0)?x:(-x))
int myabs(int x)
{
	return (x>0)?x:(-x);
}

extern size_t n, k, d;
extern size_t* buffer;//please note the master and slave has different way to initialize buffer
extern MySet* sets;
extern MySet current;
extern size_t bufferSize;
extern size_t* resultArray;//the pointer
extern int resultSize;


int mpi_rank, mpi_size;
bool hasSmaller=false;

size_t context[3]; //n,k,d

int main(int argc, char* argv[])
{
	time_t start;

	MPI_Init(&argc, &argv);                 /* Initialize MPI       */
        MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank); /* Get my rank          */
    	MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);   

	if (mpi_rank==0)
	{
		start=time(0);
		int i,  counter=0;
		//indicating which node returns
		int* flags=new int[mpi_size-1];
		//contain the result for each node
		int *results=new int[mpi_size-1];
		MPI_Status  status;
		MPI_Request* requests=new MPI_Request[mpi_size-1];
		if (argc<2)
		{
			cout<<"usage: myset.exe filename\n";
			MPI_Finalize();
			return 1;
		}
		readFromFile(argv[1]);
		initialize(n, k, d, buffer);
		context[0]=n;
		context[1]=k;
		context[2]=d;
		MPI_Bcast(context, 3, MPI_INT, 0, MPI_COMM_WORLD);
		MPI_Bcast(buffer, bufferSize, MPI_INT, 0, MPI_COMM_WORLD);
		for (i=1; i<mpi_size; i++)
		{
			flags[i-1]=0;
			MPI_Irecv(results+i-1, 1, MPI_INT, i, 0, MPI_COMM_WORLD, requests+ i-1);			
		}
		while (counter<mpi_size-1&&!hasSmaller)
		{
			for (i=1; i<mpi_size; i++)
			{
				if (!flags[i-1])
				{
					MPI_Test(requests+i-1, flags+i-1, &status);
					if (flags[i-1])//negative means find the combination equal to d
					{
						counter++;//counter inc anyway
						//means we find smaller than d
						size_t resultLength=myabs(results[i-1]);
						//cout<<"resultLength="<<resultLength<<endl;
						int* dataBuf=new int[resultLength];
						if (results[i-1]>0)
						{
							cout<<"node "<<i<<" has smaller weight"<<endl;
							hasSmaller=true;
						}
						else
						{
							if (results[i-1]<0)
							{
								cout<<"node "<<i<<" has found a combination of weight equal to d\n";
							}
						}
						if (resultLength>0)
						{
							MPI_Recv(dataBuf, resultLength, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
							cout<<"The combination is:\n[";
							for (int j=0; j<resultLength; j++)
							{
								cout<<dataBuf[j];
								if (j!=resultLength-1)
								{
									cout<<",";
								}
							}
							cout<<"]\n";
							if (results[i-1]>0)
							{
								break;
							}
						}
					}
				}
			}
			if (counter==mpi_size-1&&!hasSmaller)
			{
				cout<<"no small weight found in any node\n";
				break;
			}
		}
		cout<<"the total time in seconds is "<<time(0)-start<<endl;

	}
	else
	{
		long long start, total, mySize;
		int size;
		size_t subset;
		MPI_Bcast(context, 3, MPI_INT, 0, MPI_COMM_WORLD);
		n=context[0];
		k=context[1];
		d=context[2];
		size=n/32;
		if (n%32!=0)
		{
			size++;
		}
		bufferSize=size*k;
		buffer=new size_t[bufferSize];
		MPI_Bcast(buffer, bufferSize, MPI_INT, 0, MPI_COMM_WORLD);
		initialize(n,k,d, buffer);
		for(subset=1; subset<=d; subset++)
		{
			total=combineNumber(k, subset);
			mySize= total/(mpi_size-1);
			start=mySize*(mpi_rank-1);
			//if I am the last one, I have to take care all of rest part
			if (mpi_rank==mpi_size-1)
			{
				mySize=total-mySize*(mpi_size-2);
			}
			//false means we find the smaller than d
			if (!weightTest(start, mySize, subset))
			{
				//resultSize=subset;//true
				//memcpy(resultArray, newArray, sizeof(size_t)*(n+1));				
				break;
			}
		}
		//cout<<"node "<<mpi_rank<<endl;
		MPI_Send(&resultSize, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
		//cout<<"myabs="<<myabs(resultSize)<<endl;
		if (resultSize!=0)
		{
			MPI_Send(resultArray, myabs(resultSize), MPI_INT, 0, 0, MPI_COMM_WORLD);
		}
	}
	
	MPI_Finalize();                         
	return 0;
}

file name: myset.h

#ifndef MYSET_H
#define MYSET_H

typedef unsigned char HashTableType;
typedef long long __int64;
typedef unsigned int size_t;

const size_t DefaultGroundSetSize=32;//4*8=32
const size_t DigitBitCount=sizeof(size_t)*8;//one byte
const size_t DefaultBitState=0;
const size_t HashTableSize=256;
const size_t DeleteMask=0xFFFFFFFF;
const size_t ComplementMask=DeleteMask;
const size_t SetOrderMask=0xFF;
const size_t SetOrderMaskCount=sizeof(size_t)/sizeof(HashTableType);

const size_t MaxGroundSetSize=1024;
const size_t MaxDigitCount=MaxGroundSetSize/8/sizeof(size_t);

size_t fact(size_t size);

size_t combine(size_t n, size_t k);

class MySet
{
public:
	//constructor
	MySet(size_t size=DefaultGroundSetSize);
	~MySet();

	//copy constructor
	MySet(const MySet& copy);

	//basic set operations
	
	//unary operations
	bool MemberOfSet(size_t index) const;
	void SetInsert(size_t index);
	void SetDelete(size_t index);
	size_t SetOrder();
	MySet Complement();

	//binary operations
	MySet Union(const MySet& other);	
	MySet Intersection(const MySet& other);	
	MySet Difference(const MySet& other);

	//operator overloading
	//set assignment
	MySet& operator=(const MySet& other);
	MySet& operator=(const size_t* array);

	//xor operation
	MySet& operator^(const MySet& other);
	
	bool operator==(const MySet& other);
	bool operator[](size_t index);
	

	//utility operations
	void display() const;
	void initialize(size_t size);
	void randomSet();
	size_t getSize(){ return groundSetSize;}	
	void flip(size_t index);
	void createByArray(size_t array[], size_t arraySize);

	
	//special set utility operations
	size_t LexRank();	
	void LexUnrank(size_t number);

	//subset utility
	//lexico
	size_t SubsetLexRank(size_t subsetSize);
	void SubsetLexUnRank(size_t subsetSize, size_t rank);
	void SubsetLexSuccessor(size_t subsetSize);

	//colex
	void SubsetColexSuccessor(size_t subsetSize);

	int getDigitCount()const {return digitCount;}
protected:
	static HashTableType hashTable[HashTableSize];
	static bool need2hash;
	int groundSetSize;
	size_t digitCount;
	size_t* digits;
	//int lastDigitSize;
	void initHashTable();
	void checkSize(size_t index)const;
	void checkSize(const MySet& other) const;
	void setSize(size_t size);//this should not be called by user!!!! try initialize(i);
	
	void clear();
	
};

#endif

file name: myset.cpp

#include "myset.h"
#include <iostream>

//typedef  int* set ;

using namespace std;

//pure for testing
long lastResult;



HashTableType MySet::hashTable[256];

bool MySet::need2hash=true;

MySet::MySet(size_t size)
{
	groundSetSize=0;
	initialize(size);
	//because we only need to initialize hash table once for all set objects
	if (need2hash)
	{
		initHashTable();
		need2hash=false;
	}
}

MySet::~MySet()
{
	delete[]digits;
}

//a copy constructor
MySet::MySet(const MySet& copy)
{
	//set groundsetsize to 0 to indicate this is the first time creating set
	groundSetSize=0;
	initialize(copy.groundSetSize);
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=copy.digits[i];
	}
}


void MySet::initHashTable()
{
	size_t i, j;
	//int lowIndex=0, highIndex=1;
	//initialize half byte first
	hashTable[0]=0;
	hashTable[1]=1;
	hashTable[2]=1;
	hashTable[3]=2;
	hashTable[4]=1;
	hashTable[5]=2;
	hashTable[6]=2;
	hashTable[7]=3;
	for (i=0; i<8; i++)
	{
		hashTable[8+i]=hashTable[i]+1;
	}
	//initialize one byte and use it as a pattern in following
	for (i=1; i<16; i++)
	{
		for (j=0; j<16; j++)
		{
			hashTable[i*16+j]=hashTable[i]+hashTable[j];
		}	
	}
}

void MySet::display() const
{
	cout<<"\n[";
	for (int i=0; i<groundSetSize; i++)
	{
		if (MemberOfSet(i))
		{
			cout<<"1";
		}
		else
		{
			cout<<"0";
		}
	}
	cout<<"]\n";
}


void MySet::checkSize(size_t index) const
{
	if (index>=groundSetSize)
	{
		cout<<"error exceeding boundary";
		exit(1);
	}
}

//difference=A intersection complement(B)
MySet MySet::Difference(const MySet& other)
{
	checkSize(other);
	MySet result=other;
	result.Complement();
	return Intersection(result);
}

MySet MySet::Complement()
{
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i] ^= ComplementMask;
	}
	return *this;
}

//element starts from 1 instead of 0
bool MySet::MemberOfSet(size_t index) const
{
	size_t i,j;
	checkSize(index);
	i=index/DigitBitCount;
	j=index%DigitBitCount;
	return (digits[i]>>j)&1==1;		
}

//assume element is starting from 1 instead of 0
void MySet::SetInsert(size_t index)
{
	size_t i,j;
	checkSize(index);
	i=index/DigitBitCount;
	j=index%DigitBitCount;
	digits[i]|=(((size_t)1)<<j);
}

//assume element starts from 1 instead of 0
void MySet::SetDelete(size_t index)
{
	size_t i,j;
	size_t mask=DeleteMask;
	checkSize(index);
	i=index/DigitBitCount;
	j=index%DigitBitCount;
	mask^=(1<<j);
	digits[i]&=mask;
}

void MySet::checkSize(const MySet& other) const
{
	if (groundSetSize!=other.groundSetSize)
	{
		cout<<"two set has different ground set size!\n";
		exit(1);
	}
}

//return value instead of dynamic new memory
MySet MySet::Union(const MySet& other)
{
	checkSize(other);
	//choose the bigger groundsetsize of two 

	MySet result(groundSetSize);

	for (size_t i=0; i<digitCount; i++)
	{
		result.digits[i]=digits[i] | other.digits[i];
	}
	return result;
}

//set assignment maybe use set variable with old data
MySet& MySet::operator =(const MySet& other)
{
	//initialize(other.groundSetSize);
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=other.digits[i];
	}
	return *this;
}

MySet& MySet::operator =(const size_t* array)
{
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=array[i];
	}
	return *this;
}


MySet& MySet::operator ^(const MySet& other)
{
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]^=other.digits[i];
	}
	return *this;
}

//set is equal only when the ground set size is equal and all element equal	
bool MySet::operator ==(const MySet& other)
{
	if (groundSetSize!=other.groundSetSize)
	{
		return false;
	}
	for (size_t i=0; i<digitCount; i++)
	{
		if (digits[i]!=other.digits[i])
		{
			return false;
		}
	}
	return true;
}

//set intersection
MySet MySet::Intersection(const MySet& other)
{
	checkSize(other);
	MySet result(groundSetSize);
	for (size_t i=0; i<digitCount; i++)
	{
		result.digits[i]=digits[i]&other.digits[i];
	}
	return result;
}


	
size_t MySet::SetOrder()
{
	size_t result=0;
	size_t index;
	for (size_t i=0; i<digitCount; i++)
	{
		for (size_t j=0; j<SetOrderMaskCount; j++)
		{
			index=(digits[i]>>(j*8))&SetOrderMask;

			result+=hashTable[index];
		}
	}

	return result;
}	


void MySet::flip(size_t index)
{
	size_t i,j;
	checkSize(index);
	i=index/DigitBitCount;
	j=index%DigitBitCount;
	digits[i]^=((size_t)1<<j);	
}


void MySet::setSize(size_t size)
{
	groundSetSize=size;

	//DigitBitCount=ByteSize=8
	digitCount=groundSetSize/DigitBitCount;
	//last digit size is stored so that calculation will only be 
	//done once for all

	//if modulo 8==0, we don't need extra digit
	if (groundSetSize%DigitBitCount!=0)
	{
		digitCount++;
	}

}


void MySet::initialize(size_t size)
{
	//DefaultBitState is 0 for unset;
	
	if (size>0)
	{
		if (groundSetSize>=size)
		{
			//we don't have to reallocate memory
			setSize(size);
		}
		else
		{
			if (groundSetSize!=0)
			{
				delete[]digits;
			}
			setSize(size);
			
			digits= new size_t[digitCount];
			if (digits==NULL)
			{
				cout<<"unable allocate memory for ground set size of "<<groundSetSize<<endl;
				exit(1);
			}	
		}
		//memset(digits, DefaultBitState, digitCount);
		for (size_t i=0; i<digitCount; i++)
		{
			digits[i]=0;
		}

		//memset(digits, 0, digitCount);
	}
	else
	{
		groundSetSize=digitCount=0;
		digits=NULL;
	}
}


//colex
void MySet::SubsetColexSuccessor(size_t subsetSize)
{
	size_t result=0, i,j;
	size_t* array;
	if (SetOrder()!=subsetSize)
	{
		cout<<"subset size is not what it should be!\n";
		exit(1);
	}
	//first create an index array
	array=new size_t[subsetSize+1];
	array[0]=0;
	j=0;
	//translate into index array
	for (i=groundSetSize; i>=1; i--)
	{
		if (MemberOfSet(i-1))
		{
			array[j+1]=i;
			j++;
		}
	}

	i=1;
	while (i<=subsetSize&&array[i]==groundSetSize-i+1)
	{
		i++;
	}
	if (i==subsetSize+1)
	{
		cout<<"undefined";		
	}
	else
	{
		result=array[i];
		for (j=i; j<=subsetSize; j++)
		{
			array[j]=result+1-(j-i);
		}
		j=subsetSize;
		for (i=0; i<groundSetSize; i++)
		{
			if (i+1==array[j])
			{
				SetInsert(i);
				j--;
			}
			else
			{
				SetDelete(i);
			}
		}
	}
	delete[]array;
}


//this is limited within groundset size of 32 or less
//the order is lexicographic order
void MySet::SubsetLexUnRank(size_t subsetSize, size_t rank)
{
	size_t x=1, temp;
	if (subsetSize>groundSetSize)
	{
		cout<<"wrong subset size!\n";
		exit(1);
	}
	clear();
	for (size_t i=1; i<=subsetSize; i++)
	{
		temp=combine(groundSetSize-x, subsetSize-i);
		while (temp<=rank)
		{
			rank-=temp;
			x++;
			temp=combine(groundSetSize-x, subsetSize-i);
		}
		SetInsert(x-1);
		x++;
	}
}

//this is limited within groundset size of 32 or less
//the order is lexicographic order
void MySet::SubsetLexSuccessor(size_t subsetSize)
{
	size_t result=0, i,j;
	size_t* array;
	if (SetOrder()!=subsetSize)
	{
		cout<<"subset size is not what it should be!\n";
		exit(1);
	}
	//first create an index array
	array=new size_t[subsetSize+1];
	array[0]=0;
	j=0;
	//translate into index array
	for (i=0; i<groundSetSize; i++)
	{
		if (MemberOfSet(i))
		{
			array[j+1]=i+1;
			j++;
		}
	}

	i=subsetSize;
	while (i>=1&&array[i]==(groundSetSize-subsetSize+i))
	{
		i--;
	}
	if (i==0)
	{
		cout<<"undefined";		
	}
	else
	{
		result=array[i];
		for (j=i; j<=subsetSize; j++)
		{
			array[j]=result+1+j-i;
		}
		j=1;
		for (i=0; i<groundSetSize; i++)
		{
			if (i+1==array[j])
			{
				SetInsert(i);
				j++;
			}
			else
			{
				SetDelete(i);
			}
		}
	}
	delete[]array;
}

		



//this is limited within groundset size of 32 or less
//the order is lexicographic order
size_t MySet::SubsetLexRank(size_t subsetSize)
{
	size_t result=0, i,j;
	size_t* array;
	if (SetOrder()!=subsetSize)
	{
		cout<<"subset size is not what it should be!\n";
		exit(1);
	}
	//first create an index array
	array=new size_t[subsetSize+1];
	array[0]=0;
	j=0;
	//translate into index array
	for (i=0; i<groundSetSize; i++)
	{
		if (MemberOfSet(i))
		{
			array[j+1]=i+1;
			j++;
		}
	}

	
	for (i=1; i<=subsetSize; i++)
	{
		if (array[i-1]+1<array[i])
		{
			for (j=array[i-1]+1; j<array[i]; j++)
			{
				result+=combine(groundSetSize-j, subsetSize-i);
			}
		}
	}
	delete []array;
	return result;
}


void MySet::createByArray(size_t array[], size_t arraySize)
{
	//let's have a simple groundsetsize as for 
	initialize(arraySize*sizeof(size_t)*8);

	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=array[i];
	}
}

//I would leave big number problem to future
size_t MySet::LexRank()
{
	//temporarily I would assume there is only a small set
	return digits[0];
}


//I would wait for future for big ground set problem
void MySet::LexUnrank(size_t number)
{
	initialize(sizeof(size_t)*8);
	digits[0]=number;
}


void MySet::randomSet()
{
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=rand();
	}
}


bool MySet::operator [](size_t index)
{	
	return MemberOfSet(index);
}

void MySet::clear()
{
	size_t mask=0;
	for (size_t i=0; i<digitCount; i++)
	{
		digits[i]=mask;
	}
}


size_t fact(size_t size)
{
	size_t result=1;
	for (int i=2; i<=size; i++)
	{
		result*=i;
	}
	return result;
}


size_t combine(size_t n, size_t k)
{
	size_t result=1;
	if (n<k)
	{
		cout<<"combine number is wrong!\n";
		exit(1);
	}
	for (int i=0; i<k; i++)
	{
		result*=(n-i);
	}
	return result/fact(k);
}

file name: revolve.h

#include <iostream>
#include "myset.h"

using namespace std;

void displayArray(size_t* array, size_t start, size_t size);
void revolveUnRank(long theRank, size_t subSize, size_t* array);
void readFromFile(char* fileName);
bool weightTest(__int64 startRank, __int64 count, int subSize);
void initialize(size_t theN, size_t theK, size_t theD, size_t* array);
__int64 combineNumber(size_t n, size_t k);
void algoR(size_t k, size_t* array);
const int NoResult=-1000;

file name: revolve.cpp

#include "revolve.h"
#include "myset.h"

#include <iostream>
#include <fstream>


using namespace std;


size_t groundsetSize, subsetSize;

size_t n, k, d;
size_t* buffer;
size_t bufferSize;
MySet* sets;
int resultSize=0;



MySet current;
//bool findEqual=false;
size_t* resultArray=NULL; //this is used to return the array with weight of d
size_t* oldArray=NULL;
size_t* newArray=NULL;
size_t* c;


__int64  combineNumber(size_t n, size_t k)
{
	__int64 result =1;
	for (size_t i=0; i<k; i++)
	{
		result=result*(n-i)/(i+1);
	}

	return result;
}


void revolveUnRank(__int64 theRank, size_t subSize, size_t* c)
{
	__int64 last=-1, curr=-1, r=theRank;
	int i;
	size_t x=groundsetSize;
	subsetSize=subSize;

	for (i=subsetSize; i>=1; i--)
	{
		do
		{
			last=curr;
			curr=combineNumber(x, i);
			if (curr>r)
			{
				x--;
			}
			else
			{
				break;
			}
		}while (true);
	
		if (last==r)
		{
			c[i]=x-1;	
		}
		else
		{
			c[i]=x;	
		}
		r=last-r-1;
	}

}

int findSmallest(int& oldIndex, int& newIndex)
{
	while (oldIndex<=subsetSize&&newIndex<=subsetSize)
	{
		if (oldArray[oldIndex]==newArray[newIndex])
		{
			oldIndex++;
			newIndex++;
		}
		else
		{
			if (oldArray[oldIndex]<newArray[newIndex])
			{
				//must make easy for next 
				return oldArray[oldIndex++];
			}
			else
			{				
				return newArray[newIndex++];
			}
		}
	}
	
	if (oldIndex==newIndex&&oldIndex==subsetSize+2)
	{
		cout<<"find smallest failed, check array!\n";
		exit(1);
	}
	else
	{
		if (oldIndex<newIndex)
		{
			return oldArray[oldIndex];
		}
		else
		{
			return newArray[newIndex];
		}
	}
}

void displayArray(size_t* array, size_t start, size_t size)
{
	cout<<"[";
	for (size_t i=start; i<size; i++)
	{
		cout<<array[i];
		if (i!=start+size-1)
		{
			cout<<",";
		}
	}
	cout<<"]\n";
}					

void findChangedVectorIndex(int& first, int& second)
{
	int oldIndex=1, newIndex=1;
	first=findSmallest(oldIndex, newIndex);
	second=findSmallest(oldIndex, newIndex);
}

//return true means there is no exception, and false means we find smaller d
bool weightTest(__int64 startRank, __int64 count,  int subSize)
{
	subsetSize=subSize;
	int left, right;
	for (__int64 i=0; i<count; i++)
	{
		//I cannot use two arrays to alternatively, it is too complicated
		/*
		for (int j=0; j<subSize+1; j++)
		{
			oldArray[j]=newArray[j];
		}
		*/
		memcpy(oldArray, newArray, sizeof(size_t)*(n+1));
		//for the first time, we need this to initialize
		if (i==0)
		{
			revolveUnRank(startRank+i, subSize, newArray);
			//displayArray(newArray, 1, subSize);
		}
		else
		{
			algoR(subSize, newArray);
			//displayArray(newArray, 1, subSize);
		}

		if (i==0)
		{
			//things become more complicated because unrank starts from index 1
			current=sets[newArray[1]];
			for (int j=2; j<subSize+1; j++)
			{
				current^(sets[newArray[j]]);
			}
		}
		else
		{
			findChangedVectorIndex(left, right);//not the index of array
			current^(sets[left]);
			current^(sets[right]);
		}
		size_t temp=current.SetOrder();	
		if (temp<d)
		{
			/*
			cout<<"this means we find the combination with weight smaller than d:\n";
			for (int m=1; m<subSize+1; m++)
			{
				cout<<newArray[m]<<",";
			}
			*/
			resultSize=subSize;
			//resultArray[0]=subSize;
			memcpy(resultArray, newArray+1, sizeof(size_t)*subSize);
			return false;//means we find the 
		}
		else
		{
			if (temp==d)
			{
				//findEqual=true;
				resultSize=-subSize;
				//resultArray[0]=-subSize;
				memcpy(resultArray, newArray+1, sizeof(size_t)*subSize);
			}
		}
	}
	return true;
}

void initialize(size_t theN, size_t theK, size_t theD, size_t* array)
{
	groundsetSize=theK;
	n=theN;
	k=theK;
	d=theD;

	sets=new MySet[k];
	for (int i=0; i<k; i++)
	{
		sets[i].initialize(n);
		//this may seem to be odd, I overload the operator= for size_t* to copy array
		sets[i]=array+(i*sets[i].getDigitCount());
	}
	current.initialize(n);
	oldArray=new size_t[n+1];
	newArray=new size_t[n+1];
	resultArray=new size_t[n+1];	
}


void readFromFile(char* fileName)
{
	//size_t* array;
	size_t temp, size, mask=1;
	int i, j;
	int number;
	ifstream in(fileName);
	in>>n;
	cout<<"n="<<n<<", ";
	in>>k;
	cout<<"k="<<k<<", ";
	in>>d;
	cout<<"d="<<d<<"\n";

	size=n/32;
	if (n%32!=0)
	{
		size++;
	}
	bufferSize=size*k;

	buffer=new size_t[bufferSize];
	for (i=0; i<k; i++)
	{		
		number=-1;//the number of int
		cout<<"[";
		for (j=0; j<n; j++)
		{
			if (j%32==0)
			{
				number++;
				buffer[i*size+number]=0;
			}
			in>>temp;		
			cout<<temp;
			if (temp!=0)
			{
				buffer[i*size+number] |= (mask<<j);				
			}	
		}
		cout<<"]\n";
	}	
}


void algoR(size_t t, size_t* c)
{
	size_t j;

	if (t%2==1)
	{
		if (c[1]+1<c[2])
		{
			c[1]++;
			return;
			//goto R2;
		}
		else
		{
			j=2;
			goto R4;	
		}
	}
	else
	{
		if (c[1]>0)
		{
			c[1]--;
			return;
			//goto R2;
		}
		else
		{
			j=2;
			goto R5;
		}
	}

R4:
	if (c[j]>=j)
	{
		c[j]=c[j-1];
		c[j-1]=j-2;
		return;
		//goto R2;
	}
	else
	{
		j++;
	}
R5:
	if (c[j]+1<c[j+1])
	{
		c[j-1]=c[j];
		c[j]++;
		return;
		//goto R2;
	}
	else
	{
		j++;
		if (j<=t)
		{
			goto R4;
		}
	}
}

file name: makefile

myset : main.cpp myset.cpp myset.h revolve.cpp revolve.h
g++ -g main.cpp myset.cpp revolve.cpp -o myset.exe


mpi : mpi.cpp myset.cpp myset.h revolve.cpp revolve.h
mpic++ mpi.cpp myset.cpp revolve.cpp -o mpimyset.exe