Proj1

Build the inverted index for a given set of documents (compute the term weights by TF-IDF as shown in slide 5 of Chapter 4, using base 10 logarithm). Ignore the letter case, i.e., consider all words as lower case.

Input files:
Each line is in the format “DocID DOC”, where DocID is the ID of the document and DOC is the document content (a list of terms). For example:

0 Construct Inverted Index
1 using MapReduce
2 inverted index code
3 index index file

Output:
If a term W is contained in N documents, the corresponding output for this term contains N lines, in the format “W\tDocID1, weight1”, “W\tDocID2, weight2”, …, “W\tDocIDN, weightN”. The DocIDs are sorted in ascending order, and the term weights are of double precision (stored in double/DoubleWritable). Given the example input, the output file looks like this (assumed in one file):

code\t2, 0.6020599913279624
construct\t0, 0.6020599913279624
file\t3, 0.6020599913279624
index\t0, 0.12493873660829993
index\t2, 0.12493873660829993
index\t3, 0.24987747321659987
inverted\t0, 0.3010299956639812
inverted\t2, 0.3010299956639812
mapreduce\t1, 0.6020599913279624
using\t1, 0.6020599913279624
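The sample weights are consistent with weight = tf × log10(N / df), where tf is the raw count of the term in the document, N is the total number of documents, and df is the number of documents containing the term. This formula is an assumption inferred from the sample values, since the referenced slide is not reproduced here. The following is a minimal single-process sketch in plain Java (no Hadoop, so it is not a MapReduce solution) that can be used to sanity-check weights on small inputs. The class name TfIdfSketch and the method buildIndex are hypothetical, and DocIDs are assumed to be integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (no Hadoop) for checking TF-IDF weights on small inputs. */
final class TfIdfSketch {

    /**
     * Turns input lines "DocID DOC" into index lines "term\tDocID, weight".
     * Assumed weight: tf(term, doc) * log10(N / df(term)).
     */
    static List<String> buildIndex(List<String> lines) {
        // term -> (docId -> term frequency); TreeMaps keep terms and DocIDs sorted.
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        int numDocs = 0;
        for (String line : lines) {
            // Lower-case everything, as the task ignores letter case.
            String[] tokens = line.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (tokens[0].isEmpty()) {
                continue; // skip blank lines
            }
            int docId = Integer.parseInt(tokens[0]); // assumption: integer DocIDs
            numDocs++;
            for (int i = 1; i < tokens.length; i++) {
                postings.computeIfAbsent(tokens[i], t -> new TreeMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            int df = term.getValue().size(); // documents containing the term
            double idf = Math.log10((double) numDocs / df);
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                double weight = posting.getValue() * idf;
                out.add(term.getKey() + "\t" + posting.getKey() + ", " + weight);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "0 Construct Inverted Index",
                "1 using MapReduce",
                "2 inverted index code",
                "3 index index file");
        buildIndex(docs).forEach(System.out::println);
    }
}
```

Run on the four sample documents, this should produce ten lines in the same order as the sample output, with weights that agree up to floating-point rounding. In an actual MapReduce job, N and df are not available in a single pass over one record, so a real solution needs some way to obtain them (for example a counter or an extra pass). That design is left open here.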
Code format:
Name your package “proj1” and name your driver class “Project1.java”. Your program should receive 3 parameters: the input folder, the output folder, and the number of reducers. Finally, package all your Java files as a zip file named “InvertedIndex.zip”.

Compile:
Your Java code will be compiled and packaged as a jar file, and we will use the following commands to check the correctness of your solution:

$ $HADOOP_HOME/bin/hadoop jar YOURJAR.jar YOURCLASS input output 1
$ $HADOOP_HOME/bin/hdfs dfs -cat output/*

Please ensure that the code you submit can be compiled and packaged. Any solution that has compilation errors will receive no more than 5 points for the entire assignment.

Documentation and code readability
Your source code will be inspected and marked based on readability and ease of understanding. The documentation (comments in the code) in your source code is also important. Below is an indicative marking scheme:

Result correctness: 18
Efficiency: 5
Code structure, Readability, and Documentation: