I want to take a List<string[]> as input and produce a dictionary as output. The keys are the unique strings, used as an index, and each value is an array of floats in which each position holds the count of that key in the corresponding string[] of the List<string[]>.
Here is what I have attempted so far:
    static class CT
    {
        //Counts all terms in array
        public static Dictionary<string, float[]> Termfreq(List<string[]> text)
        {
            List<string> unique = new List<string>();
            foreach (string[] s in text)
            {
                List<string> groups = s.Distinct().ToList();
                unique.AddRange(groups);
            }
            string[] index = unique.Distinct().ToArray();
            Dictionary<string, float[]> countset = new Dictionary<string, float[]>();
            return countset;
        }
    }

    static void Main()
    {
        /* local variable definition */
        List<string[]> doc = new List<string[]>();
        string[] a = { "That", "is", "a", "cat" };
        string[] b = { "That", "bat", "flew", "over", "the", "cat" };
        doc.Add(a);
        doc.Add(b);
        // Console.WriteLine(doc);
        Dictionary<string, float[]> ret = CT.Termfreq(doc);
        foreach (KeyValuePair<string, float[]> kvp in ret)
        {
            Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, kvp.Value);
        }
        Console.ReadLine();
    }
I got stuck on the dictionary part. What is the most effective way to implement this?
It sounds like you could use something like:
    var dictionary = doc
        .SelectMany(array => array)
        .Distinct()
        .ToDictionary(word => word,
                      word => doc.Select(array => array.Count(x => x == word))
                                 .ToArray());
In other words, first find the distinct set of words, then for each word, create a mapping.
To create a mapping, look at each array in the original document, and find the count of the occurrences of the word in that array. (So each array maps to an int.) Use LINQ to perform that mapping over the whole document, with ToArray creating an int[] for a particular word... and that's the value for that word's dictionary entry.
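As a minimal sketch of how that query behaves on the sample data from the question, it could be wrapped in a helper and printed as below. The names TermCounts and PerArrayCounts are hypothetical, chosen only for this illustration, and string.Join is used so that the array contents are printed rather than the array's type name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the name is an assumption for this sketch.
static class TermCounts
{
    // Maps each distinct word to an array holding its count in each input array.
    public static Dictionary<string, int[]> PerArrayCounts(List<string[]> doc)
    {
        return doc
            .SelectMany(array => array)   // flatten all arrays into one word sequence
            .Distinct()                   // keep each word once
            .ToDictionary(
                word => word,
                // one count per array, in the same order as the arrays in doc
                word => doc.Select(array => array.Count(x => x == word)).ToArray());
    }
}

static class Program
{
    static void Main()
    {
        var doc = new List<string[]>
        {
            new[] { "That", "is", "a", "cat" },
            new[] { "That", "bat", "flew", "over", "the", "cat" }
        };

        foreach (var kvp in TermCounts.PerArrayCounts(doc))
        {
            // Join the counts so the values are readable.
            Console.WriteLine("Key = {0}, Value = [{1}]", kvp.Key, string.Join(", ", kvp.Value));
        }
    }
}
```

With this input, "That" and "cat" map to [1, 1], "is" and "a" map to [1, 0], and the remaining words map to [0, 1].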
Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]> - it seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.
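If the float[] values are really required, one possible way to apply that cast is sketched below, reusing the CT.Termfreq signature from the question. This is only an illustration of the cast under that assumption; the Program class and its Main method are added here just to make the sketch runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CT
{
    // Counts each distinct term per array, returning the counts as floats.
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        return text
            .SelectMany(array => array)
            .Distinct()
            .ToDictionary(
                word => word,
                // Count returns an int; cast it so ToArray produces a float[]
                word => text.Select(array => (float)array.Count(x => x == word)).ToArray());
    }
}

static class Program
{
    static void Main()
    {
        var doc = new List<string[]>
        {
            new[] { "That", "is", "a", "cat" },
            new[] { "That", "bat", "flew", "over", "the", "cat" }
        };

        foreach (KeyValuePair<string, float[]> kvp in CT.Termfreq(doc))
        {
            Console.WriteLine("Key = {0}, Value = [{1}]", kvp.Key, string.Join(", ", kvp.Value));
        }
    }
}
```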