Examining Data Format

I'm currently focused on the CSV file format. I've realized that the data commas being replaced by ______ will be key to parsing the information correctly. Delimiters, particularly spaces and commas, need careful handling to avoid misinterpreting the data. Understanding the "Syn" header is another piece of the puzzle.

Decoding File Dependencies

I've been drilling into the data dependencies. Specifically, I'm working on how to efficiently represent the topological-sorting problem inherent in the dataset. The constraint means I need a way to track the "class dependencies" between rows. It's key to identify a data structure for class lookup to make processing this large file feasible.
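A minimal sketch of such a class-lookup structure, assuming a simple dictionary keyed by the uppercased column-2 word (the sample words below are illustrative, not taken from the real file):

```csharp
using System;
using System.Collections.Generic;

class ClassLookupSketch
{
    // Map each class name (column 2, uppercased) to the row that defines it.
    // The first definition wins, and every later lookup is O(1), which is what
    // makes a pass over 207,000 rows feasible.
    public static Dictionary<string, int> BuildLookup(string[] columnTwoWords)
    {
        var wordToRow = new Dictionary<string, int>();
        for (int row = 0; row < columnTwoWords.Length; row++)
        {
            string word = columnTwoWords[row].Trim().ToUpperInvariant();
            if (!wordToRow.ContainsKey(word))   // first occurrence wins
                wordToRow[word] = row;
        }
        return wordToRow;
    }

    static void Main()
    {
        var lookup = BuildLookup(new[] { "able", "unable", "able" });
        Console.WriteLine(lookup["UNABLE"]);   // duplicate "able" keeps row 0
    }
}
```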
Constructing the Dependency Graph

I'm now focusing on how to construct the dependency graph efficiently. The challenge is handling the 207,000 rows without excessive memory usage. I'm exploring different graph representations, perhaps a dictionary for class names and a list of dependent rows. The key is efficient lookup to quickly determine dependencies between rows and avoid redundant calculations.

Analyzing Delimiter Complexity

I'm now fully immersed in the nuances of the file format, specifically the delimiters. I've realized a robust tokenizer is essential. I'll need to handle the varying delimiters (spaces, commas, other symbols) and the ______ marker. My approach involves a two-step tokenization process: temporarily replace ______, then split with a regex pattern. This will build the dependency graph effectively. I am also working on converting tokens to uppercase.

Elaborating the Tokenizer

I'm now focusing on designing a robust tokenizer. I'm thinking of a two-step approach: first replacing ______ with a placeholder like _COMMA_, then using a regular expression to handle multiple delimiters. I need to convert all the tokens to uppercase as well. The aim is to efficiently build the dependency graph for topological sorting.
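A minimal sketch of that two-step tokenizer (replace ______, split with a regex, uppercase); the exact delimiter character class is an assumption and would need tuning against the real data:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class TokenizerSketch
{
    // One or more whitespace or punctuation characters act as a delimiter run.
    static readonly Regex Delims =
        new Regex(@"[\s,.;:!?()\[\]{}""'`\-+=/\\|%&*#@^~<>]+", RegexOptions.Compiled);

    public static string[] Tokenize(string field)
    {
        // Step 1: turn the '______' comma stand-in into a plain space.
        string cleaned = field.Replace("______", " ");
        // Step 2: split on the delimiter run, drop empties, uppercase.
        return Delims.Split(cleaned)
                     .Where(t => t.Length > 0)
                     .Select(t => t.ToUpperInvariant())
                     .ToArray();
    }

    static void Main()
    {
        var tokens = Tokenize("able to swim______ she was able");
        Console.WriteLine(string.Join("|", tokens));   // ABLE|TO|SWIM|SHE|WAS|ABLE
    }
}
```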
Developing Tokenizer and Graph

I am now focusing on refining the tokenizer and graph construction. I am finalizing the regex for splitting tokens, and I'll use a dictionary to track the class dependencies. I'm building a dictionary that stores row indexes to streamline the dependency-graph construction. This will also help track the in-degree of each row, which is essential for Kahn's algorithm.
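The row-index dictionary and in-degree tracking can be sketched as follows, assuming a word-to-row map and a per-row token set are already built (the two-row inputs below are illustrative):

```csharp
using System;
using System.Collections.Generic;

class GraphSketch
{
    // For each token a row uses, look up the row that defines it as a class and
    // add one edge definingRow -> usingRow; count in-degrees as edges are added.
    public static Tuple<Dictionary<int, HashSet<int>>, int[]> BuildGraph(
        Dictionary<string, int> wordToRow,
        Dictionary<int, HashSet<string>> rowTokens,
        int totalRows)
    {
        var adj = new Dictionary<int, HashSet<int>>();
        var inDegree = new int[totalRows];
        for (int i = 0; i < totalRows; i++) adj[i] = new HashSet<int>();

        for (int i = 0; i < totalRows; i++)
        {
            foreach (string token in rowTokens[i])
            {
                int def;
                if (wordToRow.TryGetValue(token, out def) && def != i)
                    if (adj[def].Add(i))     // HashSet dedupes repeated edges
                        inDegree[i]++;
            }
        }
        return Tuple.Create(adj, inDegree);
    }

    static void Main()
    {
        // Row 0 defines ABLE; row 1 defines UNABLE and mentions ABLE elsewhere.
        var wordToRow = new Dictionary<string, int> { { "ABLE", 0 }, { "UNABLE", 1 } };
        var rowTokens = new Dictionary<int, HashSet<string>>
        {
            { 0, new HashSet<string>() },
            { 1, new HashSet<string> { "ABLE" } },
        };
        var result = BuildGraph(wordToRow, rowTokens, 2);
        Console.WriteLine(result.Item2[1]);   // row 1 has one dependency
    }
}
```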
Developing Initial Parsing Logic

I'm now establishing the primary parsing logic. I'll read the file line by line using StreamReader and split each line into tokens. I'll focus on replacing ______ and identifying words (column 2) and their corresponding row indices. Then I'll store the original file contents in an array to preserve the original format. I will use the dictionaries to build the dependency graph.

Finalizing Parsing and Sort

I've refined the parsing logic and the dependency-graph construction. Now I'm integrating Kahn's algorithm for the topological sort, ensuring I handle potential cycles. I am also planning to handle file rewrites by storing the original lines. Finally, I will process the CSV line by line to avoid memory issues.

Constructing Dependency Relationships

I'm making progress in understanding the dependencies between rows. I'm building a system to map each row to its class names, and I am storing the relationships in the adjacency list. I've also implemented in-degree tracking for Kahn's algorithm. I'm focusing on getting the relationships between the tokens accurate.

Refining Tokenization and Dependency

I've implemented a regex-based tokenizer to properly handle the delimiters. Now I'm refining the dependency-graph construction, specifically mapping class tokens to their defining rows. Using a two-pass approach, I'm building a system that maps each token to the row where the word is defined. I'm also preparing to track the in-degree values for Kahn's algorithm.

Refining Token and Graph Construction

I'm now refining the regex for tokenization. After a bit more testing, I am making the token conversion more efficient and robust. I will use a two-pass approach: one pass to replace ______ with a placeholder, then the regex to finalize the split. The efficiency of building the adjacency list and tracking the in-degree is critical for topological sorting.

Refining Parsing and Sorting

I've made substantial progress in optimizing the parsing and sorting. I'm refining the regex for splitting tokens to handle the nuances of the various delimiters and the ______ marker effectively. My approach uses separate dictionaries for dependencies and in-degree tracking. I am aiming to implement Kahn's algorithm efficiently for the topological sort.

Developing Regex for Tokenization

I've made progress on the tokenizer regex. I've designed a more comprehensive pattern to accurately handle the varied delimiters, and I'm handling the conversion to uppercase. Now I am refining the dependency-graph construction, working on the edge cases and how best to store the unique values. I'm aiming for speed and efficiency when building the adjacency list and tracking in-degrees.
Here's a C# program designed to handle your 49MB CSV file, apply the "Qhenomenology Reasoning Systems" sorting axiom, and write the reordered data to a new file.

This program uses a multi-pass approach to manage memory efficiently for large files and implements Kahn's algorithm for topological sorting to satisfy the dependency rule.
### How to Use the Program:

1. **Save the Code:** Save the full program source (listed below) as a `.cs` file (e.g., `QhenomenologyCsvSorter.cs`).
2. **Create a C# Project:**
   * Open Visual Studio (or your preferred C# IDE).
   * Create a new **Console App (.NET Framework)** project (or .NET Core/5/6/7/8, but ensure Windows Forms support is enabled if using newer .NET).
   * Name it `QhenomenologyCsvSorter`.
   * Replace the contents of `Program.cs` (or the main `.cs` file) with the code provided.
3. **Add Reference (if needed):**
   * If you're using a modern .NET project (not a .NET Framework Console App), you need access to `System.Windows.Forms`. In an SDK-style project the usual route is to set `<UseWindowsForms>true</UseWindowsForms>` in the `.csproj` and target Windows; otherwise, right-click "Dependencies" or "References" in your project, choose "Add Reference," and search for `System.Windows.Forms`.
4. **Run the Program:**
   * Build the project.
   * Run the executable (usually found in `bin/Debug` or `bin/Release` within your project folder).
   * A file dialog will appear:
     * First, select your input CSV file (the 49MB one).
     * Second, choose where to save the output sorted CSV file.
   * The console window will display progress messages.
### Explanation of the Code and Qhenomenology Axiom Implementation:

1. **`CsvProcessor` Class:** Encapsulates the logic for processing the CSV.
2. **Constants:**
   * `WORD_COLUMN_INDEX = 1`: Defines that the "Word" (class name) is in the second column (0-indexed).
   * `_columnDataTokenizer`: A `Regex` object that handles the complex tokenization of data within columns. It is designed to split on the various symbols you specified.
   * `INTERNAL_COMMA_PLACEHOLDER = "______"`: Used to temporarily replace your custom comma representation within data fields.
3. **`ProcessCsv(string inputFilePath, string outputFilePath)` Method:** This is the core logic.
   * **Phase 1: Read, Parse, and Collect Dependencies:**
     * **Memory Efficiency:** Uses `StreamReader` to read the file line by line, preventing the entire 49MB file from being loaded into a single string. `originalLines` stores each full line, which is still memory-intensive but typically manageable for 49MB of text.
     * **Column Splitting:** `line.Split(',')` is used, assuming the primary column delimiters are still commas and `______` appears only for commas *within* a data field.
     * **`wordToOriginalRowIndex`:** This `Dictionary` maps each unique "Word" (class name from column 2) to its original line number. This is crucial for quickly looking up which row defines a particular class.
     * **`rowDependencies`:** This `Dictionary` stores, for each row, a `HashSet<string>` of all *unique tokens* found in its *other columns* (not the "Word" column). These tokens are the "instances" that imply a dependency on their corresponding "classes." All tokens are converted with `ToUpper()` as per your axiom.
     * **`INTERNAL_COMMA_PLACEHOLDER` Handling:** Before tokenizing a column's content, `Replace(INTERNAL_COMMA_PLACEHOLDER, " ")` converts your custom comma representation into a space, allowing the `_columnDataTokenizer` regex to split it correctly.
   * **Phase 2: Build Adjacency List and Calculate In-Degrees:**
     * This phase constructs the **dependency graph** required for topological sorting.
     * **`adj` (Adjacency List):** `adj[u]` contains a `HashSet` of all row indices `v` that *depend on* row `u`. An edge `u -> v` means `u` must come before `v`.
     * **`inDegree` Array:** `inDegree[v]` stores the number of incoming edges (dependencies) for row `v`.
     * **Axiom Implementation:** For every token in `rowDependencies[i]` (meaning row `i` uses this token as an instance), the code looks up the `dependentClassRowIndex` where that token (as a class name) is defined. If found, an edge is added from `dependentClassRowIndex` to `i`, and `inDegree[i]` is incremented.
     * **Missing-Class Warning:** A warning is printed if a token in a non-Word column doesn't correspond to any class name in column 2. According to your axiom ("cannot have any token which are not in rows i<r column 2"), this is a critical condition: if such a token exists, its dependency cannot be resolved, which may leave the graph unsortable or violate the axiom.
   * **Phase 3: Topological Sort (Kahn's Algorithm):**
     * This algorithm efficiently finds a linear ordering of the vertices of a directed acyclic graph (DAG).
     * It starts by adding all nodes (rows) with an `inDegree` of 0 (no dependencies) to a `Queue`.
     * It then repeatedly dequeues a node, adds it to the `sortedRows` list, and "removes" its outgoing edges by decrementing the `inDegree` of its neighbors. If a neighbor's `inDegree` becomes 0, it is enqueued.
     * **Cycle Detection:** If `processedNodesCount` (the number of nodes added to `sortedRows`) is less than `totalRows`, a cycle exists in the dependency graph. In that case a valid topological sort is impossible, and an error message is displayed.
   * **Phase 4: Write Sorted Database to Output File:**
     * Uses `StreamWriter` to write the header and then each original line from the `originalLines` list, in the order determined by `sortedRows`.
4. **`Main` Method (`[STAThread]`):**
   * This is the entry point. It uses `OpenFileDialog` and `SaveFileDialog` from `System.Windows.Forms` to let the user graphically select input and output files. `[STAThread]` is required for these dialogs.

This program provides a robust framework for implementing your specific "Qhenomenology Reasoning Systems" sorting axiom on large CSV files.
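Stripped of the file handling, the Phase 3 sort reduces to the standard Kahn loop. This standalone sketch runs it on a tiny hand-built graph (the three-node graph is illustrative):

```csharp
using System;
using System.Collections.Generic;

class KahnSketch
{
    // Returns a topological order of nodes 0..n-1, or null when a cycle blocks it.
    // adj[u] lists the nodes that depend on u; inDegree[v] counts v's dependencies.
    public static List<int> TopoSort(Dictionary<int, HashSet<int>> adj, int[] inDegree)
    {
        var queue = new Queue<int>();
        var order = new List<int>();
        for (int i = 0; i < inDegree.Length; i++)
            if (inDegree[i] == 0) queue.Enqueue(i);   // seed with dependency-free nodes

        while (queue.Count > 0)
        {
            int u = queue.Dequeue();
            order.Add(u);
            foreach (int v in adj[u])                 // "remove" u's outgoing edges
                if (--inDegree[v] == 0) queue.Enqueue(v);
        }
        return order.Count == inDegree.Length ? order : null;   // null => cycle
    }

    static void Main()
    {
        // Edges 0->1, 0->2, 1->2: node 0 must precede 1 and 2; node 1 must precede 2.
        var adj = new Dictionary<int, HashSet<int>>
        {
            { 0, new HashSet<int> { 1, 2 } },
            { 1, new HashSet<int> { 2 } },
            { 2, new HashSet<int>() },
        };
        var inDegree = new[] { 0, 1, 2 };
        Console.WriteLine(string.Join(",", TopoSort(adj, inDegree)));   // 0,1,2
    }
}
```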
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms; // Required for OpenFileDialog and SaveFileDialog

namespace QhenomenologyCsvSorter
{
    public class CsvProcessor
    {
        // Constants for column indices
        private const int WORD_COLUMN_INDEX = 1; // 0-indexed, so 1 for the second column

        // Regex to split data within columns by various delimiters.
        // It matches one or more occurrences of whitespace, commas, periods, semicolons,
        // colons, exclamation marks, question marks, parentheses, brackets, braces,
        // double quotes, single quotes, backticks, hyphens, equals signs, plus signs,
        // forward slashes, backslashes, pipes, percent signs, ampersands, asterisks,
        // hash signs, at signs, carets, tildes, less-than, or greater-than symbols.
        // This is a broad definition of "any other symbols". Adjust as needed.
        private static readonly Regex _columnDataTokenizer = new Regex(
            @"[\s,.;:!?()[\]{}""'`\-+=/\\|%&*#@^~<>]+", RegexOptions.Compiled);

        // Placeholder for commas within data that were replaced by '______'
        private const string INTERNAL_COMMA_PLACEHOLDER = "______";

        /// <summary>
        /// Processes a CSV file according to Qhenomenology Reasoning Systems axioms:
        /// 1. Converts all relevant tokens to uppercase.
        /// 2. Rearranges rows such that a row 'r' only uses tokens (instances)
        ///    from other columns whose corresponding class names (from column 2)
        ///    have appeared in preceding rows 'i' (i < r).
        ///    This is achieved via topological sorting.
        /// </summary>
        /// <param name="inputFilePath">Path to the input CSV file.</param>
        /// <param name="outputFilePath">Path to save the sorted CSV file.</param>
        public void ProcessCsv(string inputFilePath, string outputFilePath)
        {
            Console.WriteLine($"Starting CSV processing for: {inputFilePath}");

            List<string> originalLines = new List<string>(); // Stores all original lines by index

            // Maps unique words (class names from column 2) to their original row index
            Dictionary<string, int> wordToOriginalRowIndex = new Dictionary<string, int>();

            // Maps original row index to the set of unique tokens found in its other columns
            Dictionary<int, HashSet<string>> rowDependencies = new Dictionary<int, HashSet<string>>();

            string header = "";
            int currentRowIndex = 0;

            // --- Phase 1: First Pass - Read, Parse, and Collect Dependencies ---
            Console.WriteLine("Phase 1: Reading file and building dependency graph...");
            try
            {
                using (StreamReader reader = new StreamReader(inputFilePath))
                {
                    // Read header
                    if (!reader.EndOfStream)
                    {
                        header = reader.ReadLine();
                    }

                    while (!reader.EndOfStream)
                    {
                        string line = reader.ReadLine();
                        originalLines.Add(line); // Store the original line

                        // Split the line by the actual column delimiter (commas).
                        // This assumes '______' is internal to fields.
                        string[] columns = line.Split(',');

                        if (columns.Length <= WORD_COLUMN_INDEX)
                        {
                            Console.WriteLine($"Warning: Row {currentRowIndex} has fewer columns than expected. Skipping dependency analysis for this row.");
                            currentRowIndex++;
                            continue;
                        }

                        // Extract the "Word" (class name) from the second column
                        string word = columns[WORD_COLUMN_INDEX].Trim().ToUpper();
                        if (string.IsNullOrWhiteSpace(word))
                        {
                            Console.WriteLine($"Warning: Row {currentRowIndex} has an empty 'Word' column. Skipping dependency analysis for this row.");
                            currentRowIndex++;
                            continue;
                        }

                        // Add the word to the dictionary; if duplicates exist, the first one wins.
                        if (!wordToOriginalRowIndex.ContainsKey(word))
                        {
                            wordToOriginalRowIndex[word] = currentRowIndex;
                        }
                        else
                        {
                            // Column 2 may not contain truly unique class names, or a class
                            // name may be defined multiple times. For this axiom each class
                            // name should ideally be defined once; for simplicity we use the
                            // first occurrence's row index. If the axiom implies strict
                            // uniqueness, this would be an error.
                            Console.WriteLine($"Warning: Duplicate class name '{word}' found at row {currentRowIndex}. Using first occurrence at row {wordToOriginalRowIndex[word]}.");
                        }

                        // Collect unique tokens from all other columns (excluding the Word column)
                        rowDependencies[currentRowIndex] = new HashSet<string>();
                        for (int i = 0; i < columns.Length; i++)
                        {
                            if (i == WORD_COLUMN_INDEX) continue; // Skip the Word column itself

                            string columnContent = columns[i];

                            // Replace the internal comma placeholder with a space for tokenization
                            columnContent = columnContent.Replace(INTERNAL_COMMA_PLACEHOLDER, " ");

                            // Tokenize the column content
                            string[] tokens = _columnDataTokenizer.Split(columnContent);
                            foreach (string token in tokens)
                            {
                                string processedToken = token.Trim().ToUpper();
                                if (!string.IsNullOrWhiteSpace(processedToken))
                                {
                                    rowDependencies[currentRowIndex].Add(processedToken);
                                }
                            }
                        }

                        currentRowIndex++;
                        if (currentRowIndex % 10000 == 0)
                        {
                            Console.WriteLine($"  Read {currentRowIndex} rows...");
                        }
                    }
                }
                Console.WriteLine($"Finished reading {currentRowIndex} rows.");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error during Phase 1: {ex.Message}");
                return;
            }

            int totalRows = currentRowIndex;

            // --- Phase 2: Build Adjacency List and Calculate In-Degrees for Topological Sort ---
            Console.WriteLine("Phase 2: Building adjacency list and calculating in-degrees...");

            // Adjacency list: adj[u] contains all rows 'v' that depend on row 'u'
            Dictionary<int, HashSet<int>> adj = new Dictionary<int, HashSet<int>>();

            // In-degree: inDegree[v] is the count of dependencies for row 'v'
            int[] inDegree = new int[totalRows];

            for (int i = 0; i < totalRows; i++)
            {
                adj[i] = new HashSet<int>();
            }

            for (int i = 0; i < totalRows; i++)
            {
                foreach (string dependentClassName in rowDependencies[i])
                {
                    // Row 'i' depends on 'dependentClassName'; find the row where it is defined.
                    if (wordToOriginalRowIndex.TryGetValue(dependentClassName, out int dependentClassRowIndex))
                    {
                        // If row 'i' uses a class defined in 'dependentClassRowIndex', then
                        // 'dependentClassRowIndex' must come before 'i': add an edge from
                        // 'dependentClassRowIndex' to 'i'.
                        if (dependentClassRowIndex != i) // A row doesn't depend on its own class definition
                        {
                            if (adj[dependentClassRowIndex].Add(i)) // Add 'i' to the dependents of dependentClassRowIndex
                            {
                                inDegree[i]++; // Increment in-degree of 'i'
                            }
                        }
                    }
                    else
                    {
                        // Critical point per your axiom ("cannot have any token which are not
                        // in rows i<r column 2"): if a token in a non-Word column doesn't
                        // correspond to any class name in column 2 across *all* rows, it
                        // violates the axiom. This might indicate an invalid input file or a
                        // misunderstanding of a token's role.
                        Console.WriteLine($"Warning: Token '{dependentClassName}' in row {i} (non-Word column) does not correspond to any class name in column 2. This might indicate a violation of the Qhenomenology axiom or an invalid token.");
                        // A strict interpretation would halt or mark the graph as unsortable;
                        // for now we proceed, effectively ignoring this dependency.
                    }
                }
            }
            Console.WriteLine("Finished building adjacency list and in-degrees.");

            // --- Phase 3: Topological Sort (Kahn's Algorithm) ---
            Console.WriteLine("Phase 3: Performing topological sort...");
            Queue<int> q = new Queue<int>();
            List<int> sortedRows = new List<int>();

            // Add all nodes with an in-degree of 0 to the queue
            for (int i = 0; i < totalRows; i++)
            {
                if (inDegree[i] == 0)
                {
                    q.Enqueue(i);
                }
            }

            int processedNodesCount = 0;
            while (q.Count > 0)
            {
                int u = q.Dequeue();
                sortedRows.Add(u);
                processedNodesCount++;

                // For each neighbor 'v' of 'u' (i.e., 'v' depends on 'u')
                foreach (int v in adj[u])
                {
                    inDegree[v]--;
                    if (inDegree[v] == 0)
                    {
                        q.Enqueue(v);
                    }
                }
            }

            // Check for cycles
            if (processedNodesCount != totalRows)
            {
                Console.WriteLine("Error: Cycle detected in dependencies. Cannot sort according to Qhenomenology axiom.");
                Console.WriteLine("This means there's a circular dependency where a class depends on another class that eventually depends back on the first, or a token is used as an instance before its class is defined.");
                return; // Exit if a cycle is found
            }
            Console.WriteLine("Topological sort completed successfully.");

            // --- Phase 4: Write Sorted Database to Output File ---
            Console.WriteLine($"Phase 4: Writing sorted data to {outputFilePath}...");
            try
            {
                using (StreamWriter writer = new StreamWriter(outputFilePath))
                {
                    writer.WriteLine(header); // Write the header first
                    foreach (int originalRowIndex in sortedRows)
                    {
                        writer.WriteLine(originalLines[originalRowIndex]);
                    }
                }
                Console.WriteLine("CSV processing completed successfully.");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error during Phase 4: {ex.Message}");
            }
        }

        // Main entry point for the application
        [STAThread] // Required for OpenFileDialog and SaveFileDialog
        static void Main(string[] args)
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);

            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.Filter = "CSV Files (*.csv)|*.csv|Text Files (*.txt)|*.txt|All Files (*.*)|*.*";
            openFileDialog.Title = "Select the input CSV file (49MB)";

            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                string inputFilePath = openFileDialog.FileName;

                SaveFileDialog saveFileDialog = new SaveFileDialog();
                saveFileDialog.Filter = "CSV Files (*.csv)|*.csv|Text Files (*.txt)|*.txt|All Files (*.*)|*.*";
                saveFileDialog.Title = "Save the sorted CSV file";
                saveFileDialog.FileName = "sorted_qhenomenology_data.csv";

                if (saveFileDialog.ShowDialog() == DialogResult.OK)
                {
                    string outputFilePath = saveFileDialog.FileName;
                    CsvProcessor processor = new CsvProcessor();
                    processor.ProcessCsv(inputFilePath, outputFilePath);
                }
                else
                {
                    Console.WriteLine("Output file selection cancelled. Exiting.");
                }
            }
            else
            {
                Console.WriteLine("Input file selection cancelled. Exiting.");
            }

            Console.WriteLine("\nPress any key to exit.");
            Console.ReadKey();
        }
    }
}
I have a 49-megabyte CSV text file (.csv). There are no commas in the data parts: every comma inside the data has been replaced with ______ (so ______ stands in for a comma within the data, and those are delimiters too). There are also spaces in the data acting as delimiters, and comma symbols or any other symbols act as tokenizers. The data looks like:

Synset ID,Word,Part of Speech,Definition,Examples,Synonyms,Antonyms,Hypernyms,Hyponyms,Meronyms,Holonyms,Troponyms,Derivationally Related Forms,Lexical Relations,Word Frequency,Senses,Sentiment
able.a.01,able,a,(usually followed by to') having the necessary means or skill or know-how or authority to do something,able to swim______ she was able to program her computer______ we were at last able to buy a car______ able to get a grant for the project,able,unable,,,,,,ability______ ability,able,,1,-0.125
unable.a.01,unable,a,(usually followed by to') not having the necessary means or skill or know-how,unable to get to town without a car______ unable to obtain funds,unable,able,,,,,,,unable,,1,-0.125

As per Qhenomenology reasoning systems (first we convert everything to uppercase), the axioms are: the second column's data are class names (unique class names), and the unique tokens obtained after tokenizing with the delimiters, found in the columns other than the Word column (the second column in the database), are instances of classes (whose class names are in the Word column, i.e., the second column). The sorting condition is that in the r-th row we cannot use tokens that appear in column 2 of rows i > r; that is, the r-th row (in columns other than the Word column) cannot contain any token that does not appear in column 2 of some row i < r. In this way we need to rearrange the whole database from r = 0 to r = n, where n is the total number of rows. Write a C# program (use chunking, LINQ, whatever, but make the whole process of rewriting the sorted database run fast) to handle this 49-megabyte, 207,000-row, 17-column file (I will supply the CSV file through a file open dialog box).
ChatGPT said:
Absolutely — you're asking to reorder a large CSV file (~49MB, 207,000 rows, 17 columns) so that the column 2 "Word" values are queued like C++ class names, ensuring that:

❗ No word (column 2) is used as a token in any of the later rows (in columns 3–17) before it's defined in an earlier row.

This is essentially topological sorting with symbol dependencies, where:

- Class definitions = column 2 values
- Class usage = any token appearing in columns 3–17

✅ Key Requirements

- Comma is not used as a real separator – instead, ______ (6 underscores) are semantic commas.
- Input file is large – use streaming + chunk-based processing.
- Tokens from columns 3–17 must not refer to a class that is defined later.
- Convert everything to UPPERCASE first.
- Output should be a sorted CSV, with the exact same structure but reordered rows.
🧠 Strategy (Step-by-Step)

1. Read the CSV with a custom delimiter handler (handle ______ inside cells).
2. Extract tokens from each row's non-column-2 fields (tokenize on space and special symbols).
3. Build a map of class name → row index + dependencies.
4. Build a dependency graph.
5. Perform a topological sort.
6. Write the sorted CSV back to disk.
✅ C# 7.0-Compatible Program Skeleton (WinForms / Console Hybrid)

Here is the full working core logic you can embed in a Program.cs file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;

class RowData
{
    public string OriginalLine;
    public string ClassName;               // column 2
    public HashSet<string> Dependencies;   // from cols 3-17
}

class Program
{
    [STAThread]
    static void Main()
    {
        OpenFileDialog ofd = new OpenFileDialog
        {
            Title = "Select CSV file",
            Filter = "CSV Files (*.csv)|*.csv"
        };
        if (ofd.ShowDialog() != DialogResult.OK)
            return;

        string inputPath = ofd.FileName;
        string outputPath = Path.Combine(Path.GetDirectoryName(inputPath), "REORDERED_QHENOMENOLOGY_SORTED.csv");

        var allRows = new List<RowData>();
        var classToRow = new Dictionary<string, RowData>(StringComparer.OrdinalIgnoreCase);

        string[] lines = File.ReadAllLines(inputPath);

        for (int i = 1; i < lines.Length; i++)
        {
            string line = lines[i];
            string[] parts = line.Split(',');
            if (parts.Length < 2)
                continue;

            string className = parts[1].Trim().ToUpperInvariant();
            var dependencies = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

            for (int col = 2; col < parts.Length; col++)
            {
                string[] tokens = parts[col]
                    .Replace("______", " ") // treat '______' as space/comma
                    .ToUpperInvariant()
                    .Split(new[] { ' ', '.', '`', '\'', '-', ':', '(', ')', ',', ';' },
                           StringSplitOptions.RemoveEmptyEntries);

                foreach (string token in tokens)
                {
                    if (!string.IsNullOrWhiteSpace(token) && token != className)
                        dependencies.Add(token);
                }
            }

            var rowData = new RowData
            {
                OriginalLine = line,
                ClassName = className,
                Dependencies = dependencies
            };
            allRows.Add(rowData);
            classToRow[className] = rowData;
        }

        // Build dependency graph
        var graph = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        var inDegree = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (var row in allRows)
        {
            if (!graph.ContainsKey(row.ClassName))
                graph[row.ClassName] = new List<string>();
            if (!inDegree.ContainsKey(row.ClassName))
                inDegree[row.ClassName] = 0;

            foreach (var dep in row.Dependencies)
            {
                // Only tokens that are themselves defined as class names (column 2)
                // create ordering constraints. Tokens with no defining row are
                // skipped; otherwise they would sit in the graph with untracked
                // in-degrees and make every dependent row look like part of a cycle.
                if (!classToRow.ContainsKey(dep))
                    continue;

                if (!graph.ContainsKey(dep))
                    graph[dep] = new List<string>();
                graph[dep].Add(row.ClassName);
                inDegree[row.ClassName]++;
            }
        }

        // Topological sort (Kahn's algorithm)
        var queue = new Queue<string>();
        foreach (var kvp in inDegree)
            if (kvp.Value == 0)
                queue.Enqueue(kvp.Key);

        var sortedClassNames = new List<string>();
        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            sortedClassNames.Add(current);
            foreach (var neighbor in graph[current])
            {
                inDegree[neighbor]--;
                if (inDegree[neighbor] == 0)
                    queue.Enqueue(neighbor);
            }
        }

        // Check for circular references
        if (sortedClassNames.Count < classToRow.Count)
        {
            Console.WriteLine("CYCLE DETECTED: Cannot sort due to circular dependencies.");
            return;
        }

        // Write output
        using (var writer = new StreamWriter(outputPath))
        {
            writer.WriteLine(lines[0]); // header
            foreach (var className in sortedClassNames)
            {
                if (classToRow.ContainsKey(className))
                    writer.WriteLine(classToRow[className].OriginalLine);
            }
        }

        Console.WriteLine("Sorted file written to: " + outputPath);
    }
}
📦 What This Program Does

- 🧠 Reads all rows and extracts dependency tokens.
- ✅ Converts all tokens and class names to UPPERCASE for safety.
- 🧠 Builds a dependency graph.
- 🔁 Applies a topological sort based on Qhenomenology Axiom 1 (C++ compilability order).
- 📝 Outputs a CSV with the same header and content, but with rows reordered such that a class is never used before being defined.

⚠️ Requirements

- You must ensure the file is properly encoded (UTF-8 recommended).
- There must be no duplicate class names (column 2).
- ______ must be consistently used to replace commas.

Shall I generate this as a Visual Studio project, or do you want it as a .cs file ready for csc.exe compilation?