ScaleJet AI

DATA TRANSFORMATION AT JET SPEED

Instructions & Pro Tips

Master the art of AI-powered data transformation

Everything you need to know to transform data like a pro

Getting Started

Basic Workflow

  1. Upload your CSV file - Drag and drop or click to browse. Files up to 10MB are supported.
  2. Preview your data - Review the columns and sample data to understand what you're working with.
  3. Describe transformations - Tell the AI what you want to do in plain English.
  4. Review results - Check the transformed data preview to ensure it meets your needs.
  5. Visualize data - Plot the transformed data to confirm everything looks exactly as expected.
  6. Download - Get your cleaned data with a custom filename.
Speed Up Future Processing: Save your transformations as recipes! Once you've perfected a transformation, save it as a recipe. This makes applying the same transformation to new files incredibly fast - perfect for recurring data processing tasks.

Pro Tips for Better Results

Writing Effective Instructions

✓ Good Example:
"Remove columns A, B, and C. Add a time column starting at 0 and incrementing by 0.001 seconds for each row. Rename 'Voltage_1' to 'U1' and 'Current_1' to 'I1'. Fill any missing values in electrical columns with 0."
✗ Poor Example:
"Clean the data and make it better."
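To see why the good example works, note that each sentence maps onto one concrete operation. A minimal pandas sketch of that mapping is below. It is an illustration only, not the code ScaleJet AI generates; the function name is hypothetical, and it assumes "electrical columns" means the renamed U1 and I1 columns.

```python
import numpy as np
import pandas as pd


def apply_good_example(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical translation of the 'good example' instruction into pandas."""
    # "Remove columns A, B, and C" (columns that are absent are skipped).
    out = df.drop(columns=["A", "B", "C"], errors="ignore")
    # "Rename 'Voltage_1' to 'U1' and 'Current_1' to 'I1'."
    out = out.rename(columns={"Voltage_1": "U1", "Current_1": "I1"})
    # "Fill any missing values in electrical columns with 0."
    # Assumption: the electrical columns are U1 and I1.
    for col in ("U1", "I1"):
        if col in out.columns:
            out[col] = out[col].fillna(0)
    # "Add a time column starting at 0 and incrementing by 0.001 seconds."
    out["time"] = np.arange(len(out)) * 0.001
    return out
```

The poor example gives no column names, values, or operations, so there is nothing comparable to translate.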

Key Principles

  • Be Specific: Mention exact column names, values, and operations you want
  • Use Clear Language: "Remove column A" instead of "delete the first column"
  • Specify Data Types: "Keep time as numbers" or "convert to text format"
  • Mention Order: "Put time column first" or "arrange columns as: time, voltage, current"
  • Handle Missing Data: Specify what to do with empty cells - fill with 0, remove rows, etc.
Recipe Power User Tip: Create a library of recipes for different data sources. Name them clearly like "Yokogawa WT5000 Cleanup" or "Temperature Sensor Data Processing." This builds a personal toolkit of proven transformations.

Advanced Mode Benefits

  • Custom Prompts: Fine-tune the AI's behavior with your own instructions
  • Model Selection: Choose from GPT-4, GPT-4 Turbo, or GPT-3.5 based on your needs
  • Temperature Control: Lower values (0.1) for consistent results, higher for creative solutions
  • Token Limits: Adjust for simple (500 tokens) or complex (2000+ tokens) transformations
Electrical Data Warning: When working with electrical measurements, negative values are normal for AC systems. Always specify "preserve negative values" if your data contains voltage/current measurements.

Template Library

Here are proven transformation templates you can use as starting points:

Yokogawa WT5000
Clean up Yokogawa WT5000 electrical measurement data:
- Remove first 5 columns (keep only the electrical measurement columns)
- Add a time column starting at zero increasing in increments of 0.000001 seconds
- Put the time column first
- If electrical columns U1, U2, U3, I1, I2, I3 don't exist, create them with zero values
- Final column order should be: time, U1, I1, U2, I2, U3, I3
- Keep time values as numbers (float), do not convert to strings
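For verification, it helps to know roughly what pandas code the Yokogawa WT5000 template above describes. The sketch below is one plausible reading, not the tool's actual output; the function name and the `dt` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

# Final order of the electrical columns, as stated in the template.
ELECTRICAL = ["U1", "I1", "U2", "I2", "U3", "I3"]


def clean_wt5000(df: pd.DataFrame, dt: float = 0.000001) -> pd.DataFrame:
    """Hypothetical pandas reading of the Yokogawa WT5000 template."""
    # Remove the first 5 columns by position.
    out = df.iloc[:, 5:].copy()
    # Create any missing electrical column, filled with zeros.
    for col in ELECTRICAL:
        if col not in out.columns:
            out[col] = 0.0
    # Time column as floats: 0, dt, 2*dt, ... (one step per row).
    out["time"] = np.arange(len(out), dtype=float) * dt
    # Time first, then the electrical columns in the requested order.
    return out[["time"] + ELECTRICAL]
```

Measured values are copied through unchanged, so negative AC readings are preserved.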
Time Series Data
Clean up time series data:
- Add proper time column with sequential timestamps
- Remove any empty or null rows
- Interpolate missing values where appropriate
- Sort data by timestamp
- Ensure consistent time intervals
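The time series template above leaves several choices open, such as the interval and the interpolation method. The sketch below shows one way to pin them down in pandas. It is not the tool's generated code. It assumes a column named `timestamp`, a one-second interval, and numeric value columns; the function name is hypothetical.

```python
import pandas as pd


def clean_time_series(df: pd.DataFrame, time_col: str = "timestamp",
                      freq: str = "1s") -> pd.DataFrame:
    """Hypothetical pandas reading of the time series template."""
    # Remove rows that are completely empty.
    out = df.dropna(how="all").copy()
    # Parse timestamps; unparseable values become NaT and are dropped.
    out[time_col] = pd.to_datetime(out[time_col], errors="coerce")
    out = out.dropna(subset=[time_col]).sort_values(time_col)
    out = out.drop_duplicates(subset=time_col).set_index(time_col)
    # Put the data on a regular grid; timestamps that do not fall on the
    # grid are discarded, and missing grid points appear as NaN rows.
    out = out.asfreq(freq)
    # Fill the gaps by interpolating linearly in time.
    out = out.interpolate(method="time")
    return out.rename_axis(time_col).reset_index()
```

Stating the interval in your instruction (for example "one reading per second") removes the guesswork that the `freq` default stands in for here.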
Basic Cleanup
Basic data cleanup:
- Remove completely empty columns and rows
- Remove columns that are mostly empty (>90% missing)
- Standardize column names (remove spaces, special characters)
- Fill remaining missing values with appropriate defaults
- Remove duplicate rows
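"Appropriate defaults" in the basic cleanup template above is open to interpretation, so it is worth checking what the generated code actually does. The sketch below shows one possible pandas reading and is not the tool's output. It assumes 0 for numeric columns and an empty string otherwise, and it replaces runs of special characters with underscores; the function name is hypothetical.

```python
import re

import pandas as pd


def basic_cleanup(df: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
    """Hypothetical pandas reading of the basic cleanup template."""
    # Remove rows and columns that are completely empty.
    out = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Remove columns where more than 90% of the values are missing.
    out = out.loc[:, out.isna().mean() <= max_missing].copy()
    # Standardize names: runs of spaces/special characters become "_".
    out.columns = [
        re.sub(r"\W+", "_", str(c).strip()).strip("_") for c in out.columns
    ]
    # Assumed defaults: 0 for numeric columns, "" for everything else.
    for col in out.columns:
        default = 0 if pd.api.types.is_numeric_dtype(out[col]) else ""
        out[col] = out[col].fillna(default)
    # Remove duplicate rows and renumber the index.
    return out.drop_duplicates().reset_index(drop=True)
```

If a different default matters for your data, say so explicitly in the instruction, for example "fill missing temperatures with the column mean".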
Sensor Data
Process sensor readings:
- Convert timestamps to proper datetime format
- Add sensor ID column if missing
- Convert temperature readings from Celsius to Fahrenheit
- Flag outlier readings beyond normal range
- Group readings by hour and calculate averages
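The sensor template above does not define "normal range" or name the columns. The sketch below shows one way to fill those in with pandas and is not the code the tool generates. The column names (`timestamp`, `temperature_c`, `sensor_id`), the range limits, the default sensor ID, and the function name are all assumptions.

```python
import pandas as pd


def process_sensor_data(df: pd.DataFrame, low_f: float = -40.0,
                        high_f: float = 140.0,
                        default_id: str = "sensor_1"):
    """Hypothetical pandas reading of the sensor data template."""
    out = df.copy()
    # Convert timestamps to datetime values.
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Add a sensor ID column if the file does not have one.
    if "sensor_id" not in out.columns:
        out["sensor_id"] = default_id
    # Celsius to Fahrenheit: F = C * 9/5 + 32.
    out["temperature_f"] = out["temperature_c"] * 9 / 5 + 32
    # Flag readings outside the assumed normal range [low_f, high_f].
    out["outlier"] = ~out["temperature_f"].between(low_f, high_f)
    # Hourly averages per sensor (flagged readings are still included).
    hourly = (
        out.groupby(["sensor_id", out["timestamp"].dt.floor("h")])["temperature_f"]
        .mean()
        .reset_index()
    )
    return out, hourly
```

The function returns both the row-level table with the outlier flag and the hourly averages, because the template asks for both a flag and a grouped summary.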
Template Customization: Use these templates as starting points, then modify them for your specific needs. Once you've perfected a variation, save it as your own recipe for instant reuse.

Batch Processing Mastery

Perfect Your Recipe First

  1. Use Single File Mode - Test your transformation on one representative file first
  2. Refine Until Perfect - Adjust instructions until you get exactly what you want
  3. Save as Recipe - Once perfect, save it with a descriptive name
  4. Switch to Batch Mode - Use your proven recipe to process hundreds of files
Batch Processing Secret: Recipes make batch processing lightning fast! The system remembers exactly how to transform your data, so subsequent files process almost instantly. This can turn a 30-minute job into a 30-second job.

Batch Best Practices

  • File Naming: Keep original filenames consistent for easier output organization
  • File Limits: Maximum 500 files per batch, 10MB per file
  • Quality Control: Check the processing summary for any failed files
  • Output Organization: Processed files are automatically renamed with a "(cleaned)" suffix
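Checking a batch against the file limits before uploading can save a failed run. The sketch below is a local pre-check, not part of ScaleJet AI. The function name is hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes.

```python
MAX_FILES = 500                 # batch limit stated above
MAX_BYTES = 10 * 1024 * 1024    # assumption: "10MB" means 10 MiB


def check_batch(sizes: dict) -> list:
    """Return a list of problems for a {filename: size_in_bytes} mapping.

    An empty list means the batch fits the documented limits.
    """
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files exceeds the {MAX_FILES}-file limit")
    for name, size in sizes.items():
        if not name.lower().endswith(".csv"):
            problems.append(f"{name}: not a CSV file")
        elif size > MAX_BYTES:
            problems.append(f"{name}: larger than 10MB")
    return problems
```

To run it on a local folder, build the mapping with `pathlib`, for example `check_batch({p.name: p.stat().st_size for p in Path("data").iterdir()})`, where `data` is a placeholder folder name.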

Frequently Asked Questions

Q: Why should I save recipes instead of just retyping instructions?
A: Recipes are much faster for repeated tasks because the transformation does not have to be regenerated each time. They also ensure consistent results, which makes them well suited to recurring data processing workflows.
Q: What file formats are supported?
A: Currently, only CSV files are supported. Files must be under 10MB for single-file processing, with up to 500 files allowed in batch mode.
Q: Can I see the code that was generated?
A: Yes! After processing, click "View Applied Transformations" to see the exact pandas code that was generated. This is helpful for learning and verification.
Q: What happens if the AI makes a mistake?
A: You can always review the preview before downloading. If results aren't correct, refine your instructions and try again. More specific instructions typically yield better results.
Q: When should I use Advanced Mode?
A: Use Advanced Mode when you need specific AI model settings, want to write custom prompts, or need fine control over the transformation process. Most users find Simple Mode sufficient.
Q: Are my files stored or shared?
A: Files are processed in memory and not permanently stored. Your data remains private and secure throughout the transformation process.
Q: How do I handle very large datasets?
A: For files larger than 10MB, split them into smaller chunks first. The system is optimized for quick processing of reasonably-sized datasets.
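One way to do the splitting locally is sketched below, using only the Python standard library. It is an illustration, not a ScaleJet AI feature, and the function name is hypothetical. It repeats the header row in every chunk and assumes no quoted field contains a line break.

```python
def split_csv_text(text: str, max_bytes: int) -> list:
    """Split CSV text into chunks of at most max_bytes, each with the header.

    Assumes one record per line (no line breaks inside quoted fields).
    A single row larger than max_bytes still gets a chunk of its own.
    """
    lines = text.splitlines(keepends=True)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    header_size = len(header.encode("utf-8"))
    chunks, current, size = [], [header], header_size
    for row in rows:
        row_size = len(row.encode("utf-8"))
        # Start a new chunk when adding this row would exceed the limit.
        if len(current) > 1 and size + row_size > max_bytes:
            chunks.append("".join(current))
            current, size = [header], header_size
        current.append(row)
        size += row_size
    # Flush the last chunk if it holds at least one data row.
    if len(current) > 1:
        chunks.append("".join(current))
    return chunks
```

Each returned string can be written to its own file and the resulting files processed together in Batch Mode with a saved recipe.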

Performance Optimization

Speed Up Your Workflow

  • Use Recipes: Saved recipes process much faster than generating new transformations
  • Clear Instructions: Specific instructions lead to faster, more accurate results
  • Template Starting Points: Begin with templates and modify rather than starting from scratch
  • Batch Processing: Process multiple similar files together for maximum efficiency
Workflow Optimization: Create a systematic approach - perfect one file, save the recipe, then batch process the rest. This methodology works for any recurring data transformation task.